Thursday, December 31, 2009

Ubuntu 9.04: Backing Up and Copying Webpages and Websites

As described in a previous post, I had been using rsync to make backups of various files.  This strategy was not working so well in the case of webpages and websites, or at least I wasn't finding much guidance that I could understand.  (Incidentally, I had also tried the Windows program HTTrack Website Copier, but had found it to be complicated and frustrating.  It seemed to want either to download the entire Internet or nothing at all.)

The immediate need driving this investigation was that I wanted to know how to back up a blog.  I used the blog on which I am posting this note as my test bed.

Eventually, I discovered that maybe what I needed to use was wget, not rsync.  The wget manual seemed thorough if a bit longwinded and complex, so I tried the Wikipedia entry.  That, and another source, gave me the parts of the command I used first:

wget -r -l1 -np -A.html -N -w5 http://raywoodcockslatest.blogspot.com/search?max-results=1000 --directory-prefix=/media/Partition1/BlogBackup1

The parts of this wget command have the following meanings:

  • -r means that wget should recurse, i.e., it should go through the starting folder and all folders beneath it (e.g., www.website.com/topfolder and also www.website.com/topfolder/sub1 and sub2 and sub3 . . .)
  • -l1 (that's an L-one) means stay at level number one.  That is, don't download linked pages.
  • -np means "no parent" (i.e., stay at this level or below; don't go up to the parent directory)
  • -A.html means Accept only files with this extension (i.e., only .html files)
  • -N is short for Newer (i.e., only download files that are newer than what's already been downloaded).  In other words, it turns on timestamping
  • -w5 means wait five seconds between files.  This is because large downloads can overload the servers you are downloading from, in which case an irritated administrator may penalize you
  • The URL shown in this command is the URL of this very blog, plus the additional information needed to download all of my posts in one html file.  But it didn't work that way.  What I got, with this command, was each of the posts as a separate html file, which is what I preferred anyway
  • --directory-prefix indicates where I want to put the download.  If you don't use this option, everything goes into the folder wget is run from.  I came across a couple of suggestions on what to do if your path has spaces in it, but I hadn't gotten that far yet (see the sketch after this list)
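
By the way, the usual advice for a path containing spaces seemed to be simply to quote it; a quick sketch, with a made-up folder name:

wget -r -l1 -np -A.html -N -w5 http://raywoodcockslatest.blogspot.com --directory-prefix="/media/Partition1/Blog Backup 1"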

Incidentally, I also ran across another possibility that I didn't intend to use now, but that seemed potentially useful for the future.  Someone asked if there was a way to save each file with a unique name, so that every time  you run the wget script, you get the current state of the webpage.  One answer involved using mktemp.  Also, it seemed potentially useful to know that I could download all of the .jpg files from a webpage by using something like this:  wget -e robots=off -r -l1 --no-parent -A.jpg http://www.server.com/dir/
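
I didn't try the mktemp idea, but presumably it would look something like this (the path and file name pattern here are just placeholders):

outfile=$(mktemp /media/Partition1/BlogBackup1/page-XXXXXX)
wget -O "$outfile" http://raywoodcockslatest.blogspot.com/

Each run would then save the current state of the page to a new, uniquely named file instead of overwriting the last one.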

The first download was pretty good, but I had learned some more things in the meantime, and had some questions, so I decided to try again.  Here's the script I used for my second try:
wget -A.html --level=1 -N -np -p -r -w5 http://raywoodcockslatest.blogspot.com --directory-prefix=/media/Partition1/BlogBackup2

This time, I arranged the options (or at least the short ones) in alphabetical order.  The -p option indicated that images and style sheets would be downloaded too.  I wasn't sure I needed this -- the basic html pages looked pretty good in my download as they were -- but I thought it might be interesting to see how much larger that kind of download would be.  I used a shorter version of the source URL and I designated a different output directory.

I could have added -k (long form:  --convert-links) so that the links among the downloaded html pages would be modified to refer to the other downloaded pages, not to the webpage where I had downloaded them from; but then I decided that the purpose of the download was to give me a backup, not a local copy with full functionality; that is, I wanted the links to work properly when posted as webpages online, not necessarily when backed up on my hard drive.  I used the long form for the "level" option, just to make things clearer.  Likewise, with a bit of learning, I decided against using the -erobots=off option.  There were probably a million other options I could have considered, in the long description of wget in the official manual, but these were the ones that others seemed to mention most.

The results of this second try were mixed.  For one thing, I was getting a lot of messages of this form:

2010-01-01 01:43:03 (137 KB/s) - `/[target directory]/index.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=MONTHLY-1196485200000&toggleopen=MONTHLY-1259643600000' saved [70188]

Removing /[target directory]/index.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=MONTHLY-1196485200000&toggleopen=MONTHLY-1259643600000 since it should be rejected.

I didn't know what this meant, or why I hadn't gotten these kinds of messages when I ran the first version of the command (above).  It didn't seem likely that the mere rearrangement of options on the wget command line would be responsible.  To find out, I put it out of its misery (i.e., I ran "pkill wget" in a separate Terminal session) and took a closer look.

Things got a little confused at this point.  Blame it on the late hour.  I thought, for a moment, that I had found the answer.  A quick glance at the first forum that came up in response to my search led me to recognize that, of course, my command was contradictory:  it told wget to download style sheets (-p), but it also said that only html files would be accepted (-A.html).  But then, unless I muddled it somehow, it appeared that, in fact, I had not included the -p option after all.  I tried re-running version 2 of the command (above), this time definitely excluding the -p option.  And no, that wasn't it; I still got those same funky messages (above) about removing index.html.  So the -p option was not the culprit.

I tried again.  This time, I reverted to using exactly the command I had used in the first try (above), changing only the output directory.  Oh, and somewhere in this process, I shortened the target URL.  This gave me none of those funky messages.  So it seemed that the order of options on the command line did matter, and that the order used in the first version (above) was superior to that in the second version.  To sum up, then, the command that worked best for me, for purposes of backing up my Blogger.com (blogspot) blog, was this:

wget -r -l1 -np -A.html -N -w5 http://raywoodcockslatest.blogspot.com --directory-prefix=/media/Partition1/BlogBackup1

Since there are other blog hosts out there, I wanted to see if exactly the same approach would work elsewhere.  I also had a WordPress blog.  I tried the first version of the wget command (above), changing only the source URL and target folder, as follows:

wget -r -l1 -np -A.html -N -w5 http://raywoodcock.wordpress.com/ --directory-prefix=/media/Partition1/WordPressBackup

This did not work too well.  The script repeatedly produced messages saying "Last-modified header missing -- time-stamps turned off," so then wget would download the page again.  As far as I could tell from the pages I examined in a search, there was no way around this; apparently WordPress did not maintain time stamps.

The other problem was that it did not download all of the pages.  It would download only one index.html file for each month.  That index.html file would contain an actual post, which was good, but what about all the other posts from that month?  I modified the command to specify the year and month (e.g., http://raywoodcock.wordpress.com/2009/03/).  This worked.  Now the index.html file at the top of the subtree (e.g., http://raywoodcock.wordpress.com/2009/03/index.html) would display all of the posts from that month, and beneath it (in e.g., .../2009/03/01) there were named subfolders for each post, each containing the index.html file that displayed that particular post.  At that rate, I would have to write a wget line for each month in which I had posted blog entries.  I then found that removing the -A.html option solved the problem, but only partly:  if I ran it at the year level, it worked for some months and skipped others.  I tried what appeared to be the suggestion of running it twice at the year level (i.e., at .../wordpress.com/ with an ending slash), with --save-cookies=cookies.txt --load-cookies=cookies.txt --keep-session-cookies.  That didn't seem to make a difference.  So the best I could do with a WordPress blog, at this point, was to enter a separate wget command for each month, like this:

wget -r -l1 -np -N -A.html -w5 http://raywoodcock.wordpress.com/2009/01 --directory-prefix=/media/Partition1/WordPressBackup

I added back the -A.html option, as shown, because it didn't seem to hurt anything; html pages were the only ones that had been downloaded anyway.

Since these monthly commands would re-download everything, I would run the older ones only occasionally, to pick up the infrequent revision of an older post.  I created a bunch of these, for the past and also for some months into the future.  I put the historical ones in a script called backup-hist.sh, which I planned to run only occasionally, and I put the current and future ones into my backup-day.sh, to run daily.
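
Just to sketch the idea, those month-by-month lines could also have been generated by a loop rather than written out individually (the year here is arbitrary, and in practice I stayed with separate lines):

#!/bin/bash
# one wget per month of the WordPress blog, as sketched above
for month in 01 02 03 04 05 06 07 08 09 10 11 12; do
  wget -r -l1 -np -N -A.html -w5 "http://raywoodcock.wordpress.com/2009/$month" \
    --directory-prefix=/media/Partition1/WordPressBackup
done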

But, ah, not so fast.  When I tried this on another, unrelated WordPress blog, it did not consistently download all posts for each month.  I also noticed that it duplicated some posts, in the sense that the higher-level (e.g., month-level) index.html file seemed to contain everything that would appear on the month-level webpage on WordPress.  So, for example, if you had your WordPress blog set up to show a maximum of three posts per page, this higher-level webpage would show all three of those.  The pages looked good; it was just that I was not sure how I would use this mix in an effective backup-and-restore operation.  This raised the question for my own blog:  if I ever did have to restore my blog, was I going to examine the HTML for each webpage manually, to re-post only those portions of text and code that belonged on a given blog page?

I decided to combine approaches.  First, since it was yearend, I made a special-case backup of all posts in each blog.  I did this by setting the blogs to display 999 posts on one page, and then printed that page as a yearend backup PDF.  Second, I noticed that rerunning these scripts seemed to catch additional posts on subsequent passes.  So instead of separating the current and historical posts, I decided to stay with the original idea of running one command to download each WordPress post.  I would hope that this got most of them, and for any that fell through the cracks, I would refer to the most recent PDF-style copy of the posts.  The command I decided to use for this purpose was of this form:

wget -r -l1 -np -N -A.html -w5 [URL] --directory-prefix=/media/Backups/Blogs/WordPress

I had recently started one other blog.  This one was on Livejournal.com.  I tried the following command with that:

wget -r -l1 -np -N -A.html -w5 http://rwclippings.livejournal.com/ --directory-prefix=/media/Backups/Blogs/LiveJournal

This was as far as I was able to get into this process at this point.

Notes on Converting Word Processing Documents from 1985-1995

I was using Ubuntu 9.04 (Jaunty Jackalope) and VMware Workstation 6.5.2, running Windows XP virtual machines (VMs) as guests. I was trying, in one of those VMs, to convert some data files from the 1980s and 1990s. This post conveys some notes from that process.


I had used a number of different database, spreadsheet, and word processing programs back then. The filenames had extensions like .sec and .95. These suggested that the file in question was probably not a spreadsheet (whose extensions would probably have been .wks or .wk1 or .wq1). I suspected these were word processing docs, but what kind?

I had a copy of WordPerfect Office X4, so I tried opening them in that. The formats I had used principally back then were WordStar (3.3, I think), WordPerfect 6.0 for DOS, XyWrite III+, and plain ASCII text. So for some documents it took several tries, telling WordPerfect X4 to try these different formats, before the document would open properly. Even then, not all of them did.

I also tried the approach of highlighting a bunch of these files, right-clicking, and indicating that I wanted to convert them to Adobe Acrobat 8, or to combine them in Acrobat. Unfortunately, these efforts tended to cause Windows Explorer and/or Acrobat to crash.

It occurred to me to try another approach. I left Windows in VMware and dropped down to Ubuntu. I selected 57 files that I wanted to convert. OpenOffice 3.0 Writer started up by default. It opened them all. They had been last modified in 1993 and thereabouts. I think they were created with Word 3.1. For each file, I clicked a button and got a PDF created in the same folder with the same name and a PDF extension.

OOo Writer wasn't able to open some WordStar 3.3 files from the mid-1980s. Several sources referred me to Advanced Computer Innovations for that sort of conversion. Their prices weren't bad, but I didn't want to pay $1 per file per 50K for these old materials. Instead, I looked into old Microsoft converters.  Those, unfortunately, did not appear to be available anymore.  A search led to a forum that led to WordStar.org converters.  Those, however, did not appear to go back to WordStar for DOS 3.3.  Graham Mayor's page looked like a better bet.  It gave me a Wrdstr32.zip file, but by the time I got around to it, I had already addressed my needs, so I didn't actually try this one.

Separately, somehow, I found (or maybe I had always retained) a copy of a program that seemed willing to install "Microsoft Word 97 Supplemental Converters."  Searching for this led to a Microsoft page where I was able to download the Word 97-2000 Import Converter (wrd97cnv.exe); unfortunately, that proved to be a backwards conversion from Word 97 to Word 95.  Trying again, I found that the Microsoft Office 2003 Resource Kit webpage led to a list of downloads that included an Office Converter Pack that I downloaded (oconvpck.exe).  I seem to have installed this, and I think this is what ultimately did the job for me.

Resources for converting XyWrite III+ files were pretty scarce by now, a decade after what appears to have been the last (short-lived) effort to reconstruct a manual of its text-formatting codes. Apparently nobody who has a copy of the paper manual has gotten around to PDFing and posting it; or perhaps Nota Bene (which apparently bought XyWrite in the 1990s), for some reason, is unwilling to allow any such reference to be made available. But here are some examples of codes used, from what I've been able to figure out and recall:

«PT23» start using proportional type font no. 23
«PG» page break
«TS5,10» set tabs 5 and 10 spaces to the right
«DC1=A 1 a» set DC1 outline structure (first level = A, B, C ...)
lm=0 set left text margin at zero characters (i.e., not indented)
«FC» format centered, i.e., center text
«MDBO» begin boldface
«MDNM» end special formatting (e.g., boldface)
«SPi» set page number to i (e.g., for preface)

I also had some old .wpd (WordPerfect) documents.  Not all of them had .wpd extensions to begin with, so I searched for a bulk renamer to give them all .wpd extensions.  I tried Bulk Rename, but its interface was complex and inflexible compared to that of ExplorerXP -- there, I could just select the files I wanted to rename, press F2, and set the parameters.

Once I had the files named with .wpd extensions, the next question was how to get them into PDF format.  That had been easy enough with the others, above, to the extent that Microsoft Word could read them; I could PDF them from there.  I shouldn't say it was "easy"; it was still a manual process, and I was now searching for a way to automate it.  Unfortunately, I was not finding any freeware ways to convert from .wpd to .pdf.  Later versions of WordPerfect included a Conversion Utility to bring those files into the modern era, but the results were still .wpd files.  Adobe Acrobat 8.0 was able to recognize and convert the files (select multiple files in Windows Explorer, right-click, and choose the Convert to PDF option), but the conversions proceeded one by one, I had hundreds of files, and it took several seconds for each one to process.  Also, it added an extra blank page to the ends of some if not all of these old WordPerfect documents.

I didn't find any wpd to odt (OpenOffice Writer) converters.  I thought about trying Google Docs, which someone said could bulk convert to pdf, but it didn't accept wpd as input.  I tried looking for a converter from wpd to doc, and that led me to Zamzar.com, which would convert directly from .wpd to .pdf, but would only let me upload one file at a time.  I found that the Options in OpenOffice (I was using the Ubuntu version) could be set to save automatically as Word documents, so I did that, and then uploaded a few of them to Google Docs and downloaded them as PDFs.  The formatting was messed up on a couple of them.  I tried a comparison without Google Docs, just converting to pdf from the .doc files that OpenOffice had saved.  The formatting was better that way, so Google Docs didn't add anything; and the process of converting the Word docs to PDF was the same one-file-at-a-time thing as if I were printing from WordPerfect itself, so involving Word didn't add anything either.

In the end, the best and probably fastest approach seemed to be to select a bunch of wpd files in Windows Explorer, right-click, and select Convert to Adobe PDF.

This seemed likely to be a continuing effort, but these notes ended here. 

Wednesday, December 30, 2009

Sorting and Manipulating a Long Text List to Eliminate Some Files

In Windows XP, I made a listing of all of the files on a hard drive.  For that, I could have typed something like DIR *.* /S > OUTPUTFILE.TXT, but instead I used PrintFolders.  I selected the option for full pathnames, so each line in the file list was like this:  D:\FOLDER\SUBFOLDER\FILENAME.EXT, along with date and file size information.

I wanted to sort the lines in this file list alphabetically.  They already were sorted that way, but DIR and PrintFolders tended to insert blank lines and other lines (e.g., "=======" divider lines) that I didn't want in my final list.  The question was, how could I do that sort?  I tried the SORT command built into WinXP, but it seemed my list was too long.  I tried importing OUTPUTFILE.TXT into Excel, but it had more than 65,536 lines, so Excel couldn't handle it.  It gave me a "File not loaded completely" message.  I tried importing it into Microsoft Access, but it ended with this:

Import Text Wizard

Finished importing file 'D:\FOLDER\SUBFOLDER\OUTPUTFILE.TXT to table 'OUTPUTTXT'.  Not all of your data was successfully imported.  Error descriptions with associated row numbers of bad records can be found in the Microsoft Office Access table 'OUTPUTFILE.TXT'.

And then it turned out that it hadn't actually imported anything.  At this point, I didn't check the error log.  I looked for freeware file sorting utilities, but everything was shareware.  I was only planning to do this once, and didn't want to spend $30 for the privilege.  I did download and try one shareware program called Sort Text Lists Alphabetically Software (price $29.99), but it hung, probably because my text file had too many lines.  After 45 minutes or so, I killed it.

Eventually, I found I was able to do the sort very quickly using the SORT command in Ubuntu.  (I was running WinXP inside a VMware virtual machine on Ubuntu 9.04, so switching back and forth between the operating systems was just a matter of a click.)  The sort command I used was like this:
sort -b -d -f -i -o SORTEDFILE.TXT INPUTFILE.TXT
That worked.  I edited SORTEDFILE.TXT using Ubuntu's GEDIT program (like WinXP's Notepad).  For some reason, PrintFolders (or something) had inserted a lot of lines that did not match the expected pattern of D:\FOLDER\SUBFOLDER\FILENAME.EXT.  These may have been shortcuts or something.  Anyway, I removed them, so everything in SORTEDFILE.TXT matched the pattern.
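
In hindsight, a one-line filter could probably have done that cleanup automatically; a rough sketch, assuming every legitimate line began with D:\ (the output file name here is made up):

grep '^D:\\' SORTEDFILE.TXT > CLEANFILE.TXT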

Now I wanted to parse the lines.  My purpose in doing the file list and analysis was to see if I had any files that had the same names but different extensions.  I suspected, in particular, that I had converted some .doc and .jpg files to .pdf and had forgotten to zip or delete the original .doc and .jpg files.  So I wanted to get just the file names, without extensions, and line them up.  But how?  Access and Excel still couldn't handle the list.

This time around, I took a look at the Access error log mentioned in its error message (above).  The error, in every case, was "Field Truncation."  According to a Microsoft troubleshooting page, truncation was occurring because some of the lines in my text file contained more than 255 characters, which was the maximum Access could handle.  I tried importing into Access again, but this time I chose the Fixed Width option rather than Delimited.  It only went as far as 111 characters, so I just removed all delimiting lines in the Import Text Wizard and clicked Finish.  That didn't give me any errors, but it still truncated the lines.  Instead of File > Get External Data > Import, I tried Access's File > Open command.  Same result.

I probably could have worked through that problem in Access, but I had not planned to invest so much time in this project, and anyway I still wasn't sure how I was going to use Access to remove file extensions and folder paths so that I would just have filenames to compare.  I generally used Excel rather than Access for that kind of manipulation.  So I considered dividing up my text list into several smaller text files, each of which would be small enough for Excel to handle.  I'd probably have done that manually, by cutting and pasting, since I assumed that a file splitter program would give me files that Excel wouldn't recognize.  Also, to compare the file names in one subfile against the file names in another subfile would probably require some kind of lookup function.

That sounded like a mess, so instead I tried going at the problem from the other end.  I did another directory listing, this time looking only for PDFs.  I set the file filter to *.pdf in PrintFolders.  I still couldn't fit the result into Excel, so I did the Ubuntu SORT again, this time using a slightly more economical format:
sort -bdfio OUTPUTFILE.TXT INPUTFILE.TXT
This time, I belatedly noticed that PrintFolders and/or I had somehow introduced lots of duplicate lines, which would do much to explain why I had so many more files than I would have expected.  As advised, I used another Ubuntu command:
sort OUTPUTFILE.TXT | uniq -u
to remove duplicate lines.  But this did not seem to make any difference.  Regardless, after I had cleaned out the junk lines from OUTPUTFILE.TXT, it did all fit into Excel, with room to spare.  My import was giving me lots of #NAME? errors, because Excel was splitting rows in such a way that characters like "-" (which Excel treats as a mathematical operator) were the first characters in some rows, but were followed by letters rather than numbers, which did not compute.  (This would happen if, e.g., the split came at the wrong place in a file named "TUESDAY--10AM.PDF.")  So when running the Text Import Wizard, I had to designate each column as a Text column, not General.

I then used Excel text functions (e.g., MID and FIND) on each line, to isolate the filenames without pathnames or extensions.  I used Excel's text concatenation functions to work up a separate DIR command for each file I wanted to find.  In other words, I began with something like this:
D:\FOLDER\SUBFOLDER\FILENAME.EXT

and I ended with something like this:
DIR "FILE NAME."* /b/s/w >> OUTPUT.TXT

The quotes were necessary because some file names had spaces in them, which would confuse the DIR command.  I forget what the /b and other options were about, but basically they made the output look the way I wanted.  The >> told the command to put the results in a file called OUTPUT.TXT.  If I had used just one > sign, OUTPUT.TXT would have been recreated (i.e., overwritten) every time one of these commands ran, so only the last command's results would have survived.  Using two >> signs was an indication that OUTPUT.TXT should be created if it did not yet exist, but otherwise the results of each command should just be appended to whatever was already in OUTPUT.TXT.

In cooking up the final batch commands, I would have been helped by the MCONCAT function in the Morefunc add-in, but I didn't know about it yet.  I did use Morefunc's TEXTREVERSE function in this process, but I found that it would crash Excel when the string it was reversing was longer than 128 characters.  Following other advice, I used Excel's SUBSTITUTE command instead.

I put the thousands of resulting commands (such as the DIR FILENAME.* >> OUTPUT.TXT shown above), one for each file name (e.g., FILE NAME.*) that I was looking for, into a DOS batch file (i.e., a text file created in Notepad, with a .bat extension, saved in ANSI format) and ran it.  It began finding files (e.g., FILENAME.DOC, FILENAME.JPG) and listing them in OUTPUT.TXT.  Unfortunately, this thing was running very slowly.  Part of the slowness, I thought, was due to the generally slower performance of programs running inside a virtual machine.  So I thought I'd try my hand at creating an equivalent shell script in Ubuntu.  After several false starts, I settled on the FIND command.  I got some help from the Find page in O'Reilly's Linux Command Directory, but also found some useful tips in Pollock's Find tutorial.  It looked like I could recreate the DOS batch commands, like the example shown above, in this format:

find -name "FILE NAME.*" 2>/dev/null | tee -a found.txt

The "-name" part instructed FIND to find the name of the file.  There were three options for the > command, called a redirect:  1> would have sent the desired output to the null device (i.e., to nowhere, so that it would not be visible or saved anywhere), which was not what I wanted; 2> sent error messages (which I would get because the lost+found folder was producing them every time the FIND command tried to search that folder) to the null device instead; and &> would have sent both the standard output and the error messages to the same place, whatever I designated.  Then the pipe ("|") said to send everything else (i.e., the standard output) to TEE.  TEE would T the standard output; that is, it would send it to two places, namely, to the screen (so that I could see what was happening) and also to a file called found.txt.  The -a option served the same function as the DOS >> redirection, which is also available in Ubuntu's BASH script language, which is what I was using here:  that is, -a appended the desired output to the already existing found.txt, or created it if it was not yet existing.  I generated all of these commands in Excel - one for each FILE NAME.* - and saved them to a Notepad file, as before.  Then, in Ubuntu, I made the script executable by typing "chmod +x " at the BASH prompt and then ran it by typing "./" and it ran.  And the trouble proved to be worthwhile:  instead of needing a month to complete the job (which was what I had calculated for the snail's-pace search underway in the WinXP virtual machine), it looked like it could be done in a few hours.

And so it was.  I actually ran it on two different partitions, to be sure I had caught all duplicates.  Being cautious, I had the two partitions' results output to two separate .txt files, and then I merged them with the concatenation command:  "cat *.txt > full.lst."  (I used a different extension because I wasn't sure whether cat would try to combine the output file back into itself.  I think I've had problems with that in DOS.)  Then I renamed full.lst to be Found.txt, and made a backup copy of it.

I wanted to save the commands and text files I had accumulated so far, until I knew I wouldn't need them anymore, so I zipped them using the right-click context menu in Nautilus.  It didn't give me an option to simultaneously combine and delete the originals.


Next, I needed to remove duplicate lines from Found.txt.  I now understood that the command I had used earlier (above) had failed to specify where the output should go.  So I tried again:

sort Found.txt | uniq -u >> Sorted.txt

This produced a dramatically shorter file - 1/6 the size of Found.txt.  Had there really been that many duplicates?  I wanted to try sorting again, this time by filename.  But, of course, the entries in Sorted.txt included both folder and file names, like this:

./FOLDER1/File1.pdf
./SomeotherFolder/AAAA.doc
Sorting normally would put them in the order shown, but sorting by their ending letters would put the .doc files before the .pdfs, and would also alphabetize the .doc files by filename before foldername.  Sorting them in this way would show me how many copies of a given file there had been, so that I could eyeball whether the ultimate list of unique files should really be so much shorter than what was reported in Found.txt.  I didn't know how to do that in bash, so I posted a question on it.
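
One approach that apparently would do that kind of ending-letters sort (I didn't get around to testing it) is to reverse each line, sort the reversed lines, and reverse them back:

rev Sorted.txt | sort | rev > ByEnding.txt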


Meanwhile, I found that the whole Found.txt file would fit into Excel.  When I sorted it, I found that each line was duplicated - but only once.  So plainly I had done something wrong in arriving at Sorted.txt.  From this point, I basically did the rest of the project in Excel, though it belatedly appeared that there were some workable answers in response to my post.
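
In hindsight, part of the problem was probably my use of uniq -u:  that option prints only the lines that are not repeated at all, discarding every line that has a duplicate, which is not the same as keeping one copy of each line.  What I apparently wanted was more like this:

sort -u Found.txt > Sorted.txt

(sort -u, or sort piped to plain uniq, keeps a single copy of each distinct line.)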

Tuesday, December 29, 2009

Ubuntu: Schedule Items with Cron

I wanted to schedule regular backups in Ubuntu 9.10.  I had already worked out the rsync commands I wanted to use; now it was a matter of running them automatically at certain times or on certain days.  I began by seeing what was already scheduled in my crontab (i.e., my cron table).  Actually, I had two of them:  one for me, and one for root (i.e., the administrator).  I checked them with "crontab -l" (that's a small L) and "sudo crontab -l" and both said "no crontab."  This supposedly meant that there were no crontab files in /var/spool/cron/crontabs.  I verified that via "sudo nautilus."  It seemed like that would apply to root's cron, but I wasn't able to find any different location where the user's cron should be, so I just moved on to the next step.

The next step was to edit the crontab by using "crontab -e."  This seemed to be creating a crontab for me, as distinct from root:  it said "no crontab for ray - using an empty one."  To confirm that, I tried in a separate Terminal session with "sudo crontab -e."  It seemed to flash the same editor-selection prompt that had appeared for me, but then it went directly into nano, which the other Terminal session was describing as the "easiest" of the three available editors.  So, OK, since I ordinarily ran my rsync backups as myself, not as root, I figured I would want to set up my own crontab, not a root crontab.  (Later, I found some statements that it was a bad idea to edit root's crontab.)  So I killed that nano session and went back to the first Terminal session.  There, I chose no. 3, nano, as my editor.

The top of the nano screen was showing me "# m h  dom mon dow   command."  This was my cue for the things that I needed to enter on a line, in order to schedule a cron job:  minute, hour, day of month, month, day of week, and command.  (The leading # was to indicate that this sample line was just a comment and should not be executed.)

According to About.com, up through the week level, permissible values began at zero:  that is, minutes of the hour ran from 0 to 59, hours of the day ran from 0 to 23, and days of the week ran from 0 to 7 (where Sunday was both 0 and 7, as you prefer).  Beyond that, days of the month ran from 1 to 31, and months of the year ran from 1 to 12.  If they had names, you could use their first three letters (e.g., "Mon" and "Jul" but not "minute 23").  Cron uses the union (not the intersection) of the two day fields.  That is, if you specify a day of the week (e.g., Fri) and also a day of the month (e.g., 15), the command will run on both days (e.g., every Friday, and also the 15th of every month).  You could use an asterisk to indicate "every"; for example, * * * * * would indicate that you want to run the command every minute of every hour of every day of every month.

There were additional options for the numeric entries (though, reportedly, not for named days or months).  One of these options was to use fractions (i.e., step values):  for instance, */4 would mean "every fourth" minute or hour or whatever.  You could also use a range:  40-45 would mean it should run every minute of the hour from 40 to 45 minutes (i.e., 12:40 AM, 12:41 AM, 12:42 AM . . . 12:45 AM, 1:40 AM . . . 11:45 PM).  You could use lists, separating items with commas, so that 40,41 would mean that it should run only on the 40th and 41st minutes of the hour.  Combining a range with a step, designating the hour as 0-11/2 would mean that it should run every other hour, in the morning only.  You could use a list of ranges; for example, an hour designation of 0-1,10-11 would indicate that the command should run in the first two and also in the last two hours of the morning.  Range and list entries start on the first number; for instance, 2-6/2 runs at 2 AM, 4 AM, and 6 AM (at whatever minute you specify).
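
To pin the syntax down, a few illustrative lines (the bracketed commands are placeholders, and the text after the dashes is just explanation, not part of the line):

*/15 * * * * [command]  -- every 15 minutes
0 0-11/2 * * * [command]  -- on the hour, every other hour, mornings only
30 2 1,15 * * [command]  -- at 2:30 AM on the 1st and 15th of each month
0 9 * * Mon [command]  -- at 9:00 AM every Monday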

There was one other category of entry:  special words.  These words would replace all five numbers.  In other words, if you wanted the precise control offered by the numbers for minute, hour, etc., use the numbers; but if you want the convenience of just entering one word without having to think much about what it means, use the word.  These special words were @reboot (run at reboot), @yearly or @annually (once a year), @monthly, @weekly, @daily or @midnight (run once a day), and @hourly.  These all run as soon as the time period starts (e.g., January 1, 12:00 midnight).  There was more to know about these commands, in the official documentation.

I decided the first thing to schedule was a backup of my CURRENT partition to a backup internal hard drive.  I wanted this backup to run several times a day.  I hadn't set it up as a RAID array because I didn't want the mirroring to happen immediately; I wanted to allow some time in case I accidentally deleted something, or made some other stupid mistake.  The more frequently the backup ran, the more likely it would contain the most recent version of the relevant folder - which could mean it would be more likely to have the version that existed *after* my stupid mistake.  My compromise was to set it up to run every two hours.  The cron line I used, then, was this:

0   */2   *   *   *   [rsync command]

In crontab, I put several spaces between the numbers for readability.  Here, in this blog posting, I had to use nonbreaking spaces for the display shown above, because plain old spaces tend to get ignored in HTML.  I haven't reproduced, here, the long command that I want executed, because I want to focus on the cron parts of the line.  (The rsync command of choice is shown in the previous post.)

Then it occurred to me that, instead of putting that long command in cron, where I would have to do some minor translation every time I wondered what it meant, I could probably write a basic Ubuntu shell script that would do the same thing and would allow me to add explanatory notes and other commands.  So I took a brief detour into the land of scripts.  By the time I returned and finished my look at rsync, I had two scripts.  One was called backup-hour.sh, to be run every few hours.  The other was called backup-day.sh.  I put them into /home/ray/bin and wrote the following cron lines for them:

0   */2   *   *   *   ./home/ray/bin/backup-hour.sh

0   2   *   *   *   ./home/ray/bin/backup-day.sh

The first one would hopefully run backup-hour.sh every two hours, all day and all night.  The second one was intended to run backup-day.sh every day at 2 AM.

To put these lines into crontab, I typed crontab -e.  It all looked good.  But nothing was happening.  A couple of days went by, and cron didn't run.  The problem, I suspected, was with those periods I had put at the start of my path names.  I had thought that was part of a command, but nobody else was using them in their cron files.  So I deleted those and waited until the next even-numbered hour, to see if backup-hour.sh would run.

Then I wondered whether I was saving crontab in the right place.  I noticed that nano, my default crontab editor, was saving it to /tmp/crontab.zfItNF/crontab.  Somehow, that didn't look right.  I did a quick search and found a variety of theories on where crontab should be, and none of them involved the /tmp folder.  Someone suggested typing "which crontab" at the prompt.  That came back with /usr/bin/crontab, which wasn't one of the options those other people had suggested.  I tried to save this crontab to /usr/bin/crontab and got a message that the file already existed.  I tried "gedit /usr/bin/crontab," but even with sudo I got a message that the file could not be opened.  I decided to pass on that one for the time being and, selecting a seemingly knowledgeable opinion, I thought about saving it in /var/spool/cron.  It turned out that there was a subdirectory there, and more specifically there was a file called /var/spool/cron/crontabs/ray.  When I looked in that, I saw a copy of my crontab file, the one that nano had been trying to save in a /tmp folder, except that it began with the line, "DO NOT EDIT THIS FILE - edit the master and reinstall."  So it seemed that maybe nano knew what it was doing, and I let it save crontab in that /tmp folder after all.  Then, following some advice, I decided to output the error messages, if any, to a crontab.log file.  So the first complete line in my crontab looked like this:

0 */2 * * * /home/ray/bin/backup-hour.sh 2>> /Folder1/crontab-errors.log

Unfortunately, at 4:00 PM, nothing happened.  I decided to follow the suggestion that it was much easier to use the gnome-schedule package (though there were problems for those who upgraded from Ubuntu 9.04 to 9.10 rather than doing a fresh install), so I installed that in Synaptic.  This gave me a new option at Applications > System Tools > Scheduled Tasks.  (Scheduled Tasks was apparently a simple front end for cron.)  But before Scheduled Tasks would work, I had to make sure that crontab and a program called At were installed.  Both were marked as installed in Synaptic.  Cron was for recurrent tasks, and At was for one-time jobs (e.g., run this once, "at" a specified time).

The basic idea, it developed, was that crontab -e would add scheduling files to the /var/spool/cron/crontabs folder, but it was apparently advisable to just let crontab -e do that, and not try to find and edit those files directly.  I tried typing "sudo /etc/init.d/cron restart," but that gave me a suggestion:  "Rather than invoking init scripts through /etc/init.d, use the service(8) utility, e.g. service cron restart."  So, OK, I tried that.  This gave me a long message that started with "restart: Rejected send message."  That didn't sound good, so I tried "ps -eaf | grep cron" and that gave me these three lines:

ray       8381  8343  0 19:46 pts/0    00:00:00 man 5 crontab
root      8409     1  0 19:50 ?        00:00:00 cron
ray       8416  8343  0 19:52 pts/0    00:00:00 grep --color=auto cron

When I had tried running that command previously, I had gotten only the second and third lines, not the first.  So perhaps the problem had been that my cron had not been running when I had tried to run it previously, and now it was running.  On that assumption, I could have just gone back to the command-line approach at this point, but I liked the added option of using At to schedule one-time events. But as I checked further, "crontab -l" said "no crontab for ray," so apparently crontab was *still* not running.  But no, one source said, "The output: no crontab for [username] means crontab is installed."

I found a long thread that told me I could use gedit instead of nano, to edit crontab, by typing this:

export EDITOR=gedit && crontab -e

(The "&&" part combined separate commands; apparently I could have achieved the same thing by typing these two on separate lines.)  Someone said I could make this permanent by putting the command in my .bashrc file, which they said I would find in /home/ray, which was true.  But if .bashrc hadn't been there, apparently I could have used gedit to create it, with something like this:

# .bashrc - bash config file #
# export variables
export EDITOR=gedit

Since .bashrc was already there, I just added those last two lines to the end of it.  But anyway, back at the cron issue, someone in that long thread said I could change my cron line to look like this:

* * * * * export DISPLAY:=0 && xterm [command]

if I wanted to see something on the screen when the command was executing.  But this got me back to the realization that, for all of the flexibility I was seeing as I worked my way through page 9 of that very long thread, I would probably prefer, right now, to just get something working.  I guessed that Scheduled Tasks would work just fine if plain old crontab was working.  So, as others had done, I wrote up a simple command to test crontab.  The entry looked like this:

* * * * * export DISPLAY:=0 && xterm dir

Sadly, this did nothing.  I opened Scheduled Tasks, thinking I would try something similar there, and instead I saw that my crontab line was already there, albeit in ugly form: 

Recurrent    At every minute     export DISPLAY:=0 && dir

Anyway, that didn't seem to be running, so I deleted it, in Scheduled Tasks, and tried doing similar as a one-time task.  Here's what I ran:

dir > /home/ray/DIRDIRDIR

and it worked!  I got a text file named DIRDIRDIR that contained a directory listing.  So it seemed the one-time part of Scheduled Tasks was working properly.  So I returned to the question of recurrent tasks.  I remembered that, in Ubuntu, we use "ls" rather than "dir."  So in Scheduled Tasks, I used more or less the same command to append updated directory listings each minute, showing the time when DIRDIRDIR had been last updated:

ls -l >> /home/ray/DIRDIRDIR

and that worked too.  So now I felt I should try again with the lines I had attempted earlier, as revised.  This time, I entered them into Scheduled Tasks rather than into crontab, so I didn't need the * * * * * parts of the entries.  So here are the commands I entered in Scheduled Tasks:

/home/ray/bin/backup-hour.sh 2>> /Folder1/crontab-errors.log
/home/ray/bin/backup-day.sh 2>> /Folder1/crontab-errors.log

I didn't actually enter them both right away; I started with the first one, and made it run just once, at nine minutes past the hour (about two minutes after the time when I was setting it up).  When the time came, it ran, or it seemed to:  there was now a file named "crontab-errors.log" with 0 bytes in Folder1.  That was good enough for now.  I tried another line, this time telling Scheduled Tasks to make it an "X application" rather than "Default behavior."  That didn't seem to do anything, but whatever.  It looked like I had what I needed, and I could add more knowledge later.

Basic Ubuntu (Bash) Shell Scripts

In Ubuntu 9.04, I wanted to write a script that would execute an rsync command, so that I could put a brief reference to the script into my crontab file, instead of putting the whole long rsync command there.

For a brief, tiny moment, I was almost tempted to consider learning the GAMBAS (Gambas Almost Means BASIC) programming language, just because (pre-Visual) BASIC was the only programming language I ever learned.  Instead, I moved toward basic instructions on writing a shell script.  Here's what I wrote:

#!/bin/bash
# This is backup-hour.sh
# It backs up CURRENT to CURRBACKUP every few hours
rsync -qhlEtrip --progress --delete-after --ignore-errors --force --exclude=/.Trash-1000/ --exclude=/lost+found/ /media/CURRENT/ /media/CURRBACKUP

As the instructions said, the first line was essential to tell the computer to use BASH to interpret the following lines.  The next two lines were comments, and the final line (wrapping over onto multiple lines here) was exactly the rsync line I'd been using to do the backup.  In other words, learning how to write the command was almost all I needed to write the script.

The next step was to save it somewhere.  I had previously heard, and the instructions said, that the common place to put it is in your bin folder.  I went with that, but made a note that I would need to be sure to back up my bin folder, because I didn't want to go to the trouble of writing all these scripts and then see them vanish.

The usual location for the user's bin folder is at /home/[username]/bin.  In my case, that's /home/ray/bin.  Getting there in Nautilus (i.e., Ubuntu's File Browser, also started by typing "nautilus" in Terminal) can be confusing:  you can get there via the Home Folder option (assuming you're showing Tree rather than Places or something else (or nothing) at the left side of the file browser), or you can go to File System/home/[username]/bin.  Same thing, either way.  So I saved the script (above) in my bin folder as backup-hour.sh.  That turned the comment lines (beginning with #) blue in gedit.

Next, the instructions said, I needed to set permissions.  This was a confusing aspect of Ubuntu.  The documentation seemed to say that there were ten permissions that a person could set.  These were represented by ten hyphens or minus signs:  ----------.  I couldn't tell what the first one was for, but the remaining nine were divided into three sets of three.  The first three belonged to the owner, the second three to the group, and the last three to "other."  Within each set of three, the first one was for read, the second was for write, and the third was for execute (i.e., run it as a program).  So if you set the owner's permissions to read (r), write (w), and execute (x), your ten hyphens would now change to this:  -rwx------.  If you set all three parties (i.e., owner, group, and other) the same, they would look like this:  -rwxrwxrwx.

You could set the permissions using the chmod command.  I found an Ubuntu manual page on chmod.  It was not really that complicated, but it looked like it was going to require a time investment to make sure I had it right, and at this point I was getting impatient.  The basic idea seemed to be that you could use chmod to enter values of 4 (for read permission), 2 (for write permission), and/or 1 (for execute permission).  So, for example, you could type "chmod 755" and that would give a value of 7 to the first of the three users mentioned above (i.e., the owner), a value of 5 to the second of the three (i.e., the group), and a value of 5 to the third of the three (i.e., other).  The 7 would mean that you gave read + write + execute (4 + 2 + 1) permissions to the owner, whereas the 5 would mean that you gave only read + execute (4 + 1) permissions to the rest.  Since that's what the instructions suggested, I went with that.  To set the script with those permissions, I typed "chmod 755 backup-hour.sh."
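
As a quick sanity check (assuming the Terminal prompt is in the folder containing the file), the result can be confirmed with a long listing:

chmod 755 backup-hour.sh
ls -l backup-hour.sh

If it worked, the listing for that file should begin with -rwxr-xr-x, matching the owner/group/other pattern described above.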

I wasn't too sure of who the owner was (i.e., me or root), not to mention the group or other.  I mean, this was for my home computer.  Not a lot of people milling around, waiting to take my hard drive for a spin.  These kinds of options seemed to be set up for networked computers, where the "accounting" department might be a group that would own a file.  I found what looked like a good tutorial on file owners, and another interesting (yawn!) page about permissions, but fortunately I did not have time to work through them.

When I typed "chmod 755 backup-hour.sh," I got "cannot access 'backup-hour.sh': No such file or directory."  One solution was to use a change directory (cd) command to get the Terminal prompt into the bin subfolder, so it would see what I was looking for.  But since I planned to put more scripts into that folder, and anyway since I wanted cron or other programs to know right away what I was talking about when I referred to something like backup-hour.sh, I decided to figure out how to put the bin folder in my "path."  The path is the list of folders where the operating system looks for guidance on what a command means.  To change my path so that the system would always know to look in bin, they said I needed to find and edit my .bash_profile file.  Unfortunately, they didn't say where it was.  It wasn't easy to find.  I ran searches in Nautilus (both user and root), but while those were grinding away, I found that I could just type "locate .bash_profile."  That turned up nothing, but very quickly.  Then I got some advice that, if it didn't exist, I could create it by using "touch ~/.bash_profile."  So I did that, and then tried again with "chmod 755 backup-hour.sh."  Still no joy.  Ah, but maybe that was because I hadn't rebooted; backup-hour.sh would run only on startup.  OK, so I used the other approach after all:  I changed directory to the bin folder and tried again.  Now I got "Permission denied."  What if I gave everybody full permissions with chmod 777?  I tried that instead of chmod 755.  That seemed to do it.  The hard drive was doing its thing now.

I wanted to see what was going on, so I decided to create a log file.  I wanted it to store only the error messages, not to list the thousands of files that backup-hour.sh was backing up successfully, so I put this on the end of (that is, on the same command line as) my rsync command in backup-hour.sh (above):
2> /media/CURRENT/backup-hour.log
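
So the relevant line in backup-hour.sh presumably ended up looking something like this (wrapped here with backslashes just for readability):

rsync -qhlEtrip --progress --delete-after --ignore-errors --force \
  --exclude=/.Trash-1000/ --exclude=/lost+found/ \
  /media/CURRENT/ /media/CURRBACKUP 2> /media/CURRENT/backup-hour.log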

The log filename thus matched the script filename.  I put "backup" first so that I could see all of my backup scripts in the same part of the folder's directory listing, and then I set up a backup-day.sh script along the same lines.  New problem:  these backup scripts would generate empty log files if there were no errors, and I didn't want to have to delete them manually.  So I found a forum post with advice on how to delete them automatically, using the "find" command.  In my version, it looked like this:
find /media/CURRENT/ -name "*.log" -size 0c -exec rm -f {} \;

I put that at the end of the backup-day.sh script, and it seemed to work.  It said, basically, look in the CURRENT folder for files whose names end with .log and that have zero bytes; and if you find any files like that, execute the "remove" command without asking for permission.  I didn't know what that ending punctuation was about, but that's what the advisor suggested.

In my backup-day.sh (not backup-hour.sh) script, I included the instructions for updating my USB jump drive (above). I also included a set of commands to save my e-mail (I was using Thunderbird in Ubuntu) as a compressed tar (.tgz) file. Actually, as seven such files, one for each day of the week.  That part of backup-day.sh looked like this:
# Assign Thunderbird mail & profile to be backed up
backup_files="/home/ray/.mozilla-thunderbird"
dest="/media/BACKROOM/Backups/Tbird"
# Create archive filename
day=$(date +%A)
hostname=$(hostname -s)
archive_file="$hostname-$day.tgz"
# Back up to a tgz file
tar zcvf $dest/$archive_file $backup_files 2>> /media/CURRENT/A-INCOMING/T-bird-Backup.log
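
For what it's worth, a sketch of how one of those archives could be checked or restored if it ever came to that (I didn't actually need to do this; the day name here is just an example):

tar ztvf /media/BACKROOM/Backups/Tbird/$(hostname -s)-Monday.tgz
tar zxvf /media/BACKROOM/Backups/Tbird/$(hostname -s)-Monday.tgz -C /

The first line lists the archive's contents; the second extracts them back to their original location (tar strips the leading slash when creating the archive, so -C / puts things back where they came from).
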
So these seemed to be the basic kinds of tools I needed to set up rsync scripts and crontab entries.

Ubuntu: Backup with Rsync

In a previous post, I got as far as concluding that rsync was the tool of choice for backing up my computer in Ubuntu 9.04. I didn't pursue it because I was short on time and patience for writing scripts at that point. But eventually the need for a regular backup system became acute. So this post logs the steps I took to make rsync and cron work for me.

First, here's what I wrote previously:
As an alternative to rdiff-backup, what people had actually mentioned more frequently was rsync. It did not have the incremental backup features of rdiff-backup, to my knowledge, but it seemed to be an established tool for backup purposes. So for now, at least, I thought I might try that instead. Once again, I did a Google search and got a package details page with no apparent link to any help files. Eventually I found what looked like the official rsync webpage and, after looking at their FAQs and some other pages, landed on their Examples page. It was intimidating.
This time, I went to their Documentation page. This gave me links to, among other things, Michael Holve's rsync tutorial. The tutorial said, "You must set up one machine or another of a pair to be an "rsync server" by running rsync in a daemon mode." I was curious, so I did a Google search for "what is daemon mode" and I got back, would you believe, exactly one page. One webpage on the entire known planet answered the question, "What is daemon mode?" Except it didn't really answer it. It just said, "It makes wget put standard output into a log file and not bug you while downloading." Accepting that as the best available answer (and ten points to the answerer!), I typed "rsync --daemon" in Ubuntu's Terminal and proceeded to the next step, "Setting Up a Server." After reading it, I decided it didn't seem to apply to me. It was for people who wanted to back up files between computers. I just wanted to back up to another drive.

So I went on to the tutorial's "Using Rsync Itself" section. Since I wasn't sure what daemon mode did, or if it was necessary, I killed that Terminal session and started another. I didn't know if that would shut off daemon mode, or if doing so was what I should do. I read the section and then checked another source of documentation, the rsync man page ("man" being short for "manual"). The man page would ordinarily be output in response to a Terminal command, but someone had put it here in html form, so that's what I used. It reminded me, first, to check Ubuntu's System > Administration > Synaptic Package Manager to make sure I had rsync already installed, here on my secondary computer. I searched Synaptic for rsync and got back a couple dozen listed programs; rsync was among them and was shown as being installed. I looked partway into the man page and got an answer to one question I had from the tutorial. So here's how I translated what the tutorial was telling me. First, the tutorial listed these lines:

rsync --verbose --progress --stats --compress --rsh=/usr/local/bin/ssh --recursive --times --perms --links --delete \
--exclude "*bak" --exclude "*~" \
/www/* webserver:simple_path_name

The first thing to know was that this all represented a single command line. It was too long to fit on one line, though, so apparently the trailing backslash said, "This line continues on the next line." The command would be typed into a file and saved as a script, not typed directly into Terminal. I didn't know why the last line didn't end with a backslash. I decided I would want to experiment with this in a relatively safe place -- with a junk directory on my secondary computer, perhaps -- to see what it was doing.

So as the tutorial explained it, the first line of this example told rsync how to proceed: verbosely (i.e., with lots of information about what it was doing), showing a progress report, with statistics. The rsh part was for encryption, to be used optionally if you were sending your stuff online to another computer. I wasn't, so I decided to try leaving that off. I also didn't want to compress the output, because that made the process slower and required more attention from the CPU.

The second line of the example, above, told rsync to recurse -- to work through all of my directories and subdirectories under the folder that I would be naming. It also told rsync to preserve file timestamps and file permissions -- so if, for example, a file was readable only by root on the source drive, it would be the same way on the target drive. The --links option was an instruction to preserve symbolic links -- not sure what that meant -- and the --delete option, as I understood it, would tell rsync to delete anything on the target that wasn't on the source. So you'd have a mirror, and not just an accumulation of backups of files that you had deliberately trimmed out of your file collection.

The third line of the example told rsync not to bother copying some kinds of files. I liked the sound of that at first, but then I decided I would rather be able to do a Properties comparison of source and target and verify that both had exactly the same number of files. So I decided to leave out this line when I used rsync.

The fourth line of the example named the source and target locations. I wasn't going to be using it with a remote source or target, so mine was going to look somewhat different from this.

On that basis, here's what I assembled as a test version of the rsync example from above:

rsync --verbose --progress --stats \
--recursive --times --perms --links --delete \
/media/DATA/Source /media/DATA/Target

I created a Source subfolder in my DATA folder and put a TestFile.txt file into it. I also created a Target subfolder in DATA. Then I copied those three rsync lines into a file in Ubuntu's Text Editor (gedit) and saved it to Desktop as TestRun. To make it executable, I went into Terminal, typed "cd /home/ray/Desktop" and then "chmod +x TestRun" and then double-clicked on TestRun and said Run. And, you know, it worked. Just like that. Not exactly as intended -- I had not only TestFile.txt but also the whole Source folder underneath my Target folder -- but, yeah, there it was. I deleted the Target folder and ran it again and, sure enough, it created the Target folder and then inserted a copy of the Source folder into it. Excellent!
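The reason the Source folder itself turned up under Target is rsync's trailing-slash rule: a source path without a trailing slash copies the folder itself into the target, while a source path ending in a slash copies only the folder's contents. A quick sketch of the difference, using the same test folders:

rsync --recursive --times --perms /media/DATA/Source /media/DATA/Target
# result: /media/DATA/Target/Source/TestFile.txt

rsync --recursive --times --perms /media/DATA/Source/ /media/DATA/Target
# result: /media/DATA/Target/TestFile.txt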

Now it was time to try something a little bolder. I wanted to see how it worked if I tried to copy the whole DATA folder to an external drive. This part was a little confusing. The external drive seemed to have two different names. If I looked in /media, its name was simply "disk." But if I hit the Computer icon in File Browser, it came up as "193.8 GB Media." I decided the latter sounded more specific, so I would try that first. So now the third line of my TestRun file read like this:

/media/DATA "/media/193.8 GB Media"

I used quotation marks because there were spaces in the name. I saved TestRun and double-clicked it again on the Desktop. It didn't seem to do anything. I realized that I had probably made a mistake in that line, and tried again like this:

/media/DATA "/193.8 GB Media"

That didn't do it either, so I tried again, without the leading slash:

/media/DATA "193.8 GB Media"

That still didn't work, so I tried the other approach:

/media/DATA /media/disk

Still nothing. I went into System > Administration > Partition Editor (GPartEd), wiped out the target partition, and recreated it as a FAT32 partition. Now the drive was totally invisible to Ubuntu. I went back into GPartEd and reformatted it as an ext3 partition. Then I realized: it was an IDE drive, so apparently it would not be recognized until I rebooted. I decided to reboot into Windows (I had a dual-boot system) and format it as NTFS. I named it 186GB (which seemed to be the net amount of space available in NTFS format) and rebooted into Ubuntu. I revised TestRun's last line again:

/media/DATA /media/186GB

and ran it again. This time, it seemed to be working -- the external 186GB drive was making noise -- but I wondered why I wasn't getting a verbose indication of what was going on. The likely reason was that double-clicking the script and choosing Run gave it no terminal to print to; to see the verbose output, I would have to run the script from the command line (or choose "Run in Terminal" instead). While I was rooting around for an answer to that question, I was reminded that I could also use the shorthand versions of these options. So instead of typing --verbose into the script, I could just type -v and whatever other letters I needed. With that approach, the final contents of TestRun, which were as follows:

rsync --verbose --progress --stats \
--recursive --times --perms --links --delete \
/media/DATA /media/186GB

could instead be expressed like this, if I understood the man page's Options Summary section correctly:

rsync -vshlEPtrip --del --delete-excluded --force \
--exclude RECYCLER \
--exclude "System Volume Information" \
/media/DATA /media/186GB

In that version, I added a couple other options that seemed appropriate, and also told rsync to exclude (i.e., don't copy) those extra folders that Windows XP seemed to put on every drive. I revised TestRun along these lines and, when the external drive settled down and it looked like the foregoing TestRun process had ended, I ran it again. But it didn't seem to make any difference. The extra folders were still there. I had understood that it would delete them. Part of the problem seemed to be that I had not used the syntax correctly. I was supposed to use an equals sign: --exclude=RECYCLER. But another part of the problem was that it was not clear whether the "exclude" command was supposed to work with directories. It didn't seem so. The man page just referred to excluding files. I tried again with equals signs, but still no change. I posted a question on it, but the kind response was unfortunately not able to resolve the issue.
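Looking at this afterwards, the exclude syntax was probably not the culprit: rsync accepts both "--exclude RECYCLER" and "--exclude=RECYCLER", and exclude patterns match directories as well as files. The likelier issue is deletion scope. With a source of /media/DATA (no trailing slash), rsync only manages the DATA folder it creates on the target, so folders like RECYCLER sitting at the top level of /media/186GB are outside its reach and never get deleted. A sketch of a version that brings the whole target into scope (note the trailing slash on the source):

rsync --verbose --progress --stats \
--recursive --times --perms --links \
--delete --delete-excluded \
--exclude="RECYCLER" \
--exclude="System Volume Information" \
/media/DATA/ /media/186GB

Here --delete removes anything on the target that no longer exists in the source, and --delete-excluded additionally removes items matching the exclude patterns, which is what should clear out those extra Windows folders.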

Next, I tried a modified version of TestRun on the primary computer. I went through several revisions and wound up with this version, which seemed to work:

rsync -vchlEtrip --progress --del --ignore-errors --force /media/CURRENT/ "/media/OFFSITE/P4 CURRENT"

The partition being backed up, in this case, was an ext3 partition named CURRENT, and the target to which it was being backed up was a USB external drive named OFFSITE. (Some weeks have passed since I started this post, so there may be some discontinuity in my writing at this point.)

This time I did not put the command into a TestRun-style script file and start it by double-clicking, because I had discovered that I would only see the detailed output if I entered the rsync command directly on the command line. The whole command fit on one line. I did not need the "exclude" options because this was not an NTFS drive formatted by Windows. I still had ext3 "lost+found" and Trash folders that got copied over in this way, but they were small, so that was OK.

As I think I may have said before, I got this particular set of rsync parameters by typing "man rsync" at the command line, with some trial and error. The resulting backup, when checked by right-clicking and selecting Properties, seemed to be virtually identical to the source.

When I ran that rsync command, it showed me lots of detail on what it was doing. It concluded with this message:
rsync error: some files could not be transferred (code 23) at main.c(977) [sender=2.6.9]
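Exit code 23 means the transfer finished only partially because some files could not be copied (often a permissions problem or an unusual file type). One way to see exactly which files were skipped is to redirect the error stream to a file; a sketch, reusing the command from above:

rsync -vchlEtrip --progress --del --ignore-errors --force /media/CURRENT/ "/media/OFFSITE/P4 CURRENT" 2> /tmp/rsync-errors.txt
# the normal progress output still appears on screen; the error lines end up in /tmp/rsync-errors.txt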
Eventually, however, I did figure out how to do it.  Here is an example of an rsync command that worked for me:
rsync -qhlEtrip --progress --delete-after --ignore-errors --force --exclude=/.Trash-1000/ --exclude=/lost+found/ /media/CURRENT/ /media/CURRBACKUP

This one would back up what Windows sees as drive D (named CURRENT) to a partition that Windows sees as drive G (named CURRBACKUP).  Both partitions had to be mounted in Ubuntu before this would work.  I used a similar command to copy a folder on CURRENT to a USB jump drive named KINGSTON.  That gave me a portable copy of the current state of that folder, ready to take along.
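One refinement worth noting: if the target partition happens not to be mounted, a command like this can quietly do the wrong thing or fail, so it helps to confirm the mount first. A minimal sketch, wrapping the same command in a check (the partition names are the ones from my system; substitute your own):

# see what is actually mounted under /media
ls /media
df -h | grep media

# run the backup only if the target is really mounted
if grep -qs ' /media/CURRBACKUP ' /proc/mounts; then
  rsync -qhlEtrip --progress --delete-after --ignore-errors --force \
  --exclude=/.Trash-1000/ --exclude=/lost+found/ \
  /media/CURRENT/ /media/CURRBACKUP
else
  echo "CURRBACKUP is not mounted -- skipping backup"
fi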

The next thing I needed to do was to back up my blogs.  I started by just wanting to be able to back up a webpage.  I had discovered that all of the posts on a Blogger (i.e., Blogspot) blog like this one could be displayed in a single webpage, at least if you had fewer than 1,000 posts.  To do that, you just needed to go to this URL:  http://blogname.blogspot.com/search?max-results=1000.  I wasn't sure what would happen if you entered a number larger than 1,000.  So now that I had that webpage, I wanted to know how to save a copy of it automatically.  Strangely, at this point, Google searches for any of these sets of terms
ubuntu "back up a webpage"
rsync "back up a webpage"
rsync "copy a webpage"


produced zero hits.  Eventually, it started to look like this was because I was barking up the wrong tree.  As described in a separate post, it seemed that what I wanted for this purpose might be wget, not rsync.

Collarbone Surgery

This post has been moved to another blog.

Ubuntu Linux, VMware, 64-bit WinXP Guest: Getting Online

I was using Ubuntu 9.04 (Jaunty Jackalope). On Ubuntu, I was running VMware Workstation 6.5.2. In VMware, I had just installed 64-bit Windows XP. I was now trying to get online.

At first, I thought the problem was with my Linksys WRT54GL Wireless-G 2.4GHz 54Mbps broadband router. I inserted the Linksys setup CD and ran through its setup steps. It said, "Checking your computer settings, Please wait." After a minute or so, it gave me an error message: "Setup Wizard MFC Application has encountered a problem and needs to close." It did this repeatedly. But when I connected the computer directly to my DSL modem, bypassing the router, I still couldn't go online. Anyway, a post I saw somewhere said that the Linksys setup CD was to set up the router, not the computer. The router had already been set up from a previous installation, so that didn't seem to be the issue. I was able to access the Internet in Firefox in the underlying Ubuntu layer, so the problem was just with getting Windows connected from within the virtual machine (VM).

In VMware, I went to VM > Settings > Network. I saw that it was set to Bridged. I believed it was supposed to be NAT, not Bridged, so I changed it. I did the same thing with Network Adapter 2. I saved that and tried again to go to a webpage in Internet Explorer, but once again got "The page cannot be displayed." I connected the computer directly to the DSL modem again. This didn't seem to make any difference, but I left it that way for the moment, just in case I had more than one problem.

It occurred to me that maybe I was supposed to restart VMware in order for the changes to take effect, so I suspended the WinXP VM, closed VMware, and restarted it. Just to test it, I started a different VM and tried to go online. Internet Explorer worked with no problems in that machine, but still wouldn't work in the new 64-bit WinXP VM. I dug out my AT&T Yahoo SBC installation CD -- I had forgotten that I had such a thing, but I got reminded of it when I ran Start > Settings > Network Connections > New Connection Wizard > Next > Connect to the Internet. All the hardware was already plugged in, so I moved pretty quickly to AT&T's Software Installation dialog. When I clicked there, I got a message, "You need to install an Ethernet adapter in your computer." So the problem seemed to be that Windows was not recognizing the VMware virtual network connector.

Looking again at the VM settings, I noticed that the working VM only had one Network Adapter. There was no Network Adapter 2 there. The VMware FAQs said there could be problems if you had two network interface cards (NICs), so I deleted Network Adapter 2. This didn't help with the AT&T installation; I was still getting the message that I needed to install an Ethernet adapter. A Google search for that message didn't turn up anything.

I went to Start > Settings > Control Panel > System > Hardware > Device Manager, as I should have done at the beginning. There, I saw a yellow question mark and a yellow circle with a black exclamation mark next to "Ethernet Controller." I right-clicked on it and said, Update driver. The Hardware Update Wizard couldn't find a driver. Someone said they had resolved this problem by installing a new WinXP x64 VM using the 64-bit rather than the default 32-bit WinXP setup. I powered down the VM and checked in VMware's "Edit virtual machine settings" option for that VM. It showed that, under Options > Guest Operating System, I had already indicated that the guest was Microsoft Windows, Windows XP Professional x64 Edition. This didn't seem like it was the problem in my case, so I posted a question on it.
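For reference, two workarounds often suggested for this symptom -- noted here without my having verified them on this machine -- are (a) installing VMware Tools inside the guest, which supplies VMware's own network driver, and (b) telling VMware to present an Intel e1000 virtual adapter, which 64-bit XP is more likely to recognize out of the box. The latter means powering the VM off and editing its .vmx configuration file so the network adapter lines read something like this:

ethernet0.present = "TRUE"
ethernet0.connectionType = "nat"
ethernet0.virtualDev = "e1000"

The exact location of the .vmx file, and whatever ethernet0 lines it already contains, vary from VM to VM, so this is a sketch rather than a recipe.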

Long-Term Backup Verification: Beyond Compare, in Windows and Ubuntu

Some time back, I had looked into software that would verify that I was not losing data without realizing it. Data could disappear, as I have discovered, when files become corrupted but continue to look the same (until you try to open them). Data could also disappear if files quietly vanish through unnoticed mistakes (e.g., hitting Delete when an archival folder is highlighted). I had a backup system, in other words, but I lacked a way of checking whether anything might be falling through the cracks. The programs I had examined in my previous investigation had not turned out to be quite what I was looking for, so I still had this need.


Then I became aware of Beyond Compare from Scooter Software. BC had gotten a lot of very positive reviews from programmers and other users here and there. It came with a 30-day free trial offer, after which it would cost me $30. Amid praises that sometimes seemed to come from BC's own friends and/or employees, there were also references to Araxis Merge, which some considered much superior. Araxis Merge, after its own trial period, cost $169/$259 (standard/professional), though there was supposedly an academic discount of about 70%; and it didn't offer a Linux version. There were also a number of other file and folder comparison tools, some of which were free but few of which offered CRC checksum calculation, which I wanted. I decided to start with BC and see how that went.

First Try: Ubuntu Installation

There were versions of BC for Windows and for Linux. In the spirit of my gradual, long-term effort to move away from Windows, I decided to start by trying the Linux version of the program. I was running 64-bit Ubuntu 9.04 (Jaunty Jackalope). Beyond Compare was a 32-bit program.

On a 32-bit system, it might have been very easy to install. But it seemed I would have to make some adjustments in order to run this 32-bit program on my 64-bit system. I found two different sets of advice on how to make those adjustments. One was for running BC on 32-bit Kubuntu 8.04. I figured it would probably work, if I wanted to try it. But the other was for Ubuntu 9.04, so I decided to try that one.

Following the latter set of instructions, I downloaded the .tar.gz version of BC. Ordinarily, it seems, it would have been necessary to download ia32-libs and libqt3-mt; but in my way of installing Ubuntu, Synaptic showed that these packages were already installed. I unzipped (technically, I guess, I should say untarred) the file by using the "tar -zxf [filename]" format instead of the "tar -vxf [filename]" format that I had previously decided I should use. (I was definitely still in a learning mode for purposes of Ubuntu commands.) Just out of curiosity, I deleted the resulting folder, went into a WinXP VM, right-clicked on the .tar.gz file, and told 7Zip (one of my Windows XP utilities, with a right-click context menu option) to unzip it. It did. Once again, I had that folder containing a file called bcompare-3.1.4.10554.tar. So if, like me, you weren't smart enough to just right-click on the tar.gz file and select Open with Archive Manager > Extract, you could do it this other way. Actually, in this case, the Windows approach may have been superior, because when I did belatedly try the Archive Manager approach, I got "An error occurred while extracting files." So I went back and did it with Windows again after all. This gave me a much larger .tar archive. I used 7zip in WinXP again on this .tar file. Now I had a regular folder called bcompare-3.1.4.10554.
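For the record, the whole extraction can be done in one step on the Linux side; a minimal sketch, assuming the download was named bcompare-3.1.4.10554.tar.gz:

tar -xzvf bcompare-3.1.4.10554.tar.gz
# -x extract, -z undo the gzip compression, -v list the files as they come out, -f names the archive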

I decided to continue trying the GUI approach. In Ubuntu's File Browser, I went into the unzipped folder and double-clicked on install.sh. I got a dialog asking, "Do you want to run 'install.sh', or display its contents?" I chose Run. Nothing seemed to happen. I went back to Terminal, navigated into that folder, and (reverting to the instructions) typed "sudo ./install.sh". Now it seemed to install. At the end of its various messages, it said, "Please place the following in your .bashrc or .cshrc (etc.): export PATH=/home/ray/bin:$PATH," where "ray" was my username. It also said, "Executable is /home/ray/bin/bcompare." It was apparently telling me that I had to add /home/ray/bin to my computer's path, so that the shell would know where to look when I typed "bcompare" (or whatever) to start the program.

I tried just typing "bcompare" right where I was in Terminal, but no joy. So I navigated over to where it said it had installed itself: "cd /home/ray/bin." Sure enough, there was a file called "bcompare." But when I typed "bcompare" there, I just got "command not found." Double-clicking on bcompare didn't do anything either. The installation instructions cited above said nothing about this. Was I supposed to make it executable? I typed "chmod +x bcompare" and then typed "bcompare" again, but this still just gave me "command not found."
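The likely explanation for the "command not found" message -- something I did not realize at the time -- is that Linux does not search the current directory for commands unless that directory is in the PATH; a program sitting in the folder you are in has to be started with an explicit ./ prefix. A minimal sketch:

cd /home/ray/bin
chmod +x bcompare    # make sure the file has execute permission
./bcompare           # ./ tells the shell to run the file from this directory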

It seemed I would have to figure out how to add something to my path, though I didn't understand what good that would be, if the damn thing wasn't executable. I found instructions that seemed to work, or at least they got me to an open .bashrc file. I didn't find any path lines in that file to use as a model, so I gathered that I was just supposed to type exactly what they said. I added it at the end of the .bashrc file as follows:

# add Beyond Compare to path [this is a non-executing comment]
export PATH=/home/ray/bin:$PATH

I saved and closed .bashrc. I opened a new Terminal session and typed "bcompare" at the $ prompt. It said "bcompare: command not found." I navigated to /home/ray/bin and tried again. Same result. I decided to back up and try the approach from that other webpage, the one that was supposed to work in Kubuntu 8.04 with the .deb download. Regarding ia32-libs and libqt3-mt (above), I had assumed that, because Synaptic showed those packages as installed, the library files I needed were already in their proper places. But now it seemed that I was supposed to copy program files from one folder to another myself. Putting this Kubuntu approach on hold, I reverted to the instructions from the approach I had already worked through. Specifically, the first command I apparently needed to enter was:

dpkg-deb --extract libqt3-mt_3.3.8-b-5ubuntu1_i386.deb libqt3-mt

changing the specific libqt3-mt file name as needed. But at this point, not knowing what particular file that might be, I said to hell with it and downloaded the Windows version instead. I then looked for a way to uninstall this version of Beyond Compare from my Ubuntu installation; but since I had not used Synaptic to install it, I did not seem to be finding uninstallation instructions that applied. So it's still installed.
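For future reference, a few quick checks would have shown whether the PATH change had actually taken effect in the shell I was typing into; a sketch, using the paths from above:

export PATH=/home/ray/bin:$PATH    # apply the change to the current Terminal session immediately
echo $PATH                         # confirm /home/ray/bin now appears at the front
ls -l /home/ray/bin/bcompare       # confirm the file exists and has the execute bit set
bcompare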

Second Try: Windows XP Installation


I downloaded and ran the WinXP installer. Interestingly, they had an option to create a "portable install," which could apparently be put on a removable USB drive or wherever, without making any changes to the registry. Presumably it would still fail to work after the 30-day trial period unless I bought a license. But if I was going to keep the program, this would definitely be a useful form for it. So I went with that approach. (One special advantage of this approach, for my purposes, was that I could put it on a drive other than drive C, within my computer, and could therefore make it available to all of my virtual machines under VMware Workstation, without having to reinstall it on each VM.)

The program seemed good. I was impressed with the comparisons. When I tried to use its help feature, I got an error message, "Navigation to the webpage was canceled." I searched their support knowledgebase and didn't find anything about it, so I sent Scooter Software an e-mail about that.

The folder comparison feature was very smooth. Folders were color-coded according to whether they matched or not. Black folders matched -- that is, they were identical. Other colors seemed to indicate some degree of mismatch. Lacking the help feature, I looked for a user's manual on the website. All they had was a bunch of knowledgebase articles. I'm sure these were very helpful for some purposes, but none of their titles appeared to address folder contents or colors. Nonetheless, when I clicked on a purple folder, I saw that only one of its subfolders differed. Eventually, I found that I could actually configure my own preferred colors for the following statuses: same (i.e., the folders in the two comparison panes are the same), orphan, older, newer, and different. There were also color options for file comparisons.

Cool feature: when you click on a folder on one side of the comparison screen, the program automatically opens the parallel folder on the other side. In other words, if I'm looking into a subfolder on drive F, it opens that folder for me and, at the same time, also opens the corresponding subfolder on drive H. With just this much knowledge, within about a minute after starting to fool with the program for the first time, I was able to detect that there was one varying file, nested seven layers down, in a folder containing almost 30,000 files. As far as I could tell, the difference was that the filename on one side had been truncated.

One thing that I didn't find was the ability to reload an old comparison log and compare it against a new drive. For example, suppose that, on December 31, 2007, I burn a CD to archive some files. I compare that CD against the source folder on the hard drive. Everything looks good. Then, sometime during 2008, something happens and it appears that I may have lost some stuff from the hard drive. What did I lose? I don't know, because the hard drive has changed by now, and unfortunately I can't find the CD. Ah, but if I could run a comparison of the hard drive's current state against the previously saved log comparing the hard drive and the CD on December 31, 2007, at least I could know what might be missing and take appropriate steps to replace or compensate for it. Another scenario: I back up my data every year, and now I want to see a single list of all the changes in my folders since 2003.

About this time, I realized that I had not actually examined the support forums at Scooter Software. As it turned out, there were several hundred threads in those forums. Some of them appeared to be exchanges between the proprietor of Scooter Software and his chief programmer, but whatever; it was still good to see the effort and interest in the product.

There was a lot more that I wanted to try with Beyond Compare, but I was not quite set up for some of that, so this is where the matter stopped for the time being.

Installing Ubuntu 9.04: External USB Drive

I was installing Ubuntu 9.04 (Jaunty Jackalope) in a Compaq CQ60-420US laptop computer from which I had removed the original Windows Vista installation. Although people generally advised against removing Vista and its recovery partition, I had several reasons for doing so: (1) I wanted to use Ubuntu (I would use WinXP in a virtual machine when I needed to run Windows-compatible software); (2) given the limited size of my laptop's hard drive, I could use the extra 10GB or so that was used by the factory-installed Vista recovery partition; and (3) Vista had a reputation of being slow, which was especially undesirable on the relatively underpowered hardware of a cheap laptop.

After I repartitioned and reformatted the hard drive, I found that I could not get the Ubuntu system to recognize the external USB hard drive. (That drive was not a USB flash drive; it was a regular internal SATA hard drive in a Rosewill RX-358-S SLV external enclosure.) First, I took a few steps, recommended in various posts, that did not seem to make much difference:
  1. Following some users, I typed (in Terminal) "sudo gedit /etc/modules" and then added "usb_storage" as a separate line at the end of that file, saved it, exited, and rebooted.
  2. When that didn't solve the problem (though, for all I know, it may have helped), I typed "sudo mkdir /mnt/OFFSITE," where OFFSITE was the name I wanted for my external drive. Then I typed "sudo mount -a." But the rest of those instructions failed because the external drive still wasn't mounted.
  3. I tried the advice to use System > Administration > NTFS Configuration Tool > Enable write support for external disk. Possibly this step takes place automatically if you type "sudo apt-get install ntfs-config" instead of installing ntfs-config through Synaptic. But it did not solve the problem.
I also tried the advice to install ntfsprogs (now included in the Synaptic list, above) and then type "sudo ntfsfix /dev/sdXX" where XX designates the correct drive (e.g., c1). This gave me "Volume is corrupt. You should run chkdsk." Given the NTFS orientation of ntfsprogs, this appeared to refer to the chkdsk program in Windows.
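Before (or alongside) those steps, it would also have made sense to check whether the kernel was seeing the enclosure at all. A minimal sketch of the checks (the device name /dev/sdb1 is just an example; the real name depends on what the kernel assigns):

lsusb                     # is the enclosure visible on the USB bus at all?
dmesg | tail -n 20        # did the kernel log a new disk when the drive was plugged in?
sudo fdisk -l             # list every partition table the kernel can see
sudo mount -t ntfs-3g /dev/sdb1 /mnt/OFFSITE    # try mounting the NTFS partition by hand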


I didn't plan to install any version of Windows on this computer. My desktop computer was dual-boot, but for purposes of simplicity and to preserve disk space on the laptop's relatively small hard drive, I wanted to make this a Linux-only machine. So to run chkdsk, I tried restarting the computer with a Windows installation CD. But this introduced a new problem. Each time I booted the computer and hit the space bar in response to the CD's message, "Press any key to boot from CD," it went through the process of loading various drivers and then, when it reached the point of saying, "Setup is starting Windows" (in the bar at the bottom of the screen), it gave me a blue screen of death (BSOD): "A problem has been detected and Windows has been shut down to prevent damage to your computer," etc. The bottom line of the message referred to "STOP: 0x0000007B." (Note that I was getting this error before reaching the point of deciding whether I wanted to enter Recovery Console.)

Research on this problem led to a Microsoft troubleshooting page. It identified a number of possible problems, but I was not sure any of them applied precisely to me, and I hated to spend a lot of time on a troubleshooting wild goose chase (TWGC). I did run self-tests of hard drive and memory in the laptop's BIOS setup; that is, I rebooted the machine, hit the Esc key to go into BIOS setup, and ran its diagnostics. Those tests found no problems with those pieces of the laptop's hardware.

It occurred to me, at this point, that possibly I needed a BIOS upgrade. I went to the support webpage for this laptop and discovered, there, a flaw in my plan to make this a Linux-only laptop: Compaq was not distributing firmware upgrades for its laptops in Linux form. To upgrade the firmware, I needed to be able to run Windows.

Before addressing that problem, I tried rebooting the laptop from the Windows CD without the external USB drive connected. This could potentially defeat the purpose of booting from the Windows CD (i.e., I wanted to use it to run chkdsk on the external drive), but now I was curious. This, however, did not make a difference. Evidently the external USB drive was not the reason for this STOP 0x0000007B message.

In that case, the Windows CD appeared to be objecting that it did not find a working version of Windows already installed on the laptop's internal hard drive. But that seemed silly. Of course there would be no version of Windows installed on a hard drive, if the purpose of running the Windows XP CD was to install Windows on that drive. How did this error message expect me to "run chkdsk /f" if I could not even run Windows? (A more common explanation for a STOP 0x0000007B during XP Setup on laptops of that era, for what it's worth, was that the SATA controller was running in AHCI mode, for which the XP CD had no driver; the usual fixes were to switch the drive controller to IDE or compatibility mode in the BIOS, where such a setting existed, or to slipstream the SATA driver into the installation CD.)

Well, I could install WinXP after all, make it a dual boot machine, and deal with the inevitable GRUB issues, as Windows and Ubuntu fought over which system would load first and then, just when I needed the machine most, refuse to load either. Or I could give up on the idea of running Windows diagnostics or firmware updates on this laptop. But I had previously experimented with booting Linux from a USB drive (a/k/a "thumb drive" or "jump drive"). Since then, I had become aware of another Linux USB drive approach known as Penlinux, where you could either buy one of their pre-configured Ubuntu 6.10 USB drives (for about $45, which was less than the cost of a 2GB USB drive back in 2007, when they apparently came up with this); or you could make one for yourself. I had heard of similar things in Windows; I wondered whether I could boot Windows XP from a USB flash drive and do these various maintenance tasks that way. The discussion of setting up a bootable WinXP USB flash drive appears in a separate post.

When I did boot the system with the bootable WinXP USB flash drive in TXT mode, I went into Recovery Console. There, I got a screen I had not received previously: "Setup has recognized the following mass storage devices in your computer: [none]." At this point, I did not have the external USB hard drive connected, so apparently this was OK. I pressed Enter and got the ordinary Recovery Console. There, a "dir" command produced "drive is not valid" messages for all drives other than C and D. For C, I got "There is no floppy disk or CD in the drive." D was recognized as the USB flash drive. So Recovery Console, started by the USB flash drive, appeared unable to recognize even the mere existence of the Ubuntu-formatted internal hard drive.

I typed "exit" and rebooted the system without the USB flash drive inserted. Ubuntu still booted up OK. When I pressed a key to try to boot from the WinXP CD, I still got the BSOD and the STOP 0x0000007B message, as above. So merely starting the system with the USB flash drive, by itself, did not seem to have made any helpful change in the system.

I booted with the USB flash drive again. This time, after booting in TXT mode, I chose the Setup option instead of the Recovery Console option in the Windows XP Professional Setup process. Unfortunately, this directed me to that same message: "Setup has recognized the following mass storage devices . . . ." But this time, when I pressed Enter to continue, I got the partitioner: WinXP was willing to set itself up in the jump drive. Again, though, it didn't see any other drives.

* * * * *

I thought that I might need to create (and, optionally, delete) a WinXP partition in order to completely rid this hard drive of the traces of Vista. (I assumed Vista was the culprit; I had not seen these problems previously in WinXP.) I booted my Gparted CD and inserted a 15GB NTFS partition before any other, where drive C would normally be. I had hoped to use my new bootable USB flash drive to install WinXP on it, but that drive was still not booting this laptop successfully, so I tried to install WinXP on that partition using a slipstreamed WinXP SP3 CD instead. That again gave me the "A problem has been detected" BSOD (above).

It began to appear that the process of deleting all partitions and starting over was more difficult than I had realized. That Recovery partition was no ordinary partition. There seemed to be a process by which it could be deleted from inside Vista itself; but once the Vista partition was gone, so was that option.

At this point, I discovered Hiren's BootCD, another apparently famous funky tool that supposedly contained an unbelievable list of disk utilities. I downloaded a copy (version 9.9) and burned it to CD. I booted it, and it gave me four options: Boot from Hard Drive, Start BootCD, Start Mini Windows XP (which might well have done the same thing as the BartPE boot CD, as I now realized), and Windows Memory Diagnostic. I chose Start BootCD and used it to run PartitionMagic 8.05 Pro. PM did not see anything on the drive other than unallocated space. I tried running another program, but got a mouse error; apparently at least some of the programs on Hiren's BootCD were best run after a fresh reboot.

I rebooted and tried Mini Windows XP. It was able to recognize a USB flash drive, so I used one to copy over the DiskPart.exe program I had previously downloaded. An AvaFind search of my drives revealed, at this point, that WinXP already contained DiskPart.exe, although in an apparently earlier (or at least smaller) version (i.e., 160KB rather than 191KB). With the larger one onboard, I moved the USB flash drive to the target computer and tried to run it from within Mini Windows XP. Unfortunately, this gave me an error message: "Error creating process [msiexec.exe /l diskpart.msi]. Reason: The system cannot find the file specified." I found msiexec.exe on the other (working) computer and copied it, and the older DiskPart.exe (i.e., same date as the msiexec.exe), to the flash drive. I tried again with this on the target computer. This did give me the DISKPART prompt and the options described in my other post.

I typed "list disk." It showed that I had only Disk 0, which was right (if you don't count the jump drive). I typed "select disk 0" and then "clean." After a moment, it said, "DiskPart succeeded in cleaning the disk." I typed "list volume" and it did not show any volumes on the hard drive. So it seemed that it had indeed cleaned the drive. To test it, I tried something that had failed previously (see, again, my other post): I typed "exit" (to get out of DiskPart.exe), inserted the Windows XP installation CD, and tried to boot it. I got "NTLDR is missing" because, silly me, I had not yet removed the USB drive. I removed it and tried again. The disk churned for a long time and then gave me the familiar old BSOD. The older version of DiskPart.exe had failed to fix the hard drive.

I restarted Hiren's BootCD and went into Hard Disk Tools > HDD Regenerator > Scan and repair. It ran for about an hour and detected no errors. (At this point, and repeatedly after this, I tried a number of other tools on Hiren's BootCD, not listed here.) Seagate SeaTools for DOS told me that it was a Seagate hard drive. I used that program's Advanced Features > Erase Track ZERO. I went into the Fujitsu low-level format tool, also on Hiren's BootCD, but it did not seem to recognize any drives. I tried rebooting the Windows XP installation CD again, but it still gave me a BSOD. In Hiren's BootCD, I went back into Mini Windows XP > BootCD WinTools, but didn't see anything that looked useful. I tried Start > Programs > Check Disk, but it did not find a volume to work on.
Taking a different approach, I restarted the computer and chose Hiren's BootCD option (rather than Mini Windows XP) > Partition Tools > PartitionMagic Pro. There, I selected the unallocated space (i.e., the only thing listed) and created an unformatted primary partition that used up all of the unallocated space. I rebooted the WinXP installation CD, but still got a BSOD. Back in Hiren's BootCD, I tried the Active Partition Recovery program. It said this:
Logical C:

Detected file systems:
from partition table: FAT32 (LBA)
from BOOT sector: Unknown
Get file system from BOOT sector? [Y/N]
I said Yes. Next, it gave me an option to Perform Extended Disk Scan, with this note:
Extended Disk Scan may detect partitions being deleted even if you have created new ones instead, formatted and used them!
The screen also showed two partitions within HDD 80h. I arrowed down so that the first of them was highlighted: "Logical C." This gave me an error message: "Error reading sector # 0 or BOOT sector is invalid." When I highlighted the second one, "Unallocated," I got no error message, but it was weird that this partition was only 2.49MB.

Anyway, I went back and ran the Extended Disk Scan. It was slow. After a while, it detected a 14.9GB NTFS logical drive. The program gave me the option of adding this partition to the drives list. I said yes. There was another one, a while later, a 35GB partition. Yes to that too. Both of these, I thought, were WinXP partitions that I had created, during the past month or so of fooling around with this laptop. As the scan reached the halfway mark, I was getting concerned that it hadn't yet found either the original Vista partition or the original Recovery partition. But then, about three-fourths of the way through, it did find a 10.9GB partition. The whole process took maybe six hours, on a 250GB drive. When it was done, I had three logical drives and two unallocated portions.

Now what? I thought maybe I would take a look at these partitions in one of the other programs there on the Hiren's BootCD, so I hit Esc. This gave me an option of writing the detected partition information back to the hard drive. As I reflected on the matter, I decided that these former partitions were sort of like a former girlfriend: sure, things hadn't been so good with them, but then, things hadn't been so great without them either. So I decided sure, why not, let's take them back. Just like that, they were written to the hard drive, and I was rebooting.

I went back into the partition tools, there on Hiren's BootCD, and took another look via PartitionMagic. Now PartitionMagic said it had "detected an error 116," and offered to fix it. I said sure. Now there was another error, partition table error #108. PartitionMagic was willing to format the single partition that it did find, so I went with that. It said it was "formatting partition *: (BADMBR)," but then it gave me Error #4, "Bad argument/parameter." Bad argument, indeed: this really was turning out like a reunion with an ex-girlfriend!

I decided to fight fire with fire. Vista had created this problem; maybe Vista could undo it. Probably the clearer reasoning would have been, they did this at a factory, so I need a factory to fix it. But I wasn't reasoning along those lines. I had gotten my hands on a copy of Microsoft Vista, and I was going to install it and see what would happen.

The Vista DVD gave me an option of recovering my system, but I didn't have any previous images it could work from, so I went ahead with the installation option. Vista detected the five partitions or unallocated spaces that Active Partition Recovery had restored. I told Vista to delete each of those partitions. Now I had one big 232.9GB unallocated space. I went ahead with the installation of Vista into that big empty space; maybe I should have used the option to format that unallocated space first. But the main thing was that Vista installed, where WinXP had failed to do so. That was promising.

Maybe it wouldn't even have been necessary to install Vista; maybe WinXP would have installed OK after I ran Active Partition Recovery. But maybe not; after all, PartitionMagic was still confused. Speaking of which, I decided to run it again, after Vista finished installing itself. This time around, when PartitionMagic on Hiren's BootCD detected an error, I didn't let it fix it; I now dimly recalled that PartitionMagic never got entirely comfortable with larger hard drives. Maybe if I hadn't let it fix the imaginary error the last time, it would have worked OK. But probably not: as before, it still thought we had a partition table error #105.

I took out Hiren's BootCD and put in the Windows XP installation CD, and rebooted. But, wouldn't you know it, I still got a BSOD. Vista was happy with the hard drive; Ubuntu was happy with the hard drive; but WinXP couldn't be satisfied. Well, I had a Seagate Seatools CD from mid-2007. I assumed it was newer than the Seatools for DOS found on Hiren's BootCD, but perhaps not. What could this Seatools CD tell me? I rebooted with that CD, and I discovered that it could tell me, "No Hard Drives Found."

It seemed that Vista had its own special way with my hard drive. This called for a choice. I could leave Vista alone, giving it its own 15-20GB partition - that is, I could venture into the land of the Vista-Ubuntu dual boot. That might be the eventual outcome, but first I felt it was a good idea to screw around some more. I thought I might start by using Gparted to reduce the size of the Vista partition, as if I really were going to set up a dual-boot system after all, and see what happened then. So that's what I did. I booted with my Gparted CD, removed the 1MB unallocated space prior to the Vista partition, shrank the Vista partition to 30GB, inserted a 30GB ext3 and 5GB swap partition after that, and left the rest for an ext3 partition for data. Then I rebooted into Vista. I got Windows Boot Manager, telling me this:
Windows failed to start. A recent hardware or software change might be the cause. To fix the problem:

1. Insert your Windows installation disc and restart your computer.
2. Choose your language settings, and then click "Next."
3. Click "Repair your computer."
I did this, using the Vista DVD. I chose the "Repair and Restart" option, removed the DVD, and rebooted. Vista ran a disk check and then came up OK, but wanted to reboot because of hardware changes. After that second reboot, it seemed to be functioning normally. I rebooted with Gparted, to see what had changed. There was now a 6MB unallocated space after the Vista program partition. So possibly what screwed things up was that I removed that 1MB unallocated space that had been before the Vista partition. I ran a check on the two ext3 partitions in Gparted. They were OK. I rebooted with the WinXP CD and, as usual, I pressed a key to boot from the CD. This time, I got this:
Windows Boot Manager

Choose an operating system to start, or press TAB to select a tool:
(Use the arrow keys to highlight your choices, then press ENTER.)
Vista (recovered) was the only option. I restarted the computer and tried again to boot from the WinXP CD. Still got a BSOD. I could have uninstalled Vista and reformatted the drive, just to see if there was somehow a "right" way to remove Vista, thereby making the drive once again accessible from the WinXP installation CD. I decided instead to get my Ubuntu installation in place in a Vista dual-boot. Maybe I wouldn't need to go back to WinXP or worry about it anymore, and could perhaps even replace the Vista partition with an occasional Vista boot from a USB drive (for purposes of e.g., upgrading the laptop's firmware).

In that case, the final question was whether Vista and Ubuntu could work with the external USB hard drive. That, you will recall, was the question that triggered this expedition.  By this point, however, I had begun to suspect that the problem was with the external hard drive, which appeared to be defective.