Tuesday, September 23, 2008

Ubuntu, GRUB, and Acronis True Image: Restoring

I was in the process of working through my Ubuntu and Windows XP dual boot installation, when I encountered a problem that probably most dual-booters encounter: the periodic need to replace a confused WinXP installation with a fresh new one had incidentally messed up my GRUB bootloader, so that I could no longer choose to boot into Ubuntu Linux. This post describes the steps I took to solve that problem, and concludes with what I learned from the effort. I had resolved this problem once before. I looked at my previous notes and saw that apparently I had used the Super Grub Disk (SGD) to solve this problem. Attempting to recreate that solution, I booted my copy of the SGD and selected Boot & Tools. That didn't seem to have what I needed, so I went back and chose Advanced > GRUB > Restore GRUB in Hard Disk (MBR). That seemed to be the place. On one of its pages, it said this:

Example:
========
Has Windows rewriten your MBR? Now you will be able to boot your Gnu/Linux AFTER rebooting from Grub Super Disk.
(The typographical error was theirs, not mine.) I hit Enter and then chose the recommended Automatically Install option. This resulted in a message that looked like one that had flashed past me when the SGD was booting. Here, it said,
Booting 'trying /grub/stage1'
findf /grub/stage1
Error 15: File not found
Booting 'trying /boot/grub/stage1'
and so forth. It said it was running "setup (hd0)" and then "Checking if [various boot grub stages] exists . . . yes" and then running stages 1.5 and 1, and finally "SGD as succeeded!" I hit Enter again and then kept selecting SGD's "Go back" options until I got options to Quit and then "Reboot P.C." But no, it actually hadn't succeeded. On reboot, the machine again defaulted automatically to Windows. I rebooted, to watch more carefully and see if I had missed an option. Nope. It just went straight to Windows. I found a How-To Geek webpage with instructions on how to reinstall GRUB after a Windows reinstallation wipes it out. It said to boot from the Linux live CD (i.e., the one I installed from). This meant choosing the option that said, "Try Ubuntu without any change to your computer." Before I could proceed with the next steps, I had to figure out where I had installed GRUB during the Ubuntu installation. One thread said that operating systems tend to put it on the first partition of the first hard drive by default. I didn't recall putting it anywhere else. I guessed this must have been one of those options for which I accepted the default value. So I could proceed with the How-To steps without change. The recommended next steps, then, were to type these lines in Ubuntu's Applications > Accessories > Terminal:
sudo grub
root (hd0,0)
setup (hd0)
exit
and then reboot. I started doing that, but when I typed "setup (hd0)," I got "Error 17: Cannot mount selected partition." I typed "exit" and got "Error 27: Unrecognized command." The correct command, I eventually figured out, was actually "quit," not "exit." Next, I went into System > Administration > Partition Editor. There, I was reminded that Ubuntu (or at least Gparted) referred to partitions as sd, not hd, and I also saw that Windows was installed at /dev/sdc1 and Ubuntu was at /dev/sda6. Another advisor seemed to confirm my understanding that these would be translated as hd2,0 and hd0,5. That is, you change sd to hd and then, starting with 0 in both cases, you assign a number to the letter (a = 0, b = 1, etc.) and a number to the number (1 = 0, 2 = 1, etc.). But this didn't tell me where GRUB had been installed, and I wasn't seeing any clarification on that in the several threads I examined. The error messages had indicated that GRUB could not mount hd0, so maybe that meant it wasn't on the drive where Ubuntu was installed. So, OK, I tried the foregoing sequence of commands again, this time focusing on hd2,0, where Windows was installed. When I typed "setup (hd2)," again I got "Cannot mount selected partition." I then realized that, the first time, I had indicated "root (hd0,0)" whereas I had just figured out that Ubuntu was at hd0,5. So I tried again, and this time the results were different. What I typed was this:
sudo grub
root (hd0,5)
setup (hd0)
When I typed that, it gave me the same sequence of notes as SGD had given me, above. That is, it said, "Checking if [various stages] exists . . . yes," and it ended with "succeeded" and then "Done." So I typed "quit" and rebooted without the Ubuntu live CD. But, dammit! Still no GRUB menu. I found a Fedora thread that recommended steps to take if you were using Vista instead of WinXP and if you wanted to use the Windows boot loader instead of GRUB. That seemed a little bit removed from my situation, so I went to the next post, which provided a solution for those using Gentoo Linux. Another post translated that one, somewhat, into Ubuntu terms. The core of those instructions seemed to be this: (1) Boot with the live CD. (2) Delete the Stage 1.5 files. (3) Reinstall GRUB. To do step (2), in more detail, I went to Terminal and typed "sudo -i" and then nautilus. In File Browser (i.e., Nautilus), I navigated to "36.7 GB Media," which seemed to represent the hard drive where I had installed Ubuntu. Once there, I went into boot > grub. There, I selected all files with stage1_5 in their names, and deleted them. Then, in that same folder -- moving on to step (3), here -- I opened menu.lst and searched for "(hd" (without the quotation marks, but with the opening parenthesis) to find the places where there might be a hard drive command. Not counting the commented lines (i.e., those beginning with the # symbol), there seemed to be a section that designated the Debian (i.e., Ubuntu) GRUB menu items and another section for the Other (i.e., Windows) menu items. I wasn't going to change the ones for Windows, so I focused on the Ubuntu part. It said, in several places, that root was at (hd1,5). Using the approach described above, I translated that as sdb6. I went to System > Administration > Partition Editor and observed that sdb6 did not exist on my system. So there, it seemed, was a problem. As noted above, the Ubuntu location was supposed to be sda6, i.e., hd0,5. Not hd1,5. 
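For what it's worth, the sd-to-hd naming rule I was relying on can be written down as a short Python sketch. The function names sd_to_grub and grub_to_sd are my own inventions for illustration, not part of GRUB or Ubuntu. The sketch covers only the naming convention. It assumes that GRUB counts the drives in the same order that Linux does, and GRUB actually numbers drives in BIOS order, so on a machine with mixed drives the result may not match what GRUB really sees.

```python
import re


def sd_to_grub(device):
    """Translate a Linux name such as /dev/sda6 into GRUB legacy notation such as (hd0,5)."""
    match = re.fullmatch(r"(?:/dev/)?sd([a-z])(\d*)", device)
    if match is None:
        raise ValueError("unrecognized device name: %r" % device)
    drive = ord(match.group(1)) - ord("a")      # a = 0, b = 1, c = 2, ...
    if match.group(2) == "":
        return "(hd%d)" % drive                 # whole drive, e.g. sdc -> (hd2)
    partition = int(match.group(2)) - 1         # 1 = 0, 2 = 1, ...
    if partition < 0:
        raise ValueError("partition numbers start at 1: %r" % device)
    return "(hd%d,%d)" % (drive, partition)


def grub_to_sd(name):
    """Translate GRUB legacy notation such as (hd1,5) back into /dev/sdb6."""
    match = re.fullmatch(r"\(?hd(\d+)(?:,(\d+))?\)?", name)
    if match is None or int(match.group(1)) > 25:
        raise ValueError("unrecognized GRUB device: %r" % name)
    letter = chr(ord("a") + int(match.group(1)))
    if match.group(2) is None:
        return "/dev/sd" + letter
    return "/dev/sd%s%d" % (letter, int(match.group(2)) + 1)


# The devices mentioned in this post (assuming identical drive ordering).
for dev in ("/dev/sda6", "/dev/sdc1"):
    print(dev, "->", sd_to_grub(dev))
print("(hd1,5) ->", grub_to_sd("(hd1,5)"))
```

Running it on the names from this post gives (hd0,5) for sda6, (hd2,0) for sdc1, and /dev/sdb6 for (hd1,5), which is the same arithmetic I had been doing by hand.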
But I wasn't even getting the GRUB menu, so I thought the advice was probably correct as far as it went: I still needed to reinstall GRUB. The advice seemed to be that this menu.lst file gave me the information I needed for that purpose: it said that GRUB was installed at hd1,5. So I should have typed hd1,5 instead of hd0,5 in the sequence of commands described above. But that made no sense, because there was no such thing as hd1,5. I decided to stick with the installation I had apparently already done at hd0,5, and change those three root references in menu.lst to hd0,5 instead of hd1,5. While I was at it -- relying, again, on the information found in GParted -- I changed the root reference in the Windows part of menu.lst from hd1,0 to hd2,0, because that (i.e., sdc1) was where my Windows program files were installed. In other words, I was basically banking on the theory that what I needed to do, besides deleting the Stage 1.5 files and reinstalling GRUB, was to correct erroneous references in menu.lst. Then I saved menu.lst and told the Ubuntu CD that I wanted to restart the computer. The machine started to shut down, but then just died at a black screen with a flashing cursor. Weird. I tried to remove the Ubuntu CD, but it wouldn't come out. I punched the reset button and removed it. Still no GRUB menu; the machine booted straight to Windows again. I rebooted with the Ubuntu CD. It occurred to me that the computer was automatically looking at the Windows installation, and that that's where I should be installing GRUB. So I went back into Terminal and tried this sequence:
sudo grub
root (hd2,0)
setup (hd2)
That gave me Error 17 again, "Cannot mount selected partition." So, OK, by this point I was really mixed up. What if I left root (hd2,0) as it was (not knowing what this command achieved) and tried again with setup (hd0), as someone else had supposedly done? But no, that got Error 17 too. Root (hd0,5) and setup (hd0) was the only combination that seemed to work. I looked at menu.lst again and didn't see anything else to change. I tried rebooting again without the Ubuntu CD. This time, the reboot went normally, without freezing up; but it still went immediately to Windows. I needed different advice. I found some Ubuntu documentation that addressed several different scenarios. The first steps were -- you guessed it -- boot with the Ubuntu live CD, go into Terminal, and type "sudo grub." This time, though, before going on to type the root and setup lines, they had me type "find /boot/grub/stage1." The answer that came back was hd0,5. I was instructed to type this (as I had already done) in the root command, so I entered "root (hd0,5)" and then "setup (hd0)." Then I quit and rebooted. As expected, I got the same outcome as before: booted straight into Windows. I restarted the computer. But when I rebooted with the Ubuntu CD this time, after choosing the "Try Ubuntu" option, I got billions of error messages. They were zipping by too quickly to read, but the basic idea was like this:
[ 256.923837] SQUASHFS error: Unable to read page, block 250ab9de, size d104
Something like that, anyway. I found a thread that went through various possibilities; their basic idea was that the CD or the CD drive was screwed up. But I didn't pursue that because, meanwhile, I punched the reset button and tried again. This time, no problem: the CD booted Ubuntu. I went ahead with the next possibility offered by the Ubuntu documentation page: "Overwriting the Windows Bootloader." Here, they told me to type "sudo -i" and then "fdisk -l" (that's an L, not a one) to see where Ubuntu was installed. That command indicated that I had a Linux partition at sdc6. Say what? That would be hd2,5. Next, they told me, type "mkdir /mnt/root" to make a mountpoint. Then mount the partition with "mount -t ext3 /dev/hda2 /mnt/root". But -- what was this "hda2" supposed to represent? Did they mean sda2 (in their example), or did they mean hd1,2? Their "fdisk -l" produced references to hda2 and such, whereas mine had produced references to sdc5, sdc6, and so forth. Confusing! But they said we could try it out and we'd find out if it wasn't correct, so in place of their hda2 I typed my sdc6. (Complete command: "mount -t ext3 /dev/sdc6 /mnt/root".) That didn't get an error message, so I went to their next step, which was to type "ls /mnt/root" and see what I got. I think the idea of this command was to show me what folders existed under my mount point. The folders were more or less like theirs -- I had a bin folder, a media folder, etc. So apparently I was on the right track so far. I hadn't made a separate boot partition -- all my Linux program stuff was in that one partition -- so, as they seemed to intend, I skipped the part about mounting a boot partition if you have one. Next, they said this:
Now that everything is mounted, we just need to reinstall GRUB:
sudo grub-install --root-directory=/mnt/root /dev/hda
But since I was replacing their references to hda with sdc, I typed exactly what they had, except the last part of mine was /dev/sdc. That gave me what looked like an error message: "/dev/sdc does not have any corresponding BIOS drive." They said, if you get BIOS warnings, type exactly the same thing but with a space and then a "--recheck" at the end. So I hit the up arrow (to recall the command) and typed that on the end. It said, "Probing devices to guess BIOS drives." Then it said a bunch of other stuff, including "Installation finished. No error reported," along with a list of devices that looked right. That's what the advice page said I should get. So then they said,
Now you can reboot and the GRUB menu should appear. If you see a warning message regarding XFS filesystem, you can ignore it.
So I rebooted without the Ubuntu CD. I got the SQUASHFS error again, but it was different this time: it gave me only a series of errors and an instruction to remove the CD and reboot. I did, but now I booted directly into Windows again. So we had achieved nothing. They offered another approach on the Ubuntu documentation page. They said you could download the Auto Super Grub Disk, run it within Windows, then reboot. When I ran it, it gave me a bunch of options. It didn't do anything for a minute, but then Spybot popped up and told me a command had been entered regarding UNetbootin Uninstaller. I told Spybot this was OK, and to remember this decision. This gave me a dialog:
Reboot Now?
After rebooting, select the UNetbootin menu entry to boot. Reboot now?
I clicked OK. The Ubuntu documentation said to do nothing until you see your GRUB menu again. I did see that menu. I let it default into Ubuntu. But this gave me an error:
root (hd0,5)
Error 22: No such partition
Press any key to continue
So, ah, some of my editing must have screwed up something. I looked for the "Any" key. (Kidding.) I pressed a key, and this took me back to the GRUB menu. By the way, this was a modified GRUB menu; it had options to edit or reload GRUB commands, get a command line, etc. The last bit of Ubuntu advice on this was to say "yes" next time I booted into Windows using this menu, and that would remove this funky little Auto Super Grub Disk installation. But apparently we weren't quite ready for that; I had to figure out how to fix the Ubuntu installation first. I started with "e" to "edit the commands before booting." Since I'd had WinXP highlighted when I did that, it seemed to take me to the part of menu.lst that had to do with Windows. I hit Esc to get back to the menu. This time, I highlighted the top entry, the regular Ubuntu boot line, and I hit "e" there. I typed "e" again there to edit the first line, changing it back to "root (hd1,5)." I hit Enter and that saved it. I hit Esc then, to go back to the main menu, and hit Enter on that first Ubuntu option to boot it. But my change had not been saved. I tried again. This time, after saving the change, I went down to the next line in menu.lst before hitting Esc. That wasn't the solution either. Third time: after changing it to "root (hd1,5)," I pressed "b" to boot. Hey -- that worked! Ubuntu was booting up! I got a bunch of command-line details that normally would have been invisible, possibly because Auto Super Grub Disk was still doing its job, but otherwise we were OK. I logged into Ubuntu and went back to /boot/grub/menu.lst as root (i.e., sudo -i), to change those other two lines back to hd1,5. Obviously I did not fully understand the translation of sd to hd, but whatever. But when I got there, all three lines were still hd0,5. Apparently my change via the Auto Super Grub Disk (ASGD) edit menu did not result in a change in menu.lst. Sooo ... was all this advice about menu.lst a wild goose chase? 
It seemed that I needed to make my changes for the second and third lines of the GRUB menu via ASGD, not here in menu.lst. (By the way, the stage1_5 files were back, here in /boot/grub. Not sure what part of this mangled process had reinstalled them.) I rebooted and, just for the hell of it, told ASGD to edit the first boot option, the regular Ubuntu thing. Stupid thing still said "root (hd0,5)." So it seemed that I had been able to boot with a temporary override of menu.lst, and that maybe I should have edited menu.lst back to root (hd1,5) after all. I re-edited and booted this temporary fix so I could get back into Ubuntu and do that. I changed the Ubuntu root lines back to hd1,5 and also changed the Windows root line back to hd1,0. When that was done, I rebooted and took another look at the GRUB menu. Now each root line looked right. So far, the moral of the story seemed to be: forget about everything else; just use ASGD to solve this problem. I booted with each line. Regular Ubuntu worked; Recovery Mode Ubuntu worked (I didn't test its dpkg, root, or xfix options); the memtest86 option worked; and the Windows option . . . did not work. I got this:
Error 12: Invalid device requested
Was hd2,0 the correct answer after all? I tried that. No, this time it said, "This is not a bootable disk. Please insert a bootable floppy and press any key to try again." Floppy -- what? Where was Windows? I used a panic-combination of Esc and Enter to bail out of that. Back at the ranch, the question was where I should tell the computer to look next. Or maybe the problem was in these lines:
map (hd0) (hd1)
map (hd1) (hd0)
This, it seemed to me, was telling the system to treat hd0 as though it were hd1, and vice versa. Maybe I needed to edit these lines, too, when I was changing the root line to hd2. I tried that, basically changing all references in this part of menu.lst from hd1 to hd2. But no, I still got "This is not a bootable disk." Oh, but now the thing was really fubar: it said "GRUB Loading stage1.5," and it just hung there. I punched the reset button. We were back to the point of editing GRUB for Windows. Whew. Now, I needed outside help. I found a thread where somebody explained those map commands:
Windows will not like being booted from any other partition than the first on the first disc. You should be able to use GRUB to fool Windows into thinking such, by using the map commands.
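To convince myself of what that pair of map lines does, here is a toy Python model of it. This is only a sketch of my understanding, not GRUB's actual code. The function apply_maps and the drive labels are hypothetical, and I am assuming that "map (hdTO) (hdFROM)" means the chainloaded system's requests for hdFROM get answered by the physical drive hdTO. For a matched pair like the one above, the result is a plain swap whichever way you read the arguments.

```python
def apply_maps(bios_order, maps):
    """Return the drive order that a chainloaded OS would see.

    bios_order: list of drive labels, where index i stands for (hdi).
    maps: list of (to_drive, from_drive) index pairs, in the argument
          order of GRUB's 'map (hdTO) (hdFROM)' command (assumed meaning:
          requests for hdFROM are redirected to the physical drive hdTO).
    """
    seen = list(bios_order)
    for to_drive, from_drive in maps:
        seen[from_drive] = bios_order[to_drive]
    return seen


# Hypothetical two-drive machine with Windows on the second BIOS drive.
drives = ["ubuntu-disk", "windows-disk"]

# map (hd0) (hd1)
# map (hd1) (hd0)
swapped = apply_maps(drives, [(0, 1), (1, 0)])
print(swapped)  # ['windows-disk', 'ubuntu-disk']
```

In this model, Windows on the second drive gets to believe it is sitting on the first one, which is the fooling that the thread was talking about. A single map line without its partner would leave the same physical drive visible under two numbers, which is presumably why the lines come in pairs.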
Another option that some people had was to use rootnoverify instead of just root. I tried editing this portion of menu.lst to that effect. I still got Error 12. Since I had now seen other posts indicating that my understanding of the translation from sdc to hd2 was correct, I was more confident that the reference really should be to hd2,0, not hd1,0. So I made those changes again, along with the rootnoverify change. This gave me the "not a bootable disk" error again. I thought I would try to make some of these changes permanent, even though they were not working, so that I did not have to keep re-editing menu.lst here in the GRUB menu. So I selected the first GRUB line and hit Enter to go into normal Ubuntu. But now, what's this? I got "Error 22: No such partition." I edited the line. It was saying "root (hd1,5)," which is what I had said and what had worked, but now it was not working and, in my opinion, it should never have been working. I changed it back to hd0,5 and tried again. But now that got an Error 22 too. Bizarre! The Recovery Mode option was still saying hd1,5, so I tried that. Error 22. Jeez. I hit Reset and booted the Ubuntu CD, to try to figure out where we were now. Then I got the idea to check my hard disks. I had seen some references to differences between the old ATA (or PATA) and the newer SATA drives, and now I was confirming that one of my three drives in this machine was a PATA. Could it be confusing the issue? The system was treating the 320GB PATA as sda. I didn't intend to keep all three drives in the system; I just had not gotten around to rearranging things. So now was my opportunity. I used GParted to make partitions on the unused SATA drive and, after some fiddling around, was able to move my NTFS partition data from the PATA drive over to that SATA drive. I shut down the computer, unplugged the PATA drive, and also switched the cables for the two SATA drives, so that the Windows drive would hopefully now be the first drive in the system. 
I also took advantage of the opportunity to test Acronis True Image as a restorer of an Ubuntu installation. That is, I inserted the Acronis CD and tried restoring my most recent Acronis True Image .tib backup, to see how well it was working. It was only a few days old, but it would predate the messing around I had lately done with GRUB, so possibly it would get me back closer to a clean slate in that regard. In restoring, I decided not to restore MBR and Track 0; I wasn't sure where those would go or what they would overwrite, and anyway I had now relocated the Ubuntu partition, so I would doubtless have to be editing that stuff. I just restored the Linux partition to the new location. Incidentally, I also made all the new partitions on the second SATA drive to be primary partitions, as were most or all of the partitions on the first SATA drive. I was pleased, by the way, that Acronis seemed willing to try to restore the Ubuntu partition even though the backup had been a 21GB partition and the new one was only 12GB. It seemed to be aware, or at least willing to try, to see if it needed more than 12GB to do the restoration. (It didn't; I had less than 7GB worth of files on the Ubuntu partition.) When it was done, I took out the Acronis CD and rebooted. I got "Error loading operating system." This hadn't happened, the last time I had restored Ubuntu with Acronis. That time, I had gotten a GRUB menu. So I thought what I would do first, this time, would be to boot with my Super Grub Disk (SGD), and see if that would give me back a GRUB menu. There, I went into Advanced > GRUB > Restore GRUB in Hard Disk (MBR) > Automatically Install. It said, "Done. SGD has succeeded!" Now what? There weren't any options. I hit Enter and then kept hitting the Go Back option until I got to Quit > Reboot P.C. I did that and removed the CD. What do you know: it worked. I had my GRUB menu. Now I chose a regular Ubuntu boot. "Error 22: No such partition." OK, how about Windows? 
The UNetbootin option was still there -- I hadn't uninstalled the Automatic SGD yet -- but by the time I was done typing these notes to keep up with it, it had already decided something and reverted back to the GRUB menu. I tried Windows again, but this time I got "Error 13: Invalid or unsupported executable format." Now what in the world was that all about? I needed to get a grip on what should be happening. I punched Reset, booted the Ubuntu CD, and went into Partition Editor. No surprises there: the Windows partition was now sda1 and the Ubuntu program partition was sdb1. As root, I went to menu.lst and changed the Windows root line to (hd0,0), and commented out the map lines in that section (because now Windows was in the primo position, drive zero partition zero, as God intended), and I changed the Ubuntu root lines to hd1,0. I rebooted without the Ubuntu CD. This time, selecting Ubuntu at the GRUB menu gave me "Error 17: Cannot mount selected partition." Windows was still hung up at Error 13. Gee, a whole new category of error messages to screw things up. OK, what did the authorities say? One source said that, here at the GRUB menu, I needed to press "c" to get a command line, and then type "find /vmlinuz". It said "(hd0,0)." The source said that this was supposed to be the root line for the Ubuntu sections in my menu.lst. So I hit Esc to get out of the grub> prompt (for some reason, "quit" didn't work here) and then "e" to edit the normal Ubuntu boot line in GRUB. And yes, that booted. So I went into menu.lst and changed that for all three Ubuntu lines. Now, rebooting back to GRUB, I tried the Windows option. Still Error 13. My Google search for this one led to a bunch of very confused people who didn't seem to be getting anywhere. I tried commenting out the makeactive line in GRUB for the Windows boot option, but that just provoked a reboot. One of them suggested that you could use Super Grub Disk (SGD) to do a FIXMBR. That was new to me, so I tried it. 
I chose SGD's Windows > Fix Boot of Windows option. Working through the steps on this introduced me to new information. Apparently "IDE Linux" would refer to a drive as hda or hdb, while "SCSI Linux" would call it sda or sdb, and the "natural" GRUB notation would call it hd0 or hd1. Anyway, after I indicated sda or hd0 or whatever you want to call it, and ran the procedure on that, I had to find my way back to SGD's Windows Basic options. There, I selected "Boot Windows." As the commands flashed by, I noticed that it ran "rootnoverify" and some other stuff and then put me back at the GRUB menu with just two options: Microsoft Windows XP Professional or UNetbootin. I tried the Windows option. Windows booted! Alright. We were getting somewhere. I felt I could probably fix any remaining Ubuntu GRUB problems now. But as I was typing these words, I noticed that time was passing and, you know, Windows was not actually completing the bootup process. It was just sitting there on that black screen with the Windows XP logo staring at me and the little progress bar rolling along. Then, after maybe five minutes, that went away and I just had a black screen. This was not really what I had intended. And there it stayed. Black forevermore. I punched Reset. This put me back at the SGD, which I had forgotten to take out of the CD drive. Reset again, sans CD. I got the normal GRUB menu and, when I chose Windows, I got the normal Error 13. I chose the normal Ubuntu and it booted. I inserted the Windows XP CD, rebooted from the CD, and chose Recovery Console. There, I ran FIXMBR and FIXBOOT and rebooted. Back at GRUB, I chose Windows again. I still got Error 13! One thread seemed to say that FIXMBR and FIXBOOT would have solved the problem, if the problem had been a corrupted NTLDR, whatever that was. So we knew, now, that that was not it. They also said I could type "geometry (hd0)" on the GRUB command line ("c") to get some information. They were right. 
I got an indication that the filesystem type for partition 0 was ext2fs, which was apparently not the same as ext3. Had GParted screwed this up? Had I accidentally indicated ext2 when I meant ext3? Or was GRUB reporting it wrong? A mystery. But, wait, why was hd0 a Linux partition at all? That was supposed to be where Windows was. I typed "geometry (hd1)." Just two partitions, filesystem type unknown. But I knew I had just two partitions on the Windows drive. Geometry reported the partition type was 0x7, which appeared to be shorthand for NTFS. One possible answer came from the advice to basically put the Windows section of menu.lst back the way it was. I booted into Ubuntu and went to menu.lst. There, I made it read as follows:
title Microsoft Windows XP Professional
root (hd1,0)
map (hd1) (hd0)
map (hd0) (hd1)
#savedefault
makeactive
chainloader +1
This worked. It seemed that the GRUB "geometry" command gave me more accurate information, for purposes of knowing how to edit GRUB's menu.lst, than I was able to get from GParted or whatever other sources I had been using. I tried rebooting with this setup. Selecting Windows at the GRUB menu gave me the option to restart Windows normally or choose Safe Mode. This was promising. And yes! The sucker went right on into WinXP. It gave me the option to uninstall UNetbootin, but I didn't take that option yet. First, did Ubuntu still boot at GRUB? I rebooted to find out. Yes, it did. Back to GRUB; back to Windows; uninstall UNetbootin. Problem solved. Morals of the story: use SGD and especially ASGD; use GRUB's geometry command for information; simplify the hard disk setup if possible. With this taken care of, it seemed I could get back to what I had been trying to do before this GRUB problem came up.
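Since so much of this came down to eyeballing the root lines in menu.lst and comparing them against what GRUB's find and geometry commands reported, here is a small Python sketch that lists the active (uncommented) root lines per menu entry. The function name active_root_lines is made up for this post, the Ubuntu entry in the sample text is a hypothetical stand-in, and the parsing is deliberately naive: it assumes a keyword and its argument are separated by whitespace.

```python
def active_root_lines(menu_lst_text):
    """Return (title, root_device) pairs for the uncommented root lines in a menu.lst."""
    entries = []
    title = None
    for raw in menu_lst_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        argument = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "title":
            title = argument
        elif keyword in ("root", "rootnoverify") and title is not None:
            entries.append((title, argument))
    return entries


# Sample text: the Windows entry is the one shown above; the Ubuntu entry
# is a hypothetical stand-in, not copied from my real file.
SAMPLE_MENU = """\
title Ubuntu (hypothetical entry)
root (hd0,0)
# root (hd1,5)

title Microsoft Windows XP Professional
root (hd1,0)
map (hd1) (hd0)
map (hd0) (hd1)
#savedefault
makeactive
chainloader +1
"""

for entry_title, device in active_root_lines(SAMPLE_MENU):
    print(entry_title, "->", device)
```

To use it on a real system, one would read /boot/grub/menu.lst into a string and pass that in, then check each reported device against the answer from "find /boot/grub/stage1" or "geometry" at the grub> prompt.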