
Tuesday, April 17, 2012

Windows 7: Missing System Restore Points

I was using Windows 7 with System Restore (Start > Run > SystemPropertiesProtection.exe) turned on for drive C only.  It was set to allocate 16GB of the drive to store System Restore points.  Yet it was storing only two such points.  Clicking "Show more restore points" did not increase the number. 

By way of background, I was not running a dual-boot system and had been making daily backups.  Recently, I had been making those backups manually by running "start "" SystemPropertiesProtection.exe" in a daily batch file, so that I could watch the situation.  (The empty internal quotes ("") supplied a blank window-title argument for the start command, which otherwise treats the first quoted string as a title.  It probably wasn't necessary in this case; it seemed to be helpful sometimes.)  Previously, I had been using a VBS script triggered by a Task Scheduler entry.  I had also not rebooted recently.  According to Moo0, at this particular time my system had been up for nearly three days, but my only restore points were from within approximately the past 24 hours.
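
To illustrate, here is a minimal sketch of the kind of daily batch file line I mean.  The file name is hypothetical, and the rest of the batch file's backup steps are omitted:

@echo off
rem daily-backup.bat (illustrative name) -- other daily backup steps would go here
rem The empty quotes give start a blank window title, so a quoted program path is not mistaken for one.
start "" SystemPropertiesProtection.exe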

This problem seemed to have many possible causes.  Glancing through a list of potential fixes provided on a website oriented toward Windows XP, I saw a reference to Event Viewer (Start > Run > eventvwr.msc).  I was not very familiar with that, so I ran a search and found a suggestion to focus on VSS entries.  VSS stood for Volume Shadow Copy or Volume Snapshot Service.  The basic idea was that VSS would allow Windows to make a copy of a file even if that file was in use, which would be the situation for some Windows 7 system files while the system was running.  Apparently System Restore used VSS.  The suggestion was, in other words, to see what Event Viewer would say about VSS-related problems.  To do that, I went into Event Viewer > Filter Current Log > Event Sources > VSS > OK.  This gave me a list of items.  (Later -- see below -- I saw that I had taken incomplete notes here.)  I clicked on the Date and Time column heading to get the most recent ones.  These were all Information-level notices.  They were all the same:  "The VSS service is shutting down due to idle timeout."  This did not seem relevant to my problem.  VSS was running.  I was getting System Restore points.  They were just being deleted prematurely for some reason.

Another suggestion was to check Services (Start > Run > services.msc) to verify these settings:  Volume Shadow Copy = manual or automatic; Task Scheduler = automatic; Windows Backup = manual or automatic.  Again, these suggestions seemed irrelevant, since the restore points were being made.  In any case, they were set correctly on my system.  Along with those suggestions, though, was a worthier one:  go back into Event Viewer and look for System Restore entries with Event ID numbers 8194, 8195, 8196, or 8198.  Of these, the ones that seemed most likely to be relevant were 8195 (System Restore Deactivated) and 8198 (Restore Point Deleted).  A search focusing on 8198 did not turn up anything immediately obvious, so I shelved it for the moment.  In Event Viewer, I sorted by clicking on the Event ID column.  It took a while to sort.  It showed no errors anywhere near 8194 to 8198.
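
I did all of this through the Event Viewer interface, but for the record, the same filter could apparently be expressed from a command prompt with wevtutil.  A rough, untested sketch of a query against the Application log for those event IDs:

wevtutil qe Application /q:"*[System[(EventID=8194 or EventID=8195 or EventID=8196 or EventID=8198)]]" /f:text /rd:true /c:20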

A seemingly related suggestion was to run Task Scheduler (Start > Run > taskschd.msc) > Task Scheduler Library (left pane) > Microsoft > Windows > System Restore > select an item in the top pane > History tab (middle pane).  There didn't seem to be a relevant item in the top pane, but I just clicked on something.  I saw that the History tab said "History (disabled)."  A search yielded the suggestion that I go into Task Scheduler's right pane > Enable All Tasks History.

One obvious suggestion was to make sure I had ample space on my drive for the System Restore points.  In Windows Explorer > right-click > Properties > General tab, drive C showed only 2.1GB free on an 80GB partition.  I was not sure whether that included what I had already allocated for System Restore points.  To figure this out, I wanted to see the size of the System Volume Information folder, which was where Windows 7 stored those points.  In Windows Explorer > right-click > Properties, System Volume Information was shown as having size zero.  To change that, I followed the suggestion to go into Properties > Security tab > Edit > Add > Administrators > Check Names > OK > Full Control > OK.  This gave me an error:

Error Applying Security

An error occurred while applying security information to:

C:\System Volume Information\WindowsImageBackup

Access is denied.
I got recurring messages like that.  At some point, I clicked Cancel.  I got a warning that I should Continue to avoid inconsistencies, but there was no option to Continue.  The box was now showing Full Control despite the error message.  It now reported that the System Volume Information folder had a size of 6.45 GB.  That roughly agreed with System Restore, which was reporting that my Current Usage was 6.10 GB.  I had made one or two more restore points; System Restore said I now had four.  I reduced the size allocated to System Restore from 16GB to 12GB and took another look at the Properties for drive C.  It still reported 2.1GB free.  So apparently the space allocated for System Restore was not marked as unavailable until it was actually used; the allocation was just a ceiling for System Restore, telling it when to start jettisoning old restore points.  As long as I remembered to exclude the System Volume Information folder from drive image backups (so as to prevent those backups from being unnecessarily inflated), there did not seem to be any particular drawback to allocating large amounts of space, just in case I would want to have access to historical backup points.  As a side note, it occurred to me that it might be possible to archive such points.  I could have experimented with making a ZIP copy of the contents of System Volume Information, to see if System Restore would work from a restored ZIP.
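
As a further side note, the usage and allocation figures could apparently also be checked from an administrator command prompt with vssadmin.  A rough sketch, with the resize line reflecting the 12GB ceiling mentioned above (I actually made the change through the System Protection dialog):

vssadmin list shadowstorage
vssadmin resize shadowstorage /For=C: /On=C: /MaxSize=12GB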

The next day, now that I had enlarged drive C, it showed plenty of space.  System Volume Information was back to showing zero bytes, so I repeated the steps above.  This time I went all the way through, clicking Continue until it stopped asking.  It only asked a half-dozen times.  Properties was still showing only about 6.6 GB allocated.  I still had only those restore points that had been made in the last 24 hours.  Disk space was not the issue.

It seemed that I must have overlooked something in Event Viewer.  I couldn't tell, from these notes, which part of Event Viewer I had examined the previous day.  Evidently I had selected something in the left pane in order to get the Filter Current Log option.  Maybe I had selected the wrong thing.  I saw, now, that the original suggestion was to click Windows Logs > Application in the left pane.  Filtering there for VSS still produced no errors numbered around 8194 to 8198.  Instead of filtering for VSS, I went to the top of the list and clicked "All Event Sources."  That produced nothing -- maybe it was still calculating -- so I changed the top line, "Logged," to "Last 7 days."  Sorting by the Event ID column, I saw an 8193 item (created restore point) but no indications that any restore points had been deleted.

I had rebooted in order to change the size of the partition.  Maybe restore points were being deleted during reboots?  I didn't plan to reboot during the next day or more, so I decided to let it sit another day and then take another look.  This time, I had the same restore points, now going back two days, plus some new ones.

Over the next several days, my restore points accumulated.  Apparently I had fixed something.  Rebooting may or may not have been removing them previously; if so, that was no longer happening.  At the time when I was writing these words, the system had been up for only about one day, but the available restore points stretched back almost a week.

So now I had an opportunity to distill a lesson from these remarks, because my secondary computer was doing the same thing.  It was keeping restore points going back only a day or two.  So what had I done to fix the problem?  I wasn't sure.  I went back through some of the foregoing steps.  First, I closed System Restore.  I went into System Volume Information > right-click > Properties > Security tab > Edit > Add > Administrators > Check Names > OK > Full Control > OK.  I clicked Continue through a half-dozen error messages.  I went back into System Restore (i.e., Start > Run > SystemPropertiesProtection.exe) > select drive C > Configure > reserve 10GB > OK.  It had been 6GB before; it was possible (but seemed unlikely) that that was the problem.
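
Incidentally, the permission change I was clicking through could presumably also be made from an elevated command prompt with icacls.  A sketch of the idea, not something I actually ran:

icacls "C:\System Volume Information" /grant Administrators:F /t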

I let it go for a few days.  The problem was solved.  I was now getting restore points going back several days and surviving reboots.

Friday, March 30, 2012

A Backup Arrangement with Beyond Compare

I had been using Beyond Compare (BC) for a year or two.  Over that period, I had settled into what seemed like a decent backup arrangement.  This post describes that arrangement.

For a while, I had a spare internal partition to which I would make backups.  The original concept there was that I would use rsync or some other program to make backups on an hourly basis.  That setup had fallen into disrepair, mostly because I didn't quite like how it was working.  So I didn't have an hourly backup at this point.  The arrangement described here is more on the longer-term (e.g., daily, weekly, monthly) level.

My backup took place on external drives.  I had an external enclosure that I would have to open up (removing several screws) to swap drives, and I also had an inexpensive dock that I could just plug an internal SATA drive into.  Both seemed to work equally well.  The enclosure was handy for unplugging the drive and taking it with me.  Now and then -- especially when the tornado alarms sounded -- I visualized myself grabbing it and running for the basement.  I wondered if that would be one of those fateful delays that would cost me my life.

The external enclosure had an eSATA connector, but my previous motherboard had not been able to accommodate eSATA on a hot-swappable basis.  In other words, I had to reboot in order to get the system to recognize it.  It also had a USB connector.  The external dock (i.e., not the enclosure) was also a USB device.  USB was slower but very adaptable.  That was almost always what I used.  Some partitions on the external drive were compressed, to save space.  I had the impression that the compression did not help with the speed of the USB connection -- that the computer's CPU handled the compressing and uncompressing, so I wasn't sure the amount of data moving along the slow USB cable was actually reduced -- but I hadn't verified that.

For my purposes, Beyond Compare offered two key concepts.  First was the workspace.  If I plugged in the external drive that I used for daily backups, then I would open up the DAILY workspace in BC.  If I plugged in a drive that I used for weekly backups, then I would choose the WEEKLY workspace.  I also had a SIMPLE COMPARE workspace that I would use for random tasks -- say, comparing two folders on a one-shot basis.  And I had a NETBOOK workspace that I would use to synchronize my laptop.  GoodSync might have been better for that if I had been using the laptop frequently, but at this point it was mostly a case of keeping the data on the laptop current with the desktop.  That is, I was mostly doing one-way updates, from desktop to laptop.

My workspaces differed in the tabs they made available.  In the DAILY workspace, I had a tab for each day of the week, plus whatever other comparisons I would want BC to make on a daily basis.  Likewise for the WEEKLY and the other workspaces.  In other words, I used a workspace as a place where I would be able to see tabs for each comparison that I wanted BC to make, whenever I plugged in the weekly drive or the laptop or whatever.

I found that the best approach was to start BC first, let the workspace load, and only then turn on or connect the external USB drive.  That way, BC would not try to do complete comparisons for all of the open tabs.  It would do its calculations for the relevant folders on the drives inside the computer, which were already available to it, but on the external drive it would have to wait until I gave it the go-ahead within a particular tab.

Focusing on the DAILY workspace, I was writing these notes on a Friday.  So to guide my remarks, I opened BC at this point.  Somehow, I had arranged for the DAILY workspace to come up by default; or maybe BC just defaulted to the last open workspace.  I wasn't sure how I had arranged that.  When BC was up and running, I turned on the USB drive.  It took that drive a moment to become available.  (I found that AntiRun was useful, not only for protecting my system from autorun malware and such, but also for telling me when a drive really was online or offline, and for giving me a functional way of taking external drives offline.)

I went to the Friday tab.  BC had stalled because the Friday folder on the external drive had been unavailable.  I told it to retry; and now that the USB drive was connected, BC ran its comparison.  (Details on the kinds of comparisons available, and other program capabilities, are available at Beyond Compare's website.  Their forums and other tech support had been very responsive, the few times I had contacted them.)

I had modified my BC toolbar to present the red Mirror button.  This said, basically, just overwrite whatever is in the backup space (in this case, the Friday folder on the USB drive) with whatever is on drive D in the computer.  Drive D was the one that I backed up daily.  So in this case, a number of files had changed since the previous Friday.  Sometimes I would take a look at them; sometimes not.  Usually not.  It seemed pretty rare that a file would be accidentally deleted.  Daily examination of all changing files had seemed to be overkill.

When I say that I would take a look, I mean that BC showed me two panes, one for each of the folders being compared.  To keep things organized, the left-hand pane was almost always the authoritative one.  The left-hand pane would correspond, that is, to a partition inside the computer.  So I was looking at the right-hand pane, corresponding to the backup device.  If I saw a file listed in the right-hand pane, but not in the left-hand pane, that would mean that it was on the system when I made my backup a week ago, but now it was no longer on the system.  BC would also alert me, with a red font, if the file in the right-hand pane was newer.  Generally speaking, that wasn't supposed to happen.

I had an alternating weekly folder on this backup drive.  I used that one on Saturdays.  That's the one I examined more closely.  If I found that something was missing on Saturday, and I didn't think it should be missing, I could then click on the tabs for the other days of the week until I found the last backed-up version, and I could restore it from there.

Drive D contained the things that were in more active use.  I also had a separate partition, drive E, for things that took up a lot of space and didn't change very often.  Videos were the main example.  Because there were so few changes there, it was easier to look at the differences identified in BC, and verify that additions and deletions were desired.

In net terms, I liked this arrangement because it gave me some flexibility to combine automated and manual processes.  I wasn't vulnerable to one of those black-box backup solutions that would seem like they were working just fine until the moment of crisis, when I would painfully discover that I had failed to adjust some essential setting, or that the drive was malfunctioning, or whatever.

In this arrangement, if I was worried that files were missing, I could look down through lists of what was being added and deleted.  If I was confident that everything was fine, I could just click the Mirror button and the backup would happen.  I could also combine both approaches within a single tab, by telling BC to mirror only the selected folder(s).  This would gradually reduce the number of things remaining on the screen (assuming I had BC set, as usual, to Show Differences rather than Show All).  When confronted with what looked like a mess, I could thus eliminate the parts that seemed OK, and focus on the files and folders that didn't seem like they should have been getting added or deleted.

Like most other computer-related matters, my backup approach continued to evolve.  But as I say, I had been using BC for a while, at this point, and I was pretty much satisfied with the combination outlined here.

Saturday, May 21, 2011

Windows 7: EUBKMON.SYS Error: Driver Unloaded Without Cancelling Pending Operations

I was using Windows 7 Home Premium.  I tried to reboot into Safe Mode.  I got an error message:

A problem has been detected and Windows has been shut down to prevent damage to your computer.
EUBKMON.SYS
DRIVER_UNLOADED_WITHOUT_CANCELLING_PENDING_OPERATIONS
I did a search and got the advice to put the driver (in this case, EUBKMON.SYS) out of action.  The mission was to rename it to be EUBKMON.OLD, so as to keep the file (just in case) but to prevent it from being used.  How to achieve that mission?  The advice in that case involved Windows XP, so they were recommending using the installation CD to get to a recovery prompt and rename it that way.  Another possibility would be to boot Ubuntu or some other Linux variant, or perhaps something like BartPE, and use that to rename EUBKMON.SYS.  Since the machine was willing to boot into Windows 7 Normal Mode, I started with that.  I found EUBKMON.SYS in C:\Windows\System32\drivers.  I was able to rename it in Windows Explorer.  I was not sure whether I would have been able to do so if I had not previously taken ownership of that folder.  While I was there, having just rebooted the computer, I got a dialog:
Windows has recovered from an unexpected shutdown.
Windows can check online for a solution to the problem.
I went with that, but after I clicked on it, it disappeared.  Maybe it found and installed a solution; not sure.  I also went into Control Panel > Windows Updates.  There was only one update, an optional one for Microsoft Security Essentials.  I installed that.  I also ran Glary Registry Repair.  Then I rebooted into Normal Mode, just to see what would happen with EUBKMON.SYS.  After I told Windows to reboot, I noticed that the shutdown screen said "Waiting for EuWatch.  A backup schedule is running!"  That "EuWatch" part got my attention:  it seemed potentially related to EUBKMON.SYS, and the "backup schedule" note reminded me that I had just installed Backup Maker and then had uninstalled it and installed Easeus Todo Backup in its place.  The hard drive was spinning, so I let the thing run; apparently Easeus was in the process of doing a backup before shutdown, for some reason.  A sourceDaddy webpage said that a message like my DRIVER_UNLOADED message (above) could be due to a faulty driver.  While I was waiting, I ran a search for EUBKMON.SYS but got no insight.  The search turned up only eight hits, so it seemed this driver was not a part of Windows 7 itself, adding to the sense that perhaps the problem was caused by one of those two backup programs.  By this time, the computer was prepared to reboot.  It ended up at a black and white screen:
Windows Error Recovery
Windows failed to start.  A recent hardware or software change might be the cause.
It wanted to Launch Startup Repair, so I went with that, but it wanted a Windows installation disc, and I didn't have one.  (I was doing this on an ASUS Eee PC, with Win7 factory-installed.)  The error message status was 0xc000000e and the "Info" statement was, "The boot selection failed because a required device is inaccessible."  I tried Ctrl-Alt-Del > Start Windows Normally.  But that failed.  I was back at Windows Error Recovery, except the top line was now "Windows Boot Manager."  I ran a search and decided to try Safe Mode (hitting Ctrl-Alt-Del to reboot, and then immediately and repeatedly pressing F8).  That didn't work either, but this time it just went back to Windows Error Recovery; no BSOD.  I wondered if the inability to get back into Win7 was due to the renaming of EUBKMON.SYS to be EUBKMON.OLD.  I thought about re-renaming EUBKMON.OLD to be EUBKMON.SYS again; but if I had a bad driver, that wouldn't solve the problem.

I thought maybe I should copy EUBKMON.SYS over from another computer where I had also installed Easeus Todo Backup.  First, though, I thought I'd better just test whether that was the issue.  To do this, I would need a bootable USB drive of some sort.  I have addressed that issue in another post.  Basically, I used XBoot to boot Ubuntu from the USB drive, and then went into C:\Windows\System32\drivers and renamed EUBKMON.OLD back to EUBKMON.SYS.  I shut down Ubuntu, yanked out the USB drive, and tried to boot Windows -- with, of course, the need to adjust the BIOS settings first.  Sure enough, Windows now booted.  The EUBKMON.SYS file was the whole issue.  Instead of replacing it with a working one from the desktop computer, I decided I didn't really like the idea of having my whole system rendered nonworking because one file for one backup utility was not quite right.  There were other backup alternatives. 
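
As a side note, for anyone repeating the Ubuntu rename step from a terminal rather than the file manager, it amounts to something like the following; the mount point shown is only an example, since the live session will mount the Windows partition under whatever name it chooses:

sudo mv /media/WINDOWS/Windows/System32/drivers/EUBKMON.OLD /media/WINDOWS/Windows/System32/drivers/EUBKMON.SYS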

So I uninstalled Easeus Todo Backup.  It wouldn't uninstall easily from Programs and Features; I had to hit Ctrl-Alt-Del, go into Task Manager > Processes tab, and kill EuWatch.exe.  But no, that wasn't enough:  I got the same message and had to go back into Task Manager, where a careful look at the Description column showed me that I would also have to kill Agent.exe, TbService.exe, and TrayNotify.exe.  So now I was able to uninstall Easeus Todo Backup.  I tried to send them feedback on why I was doing that, but their feedback agent required email to be set up on this computer, and Windows Live Mail 2011 was taking a long time to start.  Eventually I just scrapped that.  Anyway, with Easeus uninstalled, after a reboot, I took another look at C:\Windows\System32\drivers.  EUBKMON.SYS was still there.  I renamed it to EUBKMON.OLD again and tried rebooting.  The system started.  So without Easeus Todo Backup, EUBKMON.SYS was apparently no longer essential, but the Easeus uninstallation process did not remove EUBKMON.SYS or the other EU*.* files in C:\Windows\System32\drivers (i.e., eubakup.sys, eudisk.sys, eudskacs.sys, and eufs.sys).  Possibly this situation would have been different if I had used Revo Uninstaller.  I wasn't sure if I could safely delete those other files, so I left them.

So now, with all that sorted out, I tried again to boot into Safe Mode.  It worked.  Problem solved.

Thursday, April 21, 2011

A Two-Computers-Per-User Desktop Arrangement

I was spending a lot of time at my desk, doing word processing and other typical desktop work.  For this purpose I was using a customized Windows 7 installation on two networked computers for maximum productivity.  This post describes that setup.

I had previously thought that, ideally, I would have four computers:  one laptop; one test machine to hook up the occasional hard drive or other component for wiping, testing, etc.; and two desktop machines running side-by-side.  Since then, however, I had switched from Ubuntu back to Windows and had found this to be a good move.  So now I was doing very little testing and tinkering with hardware.  Therefore, I dismantled and sold the parts from the fourth computer.

With almost all of my work happening on just two computers, and with a stable Win7 installation on each, the focus now was on getting the most out of them.  I was using two desktop computers instead of one because there were still many occasions when a computer would experience downtime.  I would be doing drive maintenance or imaging, or would have to reboot Windows now and then to clear its head or to complete a program installation or upgrade, or Win7 would be running just fine but some scan or other process would effectively monopolize the machine.  I was not yet very impressed with multiple-desktop software, and I was considering a return to VMware or some other virtual machine software, perhaps in a virtual appliance, though I wasn't sure I wanted to revisit the performance problems that had previously prompted me to experiment with a native and/or bootable virtual hard disk or RAID array.  So the second computer was also useful as a simple way of having a pretty solid alternate desktop.  I could start up a project or leave a set of folders open there and just visit it occasionally, when the primary computer was doing its own maintenance or was otherwise unavailable for a while.

The starting point for this two-computer arrangement was to set up two machines that were almost identical in terms of hardware and software.  In previous years, I had thought it was best to have dissimilar machines, so as to maximize resources.  One machine or the other would have the right hardware or software to deal with almost any kind of system problem.  That belief was probably justified for some purposes.  Now, however, I was less patient with that, and it also seemed less necessary.  A lot of the old problems had gone away.  Meanwhile, it was much easier to learn how to maintain and troubleshoot just one set of problems, rather than have to learn the whys and wherefores of divergent sets of hardware and software.  For purposes of getting my work done, Windows 7 was a significant improvement over operating systems I had used previously, including Windows XP and Ubuntu 10.10, in terms of networking and other capabilities.

The customized Win7 installation (see link above) was not as easy as a canned, plain-vanilla installation, but once I had it set up, it had some advantages.  One important step was to make my work files available on both computers.  My first attempt in this regard was to use a Synology network-attached storage (NAS) unit as a simplified file server, but that hadn't worked so well for me.  In the second attempt, I used my home network (basically, just a router and cables to the two computers, though possibly a crossover cable would have sufficed even without the router).  After some contemplation, I went with GoodSync to keep the two computers directly synchronized with one another.  This was an important development.  When combined with appropriate program settings (e.g., setting Microsoft Word to AutoRecover files every minute), it meant that, if the computer I was working on suddenly crashed or otherwise became unavailable, I could usually switch over to the other machine and pick up right where I left off.

I used GoodSync to synchronize my data partition (drive D), not the program partition (drive C).  I also used it to synchronize parts of the INSTALL partition, including particularly the funky but advantageous shared Start menu.  GoodSync did not need to be running on both computers, so I installed it on computer A.  As the installation evolved, I found that computer A was handling most of my computer maintenance and other functions, while I did more of my moment-by-moment productivity stuff on computer B.  In particular, computer A was becoming my backup hub.  I would save a file on computer B; GoodSync would copy it to computer A; and then my backup software would copy it to other drives.  After a variety of unpleasant backup surprises, I had evolved to two distinct backup systems running on computer A.  In the first backup system, I was using Robocopy, as part of my customized installation (above), to make frequent, incremental backups to a separate partition on computer A.  This was one of the few regards in which computer A differed from computer B in terms of hardware:  it had three hard drives rather than two, so as to speed this internal copying (since it was faster to copy from one hard drive to another, rather than between partitions on the same drive) and make it safer (since a failure of one drive would usually not affect the other).  In the second backup system, I was using Beyond Compare to do daily manual backups to an external drive that I could carry or store offsite as needed.  These were manual in the sense that I had to click things to make them happen, and could therefore examine or at least spot-check what was going to be changed, if I wanted to.
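
I have not reproduced the actual Robocopy command in these notes, but the frequent internal backup was of the general form sketched below; the drive letters and log path here are just placeholders:

robocopy D:\ X:\BACKUP-D /MIR /R:1 /W:1 /XD "System Volume Information" "$RECYCLE.BIN" /LOG:X:\robocopy-D.log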

Again, I could still use either computer to do my work, since they both had the same synchronized files and nearly identical software installations.  Nonetheless, as the functions of the two computers diverged, I found that I was not really utilizing both monitors most of the time.  On computer B, I tended to be opening PDFs, Word docs, Excel spreadsheets, Windows Explorer sessions, and webpages, among which I would copy text, links, and other materials.  I could open some of that stuff on computer A instead, but it was cumbersome to have this happening on two different computers, and for the most part it actually was not happening on computer A.  That computer, and its associated monitor, were mostly just sitting there, working up a file comparison in Beyond Compare or otherwise doing things that did not really need to be watched constantly.

What I really wanted was to make monitor A available for computer A, when I wanted to see what was happening on computer A, but to have monitor A also available for computer B, when I was doing my ongoing work on computer B.  This called for a keyboard-video-mouse (KVM) switch.  The PS/2 type of KVM was better for purposes of providing keyboard and mouse input during BIOS setup and in programs that would boot from a CD (e.g., Acronis Drive Image) and would therefore be at least partly unresponsive to a USB mouse and/or keyboard.  Unfortunately, I did not realize that the type of motherboard I had installed in both computers did not have two PS/2 ports, so I had to use a USB KVM.  It also seemed that I might have to spring for a more expensive DVI-compatible KVM, since I'd gotten some poor video performance when I had connected the monitor to the computer using the older D-Sub rather than the newer DVI kind of cable.  In recent months I had been using the KVM only for the keyboard, while leaving each monitor dedicated to one computer and experimenting with having a separate mouse for each computer, so that I could click without having to transfer keyboard (and, optionally, monitor) focus between computers.  It had lately occurred to me, though, that the D-Sub video quality problems might just be due to the quality of the video circuits on the motherboard.  So at this point I was planning to get a dedicated video card for each computer and see whether its D-Sub connection would work acceptably, in which case I could use the USB/D-Sub KVM for the keyboard and for D-Sub video with monitor A.  In other words, monitor B would continue to be dedicated to computer B, but monitor A would run to the KVM and could thus toggle between computers A and B.

This left the problem that, as I had discovered, when I was not seeing events on computer A, I tended not to use that computer.  That was not terrible -- it would still be there as a running backup, ready to jump into service when I needed it, unless it hibernated itself in the meantime -- but experience suggested that, if I could not just glance to see what was happening on computer A, I would tend not to toggle over there on the KVM and take a peek.  I thought of two solutions to this.  One was to set up a reminder that would prompt me, every hour or two, to interrupt what I was doing on computer B, toggle the KVM, and look at events on computer A as displayed on monitor A.  I suspected I might tend to disregard that kind of reminder, but I decided to give it a try.  An alternative was to get a small, dedicated monitor that would just always be displaying events on computer A, though I realized its tiny resolution would not do a good job of displaying all the stuff that would tend to appear on my widescreen monitor A.  It looked like I could get a monochrome 10-inch Miracle Business MT209A CRT on eBay for $25 including shipping, but I didn't want the clutter or the extra power consumption.  What seemed like a more practical option was simply to dedicate a full-sized monitor to computer A.

That's where this matter rested for the time being.

Saturday, January 22, 2011

Windows 7: RAID or Mirror Across Computers?

Where to put the data ... hmm.  I had a home network with two computers running Windows 7.  If the data I needed to work with was on one computer and it went down or had one of those frequent Microsoft maintenance or service interruption needs, I couldn't get to it from the other computer.  But if I put the data on a server, then (a) I had to buy and maintain the server, cables, routers, etc., (b) I had slower access times, (c) the data would then be unavailable to *both* machines (unless I wanted to swap out one or more hard drives) if the server went down, and (d) I had found that, if I accidentally corrupted or deleted the wrong file, a server might not be willing to undelete it.  Not a big deal, assuming you had good backup, but there were painful exceptions.

So it occurred to me:  can you put the data on one computer, so that it can function as a standalone, and also put the data on the other computer, so that it is a standalone too, but then have a constant RAID or mirror arrangement between the computers, so that whatever you do with the data on one computer is immediately duplicated on the other computer?  That way, you've got local speed, no server, and redundancy during downtime on either machine.  Basically, two-way mirroring:  when a file is modified, it checks the other computer, and the two of them figure out which version is newer, and it overwrites the older version on the other machine as well.  All you need is a router, if that.

I figured possibly everybody else already knew the answer to this.  But since I didn't, I started with a search.  Only six hits.  It looked like the concept of "RAID between computers" was a nonstarter.  Alright, a different search.  Wow, "mirror between computers" produced 13 hits.  But, OK, not to complain, it seemed most of those hits were for TreeNetCopy.  Take it out of the equation, and the search produced only five hits.  So TreeNetCopy seemed to provide the path forward.  But it didn't look like CNET, PCMag, or other big-name sites had reviewed it.  I went to the product's home website and found out why:  it was for systems using Windows NT or Windows 2000.

Apparently mirror and RAID were not the concepts I wanted.  How about incremental backup?  You couldn't have it running constantly; it would have to finish one scan of the system's files before it could start on the next one.  So maybe you'd set it to run every 15 or 30 minutes, or however long a scan would take to finish, across the network connection.  This wouldn't be nearly as good as software that would detect and propagate changes as soon as they were made, but I wasn't seeing how to find anything like that.  With a 15- or 30-minute delay, you couldn't have someone opening the file on computer B as soon as someone else updated it on computer A, unless perhaps you had a script that could run the incremental backup manually for a given folder, maybe via a right-click context menu option.

Alright, a different approach.  I had been using Beyond Compare, a file comparison tool.  It still looked like one of the more capable file comparison tools, so how about using it?  As I was thinking about that possibility, I realized that I didn't like the idea of having to do a right-click or other manual update.  The computer could crash before I got around to that, and then I wouldn't have the current data in the parallel folder on the other computer, and therefore really couldn't just keep right on working where I left off.  I had only used Beyond Compare as a manual comparison tool, where I would start it up and it would run for a while and compare directories and then show me what needed to be mirrored to my backup drive, and then I would click the buttons necessary to do that.  I knew it was possible to write scripts to automate some of this, but it seemed unlikely that scripts would help Beyond Compare remain up-to-the-minute on all of the file changes made on the system.  Most likely, I could set up scripts to run in some frequently used folders, and maybe even to automate the mirroring of those folders, but other folders would be left out in the cold.  Possibly I could have multiple scripts doing comparisons of more- and less-frequently used folders on different schedules, so as to increase the likelihood that most folders would be mirrored relatively often.  But with enough scripts running simultaneous file comparisons, I'd start to take a performance hit.

Lacking a better option, I did a search to learn more about Beyond Compare scripts.  The search came up with a number of interesting concepts, right there among the top ten hits.  One was the concept of automated synchronization.  Duh!  Of course.  Synchronization was the Windows term for what I wanted.  So I did a search for that, dropping Beyond Compare for the moment.  But the only thing that came of it was the discovery of Super Flexible File Synchronizer, which cost $60 for a two-year license (unless, for some bizarre reason, I would think that I could do without the pro version's ability to copy ZIP files!).  It did look like it might have some advantages over Beyond Compare, such as the ability to detect that I had moved a folder, so that it could just repeat the move rather than delete the folder from one location and create it in another (which might involve a lot of copying, if it was a large folder).  It had very good ratings on CNET.  I could download and try it out free for 30 days.  But it was ultimately still a backup program, running on a schedule, not a mirroring program, so I was still basically working with the same scenario:  design a set of backup scripts, profiles, or whatever, and set them to run at different frequencies, backing up what I would consider the most heavily used folders most frequently.

TopTenReviews ranked Super Flexible File Synchronizer eighth in its list of sync programs.  Their comparison page had a number of relevant criteria, including the ability to do bidirectional sync, to mirror files, and to operate across a network.  It actually looked like their number two program, GoodSync ($19.95), had better features for my purposes than their number one choice, Syncables 360 ($39.95).  Their review of GoodSync made it sound good indeed.  They said it couldn't sync or merge Microsoft Outlook files, which was OK because I was using Thunderbird.  (Later, I encountered a review by a user who said s/he was using it for this purpose, so I assumed they had updated the program.)  They said that working over networks could be complicated.  I wasn't sure if they meant that as a generic remark that would be relevant to all kinds of work over networks.  They seemed to rank it number two rather than one because it "lacks some of the advanced features professional users expect."  CNET's review likewise ranked it number two, but in the category of "file management."  I wasn't sure what they considered the number one program; their webpage didn't indicate which criterion they used for that ranking.  But GoodSync was the most frequently downloaded program during the prior week.  GoodSync's awards webpage mostly listed awards and positive reviews that were at least a couple of years old.  So apparently it had been created and was now coasting.  I did a search among its many reviews on CNET, looking for more info about using it on a network.  Unfortunately, CNET's links to specific reviews weren't working for me at that point, so I wasn't able to get details, but what I was able to read from the summary results was positive with the exception of one person for whom GoodSync did not work well.

There were hardware options.  SyncSharp offered a device that would synchronize via USB.  It sounded similar to The Tornado and to the Windows 7 Easy Transfer option.  I didn't want an additional device, and since ethernet was faster and was already in place, I didn't want to use USB.  For purposes of speed and also capacity, not to mention reducing dependency on external data sources, I was obviously not going to be interested in a cloud (i.e., web-based) solution, even if I had found one that offered constant, continuous, real-time synchronization.

It turned out that CNET had another category, for "data transfer & sync software."  As with some other CNET searches, I looked at the top 30 both in terms of downloads last week and user ratings.  Setting aside those that were for special purposes (e.g., Blackberry, Outlook), and focusing on those that were for Windows 7, I found that only two were free:  CopyTo Synchronizer, which had only two votes and which I therefore deemed insufficiently tested, and Microsoft Live Mesh, which had only one vote but which I was willing to assume was better developed.  The nonfree alternatives that came up in this search included Beyond Sync, ViceVersa Pro, Easy Computer Sync, and Syncables 360 Premium.  I reduced this set to Live Mesh, Beyond Sync, and ViceVersa.  A search for further information on Live Mesh suggested it was web-based, more like DropBox, leaving me to focus on the other two.  A search led back to a TopTenReviews comparison -- it may have been the same one as before, but a couple of weeks had passed by this time -- naming GoodSync (above) as No. 2, ViceVersa as No. 5, and Beyond Sync as No. 10.  Of the ten, the ones offering bidirectional sync, network synchronizing, and Windows 7 support were these three plus Syncables 360, SugarSync, Laplink, and Super Flexible.  Most of the same names also appeared in a CEOWorld review.  I eliminated SugarSync as another cloud solution.  The TopTen review for Beyond Sync made it sound unappealing.  A dotTech review echoed that.

I ran a search looking for comparisons of Syncables against the others that also sounded good for bidirectional synchronization.  A couple of reviews alerted me to the feature, evidently present in GoodSync but not all others, of being able to see which changes would overwrite existing files.  I began to get a sense that Syncables was more of a glossy product, designed for people who wanted simple and trouble-free synchronization without necessarily having an option to scrutinize every step of what was happening.  Having been burned by the occasional backup program that would not save (and would also not tell me that it was not saving) files that were of a certain kind, were nested too deep, had an umlauted character in their filenames, or were otherwise secretly exempt from what I thought was happening, I had become more inclined to use transparent software.  At least until I gained a lot of trust and experience with a program, I wanted to see what it was doing.  So this feature of GoodSync appealed to me.  I noticed, also, that a review described Liuxz Sync as being "of most use to users that need to carryout real time synchronizations over a network or between hard drives."  As I continued to look at other opinions, ViceVersa still sounded relatively good too.

On this basis, I decided to start with GoodSync, as described in a separate post.  After some days of using it, and comparing its results against an external backup drive via manual comparisons using Beyond Compare (as described in more detail in that other post), I concluded that GoodSync was a good product for this purpose.  I set its sync rules so as to check most frequently those partitions in which I was most likely to make changes.  For practical purposes, I could change files on one computer and I would see those changes on the other computer when I went looking for them.

In short, I wound up using GoodSync to synchronize files on two computers on a home network.  The files were generally available on the other computer within minutes.  I did this without using a server.  That is, the files were available locally on each computer, so that I could keep right on working if the other one went down.  I arranged backup via external drive, and I occasionally checked that external drive against the internal drive manually using Beyond Compare.  I had better performance than I would have gotten from files stored on a network server, and was not very vulnerable to network problems; presumably I could have set up the same arrangement via crossover cable, without even having a router.  This really felt like a solution that I had been seeking for years.

Thursday, December 31, 2009

Ubuntu 9.04: Backing Up and Copying Webpages and Websites

As described in a previous post, I had been using rsync to make backups of various files.  This strategy was not working so well in the case of webpages and websites, or at least I wasn't finding much guidance that I could understand.  (Incidentally, I had also tried the Windows program HTTrack Website Copier, but had found it to be complicated and frustrating.  It seemed to want either to download the entire Internet or nothing at all.)

The immediate need driving this investigation was that I wanted to know how to back up a blog.  I used the blog on which I am posting this note as my test bed.

Eventually, I discovered that maybe what I needed to use was wget, not rsync.  The wget manual seemed thorough if a bit longwinded and complex, so I tried the Wikipedia entry.  That, and another source, gave me the parts of the command I used first:

wget -r -l1 -np -A.html -N -w5 http://raywoodcockslatest.blogspot.com/search?max-results=1000 --directory-prefix=/media/Partition1/BlogBackup1

The parts of this wget command have the following meanings:

  • -r means that wget should recurse, i.e., it should go through the starting folder and all folders beneath it (e.g., www.website.com/topfolder and also www.website.com/topfolder/sub1 and sub2 and sub3 . . .)
  • -l1 (that's an L-one) means stay at level number one.  That is, don't download linked pages.
  • -np means "no parent" (i.e., stay at this level or below; don't go up to the parent directory)
  • -A.html means Accept only files with this extension (i.e., only .html files)
  • -N is short for Newer (i.e., only download files that are newer than what's already been downloaded).  In other words, it turns on timestamping
  • -w5 means wait five seconds between files.  This is because large downloads can overload the servers you are downloading from, in which case an irritated administrator may penalize you
  • The URL shown in this command is the URL of this very blog, plus the additional information needed to download all of my posts in one html file.  But it didn't work that way.  What I got, with this command, was each of the posts as a separate html file, which is what I preferred anyway
  • --directory-prefix indicates where I want to put the download.  If you don't use this option, everything will go into the folder where wget is running from.  I came across a couple of suggestions on what to do if your path has spaces in it, but I hadn't gotten that far yet

Incidentally, I also ran across another possibility that I didn't intend to use now, but that seemed potentially useful for the future.  Someone asked if there was a way to save each file with a unique name, so that every time  you run the wget script, you get the current state of the webpage.  One answer involved using mktemp.  Also, it seemed potentially useful to know that I could download all of the .jpg files from a webpage by using something like this:  wget -e robots=off -r -l1 --no-parent -A.jpg http://www.server.com/dir/
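
I didn't pursue the mktemp idea, but as I understood it, the gist was something like this (paths illustrative): let mktemp create a uniquely named file on each run, and have wget write the current state of the page into it.

SNAPSHOT=$(mktemp /media/Partition1/BlogBackup1/snapshot-XXXXXX)
wget -q -O "$SNAPSHOT" http://raywoodcockslatest.blogspot.com/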

The first download was pretty good, but I had learned some more things in the meantime, and had some questions, so I decided to try again.  Here's the script I used for my second try:
wget -A.html --level=1 -N -np -p -r -w5 http://raywoodcockslatest.blogspot.com --directory-prefix=/media/Partition1/BlogBackup2

This time, I arranged the options (or at least the short ones) in alphabetical order.  The -p option indicated that images and style sheets would be downloaded too.  I wasn't sure I needed this -- the basic html pages looked pretty good in my download as they were -- but I thought it might be interesting to see how much larger that kind of download would be.  I used a shorter version of the source URL and I designated a different output directory.

I could have added -k (long form:  --convert-links) so that the links among the downloaded html pages would be modified to refer to the other downloaded pages, not to the webpage where I had downloaded them from; but then I decided that the purpose of the download was to give me a backup, not a local copy with full functionality; that is, I wanted the links to work properly when posted as webpages online, not necessarily when backed up on my hard drive.  I used the long form for the "level" option, just to make things clearer.  Likewise, with a bit of learning, I decided against using the -erobots=off option.  There were probably a million other options I could have considered, in the long description of wget in the official manual, but these were the ones that others seemed to mention most.

The results of this second try were mixed.  For one thing, I was getting a lot of messages of this form:

2010-01-01 01:43:03 (137 KB/s) - `/[target directory]/index.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=MONTHLY-1196485200000&toggleopen=MONTHLY-1259643600000' saved [70188]

Removing /[target directory]/index.html?widgetType=BlogArchive&widgetId=BlogArchive1&action=toggle&dir=open&toggle=MONTHLY-1196485200000&toggleopen=MONTHLY-1259643600000 since it should be rejected.

I didn't know what this meant, or why I hadn't gotten these kinds of messages when I ran the first version of the command (above).  It didn't seem likely that the mere rearrangement of options on the wget command line would be responsible.  To find out, I put it out of its misery (i.e., I ran "pkill wget" in a separate Terminal session) and took a closer look.

Things got a little confused at this point.  Blame it on the late hour.  I thought, for a moment, that I had found the answer.  A quick glance at the first forum that came up in response to my search led me to recognize that, of course, my command was contradictory:  it told wget to download style sheets (-p), but it also said that only html files would be accepted (-A.html).  But then, unless I muddled it somehow, it appeared that, in fact, I had not included the -p option after all.  I tried re-running version 2 of the command (above), this time definitely excluding the -p option.  And no, that wasn't it; I still got those same funky messages (above) about removing index.html.  So the -p option was not the culprit.

I tried again.  This time, I reverted to using exactly the command I had used in the first try (above), changing only the output directory.  Oh, and somewhere in this process, I shortened the target URL.  This gave me none of those funky messages.  So it seemed that the order of options on the command line did matter, and that the order used in the first version (above) was superior to that in the second version.  To sum up, then, the command that worked best for me, for purposes of backing up my Blogger.com (blogspot) blog, was this:

wget -r -l1 -np -A.html -N -w5 http://raywoodcockslatest.blogspot.com --directory-prefix=/media/Partition1/BlogBackup1

Since there are other blog hosts out there, I wanted to see if exactly the same approach would work elsewhere.  I also had a WordPress blog.  I tried the first version of the wget command (above), changing only the source URL and target folder, as follows:

wget -r -l1 -np -A.html -N -w5 http://raywoodcock.wordpress.com/ --directory-prefix=/media/Partition1/WordPressBackup

This did not work too well.  The script repeatedly produced messages saying "Last-modified header missing -- time-stamps turned off," so then wget would download the page again.  As far as I could tell from the pages I examined in a search, there was no way around this; apparently WordPress did not maintain time stamps.

The other problem was that it did not download all of the pages.  It would download only one index.html file for each month.  That index.html file would contain an actual post, which was good, but what about all the other posts from that month?  I modified the command to specify the year and month (e.g., http://raywoodcock.wordpress.com/2009/03/).  This worked.  Now the index.html file at the top of the subtree (e.g., http://raywoodcock.wordpress.com/2009/03/index.html) would display all of the posts from that month, and beneath it (in, e.g., .../2009/03/01) there were named subfolders for each post, each containing an index.html file that displayed that particular post.  So at this rate, I would have to write wget lines for each month in which I had posted blog entries.  But then I found that removing the -A.html option solved the problem.  Even so, if I ran it at the year level, it worked only for some months and skipped others.  I tried what appeared to be the suggestion of running it twice at the year level (i.e., at .../wordpress.com/ with an ending slash), with --save-cookies=cookies.txt --load-cookies=cookies.txt --keep-session-cookies.  That didn't seem to make a difference.  So the best I could do with a WordPress blog, at this point, was to enter separate wget commands for each month, like this:

wget -r -l1 -np -N -A.html -w5 http://raywoodcock.wordpress.com/2009/01 --directory-prefix=/media/Partition1/WordPressBackup

I added back the -A.html option, as shown, because it didn't seem to hurt anything; html pages were the only ones that had been downloaded anyway.

Since these monthly commands would re-download everything, I would run the older ones only occasionally, to pick up the infrequent revision of an older post.  I created a bunch of these, for the past and also for some months into the future.  I put the historical ones in a script called backup-hist.sh, which I planned to run only occasionally, and I put the current and future ones into my backup-day.sh, to run daily.
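
In backup-hist.sh, I actually listed those month-by-month commands one per line, but a simple loop would have been equivalent.  A sketch, with the months shown purely as examples:

#!/bin/bash
# backup-hist.sh (sketch) -- one wget pass per month of WordPress posts
for m in 2009/01 2009/02 2009/03; do
  wget -r -l1 -np -N -A.html -w5 "http://raywoodcock.wordpress.com/$m" --directory-prefix=/media/Partition1/WordPressBackup
done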

But, ah, not so fast.  When I tried this on another, unrelated WordPress blog, it did not consistently download all posts for each month.  I also noticed that it duplicated some posts, in the sense that the higher-level (e.g., month-level) index.html file seemed to contain everything that would appear on the month-level webpage on WordPress.  So, for example, if you had your WordPress blog set up to show a maximum of three posts per page, this higher-level webpage would show all three of those.  The pages looked good; it was just that I was not sure how I would use this mix in an effective backup-and-restore operation.  This raised the question for my own blog:  if I ever did have to restore my blog, was I going to examine the HTML for each webpage manually, to re-post only those portions of text and code that belonged on a given blog page?

I decided to combine approaches.  First, since it was yearend, I made a special-case backup of all posts in each blog.  I did this by setting the blogs to display 999 posts on one page, and then printing that page as a yearend backup PDF.  Second, I noticed that rerunning these scripts seemed to catch additional posts on the subsequent passes.  So instead of separating the current and historical posts, I decided to stay with the original idea of running one command to download each WordPress post.  I would hope that this got most of them, and for any that fell through the cracks, I would refer to the most recent PDF-style copy of the posts.  The command I decided to use for this purpose was of this form:

wget -r -l1 -np -N -A.html -w5 [URL] --directory-prefix=/media/Backups/Blogs/WordPress

I had recently started one other blog.  This one was on Livejournal.com.  I tried the following command with that:

wget -r -l1 -np -N -A.html -w5 http://rwclippings.livejournal.com/ --directory-prefix=/media/Backups/Blogs/LiveJournal

This was as far as I was able to get into this process at this point.

Tuesday, December 29, 2009

Basic Ubuntu (Bash) Shell Scripts

In Ubuntu 9.04, I wanted to write a script that would execute an rsync command, so that I could put a brief reference to the script into my crontab file, instead of putting the whole long rsync command there.

For a brief, tiny moment, I was almost tempted to consider learning the GAMBAS (Gambas Almost Means BASIC) programming language, just because (pre-Visual) BASIC was the only programming language I ever learned.  Instead, I moved toward basic instructions on writing a shell script.  Here's what I wrote:

#!/bin/bash
# This is backup-hour.sh
# It backs up CURRENT to CURRBACKUP every few hours
rsync -qhlEtrip --progress --delete-after --ignore-errors --force --exclude=/.Trash-1000/ --exclude=/lost+found/ /media/CURRENT/ /media/CURRBACKUP

As the instructions said, the first line was essential to tell the computer to use BASH to interpret the following lines.  The next two lines were comments, and the final line (wrapping over onto multiple lines here) was exactly the rsync line I'd been using to do the backup.  In other words, learning how to write the command was almost all I needed to write the script.

The next step was to save it somewhere.  I had previously heard, and the instructions said, that the common place to put it is in your bin folder.  I went with that, but made a note that I would need to be sure to back up my bin folder, because I didn't want to go to the trouble of writing all these scripts and then see them vanish.

The usual location for the user's bin folder is at /home/[username]/bin.  In my case, that's /home/ray/bin.  Getting there in Nautilus (i.e., Ubuntu's File Browser, also started by typing "nautilus" in Terminal) can be confusing:  you can get there via the Home Folder option (assuming you're showing Tree rather than Places or something else (or nothing) at the left side of the file browser), or you can go to File System/home/[username]/bin.  Same thing, either way.  So I saved the script (above) in my bin folder as backup-hour.sh.  That turned the comment lines (beginning with #) blue in gedit.

Next, the instructions said, I needed to set permissions.  This was a confusing aspect of Ubuntu.  The documentation showed permissions as a string of ten hyphens or minus signs:  ----------.  The first character turned out to be a file-type indicator rather than a permission; the remaining nine, which a person could actually set, were divided into three sets of three.  The first three belonged to the owner, the second three to the group, and the last three to "other."  Within each set of three, the first one was for read, the second was for write, and the third was for execute (i.e., run it as a program).  So if you set the owner's permissions to read (r), write (w), and execute (x), your ten characters would now change to this:  -rwx------.  If you set all three parties (i.e., owner, group, and other) the same, they would look like this:  -rwxrwxrwx.

You could set the permissions using the chmod command.  I found an Ubuntu manual page on chmod.  It was not really that complicated, but it looked like it was going to require a time investment to make sure I had it right, and at this point I was getting impatient.  The basic idea seemed to be that chmod used values of 4 (for read permission), 2 (for write permission), and 1 (for execute permission), added together.  So, for example, you could type "chmod 755" and that would give a value of 7 to the first of the three parties mentioned above (i.e., the owner), a value of 5 to the second (i.e., the group), and a value of 5 to the third (i.e., other).  The 7 would mean that you gave read + write + execute (4 + 2 + 1) permissions to the owner, whereas the 5 would mean that you gave only read + execute (4 + 1) permissions to the rest.  Since that's what the instructions suggested, I went with that.  To set the script with those permissions, I typed "chmod 755 backup-hour.sh."
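
To illustrate the arithmetic, the command and its effect would look something like this (a sketch; ls -l just confirms what the permission string looks like afterward):

chmod 755 backup-hour.sh   # owner: rwx (4+2+1=7); group and other: r-x (4+1=5)
ls -l backup-hour.sh       # should show something like -rwxr-xr-x 1 ray ray ... backup-hour.sh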

I wasn't too sure of who the owner was (i.e., me or root), not to mention the group or other.  I mean, this was for my home computer.  Not a lot of people milling around, waiting to take my hard drive for a spin.  These kinds of options seemed to be set up for networked computers, where the "accounting" department might be a group that would own a file.  I found what looked like a good tutorial on file owners, and another interesting (yawn!) page about permissions, but fortunately I did not have time to work through them.

When I typed "chmod 755 backup-hour.sh," I got "cannot access 'backup-hour.sh': No such file or directory."  One solution was to use a change directory (cd) command to get the Terminal prompt into the bin subfolder, so that it would see the file I was talking about.  But since I planned to put more scripts into that folder, and anyway since I wanted cron or other programs to know right away what I meant when I referred to something like backup-hour.sh, I decided to figure out how to put the bin folder in my "path."  The path is the list of folders where the operating system looks for guidance on what a command means.  To change my path so that the system would always know to look in bin, they said I needed to find and edit my .bash_profile file.  Unfortunately, they didn't say where it was.  It wasn't easy to find.  I ran searches in Nautilus (both user and root), but while those were grinding away, I found that I could just type "locate .bash_profile."  That turned up nothing, but very quickly.  Then I got some advice that, if it didn't exist, I could create it by using "touch ~/.bash_profile."  So I did that, and then tried again with "chmod 755 backup-hour.sh."  Still no joy.  Ah, but maybe that was because I hadn't rebooted, so the path change wouldn't have taken effect yet.  OK, so I used the other approach after all:  I changed directory to the bin folder and tried again.  Now I got "Permission denied."  What if I gave everybody full permissions with chmod 777?  I tried that instead of chmod 755.  That seemed to do it.  The hard drive was doing its thing now.
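
For the record, the path change I was groping toward could be written out like this (a sketch, not exactly what I typed; on Ubuntu, ~/.profile is often used for this instead of ~/.bash_profile):

# Add ~/bin to the PATH so scripts there can be run by name from anywhere
echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bash_profile
# Make it take effect in the current Terminal without logging out and back in
source ~/.bash_profile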

I wanted to see what was going on, so I decided to create a log file.  I wanted it to store only the error messages, not to list the thousands of files that backup-hour.sh was backing up successfully, so I put this on the end of (that is, on the same command line as) my rsync command in backup-hour.sh (above):
2> /media/CURRENT/backup-hour.log
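
So the last line of backup-hour.sh, with that redirection tacked on, would presumably read as one long line along these lines (the 2> sends only the error messages, not the normal output, to the log):

rsync -qhlEtrip --progress --delete-after --ignore-errors --force --exclude=/.Trash-1000/ --exclude=/lost+found/ /media/CURRENT/ /media/CURRBACKUP 2> /media/CURRENT/backup-hour.log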

The log filename thus matched the script filename.  I put "backup" first so that I could see all of my backup scripts in the same part of the folder's directory listing, and then I set up a backup-day.sh script along the same lines.  New problem:  these backup scripts would generate empty log files if there were no errors, and I didn't want to have to delete them manually.  So I found a forum post with advice on how to delete them automatically, using the "find" command.  In my version, it looked like this:
find /media/CURRENT/ -name "*.log" -size 0c -exec rm -f {} \;

I put that at the end of the backup-day.sh script, and it seemed to work.  It said, basically:  look in the CURRENT folder for files whose names end with .log and that are zero bytes in size; and if you find any files like that, run the "remove" command on them without asking for permission.  I didn't know what that ending punctuation was about, but that's what the advisor suggested.
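
For anyone as puzzled by that ending punctuation as I was, here is the same line with comments on each piece (plus a shorter variant that should do the same thing, assuming the installed version of find supports -delete):

# {} is replaced by each matching filename; the backslash-escaped semicolon
# marks the end of the -exec command so the shell doesn't swallow it.
find /media/CURRENT/ -name "*.log" -size 0c -exec rm -f {} \;
# Equivalent, where find offers the -delete option:
find /media/CURRENT/ -name "*.log" -size 0c -delete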

In my backup-day.sh (not backup-hour.sh) script, I included the instructions for updating my USB jump drive (above).  I also included a set of commands to save my e-mail (I was using Thunderbird in Ubuntu) as a compressed tar (.tgz) file -- actually, as seven such files, one for each day of the week.  That part of backup-day.sh looked like this:
# Assign Thunderbird mail & profile to be backed up
backup_files="/home/ray/.mozilla-thunderbird"
dest="/media/BACKROOM/Backups/Tbird"
# Create archive filename
day=$(date +%A)
hostname=$(hostname -s)
archive_file="$hostname-$day.tgz"
# Back up to a tgz file
tar zcvf $dest/$archive_file $backup_files 2>> /media/CURRENT/A-INCOMING/T-bird-Backup.log
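
If I ever needed one of those archives back, restoring it would presumably be the reverse operation, something like this (a sketch; "myhost" stands in for whatever "hostname -s" returns, and Monday is just an example):

# tar stores the paths without the leading slash, so -C / puts them back in place
tar xzvf /media/BACKROOM/Backups/Tbird/myhost-Monday.tgz -C /
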
So these seemed to be the basic kinds of tools I needed to set up rsync scripts and crontab entries.
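
The crontab entries themselves could then be short, which was the point of writing the scripts.  A sketch of what they might look like (the times and paths here are just assumptions for illustration; "crontab -e" is the usual way to edit them):

# m h dom mon dow  command
0 */4 * * * /home/ray/bin/backup-hour.sh
30 1 * * * /home/ray/bin/backup-day.sh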

Ubuntu: Backup with Rsync

In a previous post, I got as far as concluding that rsync was the tool of choice for backing up my computer in Ubuntu 9.04. I didn't pursue it because I was short on time and patience for writing scripts at that point. But eventually the need for a regular backup system became acute. So this post logs the steps I took to make rsync and cron work for me.

First, here's what I wrote previously:
As an alternative to rdiff-backup, what people had actually mentioned more frequently was rsync. It did not have the incremental backup features of rdiff-backup, to my knowledge, but it seemed to be an established tool for backup purposes. So for now, at least, I thought I might try that instead. Once again, I did a Google search and got a package details page with no apparent link to any help files. Eventually I found what looked like the official rsync webpage and, after looking at their FAQs and some other pages, landed on their Examples page. It was intimidating.
This time, I went to their Documentation page. This gave me links to, among other things, Michael Holve's rsync tutorial. The tutorial said, "You must set up one machine or another of a pair to be an "rsync server" by running rsync in a daemon mode." I was curious, so I did a Google search for "what is daemon mode" and I got back, would you believe, exactly one page. One webpage in the entire known planet answered the question, "What is daemon mode?" Except it didn't really answer it. It just said, "It makes wget put standard output into a log file and not bug you while downloading." Accepting that as the best available answer (and ten points to the answerer!), I typed "rsync --daemon" in Ubuntu's Terminal and proceeded to the next step, "Setting Up a Server." After reading it, I decided it didn't seem to apply to me. It was for people who wanted to back up files between computers. I just wanted to back up to another drive.

So I went on to the tutorial's "Using Rsync Itself" section. Since I wasn't sure what daemon mode did, or if it was necessary, I killed that Terminal session and started another. I didn't know if that would shut off daemon mode, or if doing so was what I should do. I read the section and then checked another source of documentation, the rsync man page ("man" being short for "manual"). The man page would ordinarily be output in response to a Terminal command, but someone had put it here in html form, so that's what I used. It reminded me, first, to check Ubuntu's System > Administration > Synaptic Package Manager to make sure I had rsync already installed, here on my secondary computer. I searched Synaptic for rsync and got back a couple dozen listed programs; rsync was among them and was shown as being installed. I looked partway into the man page and got an answer to one question I had from the tutorial. So here's how I translated what the tutorial was telling me. First, the tutorial listed these lines:

rsync --verbose --progress --stats --compress --rsh=/usr/local/bin/ssh --recursive --times --perms --links --delete \
--exclude "*bak" --exclude "*~" \
/www/* webserver:simple_path_name

The first thing to know was that this all represented a single command line. It was too long to fit on one line, though, so apparently the trailing backslash said, "This line continues on the next line." The command would be typed into a file and saved as a script, not typed directly into Terminal. I didn't know why the first line didn't end with a backslash. I decided I would want to experiment with this in a relatively safe place -- with a junk directory on my secondary computer, perhaps -- to see what it was doing.

So as the tutorial explained it, the first line of this example told rsync how to proceed: verbosely (i.e., with lots of information about what it was doing), showing a progress report, with statistics. The --rsh part told rsync to use ssh, which would encrypt the transfer -- an option for sending your stuff over the network to another computer. I wasn't, so I decided to leave that off. I also didn't want to compress the data, because compression made the process slower and required more attention from the CPU.

The second line of the example, above, told rsync to recurse -- to work through all of my directories and subdirectories under the folder that I would be naming. It also told rsync to preserve file timestamps and file permissions -- so if, for example, a file was readable only by root on the source drive, it would be the same way on the target drive. The --links option was an instruction to preserve symbolic links -- I wasn't sure what that meant -- and the --delete option, as I understood it, would tell rsync to delete anything on the target that wasn't on the source. So you'd have a mirror, and not just an accumulation of backups of files that you had deliberately trimmed out of your file collection.

The third line of the example told rsync not to bother copying some kinds of files. I liked the sound of that at first, but then I decided I would rather be able to do a Properties comparison of source and target and verify that both had exactly the same number of files. So I decided to leave out this line when I used rsync.

The fourth line of the example named the source and target locations. I wasn't going to be using it with a remote source or target, so mine was going to look somewhat different from this.

On that basis, here's what I assembled as a test version of the rsync example from above:

rsync --verbose --progress --stats \
--recursive --times --perms --links --delete \
/media/DATA/Source /media/DATA/Target

I created a Source subfolder in my DATA folder and put a TestFile.txt file into it. I also created a Target subfolder in DATA. Then I copied those three rsync lines into a file in Ubuntu's Text Editor (gedit) and saved it to Desktop as TestRun. To make it executable, I went into Terminal, typed "cd /home/ray/Desktop" and then "chmod +x TestRun" and then double-clicked on TestRun and said Run. And, you know, it worked. Just like that. Not exactly as intended -- I had not only TestFile.txt but also the whole Source folder underneath my Target folder -- but, yeah, there it was. I deleted the Target folder and ran it again and, sure enough, it created the Target folder and then inserted a copy of the Source folder into it. Excellent!
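
The reason the Source folder itself landed under Target, as far as I could later tell, is rsync's trailing-slash rule: a source path that ends in a slash means "copy the contents of this folder," while one without it means "copy the folder itself." Roughly (untested here):

rsync --recursive --times --perms --links --delete /media/DATA/Source/ /media/DATA/Target   # copies Source's contents into Target
rsync --recursive --times --perms --links --delete /media/DATA/Source /media/DATA/Target    # copies the Source folder itself into Target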

Now it was time to try something a little bolder. I wanted to see how it worked if I tried to copy the whole DATA folder to an external drive. This part was a little confusing. The external drive seemed to have two different names. If I looked in /media, its name was simply "disk." But if I hit the Computer icon in File Browser, it came up as "193.8 GB Media." I decided the latter sounded more specific, so I would try that first. So now the third line of my TestRun file read like this:

/media/DATA "/media/193.8 GB Media"

I used quotation marks because there were spaces in the name. I saved TestRun and double-clicked it again on the Desktop. It didn't seem to do anything. I realized that I had probably made a mistake in that line, and tried again like this:

/media/DATA "/193.8 GB Media"

That didn't do it either, so I tried again, without the leading slash:

/media/DATA "193.8 GB Media"

That still didn't work, so I tried the other approach:

/media/DATA /media/disk

Still nothing. I went into System > Administration > Partition Editor (GPartEd), wiped out the target partition, and recreated it as a FAT32 partition. Now the drive was totally invisible to Ubuntu. I went back into GPartEd and reformatted it as an ext3 partition. Then I realized: it was an IDE drive, so apparently it would not be recognized until I rebooted. I decided to reboot into Windows (I had a dual-boot system) and format it as NTFS. I named it 186GB (which seemed to be the net amount of space available in NTFS format) and rebooted into Ubuntu. I revised TestRun's last line again:

/media/DATA /media/186GB

and ran it again. This time, it seemed to be working -- the external 186GB drive was making noise -- but I wondered why I wasn't getting a verbose indication of what was going on. I guessed that, if I wanted the verbose information, I would have to execute rsync on the command line, not in an executable script. While I was rooting around for an answer to that question, I was reminded that I could also use the shorthand versions of these commands. So instead of typing --verbose into the script, I could just type -v and whatever other letters I needed. In this approach, the final contents of TestRun, which were as follows:

rsync --verbose --progress --stats \
--recursive --times --perms --links --delete \
/media/DATA /media/186GB

could instead be expressed like this, if I understood the man page's Options Summary section correctly:

rsync -vshlEPtrip --del --delete-excluded --force \
--exclude RECYCLER \
--exclude "System Volume Information" \
/media/DATA /media/186GB

In that version, I added a couple other options that seemed appropriate, and also told rsync to exclude (i.e., don't copy) those extra folders that Windows XP seemed to put on every drive. I revised TestRun along these lines and, when the external drive settled down and it looked like the foregoing TestRun process had ended, I ran it again. But it didn't seem to make any difference. The extra folders were still there. I had understood that it would delete them. Part of the problem seemed to be that I had not used the syntax correctly. I was supposed to use an equals sign: --exclude=RECYCLER. But another part of the problem was that it was not clear whether the "exclude" command was supposed to work with directories. It didn't seem so. The man page just referred to excluding files. I tried again with equals signs, but still no change. I posted a question on it, but the kind response was unfortunately not able to resolve the issue.
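
If I were to take another run at it, the form that the rsync documentation seems to call for would be something like the following (a sketch, not tested here; note the equals signs, the quotes around the name containing a space, and the trailing slashes, which restrict the patterns to directories):

rsync -vshlEPtrip --del --delete-excluded --force \
--exclude="RECYCLER/" \
--exclude="System Volume Information/" \
/media/DATA /media/186GB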

Next, I tried a modified version of TestRun on the primary computer. I went through several revisions and wound up with this version, which seemed to work:

rsync -vchlEtrip --progress --del --ignore-errors --force /media/CURRENT/ "/media/OFFSITE/P4 CURRENT"

The partition being backed up, in this case, was an ext3 partition named CURRENT, and the target to which it was being backed up was a USB external drive named OFFSITE. (Some weeks have passed since I started this post, so there may be some discontinuity in my writing at this point.)

I did not run this command as a script within a file called TestRun that I would start by double-clicking, because I had discovered that I would only get the detailed output if I entered the rsync command on the command line (or, presumably, ran the script from a terminal, where its output would have somewhere to go). I was able to enter the entire rsync command above on one line. I did not need the "exclude" options because this was not an NTFS drive formatted by Windows. I still had ext3 "lost+found" and Trash folders that got copied over in this way, but they were small, so it was OK.

As I think I may have said before, I got the selected rsync parameters by typing "man rsync" at the command line. It did take some trial and error to get this particular set. The resulting backup, when checked by right-clicking and selecting Properties, seemed to be virtually identical.

When I ran that rsync command, it showed me lots of detail on what it was doing. It concluded with this message:
rsync error: some files could not be transferred (code 23) at main.c(977) [sender=2.6.9]
Eventually, however, I did figure out how to get past that error.  Here is an example of an rsync command that worked for me:
rsync -qhlEtrip --progress --delete-after --ignore-errors --force --exclude=/.Trash-1000/ --exclude=/lost+found/ /media/CURRENT/ /media/CURRBACKUP

This one would back up what Windows sees as drive D (named CURRENT) to a partition that Windows sees as drive G (named CURRBACKUP).  Both partitions had to be mounted in Ubuntu before this would work.  I used a similar command to copy a folder on CURRENT to a USB jump drive named KINGSTON.  That gave me a portable copy of the current state of that folder, ready to take along.

The next thing I needed to do was to back up my blogs.  I started by just wanting to be able to back up a single webpage.  I had discovered that all of the posts on a Blogger (i.e., Blogspot) blog like this one could be displayed on a single webpage, at least if you had fewer than 1,000 posts.  To do that, you just needed to go to this URL:  http://blogname.blogspot.com/search?max-results=1000.  I wasn't sure what would happen if you entered a number larger than 1000.  So now that I had that webpage, I wanted to know how to save a copy of it automatically.  Strangely, at this point, Google searches for any of these sets of terms
ubuntu "back up a webpage"
rsync "back up a webpage"
rsync "copy a webpage"


produced zero hits.  Eventually, it started to look like this was because I was barking up the wrong tree.  As described in a separate post, it seemed that what I wanted for this purpose might be wget, not rsync.
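
A first guess at such a command, for the record (a sketch only; "blogname" and the target folder are placeholders, and the URL needs quotes so the shell doesn't misread the question mark):

wget -N "http://blogname.blogspot.com/search?max-results=1000" --directory-prefix=/media/Backups/Blogs/Blogger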

Full Disk in Ubuntu 9.04

I was using Ubuntu 9.04 (Jaunty Jackalope). I started getting error messages indicating that my Linux program partition (mounted as / ) was full. For example: "Error while copying to 'tmp'. There is not enough space on the destination." This surprised me, because when I went to Gparted (Ubuntu > System > Administration > Partition Editor, installed via System > Administration > Synaptic Package Manager), I saw that the Linux root partition was 107GB, of which 102GB was used and only 5GB free.

Andre Mangan advised typing "sudo find / -type f -size +100000k" in Terminal to see what large files I had (with the option of changing the 100000 value). I added a zero, so I was looking for files of roughly 1GB and larger in size. This searched everywhere, including on my mounted data partitions (in /media and/or /mnt), so probably I should have unmounted those before searching: it included .avi (video) and other large but known files. It didn't seem to turn up any unexpected large files, though.
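
In hindsight, find could have been told to stay on the root filesystem, which would have skipped the mounted data partitions without unmounting them (a sketch; -xdev stops find from descending into other mounted filesystems):

sudo find / -xdev -type f -size +1000000k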

I posted a question on this. Meanwhile, I was noticing all kinds of errors resulting from this problem. Once I closed Gparted, I was not able to start it again: I got an error message, "Failed to run /usr/sbin/gparted as user root. Unable to copy the user's Xauthorization file." I got the same message when attempting to run Synaptic and other programs. I suspected this was also the culprit behind the recent failure of certain rsync (backup) jobs. Whatever was filling space on my drive was still operating; I had had 5MB free just a half-hour earlier.

Drs305's response to my post pointed me to his tutorial on disk space problems. Following the relevant portions of that tutorial, I first unmounted all partitions other than the active root ( / ) partition by typing "sudo umount -a". Then I typed "sudo find / -name '*' -size +1G" and saw that the /media/OFFSITE folder, which was supposed to be a reference to my external OFFSITE hard drive, was instead referring to a folder on the root partition. No wonder I had been having problems with my external backup! I deleted /media/OFFSITE and then typed this:

sudo find / -type d -name '*Trash*' | sudo xargs du -h | sort

This seemed to show me that there were lots of files in the /root trash folder(s), which I probably had never emptied. They appeared to be in /root/.local/share/Trash/files, and maybe elsewhere. To delete root's trash, I typed "gksudo nautilus" and, in Nautilus (i.e., File Browser), made sure to check View > Show Hidden Files. Then, in that session of Nautilus, I navigated to File System > root > .local > share > Trash. I selected the Trash folder and hit Shift-Delete. This said, "Are you sure you want to permanently delete 'Trash'?" I said Delete! with joy. It notified me that it was deleting about 37,000 files. So, duh, this could have been part of the problem. Along the way I also noticed that a faster way to check disk space, instead of using GParted, was to type df -Th | grep -v "fs" in Terminal. When the deletion was done, I refreshed GParted. Now it showed only 37GB used. Still a lot!  So I thought I should probably pursue some of the other options described in the tutorial.
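
The command-line equivalent of that cleanup, for future reference, would presumably be something along these lines (a sketch, and one to type carefully):

# Check free space on the root partition
df -h /
# See how much root's trash is holding
sudo du -sh /root/.local/share/Trash
# Empty it (both the files and the matching .trashinfo records)
sudo rm -rf /root/.local/share/Trash/files/* /root/.local/share/Trash/info/*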

Before I could do that, though, I ran into a separate problem: my programs drive completely wiped itself out, so I had to restore Ubuntu from a backup.