Friday, March 18, 2011

Thunderbird for Windows: Transition from Portable to Desktop; Duplicate Email Remover

I was using Thunderbird Portable 3.1.4 in Windows 7.  I wanted to use an add-on (Remove Duplicate Messages (Alternate) 0.3.6) to delete duplicate email messages.  I got the impression that it wouldn't run on the portable version.  I had been thinking about switching to the desktop version of Thunderbird anyway, and now seemed like the time.  To figure out how to transition from portable to installed versions of Thunderbird, I ran a search and found advice that seemed on point.  I did not precisely track all of the steps I took in this process, but the following is a pretty close approximation.

I started by installing regular (i.e., not portable) Thunderbird.  I think I created an email account at that point.  This generated C:\Users\Administrator\AppData\Roaming\Thunderbird\Profiles\f0xqaflh.default.  (The f0xqaflh part was randomly generated -- other installations would have a different ????????.default folder.)  I closed Thunderbird and moved C:\Users\Administrator\AppData\Roaming\Thunderbird\Profiles\f0xqaflh.default to D:\Thunderbird\Profiles\f0xqaflh.default.  I put it on drive D so that it would survive a Windows reinstallation.

Then I went to Start > Run and entered "thunderbird.exe -ProfileManager" (without the quotation marks).  In Profile Manager, I clicked on Create Profile > Next > Choose Folder and pointed to D:\Thunderbird\Profiles.  I exited Profile Manager and moved the contents of ThunderbirdPortable\Data\profile (i.e., just the profile subfolder) to D:\Thunderbird\Profiles.  I clicked on my Start Menu shortcut for Thunderbird (not portable).  It ran, and it seemed that all of my emails were there.  I deleted the folder containing the portable version.
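For anyone who would rather script the file-moving step than drag folders around, here is a minimal Python sketch.  It is not what I actually did: I moved the files by hand, whereas this copies them, so that the portable folder stays intact as a backup until the desktop version is confirmed to work.  The function name copy_profile is made up for illustration, and Thunderbird should be closed while it runs.

```python
import shutil
from pathlib import Path


def copy_profile(portable_profile: Path, target_profile: Path) -> int:
    """Copy the contents of a portable profile folder into the target profile folder.

    Files already in the target with the same name are overwritten.
    Returns the number of files present in the target afterwards.
    """
    if not portable_profile.is_dir():
        raise FileNotFoundError(f"No profile folder at {portable_profile}")
    # dirs_exist_ok (Python 3.8+) allows copying into a folder that
    # Profile Manager has already created.
    shutil.copytree(portable_profile, target_profile, dirs_exist_ok=True)
    return sum(1 for p in target_profile.rglob("*") if p.is_file())


# Hypothetical usage with the folders named in this post:
# copy_profile(Path(r"ThunderbirdPortable\Data\profile"),
#              Path(r"D:\Thunderbird\Profiles"))
```

Copying instead of moving costs some disk space, but it means the portable folder can be deleted only after the installed Thunderbird has been seen to open all of the mail.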

I hoped this was all I needed.  Now it was time to try to delete duplicate emails.  I installed the duplicate email remover add-on (Tools > Add-ons > Extensions tab > Install) and ran it (Tools > Remove Duplicates).  It wouldn't check my archive folder until I turned off the Skip Special Folders option (Tools > Add-ons > Extensions tab > Options > Message Comparison tab).  At first, I used the default comparison criteria in that same tab:  Author, Recipients, CC List, Message ID, Send Time, Size, Body, and Subject.  This did not identify too many duplicates, but it appeared they were exact duplicates, so I could delete them all without much manual comparison.  I ran another search, without the Message ID criterion, and yet another, without the Size comparison.  The former likewise seemed not to require much manual comparison; the latter did.  In other words, the final comparison criteria (Author, Recipients, CC List, Send Time, Subject) produced many alleged duplicates, some of which were of very different size.
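To illustrate why removing a criterion produces more alleged duplicates, here is a rough Python sketch of criteria-based matching.  I do not know how the add-on works internally; this only shows the general idea of grouping messages whose chosen header fields all match.  The function name find_duplicates and the header names passed to it are my own choices for the example.

```python
from collections import defaultdict


def find_duplicates(messages, criteria):
    """Group messages whose chosen header fields are all identical.

    `messages` is any iterable of email.message.Message objects.
    `criteria` is a list of header names, e.g. ["From", "Date", "Subject"].
    Returns a list of groups (lists of messages) with more than one member.
    """
    groups = defaultdict(list)
    for msg in messages:
        # A missing header is treated as an empty string.
        key = tuple((msg.get(name) or "").strip() for name in criteria)
        groups[key].append(msg)
    return [group for group in groups.values() if len(group) > 1]
```

Two copies of a message that carry different Message-ID values will not be grouped if "Message-ID" is among the criteria, but will be grouped once it is left out.  Fewer criteria means looser matching, which is why the shorter lists call for more manual checking.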

The add-on did not allow me to open individual emails (via double-click or right-click), to see why two emails bearing the same subject, date, time, etc. would be so radically different in size, so I had to do a lot of manual toggling back and forth between the duplicate remover and Thunderbird, and then searching for individual items in T-bird, to check emails one by one.  In this regard, it was not like DoubleKiller, which I had found to be an excellent duplicate file finder.  But the manual selection process was similar:  check or uncheck the desired item under the "Keep?" column.  Both of these programs would probably have been easier to use if it had been possible to select or deselect items by clicking anywhere on the line, rather than having to mouse over to precisely the checkbox spot each time.

The add-on did allow arrow-key and spacebar navigation and selection.  Playing with this, I eventually discovered that the Enter key would open T-bird to one of the identified duplicate messages, but in that case the comparison window disappeared and I was back in Thunderbird, leaving me to wonder why I was now seeing only one of the duplicates.  Then I realized, oops, hitting Enter had not actually opened the selected duplicate; it had gone ahead and run the deletion.  Well, I hoped those 700 messages really were duplicates.  I had been verging toward just saying to hell with the time-consuming and awkward manual comparison process anyway; I just wasn't quite ready for this to happen.

I looked in Thunderbird's Trash folder and realized that I had not emptied the trash before running the duplicate checker.  (A reminder to do so would be another ideal feature for the duplicate checker.)  So now I would have to restore not just the 700 messages that I had apparently just deleted, without an "Are you sure?" message, but also about 700 other messages that were apparently in the Trash previously, since I was now seeing a total of 1400 messages there.  As I looked at the Trash, I found myself wondering, actually, what was wrong with those 700 other messages.  They didn't seem to be messages that I would have wanted to delete, unless they too were duplicates.  I decided to move the whole lot of them to the archive folder that I had been dup-checking.

At this point, needless to say, I was beginning to fear that I might just be turning my whole email archive into a giant hash.  I started back through a sequence of dup-checks, beginning with the most conservative (i.e., with the most comparison criteria checked), but of course this time I had no patience for checking individual items.  Instead, I just dreamt of an update that would actually display large thumbnails of alleged duplicates, right there in the add-on.

The column headings in the dup-check results window permitted sorting in ascending or descending order.  At first, I thought that feature was not working for some criteria.  Then I figured out that it was meant to sort only within a comparison.  For example, if Size was not a comparison criterion, it would not be in boldface in the top row, and then clicking on it would sort alleged duplicates according to size; but if Size was a comparison criterion, it would be bolded, and then clicking on that heading in the top row would do nothing, since in that case all duplicates within a set would be identical by definition.  It would have been helpful if selected comparison criteria headings had enabled a sorting of all pairs.  That is, if I was comparing by Send Time, I wanted to be able to show the earliest ones (i.e., the pairs of allegedly time-identical messages) first, so that I wouldn't have to do so much jumping-around when I toggled to Thunderbird for a manual comparison.

After running the several comparisons mentioned above, I tried running one with only the Send Time and Subject criteria checked.  This revealed some apparent duplicates whose only difference was that, for some reason, the sender's name in one item of a pair would be enclosed in quotation marks (e.g., a message from "Joe") while the other would not (e.g., a message from Joe).
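A comparison that ignored those quotation marks would have caught such pairs even with the Author criterion checked.  As a small sketch of the idea, assuming the quoted text is the display name in the From header, Python's standard email.utils.parseaddr splits a header into name and address and drops the surrounding quotes along the way.  The helper name normalized_author is hypothetical.

```python
from email.utils import parseaddr


def normalized_author(from_header: str) -> tuple:
    """Return (display name, address), lowercased and without surrounding quotes."""
    name, address = parseaddr(from_header)
    return (name.strip().lower(), address.strip().lower())


quoted = normalized_author('"Joe" <joe@example.com>')
unquoted = normalized_author('Joe <joe@example.com>')
print(quoted == unquoted)  # True: both become ('joe', 'joe@example.com')
```

Lowercasing is my own addition; it also treats Joe and JOE as the same author, which may or may not be wanted.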

That was the end of my use of the add-on at this point.  I returned to finish this post several hours after completing these processes.  It appeared, at that point, that the transition to desktop Thunderbird and the use of the add-on to delete duplicate emails were both successful.



It later occurred to me that possibly I could have used DoubleKiller after all, at least for those emails that I was going to export as separate EML files. If they were duplicates, DoubleKiller would identify them as such, once they existed as independent EML files.
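The same kind of check can be sketched in a few lines of Python.  I am not claiming this is how DoubleKiller does it; the sketch simply treats two files as duplicates when their contents hash to the same SHA-256 value.  The function name find_identical_files and the *.eml pattern are assumptions for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_identical_files(folder: Path, pattern: str = "*.eml"):
    """Return groups of files under `folder` whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(folder.rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

This only finds exact copies.  Two exports of the same email that differ in even one header line would hash differently and would not be reported, so it is the strictest form of comparison, not a replacement for the looser criteria discussed above.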


A later post updates the duplicate email part of this post.