Sunday, May 16, 2010

Importing Microsoft Word Autocorrect Entries into OpenOffice.org Writer

I had been looking, for some years, for a way to import my list of AutoCorrect entries from Microsoft Word 2003 into the OpenOffice.org (OOo) word processing program.

In Word, I had found AutoCorrect invaluable for converting shorthand expressions into longer terms, saving me a lot of typing. For example, I could type “fttt” and watch it expand to “from time to time,” having previously defined it as such. My list of Word AutoCorrect terms had grown long, into the thousands of entries, so I could not just retype them into OOo Writer manually.

I did know how to export the AutoCorrect entries from Word to a text file. There were apparently several macros available for this purpose. The challenge had been in getting the items from there to Writer. A Linuxtopia webpage now suggested a possible approach, however, and I decided to explore it.

My first step was to get into Writer’s DocumentList.xml file. To do this, in Ubuntu’s Nautilus (i.e., File Browser) I went to /usr/lib/openoffice/basis-link/share/autocorr. I double-clicked on acor_en-US.dat (there were files for other languages and for other flavors of English). There was DocumentList.xml. Now, what to do with it? I right-clicked on it and chose Extract > Extract. This gave me an error message: “Extraction not performed. You don’t have the right permissions.” So I went into Applications > Accessories > Terminal and typed “sudo nautilus,” and then, using that superuser File Browser session, went back to that same autocorr folder and tried again. This time, I didn’t try extracting; I just right-clicked on acor_en-US.dat and chose “Open with Archive Manager” and then right-clicked on DocumentList.xml and chose “Open with” and chose gedit. I went to the end of the file, right before the “</block-list:block-list>” entry, and copied the whole previous entry. In my case, it was the one that would change “yuor” to “your.” In full, it read like this:

<block-list:block block-list:abbreviated-name="yuor" block-list:name="your"/>

They all seemed to follow that same format.  So apparently it was just a matter of getting my Word abbreviations into that form.  To test this, I added an entry right after that “your” entry.  Mine read like this:

<block-list:block block-list:abbreviated-name="yr" block-list:name="your"/>

After making that change, I saved the file.  This provoked a File Roller message:  “Update the file ‘DocumentList.xml’ in the archive ‘acor_en-US.dat’?”  I said yes, i.e., Update.  Then I started Writer and tried typing “yr.”  It didn’t work.  It would correct “yuor” to “your,” but it wouldn’t correct “yr” to “your.”  I rebooted the system, in case that would make a difference, and tried again.  It didn’t.  “Yr” was still not listed in Writer’s AutoCorrect replacement list.

I went back and looked at the end of DocumentList.xml.  “Yr” was still there.  Had I not entered it correctly?  It looked like I might have entered it twice, possibly from a previous try at the same thing.  I made sure there was just one entry for “yr.”  Then it occurred to me to delete the one for “yuor” and see what would happen.  Or, even better, I deleted the one for “yr,” the one that I had added, and I changed the one for “yuor” to be for “yr” instead.  I went back into DocumentList.xml but, what’s this, there were two entries for “yr” again.

Then I realized that the file edit time had not changed:  it seemed I was editing and saving the changes with no error messages, but I hadn’t come in as root, so nothing was actually being written to the file.  Editing as root, I saw another problem:  I had apparently inserted a copy of the list-ending “</block-list:block-list>” tag before my “yr” entry.  So perhaps Writer wasn’t reading beyond that, and this was why it wasn’t seeing the “yr” item.  I made those changes, started Writer, and it worked!  “Yr” became “your.”  I went into Writer’s AutoCorrect options, looked at the end of the list, and sure enough, there was “yr.”
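
For anyone who would rather script this step than click through Archive Manager, here is a minimal Python sketch of the same edit.  It assumes, as Archive Manager's behavior suggests, that acor_en-US.dat is an ordinary zip archive containing DocumentList.xml.  The function name add_entry is my own invention, and the sketch assumes the abbreviation and replacement are plain text with no ampersands or quotation marks.  It would need to be run with permission to write to the file (i.e., as root for the system copy), and it is best tried on a copy first.

```python
import os
import zipfile

CLOSING_TAG = "</block-list:block-list>"

def add_entry(dat_path, abbreviation, replacement):
    """Insert one AutoCorrect entry just before the list's closing tag.

    Hypothetical helper: assumes dat_path is a zip archive holding
    DocumentList.xml, and that both strings contain no XML special characters.
    """
    new_line = ('<block-list:block block-list:abbreviated-name="%s" '
                'block-list:name="%s"/>' % (abbreviation, replacement))
    tmp_path = dat_path + ".tmp"
    with zipfile.ZipFile(dat_path) as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "DocumentList.xml":
                text = data.decode("utf-8")
                # Split at the last closing tag so the entry lands inside the list,
                # not after it (the mistake described above).
                head, tag, tail = text.rpartition(CLOSING_TAG)
                if not tag:
                    raise ValueError("closing tag not found in DocumentList.xml")
                data = (head + new_line + "\n" + tag + tail).encode("utf-8")
            dst.writestr(item, data)
    # Swap the rewritten archive into place only after it was written completely.
    os.replace(tmp_path, dat_path)
```

A call such as add_entry("acor_en-US.dat", "yr", "your") would correspond to the manual edit described above.  Because the archive is rewritten to a temporary file first, a failure partway through should leave the original untouched.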

So now the mission was to incorporate a bazillion Word AutoCorrect entries into this DocumentList.xml file.  Or, no, as I thought of it, I decided the first step was to make a backup copy of this xml file and then delete its contents.  I had been working with my Word AutoCorrect list for years.  I didn’t need any surprises from whatever might be in DocumentList.xml.  Actually, to make it easier, I just made a quick copy of the whole acor_en-US.dat file.  Then, in DocumentList.xml, I deleted everything except the file starting and file ending lines:

<?xml version="1.0" encoding="UTF-8"?>
<block-list:block-list xmlns:block-list="http://openoffice.org/2001/block-list">

</block-list:block-list>

Since I would probably be doing this again – adding to the OOo AutoCorrect list from the Word AutoCorrect list, or possibly vice versa – I decided to manage it all through an Excel 2003 spreadsheet.  This, I thought, would also be a good way to compare the AutoCorr lists that I had developed on different computers.  That is, I was using AutoCorrect on more than one computer, and it seemed likely that there would be some cases where those lists were not compatible.  So I began with that part of the project.  I ran the AutoCorrect macro in Word on each computer and brought all of the resulting wordlists together into one folder.  I opened one of those wordlists, copied the whole thing, waited a few minutes to make sure it was all there, and pasted it all into an Excel spreadsheet.  Here, too, I wished the AutoCorrect feature included a column indicating the date last used, because a lot of these entries were totally unfamiliar to me and others were for things I was no longer writing about.  Probably I should have done this spreadsheet thing when I first installed Word.  Then it occurred to me that I could set up a virtual machine, install Word on it, and do something like that now.  But without manual examination, I still wouldn’t be able to tell which of those original Word AutoCorr entries I had ever used.

I did manage to come up with some sorting rules that helped somewhat.  After deleting exact duplicates from the several combined AutoCorr files, I sorted alphabetically according to Value (i.e., the term that resulted from the auto-correction) and then according to value length.  For example, I had given “acl” a value of “actual,” and Word came with “actualyl” as also having a value of “actual.”  I could have left both, but it seemed pretty unlikely that I would let a paper go out with “actualyl” in it (not to mention “additinal” and “adequit”).  Actually, I reasoned, I would rather risk letting a paper go out with “actualyl” in it than to endure the insult of having such a spelling correction in my AutoCorrect file.  So I deleted a bunch of those.  I also searched for items containing a space, since those tended to be from Word, not me (e.g., “witht he” becomes “with the”).  I searched for items of the same length before and after, since these tended to be Word’s typo corrections.  When I was done, I copied and pasted it from the Excel file back into the Word AutoCorr list.  Doing that involved creating a new table with enough rows to accommodate all of the Excel entries, highlighting all those empty cells, and pasting the Excel cells into the highlighted space.  There were some extra rows, which Word redundantly filled by starting over at the start of the table and continuing until all rows were filled; I had to delete those.
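
The sorting rules above could also be expressed outside of Excel.  The following Python sketch is only an illustration of those rules, not what I actually ran:  the function name tidy_entries is made up, the entries are assumed to be (name, value) pairs, and I have read “value length” as meaning the length of the abbreviation within each value.  It does not delete anything; it just sets aside the entries that look like Word's built-in typo corrections so that they can be reviewed by hand.

```python
def tidy_entries(pairs):
    """Drop exact duplicates, sort, and split entries into keep/review lists.

    pairs: iterable of (name, value) tuples, e.g. ("acl", "actual").
    Entries whose name contains a space, or whose name and value have the
    same length, are set aside for review as likely built-in typo fixes.
    """
    # set() removes exact duplicates; sort by value, then by name length,
    # then by name so the order is deterministic.
    unique = sorted(set(pairs), key=lambda p: (p[1].lower(), len(p[0]), p[0]))
    keep, review = [], []
    for name, value in unique:
        has_space = " " in name
        same_length = len(name) == len(value)
        if has_space or same_length:
            review.append((name, value))
        else:
            keep.append((name, value))
    return keep, review


if __name__ == "__main__":
    sample = [("acl", "actual"), ("actualyl", "actual"),
              ("witht he", "with the"), ("acl", "actual")]
    keep, review = tidy_entries(sample)
    print(keep)
    print(review)
```

Note that a rule like this would not catch “actualyl” on its own, since it has no space and differs in length from “actual”; those still had to be weeded out by eye.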

The next step was to get rid of the existing AutoCorrect entries in Word, so that the unwanted ones that I had deleted would really be gone.  I did this by creating a Word macro to remove them all.  I had no idea how to do this, but it was easy:  in Word 2003, I went into Tools > Macro > Macros > Create.  It had a space for my new macro, starting with Sub AddTBMenuItem() and continuing on to End Sub.  I pretty much replaced that with the following macro, posted in 2001:

Sub RemoveAllDefaultAutoCorrects()
    ' Deletes every AutoCorrect entry, after asking for confirmation.
    Dim aCor As AutoCorrectEntry
    If MsgBox("This is a very destructive macro. Be sure that you " & vbCr & _
        "want to delete all the AutoCorrect entries. There is no " & vbCr & _
        "undo for this action. Click OK to continue", vbCritical + vbOKCancel, "CAUTION") _
        = vbOK Then
        For Each aCor In Application.AutoCorrect.Entries
            aCor.Delete
        Next aCor
    End If
End Sub

I closed that, went back into Tools > Macro > Macros, selected that new macro entry, and ran it.  I gathered from somewhere that Word would restore the old list if you didn’t replace it with at least one new AutoCorrect entry, so I created a dummy one, exited Word, and then came back in to see what it looked like.  Sure enough, there was only that one dummy entry.  So now I ran the macro to restore my new list, and that took care of getting Word’s AutoCorr list updated.

Now, how to do the same thing in OOo Writer?  Using the format shown in that Linuxtopia webpage, I went back to the Excel spreadsheet, added another column on the right side, and used text concatenation to add all the missing stuff – basically, everything other than “yuor” and “your” in that example.  The formula I used was this:
="<block-list:block block-list:abbreviated-name="&CHAR(34)&A4&CHAR(34)&" block-list:name="&CHAR(34)&B4&CHAR(34)&"/>"
CHAR(34) was the Excel function that returns a regular (double) quotation mark.  I had to use CHAR(34) because, inside a formula, a typed quotation mark marks the start or end of a text string rather than standing for itself.  This formula said, take the name in cell A4 (e.g., “yuor”) and the value in cell B4 (e.g., “your”) and wrap them in the block-list format shown above.  So I copied that formula all the way down the spreadsheet, in my column E (using column D to show the date when I did this, for future reference).  Then I copied all of those cells from column E into Notepad, made sure that Format > Word Wrap was turned off, and saved that as AC.TXT.  Back in Ubuntu, I opened AC.TXT in gedit.  Testing confirmed that OOo Writer was going to have a hard time with items that gedit displayed in funky format.

Most of the items that caused problems that way were due to the use of smart apostrophes (i.e., single quotes) in Word.  So back in Windows, in Notepad, I opened AC.TXT, found an example of a smart apostrophe, highlighted and copied it into the Find & Replace box, and replaced it with a simple apostrophe.  I made a note in the spreadsheet, next to these items, to indicate that they were not compatible with Writer.  Word's em dash (—) was also problematic as an import into Writer, so I had to replace it, in this imported list, with two hyphens (--).
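
The same find-and-replace could be done in one pass with a short script.  This is just a sketch of the idea in Python:  the name plain_text is hypothetical, and the left single quotation mark is included on the assumption that Word's smart quotes come in pairs, though I only mention the apostrophe and the em dash above.

```python
# Characters that caused trouble on import, mapped to plain substitutes.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark (assumed; not tested above)
    "\u2019": "'",   # right single quotation mark, i.e., the smart apostrophe
    "\u2014": "--",  # em dash
}

def plain_text(s):
    """Return s with smart apostrophes and em dashes replaced by plain ones."""
    return s.translate(str.maketrans(REPLACEMENTS))


if __name__ == "__main__":
    print(plain_text("don\u2019t"))
    print(plain_text("from time to time\u2014maybe"))
```

Running each line of AC.TXT through such a function would replace the problem characters, though it would still make sense to note in the spreadsheet which items had been altered.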

With these changes made, back in Ubuntu, I was ready to paste the revised AC.TXT into DocumentList.xml.  I closed Writer, did the paste, started Writer, and tried it out.  It didn’t work.  I looked at the AutoCorr list.  It had imported only a few items.  It looked like it had stopped at an item containing an ampersand (&).  (A bare ampersand has a special meaning in XML, which presumably is why Writer stopped reading there.)  I deleted that item from DocumentList.xml and tried again.  Now its AutoCorr list was longer, but still only a fraction.  Sure enough, it had stopped at another ampersand.  I went back to the spreadsheet and deleted or changed all items containing an ampersand, and marked them on the spreadsheet for incompatibility as well.

Trying again:  still no cigar.  This time, it seemed there was an item in my list that was already in quotation marks.  So I was trying to import something like ““This”” and Writer wasn’t buying it.  I fixed that and tried again.  This time for sure.  It worked.  I had the entire list, and I played with it.  It looked like they were all going to work.  I doctored up the list by putting a copy of Writer’s special character for the em dash into a document (Insert > Special Character > Box Drawing) and then copying it into the places in the Tools > AutoCorrect list where I had had to import double hyphens (--) instead.  At some point, I would probably do the same with the smart apostrophes, ampersands, and other items if I decided to use Writer frequently.
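
In hindsight, the ampersand and quotation mark failures look like ordinary XML escaping problems:  in XML, “&” and a quotation mark inside a quoted attribute have to be written as “&amp;” and “&quot;”.  I did not test whether Writer accepts entries escaped that way, so treat the following Python sketch as an untested assumption rather than a confirmed fix.  The function name block_line is made up; escape comes from Python's standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape

def block_line(abbreviation, replacement):
    """Build one DocumentList.xml entry with XML special characters escaped.

    escape() handles &, < and >; the extra mapping also escapes double
    quotation marks, since both values sit inside double-quoted attributes.
    """
    quote = {'"': "&quot;"}
    return ('<block-list:block block-list:abbreviated-name="%s" '
            'block-list:name="%s"/>'
            % (escape(abbreviation, quote), escape(replacement, quote)))


if __name__ == "__main__":
    print(block_line("yr", "your"))
    print(block_line("rd", "R&D"))
```

This does the same job as the Excel formula, except that an entry like “R&D” would come out as “R&amp;D” instead of having to be deleted from the list.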

So it worked.  I could now use my list of abbreviations in OOo Writer instead of having to use Word.

3 comments:

Anonymous

This is a long story with a lot of "didn't work". Can you please boil it down to a few steps just to get the job done? That would be SO helpful because I'm also an autocorrect addict.

raywood

I haven't quite reached that point myself yet. But next time I work through it, I'll make another step in the direction you suggest.

raywood

A later post describes steps I took to manipulate the list of Autocorrect entries in Word 2010 on Windows 7.