Thursday, December 31, 2009

Notes on Converting Word Processing Documents from 1985-1995

I was using Ubuntu 9.04 (Jaunty Jackalope) and VMware Workstation 6.5.2, running Windows XP virtual machines (VMs) as guests. I was trying, in one of those VMs, to convert some data files from the 1980s and 1990s. This post conveys some notes from that process.

I had used a number of different database, spreadsheet, and word processing programs back then. The filenames had extensions like .sec and .95. These suggested that the file in question was probably not a spreadsheet (whose extensions would probably have been .wks or .wk1 or .wq1). I suspected these were word processing docs, but what kind?

I had a copy of WordPerfect Office X4, so I tried opening them in that. The formats I had used principally back then were WordStar (3.3, I think), WordPerfect 6.0 for DOS, XyWrite III+, and plain ASCII text. So for some documents it took several tries, telling WordPerfect X4 to try these different formats, before the document would open properly. Even then, not all of them did.
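
A rough way to narrow down the format before trying each import filter is to look at the raw bytes. The sketch below is only an illustration under stated assumptions, not something used in this process: the function name `guess_format` and the 5% threshold are made up for the example; the WordPerfect check relies on the `\xffWPC` signature found at the start of WordPerfect 5.x and later files; the XyWrite check looks for guillemet-delimited codes (bytes 0xAE and 0xAF in code page 437); and the WordStar check relies on the high bit that WordStar's document mode sets on many characters. These are heuristics and will misjudge some files.

```python
import re

WP_MAGIC = b"\xffWPC"  # signature at the start of WordPerfect 5.x and later files

# A XyWrite-style embedded code: 0xAE, two letters, optional arguments, 0xAF.
XY_CODE = re.compile(rb"\xae[A-Za-z]{2}[^\xae\xaf\r\n]{0,40}\xaf")


def guess_format(data: bytes) -> str:
    """Return a rough guess at the program that wrote `data` (heuristic only)."""
    if not data:
        return "unknown"
    if data.startswith(WP_MAGIC):
        return "wordperfect"
    if XY_CODE.search(data):
        return "xywrite"
    high = sum(1 for byte in data if byte >= 0x80)
    if high == 0:
        return "ascii"
    # Assumed threshold: WordStar document files carry many high-bit bytes.
    if high / len(data) > 0.05:
        return "wordstar"
    return "unknown"
```

To try it on a file, read the first few kilobytes in binary mode and pass them in, for example `guess_format(open(name, "rb").read(4096))`. A result of "unknown" just means the file needs to be opened by hand.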

I also tried the approach of highlighting a bunch of these files, right-clicking, and indicating that I wanted to convert them to Adobe Acrobat 8, or to combine them in Acrobat. Unfortunately, these efforts tended to cause Windows Explorer and/or Acrobat to crash.

It occurred to me to try another approach. I left Windows running in VMware and dropped down to Ubuntu, where I selected 57 files that I wanted to convert and opened them. OpenOffice 3.0 Writer started up by default and opened them all. They had last been modified in or around 1993, and I think they were created with Word 3.1. For each file, I clicked a button and got a PDF in the same folder, with the same name and a .pdf extension.
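
The one-click-per-file step could probably be scripted. The sketch below is an assumption, not what was done here: it relies on the `--headless`, `--convert-to`, and `--outdir` command-line options of current LibreOffice (the OpenOffice 3.0 of the time may not have offered them), it assumes the `soffice` binary is on the PATH, and the function names `build_convert_command` and `convert_all` are invented for the example.

```python
import subprocess
from pathlib import Path


def build_convert_command(files, outdir, soffice="soffice"):
    """Build the argument list for a headless batch conversion to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf",
            "--outdir", str(outdir), *[str(f) for f in files]]


def convert_all(folder, pattern="*"):
    """Convert every matching non-PDF file in `folder`, writing PDFs alongside."""
    folder = Path(folder)
    files = sorted(p for p in folder.glob(pattern)
                   if p.is_file() and p.suffix.lower() != ".pdf")
    if files:
        # check=True raises an error if the office binary reports failure.
        subprocess.run(build_convert_command(files, folder), check=True)
    return files
```

Keeping the command construction separate from the call makes it possible to print and inspect the command before running it on a folder of irreplaceable files.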

OOo Writer wasn't able to open some WordStar 3.3 files from the mid-1980s. Several sources referred me to Advanced Computer Innovations for that sort of conversion. Their prices weren't bad, but I didn't want to pay $1 per file per 50K for these old materials. Instead, I looked into old Microsoft converters. Those, unfortunately, did not appear to be available anymore. A search led to a forum that led to other converters, but those did not appear to go back to WordStar 3.3 for DOS. Graham Mayor's page looked like a better bet. It gave me a file, but by the time I got around to it, I had already addressed my needs, so I didn't actually try that one.

Separately, somehow, I found (or maybe I had always retained) a copy of a program that seemed willing to install "Microsoft Word 97 Supplemental Converters."  Searching for this led to a Microsoft page where I was able to download the Word 97-2000 Import Converter (wrd97cnv.exe); unfortunately, that proved to be a backwards conversion from Word 97 to Word 95.  Trying again, I found that the Microsoft Office 2003 Resource Kit webpage led to a list of downloads that included an Office Converter Pack that I downloaded (oconvpck.exe).  I seem to have installed this, and I think this is what ultimately did the job for me.

Resources for converting XyWrite III+ files were pretty scarce by now, a decade after what appears to have been the last (short-lived) effort to reconstruct a manual of its text-formatting codes. Apparently nobody who has a copy of the paper manual has gotten around to PDFing and posting it; or perhaps Nota Bene (which apparently bought XyWrite in the 1990s), for some reason, is unwilling to allow any such reference to be made available. But here are some examples of codes used, from what I've been able to figure out and recall:

«PT23» start using proportional type font no. 23
«PG» page break
«TS5,10» set tabs 5 and 10 spaces to the right
«DC1=A 1 a» set DC1 outline structure (first level = A, B, C ...)
lm=0 set left text margin at zero characters (i.e., not indented)
«FC» format centered, i.e., center text
«MDBO» begin boldface
«MDNM» end special formatting (e.g., boldface)
«SPi» set page number to i (e.g., for preface)
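
Given codes of this shape, a plain-text version of a XyWrite file can be approximated by dropping everything between the guillemets. The following is a minimal sketch under assumptions: it treats the file as code page 437 text (where « and » are bytes 0xAE and 0xAF), it handles only the «PG» code specially by turning it into a form feed, and it silently discards every other code. The names `xywrite_to_text` and `list_codes` are hypothetical, and codes written without guillemets (like the lm=0 example) are not handled.

```python
import re

# Anything between a pair of guillemets is treated as a formatting code.
CODE = re.compile(r"«([^«»]*)»")


def xywrite_to_text(raw: bytes) -> str:
    """Strip embedded codes, keeping page breaks as form feeds."""
    text = raw.decode("cp437")

    def replace(match):
        # «PG» marks a page break; all other codes are dropped.
        return "\f" if match.group(1).upper() == "PG" else ""

    return CODE.sub(replace, text)


def list_codes(raw: bytes) -> list:
    """Return the distinct codes found, sorted, to help build a reference."""
    return sorted(set(CODE.findall(raw.decode("cp437"))))
```

Running `list_codes` across a folder of old files would be one way to build up an inventory of the codes actually in use, in the absence of a manual.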

I also had some old WordPerfect documents, not all of which had .wpd extensions to begin with. To fix the ones that didn't, I searched for a bulk renamer. I tried Bulk Rename, but its interface was complex and inflexible compared to that of ExplorerXP, where I could just select the files I wanted to rename, press F2, and set the parameters.
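
The same renaming could be scripted. This is a sketch under assumptions, not the method used above: the function name `rename_to_wpd` is invented, it only touches files that begin with the `\xffWPC` signature used by WordPerfect 5.x and later (so older WordPerfect files would be skipped), and it appends .wpd to the existing name rather than replacing an extension like .95, so that the original name is not lost.

```python
from pathlib import Path

WP_MAGIC = b"\xffWPC"  # signature at the start of WordPerfect 5.x and later files


def rename_to_wpd(folder, dry_run=True):
    """Append .wpd to WordPerfect files lacking it; return (old, new) name pairs."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() == ".wpd":
            continue
        with path.open("rb") as handle:
            if handle.read(4) != WP_MAGIC:
                continue  # not recognizably WordPerfect; leave it alone
        target = path.with_name(path.name + ".wpd")
        if target.exists():
            continue  # never overwrite an existing file
        if not dry_run:
            path.rename(target)
        renamed.append((path.name, target.name))
    return renamed
```

With the default `dry_run=True` the function only reports what it would rename, which seems the safer default for old files with no backups.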

Once I had the files named with .wpd extensions, the next question was how to get them into PDF format. That was easy with the others, above, to the extent that Microsoft Word could read them; I could PDF them from there. I shouldn't say it was "easy"; it was still a manual process, and I was now searching for a way to automate it.

Unfortunately, I was not finding any freeware ways to convert from .wpd to .pdf. Later versions of WordPerfect include a Conversion Utility to bring those files into the modern era, but the results are still .wpd files. Adobe Acrobat 8.0 was able to recognize and convert the files (select multiple files in Windows Explorer, right-click, and choose the Convert to PDF option), but it processed them one by one, taking several seconds apiece, and I had hundreds of files. Also, it added an extra blank page to the ends of some if not all of these old WordPerfect documents.

I didn't find any .wpd to .odt (OpenOffice Writer) converters. I thought about trying Google Docs, which someone said could bulk convert to PDF, but it didn't accept .wpd as input. I tried looking for a converter from .wpd to .doc, and that led me to a site that would convert directly from .wpd to .pdf, but would only let me upload one file at a time.

I found that the Options in OpenOffice (I was using the Ubuntu version) could be set to save automatically as Word documents, so I did that, and then uploaded a few of those to Google Docs and downloaded them as PDFs. The formatting was messed up on a couple of them. For comparison, I tried it without Google Docs, just converting to PDF from the .doc files that OpenOffice had saved. The formatting was better that way, so Google Docs didn't add anything; and converting the Word docs to PDF was the same one-file-at-a-time process as printing from WordPerfect itself, so involving Word didn't add anything either.

In the end, the best and probably fastest approach seemed to be to select a bunch of .wpd files in Windows Explorer, right-click, and select Convert to Adobe PDF.

This seemed likely to be a continuing effort, but these notes ended here.