
Sunday, June 10, 2012

Batch Converting Many WordStar (.ws) Files to PDF

I had previously worked out a command that would convert all of the Microsoft Word (.doc) or WordPerfect (.wpd) files in a folder to PDF.  Now I wanted to try that on a batch of old WordStar (.ws) files.  This post discusses that task.

As described in the previous post, I set my PDF printer (Bullzip) to print without opening the resulting files and without interruptions, except that I think I did let it notify me of error messages.  I didn't want to have to approve each conversion manually.  Also, I had named the WordStar documents to have a .ws extension, even though that extension was not necessary back in WordStar's heyday.

I had also configured my copy of Word 2003 to recognize and open .ws documents.  I was not entirely sure how I had managed this.  My records suggested two possibilities.  One was to run a .reg file containing the following lines, so as to modify the registry in some hopefully appropriate way:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Text Converters\Export\WordStar]
"Extensions"="ws"
"Name"="WordStar 3.3 - 7"
"Path"="C:\\Program Files\\Common Files\\Microsoft Shared\\TextConv\\Wrdstr32.cnv"

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Text Converters\Import\WordStar]
"Extensions"="ws"
"Name"="WordStar 3.3 - 7"
"Path"="C:\\Program Files\\Common Files\\Microsoft Shared\\TextConv\\Wrdstr32.cnv"
The other possibility was that I had found a program that required me to add certain files to the Windows 7 Program Files folder, including, in particular, one called Wrdstr32.cnv.  A search suggested that anyone hoping to download such files from a virus-free source had best be using something like WOT.  It had been a while since I had set up my system, and in any case I had not tested these options individually to determine whether they were useful or necessary.  For all I knew, Word was capable of reading .ws files without any of this.  The point is that, at least on my system, Word was now capable of doing so.

With all that in place, I was set to run a command that would hopefully process a lot of WordStar files without much intervention from me.  I started Notepad, created a blank file called Converter.bat, and put this line in it:
FOR /F "usebackq delims=" %%g IN (`dir /b "*.ws"`) DO "C:\Program Files (x86)\Microsoft Office\Office11\winword.exe" "%%g" /q /n /mFilePrintDefault /mFileExit && TASKKILL /f /im winword.exe
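To spell out my reading of that one-liner (my interpretation of the switches, not official documentation):
REM usebackq + backticks -- run dir /b "*.ws" and loop over each line of its output as %%g
REM delims=              -- keep each whole filename, spaces included
REM /q /n                -- start Word quietly, without opening a new blank document
REM /mFilePrintDefault   -- run Word's built-in FilePrintDefault command (print to the default printer)
REM /mFileExit           -- run Word's built-in FileExit command to close Word
REM TASKKILL             -- force-kill any winword.exe instance that lingers anyway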
I saved Converter.bat and put it in the folder containing the .ws files.  I probably could have used Excel to mass-produce commands that would have done the conversion in-place, for .ws files scattered among multiple folders, but my approach to that sort of situation tended to involve bringing the files to be converted together into one folder anyway, and then putting their converted replacements back where the original files had come from.
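If I had gone that Excel route, a formula along these lines (a hypothetical sketch, assuming full pathnames from DIR /s /a-d /b sat in column A) would have produced one ready-to-run line per file:
="""C:\Program Files (x86)\Microsoft Office\Office11\winword.exe"" """&A2&""" /q /n /mFilePrintDefault /mFileExit && TASKKILL /f /im winword.exe"
Filling that formula down the column and pasting the results into a batch file would have printed each file wherever it lived.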

I ran Converter.bat in the .ws folder.  It ran successfully; I had PDFs in my Bullzip output folder for each WS document in the input folder.  Mission accomplished.

Thursday, March 15, 2012

Batch Converting Many Text Files to PDF

I had a bunch of .TXT files that I wanted to convert to PDF.  I had solved this problem previously, but it looked like I hadn't written it out clearly, so that's the purpose of this post.  This explanation includes solutions to several other sub-problems.  Altogether, the things presented here were useful for solving a variety of problems.

First, I made a list of the files to convert.  My preferred way of doing this was to use DIR.  First, I would open a command window.  My preferred way of doing *that* was to use the "Open command window here" context menu (i.e., right-click in Windows Explorer) option.  An alternative was to use Start > Run > cmd, but then I would have to navigate to the desired folder using commands like CD.

The DIR command I usually used, to make a list of files, was DIR /s /a-d /b > filelist.txt.  (Information on DIR and other DOS-style commands was available in the command window by typing the command followed by /?.  For example, DIR /? told me that the /s option would tell DIR to search subdirectories.)  A variation on the DIR command:  DIR *.txt /s /a-d /b.  The addition of *.txt, in that example, would tell DIR that I wanted a list of only the *.txt files in the folder in question (and its subfolders).  If I wanted to search a whole drive, I'd make it DIR D:\*.txt /s /a-d /b > filelist.txt.  If I wanted to search multiple drives, I'd use >> rather than > in the command for the second drive, so that the results would add to, rather than overwrite, the filelist.txt created by the preceding command.

Using DIR that way could gather files from all over the drive.  Sometimes it was better to gather the files into one folder first, and then run my DIR command just on that folder.  An easy way of finding certain kinds of files was to use the Everything file finding utility, and then just cut and paste all those files from Everything to the desired folder.  For instance, a search in Everything for this:

"see you tomorrow" *.txt
would find all text files whose names contained that phrase.  Cutting and pasting that specialized list into a separate folder would quickly give me a manageable set of files on which I could focus my DIR command.  (There were other directory listing or printing programs that would also do this work; I just found them more convoluted than the simple DIR command.)

Once I had filelist.txt, I copied its contents into Excel (or I could have used Excel to open filelist.txt directly) and used various formulas to create the commands that would convert my text files into PDF.  The form of the command was like this:
notepad /p textfile.txt
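A hypothetical sketch of the formula approach, assuming the filenames from filelist.txt sat in column A:
="notepad /p """&A2&""""
The doubled quotation marks in the formula produce single quotation marks in the output, which matter for filenames containing spaces.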
I wasn't sure in the case of Notepad specifically, but I was able to run some programs (e.g., Word) from the command line by typing just one word (instead of, e.g., "notepad.exe" or a full statement of the path to the folder where winword.exe was located) because I had put the necessary shortcuts in C:\Windows.

Those Notepad commands would send the text files to my default printer.  My default printer was Bullzip.  When I installed it, it gave me a separate shortcut leading to its options.  For this purpose, I set its options so that it did not open the document after creation (General tab), specified an output folder (General tab), and indicated that no dialogs or questions should be asked (Dialogs tab).

I copied the desired commands from Excel to a Notepad text file and saved it with a .bat extension.  The rest of the file name didn't matter, but the .bat extension was important to make it an executable program.  In other words, if I double-clicked on PrintThoseFiles.bat (or if I selected PrintThoseFiles.bat and hit Enter) in Windows Explorer, the batch file would run and those commands would execute.  (I could also run the batch file from the command line, just by typing its name and hitting Enter -- which meant that I could have a batch file running other batch files.)

So that pretty much did it for me.  I ran the batch file, running lots of Notepad commands, and it produced lots of good-looking PDFs.

Please feel free to post questions or comments.

Thursday, March 8, 2012

File Naming Conventions

I had a bunch of files. I was looking at ways to sort them. It seemed that it might help if they were all named according to file naming conventions, so that files of a certain type would be named in certain standard ways.

It was not immediately clear if there was any scholarly consensus as to what approach would be best. Along with general advice, there seemed to be at least two fundamentally different philosophies. On one hand, sources like the National Technology Assistance Project (NTAP) recommended using a folder hierarchy and relatively plain-English filenames. For instance, a file named "2009-05-15 Letter to Smith, R P" might be found in the 2009 > Correspondence subfolder; or in a different scheme, it might be in the Completed Projects > Waterway subfolder. On the other hand, Vincent Santaguida recommended putting the information into the filename itself and avoiding folder hierarchies. (I found that document and others on an Exadox webpage.) Santaguida's first example said this:

Do: Z:\Prod\QA\AssL7_WO_Suzuki_L3688_20090725.xls

Don't: Z:\Production\Quality Control\Assembly Line7\Work Orders\Clients\Suzuki Motors\LOT3688_July-25-2009.xls
Depending on who was using the files and how much they knew about the variety of filenames in the archive, it seemed that Santaguida's approach might benefit from a formal, elaborate naming scheme -- with, for instance, a reference work where users could look up "Quality Control" or "Assembly Line 7" (and other variations) and find the proper rules for naming documents related to those topics, and a list or guidance system leading to relevant documents already filed. I could see where such a system might be valuable in some settings. I had a couple of concerns about it, though. One was, what happens if you lose the reference list, or if the specialized database management system creating such filenames goes on the fritz?

It seemed that, for most purposes, PC Magazine had the better idea: make your filenames indicative of what the file contained, in terms that potential users could understand -- and, I would add, within a scheme that would not require more maintenance than users or database managers would devote to it. For instance, aside from special projects like this one, I was not generally going to invest the time to create a highly precise file naming arrangement. It did seem that Santaguida's approach could help reduce file duplication, but I felt that DoubleKiller gave me an adequate solution for that. The other thing was that I didn't actually know how life was going to turn out yet. File arrangements grew up on the fly, as new situations emerged. I wasn't positioned to put it all into a rigid structure.

In other words, while adopting some principles recommended in the Best Practice for File-Naming produced by the Government Records Branch of North Carolina, I was concerned that "Records will be moved from their original location" -- that, in other words, I might have to re-sort things that I had already sorted once -- but I didn't see an easy way around that. Building their location into the filename would have been a bad idea because, in many cases, I *wanted* to be able to re-sort things at random.

Within the individual filename, Santaguida's second principle seemed right: "Put sufficient elements in the structure [particularly in the filename] for easy retrieval and identification but do not overdo it." I had been working toward a couple of basic formats:
2009-05-04 13.59 Message from X to Y re Z.pdf
Shakespeare--Romeo and Juliet.pdf
Shakespeare--Romeo and Juliet - with notes.pdf
Garfunkel--Bridge Over Troubled Water.mp3
The present project, I decided, was one in which I could mostly tend toward the first example: Year-Mo-Da Hr.Mn DocType from Sender to Recipient re Subject.ext. I would use periods and hyphens (-) for some limited purposes, but would tend not to use other punctuation. This tended to agree with Santaguida's third rule: Do not use characters such as ! # $ % & ' @ ^ ` ~ + , . ; =)]([. Santaguida said don't use spaces, but I had rejected that in opting for plain-English filenames. He also said to use the underscore (_) to separate filename elements, but that was unnecessary in my approach. It also had the potential to confuse things. I noticed that some naming and conversion programs used the underscore in place of the space, giving me "File_name.exe" instead of "File name.exe." In Santaguida's approach, that would falsely suggest that "File" and "name" were two different elements. I planned to scout out and remove underscores. The intention to minimize punctuation also seemed wise in light of the special meanings that Windows assigns to various punctuation characters.

I also had to think about Santaguida's seventh principle. He recommended putting surname first, first name second: "Smith, Roger" rather than "Roger Smith." Actually, in his approach, it was "Smith-Roger." It seemed to me that there were some reasons not to do it that way, at least in my system. One was that I would have to sweep filenames (at least new filenames) for consistency occasionally at any rate, to catch and rename those newly created documents where some variation appeared (e.g., "Smith, R.P.," "Smith, R P," "R Smith"). There didn't seem to be any difference between one approach and the other in that sense. What seemed more practical was to use whatever name I actually tended to use for someone, so that I would be most likely to name it correctly the first time, when I was thinking about something other than my file naming scheme. Typically, this would be along the lines of "Roger Smith" -- which would also have the advantage of eliminating extra hyphens and commas.

Once I had such ideas in mind, I went through the list of files, using Excel to generate batch commands to rename many files at once. Where the files had one of the structures shown above, I was able to use FIND and other text functions to segregate certain elements (e.g., Author, Date) into separate columns, and then use unique filters and other tools to eliminate variations (in e.g., personal names).
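To give a hypothetical sketch, for filenames (in column A) following the first format shown above, formulas like these would pull the date, the time, and the document type into separate columns:
=LEFT(A2,10)
=MID(A2,12,5)
=MID(A2,18,FIND(" from ",A2)-18)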

Monday, February 6, 2012

Batch Merging Many Scattered JPGs into Many Multipage PDFs - Streamlined

I was facing a task that I had undertaken before.  This presented an opportunity to revise and streamline the writeup that I had produced during that previous effort.

First, I had previously converted multiple JPGs to PDF, scattered among multiple folders, in some cases combining multiple JPGs into a single PDF.  The general approach taken there was to rename the JPGs to unique names, move them to a single folder, conduct my operations on them there, delete the JPGs, and move the new PDFs back to the folders where the JPGs had come from.

In a related effort, I had also previously converted MHTs to PDF.  Those were not multipage.  In that effort, I had taken the strategy of converting them in-place (i.e., without moving them to a single folder, transforming them there, and moving them back to the place of origin) and then deleting the MHTs.

Between these two strategies, I liked the former better.  Having all files in one folder made it easier to verify that my processes yielded the same number of output files as source files.  It was also easier to spot-check the output and make sure that the files looked like they should.

Now I wanted to clean up any remaining JPGs that should be converted and, in some cases, combined into PDFs.  Using techniques described in more detail in those previous posts, I ran a DIR to get a full list of JPGs.  I put that list into a spreadsheet, used a REVERSE function to identify paths, sorted the spreadsheet by path, and deleted those rows that contained images that I did not want to convert (e.g., photos).  This made it possible to reduce a starting set of thousands of JPGs into a list of a few hundred that actually needed to be converted.

Using my spreadsheet, I cooked up REN commands to run in a batch file.  This produced unique filenames, as described previously, so that I could use MOVE commands (or cut and paste them from a file finder) to pool them all into a single folder without fear of overwriting.  Of course, I needed to keep the spreadsheet, so that I would know what the original names and locations were, so that I could write MOVE commands to put the resulting PDFs back in the source folders.
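As a hypothetical example (the folder name here is invented), each file got a pair of commands along these lines:
REN "D:\Letters\Letter to Ed page 01.jpg" "ZZZ_0001.jpg"
MOVE "D:\Letters\ZZZ_0001.jpg" "D:\Workspace"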

Once the uniquely named JPGs were combined in one folder, I used IrfanView's batch capability -- again, as detailed in the previous posts -- to convert them to PDFs.  These were all one-page PDFs; I had one PDF per JPG.  I made sure the spreadsheet had a column associating these one-page PDFs (with names like ZZZ_0001.pdf) with their original filenames (e.g., Letter to Ed page 01.jpg).  That is, the current PDF name and the original JPG name would be on the same row.  Now I could use the original JPG name to calculate the name of the new multipage PDF (e.g., Letter to Ed.pdf).  So there would be spreadsheet rows like these:

ZZZ_0001.pdf   Letter to Ed.pdf
ZZZ_0002.pdf   Letter to Ed.pdf
ZZZ_0003.pdf   Letter to Jane.pdf
ZZZ_0004.pdf   Memo from ABCD.pdf
ZZZ_0005.pdf   Memo from ABCD.pdf
(Letter to Jane.pdf is there because I did a sweep for all JPGs, some of which would wind up having just one page.)  I sorted the spreadsheet alphabetically by the output filename and input PDF name (as shown, with the pages that would be going into Letter to Ed arranged in proper order, and with Letter to Ed coming before Memo from ABCD).  For convenience, I assigned a simple working name to the output filenames (e.g., Letter to Ed.pdf would be represented by YYY_0001.pdf, and Memo from ABCD.pdf by YYY_0002.pdf).  I expressed the relationship between the input (single-page) PDF filenames and the output (potentially multipage) PDF filenames with commands that would create the lines of the necessary XML files, in this format:
echo ^<file value="D:\Workspace\ZZZ_0001.pdf"/^> >> YYY_0001
That produced 460 batch file lines, each starting with "echo," that would generate most of the contents of 159 different XML files needed to produce 159 different single- or multi-page PDFs.  The XML files would each need to begin with these two lines:
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
and end with "</filelist>" (without quotes).  I put those lines into Header.txt and Tailer.txt, respectively, and then combined them with a batch file containing 159 lines like this:
copy /b Header.txt+YYY_0001+Tailer.txt YYY_0001.xml
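Assembled, then, a finished XML file for a two-page document would look something like this:
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
<file value="D:\Workspace\ZZZ_0001.pdf"/>
<file value="D:\Workspace\ZZZ_0002.pdf"/>
</filelist>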
That batch file gave me 159 XML files, starting with YYY_0001.xml.  Now I needed 159 new batch files, one for each XML.  These batch files would tell PDFsam to do the actual work -- to merge the single-page PDFs listed in the XMLs into an appropriate output PDF.  So at this point, D:\Workspace contained those 159 XMLs and the 460 single-page PDFs that would soon be merged.  As above, each of these 159 batch files would begin with several lines.  Working in a separate folder, I saved those several lines in a new Header.txt file:
@echo off

set JAVA=%JAVA_HOME%\BIN\JAVA

set JAVA_OPTS=-Xmx256m -Dlog4j.configuration=console-log4j.xml

set CONSOLE_JAR="C:\Program Files (x86)\pdfsam\lib\pdfsam-console-2.3.1e.jar"

@echo on
Those lines made some assumptions about environment variables, which I had already set.  Only the final line of each of those 159 batch files would vary.  That final line would have two variables:  it would name the XML file that listed the PDFs to be combined, and it would name the output file that would contain those PDFs, as in this example:
%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% -l D:\Workspace\YYY_0001.xml -o D:\Workspace\Merged\YYY_0001.pdf concat
I had to create the D:\Workspace\Merged folder to hold the output, and I had to write Excel spreadsheet formulas to mass-produce one final line, like the one just shown, for each of the 159 batch files.  The Excel formula I used was like this:
="echo %%JAVA%% %%JAVA_OPTS%% -jar %%CONSOLE_JAR%% -l D:\Workspace\"&A2&".xml -o D:\Workspace\Merged\"&A2&".pdf concat >> "&A2&".txt"
where cell A2 contained YYY_0001.  That gave me 159 batch commands that I put into Notepad, and saved and ran as Texter.bat.  That produced 159 files with names like YYY_0001.txt, each containing a single JAVA line.  Then, in the spreadsheet, I created another set of commands like this:
copy /b Header.txt+YYY_0001.txt YYY_0001.bat
to combine the header and the JAVA lines into 159 batch files, each of which would ask PDFsam to merge the single-page PDFs listed in the corresponding XML file into the appropriate single- or multi-page output PDF.  And finally, to run those 159 batch files and produce the desired PDFs, I used the spreadsheet to work up a batch file called Runner.bat that began like this:
@echo off
call YYY_0001.bat
call YYY_0002.bat
I had been doing some of this work in other folders, but at this point my simplistic approach would work only if I moved them all back to D:\Workspace before trying Runner.bat.

Altogether, the process worked for me, as it had before, and this time it took only a few hours to go through the foregoing steps, make the inevitable mistakes, and get the desired output.

The remaining step was to get these multipage PDFs back where they belonged.  I went back to the spreadsheet, prepared batch lines for that purpose, and finished the job.

Thursday, January 26, 2012

Windows 7: HTML (MHT) Files: Batch Printing/Converting to PDF

I had a bunch of MHT files in a folder.  (MHT was apparently short for mhtml, which was short for MIME html.)  I produced these files in Internet Explorer (IE).  To do this in a recent version of IE, the approach would be to look at a webpage and hit Ctrl-S > Save as type > Web archive, single file (*.mht).  The MHT format would try to build everything on the screen into a single file, unlike the HTML formats (which would either save only the HTML text or create a subfolder to contain the images and other stuff appearing on the webpage).

Attempts to Print MHTs Directly

My goal now was to print those MHT files.  I had Bullzip PDF Printer set as my default printer, and its settings (the default, I think) would have it pop up a dialog for each file being printed, asking me what I wanted to call the PDF output.  This wasn't as slick as having a command-line PDF printer that would automatically print a file with a name specified on the command line, but I believed I had two options there.  One would be to change Bullzip so that it just printed without a dialog; the other was to hit Enter for each file and let Bullzip print the PDF with the default filename.  Either way, I could then come back in a second pass, using a batch file and/or Bulk Rename Utility to alter filenames as desired.

I actually would have had a one-pass command-line option, if I had been able to get PrintHTML to work with MHTs.  I was briefly hoping that maybe I could use PRN from the command line, but Francois Degrelle said PRN would only work with text files.  A PowerShell function would have been another possibility, if I had known how to proceed with something like that.  There also appeared to be some older approaches that could provide a good way to spend a huge amount of time on something that wouldn't work, for reasons I couldn't understand.

I ran a search and found a webpage that made me think that PDFCreator might be a more useful PDF printer than Bullzip, for present purposes and also for the future.  PDFCreator was favorably reviewed on CNET and Softpedia, so I downloaded and installed it.  But it didn't seem to be printing PDFs automatically from these MHTs.  It would just open the MHT in Microsoft Word, my default word processor, and then it would sit there.  So I didn't continue to try using PDFCreator for this project.

Then again, Bullzip did the same thing:  it opened the MHT in Word, and then stopped.  This happened even after I went into Bullzip's options and changed them to what seemed to be the most streamlined approach possible.  Word was resource-intensive; I couldn't very well open a hundred documents in it at once.  Not that that was an option anyway.  If I highlighted more than 15 MHTs in Windows Explorer, the right-click context menu wouldn't even give me a Print option.

Wordpad was less resource-intensive than Word, but it would open the MHT files as source code, same as Notepad:  not pretty.  I would also get the MHT opened in Word when I right-clicked on a couple of MHTs and selected "Convert to Adobe PDF."  (I got that option because I had Acrobat installed.)

The easiest way to just open the MHTs and print them manually, if I wanted to do that, seemed to be to select a bunch of them and hit Enter, and they would open in tabs in my web browser.  For some reason, they were opening in Opera, whereas I would have thought that Firefox would be the default, as it was for other kinds of web-type files.  I couldn't even open them in Firefox by doing File > Open inside Firefox:  somehow they would still open in Opera.  I could have uninstalled Opera and then tried again, if I really cared; but in any event I still wasn't getting an automated solution.

PDF via Internet Explorer > Print All Linked Documents

Diamond Architects suggested creating an HTML file that would have links to all of the HTML files in a folder, and then using Internet Explorer to print that one HTML file, using Alt-F > Print > Options tab > Print all linked documents.  The .mht files were obviously not .html files, but they contained HTML code.  So it seemed like the same approach would work either way; or, at worst, I thought I could probably just type REN *.MHT *.HTML in a command window opened in that folder, and mass-rename them that way.  I tried that.  It made a mess.  The files didn't look right anymore.  So I renamed them back to MHT.  (The easy way to open a command window in any folder was to go into Ultimate Windows Tweaker > Additional Tweaks > Show "Open Command Window Here."  With that installed, a right-click in Windows Explorer would open up that option.)

But anyway, to test the "print all linked documents" concept, I needed to create the HTML file containing links to all those individual files.  For that, I tried Arclab's Dir2HTML. But it didn't create links.  It just gave me a list of files.  If that was going to be the output, I preferred the kind of list I would get from this command:
DIR *.mht /a-d /b > dirlist.txt
That gave me a file, dirlist.txt, containing entries that looked like this:
File Name 1.mht
File Name 2.mht
To get them to function like links in an HTML file, I would have to change those lines so they looked like this:
<a href="One File Name.mht"></a>
<a href="Another File Name.mht"></a>
I could achieve that with a search-and-replace in Word, using ^p as the end-of-line character.  That is, I could search for ^p and replace it with this, including the quotation marks:
"></a>^p<a href="
That would put "</a> at the end of each line, and <a href=" at the start of the next.  Then I could paste the results back into dirlist.txt.  Note:  if smart quotes were turned on in Word, I would then have to do two additional search-and-replace operations, copying and pasting a sample of an opening and a closing smart quotation mark into Notepad's replace box, because smart quotes wouldn't work right.  Then I might have to manually clean up the first and last lines in dirlist.txt.  Another way to do this would be to paste the contents of dirlist.txt into Excel and massage them there.  (For Excel instructions, go to this post and search for CHAR(34).)  If I was going to do much of this, Excel would definitely be the way to go, because then I could just drop the new filenames into a column and let preexisting formulas parse them and output the HTML lines automatically.
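A hypothetical sketch of that Excel route, with the bare filenames from dirlist.txt in column A:
="<a href="&CHAR(34)&A2&CHAR(34)&"></a>"
That formula would yield one finished link line per filename, with no smart-quote complications.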

That basically gave me an HTML file.  Now I would just have to add its opening and closing lines.  I wasn't sure what those should look like, so I right-clicked on some random webpage, selected "View Source" (an option that may not be available in all browsers, at least not without some add-ons; I wasn't sure), and decided that what I needed for an opening line would be "<!DOCTYPE html>" and the closing line should be "</html>" (without quotation marks), though I later realized that the latter was probably either unnecessary or incomplete.  I also needed a second line that read, "This is my file," because otherwise everything that I had done would create a completely blank-looking page, leaving me uncertain and confused.  So I added those lines to dirlist.txt, saved it as dirlist.htm, opened it in Internet Explorer (Ctrl-O or Alt-File > Open), and tried the Alt-F > Print > Options tab > "Print all linked documents" option mentioned above.  (Note that dirlist.htm still had to be in the same folder as the .mht files that I wanted to print.)
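For reference, the assembled dirlist.htm amounted to something like this:
<!DOCTYPE html>
This is my file
<a href="File Name 1.mht"></a>
<a href="File Name 2.mht"></a>
</html>
That was the file on which I tried the "print all linked documents" option.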

That worked, sort of.  It automatically gave me a boatload of .pdf files, and may I say it did so in a hell of a hurry.  Problem was, they were all blank.  It tentatively appeared that Bullzip and Internet Explorer were going to go through the motions of printing those linked files; but because I was dealing with MHTs instead of HTMs, they would passive-aggressively give me output with nothing inside.  So, like Columbus finding Haiti instead of Malaysia, I had figured out how to bulk-print HTML files, but that wasn't what I had told everyone I was trying to do.

Bulk Converting MHTs to HTML with mht2htm

Well.  Could I bulk-convert MHTs to HTMs and call it a day?  A search led to mht2htm.  I downloaded the Win32 versions (both GUI and command line), along with the Readme and the Help file.  Basically, it looked like I just needed to (1) copy mht2htmcl.exe into the folder containing my MHT files, (2) create a subfolder, there, called OutputDir, (3) edit dirlist.htm to comment out the non-file (i.e., starting and ending) lines, and then (4) do another couple of searches and replaces in dirlist.htm, so that my lines looked like this:
mht2htmcl "First File Name.mht" OutputDir
mht2htmcl "Another File Name.mht" OutputDir
According to the very brief documentation accompanying mht2htm, these commands would do the trick.  I made these changes, and then renamed dirlist.htm to be dirlist.bat, made sure it was in the folder containing the MHTs and mht2htmcl.exe, and ran it.  It didn't work.  I wasn't sure why not.  So I tried the GUI version instead.  Much easier, and it did produce something in the Output directory.  What it produced was a bunch of folders, one for each MHT file, with names like "First File Name_Files."  Each folder held a couple dozen files, mostly GIFs for the graphic elements of the HTM file.  The key file in each folder was called _0_start_me.htm.  If I double-clicked on that, it would open in Firefox (my default web browser), with a line near the top that said, "Click here to open page"; and if I clicked on that, I got a nice-looking webpage in Firefox.

So that was not fantastic.  Now, instead of opening MHT files one at a time in Word or a web browser, and printing from there, I would have to convert them to HTM so that I could dig into their separate folders and do the same thing with a _0_start_me.htm file.  It would probably be easier to print HTMs than it had been to print MHTs, but there was the problem that those _0_start_me.htm files did not have the original filename.  Fortunately, the file name had been preserved in the name of the folder created by mht2htm.  So I would have to use an Excel spreadsheet to produce printing or renaming commands that would rename the PDF version of the first _0_start_me.htm file to be "First File Name.pdf," and likewise for all the others.  But I wasn't ready to do that yet.

Reviewing How to Use wkHTMLtoPDF

So far, as discussed in a previous post, the best tool I had found for batch converting HTMs to PDFs was wkHTMLtoPDF.  Somewhat past the halfway point in that long post, in a section titled "Revised Final Step:  Converting TXT to HTML to PDF," I had worked out an approach for using wkHTMLtoPDF.  The first step, as I reconstructed my efforts from that previous post, was to install wkHTMLtoPDF.  That created a folder:  C:\Program Files\wktohtml.  wkHTMLtoPDF was a command-line program.  Windows would have to know where to look to find it.  To address that need, I copied everything from the C:\Program Files\wktohtml folder to a new, empty folder called D:\Workspace.  Now I could type a command referring to wkHTMLtoPDF, in a batch file or command window running in D:\Workspace, and the computer would be able to execute the command. I also created a subfolder, under D:\Workspace, called OutputDir.

Next, I went into a command window, running in D:\Workspace, and typed "wkhtmltopdf /?" to get a list of command options.  My previous post, interpreted in light of that command and a glance at wkHTMLtoPDF's manual, seemed to say that the command options that had worked best for me included "-s" to set the output paper size; options to set top (-T), bottom (-B), left (-L), and right (-R) margins (in millimeters); and --dpi (to specify dots per inch).  It seemed, then, that the command line that I would need to use, for each of the _0_start_me.htm files, would use this basic syntax: 
start /wait wkhtmltopdf [options] [input folder and HTM file name] [output folder and PDF file name]
I would run that command in the Workspace folder, where I had now placed the wkHTMLtoPDF program files.  With a command of that type, wkHTMLtoPDF would find the _0_start_me.htm file created by mht2htm (above), and would convert it to a PDF file saved in D:\Workspace\OutputDir.  The source folder and file names were pretty long in some cases, but this D:\Workspace\OutputDir part of the command was brief, so hopefully my full wkHTMLtoPDF command would not exceed any command line limits.  So now I was ready to try an actual command.  I made a copy of one of the folders created by mht2htm, renamed it to be simply "Test," and ran this command in D:\Workspace:
start /wait wkhtmltopdf -s Letter -T 25 -B 25 -L 25 -R 25 --minimum-font-size 10 "D:\Test\_0_start_me.htm" "D:\Workspace\OutputDir\Testfile.pdf"
That worked.  But, of course, the resulting Testfile.pdf was just a PDF of the HTML page that said, "Click here to open page."  I wouldn't get my actual MHT page in HTML format until I clicked on that link in each of those _0_start_me.htm files, and the resulting HTML page would open in Firefox, where I would still have to come up with a batch printing option to handle all of the tabs that I would be opening.  It still wasn't an automated solution.  I assumed that the approach of using Internet Explorer > Print All Linked Documents as above (but this time with HTMs instead of MHTs) would likewise give me webpages with that "Click here to open page" option.

Trying VeryPDF HTML Converter

My immediate problem seemed to be that I didn't have a good way to automate the conversion of MHTs to HTMs -- a way that wouldn't give me that funky "Click here to open page" stuff from mht2htm.  My larger problem was that, of course, I didn't have a way to automate getting PDFs from those MHTs, which was the original issue.

The possibilities that I had developed so far seemed to be as follows:  (1) Forget automation; just print the MHTs manually, selecting 15 at a time and choosing the Print option, which would start 15 sessions of Word.  (2) Select and open them in Firefox or some other browser, which would open up 15 (or whatever number) of individual tabs, each likewise calling for manual printing as PDFs unless I could find a way to automate the printing of multiple browser tabs.  (3) Try to figure out why the Internet Explorer approach was giving me blank PDFs.  (4) Look again for something other than mht2htm, to convert MHTs to HTML.  (5) Play some more with the wkHTMLtoPDF approach, in case some automated solution emerged from that.

As I wrote those words of review, I wondered whether Windows XP might handle one or more of those alternatives differently. I had already installed Windows Virtual PC, with its pre-installed virtual Windows XP session; all I needed was to go in there and, if necessary, install programs in it.  But I hadn't encountered any specific indications that some program or approach had worked better in Windows XP, so I decided not to pursue this.

I thought I could at least search for some other MHT converter.  It suddenly appeared that, in my focus on PDF printers, I might not have done a simple search for an MHT to PDF converter.  That search, done at this point, led to novaPDF, a piece of commercial software that would apparently do the job.  But on closer examination, novaPDF did not seem to have a batch printing capability.  Another program, VeryPDF HTML Converter, came in a command line version whose basic syntax was apparently like this:
htmltools [options] [source file] [output file]
This syntax assumed, as with wkHTMLtoPDF (above), that htmltools.exe was being run in a folder, like my D:\Workspace, where the command files would be present -- unless, again, the user wanted to fiddle with path or environment variable adjustments.  Typing just "htmltools" on the command line, or opening the accompanying Readme file, demonstrated that this had lots of options.  I thought I might try just using it, to see if it worked at all, before fiddling with options.  So I copied the full contents of the VeryPDF program folder (i.e., several folders and 15-20 files, including htmltools.exe) to D:\Workspace, made sure Test.mht was there as well, opened a command window there, and typed this:
htmltools Test.mht TestOut.pdf
The command window gave me a message, "You have 299 time to evaluate this product, you may purchase a full version from http://www.verypdf.com."  I didn't find a reference to htmltools on their products webpage or on their list of PDF Products By Functions, and this particular message didn't give me another name to look for, so I wasn't sure whether I would be buying the right program.  A review of a couple of webpages eventually revealed that this was VeryPDF HTML Converter.  The GUI version, which I didn't want, would cost $59.  Sixty bucks to convert MHTs?  But it got better, or worse.  The command-line version was $399.  I guess while I was at it, I could ask them to throw in Gold Support for only $1,200 a year.  Beyond a certain level of ridiculousness, a casual user might be forgiven for considering the option of just running this puppy in a disposable virtual machine, if uninstalling and reinstalling didn't do the trick.  In all fairness, they seemed to be thinking of server administrators, not private home users.  And they did give us 300 free conversions.  Still, at prices like these, it would have been nice if that were 300 conversions a year, not 300 lifetime.  They were basically persuading me to use the program once and then forget about it.

Anyway, the program ran for a few seconds and then claimed it had succeeded.  I looked.  TestOut.pdf definitely did exist, and it looked good.  No apparent need for any additional options.  I wondered if it would default to the same filename with a PDF extension if I just typed "htmltools Test.mht," without specifying TestOut.pdf, so I ran the command again with that alteration.  That worked.  I tried it once more, this time specifying a source folder and an output folder without a filename ("htmltools D:\Workspace\Source\Test.mht D:\Workspace\Output").  This time, it said, "Save to file failed!"  Its messages seemed to say that it found Test.mht without a problem.  Why wouldn't it write to Output?  Maybe it was trying to write a file called Output, when I already had a folder by that name.  I repeated the command, this time with a trailing backslash (i.e., "htmltools D:\Workspace\Source\Test.mht D:\Workspace\Output\").  Still failed.  And the bastards docked me anyway.  I was down to 296 free tries.  So what were we saying:  it could output a file without a need to specify a filename, but it couldn't output to another folder?  If all else fails, RTFM.  But the Readme.txt didn't contain any references to folders or directories.  Well, would it at least work if I specified everything (i.e., "htmltools D:\Workspace\Source\Test.mht D:\Workspace\Output\Test.pdf")?  Yes, it would.  So that was the answer:  I would have to work up my command lines in Excel (above) to include the full file and path names for both the source and the target.  With those commands in a batch file, I decided to give it a run with a couple dozen files, just to make sure, before blowing my remaining 295 budgeted conversions on a futile gesture.  It ran well.  I was set.  My fear that some commands might be too long was unfounded: the htmltools commands ran successfully with a command as long as 451 characters.  I converted the rest of these MHTs and then deleted them, and hoped never to see them again.
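For what it's worth, a hypothetical sketch of the kind of Excel formula I mean, assuming the full MHT pathnames sat in column A and the PDF should land next to its source:
="htmltools """&A2&""" """&SUBSTITUTE(A2,".mht",".pdf")&""""
That would yield lines like htmltools "D:\Source Folder\File.mht" "D:\Source Folder\File.pdf", one per file.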

Technically speaking, the project was done.  If I needed more MHT conversions than I could accommodate within the limited private usage of VeryPDF's htmltools.exe, I would go back to the five options enumerated at the start of this last section of this post.  Since I already had all this stuff in mind, and my Excel spreadsheet was set to go, I ran a couple more lines:
DIR D:\*.mht /s /a-d /b > D:\MHTlist.txt
DIR E:\*.mht /s /a-d /b >> D:\MHTlist.txt
to see if I had any other MHTs on D or E.  (Note the double >> marks in the second line -- that says add to MHTlist.txt instead of overwriting it, if it already exists.  Of course, once I had the command set, I could just hit the Up arrow in the command window to bring the previous command back, after running it, and then use Home and left & right arrow keys to revise it.)  This gave me a file called MHTlist.txt, containing a list of additional MHTs that I thought I might as well convert to PDFs while I was at it.  For these, the command lines would produce a PDF back in the source folder.  Once those PDFs were created in the source folders, I used Excel (and could probably also have used Ctrl-H in Notepad) to do a DIR [filename].* >> listing (which would show both \Source Folder\File.mht and \Source Folder\File.pdf in the resulting dirlist.txt file) for each specific file that I had converted.  This produced a nice pair for each filename (i.e., x.mht and x.pdf).  The process seemed to work.  Now I just needed one more go with Excel, to produce DEL lines that would get rid of the MHTs in the source folders.  One more check:  no MHTs left.  Project completed.

Tuesday, January 3, 2012

Converting Scattered WMA Files to MP3

I had .WMA files scattered around my hard drive.  I wanted to convert them to .MP3.  I could search for *.WMA, using a file finder or search tool like Everything, thereby seeing that those files were in many different folders.  Having already sorted them, I didn't want to move them anywhere for the format conversion.  I wanted to convert them right where they were.  A command-line tool would do this.  The general form of the command would be like this:  PROGRAM SOURCE TARGET OPTIONS.  For PROGRAM, I would enter the name of the command-line conversion program that I was going to be using.  For SOURCE and TARGET, I would enter the full pathname (i.e., the name of the folder plus the name of the file, like "D:\Folder\File to Convert.wma," where the target would end in mp3 rather than wma).  OPTIONS would be specified by the conversion program.  For instance, there might be an option allowing me to indicate that I wanted the resulting MP3 file to be 64 kbps.

The problem was, I didn't have a command-line WMA to MP3 conversion tool.  I ran a search and wound up trying the free Boxoft WMA to MP3 Converter.  (They also had lots of other free and paid conversion and file manipulation programs.)  When I ran their converter, it steered me to an instruction file that inspired me to compose the following command (all on one line):

AlltoMp3Cmd "D:\Folder\Filename.wma" "D:\Folder\Filename.mp3" -B56
I had to use quotation marks around the source and target names in some cases (though not in this particular example) because some of the path or file names contained spaces.  The -B56 option was supposed to tell it to produce a 56-bit MP3.  (I also tried it with a space:  "-B 56".)  I was able to produce similar commands en masse, for all of the WMAs that I wanted to convert, by exporting the results of the *.WMA search from Everything to a text file called wmalist.txt, making sure to remove entries for files that I did not want to convert.  (At the root of each drive containing files of interest, I could also have used this command, assuming wmalist.txt did not already exist:  dir *.wma /b /s >> D:\wmalist.txt.)  I then massaged the contents of wmalist.txt using Microsoft Excel.  So now I had all of these AlltoMp3Cmd commands ready to run.  I copied them all into a Notepad file named Renamer.bat.  All I had to do was double-click on it in Windows Explorer and it would run.

I decided to try Renamer.bat with just one WMA file.  So I created another file, X.bat, with just one line in it, like the line shown above.  To run X.bat from the command line, so that I could see what it was doing, I would need a command window that was ready to execute commands in the folder where X.bat was located.  Problem:  X.bat was not in the same folder as Boxoft's AlltoMp3Cmd.exe executable program, so X.bat would fail.  If I didn't want to get into changing the computer's PATH, I could either put X.bat in the Boxoft program folder or I could copy AlltoMp3Cmd.exe to the folder where X.bat was located.

Either way, I needed to open a command window in one of those two folders, so as to run X.bat.  I could start from scratch (Start > Run > cmd) and use commands (e.g., "D:" would take me to drive D and "cd \Folder" would take me to the folder where Filename.wma was located), or I could use Ultimate Windows Tweaker to install a right-click option to open a command window in any folder.  I had already done the latter, so this step was easy.

Once I had sorted out all that, I was ready to try running X.bat.  But when I did, it crashed the AlltoMp3Cmd.exe program.  If I clicked on Cancel when I got the crash dialog, the command window said this:
Exception Exception in module AlltoMp3Cmd.exe at 0005B4E1.
Installation file incorrect. Please re-install it!.
But reinstalling the Boxoft program didn't help.  I sent them a note to let them know of this problem and decided to try other approaches.  One possibility was that their program was suitable for Windows XP but not Windows 7, which I was using.  It didn't seem to be a question of how the main program was installed, since the error message was referring specifically to the AlltoMp3Cmd.exe command-line executable (which presumably would be the same on any Windows system).

I decided to try running it in a Windows XP virtual machine (VM).  I had already installed Microsoft's Windows Virtual PC, which came with a WinXP VM, so I fired it up to try the same command line in the same folder.  To move quickly to the proper folder in the WinXP command window, I ran my trusty old RegTweak2.reg file, created in Notepad, to install a right-click option to open a command window in any folder in Windows Explorer.  But when I tried to use it, I got an error:
'\\tsclient\D\Folder Name\Subfolder Name'
CMD.EXE was started with the above path as the current directory.
UNC paths are not supported.  Defaulting to Windows directory.
'\\tsclient\D\Folder Name\Subfolder Name'
CMD does not support UNC paths as current directories.
A bit more playing around persuaded me that what this message meant was that command-line work in the VM would have to be done on what the VM considered a "real" (actually a virtual) drive -- in other words, drive C.  So I put copies of X.bat and AlltoMp3Cmd.exe into the VM's drive C, in a new folder I called Workspace, and I tried running X.bat from the command line there.  But again I got an error:  "AlltoMp3Cmd.exe has encountered a problem and needs to close."  Maybe the program wasn't built to handle paths.  For whatever reason, it looked like the Boxoft AlltoMp3Cmd command-line utility was not going to work for me.

A search in CNET brought up some other possibilities.  One was IrfanView, reminding me that I had used that program to work partway through a somewhat similar problem months earlier.  Using IrfanView version 4.28 and various insights described more fully in that other writeup (and in a recent thread), I went back to my original list of files in wmalist.txt and prepared this command:
i_view32.exe /filelist=D:\wmalist.txt /convert=$D$N.mp3
This command was supposed to use the file names ($N) and directories (i.e., folders, $D) specified in wmalist.txt to produce MP3 files with those same names, in those same directories.  Before trying it out, I made a copy of wmalist.txt and changed the original so that it contained only two lines, referring to WMA files on two different drives.  I ran the command shown above in a CMD window.  I got an error:
'i_view32.exe' is not recognized as an internal or external command, operable program or batch file.
In other words, Windows 7 did not know where to look to find IrfanView.  I could have taken the steps mentioned above, moving the .txt file to wherever i_view32.exe was located; but since I used IrfanView often, I wanted to add it to the PATH variable so that Windows would permanently recognize it.  The solution was to go to Start > Run > SystemPropertiesAdvanced.exe (also available through Control Panel > System > Advanced System Settings) and then click on Environment Variables > System Variables > highlight Path > Edit.  To see clearly what I was doing, I cut the existing Variable Value out of the dialog and doctored it in Notepad.  The basic idea was to add, to the end of the existing value, a semicolon and then (without adding a space after the semicolon) paste the location of i_view32.exe (found easily enough via an Everything search > right-click > Copy path to clipboard).  I made sure to add a final backslash ("\") after the path to i_view32.exe.  I pasted that back into the dialog and OKed my way out of System Properties.

I went back into the command window, pressed the Up arrow key to repeat the command ... and it still didn't work.  I thought that possibly I would have to reboot to have the new PATH definition take effect.  That was the answer to that particular problem.  After rebooting, in a command window, I ran the command shown above, and there were no errors.  IrfanView was open, but nothing was in it.  I ran searches in Everything for the two files in my test WMAlist.txt file, with wildcard extensions (i.e., I searched for Filename.*).  No joy:  there were no MP3 versions of those files.  I tried a modified version of the command:
i_view32.exe /filelist=D:\wmalist.txt /convert=D:\*.mp3
but that produced no output in D.  The IrfanView command was not working.  I tried yet another variation, as above but without "D:\" but that wasn't it either.  I tried the original command without using the filelist option:
i_view32.exe "D:\File Path\File Name.wma" /convert=$D$N.mp3
This produced an error:
Error!  Can't load 'D:\File Path\File Name.wma'
Did that mean that the /convert option was not being recognized?  Everything indicated that no MP3 file had been created.  And why would IrfanView be unable to load the existing WMA file?  It could load it easily enough from Windows Explorer or Everything.  I tried again:
i_view32.exe "D:\File Path\File Name.wma"
That worked:  IrfanView played the file.  So the convert option was the problem.  Another variation:
i_view32.exe "D:\File Path\File Name.wma" /convert="File Name.mp3"
If that did work, I wasn't sure where the output file would turn up.  No worries there:  it didn't work.  I got the "Can't Load" error again.  IrfanView's help file said that it did support wildcards for /convert, so that was presumably not the problem.  I had seen an indication that IrfanView would not batch-convert certain kinds of files, but WMA was not on the list I saw.  I was going to post a question in the relevant IrfanView forum, but at this point they weren't letting me in, for some reason.  Eventually it occurred to me to look in IrfanView's File > Batch Conversion/Rename area, where it appeared that the program would support only image conversions, not audio.

It seemed I would need to continue searching for a command-line option.  Back at that CNET search, I looked at the Koyote Free Mp3 WMA Converter -- from another company that offered multiple free conversion products -- but saw no indications that it had command-line options.  Likewise for Power MP3 WMA Converter and others.

I finally opted for a kludge solution.  Using an Excel spreadsheet, I created a batch file (again, using techniques described in the other post referenced above and elsewhere) to rename each file in WMAlist.txt to a unique name (example:  ZZZ_00001.wma) -- after making sure I did not already have any files with that kind of name.  The unique names would help to ensure that all WMA files would get the treatment, even if two of them had the same original name.  This produced 386 files.  Then, using Everything, I selected and moved all ZZZ_*.wma files to D:\Workspace.  Somehow, only 375 files made it to that folder.  It turned out that I had inadvertently included WMA program files from drive C after all, which I had not wanted to do, and for some reason a few of those were not moved to D:\Workspace -- probably due to insufficient rights.  So now I would have to undo that damage.

After taking care of that, in D:\Workspace, I tried the Boxoft program again, this time using its Batch Convert mode.  It took a while.  Spot checks suggested that the conversion quality was good.  I wasn't sure what bitrate to use to convert the files.  It seemed that, at 56 kbps for what appeared to be a bunch of voice (not music) files, I erred on the high side:  I started with 353 WMA files occupying a total of 237MB, and I ended up with 353 MP3 files occupying 405MB.  The files were converted at what appeared, at quick glance, to be a rate of about 6MB per minute.  I then revised the spreadsheet to produce batch file command lines that would move those MP3s back to the folders where the similarly named WMA files had been and rename them back to their original names (but with MP3 extensions).
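The generated move-and-rename lines, to give a hypothetical example (the destination folder here is invented), looked something like this, one per file:
MOVE "D:\Workspace\ZZZ_00001.mp3" "D:\Original Folder\Original Name.mp3"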

Thursday, April 21, 2011

Repairing Damaged JPGs

I was using Windows 7.  I had run a test and had determined that I had a bunch of damaged JPG image files.  Apparently this could happen sometimes when files were saved on a CD or other drive with an iffy file table.  In my case, it did not help to try to open the files in question on a different computer.  It was also not a case of recovering data from a damaged memory card, for which a tool like ZAR digital image recovery might be needed.  This was a situation of already having the files on the hard drive, but not being able to view them.  So:  how to repair them?

One possibility was to buy PixRecovery for $50.  They had a demo version, so I downloaded and tried that.  I had used Bulk Rename Utility to rename the corrupted JPGs so that their names ended with " - corrupted" and had no JPG extension, so that I could pull them out if they got mixed in with good JPGs.  But unlike IrfanView, PixRecovery was not able to detect them until they did have a JPG extension, so I had to rename them back again.  PixRecovery did not appear to be able to process the JPGs in bulk; I would have to fix them one at a time.  On the first one I tried, I got "No data to recover detected."  They did give me an option to "order a paid file review" for $199 per file.  I tried another file.  The program didn't remember the output directory I had just specified for the first one, so I had to trace back through the directory structure to find it again.  This time, I got a message, "Recovered with demo restrictions."  It didn't show me the actual picture, though, even with a watermark or stamp on it; it just showed me a JPG saying, "The image has been repaired in demo mode by PixRecovery."  So I couldn't verify that the picture was fully restored; I would just have to take their word for it until I paid and tried it.  PixRecovery also gave me a Corrupted Data Analysis Report for the "restored" photo, with a statement of recoverability (i.e., Low, Average, or Good).  This seemed like something they could have provided on a batch basis for all files -- at least in the paid version, if not in the free -- so that the user would not have to go through the manual steps for each photo regardless of recoverability.

Among sites offering to provide a file examination for a fee, VG Jpeg-Repair offered an online service that would evaluate up to 100MB of JPGs for 1 Euro.  Alternatively, it sounded like the user could pay them 20 Euros and get an evaluation of an entire set of JPGs, and then pay around $1 per JPG for the ones that they could repair.  I didn't investigate this too closely at this point; this just seemed to be the general idea.  I recalled seeing other pay-per-file sites, but didn't look into those either at this stage.  An eHow article pointed toward several other data recovery services, including Ontrack Data Recovery, Total Recall Data Recovery, and ACE Data Group.  Previous exposure and brief examination of these sites suggested that they were more oriented toward recovering data from damaged drives, though no doubt they could recover photos too -- but that they could be very expensive.

I found a review of Jpeg Repair Picture Doctor ($100) that made it sound like software to avoid, in the sense that it could trash good photos and pretend that it had restored bad ones.  On the other hand, it apparently had a batch process and a trial phase, so if there was a good backup, it seemed like a way of possibly reducing the number of corrupted JPGs to restore.  Another review said that the best JPG repair option was to use PhotoRescue.  It looked like it was for recovering lost data from drives, not for JPG repair.  They offered Wizard 3.1 for everyone ($29), Expert 2.1 for power users ($29), and Advanced 2.1 for ultimate experts ($99).  I tried the Expert 2.1 demo.  It was indeed oriented toward recovering drives.  I couldn't figure out how to use it for fixing JPGs.

Another program, JPEG Recovery Pro ($50), seemed to be offering a 15-day trial that would at least show me low-quality watermarked copies of the photos after recovery.  They also had a Basic version ($40), but it seemed to lack some features that would be useful when editing numerous JPGs.  I downloaded and tried their Pro version.  When I ran it, I got an error:  "Access violation at address 00846DC1 in module 'JPEGRec5.exe'."  Possibly it was due to the fact that I installed without first shutting down all other programs.  I uninstalled and tried again.  That wasn't it.  I tried it on a different folder.  It worked.  Apparently the file and folder name combination was too long.  I moved the folder to a higher-level location, so that the full pathname would be shorter, and tried again.  Nope.  I removed spaces from the folder name.  No.  I made the folder name shorter than 9 characters.  No.  I moved the files from that folder to another high-level folder with an eight-character name.  Still got the error.  I put a copy of one of the corrupted JPGs in a different folder.  The program ran on that folder, in the sense of detecting several JPGs there, but it did not detect this particular JPG.  So, hmm, this program was a possibility, but I'd have to tinker with it to make it work.

There may have been other possibilities.  I did not fully explore the results of my search.  But at this point it did start to seem that, if I wanted to download a program and do my own file recovery, it appeared that it would have to be a manual, one-by-one recovery process, whether using PixRecovery or some other program.  I ran across references to JPEGsnoop and other programs that likewise seemed to require the user to do bit-by-bit editing of the JPG file in ways that were sometimes described as difficult.  It appeared that JPEGsnoop might provide a relatively easy way to locate where the errors were.

I looked at the corrupted JPGs in IrfanView again.  For the ones I looked at, the error was the same:  "Can't read file header!  Unknown file format or file not found!"  I did a search and found people going through various struggles.  One suggestion was to try opening the corrupted JPGs in a different image editing program.  IrfanView was often recommended, but these files had already failed there, so now I tried Gimp.  It allowed me to select and try opening all of the corrupted JPGs.  None opened, so Gimp did not seem superior to IrfanView in this task.  Gimp did, however, produce more detailed error messages.  It showed only the first several onscreen and then redirected the rest of them to stderr.  I wasn't sure where that was.  The several that did appear onscreen indicated that several files had similar problems:  "starts with 0x31 0xd9" or "starts with 0xaa 0xe9."  These seemed to mean I would have to edit the files manually to correct those starting errors.

Microsoft said that stderr meant the command prompt window.  It seemed I could capture the full log of errors by starting Gimp from the command line and redirecting that output to a text file.  Right-clicking > Properties on the Gimp icon that I clicked on to run the program told me where the Gimp .exe file was.  I opened a command window there and typed the needed command.  I was running the portable version of Gimp, so in my case the command was:

start "" "W:\Start Menu\Programs\Multimedia\Images\Editors\GIMPPortable\GimpPortable.exe" > D:\GimpLog.txt
So then, when Gimp started, I tried again to open all those JPGs.  When Gimp was done trying and failing, I opened D:\GimpLog.txt.  Unfortunately, there was nothing in it, so apparently I hadn't done that quite right.
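In retrospect, I suspect there were two problems with that command:  the redirection applied to the START command itself rather than to the program it launched, and > captures standard output, whereas Gimp was sending those messages to stderr.  Something like this (untested) might have come closer, though the portable launcher may itself hand off to a separate Gimp executable:
"W:\Start Menu\Programs\Multimedia\Images\Editors\GIMPPortable\GimpPortable.exe" 2> D:\GimpLog.txt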

I still didn't have a plan for what I would do if I did find out exactly what the errors were, so I paused the error output project to think about that.  I decided that these were old files, and there was really no urgency to this project.  There was always the possibility that, in the next year or however long it would be until I would get back to it, someone would come along with a cheap or free program or other great solution that would really take care of it, without all that manual editing.  Therefore, I shelved this project for the time being.