
Tuesday, February 14, 2012

Batch Merging Many Scattered JPGs into Many Multipage PDFs - Clarified

I had recently streamlined a spreadsheet-oriented technique for working up file names and batch commands to merge multiple JPGs into single- or multi-page PDFs.  I had yet another task calling for that process.  This post presents the same steps as in that previous writeup, but in clearer and more organized terms.  The various steps are described in different terms in the previous post and in its much more convoluted predecessor.  Note that the simplification of steps here yields a longer explanation.

Generate the Input PDF Filenames

In this discussion, the project had already been brought to the point where there was a spreadsheet containing the names of JPGs to be converted to PDF.  (A list of such files could come from a command like DIR D:\*.jpg /s /a-d /b > JPGLIST.TXT.)  In that spreadsheet, row 1 was reserved for column names (e.g., "Original Filename," "Path").  The full path of the first file to be converted appeared in cell A2 (e.g., "D:\Folder 1\Test File.jpg"), and the others appeared below it in column A.

Column B was used to calculate the endpoint of the path, that is, the position of the last backslash in the full file name (e.g., in "D:\Folder 1\Test File.jpg").  This calculation required or at least benefited from the use of a REVERSE function.  The REVERSE function was created by inserting the following text into a module:

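Function Reverse(Text As String) As String
    ' Return Text with its characters in reverse order.
    Dim i As Integer
    Dim StrNew As String
    Dim StrOld As String
    StrOld = Trim(Text)
    ' Prepend each character in turn, so that the last one ends up first.
    For i = 1 To Len(StrOld)
        StrNew = Mid(StrOld, i, 1) & StrNew
    Next i
    Reverse = StrNew
End Function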
To create the module, I closed all Excel files other than the one being worked on.  I went into Tools > Macro > Visual Basic Editor > Insert > Module.  I pasted the text there.  I went to File > Close and Return to Microsoft Excel.  The formulas in the next several cells were then as follows:
B2:  =LEN(A2)-FIND("\",REVERSE(A2))+1
C2:  =LEFT(A2,B2)
D2:  =MID(A2,B2+1,LEN(A2))
I was reconstructing some of these steps after the fact; hopefully I was not misstating anything or leaving anything out of this description.  I focused on the formulas for row 2 because, when I was done filling out row 2, it would be easy to copy all of its formulas at once to fill the spreadsheet, so that the same formulas would be applied to the filenames in rows 3, 4, 5 et seq.
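
For readers who would rather script this step than build it in a spreadsheet, the three formulas above can be sketched in Python.  This is only an illustration of the same split at the last backslash; the function name is hypothetical, and the sample path is the one used in the example above.

```python
def split_at_last_backslash(full_path: str) -> tuple[str, str]:
    """Return (folder with trailing backslash, filename)."""
    # Same role as column B: 1-based position of the last backslash.
    last = full_path.rfind("\\") + 1
    # Column C is LEFT(A2,B2); column D is MID(A2,B2+1,LEN(A2)).
    return full_path[:last], full_path[last:]


folder, name = split_at_last_backslash(r"D:\Folder 1\Test File.jpg")
print(folder)  # D:\Folder 1\
print(name)    # Test File.jpg
```

A path with no backslash at all comes back with an empty folder part, where the spreadsheet version would show an error in column B.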

The next step was to identify which files I wanted to convert.  I was able to use the spreadsheet to help in this process.  For example, I had files with names like these:
Test File page 01.jpg
Test File page 02.jpg
I could determine that these belonged together by using a column that showed these filenames without the ending X characters.  That column would use a command like =RIGHT(D2,$E$1), where I would put a number in E1.  The number I used in this example was 12, because I would have to eliminate 12 characters from the right end of "Test File page 01.jpg" to get the name that I wanted to give to the new PDF, which would be simply "Test File."  I manually entered 12 in cell E2.  I filtered the rows so that the ones with numbers in column E would not show, so as to reduce clutter.  I repeated this process until each row had a number in column E, indicating how many characters I had to trim from the right end of the filename to get a common stem.  Then cell F2 contained =LEFT(D2,LEN(D2)-E2), to produce the stem filename.  So, for example, if there were two pages that had to go into Test File.pdf, then (assuming I had sorted my spreadsheet correctly), the contents of cells F2 et seq. would look like this:
F2:  Test File
F3:  Test File
F4:  Another File
and so forth, continuing down the list, so that each JPG would be assigned to a stem filename.  I discovered at this point that I had brought over a bunch of JPGs that did not belong in a multipage PDF, or at least I could not be sure of it.  I put these back where they came from, and thus had a smaller set of files to work on.
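
The stem calculation in column F can be sketched in Python as follows.  The trim counts are still supplied by hand, as in column E; the function name and the third sample filename are made up for illustration.

```python
def stem(filename: str, trim: int) -> str:
    """Drop the last `trim` characters, like =LEFT(D2,LEN(D2)-E2)."""
    return filename[: len(filename) - trim]


# (filename, number of characters to trim), as in columns D and E.
rows = [
    ("Test File page 01.jpg", 12),
    ("Test File page 02.jpg", 12),
    ("Another File 1.jpg", 6),  # hypothetical third file
]
for name, trim in rows:
    print(stem(name, trim))
```

Slicing to `len(filename) - trim` rather than `-trim` keeps a trim count of zero from returning an empty string.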

Now I could sort the spreadsheet to ensure that the JPGs I wanted to merge into a specific PDF would be next to each other.  Once I had the files sorted into the desired order, I used column G as an Index column, numbering each row in that order.  This column would be useful to restore the desired order if I re-sorted the spreadsheet for some reason, and it would also provide the key ingredient in the unique filename that I was about to calculate.  (To generate an index number, I used either a +1 formula, incrementing the number in the preceding row by 1, followed by an Edit > Copy, Edit > Paste Special > Values sequence to convert those formulas to fixed numbers that would not be affected by re-sorting the spreadsheet; or, easier, I entered 1 in the first row, 2 in the second row, and then highlighted the cells to be filled and used Edit > Fill > Series to add the sequential numbers.)  Then I used columns G through I to calculate a simple, unique filename and produce a renaming command:
G2:  unique Index number
H2:  ="Input_"&REPT("0",4-LEN(G2))&G2&".pdf"
I2:  ="ren "&CHAR(34)&A2&CHAR(34)&" "&H2
Cell H2 would produce the file name of "Input_0001.pdf."  (More or fewer digits could be appropriate, depending on the number of files involved.)  This name worked for me because a search of my files verified that my computer contained no other files whose names began with Input_.  Otherwise, a different name might have been advisable.  Cell I2 would produce a command that would rename Test File page 01.jpg to be Input_0001.pdf.  (Of course, this wasn't a conversion.  IrfanView would be doing that.  This was just a translation of file names, to make subsequent steps easier.)

Then, as usual, I copied these commands from row 2 down to fill the remaining rows of the spreadsheet.  Next, I copied all of the commands from column I into Notepad, saved that file with a .bat extension (e.g., "Renamer.bat"), and ran it in Windows Explorer.  That renamed all of the JPGs that I hoped to merge into multipage PDFs.  (Obviously, I had a backup.)
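
As a rough equivalent of columns H and I, the following Python sketch builds the zero-padded working name and the matching REN line.  The function names are hypothetical, and the sample path is an assumption modeled on the examples above.

```python
def input_name(index: int, digits: int = 4) -> str:
    """Zero-padded working name, like ="Input_"&REPT("0",4-LEN(G2))&G2&".pdf"."""
    return f"Input_{index:0{digits}d}.pdf"


def ren_command(full_path: str, index: int) -> str:
    """One line for Renamer.bat: quoted source path, bare target name."""
    return f'ren "{full_path}" {input_name(index)}'


print(ren_command(r"D:\Folder 1\Test File page 01.jpg", 1))
# ren "D:\Folder 1\Test File page 01.jpg" Input_0001.pdf
```

The target is a bare filename because REN renames a file in place and does not take a destination path.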

Now I would use column J for commands to move all those JPGs to a single folder.  (I did the renaming before the moving in case I had two separate sets of incoming JPGs with the same name, such as D:\Folder A\Test File page 01.jpg and E:\Folder R\Test File page 01.jpg.)  In my actual example, the files were all moved to V:\Workspace, so my command in cell J2 looked like this:
J2:  ="move /-y "&CHAR(34)&C2&H2&CHAR(34)&" V:\Workspace"
I combined the commands in column J into another batch file, Mover.bat, and used it to move all those renamed JPGs to V:\Workspace.  Now that they were together, I could use IrfanView's File > Batch Conversion/Rename option to convert all those JPGs to single-page PDFs.  I output those PDFs to V:\Workspace and then removed the JPGs from V:\Workspace to V:\Workspace\Original JPGs, where they would probably not be needed further.

Generate the File Value Lines

So now I had files starting with Input_0001.pdf in V:\Workspace. The project now was to merge those single-page input PDFs into multipage output PDFs. In column K, I entered a number indicating where each input PDF should go. As I could see from column F (above), the files that used to be called Test File page 01.jpg and Test File page 02.jpg, which had now been converted to uniquely named single-page PDFs, were both going to end up in Output_0001.pdf, so both of those had the number 1 in column K.   I needed to pad out that number so that the resulting filename would sort correctly. So the next formulas were as follows:
K2: output PDF number, entered manually (e.g., 1)
L2: =REPT("0",4-LEN(K2))&K2
But I wasn't ready, yet, to generate the ultimate Output PDF files.  I needed to make sure that PDFsam (below) would have all the instructions needed to insert multiple single-page PDFs into one Output PDF.  Those instructions would come from lines of XML code, one line per input PDF.  The lines of XML code would be combined into a smaller number of FileValueLines files.  In my example, the instruction lines for the erstwhile Test File page 01.jpg and Test File page 02.jpg files (now called Input_0001.pdf and Input_0002.pdf) would both go into a single FileValueLines file.  A FileValueLines file could contain any number of lines of XML code; it just depended on how many single-page PDFs were going to be combined into it.  What I needed, then, was the command that would shovel those XML code lines into the appropriate FileValueLines files.  The next formulas in my spreadsheet looked like this:
M2:  ="FileValueLines_"&L2
N2:  ="echo ^<file value="&CHAR(34)&"V:\Workspace\PDF\"&H2&CHAR(34)&" /^> >> "&M2
and cell N2 produced this command:
echo ^<file value="V:\Workspace\PDF\Input_0002.pdf" /^> >> FileValueLines_0001
Note that this command assumed that the PDFs created by IrfanView were in a folder called V:\Workspace\PDF.  I filled the lower rows of the spreadsheet with these formulas and copied the commands in column N into a new Notepad file called _ValueMaker.bat.  I ran _ValueMaker.bat in V:\Workspace. It produced 545 FileValueLines files in V:\Workspace, as expected.  (I added an underscore before the file's name so that it would appear near the top of the potentially cluttered list of files in V:\Workspace.)
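
The grouping that _ValueMaker.bat performs can also be sketched directly in Python, skipping the echo commands.  This is a sketch under the same assumptions as above (single-page PDFs in V:\Workspace\PDF, four-digit output numbers); the function names are hypothetical.

```python
from collections import defaultdict


def file_value_line(pdf_name: str, folder: str = "V:\\Workspace\\PDF\\") -> str:
    """One XML line naming a single-page input PDF."""
    return f'<file value="{folder}{pdf_name}" />'


def group_lines(rows):
    """rows: (input PDF name, output number) pairs, as in columns H and K."""
    groups = defaultdict(list)
    for pdf_name, out_no in rows:
        groups[f"FileValueLines_{out_no:04d}"].append(file_value_line(pdf_name))
    return dict(groups)


rows = [("Input_0001.pdf", 1), ("Input_0002.pdf", 1), ("Input_0003.pdf", 2)]
for target, lines in group_lines(rows).items():
    print(target, len(lines))
```

Each key corresponds to one FileValueLines file, and the length of its list is the number of pages that will go into that output PDF.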

This was a point of transition.  Until now, I had been working with a total of 3,894 spreadsheet rows, one for each of the 3,894 JPGs that I wanted to combine into multipage PDFs.  Each of those 3,894 JPGs (converted into single-page PDFs called Input_0001.pdf et seq.) was represented in exactly one File Value line.  Those 3,894 File Value lines were now contained within a total of 545 FileValueLines_ files.  In other words, we were transitioning from a focus on 3,894 individual JPGs to a focus on 545 multipage PDFs.

Building the XML Files

The 545 FileValueLines files (FileValueLines_0001 and so forth) needed more lines before they would function.  To add those lines, I needed a new spreadsheet.  I started this one with a list of the FileValueLines files I had just created.  (A command like "dir FileValueLines* /b /a-d > _FVLlist.txt" would produce that list.)  In other words, column A contained the list of the new FileValueLines files.  Next, I calculated the name of the resulting XML file, and then wrote the formula that would create those XML files:
B2:  =RIGHT(A2,4)
C2:  ="XMLfile_"&B2&".xml"
D2:  ="copy /b _Header.txt+"&A2&"+_Tailer.txt "&C2
That formula in D2 produced the command that I would need to create the complete XML files:  "copy /b _Header.txt+FileValueLines_0001+_Tailer.txt XMLfile_0001.xml."  This command called for me to create just one _Header.txt and one _Tailer.txt file that would be added to the start and end of each of my XML files.  The _Header.txt file read as follows:
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
I saved those two lines in a new _Header.txt file.  It was important to make sure the file ended on a blank line.  In other words, if I moved my cursor down to the end of _Header.txt, it needed to be on the line below <filelist>, not at the end of the line containing <filelist>.  The same was true for _Tailer.txt, which contained this line:
</filelist>
Once _Header.txt and _Tailer.txt existed, I could run the commands contained in column D.  I copied those 545 commands into a new batch file called _Builder.bat and ran it.  That gave me 545 XML files, starting with XMLfile_0001.xml.  Again, V:\Workspace was filling up, so I moved the no longer needed FileValueLines files to their own archival subfolder.
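
The header + lines + tailer assembly can be sketched in Python, with a quick parse to confirm that the result is well-formed XML and contains the expected number of inputs.  The function names are hypothetical; the header and tailer text are the ones shown above.

```python
import xml.etree.ElementTree as ET

HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<filelist>\n'
TAILER = "</filelist>\n"


def build_xml(file_value_lines) -> str:
    """Same result as copy /b _Header.txt+FileValueLines_nnnn+_Tailer.txt."""
    body = "".join(line + "\n" for line in file_value_lines)
    return HEADER + body + TAILER


def count_inputs(xml_text: str) -> int:
    """Parse the XML (raises ParseError if malformed) and count <file> entries."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return len(root.findall("file"))


xml_text = build_xml([
    '<file value="V:\\Workspace\\PDF\\Input_0001.pdf" />',
    '<file value="V:\\Workspace\\PDF\\Input_0002.pdf" />',
])
print(count_inputs(xml_text))  # 2
```

The explicit newline after each piece plays the role of the blank final line in _Header.txt: without it, the first file line would be glued onto the same line as the filelist tag.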

Generating the PDFsam Commands

At this point, I would use a batch file to generate batch files.  This would call for the same kind of process as above:  a header text file (but no tailer) containing lines that would be standard in all of these batch files, combined with another text file containing unique commands.  In this new _HeaderB.txt file, I placed these lines:
@echo off

set JAVA=%JAVA_HOME%\BIN\JAVA

set JAVA_OPTS=-Xmx256m -Dlog4j.configuration=console-log4j.xml

set CONSOLE_JAR="C:\Program Files (x86)\pdfsam\lib\pdfsam-console-2.3.1e.jar"

@echo on
Apparently these first lines set up a few essential Java variables.  (In case of difficulty here, as elsewhere, see the preceding post.)  The second part of the batch file -- the unique command -- would tell PDFsam (below) what to do.  The command looked like this:
E2:  ="echo %%JAVA%% %%JAVA_OPTS%% -jar %%CONSOLE_JAR%% -l V:\Workspace\"&C2&" -o V:\Workspace\Merged\Output_"&B2&".pdf concat >> Command_"&B2&".txt"
Basically, this would produce a command that said, "Write a command that will create Output_*.pdf, and put that command in a file called Command_*.txt."  In other words, that spreadsheet formula in cell E2 produced a command that looked like this:
echo %%JAVA%% %%JAVA_OPTS%% -jar %%CONSOLE_JAR%% -l V:\Workspace\XMLfile_0001.xml -o V:\Workspace\Merged\Output_0001.pdf concat >> Command_0001.txt
Spreadsheet column E gave me 545 variations on that command.  I copied them all and ran them in a new batch file called _Texter.bat.  The resulting 545 Command_*.txt files would have to share V:\Workspace with my preexisting XMLfile_*.xml files; both would be needed shortly.

Now it was time to combine _HeaderB.txt and Command_*.txt into a batch file that would run PDFsam and produce a multipage Output PDF.  This was easy enough:
F2:  ="copy /b _HeaderB.txt+"&RIGHT(E2,16)&" Ready_"&B2&".bat"
The 545 versions of that command, run via _Commander.bat, created 545 Ready_*.bat files, each containing the header information from _HeaderB.txt plus the single command line from Command_*.txt.  Now I was done with the Command_*.txt files, and could archive them.
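
The two-stage approach above (_Texter.bat, then _Commander.bat) can be collapsed into one step when scripted.  The following Python sketch returns the full text of one Ready_*.bat file; the header lines and paths are the ones shown above, and the function name is hypothetical.

```python
HEADER_B = "\n".join([
    "@echo off",
    r"set JAVA=%JAVA_HOME%\BIN\JAVA",
    "set JAVA_OPTS=-Xmx256m -Dlog4j.configuration=console-log4j.xml",
    r'set CONSOLE_JAR="C:\Program Files (x86)\pdfsam\lib\pdfsam-console-2.3.1e.jar"',
    "@echo on",
]) + "\n"


def ready_batch(number: str, workspace: str = "V:\\Workspace") -> str:
    """Text of Ready_<number>.bat: the shared header plus one PDFsam command."""
    command = (
        "%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% "
        f"-l {workspace}\\XMLfile_{number}.xml "
        f"-o {workspace}\\Merged\\Output_{number}.pdf concat"
    )
    return HEADER_B + command + "\n"


print(ready_batch("0001"))
```

Because the text is written straight into the .bat file rather than passed through ECHO, the variables take single percent signs here, not the doubled ones needed in _Texter.bat.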

Final Run:  Creating the Merged Output PDFs

If I double-clicked on one of those Ready batch files, it would run and produce a final merged Output.pdf.  But I wanted to run them all at once.  So, back to the spreadsheet:
G2:  ="call "&RIGHT(F2,14)&" >> _Errorlog.txt"
That produced "call Ready_0001.bat >> _Errorlog.txt."  I copied the 545 CALL commands from column G into a file called _Runner.bat and -- you guessed it -- I ran it.  It took a few minutes, but it produced 542 merged PDFs.  Not 545.  I searched _Errorlog.txt for occurrences of "Error."  Sure enough, three occurrences.  In each case, the error was the same:  "Input_0838.pdf (and Input_0947.pdf, and Input_1090.pdf) not found as file or resource."  Apparently PDFsam had declined to create those three Output PDFs without the offending Input files.  I checked and, indeed, those three files were not included among my input.  I was not sure what happened to them.  They appeared to have disappeared during this process.  I made a note to retrieve them from backup.
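
A check along the following lines, run before _Runner.bat, might catch missing inputs earlier than a search of the error log.  It is only a sketch: the function names are hypothetical, and it assumes the File Value lines have exactly the form shown above.

```python
import re


def referenced_inputs(xml_text: str) -> list[str]:
    """Paths named in the <file value="..."> lines of one XML file."""
    return re.findall(r'<file value="([^"]+)"', xml_text)


def missing_inputs(xml_text: str, existing: set[str]) -> list[str]:
    """Referenced paths that are not in the set of files actually present."""
    return [path for path in referenced_inputs(xml_text) if path not in existing]


sample = (
    "<filelist>\n"
    '<file value="V:\\Workspace\\PDF\\Input_0837.pdf" />\n'
    '<file value="V:\\Workspace\\PDF\\Input_0838.pdf" />\n'
    "</filelist>\n"
)
present = {"V:\\Workspace\\PDF\\Input_0837.pdf"}
print(missing_inputs(sample, present))
```

In practice the `existing` set could be built from a directory listing of the PDF folder, and the check repeated for each XMLfile_*.xml.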

Otherwise, though, the process seemed to have succeeded.  Spot checks indicated that I had working multipage PDFs.  Now it was time to rename them to be something other than Output_0001.pdf etc.  I went back to my first spreadsheet, saved it, and went to work on a temporary copy of it.  Specifically, I converted all of its formulas to fixed values.  (In Excel 2003, this involved going to the top left corner, selecting the whole spreadsheet, and using the Edit > Copy, Edit > Paste Special > Values combination.)  With that done, I was free to delete and rearrange columns as needed, without having cells recalculate in undesirable ways.  I kept only columns C (showing the original source folder), F (showing the name I had chosen for the combined output PDF), and L (showing the output file number).  I rearranged those columns in reverse order, so that the output file number column could function as a rather redundant index.  I went to the second spreadsheet and, working likewise on a copy of that, deleted all columns except column B, containing the output file number.  In an adjacent column, I used VLOOKUP to look up the filenames, from the other spreadsheet, to which these Output_0001.pdf etc. files should be renamed.
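
The VLOOKUP step amounts to a dictionary lookup, which can be sketched in Python as below.  The exact rename commands used in this project are not shown in this post, so the REN format here is an assumption, as are the function name and the sample data.  Unlike a bare VLOOKUP, the sketch returns the output numbers that found no match, so that they can be reviewed rather than silently skipped.

```python
def final_rename_commands(output_numbers, names_by_number):
    """Build REN lines for matched outputs; collect numbers with no name."""
    commands, unmatched = [], []
    for number in output_numbers:
        name = names_by_number.get(number)
        if name is None:
            unmatched.append(number)
        else:
            commands.append(f'ren Output_{number}.pdf "{name}.pdf"')
    return commands, unmatched


# Hypothetical lookup table: output number -> chosen stem (columns L and F).
names = {"0001": "Test File", "0002": "Another File"}
commands, unmatched = final_rename_commands(["0001", "0002", "0003"], names)
print(commands)
print(unmatched)
```

Keeping the output numbers as text (with their leading zeros) on both sides of the lookup avoids a mismatch between "0001" and the number 1.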

Through such steps, I renamed these files.  Yet something went wrong.  Due to distraction and time delay during the last stages of this process, somehow the renaming did not bring me back to a coherent set of files.  Spot checks indicated that the ones that did rename were renamed correctly.  That is, if a file wound up being renamed as Letter to Joe, it did seem to be exactly that.  But I was left with 60 files -- more than 10% of the total -- that did not rename at all.  There may have been a way to fix them through the spreadsheet, but I had run out of time for the project.  Under the circumstances, I defaulted to putting those files aside into a separate folder, pending a future manual renaming effort.

The process described in this post did seem to work.  But its numerous steps continued to hold the threat of something going wrong, somewhere along the line, and being difficult to retrace.  This did not seem to be nearly the last word on batch merging scattered JPGs into multipage PDFs.

Monday, February 6, 2012

Batch Merging Many Scattered JPGs into Many Multipage PDFs - Streamlined

I was facing a task that I had undertaken before.  This presented an opportunity to revise and streamline the writeup that I had produced during that previous effort.

First, I had previously converted multiple JPGs to PDF, scattered among multiple folders, in some cases combining multiple JPGs into a single PDF.  The general approach taken there was to rename the JPGs to unique names, move them to a single folder, conduct my operations on them there, delete the JPGs, and move the new PDFs back to the folders where the JPGs had come from.

In a related effort, I had also previously converted MHTs to PDF.  Those were not multipage.  In that effort, I had taken the strategy of converting them in-place (i.e., without moving them to a single folder, transforming them there, and moving them back to the place of origin) and then deleting the MHTs.

Between these two strategies, I liked the former better.  Having all files in one folder made it easier to verify that my processes yielded the same number of output files as source files.  It was also easier to spot-check the output and make sure that the files looked like they should.

Now I wanted to clean up any remaining JPGs that should be converted and, in some cases, combined into PDFs.  Using techniques described in more detail in those previous posts, I ran a DIR to get a full list of JPGs.  I put that list into a spreadsheet, used a REVERSE function to identify paths, sorted the spreadsheet by path, and deleted those rows that contained images that I did not want to convert (e.g., photos).  This made it possible to reduce a starting set of thousands of JPGs into a list of a few hundred that actually needed to be converted.

Using my spreadsheet, I cooked up REN commands to run in a batch file.  This produced unique filenames, as described previously, so that I could use MOVE commands (or cut and paste them from a file finder) to pool them all into a single folder without fear of overwriting.  Of course, I needed to keep the spreadsheet, so that I would know what the original names and locations were, so that I could write MOVE commands to put the resulting PDFs back in the source folders.

Once the uniquely named JPGs were combined in one folder, I used IrfanView's batch capability -- again, as detailed in the previous posts -- to convert them to PDFs.  These were all one-page PDFs; I had one PDF per JPG.  I made sure the spreadsheet had a column associating these one-page PDFs (with names like ZZZ_0001.pdf) with their original filenames (e.g., Letter to Ed page 01.jpg).  That is, the current PDF name and the original JPG name would be on the same row.  Now I could use the original JPG name to calculate the name of the new multipage PDF (e.g., Letter to Ed.pdf).  So there would be spreadsheet rows like these:

ZZZ_0001.pdf   Letter to Ed.pdf
ZZZ_0002.pdf   Letter to Ed.pdf
ZZZ_0003.pdf   Letter to Jane.pdf
ZZZ_0004.pdf   Memo from ABCD.pdf
ZZZ_0005.pdf   Memo from ABCD.pdf
(Letter to Jane.pdf is there because I did a sweep for all JPGs, some of which would wind up having just one page.)  I sorted the spreadsheet alphabetically by the output filename and input PDF name (as shown, with the pages that would be going into Letter to Ed arranged in proper order, and with Letter to Ed coming before Memo from ABCD).  For convenience, I assigned a simple working name to the output filenames (e.g., Letter to Ed.pdf would be represented by YYY_0001.pdf, and Memo from ABCD.pdf by YYY_0002.pdf).  I expressed the relationship between the input (single-page) PDF filenames and the output (potentially multipage) PDF filenames with commands that would create the lines of the necessary XML files, in this format:
echo ^<file value="D:\Workspace\ZZZ_0001.pdf"/^> >> YYY_0001
That produced 460 batch file lines, each starting with "echo," that would generate most of the contents of 159 different XML files needed to produce 159 different single- or multi-page PDFs.  The XML files would each need to begin with these two lines:
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
and end with "</filelist>" (without quotes).  I put those lines into Header.txt and Tailer.txt, respectively, and then combined them with a batch file containing 159 lines like this:
copy /b Header.txt+YYY_0001+Tailer.txt YYY_0001.xml
That batch file gave me 159 XML files, starting with YYY_0001.xml.  Now I needed 159 new batch files, one for each XML.  These batch files would tell PDFsam to do the actual work -- to merge the single-page PDFs listed in the XMLs into an appropriate output PDF.  So at this point, D:\Workspace contained those 159 XMLs and the 460 single-page PDFs that would soon be merged.  As above, each of these 159 batch files would begin with several lines.  Working in a separate folder, I saved those several lines in a new Header.txt file:
@echo off

set JAVA=%JAVA_HOME%\BIN\JAVA

set JAVA_OPTS=-Xmx256m -Dlog4j.configuration=console-log4j.xml

set CONSOLE_JAR="C:\Program Files (x86)\pdfsam\lib\pdfsam-console-2.3.1e.jar"

@echo on
Those lines made some assumptions about environment variables, which I had already set.  Only the final line of each of those 159 batch files would vary.  That final line would have two variables:  it would name the XML file that listed the PDFs to be combined, and it would name the output file that would contain those PDFs, as in this example:
%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% -l D:\Workspace\YYY_0001.xml -o D:\Workspace\Merged\YYY_0001.pdf concat
I had to create the D:\Workspace\Merged folder to hold the output, and I had to write Excel spreadsheet formulas to mass-produce one final line, like the one just shown, for each of the 159 batch files.  The Excel formula I used was like this:
="echo %%JAVA%% %%JAVA_OPTS%% -jar %%CONSOLE_JAR%% -l D:\Workspace\"&A2&".xml -o D:\Workspace\Merged\"&A2&".pdf concat >> "&A2&".txt"
where cell A2 contained YYY_0001.  That gave me 159 batch commands that I put into Notepad, and saved and ran as Texter.bat.  That produced 159 files with names like YYY_0001.txt, each containing a single JAVA line.  Then, in the spreadsheet, I created another set of commands like this:
copy /b Header.txt+YYY_0001.txt YYY_0001.bat
to combine the header and the JAVA lines into 159 batch files, each of which would ask PDFsam to merge the single-page PDFs listed in the corresponding XML file into the appropriate single- or multi-page output PDF.  And finally, to run those 159 batch files and produce the desired PDFs, I used the spreadsheet to work up a batch file called Runner.bat that began like this:
@echo off
call YYY_0001.bat
call YYY_0002.bat
I had been doing some of this work in other folders, but at this point my simplistic approach would work only if I moved them all back to D:\Workspace before trying Runner.bat.

Altogether, the process worked for me, as it had before, and this time it took only a few hours to go through the foregoing steps, make the inevitable mistakes, and get the desired output.

The remaining step was to get these multipage PDFs back where they belonged.  I went back to the spreadsheet, prepared batch lines for that purpose, and finished the job.

Saturday, January 14, 2012

Batch Merging Many Scattered JPGs into Many Multipage PDFs - Second Try

I had previously looked for a way to combine multiple JPG files into a single PDF.  As explained in more detail in that previous post, the specific problem was that I might have sets of several JPGs in a single folder that should be merged into several different PDFs, and there might be multiple folders at various places that would have such PDF sets.  Hence, if I wanted to automate this project across dozens of folders containing hundreds of JPGs, it seemed that I would need a command-line solution rather than a GUI.  This post updates that previous attempt.

There were commercial programs that seemed to offer the necessary command-line capabilities, such as PDF Merger Deluxe ($30) and ParmisPDF Enterprise Edition ($300).  A search suggested, however, that PDFsam might offer a freeware alternative.

Assembling the List of JPGs to Be Converted

Before investigating PDFsam, I decided to get a more specific sense of what I needed to accomplish.  In a command window (Start > Run > Cmd), I navigated to the root of the drive I wanted to search (using commands like D: and "cd \").  (The root was the folder whose command prompt looked like C:\ or D:\, as distinct from e.g., C:\FolderZ.)  Being at the root folder meant that my command would apply to all subfolders on that drive.  Once I was there, I ran this command:

DIR *.jpg /s /b /a-d > jpgslist.txt
That gave me a list of files (but not directories, /a-d) in all subdirectories (/s), listed in bare (i.e., simple, /b) format, saved in a new file called jpgslist.txt.  (For more information on DIR or any other DOS-like command, type DIR /? at the command prompt, or search for informational webpages.)  If I'd had files with a .jpeg (as distinct from .jpg) extension, I would have added a second line, referring to *.jpeg and using a double-arrow (>>) to tell the program to add these to the existing jpgslist.txt, rather than creating a new jpgslist.txt (which was what the single arrow (>) would do).
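
For comparison, a Python sketch of the same listing is shown below.  It handles the .jpg and .jpeg extensions in one pass; the function names and the output filename are assumptions, and the sketch makes no attempt to reproduce DIR's other options.

```python
import os

JPG_EXTENSIONS = (".jpg", ".jpeg")


def is_jpg(filename: str) -> bool:
    """True for .jpg and .jpeg names, ignoring case as Windows does."""
    return filename.lower().endswith(JPG_EXTENSIONS)


def list_jpgs(root: str) -> list[str]:
    """Full paths of all JPG files under root, including subfolders."""
    found = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if is_jpg(name):
                found.append(os.path.join(folder, name))
    return sorted(found)


if __name__ == "__main__":
    # Write one path per line, like DIR ... /s /b /a-d > jpgslist.txt.
    with open("jpgslist.txt", "w", encoding="utf-8") as out:
        out.write("\n".join(list_jpgs("D:\\")) + "\n")
```

Only files are collected, which corresponds to the /a-d switch; folders whose names happen to end in .jpg are ignored.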

Now I wanted to see which folders had more than one JPG.  I would use Microsoft Excel to help me with this.  I could either open jpgslist.txt and copy its contents into an Excel spreadsheet, or import it into Excel.  In Excel, I did a reverse text search to find the last backslash in each text line, so as to distinguish file names from the directories holding them.  I sorted the spreadsheet by folder name and file name.  I set up a column to detect folders containing more than one JPG, and deleted the rows of folders not containing more than one JPG.  I might still want to do another search and conversion for isolated JPGs at some point, but that would be a different project.
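
The folder-counting step can be sketched in Python as follows.  It assumes a list of full Windows paths like the one in jpgslist.txt; the function names and sample paths are hypothetical.

```python
from collections import Counter


def folder_of(full_path: str) -> str:
    """Everything up to and including the last backslash."""
    return full_path[: full_path.rfind("\\") + 1]


def folders_with_multiple_jpgs(paths) -> list[str]:
    """Folders that hold more than one of the listed files."""
    counts = Counter(folder_of(path) for path in paths)
    return sorted(folder for folder, n in counts.items() if n > 1)


paths = [
    "D:\\Scans\\Photo 01.jpg",
    "D:\\Scans\\Photo 02.jpg",
    "D:\\Misc\\Lone Image.jpg",
]
print(folders_with_multiple_jpgs(paths))
```

Rows whose folder is not in the returned list correspond to the spreadsheet rows that were deleted at this stage.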

Next, I wanted to see if I could eliminate some folders.  For instance, I might not want to PDF and combine JPGs that were awaiting image editing, or important photos whose quality might get degraded in the PDF conversion.  In other words, I decided that this particular project was just for those JPGs that I was going to combine into a single PDF and then delete.  To get a concise list of folders containing multiple JPGs, I went into Data > Filter > Advanced Filter.  (That's Excel 2003.)  I moved the output into another worksheet.  I could then do a VLOOKUP to automatically mark rows to be deleted.  So that gave me the folders to work on.

Now it was time to decide which files to combine, and in what order.  In some cases, I had named files very similarly -- usually with just an ending digit change (e.g., Photo 01, Photo 02 ...).  So I set up a couple of columns to find the filename's length, subtract a half-dozen characters, and decide whether those first X characters were the same as in the preceding row.  If so, and if both were in the same folder, we had a match.  I discovered, at this point, that one or two folders contained large numbers of files.  I decided to combine those manually.  With those out of the way, it seemed that the next step was to decide the names of the resulting multi-image PDFs (e.g., Medieval Churches.pdf), and to put those names on the spreadsheet rows, next to the individual JPGs that would go into them.
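
The row-to-row comparison described above might look like this in Python.  It is a sketch only: the function name is hypothetical, and the default of six trimmed characters is taken from the "half-dozen" mentioned in the text.

```python
def same_set(prev, curr, trim: int = 6) -> bool:
    """True if curr looks like another page of the set that prev belongs to.

    prev and curr are (folder, filename) pairs from adjacent sorted rows.
    """
    (prev_folder, prev_name), (curr_folder, curr_name) = prev, curr
    keep = len(curr_name) - trim  # number of leading characters to compare
    return (
        keep > 0
        and prev_folder == curr_folder
        and prev_name[:keep] == curr_name[:keep]
    )


a = ("D:\\Pics\\", "Photo 01.jpg")
b = ("D:\\Pics\\", "Photo 02.jpg")
c = ("D:\\Other\\", "Photo 03.jpg")
print(same_set(a, b))  # True
print(same_set(b, c))  # False
```

A fixed trim count is a blunt instrument, so matches flagged this way would still need a visual check before deciding which files to merge.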

At this point, as described in another post, I learned how to use PDFsam to combine several PDFs into one PDF.  So I had a rough idea of the start of my project (i.e., identify the JPGs that I would want to merge into a single output PDF), and I also had a basic sense of the end of my project (i.e., use PDFsam to merge multiple PDFs into that single output PDF).  I was missing the middle part, where I would convert the original JPGs into single-page PDFs and would get them into a form where PDFsam could work on them.

Converting Individual JPGs to Individual PDFs

I had originally assumed that I would start by converting the JPGs to PDFs within the various folders where they were originally located.  So if I had File1.jpg in E:\Folder1, and if I had File2.jpg in E:\Folder2, my conversion would result in two files in each of those folders:  File1.jpg and File1.pdf in Folder1, and File2.jpg and File2.pdf in Folder2.  Then I would use PDFsam to merge the PDFs (i.e., File1.pdf and File2.pdf) from those locations; delete the original JPGs and PDFs; and move Output.pdf to an appropriate location.

I didn't entirely like that scenario.  It seemed like it could make a mess.  As I reviewed another post in which I had worked through similar issues, I decided that a better approach might (1) make a list of original JPG file locations, (2) move those JPGs to a single folder where I could convert them to individual PDFs, (3) merge the individual PDFs into concatenated PDFs, (4) delete the individual JPGs and PDFs, and (5) move the concatenated PDFs to the desired locations.  I decided to try this approach.

One problem with moving files from many folders to one folder was that there might be two files with the same name.  They would coexist peacefully as long as they were in separate folders; but when they converged into one target folder, something would get overwritten or left behind.  It seemed that a first step, then, was to rename the source JPGs, so that each one would have a unique name -- preferably a short name without spaces, potentially making it easier to write commands for them as needed.  In this step, as in others, it would be important to keep a list indicating how various files were changed.  To rename the files where they were, I returned to my spreadsheet and used various formulas to produce a bunch of rename commands of this type:
ren "D:\Folder Name\File Name.jpg" "ZZZ_00001.jpg"
after doing a search to make sure that my computer did not already have any files with names resembling ZZZ_?????.jpg.  The spreadsheet gave me one REN command for each JPG.  I copied those commands into a Notepad file, named it Renamer.bat, and double-clicked to run that batch file.  (A slower and more cautious approach would have been to run it in a command window, perhaps with Pause commands among its renaming lines, so that I could monitor what it was doing.)  A search in a file-finding program like Everything now confirmed that the number of files on my computer with ZZZ_?????.jpg names was equal to the number of files listed in my spreadsheet.  I cut and pasted all those ZZZ_?????.jpg files from Everything to a single folder, D:\Workspace.  (I could also have used the spreadsheet to generate Move commands to run in a batch file for the same purpose.)

Now I had a spreadsheet telling me what the original names of these ZZZ_?????.jpg files were, and I had all those ZZZ files together in D:\Workspace.  My spreadsheet also told me which of them were supposed to be put together into which merged output PDFs.  But they weren't ready to be merged by PDFsam, because they were still JPGs, not PDFs.

To convert the JPGs to PDFs, I could have prepared another batch file, using IrfanView commands to do the conversion, like those that I had previously played with in another project.  But I figured it would be easier to use IrfanView's File > Batch Conversion/Rename.  There, I told IrfanView to Add All of the ZZZ files to its list.  I specified PDF as the Batch Conversion Settings output format, and set its Options > General tab to indicate that Preview was not needed (and adjusted other settings as desired).  I told it to Use Current ("Look In") Directory as the Output Directory for Result Files (adding "Output" as the name of the output subfolder to be created).  Then I clicked Start Batch.

That produced one PDF, in the Output subfolder, for each original JPG.  I hadn't done anything to change their filenames, so ZZZ_00001.jpg had been converted to ZZZ_00001.pdf.  Spot checks indicated that the resulting single-page PDFs were good.  I deleted the original ZZZ*.jpg files, moved the output PDFs up into D:\Workspace, made a backup, and turned to the project of merging those single-page PDFs into multipage PDFs.

Preparing XML Files to Concatenate PDFs

In my spreadsheet, I had already decided which ZZZ files would be merged together, and what the resulting multipage PDFs would be called.  Now -- referring, again, to the other post in which I worked through the process of using PDFsam -- I needed that information to create File Value lines for a set of ConcatList.xml files that PDFsam would then merge into a set of output PDFs.

In other words, I would have a batch file that would run PDFsam, and I would have a data file, in XML format, to specify the single-page PDFs that PDFsam would combine into the multipage output PDF.  I would have a pair of such files (i.e., a batch file and an XML data file) for each resulting multipage PDF.  In my particular project, there were 65 single-page PDFs, and they would be combined into a total of eight multipage PDFs.  So I would have eight pairs of .bat + .xml files, and the eight XML files would contain a total of 65 File Value lines.

To the extent possible, I would want to automate the creation of these batch and data files.  Sorting 65 data lines into eight different XMLs would be tedious and easily confused.  Things would get much worse if I wanted, in some later project, to use these procedures for hundreds or thousands of JPGs or other files.

I began by adding a column to my spreadsheet that contained the exact text of the appropriate File Value line.  Example:  for ZZZ_00001.pdf, the line would read like this:
<file value="D:\Workspace\ZZZ_00001.pdf"/>
To produce that result, if the Excel spreadsheet's cell D2 contained ZZZ_00001.pdf, its cell E2 would contain this formula:
="<file value="&CHAR(34)&"D:\Workspace\"&D2&CHAR(34)&"/>"
(Note the use of CHAR(34) to add quotation marks where they would otherwise be misunderstood.)  Next, I wanted to assign those File Value lines to the appropriate batch files.  A search confirmed that I didn't have any files on my data drives with YYY_????? names, so I decided that my first multipage output PDF would be called YYY_00001.pdf, and that the pair of files used to produce it would be YYY_00001.bat and YYY_00001.xml.  In other words, the File Value line for ZZZ_00001.pdf (above) would have to be one of the File Value lines appearing in YYY_00001.xml.  But the next File Value line in YYY_00001.xml could be a ZZZ file out of sequence (e.g., ZZZ_00027.pdf), if that happened to be the next original file that I wanted to put into YYY_00001.pdf.

Since YYY_00001.pdf was going to be the temporary working name of the multipage PDF that I would ultimately be calling "Short Letters to Mother," I sorted the spreadsheet (making sure to first use Edit > Copy, Edit > Paste Special to convert formulas to values) by the column containing those ultimate desired filenames, and worked up a column indicating the corresponding YYY filename.  In other words, each cell in that column contained one of eight different labels, from YYY_00001.pdf to YYY_00008.pdf.
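For what it's worth, the labeling step just described amounts to numbering the distinct final names in the order they first appear.  A rough sketch, assuming the final names are available as a simple list; assign_working_names is a hypothetical helper, not something from the spreadsheet:

```python
def assign_working_names(final_names, prefix="YYY_", width=5):
    """Map each distinct final name to a working name, in first-seen order."""
    working = {}
    for name in final_names:
        if name not in working:
            # Number the names 1, 2, 3... as new ones are encountered.
            working[name] = f"{prefix}{len(working) + 1:0{width}d}"
    return working


labels = assign_working_names([
    "Short Letters to Mother",
    "Short Letters to Mother",
    "Medieval Craft Workers",
])
print(labels)
```

Repeated final names share one label, so every single-page PDF destined for the same multipage PDF gets the same YYY name.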

With that in place, I was ready to generate some batch commands.  Each batch command would use the ECHO command to send the contents of spreadsheet cells to YYY*.xml files.  My first attempt looked like this:
echo <file value="D:\Workspace\ZZZ_00001.pdf"/> >> YYY_00001.xml
The double greater-than signs (">>") indicated that YYY_00001.xml would be created, if it didn't already exist, and that the File Value line (above) would be added to it.  This first try produced an error, as I feared it might:  ">> was unexpected at this time."  The less-than and greater-than symbols were confusing Windows.  I had to modify the formula in my spreadsheet (or use Ctrl-H) to add carets (^) before them, like this:
^<file value="D:\Workspace\ZZZ_00001.pdf"/^>
That worked.  Now YYY_00001.xml contained that line.  With commands like that for each of the 65 single-page PDFs, my spreadsheet now had cells like these:
echo ^<file value="D:\Workspace\ZZZ_00051.pdf"/^> >> YYY_00004.xml
echo ^<file value="D:\Workspace\ZZZ_00025.pdf"/^> >> YYY_00006.xml
I sorted the rows in my spreadsheet by the appropriate column to make sure the single-page PDFs would get added to their multipage PDFs in the proper order.  (If necessary, I would have added another column containing numbers that I could manipulate to ensure the desired order.)  Then I copied all those cells over to Notepad and saved it as a new batch file that I called Sorter.bat.  I ran Sorter.bat and got eight XMLs, as hoped.  Spot checks seemed to indicate that the process had worked correctly.
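The same ECHO lines can be generated outside the spreadsheet.  The sketch below is a minimal illustration under my own assumptions: the rows arrive as (list file, page order, PDF name) tuples, and the helper names are made up for this example.  It builds the File Value line, adds the carets that the command interpreter needs before the less-than and greater-than symbols, and sorts so that pages land in each list in the intended order.

```python
def file_value_line(pdf_name, folder="D:\\Workspace"):
    """Build one File Value line for a single-page PDF."""
    return f'<file value="{folder}\\{pdf_name}"/>'


def escape_for_echo(text):
    """Put a caret before < and > so ECHO prints them instead of redirecting."""
    return text.replace("<", "^<").replace(">", "^>")


def echo_command(pdf_name, list_file):
    """One line of Sorter.bat: append a File Value line to a list file."""
    return f"echo {escape_for_echo(file_value_line(pdf_name))} >> {list_file}"


def build_sorter_lines(rows):
    """rows: (list_file, page_order, pdf_name) tuples, in any order."""
    return [echo_command(pdf, list_file) for list_file, _, pdf in sorted(rows)]


rows = [
    ("YYY_00001.xml", 2, "ZZZ_00027.pdf"),
    ("YYY_00001.xml", 1, "ZZZ_00001.pdf"),
    ("YYY_00004.xml", 1, "ZZZ_00051.pdf"),
]
print("\n".join(build_sorter_lines(rows)))
```

The explicit page-order number plays the role of the extra spreadsheet column mentioned above.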

My eight XML files were not complete for purposes of PDFsam.  Each of them would need lines of code preceding and following the File Value lines.  As described in the other post, those files would begin with
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
and would end with
</filelist>
I saved those two beginning lines into a text file called Header.txt, and I saved that ending line into another text file called Tailer.txt.  Now I needed to combine them with the XML files that Sorter.bat had just created.  For that purpose, it seemed that my spreadsheet could benefit from a separate page dedicated to XML file manipulation.  I added that page, filtered my existing page for unique values in the YYY*.pdf column, and placed the results on that new page. 

I could now see that I was too early in adding .xml extensions to the eight files.  I went back into the spreadsheet and changed it to produce files without extensions (e.g., YYY_00001 instead of YYY_00001.xml), and then I deleted the XML files and re-ran Sorter.bat (as modified) to verify that it was all still working.

With that change, I returned to the spreadsheet's XML Files page.  Next to each of the eight XML filenames, I added columns to produce commands of this type:
copy Header.txt+YYY_00001+Tailer.txt YYY_00001.xml
I put those eight commands into a batch file and ran it.  It worked:  I had eight XML files with everything that PDFsam needed.  There was just one small glitch:  at the end of each resulting XML file, there was a little arrow, pointing to the right.  Searches didn't yield any obvious explanations.  I wasn't sure if it would make a difference; I thought it might just be a symbol representing end-of-file or line feed.  I decided to forge ahead and see what happened.  Except for that little character, I had exactly what I needed for my XML files.
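As an alternative to gluing Header.txt, the File Value lines, and Tailer.txt together with COPY, the complete list files could be assembled in one pass.  This is a sketch only, assuming the rows are already in the desired page order; build_filelists and its arguments are hypothetical names of mine.  It does no XML escaping, so it assumes paths without characters like the ampersand.

```python
HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n<filelist>\n'
TAILER = "</filelist>\n"


def build_filelists(rows, folder="D:\\Workspace"):
    """Group File Value lines by output name and wrap each group.

    rows: (single_page_pdf, output_name) pairs in the desired page order.
    Returns {output_name: complete XML text}.
    """
    grouped = {}
    for pdf_name, output_name in rows:
        line = f'<file value="{folder}\\{pdf_name}"/>\n'
        grouped.setdefault(output_name, []).append(line)
    return {
        name: HEADER + "".join(lines) + TAILER
        for name, lines in grouped.items()
    }


lists = build_filelists([
    ("ZZZ_00001.pdf", "YYY_00001"),
    ("ZZZ_00027.pdf", "YYY_00001"),
    ("ZZZ_00051.pdf", "YYY_00004"),
])
print(lists["YYY_00001"])
```

Each value in the returned dictionary could then be written to its own .xml file in the workspace folder.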

Preparing Batch Files to Use the XMLs

The next step was to create a matching YYY_?????.bat file for each YYY_?????.xml file.  This batch file would run the commands necessary to merge the single-page PDFs listed in the XML file.  I would use the same techniques as in the XML files.  There would be no Tailer.txt file this time; the line that would need to change, in each BAT file, was the very last line.  So my COPY command (above) would just have Header.txt (this time containing the fixed opening lines of the batch file, not the XML header) plus the variable line to produce the YYY_?????.bat file.  The variable (last) line of the batch file had to look like this:
%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% -l D:\Workspace\YYY_00001.xml -o D:\Workspace\Merged\YYY_00001.pdf concat
In other words, it would have two variables:  the name of the XML file providing the input, and the name of the PDF file containing the output, saved in a Merged subfolder.  It was pretty straightforward, by now, to use the spreadsheet to generate the necessary commands and to run them in another Sorter.bat file (see above).  I just had to remember to delete my previous YYY_????? files (without extension), so that their contents would not get thrown into the mix.  I did wish that PDFsam's -l option were expressed as -L, so that nobody would think it was the number one, but I wasn't yet ready to experiment and find out whether -L would work just as well.  Anyway, to produce a line like the one shown immediately above, my Excel formula looked like this:
="echo %%JAVA%% %%JAVA_OPTS%% -jar %%CONSOLE_JAR%% -l D:\Workspace\"&A2&".xml -o D:\Workspace\Merged\"&A2&".pdf concat >> "&A2 
where cell A2 contained the filename without extension (e.g., YYY_00001).  I had to use double percentage symbols to get a single one to come through.  I put the resulting eight lines into Sorter.bat, and it produced eight YYY_????? files, as before.  I ran another batch file to combine Header.txt plus the YYY_????? files to produce YYY_?????.bat -- again, same as above, but without Tailer.txt and making sure to produce .bat files, not .xml files.
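If the batch files were written by a script rather than by ECHO, the doubled percentage symbols would not be needed, since nothing would pass through the command interpreter on the way to the file.  Here is a rough sketch along those lines; pdfsam_concat_line and batch_file_text are names I made up, and the header argument stands for whatever fixed opening lines the batch file needs.

```python
def pdfsam_concat_line(name, workspace="D:\\Workspace", merged="Merged"):
    """Build the variable last line of one YYY batch file."""
    return (
        "%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% "
        f"-l {workspace}\\{name}.xml "
        f"-o {workspace}\\{merged}\\{name}.pdf concat"
    )


def batch_file_text(header, name):
    """Fixed header lines followed by the one line that varies per output."""
    if not header.endswith("\n"):
        header += "\n"
    return header + pdfsam_concat_line(name) + "\n"


print(pdfsam_concat_line("YYY_00001"))
```

The only inputs that change from one batch file to the next are the name of the XML list and the name of the output PDF, both derived from the same YYY label.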

These steps gave me eight pairs of .bat and .xml files.  The batch files looked good, except for the little arrow at the end (above).  Now, if all went well, the batch files would run, would consult the XML files for the lists of single-page PDFs to merge, and would produce eight YYY_?????.pdf output files in the Merged subfolder.  I would not want to run the batch files manually, if I were producing a large number of merged PDFs, so I wrote a batch file to run the batch files.  The commands in this file looked like this:
@echo off
call YYY_00001.bat
call YYY_00002.bat
and so forth.  I ran this batch file.  It gave me error messages.  I opened a command window and typed just its first action line:  call YYY_00001.bat.  The error was:
FATAL  Error executing ConsoleClient
java.lang.Exception: org.pdfsam.console.exceptions.console.ParseException: PRS001 - Parse error. Invalid name (D:\Workspace\YYY_00001.xml) specified for <l>, must be an existing, readable, file.
Oh.  Dumb mistake.  The XML files weren't in D:\Workspace.  I moved them and tried again on the command line.  Another error:
Error on line 5 of document file:///D:/Workspace/YYY_00001.xml : Content is not allowed in trailing section.
That little arrow was on line 5.  I moved it to line 6 and tried again.  Same error, except now it said the problem was on line 6.  I deleted the little arrow and tried again.  That solved that problem.  Now a different error:  "The system cannot find the path specified."  That was probably because I had not yet created the Merged subfolder.  Apparently PDFsam was not going to create a folder that did not already exist.  I created it and tried again.  Success!  YYY_00001.pdf was created in the Merged folder with the desired single-page PDFs in it.
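The three stumbles just described (list file not where the command expected it, stray content after the closing tag, missing output folder) are the sort of thing a small pre-flight check could catch before PDFsam is run.  The following is a sketch under my own assumptions, not anything provided by PDFsam; the function name preflight is hypothetical.

```python
import os


def preflight(list_file, output_pdf):
    """Return a list of problems likely to make the merge fail."""
    problems = []
    if not os.path.isfile(list_file):
        problems.append(f"list file not found: {list_file}")
    else:
        with open(list_file, "rb") as handle:
            data = handle.read()
        # rstrip() removes ordinary whitespace only, so any other stray
        # byte after the closing tag is reported.
        if not data.rstrip().endswith(b"</filelist>"):
            problems.append(f"unexpected content after </filelist>: {list_file}")
    out_dir = os.path.dirname(output_pdf)
    if out_dir and not os.path.isdir(out_dir):
        problems.append(f"output folder does not exist: {out_dir}")
    return problems
```

An empty list back from preflight would mean none of these three problems was found; it says nothing about whether the PDFs named inside the list exist.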

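Now I just had to figure out how to prevent that little arrow from appearing in the XML files.  It came at the end of both the XML and the BAT files, and it got there when I used the COPY command to combine the header, the command, and the tailer text files.  (The arrow was evidently the Ctrl-Z end-of-file character, ASCII 26, which COPY appends when it combines files in its default text mode.)  The solution was to add the /b switch, which makes COPY treat the files as binary and add nothing: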
copy /b Header.txt+YYY_00001+Tailer.txt YYY_00001.xml
With that change, I went back through the process of creating the XML files.  Then I tried running YYY_00001.bat again.  I got an error:  "Cannot overwrite output file (overwrite is false)."  Oops.  I had forgotten to get rid of the previously produced YYY_00001.pdf in the Merged folder.  This time I was successful without having to manually remove the little arrow -- it wasn't there anymore in the new YYY_00001.xml.  I ran the batch file that called the eight YYY_?????.bat files.  It ran and produced eight multipage PDFs.  Those eight contained a total of 65 pages.  I combined them all in Acrobat, just to take a quick look.  They were all good.
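For anyone stuck with files that already have the little arrow, here is a small sketch of the two ideas involved: joining pieces byte for byte, as the /b switch does, and trimming the stray character afterwards.  It assumes the arrow is the Ctrl-Z byte (hex 1A); the function names are hypothetical.

```python
CTRL_Z = b"\x1a"  # assumed to be the "little arrow" character


def concatenate(parts):
    """Join byte strings byte for byte, adding nothing at the end."""
    return b"".join(parts)


def strip_eof_marker(data):
    """Remove any trailing Ctrl-Z bytes left by a text-mode combine."""
    return data.rstrip(CTRL_Z)


header = b'<?xml version="1.0" encoding="UTF-8"?>\r\n<filelist>\r\n'
body = b'<file value="D:\\Workspace\\ZZZ_00001.pdf"/>\r\n'
tailer = b"</filelist>\r\n"
print(concatenate([header, body, tailer]).decode("ascii"))
```

Only trailing Ctrl-Z bytes are removed; the line endings before them are left as they were.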

Putting the Multipage PDFs Back Where They Belong

Now I needed to decide where to put the multipage PDFs.  In this case, the individual PDFs that went into each of the multipage PDFs had all come from the same folder.  That is, I did not have Folder1\PDF1 plus Folder73\PDF2 going into BigPDF-A.  So I could use the spreadsheet to determine semi-automatically the path and filename for a rename command.  I wound up with Rename commands like this:
ren YYY_00004.pdf "Medieval Craft Workers.pdf"
followed by Move commands like this:
move /y "Medieval Craft Workers.pdf" "E:\IMAGES\Medieval Craftsmanship\"
I verified that the multipage PDFs had returned to the folders whence the single-page PDFs had originated.  This project was finished.
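The rename-and-move commands could likewise be generated from the record of original paths.  In this sketch, which is only an illustration with hypothetical helper names, the target folder is taken from the originals themselves, and the script refuses to guess if the single-page files did not all come from one folder.

```python
import ntpath  # Windows path rules, even if the script runs on another OS


def common_source_folder(original_paths):
    """Return the one folder all originals came from, or raise if they differ."""
    folders = {ntpath.dirname(path) for path in original_paths}
    if len(folders) != 1:
        raise ValueError(f"expected one source folder, found {sorted(folders)}")
    return folders.pop()


def rename_and_move(working_pdf, final_name, original_paths):
    """Build the REN and MOVE commands for one multipage PDF."""
    target = common_source_folder(original_paths) + "\\"
    return [
        f'ren {working_pdf} "{final_name}"',
        f'move /y "{final_name}" "{target}"',
    ]
```

A call such as rename_and_move("YYY_00004.pdf", "Medieval Craft Workers.pdf", originals) would yield the pair of commands shown above, provided every path in originals shares one folder.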

Combining PDFs with PDFsam: Introductory Syntax

I was using Windows 7.  I had a project that would benefit from automated merging of multiple PDFs into a single PDF.  It looked like PDFsam would be useful for this purpose.

PDFsam had GUI and Console options.  In other words, it could be accessed through a user-friendly interface, like most Windows programs, and it could also be used on the command line.  My project had certain complexities, such that the GUI approach would not be ideal.  This post describes the steps I took to learn how to use the Console.

I began with the Console section of the PDFsam wiki.  It led to a page providing information on console parameters and commands.  The explanation was too thin, so I did a search for more guidance. This led to a 33-page Tutorial (installed with the program files). It also led to a thread that reminded me not to forget the PDFsam Forums.

The Tutorial (p. 18) seemed to say that, in PDFsam-speak, what I wanted to do was to Merge files, and for this I would use the Concat option. Other options, not of interest here, included Split and Encrypt. It appeared that PDFsam syntax would call for very long commands. Looking for examples, I went to a forum thread, but that pointed me back toward the wiki page (above).

The Tutorial said that, to make PDFsam run from the command line (i.e., Console), I could either type a certain command or just use one of the scripts in the bin folder where the program was installed (e.g., C:\Program Files\pdfsam\bin).  In that bin folder, it appeared I had my pick from two scripts, provided in Linux (.sh) and Windows batch (.bat) versions.  Since I wanted the Console, not the GUI, I focused on run-console.bat.  Its contents seemed to address various details that I didn't clearly understand, and didn't necessarily want to study; it just looked like the thing I would need to use.  So I created a shortcut to it and put that in my Start Menu. I also edited the Tutorial, adding bookmarks to the various sections, and moved it to the Start Menu too.  (My customized Start Menu would survive any subsequent Windows reinstallation, so I probably wouldn't need to do this housekeeping again, if I had to install PDFsam in a new Windows installation sometime down the line.)

Unfortunately, the run-console.bat batch file didn't work for me.  It gave me an endlessly scrolling set of messages. They were ripping past too quickly to read.  I hit PrintScreen, opened IrfanView (any image editor would do, as would Microsoft Word or Wordpad), and pasted the screenshot (Ctrl-V).  (I could have just hit Ctrl-C, or possibly the Pause key.)  Now I could see that it was just the same error message, repeating over and over:

java is not recognized as an internal or external command, operable program or batch file
Why wasn't my system recognizing java?  I right-clicked on run-console.bat, chose Edit, and looked for the line that referred to java. I couldn't quite figure out where the problem was, so I stuck in a "pause" command somewhere, saved the batch file, and, this time, ran it from the command line instead of from the shortcut. That way, the error statements would stay onscreen instead of scrolling past too quickly or disappearing when the batch file finished running. (This was another instance where it was handy to have the right-click option, "Open command window here," provided by Ultimate Windows Tweaker.)

Running the batch command meant just typing its name and hitting Enter. It paused where I had put the pause command, without any obvious errors, so I moved the pause command further down, saved, and repeated the cycle. (Running run-console.bat again required just hitting the Up key to repeat the command.) That's where the problem was: now I had the endless scrolling again. I hit Ctrl-C a couple of times to abort the batch file.

I played around with the batch file for a while, and eventually realized that maybe the problem was that the JAVA_HOME variable had not yet been assigned a value on my system. It appeared that the batch file was supposed to tell me this; if so, it wasn't working right. I went into Start > Run > SystemPropertiesAdvanced.exe > Environment Variables and, sure enough, no JAVA_HOME variable. I had already installed the Java Runtime Environment, and I almost always used the default installation paths when installing programs, so the advice seemed to be that the JAVA_HOME variable should point to C:\Program Files\Java\jre6. Since this folder name ("Program Files") had a space in it, apparently I would need to use the shortened, DOS-style name for it -- known as an "8.3" filename because it would have eight characters before the dot and three afterwards (e.g., yourfile.txt).

I knew the shortened name of that folder would probably contain Progra~1 (instead of "Program Files"), and I could have just experimented with that, but I had seen instances where it would be Progra~2 or something else, and anyway I wanted to know how to get the 8.3 name. Microsoft advised using the GetShortPathName function to figure it out, but that seemed to involve programming, and programming is a lot of work. Instead, I ran a search that took me to ShortPath by Marcello Zaniboni. To get ShortPath to work from the command line, I tried the C:\Windows shortcut trick, but it didn't work. I didn't want to add ShortPath to my PATH yet, so I just opened a command window in the folder where ShortPath.exe was located, typed "ShortPath " (with an ending space) but didn't hit Enter, and then dragged the C:\Program Files\Java\jre6 folder into that command window from Windows Explorer. (I think this worked because I had installed DropCommand. Otherwise I might have had to type it out, with quotation marks.)

ShortPath told me that, actually, the short path to that folder was C:\PROGRA~2\JAVA\JRE6. So I went back into SystemPropertiesAdvanced.exe > Environment Variables > System Variables > New > Variable Name = JAVA_HOME, Variable Value = C:\PROGRA~2\JAVA\JRE6. I OKed out of there and rebooted.
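A quick way to confirm that a JAVA_HOME setting actually leads to a Java program is to look for the executable under its bin subfolder.  The sketch below does that against any dictionary of environment variables; find_java is a hypothetical helper, and the java.exe filename is an assumption about a typical Windows installation.

```python
import os


def find_java(env, exe_name="java.exe"):
    """Return (path, message) for the Java executable under JAVA_HOME.

    env is a mapping such as os.environ; path is None when nothing usable
    is found, and message says why.
    """
    home = env.get("JAVA_HOME")
    if not home:
        return None, "JAVA_HOME is not set"
    candidate = os.path.join(home, "bin", exe_name)
    if not os.path.isfile(candidate):
        return None, f"no {exe_name} found under {home}"
    return candidate, "ok"


path, message = find_java(os.environ)
print(message)
```

Passing the environment in as an argument makes it easy to try a proposed value before changing the system setting.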

After doing that, I still had to play with the batch file for a long time, in a quest to learn, remember, get lucky, or otherwise do what I needed to make it work. By the time I was done, I almost thought that I would have been further ahead just using the command given in the wiki:
java -Dlog4j.configuration=console-log4j.xml -jar pdfsam-console-2.1.1e.jar
except that that didn't work either because, as I soon realized, it was a Linux command. I also did not fare too well with the advice to type "run-console.bat -h concat" for information on the syntax for the Concat option, because the run-console.bat file itself was not yet working.

The Tutorial (pp. 19-20) said that I had three ways to indicate which files I wanted to merge. Instead of entering one parameter to indicate a directory and then entering another parameter to indicate one or more files in that directory, it seemed I would want the option that would allow me to specify a file (including its path) on a single line. Evidently I could list a number of PDF files in a separate XML file, and invoke that file (with its list of PDFs) by using the -l (that's an L, not a one) option. But it wasn't working right. Ultimately, I posted a question on it. Andrea (a guy from the Netherlands), creator of PDFsam, posted a reply within 36 hours. And that got me where I needed to go. I was able to get a test run to work, with a run-console.bat file whose contents (viewed in something like Notepad, of course, not in a word processor like Word that would add all kinds of invisible junk) were as follows:
@echo off

set JAVA=%JAVA_HOME%\BIN\JAVA

set JAVA_OPTS=-Xmx256m -Dlog4j.configuration=console-log4j.xml

set CONSOLE_JAR="C:\Program Files (x86)\pdfsam\lib\pdfsam-console-2.3.1e.jar"

@echo on

%JAVA% %JAVA_OPTS% -jar %CONSOLE_JAR% -l D:\Current\ConcatList.xml -o D:\Current\PDFsamOut\Merged.pdf concat
While I wasn't entirely clear on what all those lines did, the basic idea seemed to be that the first lines would define JAVA, JAVA_OPTS, and CONSOLE_JAR, and then the last line would combine them all into one big command. That command seemed to say, "Run Java with these options, using this jar file for specific instructions; take your input from the PDF files listed in ConcatList.xml; and output a single PDF file, Merged.pdf, containing all of those PDF files." To make that work, I needed to know the format of the ConcatList.xml file. Here's the one that worked for me in this test run:
<?xml version="1.0" encoding="UTF-8"?>
<filelist>
<file value="D:\Current\TestDir1\x1.pdf"/>
<file value="D:\Current\TestDir2\x2.pdf"/>
</filelist>
I just needed a File Value line for each PDF to be merged, using the syntax shown.  To summarize, then, I used Notepad to create two files.  One, called run-console.bat, contained the first half-dozen lines of code shown above, beginning with @echo off.  The other, ConcatList.xml, contained these last five lines of code, beginning with the "xml version" line.  ConcatList.xml would contain File Value lines, each designating a PDF to be merged into the larger output PDF (and there were other options for ConcatList.xml; I just didn't need them for my project), and run-console.bat would read those lines and do the actual concatenation into a single output PDF.
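To double-check a ConcatList.xml before handing it to PDFsam, the list can be read back with an ordinary XML parser.  This is a minimal sketch using Python's standard library, assuming only the simple layout shown above (a filelist element containing file elements with a value attribute); listed_pdfs is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def listed_pdfs(xml_bytes):
    """Return the PDF paths named in a ConcatList-style XML document."""
    root = ET.fromstring(xml_bytes)  # raises ParseError if not well-formed
    if root.tag != "filelist":
        raise ValueError(f"expected <filelist>, found <{root.tag}>")
    return [node.get("value") for node in root.findall("file")]


sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<filelist>\n"
    b'<file value="D:\\Current\\TestDir1\\x1.pdf"/>\n'
    b'<file value="D:\\Current\\TestDir2\\x2.pdf"/>\n'
    b"</filelist>\n"
)
for path in listed_pdfs(sample):
    print(path)
```

The paths come back in document order, which is also the order in which the pages would be merged.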

Monday, April 18, 2011

Batch Merging (Combining, Concatenating) PDFs from the Command Line

I was using Windows 7.  I had a bunch of JPGs that were images of successive pages in a document.  In other words, when the document was scanned, each page was saved to its own separate file.  They were named Page01.jpg, Page02.jpg, Page03.jpg, and so forth.  I had converted these JPGs to PDF, thinking that would help me toward my goal.  The goal was to combine them all -- whether as JPGs or PDFs -- into one PDF file containing the entire document.  I had a large number of documents like this, each consisting of several pages, all together in one directory.  It was too big a job to do manually.  But could I automate it?  This post describes my efforts to that end.
What I was looking for was, somehow, a program or script that could recognize the differences among these files in a directory, and combine only the ones that should be combined:

Doc1Page1
Doc1Page2
Doc2Page1
Doc3Page1
Doc3Page2
so that I would wind up with this:
Doc1 (pages 1 & 2)
Doc2 (page 1)
Doc3 (pages 1 & 2)
A search led to iText, which looked sleek and got some good recommendations but unfortunately (a) did not appear to be available in a Windows/DOS version and (b) was not for end users.  In other words, I had no idea what to do with it.  A Gizmo's Freeware article did not seem to identify programs that could do this.  The article led me to PDFill PDF Editor as its first choice for an all-around freeware PDF solution.  There, I went to the Merge PDF Files tool.  Its batch command option, available only in its $20 paid version, looked like it would come close to doing what I wanted.  The example they gave looked like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE Input1.pdf Input2.pdf Input3.pdf Output.pdf
With many files or long filenames, that approach would run into limits on how long a command could be.  I suspected I could vary their command with standard DOS input options, which I vaguely recalled would look like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < inputfilelist.txt
So then the challenge would be to automate the process of identifying filenames that would belong together in the same inputfilelist.txt file:  Doc1Page1.pdf and Doc1Page2.pdf would be in Doc1inputfilelist.txt, whereas Doc3Page1.pdf and Doc3Page2.pdf would be in Doc3inputfilelist.txt.  Then all I'd have to do would be to construct a batch file with lines like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < Doc1inputfilelist.txt
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < Doc3inputfilelist.txt
I wasn't sure if PDFill would allow me to select a name for each resulting output file, or how that would work.  With a large number of files, that manual process could be very time-consuming.  I could also look into other possibilities, like going back to the JPGs from which I had created these PDFs and merging them into multipage TIF files that I could then convert into multipage PDFs.
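The grouping problem itself, deciding which page files belong to which document, is easy to sketch once the naming pattern is fixed.  The example below assumes names of the form Doc1Page1.pdf, as in the illustration above; the regular expression and the function name group_pages are my own, and real filenames would need their own pattern.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <document name>Page<number>.pdf
PATTERN = re.compile(r"^(?P<doc>.+?)Page(?P<page>\d+)\.pdf$", re.IGNORECASE)


def group_pages(filenames):
    """Return {document: [page files in numeric page order]}."""
    groups = defaultdict(list)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue  # not a page file; leave it alone
        groups[match.group("doc")].append((int(match.group("page")), name))
    return {
        doc: [name for _, name in sorted(pages)]
        for doc, pages in sorted(groups.items())
    }


names = ["Doc1Page2.pdf", "Doc1Page1.pdf", "Doc2Page1.pdf",
         "Doc3Page1.pdf", "Doc3Page2.pdf"]
for doc, pages in group_pages(names).items():
    print(doc, pages)
```

Sorting on the page number as an integer keeps Page10 after Page2, which a plain alphabetical sort would not do.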

These were the steps I would have to pursue as this project continued. But I had to shelve it for now, to deal with other things.