I was using Windows 7. I had a bunch of JPGs that were images of successive pages in a document. In other words, when the document was scanned, each page was saved to its own separate file. They were named Page01.jpg, Page02.jpg, Page03.jpg, and so forth. I had converted these JPGs to PDF, thinking that would help me toward my goal. The goal was to combine them all -- whether as JPGs or PDFs -- into one PDF file containing the entire document. I had a large number of documents like this, each consisting of several pages, all together in one directory. It was too big a job to do manually. But could I automate it? This post describes my efforts to that end.
What I was looking for was, somehow, a program or script that could recognize the differences among these files in a directory, and combine only the ones that should be combined:
Doc1Page1
Doc1Page2
Doc2Page1
Doc3Page1
Doc3Page2
so that I would wind up with this:
Doc1 (pages 1 & 2)
Doc2 (page 1)
Doc3 (pages 1 & 2)
A search led to iText, which looked sleek and got some good recommendations but unfortunately (a) did not appear to be available in a Windows/DOS version and (b) was not for end users. In other words, I had no idea what to do with it. A Gizmo's Freeware article did not seem to identify programs that could do this. The article led me to PDFill PDF Editor as its first choice for an all-around freeware PDF solution. There, I went to the Merge PDF Files tool. Its batch command option, available only in its $20 paid version, looked like it would come close to doing what I wanted. The example they gave looked like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE Input1.pdf Input2.pdf Input3.pdf Output.pdf
With many files or long filenames, that approach would run into limits on how long a command could be. I suspected I could vary their command with standard DOS input options, which I vaguely recalled would look like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < inputfilelist.txt
So then the challenge would be to automate the process of identifying filenames that would belong together in the same inputfilelist.txt file: Doc1Page1.pdf and Doc1Page2.pdf would be in Doc1inputfilelist.txt, whereas Doc3Page1.pdf and Doc3Page2.pdf would be in Doc3inputfilelist.txt. Then all I'd have to do would be to construct a batch file with lines like this:
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < Doc1inputfilelist.txt
"C:\Program Files\PlotSoft\PDFill\PDFill.exe" MERGE < Doc3inputfilelist.txt
I wasn't sure if PDFill would allow me to select a name for each resulting output file, or how that would work. With a large number of files, that manual process could be very time-consuming. I could also look into other possibilities, like going back to the JPGs from which I had created these PDFs and merging them into multipage TIF files that I could then convert into multipage PDFs.
These were the steps I would have to pursue as this project continued. But I had to shelve it for now, to deal with other things.
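The grouping step described above, working out which page files belong to which document, is the part that looks easiest to script. What follows is only a minimal sketch in Python, not something I tested against PDFill. It assumes every file is named like Doc1Page2.pdf. It writes each command in the explicit form from the vendor's example (inputs first, output last), because I never confirmed that the `<` redirection would work. The output names (Doc1.pdf and so on) and the function names group_pages and merge_commands are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "Doc<number>Page<number>.pdf", as in the examples above.
NAME_PATTERN = re.compile(r"^(Doc\d+)Page(\d+)\.pdf$", re.IGNORECASE)

# Program path copied from the vendor example quoted in this post.
PDFILL = r'"C:\Program Files\PlotSoft\PDFill\PDFill.exe"'

# cmd.exe rejects command lines longer than this many characters.
CMD_LIMIT = 8191


def group_pages(filenames):
    """Map each document prefix to its page files, ordered by page number."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match:  # files that do not fit the pattern are ignored
            groups[match.group(1)].append((int(match.group(2)), name))
    return {doc: [name for _, name in sorted(pages)]
            for doc, pages in sorted(groups.items())}


def merge_commands(groups):
    """Build one MERGE line per multi-page document (hypothetical output names)."""
    lines = []
    for doc, pages in groups.items():
        if len(pages) < 2:
            continue  # a single-page document needs no merging
        # Names containing spaces would need quoting; the assumed pattern has none.
        line = f"{PDFILL} MERGE {' '.join(pages)} {doc}.pdf"
        if len(line) > CMD_LIMIT:
            raise ValueError(f"command for {doc} exceeds {CMD_LIMIT} characters")
        lines.append(line)
    return lines


# Example with the file names used in this post; in practice the list
# could come from os.listdir() on the directory holding the PDFs.
sample = ["Doc1Page2.pdf", "Doc3Page1.pdf", "Doc1Page1.pdf",
          "Doc2Page1.pdf", "Doc3Page2.pdf", "notes.txt"]
for command in merge_commands(group_pages(sample)):
    print(command)
```

The printed lines could be saved into a batch file and run from the directory containing the PDFs. Sorting by the numeric page value rather than by filename matters once a document passes nine pages, since Page10 would otherwise sort ahead of Page2.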