Sunday, March 18, 2012

Troubleshooting Some Options for Converting PDF to JPG or Other Image Formats

I had a bunch of PDFs.  I wanted to glance at just the first page of each.  This post describes some steps I took to find a way of extracting those first pages and making them easily available.

I believed it was going to be a simple matter to export from PDF to JPG using either Adobe Acrobat or IrfanView.  I had used both successfully in the past for this kind of purpose.  Unfortunately, for some reason Acrobat's Advanced > Document Processing > Batch Processing option was not working for me on this particular day.  Instead, I went into IrfanView > File > Batch Conversion/Rename, set the output format to JPG, selected the folder containing the newly created PDFs, clicked "Add All," set the output directory, and clicked Start Batch. Yet here, again, unknown causes were conspiring against me. IrfanView said, "Error! Can't load [filename]" for each of the PDFs. I tried exporting to PNG instead of JPG; same result. Was I mistaken in believing that IrfanView could produce JPGs from PDFs?  It had worked for me, as described in a previous post.  Why not now?

A search led to a thread that raised the question of whether the source files were corrupt. I checked several of these newly created PDFs and they did seem to open OK in Acrobat.  So corruption didn't seem to be the issue.
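Opening each file in Acrobat is one way to rule out corruption. A quicker, rougher screen for a whole folder is to look at the raw bytes: a PDF normally starts with a "%PDF-" header and ends with an "%%EOF" marker, and a truncated download often lacks the latter. The sketch below is a hypothetical helper, not something from that thread; passing it does not prove a file is sound, and a few odd but valid PDFs (header not at byte 0) would fail it.

```python
from pathlib import Path

def looks_like_pdf(data: bytes) -> bool:
    """Rough sanity check on a file's raw bytes; not a full PDF validator."""
    # A PDF normally begins with a "%PDF-" header line...
    has_header = data.startswith(b"%PDF-")
    # ...and ends with an "%%EOF" marker; a truncated file often lacks it.
    # Only the last 1024 bytes are searched, allowing for trailing whitespace.
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof

def check_folder(folder: str) -> dict:
    """Map each *.pdf file name in the folder to the result of looks_like_pdf."""
    return {p.name: looks_like_pdf(p.read_bytes())
            for p in sorted(Path(folder).glob("*.pdf"))}
```

For example, check_folder(r"D:\Folder1") would return a dictionary in which any False value marks a file worth a closer look.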

At first, in that previous post, I had gotten that "Can't load" error.  The solution there was to download an update to Ghostscript. Maybe I had updated IrfanView since then and, in the process, had somehow broken the Ghostscript update? I'd also gotten the error, another time, when trying to use IrfanView for audio conversions, which it apparently couldn't do.  So, ah, maybe what I was trying to do was confusing IrfanView.  I tried again with just one PDF.  There didn't seem to be much that could go wrong with that.  But it still didn't work.  When I tried to open a PDF with IrfanView, it gave me a message:

"Decode error! Can't load Ghostscript or Ghostscript error. Install Ghostscript from http://sourceforge.net/projects/ghostscript or http://sourceforge.net."

I went to the former. It pointed me to another page, and that one pointed me to yet another. It looked like Ghostscript had indeed been updated within the past month or so. I downloaded and installed the update. It didn't solve the problem, though possibly a reboot would have helped.

Was there another way to export JPGs or PNGs from PDFs? I tried XnView, which Irfan claimed was somehow built on code appropriated from him. For this particular task, its interface certainly was very similar. It didn't produce any JPGs either. A post reminded me that I hadn't tried IrfanView in command-line mode. The command I had worked out previously went like this:
i_view32 D:\Folder1\File35.pdf /c=d:\TestFolder\File35.jpg
but now that was giving me Ghostscript errors too. A search led to suggestions to try ImageMagick and iTextSharp. The latter seemed beyond me. There were also a couple of suggestions on using scripting in Ghostscript. They were a bit technical for my taste at this time.
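Because that command handles one file at a time, a folder of PDFs means one line per file. A minimal Python sketch for generating those lines (for pasting into a batch file) follows the same pattern as the command above. The function name irfanview_commands is made up for illustration, and the /c= switch is copied from that command; IrfanView's own command-line documentation describes the switch as /convert=, so it is worth checking which form a given version accepts. Generating the commands does nothing about the underlying Ghostscript error, of course.

```python
from pathlib import PureWindowsPath

def irfanview_commands(pdf_paths, out_dir, switch="/c="):
    """Build one i_view32 command line per PDF (hypothetical helper)."""
    out = PureWindowsPath(out_dir)
    commands = []
    for pdf in pdf_paths:
        src = PureWindowsPath(pdf)
        # Same base name as the PDF, with a .jpg extension, in the output folder.
        target = out / (src.stem + ".jpg")
        # Note: paths containing spaces would need quoting; not handled here.
        commands.append(f"i_view32 {src} {switch}{target}")
    return commands
```

For example, irfanview_commands([r"D:\Folder1\File35.pdf"], r"d:\TestFolder") reproduces the single command shown above.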

It looked like ImageMagick had been around for a long time -- there were books about it -- and I had run across a number of references to it. So I decided to start there. The copy I downloaded from CNET was corrupted, so I got another from the ImageMagick FTP site. Unzipped, the ImageMagick program folder was 146MB and contained 5,694 files. This was no little alternative to IrfanView. This was a doctoral dissertation.  I immediately cast about for a GUI front end -- I couldn't even find its executables -- and alighted on Converseen.  I downloaded it from Softpedia and installed it.  Would it convert my PDFs to JPGs? I gave it three PDFs to try. All three failed with this error:
Error: Magick: Postscript delegate failed [filename]: No such file or directory @ error/pdf.c/ReadPDFImage/664
Now, what do you suppose that meant? I tried a search and got a couple of very helpful sites in Chinese. I modified the search and, lo, it looked like that was an ImageMagick error, not a Converseen error. I gathered there could be a couple of possibilities. One was that, as someone reported, this error tended to occur (for some unknown reason) when converting PDFs that were mostly text, as mine were, rather than mostly images. Also, there seemed to be a regular cascade of programs: Converseen was a front end for ImageMagick, which in turn used Ghostscript. So I couldn't necessarily be sure which one was responsible. For that matter, I didn't even know where to put ImageMagick; I wasn't seeing a manual. Further reading suggested that resolving problems in these programs could be challenging.
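In that cascade, the "Postscript delegate failed" message generally means that ImageMagick tried to hand the PDF off to Ghostscript (its "delegate" for PDF and PostScript) and could not. One cheap first check is whether any Ghostscript console program can be found on the PATH at all. A minimal sketch, assuming the usual executable names gswin64c and gswin32c on Windows and gs elsewhere; on Windows, ImageMagick can also locate Ghostscript through the registry, so a miss here is a hint rather than proof.

```python
import shutil

# Assumed console executable names: gswin64c / gswin32c on Windows, gs elsewhere.
GS_NAMES = ("gswin64c", "gswin32c", "gs")

def find_ghostscript(names=GS_NAMES, which=shutil.which):
    """Return (name, full path) of the first Ghostscript executable on PATH, or None."""
    for name in names:
        path = which(name)
        if path:
            return name, path
    return None
```

Calling find_ghostscript() with no arguments checks the real PATH; the which parameter exists only so the lookup can be swapped out for testing.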

I tried another search. At the top of the list: Boxoft PDF to JPG Converter. Its installation process said, "Requires Ghostscript. Do you want to get a free GPL Ghostscript?" It seemed I was destined to have a computer full of copies of Ghostscript.  Or maybe this installation would be the answer to all of the problems described above.  I downloaded and installed the proffered Ghostscript 8.71.  The process didn't look familiar.  I wasn't sure what I had been doing with Ghostscript previously, but this wasn't it.  So would this perchance fix IrfanView?  I opened IrfanView and, wow, now it ran, no reboot necessary.  Apparently the way to fix IrfanView was to install the Ghostscript that came with Boxoft (or, possibly, to reinstall IrfanView).

But something was wrong.  Didn't IrfanView normally give me multiple pages, one JPG per page, when I used it to convert PDFs to JPGs?  This time, I was getting only one JPG per PDF, no matter how many pages it had.  I assumed that later Ghostscript installations replaced earlier ones, but now I saw that was not necessarily so.  I went into Control Panel > Programs and Features.  I saw three different Ghostscript items.  I wasn't sure if I could uninstall two of them without screwing up the third one.  I tried it, keeping only the Ghostscript 8.71 that I had just installed.  Boxoft and IrfanView both ran as before.  So the cleanup was accomplished, but the IrfanView problem was not fixed.
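Since Ghostscript versions can sit side by side, it may help to see which ones are on disk, not just what Programs and Features lists. A minimal sketch, assuming (not confirmed here) that the Windows installer puts each version in its own folder such as C:\Program Files\gs\gs8.71; the root folder and both function names are illustrative.

```python
import re
from pathlib import Path

# Assumption: each Ghostscript version lives in its own folder, e.g.
# C:\Program Files\gs\gs8.71. Adjust the root for the machine at hand.
GS_ROOT = r"C:\Program Files\gs"

def gs_versions(folder_names):
    """Return (major, minor) tuples for folder names like 'gs8.71', oldest first."""
    found = []
    for name in folder_names:
        m = re.fullmatch(r"gs(\d+)\.(\d+)", name)
        if m:
            found.append((int(m.group(1)), int(m.group(2))))
    return sorted(found)

def installed_gs_folders(root=GS_ROOT):
    """List subfolder names under root, or an empty list if root is missing."""
    p = Path(root)
    return [d.name for d in p.iterdir() if d.is_dir()] if p.is_dir() else []
```

Running print(gs_versions(installed_gs_folders())) and getting more than one entry would suggest multiple installations are still present.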

Since I now had Boxoft installed, I tried that.  Like IrfanView, it had a command-line mode and a GUI mode.  I tried the GUI.  It had multiple options.  I liked it.  It did produce multiple JPGs per PDF.  They looked good.  I ran it again, this time selecting only page 1 (Settings > Common tab > PDF Convert Range), and that's exactly what it gave me.  So I wouldn't have to take a separate step of deleting JPGs for the pages other than page 1 of each PDF.  IrfanView didn't have that option, though it did have image editing options (if, e.g., I wanted to change the dots per inch, size, or coloring of the resulting JPGs).
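Because Boxoft and IrfanView both depend on Ghostscript for this job, another route to "page 1 only" would be to call Ghostscript directly. A minimal sketch that just builds the command line, assuming the Windows console executable is named gswin32c (gswin64c on 64-bit installs) and that 150 dpi is acceptable. The switches are long-standing Ghostscript options, but treat this as untested against the setup described here; gs_first_page_cmd is a made-up name.

```python
from pathlib import PureWindowsPath

def gs_first_page_cmd(pdf, out_dir, exe="gswin32c", dpi=150):
    """Build a Ghostscript argument list that renders only page 1 of a PDF to JPG."""
    src = PureWindowsPath(pdf)
    # Same base name as the PDF, with a .jpg extension, in the output folder.
    target = PureWindowsPath(out_dir) / (src.stem + ".jpg")
    return [
        exe,
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # no prompts between pages; exit when done
        "-sDEVICE=jpeg",                    # JPEG output device
        f"-r{dpi}",                         # output resolution in dots per inch
        "-dFirstPage=1", "-dLastPage=1",    # render page 1 only
        f"-sOutputFile={target}",
        str(src),
    ]
```

On a machine with Ghostscript installed, subprocess.run(gs_first_page_cmd(r"D:\Folder1\File35.pdf", r"d:\TestFolder"), check=True) would run it; passing a list rather than a single string avoids quoting trouble with spaces in paths.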

I could have continued to look at other options appearing in that search, but the working conclusion seemed to be to go with Boxoft and/or IrfanView, depending on the kind of output desired, and to make sure I was using just one stable, recent version of Ghostscript.