Sunday, January 29, 2012

JPG: Can't Read File Header - Unknown File Format or File Not Found

I was looking at various JPGs.  I noticed that a number of them produced this error message when I tried to view them in IrfanView.  I had checked the box to activate IrfanView's Unicode plug-in as suggested, and anyway these were not exotic file names.  So I didn't know what this error would mean for these JPGs.

A search indicated that numerous people had encountered this error.  I decided to start by verifying that this was not just a quirk of IrfanView.  It didn't seem to be; I also wasn't able to view these JPGs as icons in Windows Explorer, and when I tried to view them in Firefox, I got an error:  "The image [filename] cannot be displayed because it contains errors."  When I tried Internet Explorer, I got "Your web browser has blocked this site from using an ActiveX control in an unsafe manner."  Windows Photo Viewer said, "Windows Photo Viewer can't open this picture because the file appears to be damaged, corrupted, or is too large."  Chrome didn't show an error message; it just gave me a blank page.  Photoshop said, "Could not complete your request because an unknown or invalid JPEG marker type is found."  Microsoft Paint said, "Paint cannot read this file.  This is not a valid bitmap file, or its format is not currently supported."

Eliminating the Easy Solutions

A search led to indications that some problems of this type could be due to the program, as I had feared in the case of IrfanView.  For instance, one webpage indicated that a faulty Skype extension could produce the foregoing Firefox error.  Presumably an attempt to open the JPG in some other program, as above, would help to clarify whether it was a program issue rather than a JPG issue.  A discussion thread raised the prospect that this kind of thing could result from various kinds of file system or drive problems.  Another webpage said that USB flash drives (especially improperly removed ones) or Picasa could be an issue.

In some cases, a backup could be a solution, possibly beginning with a DoubleKiller search for other files having the same filename (just in case there might be another copy of the same file somewhere on the computer).  That discussion thread also suggested that it could be a machine-specific problem, but it wasn't in my case:  I had the same problem when I tried to open the files on another computer.

As noted in a previous post, building on an earlier effort and leading to some additional refinements, it was possible to use IrfanView to detect corrupted JPGs scattered around the computer.  The basic idea was to do a search for *.jpg (in e.g., a command window, or using a file finding program like Everything) and then run IrfanView (using either File > Batch or command line methods) to see which JPGs would fail to convert to another format (e.g., PDF).
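A script could do a much cruder version of this triage without IrfanView.  The Python sketch below (standard library only, Python 3.9 or later; the function names are made up for illustration) walks a folder tree and flags .jpg/.jpeg files that do not begin with the two-byte Start Of Image marker FF D8, or that lack the End Of Image marker FF D9 near the end.  It is only a heuristic:  a file can pass it and still fail to decode, so the IrfanView conversion test remains the stronger check.

```python
from pathlib import Path

SOI = b"\xff\xd8"  # Start Of Image: a JPEG stream begins with these two bytes
EOI = b"\xff\xd9"  # End Of Image: normally the last marker in the file


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap structural check: SOI at the very start, EOI near the end.

    A heuristic only; passing it does not mean the file will decode.
    """
    if not data.startswith(SOI):
        return False
    # Some programs leave padding after EOI, so look in the last 1 KB.
    return EOI in data[-1024:]


def find_suspect_jpegs(root: str) -> list[Path]:
    """Return every .jpg/.jpeg file under root that fails looks_like_jpeg()."""
    suspects = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in (".jpg", ".jpeg"):
            if not looks_like_jpeg(path.read_bytes()):
                suspects.append(path)
    return sorted(suspects)
```

Calling find_suspect_jpegs on a folder (any path will do; none is assumed here) returns a sorted list of files worth opening by hand.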

I had previously reviewed commercial software for fixing JPGs. The prices were generally high and my confidence in them was not great. I only had a few dozen corrupted JPGs and wasn't eager to spend much money on fixing them.  One commenter, responding to my post, said that she'd had generally good results with JPEG Recovery Pro ($50), but she still seemed to be looking for a better solution.  Voters on CNET had given it less than two stars.

One reviewer on CNET (like others) said the demo version of Corel's Paint Shop Pro had been useful.  On CNET, it got 3.5 stars from 466 users, though it didn't look like the dozen users who had rated the current version were quite as pleased with it.  I downloaded the latest version from CNET.  It was large (366MB) and it took a while.  CNET said it would be a 30-day trial version, $60 purchase price after that.  (I could also have downloaded from Corel's website.  Oddly, when I started to do that, the download dialog said their version was only 282MB.)  Unfortunately, when I tried to open a few of my corrupted files, Paint Shop said, "An error occurred while trying to read from the file."  Same outcome with a half-dozen different corrupted files.

Digging into the Files

My efforts (above) suggested that I might or might not be able to find a program that would help to automate the repair of corrupted JPGs.  Assuming I did find such a program, the next question would be whether it would work on all corrupted JPGs.  I was not finding a clear, obvious solution.

In other words, it seemed that, sooner or later, I was going to find myself among those who were talking of manually editing JPGs to fix them.  I hadn't done that before.  I had no idea whether that sort of process could be even partially automated.  But the next step seemed to be that I should see if I could fix at least some simple problems in JPGs.

In a thread cited above, someone said that I could open a JPG in Notepad and could tell, from its first few characters, what kind of file it was.  A GIF would tend to begin with "GIF89," a JPG would begin with "ÿØÿà," and a PNG would start with "‰PNG."  I looked at a couple of my JPGs.  Sure enough, the uncorrupted ones did begin with "ÿØÿà."  But the corrupted ones I examined didn't have anything like any of these three options.  I knew, anyway, that it wasn't a case of the file being saved with the wrong extension.  If that had been the problem, IrfanView would have caught it (given the program options I had selected) and would have offered to change it to whatever the correct extension should be.
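The "first few characters" trick works because each format starts with a fixed byte signature, and Notepad on a Western-language Windows system typically displays those bytes through the Windows-1252 code page.  A minimal Python sketch of the same check (the function name is hypothetical) compares the raw bytes directly, so no character set gets in the way:

```python
# Leading signature bytes for the three formats mentioned above.
# Read as Windows-1252 text, FF D8 FF E0 shows up as "ÿØÿà" and
# 89 50 4E 47 shows up as "‰PNG".
MAGIC = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
]


def sniff_format(data: bytes) -> str:
    """Guess an image format from its first bytes; 'unknown' if none match."""
    for magic, name in MAGIC:
        if data.startswith(magic):
            return name
    return "unknown"
```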

Another site offered a step-by-step guide to the process of editing the JPG.  The editing required a hex editor.  That site recommended Frhed, whose interface was far friendlier than that of HexEdit.  I opened one of my corrupted JPGs in Frhed.  It looked a lot less crazy than it had looked in Notepad.  According to a user-friendly version of the step-by-step guide, the JPG consisted of two sections:  header and image.  Corruption resulted from having a bad header.  Or so we hoped.  The solution was to replace the bad header with a good one.  So I made a backup of the files I would be working on, and set to work.

The user-friendly guide recommended xvi32 instead of Frhed.  It was rated 4.2 out of 5 stars (Very Good) by 47 users (76,213 downloads) at Softpedia -- vastly more than Frhed -- so I downloaded and ran that instead.  I had to bump up its font a bit -- 8-point type seemed unduly ascetic.

Now that I was getting organized, I looked at my first corrupt JPG.  Unlike the other one I had just glanced at, it did not have anything except zeroes.  This file was completely toast.  The next one had data.  The guide said I should look for "ff da" in the hexadecimal data, so I did a Ctrl-F and searched for FF DA.  (For some reason, xvi32 insisted on entering capital letters.)  My search seemed to think that it had succeeded:  it stopped at something that read 9F FA.  Call me crazy, but that did not look exactly like FF DA to me.  I tried searching the same file in Frhed.  A search for ff da found nothing, and a search for just ff didn't find much.  Going for a trifecta, I tried in HexEdit -- and there, I did find ff da. 
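An exact byte search avoids the hex-versus-text ambiguity these editors seemed to be introducing.  The sketch below (hypothetical helper names, Python standard library only) lists every offset at which a byte pattern such as FF DA occurs, and separately tests whether a file is nothing but zeroes, like the first file above.

```python
def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data (exact byte match).

    Unlike a text search, this never treats 0x9F and 0xFF as "the same letter".
    """
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets


def is_all_zero(data: bytes) -> bool:
    """True for a file that is empty or contains nothing but 0x00 bytes."""
    return data.count(0) == len(data)
```

Passing the bytes of a file such as ZZZ_0013.jpg together with b"\xff\xda" returns a list of offsets, which hex() can format to match the editors' status bars.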

These hex editors, sounding an uncertain trumpet, inspired me to search for that funky string noted above -- ÿØÿà -- in Notepad.  It wasn't there.  I tried five other JPGs in Notepad.  No ÿØÿà in any of them.  I felt lost.

According to the user-friendly guide, a hex search for ff da in xvi32 should have had some luck.  I searched a good JPG in Notepad for ÿØÿà and, sure enough, there it was, right at the start of the file.  I took a look at that same good JPG in xvi32 and it, too, found ff da in a hex search -- and this time it really was ff da.  So were we saying that my bad JPGs did not have the ÿØÿà or the ff da that would be necessary for a manual repair job?  Was that why neither Paint Shop Pro nor IrfanView nor anything else had been able to do anything with these files -- were they all completely fubar?  Were people using these hex editors, searching for ff da, when instead they could have achieved the same thing with IrfanView or PSP?

Diagnosis

I had a folder full of JPGs.  Some of them worked; some didn't.  Having made a backup, I started at the beginning and viewed each of these files in IrfanView.  I had set my IrfanView properties so that it would go immediately to the next JPG upon hitting an arrow key or upon deleting a file.  In other words, I could just hold down the Del key until it came to an error, and then hit Enter and right-arrow to get to the next one.  This would delete each good JPG, which was fine, since I didn't need to be editing this copy of it.

That left me with 159 apparently bad JPGs.  Wow.  More than I expected.  This could take a lot of manual editing.  IrfanView couldn't open them, which probably meant nothing else could either.  Just to be sure, I tried opening several dozen of them in Paint Shop Pro.  No dice.

I wondered if there was a fast way of searching all 159 of these files for ÿØÿà.  Copernic Desktop Search didn't find anything like that.  Well, what if I glued a bunch of them together with the COPY command and then viewed the one huge file in Notepad?  To test this approach, I went into a command window and typed "COPY File1.jpg + File2.jpg Newfile.txt," where File1 and File2 were good JPGs.  Notepad told me that File1.jpg and File2.jpg each had exactly one occurrence of ÿØÿà.  How about Newfile.txt?  Yes, indeed, it had two occurrences of ÿØÿà.  So this approach seemed to work, at least for purposes of preserving occurrences of ÿØÿà in a concatenated file.

To run this test on all 159 bad files, I didn't want to type out 159 filenames.  COPY would accept a wildcard (e.g., COPY *.jpg Combined.txt), but when combining files it defaults to text (ASCII) mode, which stops reading each source file at the first end-of-file character (Ctrl-Z, hex 1A), so it is not safe for binary files like JPGs.  I had forgotten that the /b switch forces binary mode:  COPY /b *.jpg Combined.txt.  (I chose a .txt extension so that (a) Notepad would open it automatically and (b) the output file would not itself match the *.jpg wildcard and get copied into itself.)  I ran that and then opened Combined.txt.  It was a large file, of course, so Notepad took a while to open it.  Once it was opened, I did a search for ÿØÿà.  I found hundreds of occurrences -- far more than 159.  Did this mean that my concatenation messed things up, or did it mean that some files had multiple occurrences of ÿØÿà?  I tried concatenating just 10 bad JPGs:  I had named them using sequential numbers, starting with ZZZ_0001.jpg, so I was able to use a wildcard to select ten:  COPY /b ZZZ_003*.jpg Thirties.txt.  The results were confusing.  Eventually I wrote a batch file to open each file individually in Notepad.  The batch file contained lines like these:

start notepad.exe ZZZ_0013.jpg
start notepad.exe ZZZ_0027.jpg
I thought I might crash the system if I opened 159 sessions of Notepad at once, so I broke it into four parts of about 40 lines each.  I ran the first one and did a Ctrl-F and then Ctrl-V in each one to paste ÿØÿà on the search line.  Now I had my answer.  Some of these files (e.g., ZZZ_0074.jpg) contained many iterations of ÿØÿà, while others contained none.  I went through them all.  After a while, it wasn't hard to guess that the ones that seemed to be filled with Chinese characters (and there were quite a few of them) would have no occurrences of ÿØÿà, while some but not all of the files containing more familiar if gibberishy characters (e.g., 1 Òhhh) would have at least one such occurrence.  There might also have been a way of speeding up the process by doing spot checks, since it seemed that files near to one another (probably originating from the same folder) (e.g., ZZZ_0058.jpg and ZZZ_0059.jpg) tended to follow the same pattern of having or not having occurrences of ÿØÿà.  In the end, 27 of the 159 had at least one occurrence of ÿØÿà, and 132 did not.
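The per-file counting could also have been scripted instead of opening 159 Notepad windows.  "ÿØÿà" is just how Notepad renders the bytes FF D8 FF E0, so counting those bytes gives the answer without any display quirks.  That matters here:  Notepad will sometimes guess that a binary file is UTF-16 text, which may be why some files looked like Chinese characters, and in that mode a search for ÿØÿà would not see those bytes even if they were present.  A minimal Python sketch, with a hypothetical function name:

```python
from pathlib import Path

# "ÿØÿà" as Notepad displays it is really these four bytes:
# FF D8 (Start Of Image) followed by FF E0 (the JFIF APP0 marker).
SOI_APP0 = b"\xff\xd8\xff\xe0"


def count_soi_app0(folder: str, pattern: str = "*.jpg") -> dict[str, int]:
    """Map each matching file name to its number of FF D8 FF E0 occurrences."""
    counts = {}
    for path in sorted(Path(folder).glob(pattern)):
        counts[path.name] = path.read_bytes().count(SOI_APP0)
    return counts
```

Something like count_soi_app0(".", "ZZZ_*.jpg") returns a dictionary from which the files with a count of zero can be picked out directly.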

Fun with Hex Editors

These facts seemed to call for two separate approaches.  For the 132 JPGs that had no occurrences of ÿØÿà, maybe the situation was that the front ends had gotten lopped off, and that's why Paint Shop Pro et al. couldn't make anything of them.  What would happen if I just arbitrarily rammed a header onto each of these files, down to the ÿØÿà point?  The answer to that might give me some clues for the minority of files that did have at least one occurrence of ÿØÿà.

In a survey of good JPGs, I noticed that most began with this:
ÿØÿà JFIF         ÿ
There seemed to be at least one invisible character in there, so the best approach (outside of a hex editor) seemed to be to copy it from the start of a working JPG in Notepad (i.e., not from this webpage), and save it as header.txt.  Then, starting with ZZZ_0013.jpg, the first of my bad JPGs, I typed this:
COPY /b header.txt + ZZZ_0013.jpg new0013.jpg
and then I tried to open new0013.jpg.  IrfanView gave me an error:  "Decode error!  JPEG datastream contains no image."  I tried it with a couple of other files; same result.  A brief search suggested that this "decode error" problem could be just as bad as the original one.  So it appeared that this COPY approach was not the answer.

Back at the user-friendly guide, I confirmed that, in their view, the new header approach required me to locate the hex string FF DA, which I had not been able to do in many files.  Given the uncertainties I had encountered in the several hex editors (above), I wondered if there was a way to output the hex contents of a JPG in text form, so that I could do ordinary searches in Notepad (or whatever) to confirm that there was no FF DA in these files.  The solution was easy enough:  Frhed (but not HexEdit or xvi32) had an option, File > Export as hexdump, that gave me a text file displaying the hex data.  So if I wanted to see a file's hex in a text file, I could use that; and if I wanted to see a file's ASCII in a text file, I could use Notepad.  Xvi32 did have a File > Print option that gave me pretty printed pages (in e.g., PDF or hard copy), displaying both hex and ASCII, so I could have searched for either text or hex values in its PDF output.
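A hex dump like Frhed's is easy to produce with a few lines of Python.  The sketch below (a hypothetical function, standard library only) writes the offset, a colon, the hex bytes, and a printable-ASCII column on each line, keeping the offset visibly separate from the data.  One caveat applies to any text dump:  if ff ends one line and da starts the next, a Notepad search for "ff da" will miss that occurrence.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render data as text: offset, hex bytes, then a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # The colon keeps the offset from running into the first data byte.
        lines.append(f"{offset:08x}:  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Writing hexdump(data) to a .txt file gives something Notepad can search for hex pairs.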

But now, this was odd.  When I searched ZZZ_0030.jpg for "ff da" in Frhed, it found nothing; but when I searched Frhed's hex dump text file for "ff da" in Notepad, it found multiple occurrences.  Ah, but the problem seemed to be that Frhed was searching the ASCII side, not the hex.  Frhed's search box told me to consult the online help for guidance, but the program contained no link to any webpage as far as I could tell.  Xvi32's Find option let me search for either text or hex, but as noted above, it often led to 9F instead of FF.  Back in Frhed's hex dump file, I searched for the first occurrence of ff da.  It was on row 0050b8 (which the hex dump displayed as 0050b835, mistakenly running together the row number, it seemed, with the first column of data).  I tried to locate row 50b8 in xvi32, but there was no such row.  I guessed that 0050b8 must refer to a specific location (such as the "ff" in "ff da"), so what I was calling "rows" would actually have different numbers within the same file, according to how many data points were being displayed on a single row onscreen in the hex editor.  Armed with that theory, I did now see an occurrence of FF DA in xvi32, near where location 0050b8 should be; and when I clicked on that occurrence, the status bars in both Frhed and xvi32 displayed indications that I was at "hex address" or "offset" 50CD.
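The row-versus-offset puzzle comes down to simple arithmetic:  an offset is a fixed position in the file, while a row label is just the offset of the first byte shown on that row, which depends on how many bytes the editor displays per row.  A small Python illustration (the function name and the row widths used below are assumptions, not the actual settings of Frhed or xvi32):

```python
def locate(offset: int, bytes_per_row: int) -> tuple[int, int]:
    """Split an absolute file offset into (row start address, column index).

    The offset is a property of the file; the row start depends only on how
    many bytes the editor happens to show per row.
    """
    column = offset % bytes_per_row
    return offset - column, column
```

With 16 bytes per row, offset 0x50CD sits in column 13 of the row labelled 50C0; with 8 bytes per row, it is in column 5 of the row labelled 50C8.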

OK, so what could I do with this information?  There seemed to be some confusion here.  Those webpages had said that I was looking for ÿØÿà.  An ASCII code list told me that the ASCII code for ÿ was 152.  (This meant that I could type ÿ by holding down Alt and hitting 152 on the keyboard's numeric keypad.  It turned out that not all ASCII code lists agreed:  some versions of the extended ASCII table (most, it seemed) would say that 152 would give me a tilde (~), but that wasn't the tale told by my keyboard.)  According to my preferred ASCII list, the ÿ had a decimal value of 152 and a hex value of 0x98.  A converter told me that 152 in decimal = 98 in hexadecimal, so that added up.

So, ahh, now I thought maybe I was figuring this out.  Looking at ZZZ_0030.jpg in xvi32, I noticed that selecting FF in the hex area would highlight ÿ in the ASCII area.  As far as xvi32 was concerned, ÿ = FF, not 152.  I tried FF in my hex converter.  It said FF in hex = 255.  Well, and of course it did:  FF was as high as two hex digits would go.  In hex, you don't stop at 9; after 9 you continue on with a, b, c, d, e, and then f.  What we call 15 in decimal is called F in hex.  After F, you start over at 0 and carry a 1, so 16 in decimal is written 10 in hex, just as you start back over at 0 after 9 in base-10 (decimal) counting.  So FF in hex actually meant 0FF:  it was similar to 099 in decimal.  After 099 comes 100 in decimal; after FF comes 100 in hex.  So never mind my keyboard:  these hex programs were interpreting FF as 255, and that was the last possible number in the 256-character extended ASCII set (beginning with zero).  In the ASCII code lists I was using, 255 was not represented by ÿ or any other visible character.  (In DOS code page 437, which those lists and my keyboard's plain Alt codes were following, 255 is a non-breaking space, so it displays as a blank.)  To clean up a loose end, I found an occurrence of hex 98 in xvi32, clicked on it, and saw that, sure enough, it was linked with the ASCII tilde.
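The same counting rules, checked in Python (these lines simply assert facts about base 16):

```python
# Hexadecimal digits run 0-9 and then a-f, so one hex digit covers 0 to 15.
assert int("9", 16) == 9
assert int("f", 16) == 15       # F is fifteen
assert int("10", 16) == 16      # one past F: start over at 0 and carry a 1
assert int("ff", 16) == 255     # the largest value one byte can hold
assert int("100", 16) == 256    # one past FF, just as 100 follows 099 in decimal
assert hex(152) == "0x98"       # Alt-152's decimal code, written in hex
assert int("d8", 16) == 216     # the d8 byte in the JPEG header
print("all hex checks passed")
```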

I looked at a good JPG file in Frhed.  It began, as shown above, with "ÿØÿà JFIF ÿ" -- that is, with this set of hex values:
ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c 01 2c 00 00 ff
After that ending ff, the contents of the good JPGs seemed to diverge.  That 17th character seemed to be where the image content began.  Would I get a working file if, instead of pasting in that header.txt file (which, being represented in ASCII, was apparently not able to capture all of the hex nuances), I pasted these 17 (or maybe 16) codes at the start of a bad JPG?  Or, wait.  After the hex dump search experience (above), was I sure that this sequence, or part of it, was not already in those files?  The Notepad search for ÿØÿà had produced mixed results, but maybe that wasn't the right way to go about it.
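Counting the bytes listed above gives 21.  Read according to the JFIF layout, the first 20 are the SOI marker plus one complete APP0 segment, whose length field 00 10 (16) covers everything from the length bytes themselves through the two thumbnail-size bytes; the 21st byte, ff, is the first byte of the next marker.  A Python sketch that decodes that segment (the function name is hypothetical; the field order is the JFIF 1.0x APP0 layout) shows version 1.01, 300 x 300 dots per inch, and no thumbnail:

```python
import struct

# The good-JPG header listed above, byte for byte.
GOOD_HEADER = bytes.fromhex(
    "ff d8 ff e0 00 10 4a 46 49 46 00 01 01 01 01 2c 01 2c 00 00 ff"
)


def parse_jfif_app0(data: bytes) -> dict:
    """Decode the SOI + JFIF APP0 segment at the start of a JPEG file.

    Layout: FF D8, FF E0, 2-byte big-endian length (counting itself but not
    the marker), the five bytes "JFIF" plus a zero byte, version major/minor,
    density units, X density, Y density, thumbnail width, thumbnail height.
    """
    if data[:4] != b"\xff\xd8\xff\xe0":
        raise ValueError("no SOI + APP0 marker at start of data")
    (length,) = struct.unpack(">H", data[4:6])
    segment = data[6:4 + length]  # everything after the length field
    if segment[:5] != b"JFIF\x00":
        raise ValueError("APP0 segment is not JFIF")
    major, minor, units, xd, yd, tw, th = struct.unpack(">BBBHHBB", segment[5:14])
    return {
        "segment_length": length,
        "version": f"{major}.{minor:02d}",
        "units": units,           # 0 = aspect ratio only, 1 = dots/inch, 2 = dots/cm
        "density": (xd, yd),
        "thumbnail": (tw, th),
        "next_offset": 4 + length,  # where the following marker should start
    }
```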

(Note that the second character, Ø, was represented by d8.  There were apparently several different Ø-like characters in use in different languages.  The hex calculator indicated that d8 in hex meant 216 in decimal.  But when I typed Alt-216, I got ╪, not Ø.  That seemed to be an error, according to indications of what I should have gotten in the Latin-1 (ISO-8859-1) character set.  The answer was that I should have been typing Alt-0216, not Alt-216.)
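The Alt-code puzzle has a tidy explanation, at least on a US-English Windows setup:  Alt plus a number without a leading zero types from the old DOS code page 437, while Alt plus a number with a leading zero types from Windows-1252, the code page Notepad uses and many online "extended ASCII" tables follow.  The same byte therefore has two readings.  A small Python check (the helper name is hypothetical; the cp437 and cp1252 codecs are part of the standard library):

```python
def readings(value: int) -> tuple[str, str]:
    """How one byte value displays in code page 437 (DOS) and Windows-1252."""
    b = bytes([value])
    return b.decode("cp437"), b.decode("cp1252")


# Alt+152 (cp437) gives ÿ; byte 152 read as cp1252 is a small tilde (U+02DC).
assert readings(152) == ("ÿ", "\u02dc")
# Alt+216 gives a box-drawing character (U+256A); Alt+0216 gives Ø.
assert readings(216) == ("\u256a", "Ø")
# The ÿ that hex editors show for FF is the cp1252 reading (Alt+0255);
# in cp437, byte 255 is a non-breaking space.
assert readings(255) == ("\xa0", "ÿ")
# Alt+133 (cp437) is the à at the end of ÿØÿà; in cp1252 that à is byte E0.
assert readings(133)[0] == "à" and readings(0xE0)[1] == "à"
```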

In Frhed, I opened one of those JPGs in which, using Notepad, I had found no occurrences of ÿØÿà.  I did a search (Ctrl-F) for ÿØ, which I could either have pasted into the search box or entered via Alt-152, Alt-0216.  I did find an occurrence of ÿØ.  I noticed that it came shortly after a big section full of zeroes, which made me think that much of that particular file might be gone forever.  I tried again, this time searching for the full ÿØÿà (Alt-152, Alt-0216, Alt-152, Alt-133).  Nothing found.  So Frhed seemed consistent with Notepad in that particular search, at least in this file.

By this point, I was a bit lost.  It did occur to me that I might be able to automate the triage of potentially salvageable JPGs by doing a hex dump, counting the occurrences of 00 (nothing, empty space) as a percentage of the total number of hex values in the file, doing some sampling of partially zeroed but still readable JPGs, and identifying a threshold (10%?  20%?) beyond which a JPG would not be worth saving.  But I wasn't there yet, because my JPGs weren't readable at all, and they did have non-zero data.  Most of them, that is; I had found a total of two that were completely empty.
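The zero-counting idea is easy to prototype on raw bytes rather than a hex dump.  This Python sketch uses hypothetical names, and its 20% cut-off is just the guess from the paragraph above, not a tested value.  Keep in mind that healthy JPEG data also contains some 00 bytes, both in header fields and after every FF byte that occurs in the compressed data, so any real threshold would have to be calibrated against readable files.

```python
def zero_fraction(data: bytes) -> float:
    """Fraction of bytes that are 0x00 (0.0 for an empty file)."""
    return data.count(0) / len(data) if data else 0.0


def triage(data: bytes, threshold: float = 0.2) -> str:
    """Sort a damaged file into a rough bucket by its share of zero bytes.

    The 0.2 default is a placeholder; a usable threshold would have to come
    from sampling partly zeroed JPGs that still open.
    """
    frac = zero_fraction(data)
    if not data or frac == 1.0:
        return "empty"
    return "probably too far gone" if frac > threshold else "worth a closer look"
```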

A Bit of Clarity

I went back to the first of my bad JPGs that had no occurrences of ÿØÿà.  My hex editors all seemed to have some way to insert characters or a whole file.  The latter, offered by xvi32 and Frhed, seemed easier, so in xvi32 I went into File > Insert.  I had not yet created the file that I wanted to insert, so now I went into Notepad, pasted my string of 17 hex characters (above), saved it as Header.txt, and proceeded to insert that into my bad JPG in xvi32.  Oops:  that pasted the hex codes (ff d8 ff e0 ...) as text, not as hex.  Xvi32 also gave me an Edit > Insert > Hex string option, so I tried that.  That worked.  I saved the bad JPG and tried opening it in IrfanView.  This gave me a new error:  "Decode error!  Bogus marker length."  Unfortunately, a search led nowhere from that.  Another search produced more, including a FileFormat.info webpage that said this:
The first two bytes of every JPEG stream are the Start Of Image (SOI) marker values FFh D8h. In a JFIF-compliant file there is a JFIF APP0 (Application) marker, immediately following the SOI, which consists of the marker code values FFh E0h and the characters JFIF in the marker data, as described in the next section. In addition to the JFIF marker segment, there may be one or more optional JFIF extension marker segments, followed by the actual image data.
This was helpful.  It seemed that all I really needed, after all, were the FF D8 bytes.  The rest of "ÿØÿà JFIF" was perhaps related to JFIF compliance, but I didn't know if I needed that to produce a file that a program like IrfanView could read.  So I went back into xvi32, went to the 17th byte (i.e., the last one I had just entered) and used the Edit > Delete to Cursor option to delete what I had just added, and then used Edit > Insert String to add back ff d8.  I saved and tried again.  Now we were back to the "Can't read file header" error in IrfanView.  Another FileFormat webpage seemed to say that, to have a JPG file, all you needed was the first four bytes (ff d8 ff e0).  In xvi32, I went into File > New, inserted those four bytes, and saved that file as Test.jpg.  IrfanView gave me an error:  "JPEG datastream contains no image."  I inserted four empty bytes (i.e., eight zeroes) after those four header bytes but still got that error.  I replaced those with bytes 5 through 8 from a good JPG and tried again.  Still the same error.

Another search led to a webpage that seemed to explain something I had noticed in another FileFormat webpage:  it seemed that a JPG file was (or at least could be) defined as one beginning with FF D8 and ending with FF D9.  This page also explained that JFIF was an alternative to EXIF, but I wasn't sure whether I needed either of them.  It seemed that I hadn't really added four data bytes when I added bytes 5 through 8:  I was just adding the JFIF part of the header.  Assuming I had to have either EXIF or JFIF, in xvi32 I now modified Test.jpg so that it contained what appeared to be the standard JFIF header (ff d8 ff e0 00 10 4a 46), then added four bytes (90 60 1B 88), then added FF D9 at the end, saved, and tried again.  Still "contains no image."  I looked at Test.jpg in Notepad.  Interestingly, it looked like I had made a start on one of those Chinese-looking files.
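The SOI/EOI picture extends to the whole header:  after FF D8, a JPEG is a chain of segments, each starting with FF and a marker byte, and most markers are followed by a two-byte big-endian length that counts itself but not the marker.  The two IrfanView messages appear to match error strings in the widely used libjpeg library; roughly, "Bogus marker length" means a length field that cannot be right, and "contains no image" means the decoder ran out of data (or hit an EOI marker) before finding any image scan.  The Python sketch below (hypothetical names; it does not reproduce any viewer's exact logic) walks the chain up to the Start Of Scan marker FF DA and reports where it stops making sense.

```python
import struct

NAMES = {0xD8: "SOI", 0xD9: "EOI", 0xDA: "SOS", 0xDB: "DQT", 0xC0: "SOF0",
         0xC2: "SOF2", 0xC4: "DHT", 0xDD: "DRI", 0xE0: "APP0", 0xE1: "APP1",
         0xFE: "COM"}
# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def walk_markers(data: bytes) -> list[tuple[int, str]]:
    """List (offset, label) for each header segment, stopping at SOS or a fault."""
    if data[:2] != b"\xff\xd8":
        return [(0, "error: no SOI (FF D8) at start")]
    out = [(0, "SOI")]
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            out.append((pos, f"error: expected FF, found {data[pos]:02X}"))
            break
        while pos < len(data) and data[pos] == 0xFF:  # skip optional fill bytes
            pos += 1
        if pos >= len(data):
            out.append((pos, "error: data ends inside a marker"))
            break
        start, marker = pos - 1, data[pos]
        name = NAMES.get(marker, f"FF{marker:02X}")
        pos += 1
        if marker in STANDALONE:
            out.append((start, name))
            if marker == 0xD9:  # EOI: end of image
                break
            continue
        if pos + 2 > len(data):
            out.append((start, f"error: {name} has no length field"))
            break
        (length,) = struct.unpack(">H", data[pos:pos + 2])  # big-endian, counts itself
        if length < 2 or pos + length > len(data):
            out.append((start, f"error: bogus length {length} for {name}"))
            break
        out.append((start, f"{name} length {length}"))
        pos += length
        if marker == 0xDA:  # compressed image data follows; stop walking here
            break
    return out
```

Fed the bytes of the hand-built Test.jpg described above (ff d8 ff e0 00 10 4a 46 90 60 1b 88 ff d9), this sketch stops at offset 2, because the APP0 length field promises 16 bytes but only 10 remain in the file.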

A more focused search tended to confirm the thought that I was getting in over my head and/or that there just might not be a solution.  People were talking about serious programming, and they were also giving me the impression that, of course, the image data for a JPG would state, or be influenced by, its size, color, compression, and other factors.  This was probably why the user-friendly webpage advised finding a header from a JPG of similar size, if possible, taken by the same camera and edited with the same software.  I didn't have that kind of knowledge about these particular JPGs, so my chances of adding a good header were limited.

The user-friendly webpage hadn't actually been very clear to me, so I returned to the original advice page that it was trying to present in more user-friendly terms.  That page confirmed that there would typically be "several" occurrences of ff da in a JPG.  This suggested that a file without any such occurrences, searched in a hex editor or dump rather than in Notepad, could be beyond saving.  I had decided I wasn't going to make that determination today, though.  If I couldn't save something now, I was going to zip it up and save it until maybe some better tool came along.

The advice here was to look for the *second* occurrence of ff da, occurring somewhere around 2000 to 4000 bytes into the file.  I hadn't understood that from the other page.  Everything up to that second occurrence was supposedly part of the header; everything after it was image data.  So I would be replacing all of that header section with a similar section from a good JPG.  If this was right, then my attempt to create Test.jpg (above) apparently needed a second occurrence of ff d8, followed by something resembling image data, in order to work.
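The header-transplant procedure can be expressed as a short function:  keep the good file up to and including its Nth FF DA, and append the bad file from just after its Mth FF DA.  This Python sketch (hypothetical names) only automates the cut-and-paste; it does nothing to make the two files' settings compatible, which is why the advice to borrow a header from a similar JPG still applies.  The result should be written to a new file rather than over the original.

```python
SOS = b"\xff\xda"  # Start Of Scan: the marker the repair advice treats as end of header


def end_of_nth(data: bytes, n: int, marker: bytes = SOS) -> int:
    """Offset just past the nth (1-based) occurrence of marker in data."""
    if n < 1:
        raise ValueError("n must be at least 1")
    pos = -1
    for _ in range(n):
        pos = data.find(marker, pos + 1)
        if pos == -1:
            raise ValueError(f"fewer than {n} occurrences of {marker.hex()}")
    return pos + len(marker)


def splice_header(good: bytes, bad: bytes, good_n: int = 1, bad_n: int = 1) -> bytes:
    """Good file through its good_n-th FF DA, then bad file after its bad_n-th FF DA.

    Only automates the cut-and-paste; it cannot make the two files' image
    size, tables, or encoder settings agree.
    """
    return good[:end_of_nth(good, good_n)] + bad[end_of_nth(bad, bad_n):]
```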

At about this point, I discovered that I might have been confusing ff d8 and ff da.  Both appear in the foregoing paragraphs, and I was no longer sure which one I was supposed to be interested in.  A look at a working JPG called Good.jpg indicated that the first two ASCII characters (ÿØ -- which, pronounced yo!, could be a great way to start a JPG) were represented by hex FF D8.  But the original advice page was saying that the boundary between header and image data was marked by FF DA, not FF D8.  So apparently I confused that.  In Good.jpg, FF DA -- that is, ASCII characters ÿÚ, produced (as I now realized) by Alt-0255, Alt-0218 -- first appeared at location (would it be called "offset" or perhaps "byte number"?) 261 in hex.  This was not nearly as far into the file as the advisor had suggested.  Perhaps it varied with the contents of the file.  But, no, again, this was the first occurrence of FF DA, not the second.  In xvi32, I hit F3 to repeat the search.  But xvi32 again took me to an instance of 9F DA, not FF DA.  I tried the same thing in Frhed, finding once again that it searched for ASCII, not hex; so I searched Good.jpg, in Frhed, for ÿÚ.  It said the offset or address of the first hit was at 609 or, in hex terms, 0x261.  But it could not find any more occurrences.  Was that why xvi32 had taken me to the irrelevant 9F DA -- because there was only one FF DA?  (Probably not, I decided later; probably it did that because it was accepting Ÿ as equivalent to ÿ, since I had not specified a case-sensitive search.)  HexEdit, too, appeared to be finding only one occurrence of FF DA in Good.jpg.
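The suspicion about case sensitivity is easy to check.  Read as Windows-1252 text, bytes 9F and FF are the letters Ÿ and ÿ, an upper/lower-case pair, so a case-insensitive text search for ÿÚ can stop on 9F DA as well as FF DA.  A Python check using made-up sample bytes:

```python
# Bytes 9F and FF, read as Windows-1252 text, are the letters Ÿ and ÿ:
# an upper/lower-case pair.
assert bytes([0x9F]).decode("cp1252") == "Ÿ"
assert bytes([0xFF]).decode("cp1252") == "ÿ"
assert "Ÿ".lower() == "ÿ"

data = bytes.fromhex("00 9f da 11 ff da 22")  # made-up sample bytes
text = data.decode("cp1252")  # one character per byte, so offsets line up
# Case-insensitive text search: the first match is 9F DA at offset 1.
assert text.lower().find("ÿú") == 1
# Exact byte search: only the real FF DA marker, at offset 4.
assert data.find(b"\xff\xda") == 4
```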

A Glimpse of Light

It seemed that I would have to try to make this work using the first rather than the second occurrence of FF DA in Good.jpg.  In Frhed, with the cursor blinking on DA at offset 609, I went into Edit > Select Block.  There, I typed x0 as the Start of Selection, and left the End of Selection at x262.  I clicked OK.  This selected everything from the start of the file to DA at offset 609.  Then I realized I was doing this in the wrong file.  But it was OK.  There seemed to be another way to proceed.  In Frhed, I went into File > Export as Hexdump.  I left the same range (x0 to x262) and selected Export to File and Just Hex Digits on a Line.  I saved it as Header.txt.  Then I closed Good.jpg.

Hopefully that gave me a working header.  In Frhed, I went into one of my bad JPGs and searched for FF DA.  It found nothing.  I double-checked, doing the same search in xvi32.  It found an occurrence of FF DA.  Plainly, I had still not quite gotten the hang of using these hex editors, or else maybe Frhed really was buggy, as the writer of the user-friendly webpage believed.

When I viewed Bad.jpg in xvi32, the first occurrence of FF DA that it found was near the end of the file.  That couldn't be an end-of-header marker, could it?  The hex address was 4E6D4, indicating that there was a lot of data before this point.  My guess was that this was one of the later occurrences of FF DA, not the early occurrence that would mark the header.  Could I fix this bad JPG by just attaching Header.txt at the start of the file?  In Frhed, I used Ctrl-Home to go to the start of Bad.jpg.  There, I saw nothing that looked like ÿØ, which presumably would have appeared at the start of any good JPG.  I went into File > Import from Hexdump.  I named the newly created Header.txt as my source and clicked OK.  I got a question:  "Does this data have the same format as the Frhed display?  This data contains only whitespace and hexdigits. (unlike Frhed display)."  I assumed that a header would naturally get a question like this, so I clicked Yes to proceed.  It said, "Unexpected end of data found.  Cannot continue!  Do you want to keep what has been found so far?"  I clicked Yes.  It gave me a blank screen.  As you might have predicted, this didn't fix Bad.jpg.

I tried again, this time following the instructions more closely, in case that made a difference.  Specifically, in xvi32, I saved Good.jpg as Header.txt, searched it for FF DA (case sensitive), went to the byte immediately after DA, and then went into Edit > Delete from Cursor.  Finally:  Header.txt really did contain only the information up through FF DA.  I saved it and opened Bad.jpg.  Still in xvi32, and with the cursor located at the start of Bad.jpg, I went into Edit > Insert > Header.txt.  This, I hoped, would prepend a good header to the image body of Bad.jpg and heal it.  I saved Bad.jpg and tried opening it in IrfanView.  Sadly, I got "Bogus marker length."  Paint Shop Pro couldn't open it either.

I tried again with a different Bad.jpg.  This was one of the files that I had identified (above) as having many instances of ÿØÿà, whereas the previous Bad.jpg (i.e., the one I had just been experimenting with) had none.  I searched for the first instance of FF DA, used Ctrl-Shift-PgUp to mark everything up to and through FF DA, pressed Del to delete it, and then inserted Header.txt at the start of the file.  I saved it and tried opening it in IrfanView.  Again, "Bogus marker length."  I tried again, this time going to the second instance of FF DA.  "Bogus marker length" once again.

Wrap-Up

I was out of time for this project.  Perhaps some ideas would come to me later, or I would become aware of some new program or technique.  As always, comments and suggestions were welcome.  In the meantime, all I could do at this point was to archive these bad JPGs in a zip file and put them aside.

11 comments:

dragonfly888

That was an exceptional analysis of the JPG header problem. I'd like to find out if you have had any further revelations. I happened upon your blog as a result of having a couple of JPGs that have the same issue with the file header. My JPGs were created with Snag-It (screen capture software), inadvertently deleted by me, and recovered with a program called "Recuva". Once recovered, the JPGs were unreadable. It is not critical that I recover these files, but it posed a challenge to me... that's why I am here! Please let me know if you have success!

raywood

I'm all used up on that one. Completely spent. Burned out. Done. Fugeddaboutit. I had to zip the toasted JPGs and learn to pretend they don't exist anymore. Gone! But if I ever work up the courage to unzip them and take another look, I'll probably post something about it here. Good luck ...

Orthodox Daily

100% the best explanation of this problem and I've been dealing with it for YEARS.

Anonymous

I had a similar problem. I was organizing my pictures within the same folder, just creating folders with the names of the places where I had taken the pictures and dragging and dropping the relevant pictures into these folders. I even checked some of the files and they were OK.
Additionally, I backed up these files after a while but never opened them before the backup. However, when I wanted to open these files, most of them were corrupted and wouldn't open.
Their sizes were just like the original picture sizes, around 2.5 to 4 MB.
I opened them in Notepad; none of them began with "ÿØÿà" as they should.
They were full of Chinese characters. I copied some of them into Google Translate, and they were interpreted as Chinese. It even produced some Chinese-to-English translation :)))))
Here is an example of the first two lines:
ORIGINAL CHINESE :
チ휊敖ⲱ⚗᱖趃賥זּ磴㊊愔焬ċ㒑枧臒鍝栛릡컙ᤘ鄔藳ꈢ䳛???佻鞘꼈峵韸???㤼懬丄곷氪杌㞬鐳桾룅㈔誮⫂붷禵䱼ᦅ氲㩍⟚⭭့㛍
GOOGLE TRANSLATION:
Wo Yuli the チ 휊 Ao ⲱ ⚗ ᱖ Tusui זּ Ishidan ㊊ serene the Yi ċ 㒑 soap to 릡컙 ᤘ Wu the Kao ꈢ 䳛??? The naughty  sheath 꼈 Rongpeng??? 㤼  Kuang Shang 곷, krypton Rongji 㞬 the radium Jun 룅 ㈔ Hua⫂ 붷  the Ti 䱼 ᦅ Yun 㩍 ⟚ ⭭ ့

I desperately need my pictures. I don't know what to do.
I can give more examples of the Chinese text. I am planning to use hex editors, but didn't have time yet. Please help...

raywood

I'm sorry. I haven't become any more expert in solving this kind of problem since the time when I entered this post. I suspect the Chinese translation you are getting is just gibberish.

If I were in your position, I would try a couple of steps: (1) See whether data recovery programs like Recuva may help you to recover previous (deleted) versions of these files from any hard drives where the files may ever have been located. (2) Explore techniques that others have used for fixing corrupted JPGs.

I would probably start with a file that seems fairly representative of the others, and work on that one extensively. It may take a lot of time, but if you can find a solution to that one file, maybe that will lead toward a relatively easier solution for many others.

I wouldn't do any more work on the hard drive containing those pictures until you've explored the data recovery option. Instead, you might try copying the files over to another drive and working on them there.

Please report back here if you do find any helpful tips or other advice. Many people have this sort of problem.

Good luck.

Anonymous

HI,
My CF card went corrupt and IrfanView showed the bogus marker length.
I used Kroll ontrack Easy recovery (cracked, not bought I must admit) to do an advanced recovery.
It took 3 hours scanning and another 1.15 hour to recover, but the images are saved !
Maybe it can help you to recover your images.
Good luck.

niculaegeorge

It seems to me that all of this could have been avoided by simply running a disk scan (chkdsk) and preventing the corruption of the file system on that drive, but now it's too late for that.

Anonymous

Hello Ray et al
I received the "can't read file header" message when trying to open certain JPGs that I copied from my HD onto a USB stick to view using a beamer.
The files appeared fine on the HD but ended up corrupted on the stick.
When I reformatted the stick and repeated the process, the copied files were in order.
One cause of all this trouble would therefore appear to be bad disk sectors or some such thing.
My problem has been solved but this is of course no answer to yours. I thought I'd pass this on to you. Thanks a lot for all the detailed information. Best of luck
Ray Russell

Paul

An exceptionally clear piece of writing. Nice job.

I'm still wondering how such corruption occurs in the first place (stray magnetic fields? Weak error-correction routines?) and if there's a larger lesson here, such as: Is corruption inevitable if a file is copied enough times? Are .jpg files especially sensitive to such corruption somehow? Should some other format be used for long term storage or multiple copy operations?

Anonymous

Same issue. I have Linux Puppy on the same machine/drive. No problem opening those files in it. On one occasion it actually seemed to "reset" something so they opened in XP on the next boot. I moved the files into a different folder; they still open in Linux, but not in XP.

Anonymous

(much later in time, I know).

Keep in mind that if your machine is using an SSD, it's very difficult if not impossible to restore a deleted file since space is allocated in a different manner. This would apply to any number of machines in the "ultrabook" class, including the MacBook Air et al.