Saturday, July 28, 2007

System Crash Woes

I assembled a new computer, and it kept crashing -- sometimes on bootup, and sometimes in the middle of nothing, out of the blue, as I was working away on something. It wasn't actually a new computer, per se. I just replaced key components. I kept the old case, hard drives, floppy drive, CD/DVD drive, and peripherals (monitor, keyboard, etc.). What I replaced was the motherboard, video card, processor (CPU), and RAM. I had already done overnight tests on the RAM, using a separate bootable CD and diagnostic program supplied for the purpose, and was confident that the RAM was OK. The motherboard, in detail, was an MSI (Micro-Star International) P6N SLI Platinum LGA 775 NVIDIA nForce 650i SLI ATX. (The links, there and below, go to the respective Newegg pages. Newegg was a highly rated computer component merchant. The links show that these items received high marks from a relatively large number of purchasers.) The product information page itself no longer seemed to have the model number, but the manual for that product provided an MSI model number of MS-7350, and on an older version of the product page I had seen a reference to "MSI Part No: MS-7350-020." (I noticed, when using Crucial's Memory Advisor Tool, that for some reason the MSI P6N SLI-FI motherboard also apparently had the same MS-7350 part number.) The video card I was using was an EVGA 256-P2-N624-AR GeForce 7900GS 256MB GDDR3 PCI Express x16 KO. The CPU was an Intel Core 2 Duo E4300 Allendale 1.8GHz LGA 775, model BX80557E4300. All were retail purchases (i.e., not refurbished, OEM, or open-box). Finally, the RAM (the only item not purchased from Newegg) consisted of four 1GB sticks of Crucial Ballistix 240-Pin SDRAM DDR2 800 (PC2 6400) Dual Channel, model BL2KIT12864AA804 (CAS latency 4). I bought this equipment with an eye toward the future. 
I had a bit of time to devote to an upgrade (though certainly not as much time as the process wound up taking!), and I expected to use this upgrade for several years. I had bought a new motherboard just the previous year, but the system before that had lasted since 2003. I hesitated to replace the one-year-old motherboard; I did so primarily because I anticipated increased system load as I moved toward running Windows XP Professional inside VMware on a Linux system. (I have already posted some remarks on my recent efforts with VMware.) I had also noticed some slowdowns while doing research, when I might have a number of webpages and PDF files open. For both such reasons, RAM had come to seem like a particular bottleneck; hence, I had also explored the possibility of adding non-system RAM for purposes of creating a RAM drive. But I was experiencing crashes with this new hardware. Lots of crashes. Some of them, I had figured out, were related to the effects of installing Ubuntu Linux in a dual-boot arrangement. But it seemed that I had sorted out those issues, and yet the crashes persisted. So this appeared to be yet another aspect of the upgrade process that would require detailed and determined attention if I hoped to resolve it. The time had come to do so: I was nearing the end of the 30-day return period for these items. If one of them was defective, I needed to figure that out. I had already replaced the motherboard once. Newegg had accepted the first one for refund and had sold me another one. Both the first one and its replacement had displayed what appeared to be a CMOS problem on bootup (especially after extended power-down), not remediable by mere battery replacement. It was a problem that some other users had also mentioned. The BIOS would forget its settings and I would have to go back into setup. This was not anything that I had experienced with other working mobos. 
It did not seem, to me, that this problem would explain why, sometimes, the system would barely boot, and at other times I could work all day without a problem. I honestly did not know, at this point, whether zero, one, or both of these MSI motherboards were defective from MSI's perspective. In the VMware/Ubuntu struggles just mentioned, I had discovered that installing Linux would screw up the system's master boot record (MBR), so that Windows would crash shortly after booting. I had learned how to use FIXMBR, booting from the Windows XP installation CD, as described in more detail in that other posting, to counteract this Linux problem, and had also essentially ceased dual booting except with the aid of a Super Grub Disk. For practical purposes, this was now a Windows machine that happened to have some unused Linux partitions on one of its four hard drives of varying sizes (two of which were primarily intended -- once I got the system organized -- for offline or infrequent storage). I used the WinXP install CD, not only to run FIXMBR, but also to run CHKDSK /R from the Recovery Console. It seemed that CHKDSK found and fixed errors almost every time I ran it. This, too, seemed unusual. Since three of the drives were large, it could take quite a while to run CHKDSK /R; and the need to do it frequently meant that, as at the time of this writing, I had to work on the laptop while waiting for the desktop to go through its maintenance paces. For troubleshooting purposes, I disconnected all of the (physical) drives except the boot drive. This one contained four partitions or logical (i.e., not physical) drives. The first, which Windows saw as drive C, was the PROGRAMS drive, so called because (of course) this was where I installed Windows and other program software. The second, drive D, was called STATIC because its contents were relatively unchanging. 
They consisted primarily of (a) software that I had to have somewhere on the system in order to do upgrade installations and (b) the original installation materials that I had used to install programs on C. In category (a), an example was that, at one point, I had been using an older version of Microsoft Frontpage, but then I had bought an upgrade. To avoid having to insert the old CD each time I wanted to make a modification to Frontpage (or, indeed, even the first time, when I was doing the upgrade), I copied the necessary program files from the old CD to a folder on D, and pointed the upgrade installer to that location. Until just recently, I had been doing this with the WinXP \I386 folder, and probably could have continued to do so. In category (b), I found that it was helpful to keep separate folders to store the programs that I installed on C. So let's suppose I used PowerQuest's Drive Image 2002 (no longer available; absorbed by Norton Ghost, which, I had decided, was an inferior product; probably best replaced at this point by Acronis True Image Workstation) to make an image of my disk on July 1. Then I installed a bunch of new programs. I would always try to install those programs from downloads or CDs, not on-the-fly from a webpage. That way, if something went wrong in the installation of some other program, I could quickly retrieve and retry the rest of the things I had installed. I would put all of the programs installed since July 1 (or, in the case of a CD, I would put a text file directing me to the CD), along with text files containing other information (e.g., pointing me to a website that would not let me download the software I was going to install, or providing configuration instructions for whatever I was installing), into a folder labeled "Installed Since July 1." So drive D contained a series of "Installed Since" folders full of software and notes. 
Drive E was a DATA work area, which I kept backed up, and drive F was a big work area for files (e.g., AVI files) that were too big to back up onto DVD/RW and were therefore not often backed up, but that could generally be restored (e.g., from the videotape from which I would have created the AVI). Since I did not burn all of my Drive Image PQI files (i.e., disk images) to DVD, drive F -- which I called the BACKROOM drive -- also contained large drive image files that were not backed up. (That is, I might have burned a backup once a month, but I might have made drive images of drives C and D several times a month.) At the point at which I began this particular post, as I say, I had disconnected the other drives, leaving only partitions C, D, E, and F (on a 500GB Maxtor SATA drive) visible to Windows. I had run Recovery Console from the WinXP installation disk, and in Recovery Console I had run FIXMBR (twice, for good measure) and had also run CHKDSK /R repeatedly on each partition until it no longer came up with errors (twice for two partitions, once for the other two). I then attempted to reboot the system into Windows XP (Normal Mode, as distinct from Safe Mode, which was typically available by hitting the F8 key (repeatedly, because it was easy to miss the window of opportunity) shortly after reboot). (Note that you would get a different, shorter set of options if you weren't using F8 -- if the machine was automatically taking you to a menu because the preceding boot failed.) This, then, was the actual starting point for the effort chronicled in this particular posting. The attempt to reboot into Windows XP Normal Mode failed. I did not see exactly where it died, so I tried again. I had noticed, in recent weeks, that in some instances the system would progress to a different point on reboots, so I let it run a couple of times. This time around, it was not like that. 
I saw, in each instance, that it would show the mostly black screen with the big Windows XP logo; I saw that the progress bar at the bottom (consisting of three dots that would move from left to right, over and over again) would be moving normally, at first, but then would freeze; and then the system would reboot. Next time around, it would offer to reboot in Safe Mode, and when there were no takers it would go on to make its bid for Normal Mode again. After three or four tries at that, I saw that it was making no progress, but was instead rebooting at the same place each time. So then I selected Safe Mode, and it succeeded there. I dimly recalled that Windows maintained a boot log, or at least that you could set it to do so, so I started with the Microsoft page on that. They said, sure enough, that the NTBTLOG.TXT file was located in the %windir% directory -- which, as I translated it, meant C:\Windows. I did find that file there, and opened it in Notepad. It was a voluminous file. It seemed to contain the cumulative records of all programs (or at least drivers) that had been loaded, or not, in each bootup for some weeks into the past. I noticed, at the top of the file, that the first bootup's record began with the words "Service Pack 2," followed by the date and time. So I went to the bottom and worked my way up to the first occurrence of those same words. From there to the end of the file, I noticed, were a lot of instances when drivers were not loaded, which was what I would have expected from the record of the current Safe Mode bootup. Right before that, as it seemed from the starting time, would be the record of the next-to-last boot effort, which had involved a failed attempt to boot into Normal Mode, as described above. But on closer examination, it looked like there were a bunch of drivers installed in a row, one after the next, ending with MUP.SYS (used to be AGP440.SYS, but I no longer have an AGP video card), followed by a few others later on. 
MUP.SYS was suspicious because that was the last of the drivers that my system would report that it was loading, when it booted into Safe Mode. So I was not confident that I was reading this thing right -- that I was really looking at the log of attempts to boot into Normal Mode as well as Safe Mode. A PC Answers webpage told me that the NTBTLOG.TXT file was not very human-readable, and that I should use a boot log analyzer program instead. They pointed me toward, first, the evaluation version of RegRun 3 from Greatis Software and, second, the free A1 Bootlog Analyzer. A report of a Microsoft presentation made me think that I had been mistaken in going to Recovery Console first. Recovery Console, they said, was for situations where you could not get into Safe Mode. If you could get into Safe Mode, that was where you were supposed to begin your troubleshooting efforts. So, OK. Now I knew. The guy giving the presentation also reminded me that you could set these boot options by running MSCONFIG, inside Windows, from the command line or from Start > Run. The first idea I got, from starting to read through that long Microsoft presentation, was that I had a lot of old, extraneous stuff in NTBTLOG.TXT that I didn't want to analyze, either by myself or with the aid of a program. So I decided to rename it to NTBTLOG.OLD, and then I rebooted. I figured I'd have one go-round with Normal Mode, and then I'd come back into Safe Mode and see what my new, streamlined NTBTLOG.TXT might say. The guy in the Microsoft presentation, George Vordenbaum, said that NTBTLOG.TXT should capture that sort of information. Sure enough, back in Safe Mode, there was my new NTBTLOG.TXT file. But it looked like it had only the record of my reboot into Safe Mode, not of the immediately preceding effort to get into Normal Mode. Before fooling around with some analyzer program, I wanted to make sure of what we were analyzing. 
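To make sure of what was being analyzed, the cumulative log can be cut into one record per boot before reading it. The following is only a minimal sketch of that idea. It assumes that each boot record begins with a header line containing "Service Pack" (as observed above) and that driver lines begin with "Loaded driver" or "Did not load driver"; the sample text and the function names are hypothetical.

```python
from typing import Dict, List

# Assumption: every boot record starts with a header line containing this text.
HEADER_MARKER = "Service Pack"

def split_boot_sessions(log_text: str) -> List[List[str]]:
    """Split cumulative boot-log text into one list of lines per boot."""
    sessions: List[List[str]] = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if HEADER_MARKER in line:
            sessions.append([line])      # a new boot record begins here
        elif sessions:
            sessions[-1].append(line)    # driver line belonging to the current record
    return sessions

def summarize(session: List[str]) -> Dict[str, object]:
    """Count loaded and skipped drivers in one boot record."""
    loaded = [l for l in session if l.startswith("Loaded driver")]
    skipped = [l for l in session if l.startswith("Did not load driver")]
    return {
        "header": session[0],
        "loaded": len(loaded),
        "not_loaded": len(skipped),
        "last_loaded": loaded[-1] if loaded else None,
    }

# Hypothetical sample, shaped like the records described above.
SAMPLE_LOG = r"""
Service Pack 2 7 28 2007 13:40:02.500
Loaded driver \WINDOWS\system32\ntoskrnl.exe
Loaded driver Mup.sys
Did not load driver \SystemRoot\System32\Drivers\i8042prt.SYS
Service Pack 2 7 28 2007 13:58:44.500
Loaded driver \WINDOWS\system32\ntoskrnl.exe
Did not load driver Audio Codecs
"""

sessions = split_boot_sessions(SAMPLE_LOG)
latest = summarize(sessions[-1])
print(len(sessions), latest["header"], latest["loaded"], latest["not_loaded"])
```

Reading the real file from disk is left out here; depending on the system, it may need an explicit text encoding. A record with many "Did not load driver" lines would be consistent with a Safe Mode boot.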
Looking again at the Microsoft presentation, although I don't think they said it in so many words, the idea seemed to be that the system would always log Safe Mode boots, but you had to choose the Enable Boot Logging option from the F8 menu, shortly after rebooting, and then make your way back toward Normal Mode, and finally wind up back in Safe Mode after crashing, where you could then analyze the appropriate parts of NTBTLOG.TXT. I had assumed that boot logging was already enabled, and that this was why I already had a NTBTLOG.TXT file. So now I renamed my NTBTLOG files, thus far, to be NTBTLOG.OLD1 and .OLD2, and rebooted, proceeding with F8 etc. as just described. Well, it seemed I still did not have the concept. NTBTLOG.TXT and NTBTLOG.OLD2, which I had created just before I rebooted and said Enable Boot Logging, were identical to one another. It looked like I was still logging nothing other than the re-entry into Safe Mode. It seemed like I did understand what the Microsoft guy was saying; but it wasn't turning out as he said -- unless, of course, we were indeed logging Normal Mode, and only Normal Mode, all along. To determine whether that might be the case, while I was in Safe Mode I deleted NTBTLOG.TXT and then synchronized my wristwatch and the computer's system time; and then I rebooted and made note of when the thing tried to enter Normal Mode and when it returned to Safe Mode; and then I compared those times against the time shown at the top of my new NTBTLOG.TXT. The system began to boot Normal Mode at about 1:58:01 PM, and returned to Safe Mode at about 1:58:45 PM. The time shown at the top of the new NTBTLOG.TXT (and the only time shown in that file), back in Safe Mode, was 1:58:44 PM, and Windows Explorer reported the file creation time, for NTBTLOG.TXT, as being 1:59 PM. From this, I concluded that I was seeing only Safe Mode information, and was not yet successfully logging Normal Mode boot efforts. 
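The wristwatch comparison above comes down to asking which observed event lies closest to the time stamped at the top of the log. Here is a small sketch of that arithmetic, using the three times from this test; the function and the event names are hypothetical.

```python
from datetime import datetime
from typing import Dict

TIME_FORMAT = "%H:%M:%S"

def closest_event(log_time: str, events: Dict[str, str]) -> str:
    """Return the name of the observed event nearest to the log's timestamp."""
    stamped = datetime.strptime(log_time, TIME_FORMAT)

    def gap_seconds(name: str) -> float:
        observed = datetime.strptime(events[name], TIME_FORMAT)
        return abs((observed - stamped).total_seconds())

    return min(events, key=gap_seconds)

# Times noted by wristwatch, in 24-hour form.
observed_events = {
    "normal_mode_start": "13:58:01",
    "safe_mode_return": "13:58:45",
}

# Time shown at the top of the new NTBTLOG.TXT.
print(closest_event("13:58:44", observed_events))
```

The log's timestamp sits one second from the return to Safe Mode and 43 seconds from the start of the Normal Mode attempt, which is the basis for the conclusion above.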
That Microsoft presentation also said I could look at Device Manager (Start > Settings > Control Panel > System > Hardware > Device Manager), while I was in Safe Mode, and see if it gave any hints as to possible problems. It showed a yellow circle with an exclamation next to my APC battery backup. I figured this was just because Safe Mode was not able to communicate properly with the battery backup; but just in case, I unplugged the USB cable leading to the battery backup. That eliminated the yellow exclamation circle from Device Manager. Then I resumed my search for a way to log the details of the system's attempt to boot into normal mode. Something I hadn't realized, that emerged as I continued reading that Microsoft presentation, was that you could install Recovery Console on your hard drive, so that it wouldn't be necessary to boot from the CD. To do that, it sounded like you would boot into Windows, open a command window, navigate to the I386 folder on the WinXP CD, and run WINNT32.EXE /CMDCONS. I didn't experiment with this now, however. There were more details -- about how doing this would change your bootup, for example -- that I didn't want to get into. Another Microsoft page informed me about Event Logs. These, however, seemed to be available only within Normal Mode, and weren't really on point. Another Microsoft webpage made it sound like the boot log option was available only for boots into Safe Mode, though that didn't explain why the system automatically headed for Normal Mode as soon as I selected the boot log option. Yet another Microsoft page stated unequivocally that boot logging definitely recorded both Normal and Safe Mode attempts. So I tried again, selecting the Enable Boot Logging option. Again, this gave me a NTBTLOG.TXT that definitely appeared to contain only a record of the Safe Mode boot. Without deleting that NTBTLOG.TXT, I rebooted again, hit F8, did not select the Enable Boot Logging option, and went straight into Safe Mode. 
I looked again at NTBTLOG.TXT. Now it was a longer file, consisting of the records of two bootup attempts, and both looked pretty much the same. I concluded that the Enable Boot Logging option was not enabling boot logging of my attempt to boot into Normal Mode, and that Safe Mode was logged automatically. There was another way. You could enable boot logging so that it would always happen. Microsoft said so. To find out how, I went into Start > Help and Support Center and did a search for "boot log." But I didn't initially find what I was looking for. Instead, I found something there that seemed likely to be useful for troubleshooting purposes on an ongoing basis. It said you can set Windows to log what happens if the system stops unexpectedly -- not only at bootup, but anytime, apparently. They said you can save the relevant information in three memory dump formats: small, kernel, or complete. It seemed that, if I wanted this available on an ongoing basis, I would need to maintain a paging file on drive C, which normally I didn't do because I'd heard you get better performance by putting your paging file on a drive other than the program drive. The sizes of the required paging files varied quite a bit. For a small memory dump, I would need a paging file of as little as 2MB. For a kernel memory dump, the size was supposedly going to be somewhere between 50 and 800 MB. And for a complete memory dump, I would need a paging file at least slightly larger than the total of all physical RAM, which would mean between 2.7GB (if they were talking about the maximum that 32-bit WinXP could recognize) and 4GB (if they meant all installed RAM). I figured that a tiny, 3MB paging file would not harm system performance much -- I assumed that a file that small would fill quickly, and therefore much of the contents of RAM would continue to be paged off to some other drive. 
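The paging file figures above can be put side by side with a few lines of arithmetic. This is only a sketch: the small and kernel numbers are the ones quoted from Help and Support, while the 1 MB margin used for "slightly larger" in the complete case is an assumption, and the function name is hypothetical.

```python
from typing import Tuple

# Assumption: "slightly larger than physical RAM" is modeled as RAM plus 1 MB.
COMPLETE_DUMP_MARGIN_MB = 1

def pagefile_range_mb(dump_type: str, ram_mb: int) -> Tuple[int, int]:
    """Return the (low, high) paging file size in MB for a dump type."""
    if dump_type == "small":
        return (2, 2)
    if dump_type == "kernel":
        return (50, 800)
    if dump_type == "complete":
        needed = ram_mb + COMPLETE_DUMP_MARGIN_MB
        return (needed, needed)
    raise ValueError(f"unknown dump type: {dump_type}")

installed_ram_mb = 4 * 1024  # four 1GB sticks
for kind in ("small", "kernel", "complete"):
    print(kind, pagefile_range_mb(kind, installed_ram_mb))
```

With 4GB installed, the complete dump is the only option whose paging file would be measured in gigabytes, which is one more reason the small dump looked attractive.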
Also, small was better: if I was going to have to analyze this myself, I didn't want to be dealing with gigabytes of system data. So I went into Control Panel > System > Advanced > Performance > Advanced > Virtual Memory and selected drive C. I specified a Custom Size with 3 (MB) as both the initial and maximum size, and then made sure to click on Set before okaying out of there. I didn't elect to restart the system yet. Instead, back in XP's built-in Help and Support Center, I saw that I had to do something else to enable this logging process. In Control Panel > System > Advanced > Startup and Recovery, I had to select the desired options. I was pleased to observe that the options I was most concerned about were already set: Write an event to the system log, and make it a small memory dump. I decided to shut off the option to Automatically Restart, since I had found it irritating to think that my system might be rebooting itself fifty times while I was temporarily distracted. The system was also already set to write the small memory dump; it would be going to a place called %SystemRoot%\Minidump, where I assumed %SystemRoot% meant C:\. I wasn't finding that any Minidump place presently existed, which made sense, considering that I hadn't had my pagefile on C for the dump. While I was there in that Startup and Recovery dialog, I clicked on the option to "edit the startup options file manually," and, lo and behold, here we had the boot.ini file. So if I could just figure out what to add to it, it seemed that my next reboot might have information about the Normal Mode crash in two separate locations: in Minidump, and also in NTBTLOG.TXT. For guidance on boot.ini, I turned to Microsoft again, for an indication that I could add the /bootlog parameter to the WinXP line in boot.ini. My guess, then, was that the thing should look like this (allowing one line from boot.ini to break into two, here, for purposes of readability):

multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP Home Edition" /fastdetect /bootlog
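The same edit can be expressed as a small text transformation, which makes it easier to see exactly what changes in the file. This is a sketch only, not a tool I used: the sample boot.ini contents below are hypothetical apart from the entry shown above, and the function name is my own.

```python
def add_switch(boot_ini_text: str, switch: str = "/bootlog") -> str:
    """Append a switch to each entry under [operating systems], if not already present."""
    out = []
    in_os_section = False
    for line in boot_ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track which section we are in; only OS entries take switches.
            in_os_section = stripped.lower() == "[operating systems]"
        elif in_os_section and "=" in stripped:
            if switch.lower() not in stripped.lower().split():
                line = line.rstrip() + " " + switch
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical file contents for illustration.
SAMPLE_BOOT_INI = r"""[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(1)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows XP Home Edition" /noexecute=optin /fastdetect
"""

print(add_switch(SAMPLE_BOOT_INI))
```

Running the function a second time leaves the text unchanged, so the switch is never added twice. The [boot loader] lines are left alone because switches belong only on the operating system entries.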
I noticed that my boot.ini also used a parameter that read /noexecute=optin. This, it seemed, was a security item that didn't relate to the present issue. I clicked OK and I got a dialog box telling me that I couldn't get administrative alerts unless I had the Alerts service running. I decided I didn't care about an Alerts service, I just wanted to be able to check Minidump or NTBTLOG.TXT in the event of a crash. So I unchecked the box for administrative alerts and was able, this time, to okay out. I deleted the cluttered old NTBTLOG.TXT file, rebooted, and waited for the result. To my disappointment, the system did reboot after freezing, on its way to Normal Mode. But the idea of it not rebooting had posed another approach. Instead of letting it try to reboot into either Safe Mode or Normal Mode, I inserted a Knoppix live CD and booted into that flavor of Linux. I used Knoppix (I could just as well have used an Ubuntu or other Linux live CD) to see if I could locate Minidump or NTBTLOG.TXT. Then I remembered that I had disconnected my other hard drives, so I didn't have a FAT32 partition available to copy them to, even if I did find them, though I could have used my floppy drive if they were small enough. As it turned out, there was no NTBTLOG.TXT file in C:\Windows. So I was still not getting a log of what was happening at boot. It seemed that both methods of creating an NTBTLOG.TXT file, to record the process of booting into Normal Mode, were failing -- or else the system was not even really getting far enough along to commence that process. There was, however, a Minidump folder, containing two Mini*.dmp files. In place of the asterisk, the names contained the date of creation. The dates in question (as confirmed by the file details shown, there, in the Knoppix file browser) were more than two weeks old. So apparently that part of the process was not working for me now, either. 
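Checking whether a dump file belongs to the latest crash is a matter of reading the date out of its name. The sketch below assumes the names follow a MiniMMDDYY-NN.dmp pattern; the two file names in the example are hypothetical stand-ins for the ones I found, and the function names are my own.

```python
import re
from datetime import date
from typing import List, Optional

# Assumption: minidump names look like MiniMMDDYY-NN.dmp.
NAME_PATTERN = re.compile(r"^Mini(\d{2})(\d{2})(\d{2})-\d+\.dmp$", re.IGNORECASE)

def dump_date(filename: str) -> Optional[date]:
    """Extract the creation date encoded in a minidump file name, if any."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return None
    month, day, two_digit_year = (int(g) for g in match.groups())
    return date(2000 + two_digit_year, month, day)

def stale_dumps(filenames: List[str], today: date, max_age_days: int = 1) -> List[str]:
    """Return the dump files that are too old to describe a crash that just happened."""
    stale = []
    for name in filenames:
        created = dump_date(name)
        if created is not None and (today - created).days > max_age_days:
            stale.append(name)
    return stale

# Hypothetical names, dated a little over two weeks before this post.
found = ["Mini071207-01.dmp", "Mini071207-02.dmp"]
print(stale_dumps(found, today=date(2007, 7, 28)))
```

If every file comes back as stale, as in this example, then none of them was written by the crash that just occurred.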
(I had no idea why there would have been two files, created on one date two weeks ago, while all the other crashes of recent weeks would have been ignored.) One site said that no NTBTLOG.TXT was created, in their case, because the system was hanging before it could get far enough into Safe Mode to write the file. I had already determined that the system, trying to boot into Normal Mode, would not even add lines to an already existing NTBTLOG.TXT. The solution, in their case, was to flash the BIOS, so as to correct a problem with something called the ESCD, stored in the BIOS, that would allocate system resources. I had already flashed the BIOS on this motherboard with the latest update; but I had still noticed a weird CMOS error if I left the machine powered down. So a bad motherboard was one possibility. It seemed really unlikely, though, because this was the second of these motherboards I was having this problem with, and as noted above, other users were overwhelmingly happy with it. Besides, unlike that other situation, I *was* able to boot into Safe Mode. Another webpage suggested that, in Recovery Console, I should have used, not only FIXMBR, but also FIXBOOT and BOOTCFG /REBUILD, in that order. Apparently I was then going to be asked whether I wanted to add this revised installation to the boot list and, if so, what I wanted to call it. It looked like I might also be overwriting some of the boot.ini changes I had just made. I decided to give it a whirl. Fortunately, the BOOTCFG /REBUILD part told me what had been there before (e.g., that my "OS Load Options" had included /bootlog, as described above), so I just had to retype what was already there. But I guess if your previous installation had been really trashed, you might have to have a brain at this point. I guess it would have helped me, too, because somehow I wound up with two options instead of one, when I rebooted, and both were called Microsoft Windows XP Professional. 
I chose the second one, reasoning that it was probably the second one added to the list, and I indicated that I wanted to proceed to boot Windows normally. But that didn't work: the system crashed as usual and rebooted. So I tried the first one. Same thing. So now I had two nonworking WinXP options, instead of just one. (On closer examination, I realized they were not the same after all. In a flashback, I had called the new one "Microsoft Windows 98," not XP. So now I was able to see that the new one had been added as the first in the list, not the second.) I went back into Recovery Console -- but then that seemed unhelpful because, according to a Microsoft webpage, the command I wanted (BOOTCFG /DELETE) was not even available in the Recovery Console. Sure enough, back in Safe Mode, I opened a command window and typed BOOTCFG /? and got a whole list of possibilities, including /DELETE. The proper syntax in my case, as I saw from the Microsoft webpage, was bootcfg /delete /id 1. I entered that, did another BOOTCFG /QUERY to make sure, and exited from the command window. I found somebody else who seemed like they had the same problem I had been having. In that case, it sounded like Windows automatic updates had caused some problems. One approach tried there was to disable various "services" in WinXP and see if that solved the problem. It sounded pretty time-consuming -- the list of services provided by BlackViper was pretty long -- but it was either that, or return all the hardware and return to my previous system (I hadn't yet sold my motherboard or other components), because at this point I really was not too sure where the problem might be. When I hit Start > Run > services.msc, I found (by quick count of pages) 114 services on the machine. Fortunately, in what BlackViper called a "bare bones" system, the vast majority would be set to "disabled." 
(There was no telling what to do about third-party services, other than to assume that Windows, in barebones format, could get by without them, i.e., that they could be disabled too.) I followed BlackViper's advice on how to create a hardware profile, so that my present services settings would be preserved, in case I wanted to come back to my starting point. But BlackViper also warned that disabling a service in the General tab, within Services (right-click on a service and choose Properties), would permanently disable it for all profiles -- that, instead, you should use the Log On tab. So I did that, disabling all of the services that BlackViper designated as "Disable" in the bare-bones list. I noticed that the Startup Type information did not change, on the main Services screen, as a result of my efforts; I assumed that those changes would be implemented only after rebooting. It occurred to me, as I was doing this, that the next step (assuming a successful barebones boot) would probably involve coming back down this list and re-enabling some of these services and trying another boot, and so forth, until I had narrowed down the culprit(s). I decided to save a different profile for each screenful of services that I disabled. After doing the first screen, I saved a profile called Bare Bones 1, and so forth. So then I would start by booting the most extreme barebones profile, which would be Bare Bones 5; and if that worked, I would just reboot and try Bare Bones 4, and so forth, working back toward Bare Bones 1, in which I would have disabled only the first screenful of services. This took maybe 20 to 30 minutes, and then I rebooted. On reboot, after the BIOS loaded, I got a Hardware Profile/Configuration Recovery Menu. I chose Bare Bones 5 and hit Enter. And guess what? The system choked and rebooted, same as before. Even with the maximum number of services disabled, I still had a problem. On reboot, I chose Original profile and then chose Safe Mode. 
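The plan behind the numbered profiles can be written out as a short search routine. This is a sketch of the reasoning only: the `boots_ok` callback stands in for a manual reboot test, profile 0 stands for the Original profile with nothing disabled, and all names are hypothetical.

```python
from typing import Callable, Optional

def find_culprit_screen(levels: int, boots_ok: Callable[[int], bool]) -> Optional[int]:
    """Walk from the most stripped-down profile back toward the original.

    Profile k has screens 1..k of services disabled, so moving from
    profile k to profile k-1 re-enables screen k.
    Returns the screen whose re-enabling first breaks the boot,
    0 if every profile boots, or None if even the barest profile fails.
    """
    if not boots_ok(levels):
        return None  # the disabled services are not what is breaking the boot
    for level in range(levels - 1, -1, -1):
        if not boots_ok(level):
            return level + 1  # this screen was re-enabled in the failing step
    return 0

# Hypothetical case: a service on screen 3 breaks the boot whenever it is enabled.
print(find_culprit_screen(5, lambda level: level >= 3))

# What actually happened here: even Bare Bones 5 would not boot.
print(find_culprit_screen(5, lambda level: False))
```

The second call returns None, which matches the outcome above: with the maximum number of services disabled and the crash still present, the search has to move on to something other than services.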
I wasn't sure if I would need those Bare Bones profiles anymore, so I didn't delete them right away. At this point, I found a helpful article on RM.com. They laid out a nice series of troubleshooting steps. Doing an "in-place upgrade reinstallation" was step no. 7 on their list. I thought that this process might at least help me determine whether I had a hardware problem and, more specifically, which piece of hardware might be implicated. Certainly it made sense that I could test more things if I had a working Windows installation than if I didn't. Yet it was not clear, to me, exactly how I would proceed to test the hardware, even if I did have a working system. As mentioned above, I had previously found and used Microsoft's Windows Memory Diagnostic (WINDIAG) to test the RAM. But in my software and hardware installations, and my visits to the manufacturers' websites, I had not found any comparable diagnostic program to test the CPU, motherboard, and/or video card, and I now revisited some of those sources to make sure. It seemed that my options, at this point, were (1) to do the reinstallation and (2) if I still had crashes that I could not pin down, to return the CPU, motherboard, and video card to Newegg, perhaps after buying replacements that I could cross-test. This made me think that surely some third-party supplier must have developed diagnostics that would verify that a system was able to run Windows XP properly. Microsoft was a good first place to look, especially since I had seen that they had devised a memory tester. I found that they offered an Offline Crash Diagnostic (in their document KB923800). I found several other diagnostic tools as well. First, however, Andrew K's Diagnose XP website stated that viruses are the No. 1 cause of system problems and can be responsible for any sort of symptom. 
This was interesting because now I did recall that I had been running Ad-Aware 2007 and another malware checker, while working on something else, and both of them had frozen before finishing. So now, in Safe Mode, I ran what security programs I could, including smart (i.e., not full) scans in Symantec Antivirus, Spybot Search & Destroy, and Ad-Aware. Also, I downloaded and ran, from the link on Andrew K's site, the CCleaner program. This appeared to be a dangerously powerful program, capable of uninstalling many of your programs with one click. The options I used were: Run Cleaner with default settings in the Windows tab of the Cleaner section; Scan for Issues in the Issues section, and then Fix Selected Issues (all, by default), and repeat until all were gone. Andrew K also pointed me toward the Trend Micro Online Virus Scanner, which I could not use in Safe Mode because I was not able to go online, and the Trend Micro Sysclean Package, including its accompanying pattern file, which I did run. Sysclean found drive errors, to which (of course) access was denied (but which I planned to address on reboot), but it found no viruses. I noticed that Sysclean was not listed on TrendMicro's list of free products, which included HouseCall online virus scanner, HijackThis (which I had often seen cited as a system analysis tool on various webpages), TrendProtect (which I planned to install, in both Internet Explorer and Firefox versions, but could not do so while in Safe Mode). It seemed that Trend Micro had several other interesting paid programs, such as Damage Cleanup, but I did not pursue those now (but then I noticed, in the log file created by Trend Micro Sysclean after it had completed its run, that it had included the "Damage CleanupEngine"). I didn't use the free Transaction Guard, which seemed designed for security on public computers. 
I noted, but did not pursue, the ICSA Certification link that Andrew K provided; I would have been more interested in that if I had been using an antivirus program that did not have Symantec's antivirus reputation -- though I was a bit concerned about the indication that an antivirus program is certified if it will "Detect 90% of the ICSA Labs Virus Collection." I realized that might be all that one could feasibly expect; it just alerted me to the apparent fact that the antivirus programs don't necessarily try to do it all. I also didn't explore Trend Micro's several free-trial programs. Continuing with other antivirus and antispyware programs cited by Andrew K, I also ran (in addition to Spybot and others mentioned above, already present on my machine) CWShredder, to counteract a specific web browser problem, but it found no errors. I was not able to run its "Test Your System for Other Errors" option because I could not go online in Safe Mode. Microsoft Windows Defender found no errors. I ran the Microsoft Java Virtual Machine v1.1.4 Removal Tool and accompanying registry fix. I couldn't install the Sun Java Virtual Machine or run its test page because it tested the machine on which I was trying to download, and found itself to be present there already; but that was OK, because I was pretty sure I had already installed it on the problem machine anyway. Another Andrew K website that I planned to explore when I could get back online was Driver XP, for the latest drivers for my hardware. I will say, in passing, that dealing with all this security software made me feel that Windows was ridiculous. I did wonder whether I shouldn't take another serious try at making Ubuntu work. But for now, I thought that at least I ought to be able to get a basic Windows XP installation up and running. I was sure there would be times when I would need it, whether in its own right or as the source for a virtual machine that I would run in VMware. 
Andrew K recommended using PC Wizard to identify your specific hardware, so that you could be sure of getting the right drivers. I downloaded and ran PC Wizard because it was also supposed to test your hardware. But I got "Error initialising peripherical driver. Err : 1084." I was not able to figure out what that meant, though it did make me think that there was some kind of driver or hardware issue even at the Safe Mode level -- which wouldn't have been surprising, considering that my system, with the motherboard that I had returned to Newegg, had crashed in Safe Mode and even in Ubuntu. Andrew K included a rare reference to a commercial program, Steve Gibson's SpinRite ($89), with a description that made it sound like a very useful thing to have. In Andrew K's writeup, SpinRite could recover data from failed hard drives of all types (even from a TiVo), and could also create a bootable CD or jumpdrive. Since it was available for immediate download, I decided the important thing was to remember where to find it, if I did need it. Another data recovery program that Andrew K recommended was CDCheck, which I thought I had probably better try to use in the future when burning CDs and DVDs (also apparently useful on hard, floppy, and flash drives), so as to verify that they were *really* created properly before deleting the source data that I *thought* I had put onto them (having encountered a couple of defective CDs and DVDs along the way). The Microsoft Offline Crash Diagnostic mentioned above was another option at this point; but when I tried to install it, it said that installing in Safe Mode was not recommended unless you couldn't start your system in Normal Mode. That was applicable here, but I was concerned that this meant that installing in Safe Mode might create new problems. 
I decided to defer using this diagnostic program until I had used up my other options, with a note to myself to make sure to install it, in any case, if I could ever get back to Normal Mode on my troubled computer. I had not run the Maxtor hard drive diagnostic on the one drive that remained connected to my machine. I had run CHKDSK and other Windows programs often, however, so I didn't really think the problem was with the drive. Nonetheless, I was going to download the Maxtor diagnostic, from the several helpful hard drive diagnostic links that Andrew K provided, but it turned out that Seagate had acquired Maxtor. So now the diagnostic program they were recommending was SeaTools. I already had a recent copy of that on a bootable CD, so I rebooted and ran its Short Test and also its Long Test. Both passed. For RAM testing, Andrew K recommended Memtest86+. I considered it possible that I might have had a memory problem that had escaped WINDIAG, so I downloaded the ISO, burned the CD, and rebooted. It automatically went right into its default test, so I let it go for a while. But since I hadn't done any tests at all on the CPU, I was not as interested in running another memory diagnostic. What I really wanted to run was Prime95, which Andrew K described as "a good stress test for the CPU, memory, L1 and L2 caches, CPU cooling, and case cooling. The torture test runs continuously, comparing your computer's results to results that are known to be correct. Any mismatch and you've got a problem! ... [S]elect 'Torture Test', then 'Ok', let it run for a minimum of one hour, preferably overnight for a thorough test. On a working stable system this test should never fail. Any errors or failures indicate a hardware problem. The exact cause of a hardware problem can be very hard to find. If both the Harddrive and Memory Diagnostic passed then the most likely causes are Overheating, a faulty Mainboard, a faulty CPU or a faulty Power Supply." 
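As an aside, the idea in that description (keep computing something whose correct answer is already known, and flag any mismatch) can be sketched in a few lines of Python. This is only an illustration and not Prime95's actual code: Prime95 is the GIMPS client and relies on heavily optimized FFT arithmetic, whereas the sketch below uses plain integer math for the Lucas-Lehmer test on a handful of small Mersenne exponents. The names lucas_lehmer, KNOWN_RESULTS, and self_check are my own, made up for this example.

```python
# Toy "compare against known-correct results" check, in the spirit of a
# torture test. It will not stress a CPU the way Prime95 does.


def lucas_lehmer(p: int) -> bool:
    """Return True if 2**p - 1 is prime. Valid for odd prime exponents p."""
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m   # the Lucas-Lehmer recurrence
    return s == 0


# Well-known outcomes for small odd prime exponents:
# True means 2**p - 1 is a Mersenne prime, False means it is composite.
KNOWN_RESULTS = {
    3: True, 5: True, 7: True, 11: False, 13: True, 17: True, 19: True,
    23: False, 29: False, 31: True, 61: True, 89: True, 107: True,
    127: True, 521: True, 607: True,
}


def self_check(rounds: int = 1) -> int:
    """Recompute every known case `rounds` times and count mismatches."""
    mismatches = 0
    for _ in range(rounds):
        for p, expected in KNOWN_RESULTS.items():
            if lucas_lehmer(p) != expected:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    errors = self_check(rounds=100)
    print(f"Self-check finished with {errors} mismatches")
```

On healthy hardware the mismatch count should stay at zero no matter how many rounds are run; the point of a real torture test is to keep that kind of comparison going for hours under heavy load.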
I ran Prime95's Torture Test overnight and beyond, using the small FFT setting, which looked most likely to test the CPU and other components rather than RAM. Final report: "Torture Test ran 21 hours, 16 minutes - 0 errors, 0 warnings." The Microsoft Offline Crash Diagnostic, above, seemed like my last option before doing a reinstall. First, though, I wanted to be sure of the situation, and if possible I wanted to install that diagnostic in Normal Mode. So, once again, I went through the whole process of FIXMBR followed by CHKDSK /R for each of the four partitions on the one drive that I now had connected. Then I rebooted, to see if it would go into Normal Mode. This had worked in the early days of my problems with this new hardware and/or software (i.e., the install-from-scratch that I had done with WinXP Pro a few weeks earlier), but then it seemed like it had stopped working, and at this point I did not precisely know where the matter lay. I had had crashes sometimes in Safe Mode and even in Ubuntu, so it seemed like there had to be a hardware problem. But I wasn't actually finding one. So when CHKDSK was done, I tried rebooting into Normal Mode. No dice! A freeze and reboot, early on. Trying something different, I hit F8 after the BIOS loaded and chose the Debugging Mode option. Same thing. So in Safe Mode, I tried installing the Microsoft Offline Crash Diagnostic. It seemed like it installed -- it said it was successful -- but I didn't see any icons in my Start > Program menu. The download instructions said I might have to reboot, so I did. Still nothing. About this time, some things began to change in my handling of this problem. One change was that I eliminated some distractions that may have kept me from seeing this problem clearly. Another, which was probably related to the first, was that I think I got past the point of assuming that I could just click on some button, or run some program, and that would solve my problems. 
Maybe it was because I had pretty much exhausted the software repair options, other than reinstalling. It had been a while since I had done hardware troubleshooting, and it felt like only now that I began to take some steps that more experienced or warmed-up people might have gone to right away. So it was only now that I started working with tech support for each of the components in question. I felt a little foolish about this. I think, in part, that my experiences with tech support had taught me that it was in my interest to be totally on top of the problem before calling. I'd had enough experiences of having technicians tell me to uninstall or change something that didn't help at all, but then took me hours to get back to working condition. If nothing else, at least I could take some comfort in seeing that apparently a number of other people were approaching the problem in roughly the same way as I had done. For whatever reason, however, I now turned that corner. I called Intel's tech support at the number shown on the Newegg website (916-377-7000). My call was at 1:38 PM. I wanted to know if they had any diagnostics that I could run. There didn't seem to be any information along those lines on their website. I got a human almost immediately. He took down my name, e-mail address, and phone number, and gave me a case number. At 1:42 PM, he forwarded me to "the next available agent," i.e., a technician. Two minutes later, I was speaking with Paul, the technician. Paul had a hellacious accent, but whatever. By 1:51 PM, he had checked the MSI website (not finding my motherboard on Intel's approved list) and had confirmed that, according to MSI, the E4300 processor was supposed to be compatible with that mobo. Therefore, he said (given that I told him I had done extensive memory diagnostics and had a good power supply), it pretty much had to be a defective processor. 
He wanted me to give him some "markings" from the CPU (presumably, whatever letters and numbers were on it). I said I couldn't now, I'd have to take the fan off it first. He said fine, just call back. When I asked if there was an e-mail alternative, for purposes of my scheduling convenience, he said I could e-mail the info to rpd@mailbox.intel.com. They tried to have a 24-hour maximum turnaround on e-mails, he said, but they would probably be much faster than that. He had told me there wasn't really a diagnostic program, just their Processor Identification Utility. Otherwise, I found very limited troubleshooting guidance on Intel's support webpage for this processor. Anyway, after CHKDSK was finished, I took the fan off the processor, and here is what I ultimately gave to Intel:
As you requested, the information from the top of the defective CPU is as follows: INTEL (M) (C) '05 INTEL (R) CORE(TM)2 DUO 4300 SL9TB MALAY 1.80GHZ/2M/800/06 Q703A370 (04)
I realized that I should have been contacting MSI tech support likewise. From them, I got a Troubleshooting Guide whose section on stability problems instructed me to verify that my system's temperature was not exceeding the recommended level. In my American Megatrends Inc. (AMI) BIOS, the current CPU temperature was shown under "H/W Monitor" (i.e., hardware monitor). It showed a CPU temperature of 53 degrees Celsius. I wasn't successful in finding out, at the Intel website, whether that was an appropriate temperature for this CPU. In several discussions at Tom's Hardware, however, it sounded like 65 degrees C would be the maximum. A temperature of 53 with no load seemed too high. I saw where someone else had a temperature of only 40 while the CPU was idling. I wondered whether the CPU might actually get less cooling (e.g., less directed airflow) when, as now, I had the case open to facilitate looking inside. MSI had posted an FAQ that seemed to say that Windows would report a lower temperature than I was getting in this reading in the CMOS setup utility, because Windows would put the CPU into a low-power status (keeping it cooler), while the setup utility would not. I made a note to check the temp, if I could figure out how, when I got back into Safe Mode or Normal Mode, and see how it looked then. But temperature did not appear to have anything to do with my initial difficulties in booting stably. Later, I realized that one other possible (likely!) source of excess heat was that I had not been able to secure the stock cooling fan, supplied with the E4300 CPU. The connectors that were supposed to attach the cooler to the motherboard were ridiculous. They were designed poorly and they just did not work. The advice was to install it before putting the mobo in the case, because apparently a fair amount of pressure was required to do it right. 
A number of people seemed to advise getting rid of the stock cooler and buying a third-party product instead: it would be cooler, quieter, and better secured. But it looked like the most highly recommended one, an Arctic Freezer 7 Pro, would cost $40. (Intel had some others on a list.) At that rate, there was some opinion to the effect that I could have gotten a faster E6300 or E6400 CPU and could have used its stock cooler, as long as I wasn't overclocking -- or could have bought the E4300 OEM (original equipment manufacturer, i.e., not in a retail box) and added my own fan. It wasn't clear to me whether the E6300 or E6400 used the same flimsy way of attaching the fan and heat sink, though I feared they might. A discussion at Tom's Hardware indicated that the E4300 was perfectly respectable, especially if a person wanted to overclock it -- in which case a third-party cooler was highly recommended by multiple sources. Then again, one tester found that the Arctic cooler didn't make much difference. (Upon reviewing his photo and the many praises of the Arctic cooler, I decided that the problem must have been in his test setup.) For reasons of noise and to have a secure attachment (any cooler, attached well, was better than having essentially no cooler because of an unreliable attachment!), I decided to go ahead and get the Arctic cooler. At about the same time, I noticed something interesting on the MSI webpage pertaining to my motherboard: "Due to the High Performance Memory design, motherboards or system configurations may or may not operate smoothly at the JEDEC (Joint Electron Device Engineering Council) standard settings (BIOS Default on the motherboard) such as DDR2 voltage, memory speeds and memory timing. Please confirm and adjust your memory setting in the BIOS accordingly for better system stability." In my BIOS, those settings were under Cell Menu > Advance DRAM Configuration > Memory Timings. 
MSI recommended using the automatic setting, which was the default and was what I had been using; but they said that, if that didn't work, you should consult the specification on the sticker on the actual RAM DIMM module or in the RAM manufacturer's literature. According to the webpage for my Crucial RAM, the settings were "DDR2 PC2-6400 • 4-4-4-12 • Unbuffered • NON-ECC • DDR2-800 • 2.2V • SLI-Ready • 128Meg x 64." Crucial had said that this memory was guaranteed to work in my system; also, I think I must have been assuming that only overclockers would be wanting to change the settings. Now, however, as I looked at the BIOS, I saw the following items listed, when I selected the Manual memory timings options; and according to a Techware Labs webpage, these items had the following meanings:
tCL: CAS Latency
tRCD: Row Address to Column Address Delay
tRP: Row Precharge Time
tRAS: Row Active Time
tRRD: Row Active to Row Active Delay
tRC: Row Cycle Time
tWR: Write Recovery Time
tWTR: Internal Write to Read Command Delay
tREF: Refresh Rate
CMD: Command Per Clock
I didn't see information for all of these settings in the Crucial specifications just cited. One clue came from the Techware Labs webpage, which said that all but the first three (i.e., tCL, tRCD, tRP) plus CMD were under the Advanced memory tab of the system they were discussing. But that seemed to be not quite right. A Tech PowerUp forum posting explained, more clearly, that the first four in the preceding list are the ones you would see in memory ratings, in the order of CAS-tRCD-tRP-tRAS. The example they gave was 2.5-3-3-8. Posts in Legit Reviews Forums confirmed that the order was CAS-tRCD-tRP-tRAS (e.g., 2 - 2 - 2 - 5), and that all the other settings should be left on Auto. Another post there said, however, that you should run Memtest86 afterwards to make sure that your manual settings would fly.
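To make those rating strings a little more concrete, here is a small Python sketch that splits a rating such as 4-4-4-12 into the CAS-tRCD-tRP-tRAS order described above and converts each value from clock cycles to nanoseconds. It assumes only that DDR memory makes two transfers per clock, so that DDR2-800 runs on a 400 MHz clock with a 2.5 ns cycle. The names Timings, parse_timings, and cycles_to_ns are invented for this illustration and do not come from any BIOS or vendor tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """The four headline timings, in clock cycles, in rating order."""
    cas: float    # tCL, CAS latency
    trcd: float   # row address to column address delay
    trp: float    # row precharge time
    tras: float   # row active time


def parse_timings(rating: str) -> Timings:
    """Parse a rating like '4-4-4-12' or '2 - 2 - 2 - 5'."""
    parts = [float(p) for p in rating.replace(" ", "").split("-")]
    if len(parts) != 4:
        raise ValueError(f"expected four values, got {len(parts)}: {rating!r}")
    return Timings(*parts)


def cycles_to_ns(cycles: float, data_rate_mts: float) -> float:
    """Convert clock cycles to nanoseconds for a given DDR data rate.

    DDR transfers data twice per clock, so the clock in MHz is half the
    data rate (DDR2-800 -> 400 MHz -> 2.5 ns per cycle).
    """
    clock_mhz = data_rate_mts / 2.0
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    t = parse_timings("4-4-4-12")
    for name in ("cas", "trcd", "trp", "tras"):
        cycles = getattr(t, name)
        print(f"{name}: {cycles:g} cycles = {cycles_to_ns(cycles, 800):g} ns")
```

For the Crucial rating quoted above, that works out to a CAS latency of 10 ns and a tRAS of 30 ns at DDR2-800. The same cycle counts at a lower memory clock would mean longer delays in real time, which is one reason the numbers only make sense together with the speed rating.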
Since I had already run Memtest briefly, and WINDIAG extensively, I suspected that maybe the automatic settings had been fine after all. I had noticed, in passing, that one person had even seemed to say that the automatic feature might back off to a more conservative setting that would match what the RAM in question could actually do, as distinct from its apparently overoptimistic rating. But I went ahead and changed those four values to the 4-4-4-12 values specified in the Crucial settings. My other concern was that these changes didn't account for the voltage setting, which seemed, to me, to be the one other Crucial memory specification that might call for adjustment in the BIOS. I found the Memory Voltage setting, all right, but it was greyed out with an [AUTO] setting. I looked for a way to remove the greying, so that I could change memory voltage to 2.2V, as Crucial had specified, there in that screen within the AMI CMOS Setup Utility (ver. 2.61). Eventually, I found an AnandTech post that explained how the utility worked. It didn't work for the Memory Voltage setting -- not yet, anyway -- but at least for the CPU Voltage setting, I found that I could raise or lower it, not by typing a value or hitting Enter to open a list of options, but rather by using the plus (+) or minus (-) keys on the numeric keypad (i.e., not the row of keys, from 0 to 9, running across the top of the keyboard). Now the question was how to ungrey the Memory Voltage option so that I could do the same thing there. I tried changing the settings as shown on a PCStats page, but Memory Voltage was still greyed out -- for me, but not for them! Likewise on a Neoseeker webpage -- where both such webpages were talking about this specific motherboard. For some reason, apparently, I was not allowed to change the Memory Voltage, whereas other systems could do so. I posted an inquiry in a forum in Tom's Hardware and hoped for a reply. 
Later, I did get a reply in an Anandtech forum telling me to just ignore the grey and use PgUp or PgDn or the plus or minus key, and that worked. I set it to 2.2V and I was happy with that. Because of various crashes and whatnot, by this time my Windows installation had reached a point of being so confused that it wouldn't even talk to itself -- that is, it couldn't even boot into Safe Mode. It was trying -- the hard disk light was running almost constantly -- but it seemed to be stuck in a loop after loading MUP.SYS. So I restored the most recent disk image, from about a week earlier, and rebooted into Safe Mode. MSI's Troubleshooting Guide also suggested testing each RAM module separately, and testing each RAM slot on the motherboard. I had already done that with one module, and yet had still had random system crashes, but my previous test had not been with Memtest. So I installed a few items and shut the system down. I removed all but one memory module and rebooted into the Memtest CD. I ran that on one module in memory slot 1 for five hours, with no errors. I noticed, about this time, that Newegg had put a motherboard on sale in their "Open Box" category, which I think means that they got it back from a customer, tested it, and concluded there was nothing wrong with it; but it had been opened and used, so they could not resell it as new. The motherboard in question happened to be an MSI P6N SLI Platinum, just like mine. In fact, I suspected that it *had* been mine -- that I had returned a good board. I felt bad about the extra cost or loss for them or MSI, if that was indeed the situation. This did not guarantee that the board I now had was good; it just seemed to indicate that the CMOS error message that I would get on reboot, after shutting off system power, was a built-in bug (or "feature") that did not prevent the boards from working well for a large majority of other Newegg purchasers. 
(As it later turned out, Newegg did not receive the board I returned until some days later. I knew because they, in their estimable way, sent me an e-mail to let me know that the thing had just arrived at their place. So it was somebody else's mobo that was on sale that day.) MSI had almost no other FAQs about this motherboard. While I was revisiting their webpage, I confirmed again that this CPU was supported on this board. On their Compatibility Test Items page, I confirmed again that this Crucial RAM had not yet been tested, but I double-checked that Crucial said this RAM was guaranteed compatible with this mobo. Likewise, MSI had tested very few video cards, and their list did not include this one; but EVGA confirmed that they supported this Crucial RAM. EVGA did not seem to try, however, to provide a list of compatible motherboards. Finally, Corsair, maker of my power supply, confirmed that my motherboard was compatible with their product. I did not bother to confirm compatibility with hard drives and other components, which I was confident were almost completely interchangeable among systems. At any rate, with the week-old drive image restored, the system rebooted into Normal Mode without a problem. I downloaded a few updates and rebooted. This time, it gave me a SysTray error, whose details I didn't write down, followed by Runtime Error 216 at 51F2242C. Microsoft said this might mean my system was infected with a SubSeven Trojan virus. I reinstalled the remaining antivirus and antispyware programs described above (whose recent installation had been erased, of course, when I reinstalled the week-old drive image) and ran the whole collection of security programs now found on the computer. I also installed the Microsoft Offline Crash Diagnostic again, and once again I could not figure out whether or where it might have installed itself. I was just in the process of downloading these upgrades, scanning my files, etc., and then the system crashed. 
On reboot, I decided to focus on running the scans. It wasn't clear to me what had caused the crash, but for various reasons I had loaded Word, Acrobat, and other programs in addition to the malware scanners themselves -- I had gotten too excited about having the computer working again -- so it was hard to tell what might have caused the crash. I did notice that the system was being funky shortly beforehand, such as when it would not let me create a new folder in WinXP without opening an ephemeral dialog box for the Installer program. I thought the problem with the Microsoft Offline Crash Diagnostic might be that I had downloaded it on another computer and then tried to install it on the problem machine. Microsoft had a validation process; maybe I had to do the download and installation, both, on that machine. But then I read on in the instructions (RTFM) and discovered that the way to do it is to run ocadiagnostic.exe. I did that and, sure enough, there was a screen, welcoming me to Offline Crash Diagnostic for Windows XP. So I set up a shortcut to it for ease of future reference. (It was in C:\Windows\pchealth\helpctr\binaries.) I wasn't sure it was going to work too well, because the description said it depended upon the Minidump folder, and as described above, that folder still wasn't being written to. The thing gave me the option of uploading those two three-week-old crash files, the ones that did exist in C:\Windows\Minidump, to see what sense Microsoft could make of them. I was curious, so I said sure, upload them. Almost immediately, it gave me a Web Response link. I clicked on it, and of course that opened Firefox. I noticed, down in the system tray, that my FreeRAM XP Pro program was telling me that I had very little memory left (since I was still using just the one memory module). I wondered if that fact might have been related to the most recent crash. But the system didn't crash. The analysis from Microsoft read as follows:
Problem report summary
Problem type: Windows stop error (a message appears on a blue screen with error code information)
Solution available? No
What does this problem mean? Windows has encountered a problem it cannot recover from and it needs to be restarted
Cause: Unknown
Computer symptoms: A message appears on a blue screen with error code information (for example: 0x0000001E, KMODE_EXCEPTION_NOT_HANDLED)
Additional steps for you to take: Please continue to send problem reports so analysts at Microsoft can study and try to correct the problem as quickly as possible
So that wasn't too helpful. But I liked the concept and -- who could say? -- if the Minidump folder miraculously began to be populated with more recent crash error data, I would continue to be interested in this program's reactions to them. I also got an e-mail back from Intel. Here's what they said about the idea that I could have them ship a replacement to me right away, instead of shipping my apparently defective CPU to them and waiting for them to send me a new one:
Thank you for contacting Intel(R) Technical Support. We do have a cross shipping service, however there is a fee of $25 (twenty five dollars) only for shipping and handling. If you agree with this payment you will need to contact Intel(R) Technical Support again since the cross shipping service is provided only by phone. If you prefer not to pay the $25, than we can process the order by email as an standard warranty replacement, meaning that you will need to send us first the defective processor and as soon as we receive the defective processor we will ship out the new processor and it will take from three to five business days for you to receive the new processor. Also we are missing two lines located on top of the processor on the edge of it, we need this information in order for us to complete the replacement processes, if you have any problem finding the processor markings please visit the following website to see an example of the processor markings: http://support.intel.com/support/processors/sb/CS-025525.htm You can provide this number when you call in to request the cross shipping replacement or you can provide us the missing information by replaying to this email in case you prefer the standard warranty replacement. Your email case number is Please do not hesitate to contact us again if you need further assistance. Sincerely, Adolfo S. Intel(R) Technical Support Intel(R) Processor Support Web Site: http://support.intel.com/support/processors/index.htm
They didn't supply their phone number on their e-mail. Fortunately, I had been composing this post, so despite the chaotic nature of my workspace because of this computer crash, it was easy for me to find it in the words written above: 916-377-7000. I couldn't call yet; I was still doing virus sweeps. The virus sweeps found quite a few objectionable items, though only one or two that looked potentially serious. I had been concerned that sometimes the ZoneAlarm icon was not appearing in the system tray. I wasn't sure of the reason. I always had antivirus, firewall, etc. software running. By the time the sweeps had finished and I had cleaned out the objectionable stuff, I had decided how to proceed with Intel. Since I had to dismantle the computer somewhat to give them the remaining numbers from the CPU, I decided to run a little experiment. Before leaving Windows, I renamed my hardware profile (as described above) and made a copy of it. I called the first one Core2Duo, because that was the kind of CPU I had in the system now, and I called the copy P4, because ... you know. I also told Windows not to proceed with a profile until I selected one. Then I took out the entire motherboard and replaced it with the old one, which still had the fan and CPU and memory on it. I had been keeping it just in case I needed it. I set aside the new MSI motherboard, processor and all, preparatory to sending it back to Intel, and inserted the old Gigabyte GA-8IG100MK that had been running fairly stably before I removed it. I now saw one reason why the fan had not wanted to become mounted properly: one flimsy little plastic wing, or tab -- whatever -- had failed to go into the hole, bending off to the side instead. I rebooted the system with the old Gigabyte motherboard, using the old video card etc. I was glad to do so: the system had crashed once again, this time doing nothing more than moving files from one directory to another. 
I then spent several hours working on something else, but occasionally going to that computer to click or type something, so that the system could install various drivers and acquaint itself (that is, its Gigabyte/Pentium 4 self) with the hardware I had connected to it. In all this process, I did not have any of the random crashes that had been such a feature of life since I got the new motherboard, processor, and RAM. This little experiment tended to support the belief that the problem with this system was primarily a hardware problem. So now I just had to wait for Intel to replace the CPU. Actually, it wasn't quite that simple. Installing the Gigabyte mobo brought its own issues. One, mentioned above, was that, nearly every time I clicked on some program or tried to run some command, I got a dialog box (even if only for a fraction of a second) labeled "Windows Installer" that would inform me that the system was "preparing to install" something. One poster offered the following diagnosis:
The reason that windows installer is coming up in this case is that the user has a corrupted install that has installed a shared component. Most likely comctl, or oleaut32. The other programs that are triggering the repair are likely just calling into a shared COM object and that is what is triggering the repair. To figure out which installer is failing you have to look in your windows "Event Log". This is a log of activity on your windows NT based OS. To get to the event log right click on "My Computer" and select manage. In the list on the left you'll see "Event Viewer" Select that and then select "Application". If you sort by "Source" You'll see a huge number of MsiInstaller entries. Looking through those entries will tell you who the culprit is. If you either uninstall and re-install the offending software or MsiZap the offending software you'll probably be back in buisiness.
The system was running really slowly, despite my best efforts in installing drivers for the Gigabyte mobo. I took some of the steps described above -- fooling around in Safe Mode, defragmenting, checking for errors in Disk Management, possibly running SFC (can't remember) -- and I wound up with a basically functional system. Somehow the Windows Installer issue did fade away in this process. If it hadn't done so, I had just found one or two more websites that seemed likely to be helpful in that effort. But now I had a new issue: a BSOD (which was, as always, too fast for me to read) and a system reset! All of my new stuff was out. I was using strictly the old motherboard, old CPU, old memory ... ah, but I still had the new power supply inside the case. The power supply was the popular and capable Corsair HX Series CMPSU-620HX 620W. It was a monster. I probably didn't need a 620-watt power supply, and I was told that the larger one would not necessarily adjust its power consumption or noise level to be like a smaller power supply, in those instances when its full potential was not being tapped. Even with 4GB of RAM and four hard drives, it seemed like overkill. But it was a Corsair. Anyway, I could see no other culprit, so I returned it. In its place, I put my old power supply, an Antec Truepower 2.0, which had worked quite well for quite a while. So now, on the hardware side, I had completely reverted to my system as it was before the attempted upgrade. On the software side, I was still trying to run the new XP Pro installation. But it was not working. It would work OK in Safe Mode, but I was getting virtually no responsiveness in Normal Mode. For example, I did manage to get a command window to open; I typed SFC /SCANONCE, and it did run SFC on reboot; but then the SFC dialog just sat there, not even starting. I was thinking about trying Fred Langa's suggestion to create a bootable WinXP thumb drive, using BartPE. 
But I really just needed regular access to my computer. So I opted for Intel's $25 service -- which, as it turns out, meant that they took my credit card number and promised that I would receive, the next day, a replacement CPU and a prepaid shipping label to use in returning the old one to them (which they wanted to receive within 30 days). The next day, sure enough, I had a replacement Core 2 Duo E4300 processor, with a note telling me to return the old one within 10 days. Some of Intel's communications had said 30 days. It didn't matter. It would be in the mail as soon as I could manage. The first step was to see whether it had indeed been the processor. I had to reinstall the putatively defective Corsair power supply -- its replacement would not be arriving for days -- because my old Antec didn't have an EPS plug. My new Arctic cooler had arrived at the same time, and there the joke was on me, because it attached to the motherboard with the same kind of connector as on the stock Intel fan. Oh, well. I could only hope that it was quieter. I discovered a couple of things about those little connectors, in the process of mucking around for fifteen minutes with what should have been a simple operation. One is that you needed to inspect them yourself to figure out how they work. The instructions -- from Intel, and also from Arctic -- were backwards. To tighten them, you had to twist them clockwise, just as you would twist any ordinary nut or screw. Also, before you could tighten them, you had to push them down, not merely to get a good connection, but also to lengthen the shaft of the thing. And when you twisted them, you were supposed to just use your fingers. Yes, they had a slot in the top, as if to invite use of a screwdriver; but using a screwdriver would merely increase the likelihood of slipping and rupturing some tiny electrical connection on the vulnerable motherboard lying beneath. 
Finally, when screwing around with all this, you had to make sure not to smear the thermal paste, which they kindly pre-applied to the bottom of the cooler. In short, once I had carefully tightened down the motherboard and attached all the other connectors, so as to minimize the amount of screwing-around I would need to do with my static-sensitive CPU and RAM in place, I was obliged to dismantle it all, take the motherboard completely out of the case, and find a nice, hopefully not staticky tabletop where I could see this thing close-up and finally get the cooler attached to the mobo. I had been planning to give the Idiot Connector of the Year award to the people who designed those SATA hard drive connectors that fall off when you breathe on them, but now I could see I might have to reconsider. Of course, once I got the motherboard properly positioned in a vise on my workbench, where I could crank down on it with due force, I was able to see that this attachment was actually quite simple and logical, and after that (assuming my antics had not mini-lightninged a chip into dysfunctionality) everything went smoothly. I realized that, from another perspective, I was the idiot; but from the viewpoint of someone who has been putting coolers on PC CPUs for nearly 25 years, the natural thing was to try to install it while the motherboard was securely fastened inside the case. This time around, when I was reconnecting the Corsair 620W power supply, I did one thing differently. The MSI motherboard had come with a little clip covering four of the eight mini-sockets on the EPS 12V connector. I had taken that as a hint that I should preferably use a four-pin rather than an eight-pin plug. The power supply happened to have one of each. Now, however, I checked the power supply specifications and saw that only the eight-pin plug was a proper EPS connector. 
So it was possible (though unlikely, I believed) that all of my problems stemmed from not using an EPS power source for that connector on the motherboard. We would soon see. I wasn't sure, at this point, what the purpose of that little clip might have been. It didn't seem that they would have covered four of the eight pin sockets if they had intended me to use all eight. There didn't seem to be any instructions on point. Normally, I did review the installation literature supplied with my hardware, but I didn't recall actually seeing any installation instructions with the power supply, although possibly I had overlooked them. The motherboard manual didn't seem to say anything about that little clip, or about the importance of using the eight-pin plug rather than the four-pin. I might not have noticed it, even now, if it hadn't been for the experience of shopping for a replacement power supply that morning. I had noticed that the 520W Corsair indicated that its EPS plug was eight-pin, and I had thought to myself that possibly it would not be compatible with my mobo, which (I thought) required a four-pin. Now, reviewing the Newegg product specification page for the 620W power supply, I saw that it did indeed refer to an "8-pin EPS12V" connector. I had failed to review that webpage when the power supply arrived. Anyway, now I restored the last drive image that I had made, before all this swapping of motherboards and CPUs, and tried booting. The system went into Safe Mode. It also went into Normal Mode, but not until after I ran FIXMBR and CHKDSK /R again and did some other fixes. The most important of the fixes seemed to be the one involving LVPrcMon.sys. Now that I had reverted to the previous drive image, I had to redo this fix, which I had done early in the troubleshooting process but apparently did not document here. This error arose, at least in my case, from the Logitech Webcam driver. 
Removing that driver in Add/Remove Programs eliminated this problem. Upon completing that and other fixes, I found that the system kept running all night while doing spyware scans using several programs (including especially Ad-Aware 2007, Spyware Doctor, Symantec Antivirus, and Spybot Search-and-Destroy). The next day, I rebooted, having also installed a number of assorted programs in the process of completing the rebuild of the Windows setup I had been using to date. I had a couple of programs running when the system froze. I rebooted and, after a while, it did it again. This time, though, it seemed to be attributable to Adobe Acrobat 8, which had updated itself. I wasn't sure what was the matter, but I scheduled and ran disk checks (right-click each partition and choose Properties). In the process, I spent several hours cooking up a batch file to automate these various maintenance processes, seeing how familiar I had become with CHKDSK and SFC and rebooting and all. By the time another day had passed -- this time, with (I think) no crashes since the last time I had run CHKDSK -- I was prepared to conclude that, one week after starting, my work on this project was done.
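The batch file itself is not reproduced here. As a rough illustration of the idea only, and written in Python rather than as a batch file, a maintenance pass of this kind might be planned as follows. The function name and the drive letters are made up for the example; the sketch only builds and prints the command lines (CHKDSK /R for each drive, then SFC /SCANNOW) and does not run anything.

```python
def maintenance_commands(drives, run_sfc=True):
    """Build the command lines for one maintenance pass, without running them.

    `drives` is a list of drive letters in any common form ("C", "c:", "C:\\").
    """
    commands = []
    for drive in drives:
        # Normalize "c", "C:" or "C:\" to a bare upper-case letter.
        letter = drive.strip().rstrip(":\\").upper()
        # CHKDSK /R locates bad sectors and recovers readable information.
        commands.append(["chkdsk", f"{letter}:", "/R"])
    if run_sfc:
        # SFC /SCANNOW checks the protected system files immediately.
        commands.append(["sfc", "/scannow"])
    return commands


# Hypothetical drive letters, for illustration only.
for cmd in maintenance_commands(["C:", "d"]):
    print(" ".join(cmd))
```

Printing the commands first, rather than executing them, leaves a chance to review the list, since a check of the system drive generally has to be scheduled for the next reboot.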

2 comments:

Steve

You know, this is why I come home every day from my overly Windowsy IT job and kiss my Mac. PCs are certainly cheaper if your time and aggravation are worth nothing.

Brad Fallon

If you wish a crash dump file to be written, you must enable such dump files, choose the path and file name, and select the size of the dump file. For more information, see Enabling a Kernel-Mode Dump File.