Monday, January 3, 2011

VMware Workstation 7.1 Unrecoverable Error

I was using VMware Workstation 7.1 on Ubuntu 10.04 with a Windows XP SP3 guest.  I had a WinXP virtual machine (VM) open, and suddenly it crashed, with this error message:

VMware Workstation unrecoverable error: (mks)
Unexpected signal: 11.
A log file is available in "/media/VMS/VMware VMs/WXMUProjectC/vmware.log". Please request support and include the contents of the log file.

To collect data to submit to VMware support, select Help > About and click "Collect Support Data". You can also run the "vm-support" script in the Workstation folder directly.
We will respond on the basis of your support entitlement.
I pursued that option, but it turned out I didn't have a support entitlement, so I turned to this process of researching the solution on my own.  I had gotten a similar error message once before.  I wasn't entirely sure what I had done to solve the problem in that case, other than to keep flailing around until something clicked.  But as I reviewed that previous post, I did recall a different error message that I had gotten when starting the VM.  I started it again and saw this in the lower right-hand corner:
Could not connect Ethernet0 to virtual network "/dev/vmnet8"
More information can be found in the vmware.log file.
Virtual device Ethernet0 will start disconnected.
I tried a search for that error and came across some possible answers.  One was just to reboot, but I had already done that.  Another was to use the Virtual Network Editor.  I found a VMware video on that.  It told me that vmnet8 was associated with NAT, which was the kind of network connection I had selected in VM > Settings > Hardware tab > Network Adapter.  The video then started to talk about adding a network adapter.

It seemed that this could have two implications for me.  First, I had just put a network interface card (NIC) into the computer, as an alternative to the motherboard's onboard network connector.  I had done that in an attempt to deal with a networking problem that turned out to be just a bad cable.  I did think that I had been getting the second error message previously, the one just quoted, "Could not connect Ethernet0."  But I had not been getting the crashes previously.  So probably I could fix that error by just removing the unnecessary NIC, instead of trying to figure out how to configure it as described in the video.  Second, I had preserved my Ubuntu /home directory during this most recent installation of Ubuntu.  I had also recently installed a new motherboard.  Possibly the settings that I had saved in that /home directory were still dreaming of the old days, with the previous motherboard; perhaps I would have to configure the VM's ethernet adapter anyway, so as to make it comfortable with the new motherboard.

I started by shutting down the machine, removing that unnecessary NIC, and restarting.  (Before shutting down, I checked Ubuntu's System > Administration > Update Manager, just in case there were updates that would make my life easier in unknown ways.)  I powered up the VM.  No Ethernet0 error message.  OK, one problem solved.  Would it crash?  I worked with the VM for a couple of days, but then it did crash again.  It seemed that the frequency of the crashes was much reduced, so maybe removing the NIC helped.
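For anyone who wants to check on the Ethernet0 side of things without powering up a VM, one rough test is whether the /dev/vmnet8 device node named in that error message is present on the host.  The following is only a sketch in Python: the function names and the dev_dir parameter are my own inventions for illustration, and the presence of the node does not by itself prove that VMware will connect to it.

```python
import glob
import os


def list_vmnet_devices(dev_dir="/dev"):
    """Return the sorted names of vmnet* entries found in dev_dir.

    dev_dir is a hypothetical parameter, included so the function can
    be pointed at a test directory instead of the real /dev.
    """
    pattern = os.path.join(dev_dir, "vmnet*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def has_vmnet(name, dev_dir="/dev"):
    """Return True if a device entry called name (e.g. "vmnet8") exists."""
    return name in list_vmnet_devices(dev_dir)


if __name__ == "__main__":
    print("vmnet devices found:", list_vmnet_devices())
    print("vmnet8 present:", has_vmnet("vmnet8"))
```

If vmnet8 is missing from the list, that would at least be consistent with the "Could not connect Ethernet0" message, and would point toward the Virtual Network Editor rather than the VM's own settings.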

This time, I took a look at the log file.  It was in the folder containing the other files for the VM, including the .vmdk file, and it was named simply "vmware.log."  It contained a large number of entries, going back days, including what looked like several hundred, at the end of the file, that all occurred within the last second before the crash.  The first one in that last second was "Caught signal 11 -- tid 3652."  There hadn't been any others for several minutes before that, and I also noticed that the "11" was the same number as appeared in the error message onscreen (quoted above).  This did seem to be the beginning of the end.  After that "Caught signal 11" message, there in the log file, I saw many repetitions of a few other types of messages, like these:
mks| SIGNAL: stack B6CE1AE0 : [etc.]
mks| Backtrace[0] [etc.]
mks| SymBacktrace[0] [etc.]
mks| Panic: dropping lock (was bug 49968)
mks| Unexpected signal: 11.
mks| Core dump limit is 0 KB.
mks| Child process 19031 failed to dump core (status 0x6).

mks| Backtrace[0] [etc.]
mks| SymBacktrace[0] [etc.]
where [etc.] refers to various sets of computer gibberish (e.g., 0xb7823410).  It went on from there, but now we seemed to be at the point of no return, where the log showed VMware giving me the "Unexpected signal" error message onscreen.  A search led to a thread in which people were saying that they avoided this error by tinkering with their screen resolution.  When I saw that, I figured that we were talking about a kind of general reaction to somewhat incompatible hardware:  could be the NIC, could be the screen resolution, etc.
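Scrolling through several days of log entries to find that first "Caught signal" line was tedious, so here is a small Python sketch of how the search could be automated.  It assumes only that the log contains a line with the text "Caught signal" followed by a number, as mine did; the function names and the context parameter are made up for this example, and other versions of VMware may word the entry differently.

```python
import re

# Matches entries such as "Caught signal 11 -- tid 3652".
SIGNAL_RE = re.compile(r"Caught signal (\d+)")


def find_first_signal(lines, context=5):
    """Locate the first 'Caught signal N' entry in a list of log lines.

    Returns (line_index, signal_number, preceding_lines), where
    preceding_lines holds up to `context` lines just before the match,
    or None if no such entry exists.
    """
    for i, line in enumerate(lines):
        match = SIGNAL_RE.search(line)
        if match:
            start = max(0, i - context)
            return i, int(match.group(1)), lines[start:i]
    return None


def scan_log(path, context=5):
    """Read the log at path and run find_first_signal on its lines."""
    with open(path, errors="replace") as handle:
        return find_first_signal(handle.read().splitlines(), context)
```

In my case this would be called as scan_log("/media/VMS/VMware VMs/WXMUProjectC/vmware.log").  The lines returned just before the match are the ones worth reading, since they show what the VM was doing before the beginning of the end.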

I had noticed that this particular crash occurred when I used my KVM to switch from the computer in question to a different computer.  It was a new KVM, an IOGear GCS72U.  I had previously been using a PS/2 KVM without problems, but my new motherboard did not have two PS/2 sockets, so I had to switch to this USB KVM.  Had I installed the KVM before or after taking out the NIC?  I couldn't remember.  But the KVM was a suspect.

Another suspect was the keyboard.  I had also had to get a new keyboard and mouse -- because, of course, I was using PS/2 devices previously, and now I had to have USB devices.  It was an inexpensive keyboard, a Logitech K120, and I had noticed that it did not work consistently on the other computer.  It would be working fine, and then there would suddenly be no more keyboard input.  The mouse was still working, but not the keyboard.  At that point, it didn't matter whether the keyboard was connected to that other computer through the KVM or was plugged into it directly; either way, it wouldn't work.  But I hadn't done a scientific study to determine whether it was the keyboard or the USB KVM that was screwing up.

So, putting it together, what had happened in this case was that I was using VMware on computer no. 1 (C1).  The keyboard and mouse had been working fine on both C1 and C2.  I hit the KVM switch button to move to C2.  That's when VMware crashed; and at the same time, suddenly the keyboard was not working on C2.  But the USB mouse would still work.  So it seemed that either the keyboard or the KVM was sending a funky keyboard-related signal to C2.

C2 was an older system, and I was in the process of upgrading it, and I thought that might solve the problem.  But in the meantime, I plugged my old PS/2 keyboard into C2 and rebooted, with the new keyboard and KVM still connected to both computers.  In this setup, over a period of days, I observed that the PS/2 keyboard continued to work consistently throughout, but the USB keyboard would sometimes stop working on C2, like before.

Then I had another VMware crash on C1, and that made me decide to try another angle.  I unplugged the KVM from C2.  By this time, I had replaced C2; now it was pretty much the same computer as C1 (same kind of motherboard, CPU, and case).  So now the mouse and keyboard connected to the KVM would work only on C1.  On C2, I kept using the PS/2 keyboard, and added a USB mouse.  If there were still crashes or freezes, I would have a better idea of whether the problem was the new USB keyboard or the new USB KVM.  At this writeup, a couple of weeks later, my recollection was that I did not have any further crashes.

I decided to send the KVM back to IOGear.  In the interim, I connected my old PS/2 KVM to both computers, and used it only with the PS/2 keyboard.  So now I had a separate USB mouse for each computer, but only one keyboard for both of them.  It took a while to get used to switching mice when I switched the keyboard from one computer to the other, but the point here is that there were no further crashes.  I had to wait for the replacement KVM to arrive from IOGear before I could say for sure whether it was the keyboard or the KVM; but by that point I had decided to stop using VMware on Ubuntu.

2 comments:

raywood

Another example of how it can be *really bad* when your VMware Workstation has an unrecoverable error.

Anonymous

Thanks for these hints!

In my case I stopped my running VNC servers to solve the issue!