One of the servers under my supervision started having problems a few weeks ago. It has had several kernel oopses (roughly equivalent to Windows’ BSOD, I think), but sometimes it just crashed hard, with no message whatsoever in the log files. This had me baffled for a while. At first I thought Fedora needed to be upgraded to the latest version, but it became clear that even with the latest updates applied, the server was still having problems.
Somebody pointed out that memory should be the prime suspect in this case. So I ran memtest86, and sure enough, it found hundreds of bad bits in the first 512MB.
Unfortunately, it is NOT possible to print out the error messages from memtest86, which will be a problem for me when I try to return the memory module to the supplier. So I started to look around.
(note to self: recheck that these errors are not caused by wrong memory timing in BIOS)
Thankfully there’s memtester. I’ll give it a try probably tomorrow.
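Since the whole point is to have something printable to show the supplier, here is a minimal sketch of how memtester's output could be saved to a file. It assumes memtester is installed and on the PATH, and that it is invoked as `memtester <size> <loops>`. The helper names (`build_memtester_command`, `run_and_log`) and the log file name are my own placeholders, not part of memtester.

```python
import shutil
import subprocess
from datetime import datetime


def build_memtester_command(size="512M", loops=1):
    """Build the argument list for: memtester <size> <loops>."""
    return ["memtester", str(size), str(loops)]


def run_and_log(size="512M", loops=1, log_path="memtester.log"):
    """Run memtester and write everything it prints to log_path.

    Returns memtester's exit code, or None if memtester is not installed.
    """
    if shutil.which("memtester") is None:
        return None

    cmd = build_memtester_command(size, loops)
    # Merge stderr into stdout so the log keeps messages in order.
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        errors="replace",
    )

    with open(log_path, "w") as log:
        # Header line (an assumption of this sketch) recording what was run and when.
        log.write(f"# {' '.join(cmd)} at {datetime.now().isoformat()}\n")
        log.write(result.stdout)

    return result.returncode
```

Calling `run_and_log("512M", 1)` as root (memtester needs to lock the memory it tests) should leave a `memtester.log` that can be printed. A non-zero return value means memtester reported a problem. Keep in mind that memtester runs under the operating system, so it can only test memory that the kernel is willing to hand over, not the whole module.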
Along the way, I found several other relevant links:
Hope you’ll find them useful.