Linux diagnostic software
One of the servers under my supervision started having problems a few weeks ago. It has had several kernel Oopses (roughly the equivalent of Windows’ BSOD, I think), but sometimes it just crashes hard, with no message whatsoever in the log files. This had me baffled for a while. At first I thought Fedora needed to be upgraded to the latest version, but even after applying the latest updates, the server was still having problems.
Somebody pointed out that memory should be the prime suspect in this case. So I ran memtest86, and sure enough, it found hundreds of bad bits in the first 512MB.
Unfortunately, it is NOT possible to print out the error messages from memtest86, which will be a problem when I try to return the memory module to the supplier. So I started to look around.
(note to self: recheck that these errors are not caused by wrong memory timing in BIOS)
Thankfully there’s memtester, which runs from within Linux, so its output can be saved to a file. I’ll probably give it a try tomorrow.
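Since the whole point is to have something printable for the supplier, here is a rough sketch of how the memtester output could be captured to a file. I haven't run it against the faulty machine yet, so treat it as an assumption-laden starting point: the helper names (build_command, run_and_log) and the log file name are made up for illustration, and the only thing taken from memtester itself is its basic invocation, "memtester <size> <loops>".

```python
import subprocess
import sys
from datetime import datetime


def build_command(size="512M", loops=1):
    """Return the argument list for: memtester <size> <loops>."""
    return ["memtester", str(size), str(loops)]


def run_and_log(cmd, log_path):
    """Run cmd, echo its output to the terminal, and keep a copy in log_path.

    Returns the exit status of the command.
    """
    with open(log_path, "w") as log:
        # Header line so the printout shows what was run and when.
        log.write("# %s  started %s\n" % (" ".join(cmd), datetime.now().isoformat()))
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages in the same log
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            sys.stdout.write(line)
            log.write(line)
        returncode = proc.wait()
        log.write("# exit status: %d\n" % returncode)
    return returncode
```

Calling run_and_log(build_command("512M", 3), "memtester.log") as root should leave a memtester.log that can be printed. Two caveats: memtester tests whatever memory the kernel hands it rather than a chosen physical range, so it may not hit the same 512MB that memtest86 flagged, and its progress indicator may leave some control characters in the log.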
Along the way, I found several other relevant links:
[ An excellent guide on troubleshooting hardware problems on Linux ]
[ List of many diagnostic tools on Linux ]
[ Comprehensive list of tools and procedures for testing hardware on Linux ]
Hope you’ll find them useful.

