When I checked with tools such as "top", the "%wa" value (meaning: I/O wait) was very high. Normally it is below 10%; instead it was ranging from 50% up to 98%.
This was very alarming, because the server is using SSD disks 🙂 so its I/O (input/output) should be really fast, not bogged down like this.
I checked the daemons (server software) such as Apache, MySQL, Varnish, etc – and they were all idling. None were busy.
So the I/O load came from somewhere else, probably from the hypervisor (physical server) itself, which means it was possibly a hardware problem.
The datacenter had already concluded that it was not a "noisy neighbour", meaning another VM (virtual machine) on the same physical server hogging all the I/O resources. Pretty much all of the other VMs were idling, just like mine.
However, I needed some data to convince the datacenter to do a hardware check on its SSD storage cluster. So I wrote this little bash script: http://pastebin.com/SxxuaVy4
The script logs the server's I/O status in CSV format (which Excel can open), so it can be graphed very easily later.
Using the iostat tool, it probes the server's current I/O load.
The script is then executed every minute by running it as a cronjob.
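For reference, a crontab entry that runs a script every minute could look like the line below. This is only a sketch: the script path and the log path are made up for illustration, so adjust them to wherever you saved yours.

```
# Hypothetical paths: run the logging script once a minute, append any errors to a separate file
* * * * * /usr/local/bin/io-log.sh 2>> /var/log/io-log.err
```

You can add it with "crontab -e" as the user that should run the script.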
You may notice that iostat is executed with the "-d 1 3" parameters, which means "show device statistics 3 times, with a 1 second delay in between".
This is because iostat's first report is not usable for this 🙂 it shows averages since the system was booted rather than the current load, so the numbers would be inaccurate. I noticed the numbers tend to stabilize by the 3rd report, so I set it up that way.
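To illustrate the idea of keeping only the last of the 3 reports, here is a minimal sketch. It is not the script from the pastebin link above; the function names, the device name "sda" and the log path are all assumptions of mine. It also assumes the usual "iostat -d" column order, where the 2nd to 4th columns are tps, kB_read/s and kB_wrtn/s.

```shell
#!/bin/sh
# Hypothetical helper: read "iostat -d" output on stdin and print one CSV row
# (timestamp,tps,kB_read/s,kB_wrtn/s) taken from the LAST report for device $1.
last_report_csv() {
    awk -v dev="$1" -v ts="$(date '+%Y-%m-%d %H:%M:%S')" '
        # Every report has one line per device; later matches overwrite earlier ones.
        $1 == dev { tps = $2; rd = $3; wr = $4; seen = 1 }
        END { if (seen) print ts "," tps "," rd "," wr }
    '
}

# Hypothetical wrapper: take 3 reports, 1 second apart, and append only the
# last one to the CSV file given as $2.
log_io_sample() {
    iostat -d 1 3 | last_report_csv "$1" >> "$2"
}

# Example call (device name and log path are assumptions):
# log_io_sample sda /var/log/io-load.csv
```

Keeping the parsing in its own function means you can try it on saved iostat output before putting it into cron.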
Of course, you can very easily modify this script to monitor something totally different 🙂 just change the iostat / head / tail / cut part to something else – voila.
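As an example of such a swap, the sketch below logs the 1-minute load average instead of disk I/O. Again, the function name and the log path are my own assumptions, and it relies on the Linux /proc/loadavg file, whose first field is the 1-minute load average.

```shell
#!/bin/sh
# Hypothetical variant: print "timestamp,1-minute load average" as a CSV row.
# $1 is the file to read from (normally /proc/loadavg on Linux).
loadavg_csv() {
    echo "$(date '+%Y-%m-%d %H:%M:%S'),$(cut -d' ' -f1 "$1")"
}

# Example call (log path is an assumption):
# loadavg_csv /proc/loadavg >> /var/log/loadavg.csv
```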
Attached is a graph created from one of the logs. The X axis is the timestamp, in 24-hour format.
I submitted the logs to the datacenter.
It convinced them to do checks on the storage cluster – and voila, they found some degraded disks in that cluster 🙂
Damaged disks replaced, storage cluster rebuilt, and everyone lives happily ever after? 🙂 Fingers crossed. Happy ending!