A few days ago a client contacted me and said that their website is down. I checked the server, and indeed the server is very overloaded. In total it got 36 cores – and they’re all 100% utilized.
As usual I checked the whole stack, but today it’s something different – the MySQL / database server was the culprit. It was performing very slowly, and in turn caused the webserver to slow down as well.
In the slow query log, some queries kept showing up with crazy query times, in tens of seconds.
For comparison – all of the other queries finishes in less than a second.
MySQL’s slow query log is your friend – it enable you to find problematic queries very quickly.
And all those slow queries are in these tables:
wp_statistics_useronline
wp_statistics_visitor
I checked the currently running queries with “mysqladmin processlist“, and almost all (hundreds of them) queued queries are those involving those tables, looking for specific content in the field “ip”. They’re all looked like these:
SELECT location FROM wp_statistics_visitor WHERE ip = '88.88.88.88'
SELECT * FROM wp_statistics_useronline WHERE ip = '99.99.99.99'
On a hunch, I checked the structure of those tables. And right enough, there’s no index for “ip”
An index can increase a query’s performance by a, very, significant amount.
When the size of those tables are big enough, and you have a higher traffic than usual (they just published a very important information that’s of interest to a lot of people) – then suddenly these seemingly innocent queries were able to bring down a 36-core server to its knees.
Anyway, now we know the culprit, the solution is easy enough:
alter table wp_statistics_useronline add index (ip);
alter table wp_statistics_visitor add index (ip);
And voilĂ – in an instant, the website was up again, and the CPU utilization dropped to nearly zero.
Everyone’s happy, and I have also notified the developers about the issue as well.