Mr. Romi, founder of IlmuKomputer.com (IKC), asked me yesterday to help optimize his website. A bit about IlmuKomputer.com: the name means “Computer Knowledge”, and the site contains a lot (and I mean a lot) of free, high-quality computer tutorials.
As you can easily guess, the website is very popular. During peak hours it usually becomes overloaded, and then unresponsive.
I’m only too happy to be of assistance to IKC’s team in their good cause, so I started working on it with help from one of my staff, Yopi.
It turned out that what we’d be doing would be very different from what most others do. Then again, IKC is a very popular website (and “slashdotted” daily, by leechers), so what works for most others doesn’t work for us.
The Bottlenecks
A bit of background: IKC uses WordPress as its CMS. It’s a very nice CMS, and makes your life easier; I’ve used WP myself since version 1.5.x. However, being database-backed, a WP-based infrastructure has a lot of points that can become potential bottlenecks. So if your website starts to become popular with this CMS, you will need to start optimizing it.
After examining the situation for a while, it was clear that MySQL was THE bottleneck. The output of top showed it using at least 8 times as much CPU time as any other service. Mr. Romi also told me how it kept falling over at peak time.
Apache (and PHP, since it’s compiled as an Apache module) was the next one, with each of its processes using more than 10 MB of RAM. That may seem insignificant at first, but multiply it by (potentially) 150 processes and you’re looking at 1.5 GB of RAM: quite a memory hog.
CPU-wise, I was also quite surprised to see that each incoming request would cause the handling process’s CPU usage to spike to more than 50%.
Initial actions
I asked Mr. Romi to increase MySQL’s internal cache sizes. He did, but the machine still fell over on a daily basis.
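For reference, these knobs live in my.cnf. The names and values below are only an illustration of the sort of thing that gets tuned, not IKC’s actual settings:

    # /etc/my.cnf (illustrative values, not the actual ones used)
    [mysqld]
    key_buffer_size  = 64M   # index cache for MyISAM tables
    query_cache_size = 32M   # cache for repeated SELECT results
    table_cache      = 256   # number of open table handles kept around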
He had also implemented caching on the app server (PHP) by way of the wp-cache plugin. Still no joy.
The Edge
I decided that we needed to go straight to the “edge”, and stop the load there.
I proposed setting up Squid in HTTP acceleration mode. This way, most of the requests won’t even touch Apache, much less MySQL. Squid will bear most of the load, but since it’s very efficient, it should help a lot in making the website perform better.
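For the curious, an accelerator setup boils down to just a few lines of squid.conf. Here’s a minimal sketch in Squid 2.6-style syntax; the backend port and ACL name are my assumptions, not IKC’s actual config:

    # Squid answers on port 80 as an accelerator for the site
    http_port 80 accel defaultsite=ilmukomputer.com vhost
    # Apache moves to a backend port (8080 here is an assumption)
    cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=apache
    # only accelerate requests for our own domain
    acl our_site dstdomain ilmukomputer.com www.ilmukomputer.com
    http_access allow our_site
    cache_peer_access apache allow our_site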
Since I had a few things to do myself, I asked Yopi to set up Squid on our test machine.
I just gave him pointers now and then, yet he managed to finish testing the setup and implement it on IKC’s server in only about 3.5 hours.
Then I showed him “tail -f /log/squid/access.log”, and we watched in amazement at how quickly the TCP_MISS lines changed into TCP_HITs.
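If you want more than eyeballing, here’s a quick way to tally the result codes, assuming Squid’s default access.log format (field 4 holds e.g. TCP_HIT/200):

    awk '{ split($4, a, "/"); count[a[1]]++ }
         END { for (code in count) print code, count[code] }' /log/squid/access.log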
After about 12 hours, I increased the cache_mem size, and the TCP_HITs slowly started changing into TCP_MEM_HITs.
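For the record, that’s a single directive in squid.conf; the figure below is illustrative, not the value we actually used:

    # amount of RAM Squid may use for in-memory ("hot") objects
    cache_mem 256 MB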
The result
Squid is working as we expected.
Average server load dropped from over 30% to about 3%, while Squid’s CPU usage increased from 0% to an average of only 2%. A very nice trade-off.
After about a month, I checked the website’s logfiles and saw some very nice numbers: traffic to IlmuKomputer.com has doubled! Needless to say, Mr. Romi is very happy with it.
I also found that every day there are people downloading the contents using crawler software such as Teleport Pro, wget, etc. I asked Mr. Romi if he had a problem with that, and he said no; it is his mission to spread knowledge for free, after all. So I left these leechers alone.
Come to think of it, it’s possible that these crawlers were the ones causing IKC’s server to fall over at peak hours. For example, Teleport Pro can download 10 links simultaneously, and as soon as any one of them finishes, it instantly starts downloading the next. When all 10 downloads hit the database, multiplied by many crawlers at the same time, not many servers will be able to stand up to it. It’s like being machine-gunned while wearing nothing but simple leather clothing. If you have ever had your website linked from Slashdot or Digg, you’ll understand what I’m talking about.
In this case, Squid acts as thick titanium armor, taking most of the hits for your server. I suspect the number of crawlers has increased since then, but it shouldn’t be a problem.
MySQL is a bit strange though. Sometimes its CPU usage can be as high as 160%. Thankfully this is very rare, so it’s probably just some internal clean-up routine.
One day, after happily watching the low load on the server for a while, suddenly everything froze. Even my SSH connection. Attempts to reconnect to the server failed.
After a while, I was finally able to connect again. Looking around, I noticed there was some sort of bandwidth-limiter daemon running on the server. After consulting with Mr. Romi, I killed it. The problem stopped.
Happy ending?
I’m still monitoring the server for glitches as we speak. For example, Squid seems to hang from time to time. This could be caused by anything from bad memory to a problem with the specific hardware configuration, so for now I’ve set up a cronjob which restarts it at certain intervals.
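The cronjob itself is nothing fancy; something along these lines, where the interval and init script path are assumptions on my part:

    # crontab entry: restart Squid every 6 hours (interval is illustrative)
    0 */6 * * * /etc/init.d/squid restart >/dev/null 2>&1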
It seems to help, so I can troubleshoot the problem in peace.
Anyway, I’m sure that with the increased availability, even more people will visit the website (Ed: confirmed!). Then, at some point in the future, we may find the server overloaded again.
In that case, there are still many things we can do to keep IKC up & running on just one server:
- Coral-ize internal links: Coral is a global cache with servers all over the world. It has proven to help people with overloaded servers lighten their load (when slashdotted, dugg, etc.). With the Coralize plugin, all of your internal links will point to their copies in the Coral cache.
Actually, for most people, this may be the easiest and best step they can take. I could set up Squid because IKC has its own dedicated server; not everyone does. I personally also own a (shared) webhosting account, and Coral CDN (Content Distribution Network) is a very nice and easy solution for us. It’s rarely mentioned though, so here you go.
If you’re not using WordPress, you can still utilize Coral CDN easily! Just append .nyud.net:8080 to the hostname in your links. For example, if you access http://harry.sufehmi.com.nyud.net:8080, you’ll actually reach a Coral server, serving a copy of my website from its cache.
I did say that it’s very easy, didn’t I? 🙂
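For static pages, here’s a hypothetical one-liner to rewrite every internal link in bulk (substitute your own domain, and back up the file first):

    # rewrite links to example.com so they go through Coral instead
    sed -i 's|http://example\.com/|http://example.com.nyud.net:8080/|g' page.html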
- RAM Upgrade: This will enable Squid to have a bigger memory cache, thereby increasing its effectiveness significantly.
- Roundrobin Edge servers: If the load gets so high that even Squid is overwhelmed, we can implement a cluster of edge servers. People can volunteer their servers and have them act as edge servers for IlmuKomputer.com.
The incoming requests are spread over the edge servers by way of round-robin DNS. It’s not the best way to do it, but it’s very easy and costs almost nothing.
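Round-robin DNS is just multiple A records for the same name; the nameserver rotates their order with each answer. A hypothetical BIND zone snippet (the addresses are documentation examples, not real servers):

    ; volunteer edge servers, rotated automatically by the nameserver
    www.ilmukomputer.com.   300   IN   A   192.0.2.10
    www.ilmukomputer.com.   300   IN   A   192.0.2.11
    www.ilmukomputer.com.   300   IN   A   192.0.2.12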
- Use lighttpd: Apache is a rather heavy webserver. I personally like its (amazing) flexibility (there’s a reason it’s called the Swiss Army knife of webservers), but at times you’ll need something else. From my experience, lighttpd + FastCGI is a very nice alternative to Apache + PHP. Its feature set is now quite similar to Apache’s, but it’s much more lightweight. Its community is also quite helpful and happy to help a newbie, within reason. Recommended.
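As a taste of what that looks like, here’s a minimal lighttpd.conf sketch (1.4.x syntax; the PHP binary path and process count are assumptions):

    # enable FastCGI and hand .php files to PHP over a local socket
    server.modules += ( "mod_fastcgi" )
    fastcgi.server = ( ".php" => ((
        "bin-path"  => "/usr/bin/php-cgi",
        "socket"    => "/tmp/php-fastcgi.socket",
        "max-procs" => 4
    )))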
- And many other ways
Lastly, we’d like to say thanks to Mr. Romi for giving us the opportunity; it was very interesting! I hope IKC becomes even more successful in the future, thereby benefitting even more people. Well done, Pak.