CDN vs Dedicated Servers: cutting through the hype

Several days ago I was having problems with my web infrastructure: it seemed to be overloaded. Traffic-wise, it shouldn't have been, because (1) I have a caching reverse proxy (Squid) installed, and (2) 3 terabytes of traffic is still well within its current capacity.

Of course, other factors can change this equation; for example, database-intensive pages. In that case, even a few requests per minute may be enough to overload your servers.

Anyway, I notified the affected parties and started troubleshooting. Following the usual process of benchmarking, profiling, and optimization (BPO), I soon had all fingers pointing at Squid. So I tried several alternatives (Varnish, nginx, ncache); all failed, but that's a story for another post. This post is about hype, and how even IT experts fall for it.

While going through the BPO process, I got into chats with several friends who are quite well known as IT experts. Help is always welcome, so I followed the discussions through. The suggestions were rather strange, but all was still well. Until one of them suggested that I move my infrastructure to a CDN (Content Delivery Network).

I almost snorted coffee through my nose 😀
(I really should have it by IV drips to prevent this from happening again in the future, but anyway…)

A bit about CDNs: a CDN is basically a network of servers all over the world, all hosting the same set of data. When a visitor requests a file, it is served from the server closest to their location, so the visitor can fetch the data at maximum speed.

That’s basically how a CDN works. There are variations, but that’s the gist of it.
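To make that concrete, here is a toy sketch of the idea. The server names and coordinates are made up, and real CDNs route requests via DNS or anycast using network distance rather than straight-line geometry; this is just the "nearest server wins" principle in miniature:

```python
# Toy model of CDN edge selection: serve each visitor from the
# geographically nearest edge server. Names/coordinates are invented.
import math

EDGES = {
    "us-east": (40.7, -74.0),
    "eu-west": (48.9, 2.4),
    "ap-south": (1.3, 103.8),
}

def nearest_edge(visitor_lat, visitor_lon):
    """Return the name of the edge server closest to the visitor."""
    def dist(coords):
        lat, lon = coords
        return math.hypot(lat - visitor_lat, lon - visitor_lon)
    return min(EDGES, key=lambda name: dist(EDGES[name]))
```

A visitor in London would be routed to the European edge, one in New York to the US edge, and so on.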

The problems with using a CDN:

(1) CDNs are for static content: Facebook users have probably seen their browser’s status bar showing lines such as “Loading static.ak.fbcdn.net”. That’s Facebook’s own CDN. Notice the first word at the beginning of the domain name? Yup: static.

There’s a reason CDNs are for static content: static content is easier to synchronize and deliver across the whole network. You can, indeed, sync and deliver dynamic content through a CDN, but the complexity instantly jumps by several orders of magnitude. And so does the cost. Which brings us to the second problem:

(2) Cost: a standard CDN will cost you at least 5x your normal bandwidth costs.
SoftLayer.com brought a breakthrough here, with a CDN that costs “only” twice the normal bandwidth.

However, that’s still twice as expensive, and my web infrastructure hosts dynamic content that may change by the minute; so it’s absolutely out of the question.

If that friend is willing to foot the bill, then I’m willing to play with a CDN. That would make things more fun, with none of the pain 🙂

Anyway, I’m still amazed at how even IT experts fall for hype. I know a CDN sounds cool and hip and sophisticated and so on; still, I personally prefer hard proof, especially proof I have verified myself.

But to each their own, I guess. Just try not to mislead others by spreading the hype too, okay?
Repeat after me: a CDN is NOT a silver bullet. And as we all know, applying the wrong solution to a problem will just cause even more problems.

As for my problem, I solved it by moving Squid's cache to a different disk; it looks like the previous disk was defective. With some further tweaks, performance is now almost double what it was before the trouble began. Some of the websites fully load in as little as 2 seconds. Not bad.
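For the record, relocating the cache is a one-line change in squid.conf. A fragment along these lines (the path and sizes here are illustrative, not my actual values):

```
# Put the on-disk cache on the new, healthy disk:
# aufs storage, 20000 MB cache, 16 L1 / 256 L2 subdirectories
cache_dir aufs /mnt/newdisk/squid-cache 20000 16 256
```

Run `squid -z` afterwards to create the cache directory structure before restarting.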

Performance-wise, it's now all right. But the work continues, to further expand the capacity of my web infrastructure. For now, the customers are happy.

That’s what matters.

57 thoughts on “CDN vs Dedicated Servers: cutting through the hype”

  1. My recipes:

    1. Do it the “cache” way. Squid, memcache, etc. You name it.
    2. Then do it the “mirror” way. Active/Active, Active/Passive, load balancers, etc.

    If those fail, and we need to load-balance servers at the IP level (geo-targeting, etc.), then we need a CDN.

    However, it’s always a good idea to group all those pesky static files (CSS, JS, audio/video, etc.) under one IP only, for the sake of maintainability.

  2. Hi, here’s how I do it:
    – handle static and dynamic content on two separate fronts
    – DNS hashing -> one Squid per hash -> 99.98% cache hit rate
    – hardware config for my reverse proxies: good network interfaces (Intel or Broadcom), good CPUs (Intel Xeon 51xx if you can afford it), six to eight fast SCSI/SAS disks in RAID0 (you don’t care about losses), 8 GB of RAM.

    We handle 7 Gbps of dynamic and static traffic with ~100 such servers. The machines run at about 10% load, mostly iowait.
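The "DNS hashing -> one Squid per hash" idea above can be sketched roughly like this (hostnames invented): each URL deterministically maps to the same Squid instance, so every cache holds a disjoint slice of the content, which is how hit rates like 99.98% become possible:

```python
# Sketch of hash-based request partitioning across squid instances.
# Hostnames are illustrative, not a real deployment.
import hashlib

SQUIDS = ["squid-0.internal", "squid-1.internal",
          "squid-2.internal", "squid-3.internal"]

def squid_for(url):
    """Deterministically map a URL to one squid instance."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return SQUIDS[int(digest, 16) % len(SQUIDS)]
```

Because the mapping is stable, no object is ever cached twice across the pool.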

  3. @Andry – totally agree. Facebook forgot to cache their database servers, and they paid pretty dearly for that mistake.
    .
    Grouping the static files under one IP is okay, but make them accessible through several sub-domains. For example: js.mysite.com, images.mysite.com, etc.
    .
    This way, the visitor’s browser will open several simultaneous connections for EACH sub-domain, so the components of your website will load faster.
    .
    Thanks, keep sharing.

  4. @Guillaume – awesome stuff. This is the kind of stuff you won’t find in books.
    .
    Thanks for sharing.

  5. @Slavi – personally, I’d place a high-speed reverse proxy (like Squid, Varnish, ncache, etc.) in front, and let it handle those static files automatically. No hassle on our part.
    .
    In my tests, Squid can give you a 500x performance increase. Varnish and ncache should be able to give you even more.
    .
    That should be enough for everyone, save for the handful of most-visited websites.
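As a rough illustration of that setup, a minimal Squid reverse-proxy ("accelerator") configuration might look like this. The hostnames, ports, and refresh times are assumptions for the sketch, not anyone's actual config:

```
# Listen on port 80 as an accelerator in front of the origin server
http_port 80 accel defaultsite=www.example.com
cache_peer 127.0.0.1 parent 8080 0 no-query originserver

# Cache static files aggressively: fresh for at least 60 minutes,
# at most 1 day (1440 minutes)
refresh_pattern -i \.(jpg|png|gif|css|js)$ 60 80% 1440
```

With this in place, static assets are served from Squid's cache and the origin server only sees misses.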

  6. I agree a CDN is not worth the trouble unless you’re running something that gets attacked often. If your server goes down often, then it might be something to look into. I know a lot of p2p sites are using CDNs to keep running, but for the average site, or even a large business, it’s not really needed.

  7. Have you tried Amazon’s S3 service, or is it only for sites that use a huge amount of bandwidth?

  8. Interesting post. I gave up on Varnish a few months ago; will give ncache a go tonight. It should be nice, since I like nginx… it’s light and fast!

  9. I think you forgot one of the biggest nuisances of the internet: the Distributed Denial of Service attack. Using a CDN, you can avoid being held hostage, or at least minimize the effects when subjected to one.

    Sufehmi wrote:
    >>>Grouping the static files into an IP is okay, but made it accessible through several sub-domain. For example: js.mysite.com, images.mysite.com, etc

    According to Yahoo Developers, that is not wise:
    http://developer.yahoo.com/performance/rules.html#dns_lookups

    One particular paragraph that I would like to quote is:
    “When the client’s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.”

    However, I do agree that CDN is not the silver bullet.

  10. Hi Anthony, thanks for joining in 🙂

    .

    I think you forgot one of the biggest nuissance of internet: Distributed Denial of Service attack. Using CDN you can avoid being held hostage, or at least minimize any effects when subjected to one.

    .

    That’s absolutely true, and one more reason to use a CDN.

    .

    However, if you have a dynamic website (based on PHP, Ruby, etc.), it will be pretty hard to put it on a CDN, because a CDN is mostly for static content.

    .

    You could put dynamic content on a CDN, but usually this will increase the cost rather significantly.

    .

    Another alternative is to find a good datacenter.

    .

    A good datacenter will be able to block a DDoS attack at its upstream router, so the load won’t do much damage to your server.

    .

    According to Yahoo Developers, that is not wise:

    http://developer.yahoo.com/performance/rules.html#dns_lookups

    One particular paragraph that I would like to quote is:

    “When the client’s DNS cache is empty (for both the browser and the operating system), the number of DNS lookups is equal to the number of unique hostnames in the web page. This includes the hostnames used in the page’s URL, images, script files, stylesheets, Flash objects, etc. Reducing the number of unique hostnames reduces the number of DNS lookups.”
    .
    Another excellent point. True, spreading your content across too many domains will slow you down with DNS lookups.
    .
    However, Yahoo themselves actually suggest spreading the content across several domains:
    http://developer.yahoo.com/performance/rules.html#split
    .
    A browser will only fetch a few objects (about 3) simultaneously from any single domain.
    So, if you spread your content across 4 domains, the browser will be able to fetch up to 12 objects simultaneously. In many cases, this gives a SIGNIFICANT performance boost.
    .
    The key is balance: do not use more domains than necessary, since you’ll be slowed down by DNS lookup times; but don’t use too few either, or you’re not making it as fast as it could be.
    .
    We have several customers with up to 50 objects on their front page. Crazy, I know.
    However, using various tricks and techniques, we were able to make those websites load in as little as 4 seconds.
    .
    Those websites get about as much traffic as the Republika / Jawapos websites. All of this was accomplished with just a single dual-core server.
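The sub-domain trick discussed above can be sketched like this (domain names are illustrative): hash the asset path to pick a shard, so each asset always lives on the same sub-domain and stays cacheable, while assets overall spread evenly across the domains:

```python
# Sketch of domain sharding for static assets. Domain names invented.
import zlib

SHARDS = ["img0.mysite.com", "img1.mysite.com",
          "img2.mysite.com", "img3.mysite.com"]

def asset_url(path):
    """Return a stable sharded URL for a static asset path."""
    shard = SHARDS[zlib.crc32(path.encode("utf-8")) % len(SHARDS)]
    return "http://%s%s" % (shard, path)
```

The stable mapping matters: if an asset bounced between sub-domains from page to page, the browser would re-download it and the sharding would hurt instead of help.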

  11. The content on my site is mostly static. However, I’m being told that I need to convert to dynamic to gain more flexibility. Is this true?

  12. content on my site is mostly static. However, I’m being told that I need to convert to dynamic to gain more flexibility. Is this true?

  13. Hey james, it looks like you’re spamming this site. Normally I wouldn’t care, but this time you’ve copied my comments word-for-word. That doesn’t seem too smart.

