I’ve always believed that the best solution is the simplest one. They tend to be more fool-proof, reliable, and easier to maintain in the long run.
That said, I had a hard time believing my own eyes when I found what might be the holy grail of High Availability (HA) and High Performance Computing (HPC) : easy & transparent (no apps recode) sharding, with HSCALE.
A bit about sharding : also known as “shared nothing architecture”, this will enable almost infinite scalability to any system. Google, Facebook, Flickr, MySpace – you name any high-traffic website, and there’s very good chance that they already implemented it in one way or another. Whenever they need more performance, they only need to throw another box into the system, and that’s it. Instant performance gain.
However, there’s a significant drawback – to most, sharding means a total rewrite of your application.
I’ve been providing consultation services to several high-traffic websites. They’re doing pretty well now, but some are about to go to the next level. Where their current standard 4 tier architecture; edge server – webserver – apps server – database, no longer suffices, and will need to scale up (bigger server), or scale out (more servers).
Most, however, could not rewrite their apps for sharding for various reasons; number of resources needed, development time, potential reliability issues, and so on.
Let’s back up a bit (this can be very helpful; sometimes we’re to busy charging forward, as fast as we can – thus missing all the hints we encountered along the way).
If we examine this infrastructure, almost all aspects of it are already easily scalable. Edge servers: check, you can throw another box and instant performance gain. Webserver & Apps server: checked as well, especially with PHP where it already implements the shared-nothing architecture. We can even load-balance them using software and have it outperform and out-feature the hardware load-balancers [1].
So that leaves the database as the bottleneck.
For the database, my requirements are simple (cough) :
- Scale out, not up : There’s a limit to how big a server can be. Several high-traffic websites (YouTube, MySpace) had chosen this path and they’ve experienced the pain when they hit the dead-end – no more server bigger than what they already have. So it definitely will have to be a scale-out (more boxes) solution.
- Transparent : No application recode.
- Easy to maintain : I want a system that doesn’t need constant babysitting. I (we all) have better things to do with my (our) time.
- Works for MySQL : it’s the most used database for web-apps, currently.
- Affordable : at least, provides several pricing schemes. Not just one. We all have different needs and budgets.
Yeah, I’m picky. And that caused me serious headache – for weeks I was not able to find anything.
Several that I looked into :
- MySQL cluster : NDB = memory/RAM-only tables? Full backup/restore everytime we add new server? No thanks.
- MySQL replication : I looked into single/multiple master – multiple slaves setup. Doesn’t like it (1) I have systems which will need scalability for write ops as well (2) for read ops, caching (especially memcache) can provide better performance gain with less headache (3) It’s Asynchronous – meaning: the user edited his profile, clicked save; and lo, the old profile is still there (because the update hasn’t reached the slaves yet) (4) multiple master scheme seems only safe for up to 2 masters; I don’t fully understand the risks with more than 2 masters (not enough case studies).
- MySQL partition : Good idea (although doesn’t do vertical / column-level partitioning yet, but that’s ok [2]). There are limits, but it’s not a showstopper for most. However, it doesn’t scale-out.
- Various MySQL load-balancers, replication, HA solutions : Pick any / several of these : unproven, costs an arm and a leg, proprietary, needlessly complex, need loads of maintenance work, etc.
So it was with great disbelieve when I read the description of HSCALE by Michael Shadle : “Sharding pushed from application level, to transparent DB proxy”.
Here’s the beauty of its simplicity : It’s just a plugin for MySQL-Proxy π
MySQL-Proxy is a high-performance proxying daemon for (guess what) MySQL, created by Jan Knesche – also well-known for Lighttpd: one of the fastest webserver on Earth. Then Peter Romianowski decided that he’ll build his sharding-proxy upon this great foundation.
So we have HSCALE.
Amazing, this solution to high-performance database (HSCALE) is similar to high-performance Apps server (HAproxy) : through proxying π
Configuration is dead simple. Ability to spread queries to multiple backend servers (with some minor limitations, but it’s being actively worked on) – transparent to the application. Version 0.3 will enable us to run multiple instances of HSCALE (to avoid it becoming the bottleneck). And it’s free (do remember to contribute back if your website becomes successful because of it). With MySQL-Proxy’s scripting capability, it provides an easy & clean way to extend it to fulfill your own requirements. Etc.
Currently not many people knows about HSCALE’s existence, so I’m writing this article to let more people know about it.
At the moment HSCALE’s is already making great progress. I’m sure it’ll become better at much faster rate if more people know that it existed. So here it is.
And kudos to Peter Romianowski & Jan Knesche for their software. Here’s cheering for great future of both MySQL-Proxy & HSCALE.
A small FAQ for now, no doubt the list will be bigger as I (and you) found out more about this :
- Q: So, come again, how does HSCALE works?
A: First you define the partitioning scheme on HSCALE’s config file. Then it’s all automatic from there, HSCALE will partition / regroup the rows – not your application. - Q: What’s the difference then with MySQL’s partition feature ?
A: You can spread HSCALE’s partitions to multiple databases (meaning: multiple backend servers), while MySQL’s only works on a single database.
OK, my head still hurts from doing marathon reading of so much documentations on this topic (transparent sharding) ! Chances are great that there are errors in this article. If you do find them, please leave a comment so I can fix it. Thanks !
In the meantime, shall we play with HSCALE in the Sandbox? Have fun !
[1] HAproxy seems indeed is the leader of the pack (apps/web server high availability + load balancers) at the moment. It can load-balance 20 Gbps with the CPU 85% idle, have loads of great features, and secure. Even 37signals.com swears by it (and so does several Forbes 500 companies).
[2] MySpace found out that vertical-partitioning does not scale.
very interesting indeed, thanks for sharing
Wonderful!
It kinda reminds me of Oracle RAC. Without painful configuration part and deadly 25k$/processor license part.
Wonderful. Wonderful. Wonderful!
I have two questions however:
1. Concurrency. Is it good? How fast can it be? Have you tested it yet?
2. Transaction. How does HASCALE handle multiple “INSERT” on multiple tables in single transaction. If it spread each queries unto separate ‘partition’ –which is it is desired effects–, will it leaves database in unstable state?
3. My love/hate relationship with MySQL: How about full-text-search InnoDB? Does HSCALE handle it well?
nice to read it, very descriptive and informative article …. thanks guys
@Andry – I haven’t got the chance yet, but we will definitely be testing this extensively. For the moment, you can look at some benchmarks by the author.
.
Re: transaction : I believe it spreads the INSERT ops to multiple tables / database, according to how we defined it in its config file. The config file looks similar to how MySQL partition its tables (much simplified though).
.
Re: fulltext – no idea yet π I’d definitely be interested in this as well.
.
For more detailed information, I highly suggest you to enjoy Peter’s presentation on HSCALE. Excellent stuff.
Note: it’s said in the presentation that multiple database capability is slated for version 2.0 – fact is, it’s already in version 0.2 (guess he forgot to update his slides). Amazing.
.
Anyway, HSCALE is indeed in its early stages (although it IS progressing very rapidly). What I like very much from it is the concept. He nailed it right the first time – transparent performance gain through proxying.
I can’t believe that other experts missed it completely, and instead beating around the bushes with various inferior solutions (example: MySQL cluster is truly a joke – I didn’t even bother myself to test it once I read about NDB being memory-only database)
.
It’s going to be fun π
@Andry – will it leaves database in unstable state?
.
Sorry, missed this one – since each “partition” in HSCALE is a complete and a proper MySQL table, it will not cause any instability / data corruption.
Okay sir.
Thank you for answers and pointer. Going into deep hack mode unto this.
after finding this blog, i got many things to know…. especially from this post, well written and very informative post ! thanks
Wow, I haven’t studied about that yet..
but i think it’s a bit like clustering right?
thanks for the information.
It’s quite similar, yes. A loose database cluster.
i really enjoyed reading it, I had problems with my site and found this post very helpful
i really enjoyed reading it, I had problems with my site and found this post very helpful. thanks guys
i really enjoyed reading it, I had problems with my site and found this post very helpful.
thanks
And I say get a mac for scalable powerful AND simple computing. And what about new frameworks like CakePHP? Is HSCALE compatible with this?
HSCALE is quite transparent, so it should work with pretty much anything.
Vert informative post, I came to know alot. Thanks for sharing
Thanks for such a nice post. Its helpful for me during my problem.
Great Post! thanks
very interesting, thanks ..
This is some very nice coding work! Being scalable is a huge key to keeping all users happy.
hey,, a refreshing post. nice to share, thanks
This is some really good thinking.
Very Intresting Coading Thanks for sharing!
Great.
yes its a loose database cluster ….
Very Informative post. It would surly help out lot of web masters in creating masterpiece web application. I am going to digg it in my favs.
Great post, Thanks
Thanks guys, nice to read the articles …..
Thanks for sharing, very usefull points…
Nice work, thanks for sharding..
Nice Article Thanks!
Great Post
Great blog helped me loads when I built my site
Jill xx
thanks for the info very easy to follow and very imformative
jen x
would surly help out lot of web masters in creating masterpiece web application. I am going to digg it in my favs.
Great post!
Really interesting. I have read a lot about this on other articles written by other people, but I must admit that you is the best.
Interesting piece of information need to tell more.
Good post.. i will now keep managing the application like you described here. Thanks
I like your thoughts dear.
Where do you find this stuff? It really shakes the mind
Great Post! Thanks. I really enjoyed reading it. i will now keep managing the application like you described here
———————————–
Artikel Kesehatan | Download PDF JawaPos | Practical Lessons in Yoga | Review | BitDefender | Cheap Softwares | CNN | Natural and Technology
Thanks for sharing your views on the topic. It makes one think and look the other side of the story.
Great article thans alot.
Nice to read the article, Thanks
i was looking around and got your post, nice to read it!
This is some very nice coding work! Being scalable is a huge key to keeping all users happy.
great blog thanks for sharing,
Great blog with valuable information.. thanks for sharing with us..
I have view your article and this is interesting and very useful. I need any more articles. Thanks for knowledge.
Very Intresting Coading Thanks for sharing!
Great blog with valuable information
Very informative… I enjoyed and learned.
Nice Blog!
Very Informative Blog, I just book mark this blog.
Quite interesting. Thank you to explain this.
This was very helpful post, Thank you guys !!!
Thanks for sharing this plug in. I’ll see what I can do with my Doing Business Dubai and Company Incorporation Dubai site.
Very helpful plug in ….
I guess with so many other plug ins that I see, this one is better.
Dubai Property | Dubai Rental | Sell Dubai Property
This is great plug in, i highly appreciate sharing quality stuff, best of luck.
Hy, very interesting & informative blog, gud job done. thanks
I like this blog very much keep it up this good work
I don’t think that i’ll need this information. However, its very interesting. I enjoy it. Thanks.
Very Nice Article!
I like It!
Just wanted to say thanks π !!!
This was an informative post, thank you sharing all this stuff with us !!
interesting & informative
Hey Harry, keep up the blogging, you really have a great talent for it!:D
I guess with so many other plug ins that I see, this one is better.
yes very intresting blog
just wondering if you could allow us to publish this post on one my blog, can i ?
Thanks banget om Harry, sudah nolongin dan langsung turun tangan π Seminggu ke depan ini kita coba dulu lihat kondisi. Aku server cukup stabil dan kondisi CPU dan Memory terjaga. Paling tidak untuk sementara sudah lumayan, sampai kita pikirkan lagi strategi lain. Sekali lagi, thanks.
do you have any posts on PHP stuff, Please let me know if you have any ..
You should try living in India for a while, they have got the entire system down to art form so you end up begging the authorities to take a bribe just to do things like remove the rubbish from outside your apartment! Slow websites I can handle, authorities that make their money by taking bribes for doing what they should be doing anyway can test my patience to the limits.
It is a very good issue, that of simplifying things.
It is almost always, the simpler, the better.
I used to be more than happy to seek out this internet-site.I wanted to thanks in your time for this glorious read!! I positively enjoying each little bit of it and I have you bookmarked to check out new stuff you weblog post.
LX0-101 // 312-49 // NS0-502 // BH0-006 // 70-448 // 1z0-050 // 642-145 //
IΓ’β¬β’ve been providing consultation services to several high-traffic websites. TheyΓ’β¬β’re doing pretty well now, but some are about to go to the next level. Where their current standard 4 tier architecture;
The post is absolutely great! Lots of great info and inspiration, both of which we all need! Also like to admire the time and effort you put into your website and detailed info you offer! I will bookmark your site!
Many fashion ladies always pursue fashion Cheap Louboutins, they often pay attention to famous brand. Especially purchasing their Discount Christian Louboutin, certainly, they try to choose a pair of distinctive Cheap Louboutin Shoes. Beautifully show daily work profile. It is no doubt that Christian Louboutin Ankle Boots are a fun choice in true fashion for women. This Christian Louboutin Boots will be perfect for the weekdays and the weekends.
Quite informative and easy to understand technical explanations.
Great sharing, very good and fantastic article.Awesome and incredible post.The effort is like appreciation.
Great post! You have a very informative article here. Thank you for sharing this tips
It’s so tough to encounter right information on the blog. I really loved reading this post..
Great share. Some brilliant piece of information here on your blog.
i really enjoyed reading it, I had problems with my site and found this post very helpful. thanks guys
Awesoem some commanats good and i bookmark this page today new dat Thanks!!
I really appreciate this wonderful post that you have provided for us. I assure this would be beneficial for most of the people. Thanks for sharing the information keep updating, looking forward to more post.