As some of you may have already guessed – a week went by, and the query still hasn't finished 🙂
I was asked to help, so I tried almost every trick that I knew, including (but not limited to) :
# vertical partitioning : split the columns into a table each
# horizontal partitioning : splitting the table into 10 smaller tables
# putting the tables on fast SSDs
# putting the tables on RAM disk : on several occasions, I actually managed to saturate the memory bandwidth……
# enlarging the JOIN buffers : up to tens of gigabytes.
# etc, etc.
Still no joy. The query just won't finish. And I've been sleeping on my work desk for almost a week.
One day before the deadline (just like in the movies, huh?), another of our data analyst suggested that I hashes the fields being compared, and then compare the hashes instead.
To my surprise, it works GREAT. It's like a miracle.
The query which wasn't finished in a week, now finished in an hour.Â
We got the job done, at the very least, 16800% faster.
Mind = Blown 🙂
So, here's how it's done, hopefully you'll find it useful too later.
Enjoy !
======
ORIGINAL QUERY
SELECT a.f1, a.f2, b.f1, b.f2 Â
FROM t1 AS a, t2 AS bÂ
WHERE a.f1=b.f1 AND a.f2=b.f2 AND a.f3=b.f3 AND a.f4=b.f4 AND a.f5=b.f5;
======
CREATE THE HASHES FROM EACH TABLE
CREATE TABLE hash1 (KEY(uid), KEY(hash)) ENGINE=MyISAM ASÂ
(
SELECT uid, MD5(CONCAT(f1,f2,f3,f4,f5)) AS hash Â
FROM t1Â
)
CREATE TABLE hash2 (KEY(uid), KEY(hash)) ENGINE=MyISAM ASÂ
(
SELECT uid, MD5(CONCAT(f1,f2,f3,f4,f5)) AS hash Â
FROM t2Â
)
======
COMPARE THE HASHESÂ
This is where the magic happens :)Â
we compare the hashes, and then we store the uid related to those hashes.
CREATE TABLE hash_result (KEY(uid1), KEY(uid2)) ENGINE=MyISAM ASÂ
(
SELECT a.uid AS uid1, b.uid AS uid2Â
FROM hash1 AS a, hash2 AS bÂ
WHERE a.hash=b.hashÂ
)
======
CREATE THE END RESULT Â
Collect the records with the corresponding uid :
CREATE TABLE end_result ENGINE=MyISAM ASÂ
(
SELECT a.f1, a.f2, b.f1, b.f2Â
FROM t1 AS a, t2 AS b, hash_result AS cÂ
WHERE a.uid=c.uid1 AND b.uid=c.uid2Â
)
fastest car in the world | World of Cars
world top leading car companies information and photos. lamborghini,bmw,audi,mercedes,maserati,pagani,pgo,toyota,honda. fastest car in the world. fastest car fast car. fastest car wallpaper. bugatti veyron. fast car. bugatti veyron. fast car. bugatti veyron. fast car. bugatti zonda. fast car …
This post has been reshared 1 times on Google+
View this post on Google+
Post imported by Google+Blog for WordPress.
Credit & kudos to Mr +Yanmarshus Bachtiar , who suggested the hash method. You sir is a genius. Thank you for enlightening us 🙂
Great technique. I'm aware that the hashes were done against multiple columns, but am I correct in assuming that you're aware and accepting of the risk of hash collisions here? Especially when considering that you have hundreds of millions of records?
Great technique and idea! Thanks for sharing sir… 🙂
+Arga Nugraha – Yup, jadi ingat di posting ini di Facebook, ada yang komentar, menyebutkan menggunakan CRC32 & secara tidak langsung merekomendasikan itu 😀 padahal CRC32 itu 32 bit = cuma ada 4 milyar kombinasi.Â
Kalau yang di hash ada ratusan juta… bakalan panen hash collisions ini….. 😀
MD5 itu outputnya 128 bit, jadi ada 2^128 kombinasi = 3,4 x 10^38 ….. ratusan juta itu adalah 1 x 10^8, yah……  sejauh ini, terlalu kecil  kemungkinan hash collisionnya 🙂
Parabéns pela matéria. Valeu!
http://www.jociandre.com.br
sangat bermanfaat bangat tuch gan
thanks gan atas infonya, kebetulan saya lagi cari info kaya gini.
nice post
sangat bermanfaat banget gan makasih gann infonya
Terimaksih telah berbagi infonya, info yang sangat bagus . sukses selalu untuk info and websitenya and be the best to the next info..
If you want to aware your self with the new changes then you will really enjoy this post. Smart Phone
thanks gan atas infonya, kebetulan saya lagi cari info kaya gini.
makasi atas inpo yang sangat berguna banget bagi kami
turned out to be very instrumental mysql….thx
terimakasih atsa info nya gan , selamat siang semuahank u very much much much atas infonya gan 😉 http://goo.gl/u8z8HE | http://goo.gl/O2egrp | http://goo.gl/XmlGjZ
gambarnya kecil gan gak kliatan
bahasanya perlu di translate gan biar pada paham semua
Infonya sangat bermanfaat gan..terima kasih
Nice post! The information you provided is very helpful if someone is planning to Digitalized our Business. I think Digitalopment is a better way to get all the information about Digital Marketing.
Before choosing any SEO company you need to know their past record although you can check it by looking at their portfolio which is great to know besides this you can also look for their experience level as by this you will get the brief idea about their services.
I am inspired by your post writing style & how continuously you describe this topic. Desert Safari in Sharjah
Nice post! The information you provided is very helpful if someone is planning to enjoy the holidays. I think ALNoorIsland is a better way to spend the joyful vacations.
Thank you for your sharing. I am worried that I lack creative ideas. It is your article that makes me full of hope. Thank you. But, I have a question, can you help me? https://www.binance.com/join?ref=T7KCZASX