{"id":3304,"date":"2013-05-13T15:34:39","date_gmt":"2013-05-13T05:34:39","guid":{"rendered":"http:\/\/www.flamingspork.com\/blog\/?p=3304"},"modified":"2013-05-13T15:40:31","modified_gmt":"2013-05-13T05:40:31","slug":"the-mysql-cluster-storage-engine","status":"publish","type":"post","link":"https:\/\/www.flamingspork.com\/blog\/2013\/05\/13\/the-mysql-cluster-storage-engine\/","title":{"rendered":"The MySQL Cluster storage engine"},"content":{"rendered":"<p>This is one close to my heart. I&#8217;ve recently written on other storage engines:\u00a0<a title=\"Permalink to Where are they now: MySQL Storage Engines\" href=\"http:\/\/www.flamingspork.com\/blog\/2013\/04\/18\/where-are-they-now-mysql-storage-engines\/\" rel=\"bookmark\">Where are they now: MySQL Storage Engines<\/a>,\u00a0<a title=\"Permalink to The MERGE storage engine: not dead, just resting\u2026. or forgotten.\" href=\"http:\/\/www.flamingspork.com\/blog\/2013\/04\/19\/the-merge-storage-engine-not-dead-just-resting-or-forgotten\/\" rel=\"bookmark\">The MERGE storage engine: not dead, just resting\u2026. or forgotten<\/a>\u00a0and <a href=\"http:\/\/www.flamingspork.com\/blog\/2013\/04\/20\/the-memory-storage-engine\/\">The MEMORY storage engine<\/a>. Today, it&#8217;s the turn of MySQL Cluster.<\/p>\n<p>Like InnoDB, MySQL Cluster started outside of MySQL. Those of you paying attention at home may notice a correlation between a storage engine not being written exclusively for MySQL and it being at all successful.<\/p>\n<p>NDB (for Network DataBase) started inside Ericsson, originally written in a language called <a href=\"https:\/\/en.wikipedia.org\/wiki\/PLEX_(programming_language)\">PLEX<\/a>, which was internal to Ericsson and used in the AXE telephone switches. Mikael Ronstrom&#8217;s PhD thesis covered NDB and even covered things that were (at least back then) yet to be implemented (it&#8217;s been quite a few years since I leafed through it last). 
The project at Ericsson (IIRC) was shelved a couple of times, but eventually got spun out into an Ericsson Business Innovation company called <a href=\"http:\/\/www.ericsson.com\/news\/859450\">Alzato<\/a>.<\/p>\n<p>Some remnants of PLEX can still be found in the NDB source code (if you look really hard, that is). At some point the code was fed through a PLEX to C++ converter and development continued from there. Some of the really, really old parts of the source may seem weird either due to this or to some hand optimization for SPARC processors in the 1990s.<\/p>\n<p>In 2003, MySQL AB acquired Alzato and work on a storage engine plugin for MySQL to interface to the (C++ API only) NDB was underway. Seeing as the storage engine interface was so simple, easy and modular it would only take several years for the interface to NDB to become mature.<\/p>\n<p>The biggest problem: NDB itself worked really well if your workload fit exactly what it was good at&#8230; if you deviated, horrific performance and\/or crashes were not as uncommon as we&#8217;d have liked. This was a source of strain for many years between the developers and support team on one side and some of the less-than-careful sales team on the other. 
That being said, there have been some absolutely awesome sales people selling NDB into markets it truly fits, and this is why there&#8217;s barely a place in the world where placing a mobile phone call doesn&#8217;t go through MySQL Cluster at some point.<\/p>\n<p>You should read Tomas Ulin&#8217;s post\u00a0<a title=\"Permalink to Celebrating 10 years @MySQL\" href=\"http:\/\/insidemysql.com\/celebrating-10-years-mysql\/\" rel=\"bookmark\">Celebrating 10 years @MySQL<\/a>\u00a0for a bit of an insight into how Alzato became part of MySQL AB (which later became part of Sun which became part of Oracle).<\/p>\n<p>I joined the MySQL Cluster team at MySQL in December 2004, not too long after Alzato was acquired, but certainly when the NDB storage engine in MySQL 4.1 was in its very early stages &#8211; it was then by no means a general purpose database.<\/p>\n<p>Over the years, MySQL Cluster gained both traction and features, making it useful for more applications. One of the biggest marketing successes of MySQL was the storage engine architecture and how you could just &#8220;plug in&#8221; different engines. The reality (of course) was far different and even though MySQL Cluster did just &#8220;plug in&#8221; to MySQL, it was certainly not a drop-in replacement.<\/p>\n<p>In MySQL 5.0, a bunch of neat new features were added:<\/p>\n<ul>\n<li><span style=\"line-height: 15px;\">Engine condition pushdown<br \/>\nThis enabled conditions on non-indexed columns to be evaluated on the data nodes rather than having every row pulled up to the SQL node to be evaluated.<\/span><\/li>\n<li>Batched read interface<br \/>\nSo that queries like SELECT FOO FROM BAR WHERE A IN (1,2,3) were executed as a single network round trip rather than 3 round trips.<\/li>\n<li>Query cache<br \/>\nAlthough the query cache should die, hey, at least it worked with NDB now&#8230;. 
in a way.<\/li>\n<li>Reduced IndexMemory usage<br \/>\nRemember, NDB is an in-memory database, so saving a bunch of bytes for secondary indexes was a big thing.<\/li>\n<\/ul>\n<p>The first release with things I really worked on was MySQL 5.1. My first talk (to a packed room) at the MySQL User Conference in 2006 was on new features in MySQL Cluster 5.1. I&#8217;m still quite proud of that talk even though I know I am a much better speaker than I was then (it would have been great to have had more guidance&#8230; but hey, learning from experience is good too).<\/p>\n<p>We added a lot in 5.1:<\/p>\n<ul>\n<li><span style=\"line-height: 15px;\">Integration with replication<br \/>\nThis is where row based replication was born. It was a real team effort: the NDB kernel part (going from memory and bzr logs) was written by Tomas, and Jonas seems to have a bunch of code there too. I worked a bunch on the NDB Injector thread in mysqld, and Mats worked on the core row based code (at the time the most C++ like code in the entire MySQL world). You could now have a cluster replicate to another cluster with the giant bottleneck that is MySQL replication.<\/span><\/li>\n<li>Disk data<br \/>\nYou could store non-indexed columns on disk. I implemented the INFORMATION_SCHEMA.FILES table for this; I was young and naive enough to think that the InnoDB guys would also fill out this table and all would be happy with the world (I&#8217;m lucky I haven&#8217;t been holding my breath on this one).<\/li>\n<li>Variable sized columns<br \/>\nA VARCHAR(255) would no longer always use more than 255 bytes if you just stored a single character in it. Catch? 
Only for in-memory columns.<\/li>\n<li>User defined partitioning<br \/>\nBecause NDB desperately needed more options, we let the user choose how they wanted to partition up their data (per table).<\/li>\n<li>Autodiscovery of schema changes<br \/>\nThis was a giant workaround to the epic mess that is FRM files and data dictionary things inside the MySQL Server. It is because of all this code that when I went to rewrite the whole thing for Drizzle I took the approach of &#8220;just pass it down to the engines, the server must not attempt to know better&#8221;. FWIW, I&#8217;m still right: if the server tries to be clever you now have two places for bugs to be, not just one.<\/li>\n<li>Distribution awareness<br \/>\ni.e. better selection of which data node to talk to for a particular query, reducing latency.<\/li>\n<li>Online add\/drop index<br \/>\nHow long did it take for other engines to get this? Let&#8217;s not think about that :)<\/li>\n<\/ul>\n<p>After that the really interesting stuff started to happen, that is, the first major fork of MySQL: MySQL Cluster Carrier Grade Edition (CGE). Why? We had customers that simply couldn&#8217;t wait for MySQL 6.0 (after all, they&#8217;d still be waiting).<\/p>\n<p>We had MySQL Cluster CGE 6.1, 6.2, 6.3 and now we&#8217;re into 7.0, 7.1 and 7.2. It is without doubt the longest serving and surviving MySQL fork. There were non-trivial changes inside the MySQL server too, which caused enough of a merge problem for the (small) Cluster team.<\/p>\n<p>One big thing that you&#8217;re probably still all waiting for? Replication conflict detection and resolution in circular\/multi-master replication setups. 
It was an NDB first and has been used in production for a decent amount of time.<\/p>\n<p>I remember a hack while on an airplane led to the CompressedBackup and CompressedLCP options (using zlib when writing out checkpoints\/backups) &#8211; something that took more time than you&#8217;d think to go from prototype to production ready code.<\/p>\n<p>The last few things I worked on in MySQL Cluster before going and working full time on Drizzle were the Windows port, online add\/drop node and NDBINFO.<\/p>\n<p>I&#8217;ve left out so many cool MySQL Cluster things that were worked on over the years (e.g. online add\/drop column, rewriting of LCP code, micro GCPs, crash-safe DDL, the test suite). I really should mention the test suite: in lines of code it was over three times that of MyISAM&#8230; and that was probably six years ago that I worked that out.<\/p>\n<p>One thing to think about: when Innobase Oy was bought by Oracle and there was this effort to have a transactional storage engine that was inside MySQL AB rather than another company, I pointed out that I thought it would take less time adding the needed features to NDB and integrating it inside the MySQL server binary (and with the addition of online add node you could go from standalone DB server to a full cluster with no down time) than it would for any of the alternatives to get to a suitable level of maturity.<\/p>\n<p>I wish I had put money on this&#8230; I put money on the MySQL 5.1 GA release date (which I was happy to lose), but in the years since you can see that InnoDB is still reigning supreme with all that came to replace it having fallen away for one reason or another. It&#8217;s still on track to have MySQL Cluster be the only real alternative (now also, funnily enough, owned by Oracle). 
I have to say, it&#8217;s kind of a hollow victory though, it would have been nice to see Falcon and PBXT be serious players in today&#8217;s market.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is one close to my heart. I&#8217;ve recently written on other storage engines:\u00a0Where are they now: MySQL Storage Engines,\u00a0The MERGE storage engine: not dead, just resting\u2026. or forgotten\u00a0and The MEMORY storage engine. Today, it&#8217;s the turn of MySQL Cluster. &hellip; <a href=\"https:\/\/www.flamingspork.com\/blog\/2013\/05\/13\/the-mysql-cluster-storage-engine\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[76,1,14],"tags":[511,628,218,54,518],"class_list":["post-3304","post","type-post","status-publish","format-standard","hentry","category-code","category-general","category-mysql","tag-falcon","tag-mysql","tag-mysql-cluster","tag-ndb","tag-pbxt"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5a6n8-Ri","jetpack-related-posts":[{"id":904,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/10\/16\/mysql-5122ish-stew1\/","url_meta":{"origin":3304,"position":0},"title":"MySQL 5.1.22(ish)-stew1","author":"Stewart 
Smith","date":"2007-10-16","format":false,"excerpt":"I've decided to publish my patch series. The goal of the -stew patches is to collect things I find interesting and that at some point could (should) make it into the main MySQL tree (even if others don't think so). It's not designed for use in production.. I don't really\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":710,"url":"https:\/\/www.flamingspork.com\/blog\/2006\/06\/06\/call-for-comments-on-mysql-online-backup-api-jay-pipes\/","url_meta":{"origin":3304,"position":1},"title":"Call for Comments on MySQL Online Backup API &#8211; Jay Pipes","author":"Stewart Smith","date":"2006-06-06","format":false,"excerpt":"Call for Comments on MySQL Online Backup API - Jay Pipes It's been interesting watching the ideas develop for online, consistent Backup for MySQL. I should expand that... consistent across storage engines. Other RDBMS vendors get it easy - they just have one storage engine to back up. We have\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1650,"url":"https:\/\/www.flamingspork.com\/blog\/2009\/05\/27\/pluggable-metadata-stores-or-the-revenge-of-table-discovery\/","url_meta":{"origin":3304,"position":2},"title":"Pluggable Metadata stores (or&#8230; the revenge of table discovery)","author":"Stewart Smith","date":"2009-05-27","format":false,"excerpt":"Users of the ARCHIVE or NDB storage engines in MySQL may be aware of a MySQL feature known as \"table discovery\". For ARCHIVE, you can copy the archive data file around between servers and it magically works (you don't need to copy the FRM). 
For MySQL Cluster (NDB) it works\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":776,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/01\/08\/ndb-ndb-the-storage-engine-for-me\/","url_meta":{"origin":3304,"position":3},"title":"NDB! NDB! The storage engine for me!","author":"Stewart Smith","date":"2007-01-08","format":false,"excerpt":"Today I set up a mysqld connected to my not-quite-HA cluster at home here to replicate from my MythTV database into cluster. The idea behind this is to eat an increasing amount of my own dogfood around the house. To do this, I also set up the MySQL Instance Manager\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":914,"url":"https:\/\/www.flamingspork.com\/blog\/2007\/11\/09\/mysql-5122-stew2\/","url_meta":{"origin":3304,"position":4},"title":"mysql-5.1.22-stew2","author":"Stewart Smith","date":"2007-11-09","format":false,"excerpt":"New: Updated NDB Compressed LCP and BACKUP patches (now with O_DIRECT support) InnoDB patch for Windows that should give ~5x improvement on commits\/sec (Bug31876) Everything in current telco-6.3 tree (ndb ~6.3.5) Lots of NDB improvements and new features over regular 5.1. 
WL3686 Remove read before update WL2680 NDB Batched Update\u2026","rel":"","context":"In &quot;mysql&quot;","block_context":{"text":"mysql","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/mysql\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1711,"url":"https:\/\/www.flamingspork.com\/blog\/2010\/04\/08\/alsosql\/","url_meta":{"origin":3304,"position":5},"title":"AlsoSQL","author":"Stewart Smith","date":"2010-04-08","format":false,"excerpt":"So there's a bit of a swelling around the idea of NoSQL. That is, databases that don't have an SQL interface in front of them - with the promise of better performance. With a well designed backend, this is no doubt the case. A flexible query language is rather useful\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3304","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/comments?post=3304"}],"version-history":[{"count":3,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3304\/revisions"}],"predecessor-version":[{"id":3328,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/3304\/revisions\/3328"}],"wp:attachment":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/media?parent=3304"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.flamingspork.com\/b
log\/wp-json\/wp\/v2\/categories?post=3304"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/tags?post=3304"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}