{"id":1745,"date":"2009-12-09T20:50:10","date_gmt":"2009-12-09T10:50:10","guid":{"rendered":"http:\/\/www.flamingspork.com\/blog\/?p=1745"},"modified":"2009-12-10T11:26:59","modified_gmt":"2009-12-10T01:26:59","slug":"drizzle-frm-replacement-the-table-proto","status":"publish","type":"post","link":"https:\/\/www.flamingspork.com\/blog\/2009\/12\/09\/drizzle-frm-replacement-the-table-proto\/","title":{"rendered":"Drizzle FRM replacement: the table proto"},"content":{"rendered":"<p>Drizzle originally inherited the FRM file from MySQL (which inherited it from UNIREG). The FRM file stores metadata about a table; what columns it has, what type those columns are, what indexes, any default values, comments etc are all stored in the FRM. In the days of MyISAM, this worked relatively well. The row data was stored in table.MYD, indexes on top of it in table.MYI and information about the format of the row was<br \/>\nin table.FRM. Since MyISAM itself wasn&#8217;t crash safe, it didn&#8217;t really matter if creating\/deleting the FRM file along with the table was either.<\/p>\n<p>As more sophisticated engines were introduced (e.g. InnoDB) that had their own data dictionary, there started to be more of a problem. There were now two places storing information about a table: the FRM file and the data dictionary specific to the engine. Even if the data dictionary of the storage engine was crash safe, the FRM file was not plugged into that, so you could end up in a situation where the storage engine<br \/>\nrecovered from a crash okay, but the FRM was incorrect for what the engine recovered to. This would always require manual intervention to find out what went wrong and then fix it (in some rather unusual ways).<\/p>\n<p>When the MySQL Cluster (NDB) engine was introduced, a new set of problems arose. Now the MySQL server was connecting to an existing database, where tables could be created on other nodes connected to the cluster. You now not only had the problems of crash recovery, but the problems of keeping the FRM files in sync across many nodes, requiring<br \/>\nall sorts of interesting solutions that, for the most part, do work.<\/p>\n<p>The &#8220;obvious&#8221; solution to some of these problems would be for an engine to write out an FRM file itself. This is much easier said than done. The file format was never created to be read and written by multiple pieces of software, the code that did the reading and writing inside the server was not reusable elsewhere and the only documentation (that<br \/>\nwasn&#8217;t a decent chunk of the MySQL source tree) is the rather incomplete definition in the MySQL Internals wiki (http:\/\/forge.mysql.com\/wiki\/MySQL_Internals_File_Formats) &#8211; not nearly enough to write a correct FRM file as the specifics are very, very odd.<\/p>\n<p>Our goals for reworking the metadata system in Drizzle were: to allow engines to own their own metadata (removing any opportunity to have inconsistencies between the engine and the &#8216;FRM&#8217;) and for engines without their own data dictionary, to replace the FRM file format with something simple and well documented.<\/p>\n<p>One option was to use SQL as the standard storage format, but it is rather non-trivial and expensive to parse &#8211; especially if we were to use it as the preferred way of talking table definitions with storage engines. We had been looking at the protobuf library<br \/>\n(http:\/\/code.google.com\/p\/protobuf\/) ever since its first release and it has a number of very nice characteristics: a description language of a data structure that is then used to generate APIs for reading and writing it in a number of programming languages and a standard (documented) way to serialize the data structure.<\/p>\n<p>After a bit of discussion, we arrived at a good outline for the table definition proto. The current one can always be found in the Drizzle source tree at  drizzled\/message\/table.proto. The current format is very close to final (i.e. one that we&#8217;ll suppport upgrades from).<\/p>\n<p>The process of modifying the Drizzle code base so that it would write (and read) a file format different to the FRM isn&#8217;t worth going too much into here although there were some interesting hurdles to overcome. An interesting one was the FRM file contains a binary image of the default row for the table (which is in the row format that the server uses); we now store the default value for each column in the proto and generate the default row when we read the proto. Another interesting one was removing and refactoring &#8220;pack_flag&#8221; &#8211; the details of which should only be extracted from Jay or Stewart with a liberal application of fine ale.<\/p>\n<p>The end result is that we now have storage engines that are completely responsible for their own metadata. One example is the ARCHIVE engine. In the CREATE TABLE code path, the ARCHIVE storage engine gets the table definition in an object that represents the table proto. It can examine the parameters it needs to and then either store the proto directly, or convert it into its own format. Since ARCHIVE is simple, it just stores<br \/>\nthe table proto in a serialised form (using a standard function provided by the protobuf library) and stores it in the .ARZ data file for the table. This instantly makes the ARCHIVE storage engine crash safe for CREATE and DROP table as there is only 1 file on disk, so no two files to get out of sync.<\/p>\n<p>If an engine does not have its own data dictionary, it can still use the default implementation which just stores the serialised table proto in a file on disk.<\/p>\n<p>We can also now use this interface to move INFORMATION_SCHEMA into its own storage engine. This means we can remove a lot of special case code throughout the server for INFORMATION_SCHEMA and instead just have a INFORMATION_SCHEMA storage engine that says it has the following tables in the INFORMATION_SCHEMA database. Because the table definition is now in a documented format with a standard API, this becomes a relatively trivial exercise.<\/p>\n<p>What we&#8217;re all looking forward to is when the InnoDB data dictionary is linked into the new interface and we can have a truly crash safe database server.<\/p>\n<p>Another wonderful side effect is since we now have a standard data structure for representing a table definition, we can integrate this with the replication system. In the &#8220;near&#8221; future, we can represent a CREATE TABLE in the replication stream as a table proto and not the raw SQL. If you were wanting to apply the replication stream to a different database server, you then only have to write a table proto to SQL<br \/>\nconverter. If the target database system doesn&#8217;t do SQL at all, you could generate API calls to create the table.<\/p>\n<p>So we now have a rather flexible system in place, with the code implementing it being increasingly simple and possible to be &#8220;obviously correct&#8221;.<\/p>\n<p>Things that easily fall out of this work that people have written about:<br \/>\n&#8211; CREATE TABLE LIKE with ENGINE clause<br \/>\n<a href=\"http:\/\/krow.livejournal.com\/671235.html\"> http:\/\/krow.livejournal.com\/671235.html<\/a><br \/>\n&#8211; table_raw_reader &#8211; looking at the raw representation of table metadata<br \/>\n<a href=\"http:\/\/www.flamingspork.com\/blog\/2009\/10\/01\/table_raw_reader-reading-the-table-proto-from-disk-and-examining-everything\/\"> http:\/\/www.flamingspork.com\/blog\/2009\/10\/01\/table_raw_reader-reading-the-table-proto-from-disk-and-examining-everything\/<\/a><br \/>\n&#8211; Table discovery<br \/>\n<a href=\"http:\/\/www.flamingspork.com\/blog\/2009\/07\/29\/table-discovery-for-drizzle-take-2-now-merged\/\"> http:\/\/www.flamingspork.com\/blog\/2009\/07\/29\/table-discovery-for-drizzle-take-2-now-merged\/<\/a><\/p>\n<p>Some more info:<br \/>\n<a href=\"http:\/\/krow.livejournal.com\/642329.html\"> http:\/\/krow.livejournal.com\/642329.html<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Drizzle originally inherited the FRM file from MySQL (which inherited it from UNIREG). The FRM file stores metadata about a table; what columns it has, what type those columns are, what indexes, any default values, comments etc are all stored &hellip; <a href=\"https:\/\/www.flamingspork.com\/blog\/2009\/12\/09\/drizzle-frm-replacement-the-table-proto\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[75,1,7],"tags":[202,157,70,87,203,131,628,54,185,201],"class_list":["post-1745","post","type-post","status-publish","format-standard","hentry","category-drizzle-work-et-al","category-general","category-work-et-al","tag-archive","tag-cluster","tag-drizzle","tag-frm","tag-information_schema","tag-myisam","tag-mysql","tag-ndb","tag-protobuf","tag-unireg"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p5a6n8-s9","jetpack-related-posts":[{"id":1262,"url":"https:\/\/www.flamingspork.com\/blog\/2008\/11\/22\/drop-table-fail-on-the-road-to-removing-the-frm\/","url_meta":{"origin":1745,"position":0},"title":"drop table fail (on the road to removing the FRM)","author":"Stewart Smith","date":"2008-11-22","format":false,"excerpt":"So... in removing the FRM file in Drizzle, I found a bit of a nugget on how drop table works (currently in the MySQL server and now \"did\" in Drizzle). If you DROP TABLE t1; this is what happens open the .frm file read first 10bytes (oh, and if you\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":2323,"url":"https:\/\/www.flamingspork.com\/blog\/2011\/03\/17\/things-ive-done-in-drizzle\/","url_meta":{"origin":1745,"position":1},"title":"Things I&#8217;ve done in Drizzle","author":"Stewart Smith","date":"2011-03-17","format":false,"excerpt":"When writing my Dropping ACID: Eating Data in a Web 2.0 Cloud World talk for LCA2011 I came to the realisation that I had forgotten a lot of the things I had worked on in MySQL and MySQL Cluster. So, as a bit of a retrospective as part of the\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1650,"url":"https:\/\/www.flamingspork.com\/blog\/2009\/05\/27\/pluggable-metadata-stores-or-the-revenge-of-table-discovery\/","url_meta":{"origin":1745,"position":2},"title":"Pluggable Metadata stores (or&#8230; the revenge of table discovery)","author":"Stewart Smith","date":"2009-05-27","format":false,"excerpt":"Users of the ARCHIVE or NDB storage engines in MySQL may be aware of a MySQL feature known as \"table discovery\". For ARCHIVE, you can copy the archive data file around between servers and it magically works (you don't need to copy the FRM). For MySQL Cluster (NDB) it works\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":4180,"url":"https:\/\/www.flamingspork.com\/blog\/2016\/09\/27\/mysql-removes-the-frm-7-years-after-drizzle-did\/","url_meta":{"origin":1745,"position":3},"title":"MySQL removes the FRM (7 years after Drizzle did)","author":"Stewart Smith","date":"2016-09-27","format":false,"excerpt":"The new MySQL 8.0.0 milestone release that was recently announced brings something that has been a looooong time coming: the removal of the FRM file. I was the one who implemented this in Drizzle way back in 2009 (July 28th 2009 according to Brian)- and I may have had a\u2026","rel":"","context":"In &quot;code&quot;","block_context":{"text":"code","link":"https:\/\/www.flamingspork.com\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1251,"url":"https:\/\/www.flamingspork.com\/blog\/2008\/11\/06\/goodbye-frm-or-at-least-the-steps-to-it\/","url_meta":{"origin":1745,"position":4},"title":"Goodbye FRM (or at least the steps to it)","author":"Stewart Smith","date":"2008-11-06","format":false,"excerpt":"Since before MySQL was MySQL, there has been the .FRM file. Of course, what it really wanted to be was \".form\" -\u00c2\u00a0 a file that stored how to display a form on your (green) CRT. Jan blogged earlier in the year on this still being there, even in MySQL 5.1\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":1576,"url":"https:\/\/www.flamingspork.com\/blog\/2009\/02\/25\/no-implicit-defaults\/","url_meta":{"origin":1745,"position":5},"title":"No implicit defaults","author":"Stewart Smith","date":"2009-02-25","format":false,"excerpt":"See also: MySQL Bug 43151 The MySQL Manual proudly states that you don't get implicit default values if strict mode. mysql> set sql_mode='STRICT_ALL_TABLES'; Query OK, 0 rows affected (0.00 sec) mysql> create table t1 (a int, b int, c int, d int, e int, f int, g int, h int,i\u2026","rel":"","context":"In &quot;drizzle&quot;","block_context":{"text":"drizzle","link":"https:\/\/www.flamingspork.com\/blog\/category\/work-et-al\/drizzle-work-et-al\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"jetpack_likes_enabled":true,"_links":{"self":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/1745","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/comments?post=1745"}],"version-history":[{"count":3,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/1745\/revisions"}],"predecessor-version":[{"id":1766,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/posts\/1745\/revisions\/1766"}],"wp:attachment":[{"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/media?parent=1745"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/categories?post=1745"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.flamingspork.com\/blog\/wp-json\/wp\/v2\/tags?post=1745"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}