HTTP JSON AlsoSQL interface to Drizzle

So… I had another one of those “hrrm… this shouldn’t be hard to hack a proof-of-concept” moments. Web apps are increasingly speaking JSON all around the place. Why can’t we speak JSON to/from the database? Why? Seriously, why not?

One reason why MongoDB has found users is that JSON is very familiar to people. It has gained popularity in spite of having pure disregard for the integrity and safety of your data.

So I started with a really simple idea: http server in the database server. Thanks to the simple code to do that with libevent, I got that going fairly quickly. Finding a rather nice C++ library to create and parse JSON was the next challenge. I found JSONcpp, a public domain library with a nice API and proceeded to bring it into the tree (it’s not much code). I then created a simple way to find out the version of the Drizzle server you were speaking to:

$ curl http://localhost:8765/0.1/version
{
   "version" : "2011.04.15.2285"
}

But that wasn’t nearly enough… I also wanted to be able to issue arbitrary queries. Thanks to the supporting code we have in the Drizzle server for EXECUTE() (also used by the replication slave), this was also pretty easy. I created a way to execute the content of a HTTP POST request as if you had done so with EXECUTE() – all nicely wrapped in a transaction.

I created a simple table using the drizzle client, connecting over a normal TCP socket speaking the MySQL protocol and inserted a row in it:

$ ../client/drizzle --port 9306 test
Welcome to the Drizzle client..  Commands end with ; or \g.
Your Drizzle connection id is 4
Connection protocol: mysql
Server version: 2011.04.15.2285 Source distribution (json-interface)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` INT NOT NULL AUTO_INCREMENT,
  `b` VARCHAR(100) COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB COLLATE = utf8_general_ci
1 row in set (0.001209 sec)

drizzle> insert into t1 (b) values ("from mysql protocol");
Query OK, 1 row affected (0.00207 sec)

Now to select rows from it via HTTP and get a JSON object back with the result set:

$ curl http://localhost:8765/0.1/sql --data 'select * from t1;'
{
   "query" : "select * from t1;",
   "result_set" : [
      [ "1", "from mysql protocol" ],
      [ "", "" ]
   ],
   "sqlstate" : "00000"
}

I can also insert more rows using the HTTP interface and then select them from the MySQL protocol interface:

$ curl http://localhost:8765/0.1/sql --data 'insert into t1 values (NULL, \"from HTTP!\");'
{
   "query" : "insert into t1 values (NULL, \\\"from HTTP!\\\");",
   "sqlstate" : "00000"
}

drizzle> select * from t1;
+---+---------------------+
| a | b                   |
+---+---------------------+
| 1 | from mysql protocol | 
| 2 | from HTTP!          | 
+---+---------------------+
2 rows in set (0.000907 sec)

So what does this get us? With the addition of proper authentication, you could start doing some really quite neat and nifty things. I imagine we could add interfaces to avoid SQL and directly do key lookups, table scans and index range scans, giving really quite sweet performance. We could start building web tools to manage and manipulate the database speaking the native language of the web.

But… there’s more!

Since we have a web server and a way to execute queries via HTTP along with getting the result set as JSON, why can’t we have a simple Web UI for monitoring the database server and running queries built into the database server?

Yes we can.

If you wanted a WHERE condition or anything else, easy. Change the query, hit execute:

No TCP connection or parsing the MySQL protocol or anything. Just HTTP requests straight to the database server from the browser with a bit of client side javascript producing the HTML for the table.

Proof of concept code is up on launchpad in lp:~stewart/drizzle/json-interface

HailDB: A NoSQl API Direct to InnoDB

At the MySQL Conference and Expo last week I gave a session on HailDB. I’ve got the slides up on slideshare so you can either view through them or download them. I think the session went well, and there certainly is some interest in HailDB out there (which is great!).

Speaking on Tuesday: HailDB and Dropping ACID: Eating Data in a Web 2.0 Cloud World

I’m giving two talks tomorrow (Tuesday) at the MySQL Conference and Expo:

HailDB: A NoSQL API direct to InnoDB, 2:00pm, Ballroom D

Dropping ACID: Eating Data In A Web 2.0 Cloud World 3:05pm, Ballroom G

The HailDB talk is all about a C API to embed an InnoDB based relational database engine into your application. Awesome stuff (also nice and technical).

The second talk, “Dropping ACID: Eating Data in a Web 2.0 Cloud World” is not only a joke that only database people get, but a humorous and serious look at data integrity and reliability as promised by the current hype. This was quite well received at linux.conf.au in January. So, if you weren’t in Australia in January this year, then certainly come along and see how you go heckling an Australian.

innodb and memcached

I had a quick look at the source tree (I haven’t compiled it, just read the source – that’s what I do. I challenge any C/C++ compiler to keep up with my brain!) that’s got a tarball up on labs.mysql.com for the memcached interface to innodb. A few quick thoughts:

  • Where’s the Bazaar tree on launchpad? I hate pulling tarballs, following the dev tree is much more interesting from a tech perspective (especially for early development releases). I note that the NDB memcached stuff is up on launchpad now, so yay there. I would love it if the InnoDB team in general was much more open with development, especially with having source trees up on launchpad.
  • It embeds a copy of the memcached server engines branch into the MySQL tree. This is probably the correct way to go. There is no real sense in re-implementing the protocol and network stack (this is about half what memcached is anyway).
  • The copy of the memcached engine branch seems to be a few months old.
  • The current documentation appears to be the source code.
  • The innodb_memcached plugin embeds a memcached server using an API to InnoDB inside the MySQL server process (basically so it can access the same instance of InnoDB as a running MySQL server).
  • There’s a bit of something that kind-of looks similar to the Embedded InnoDB (now HailDB) API being used to link InnoDB and memcached together. I can understand why they didn’t go through the MySQL handler interface… this would be bracing to say the least to get correct. InnoDB APIs, much more likely to have fewer bugs.
  • If this accepted JSON and spat it back out… how fast would MongoDB die? weeks? months?
  • The above dot point would be a lot more likely if adding a column to an InnoDB table didn’t involve epic amounts of IO.
  • I’ve been wanting a good memcached protocol inside Drizzle, we have ,of course, focused on stability of what we do have first. That being said…. upgrade my flight home so I can open a laptop… probably be done well before I land….. (assuming I don’t get to it in the 15 other awesome things I want to hack on this week)