The rotating blades database benchmark

(and before you ask, yes “rotating blades” comes from “become a fan”)

I’m forming the ideas here first and then we can go and implement it. Feedback is much appreciated.

Two tables.

Table one looks like this:

CREATE TABLE fan_of (
user_id BIGINT,
item_id BIGINT,
PRIMARY KEY (user_id, item_id),
INDEX (item_id)
);

That is, two columns, both 64bit integers. The primary key covers both columns (a user cannot be a fan of something more than once) and can be used to look up all things the user is a fan of. There is also an index over item_id so that you can find out which users are a fan of an item.

The second table looks like this:

CREATE TABLE fan_count (
item_id BIGINT PRIMARY KEY,
fans BIGINT
);

Both tables start empty.

You will have 1000, 2000,4000 and 8000 concurrent clients attempting to run the queries. These concurrent clients must behave as if they could be coming from a web server. The spirit of the benchmark is to have 8000 threads (or processes) talk to the database server independent of each other.

The following set of queries will be run a total of 23,000,000 (twenty three million) times. The my_user_id below is an incrementing ID per connection allocated by partitioning 23,000,000 evenly between all the concurrent clients (e.g. for 1000 connections each connection gets 23,000 sequential ids)

You must run the following queries.

  • How many fans are there of item 12345678 (e.g. SELECT fans FROM fan_count WHERE item_id=12345678)
  • Is my_user_id already a fan of item 12345678 (e.g. SELECT user_id FROM fan_of WHERE user_id=my_user_id AND item_id=12345678)
  • The next two queries MUST be in the same transaction:
    • my_user_id becomes a fan of item 12345678 (e.g. INSERT INTO fans (user_id,item_id) values (my_user_id, 12345678))
    • increment count of fans (e.g. UPDATE fan_count SET fans=fans+1 WHERE item_id=12345678)

For the first query you are allowed to use a caching layer (such as memcached) but the expiry time must be 5 seconds or less.

You do not have to use SQL. You must however obey the transaction boundary above. The insert and the update must be part of the same transaction.

Results should include: min, avg, max response time for each query as well as the total time to execute the benchmark.

Data must be durable to a machine being switched off and must still be available with that machine switched off. If committing to local disk, you must also replicate to another machine. If running asynchronous replication, the clock does not stop until all changes have been applied on the slave. If doing asynchronous replication, you must also record the replication delay throughout the entire test.

In the event of timeout or deadlock in doing the insert and update part, you must go back to the first query (how many fans) and retry. Having to retry does not count towards the 23,000,000 runs.

At the end of the benchmark, the query SELECT fans FROM fan_count WHERE item_id=12345678 should return 23,000,000.

Yes, this is a very evil benchmark. It seems to be a bit indicative about the kind of peak load that can be experienced by a bunch of Web 2.0 sites that have a “like” or “become a fan” style buttons. I fully expect the following:

  • Pretty much all systems will nosedive in performance after 1000 concurrent clients
  • Transaction rollbacks due to deadlock detection or lock wait timeouts will be a lot.
  • Many existing systems and setups not complete it in reasonable time.
  • A solution using Scale Stack to be an early winner (backed by MySQL or Drizzle)
  • Somebody influenced by Domas turning InnoDB deadlock detection off very quickly.
  • Somebody to call this benchmark “stupid” (that person will have a system that fails dismally at this benchmark)
  • Somebody who actually has any knowledge of modern large scale web apps to suggest improvements
  • Nobody even attempting to benchmark the Oracle database
  • Somebody submitting results with MySQL to not wait until the replication stream has finished applying.
  • Some NoSQL systems to suck considerably more than their SQL counterparts.

Storage Engine API: write_row, CREATE SELECT and DDL

(this probably applies exactly the same for MySQL and Drizzle… but I’m just speaking about current Drizzle here)

In my current merge request for the embedded-innodb-create-select-transaction-arrgh branch (also see this specific revision), you’ll notice an odd hoop that we have to jump through to make CREATE SELECT statements work with an engine such as InnoDB.

Basically, this is what happens:

  • start transaction
  • start executing SELECT QUERY (well, prepare executing it and fetch a row)
  • create table
  • attempt to insert into table

But… we have to do the DDL statement (i.e. the CREATE TABLE) in its own transaction. This means that the outer transaction (running the SELECT) shouldn’t be able to see it. Except it does. We can create a cursor on this table. However, when we try and do something with it (e.g. ib_cursor_first()) we then get the error message DB_MISSING_HISTORY from InnoDB. With a data dictionary that was REPEATABLE READ, we shouldn’t have this problem. However, we don’t have that.

So? What do we do? If we’re in ::write_row and we get an error and we’re running a SQLCOM_CREATE_TABLE sql_command (yes, we get to poke into current_session->lex->sql_command to find this out) we just magically restart the transaction so that we can (properly) see the created table and write rows to it.

This is not a sane part of the interface; it won’t be an issue for many engines but it is needed here.

This blog post (but not the whole blog) is published under the Creative Commons Attribution-Share Alike License. Attribution is by linking back to this post and mentioning my name (Stewart Smith).

Announcing HailDB

I just announced our continuation of the Embedded InnoDB project under the name of HailDB. Check out the announcement over at http://www.haildb.com/.

HailDB is a relational database that is embeddable within applications. You embed HailDB by linking to a shared library and calling a clean and simple API. HailDB is a continuation of the Embedded InnoDB project. It is not itself a database server, but is a library implementing the storage layer. With the addition of the HailDB plugin to Drizzle you get a full SQL interface.

Read more at http://www.haildb.com

Embedded InnoDB is in the tree!

Well… the start of it :)

I’ve taken the approach of taking tiny incremental steps (and getting review for each step) in implementing a Storage Engine based on the Embedded InnoDB library. What hit lp:drizzle (the trunk branch, for the 2010-04-07 milestone tarball) is only a handful of these small steps, so this engine is not remotely ready for end users.

There should be more of my Embedded InnoDB work hitting the tree in the upcoming days/weeks, enough to get it to a satte that one could describe as functional :)

Storing the table message in Embedded InnoDB

One of the exciting things[1] about working on a storage engine in Drizzle is that you get to manage your own metadata. When the database engine you’re writing the storage engine interface for has a pretty complete data dictionary (e.g. Embedded InnoDB) you could just directly use it. At some point I plan to do this for the embedded_innodb engine for Drizzle so that you could just point Drizzle at an existing Embedded InnoDB database and run SQL queries on it.

The Drizzle table message does have some things in it that aren’t in the InnoDB data dictionary though (e.g. table and column comments). We want to preserve these (and also things like there may be several data types in Drizzle that map to the same data type in InnoDB). Since the Embedded InnoDB API allows us to do things within the DDL transaction (such as insert a row into a table), we store the serialized table message in a table as part of the DDL transaction. This means we can have fully crash safe DDL! There is no way the table definition can get out of sync with what is in InnoDB; we are manipulating them both in the same transaction!

The table structure we’re using is pretty simple. There is two columns: table_name VARCHAR(IB_MAX_TABLE_NAME_LEN) and message BLOB.

The operations we need are:

  • store the table message in doCreateTable (INSERT)
  • rename the table message in doRenameTable (UPDATE the table_name column)
  • delete the table message in doDropTable (DELETE)
  • list tables in a database (SELECT with prefix)
  • get table message (SELECT using key lookup)

All of which are pretty easy to implement using the Embedded InnoDB API.

[1] Maybe I need to get out more….

Embedded InnoDB: InnoDB Status

Using the Embedded InnoDB plugin I’m working on, you can use the INNODB_STATUS table function in the data_dictionary, you can do pretty neat things.

For example, we can see that each autocommit transaction causes an fsync and if you insert multiple rows ina  single statement, you still only get 1 fsync:

drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS
    -> WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done |    25 |
+----------------+-------+
1 row in set (0 sec)

drizzle> insert into t1 values (1);
Query OK, 1 row affected (0.05 sec)

drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done |    26 |
+----------------+-------+
1 row in set (0 sec)

drizzle> insert into t1 values (1),(2),(3),(4);Query OK, 4 rows affected (0 sec)
Records: 4  Duplicates: 0  Warnings: 0

drizzle> SELECT * FROM DATA_DICTIONARY.INNODB_STATUS WHERE name="fsync_req_done";
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| fsync_req_done |    27 |
+----------------+-------+
1 row in set (0 sec)

Embedded InnoDB: querying the configuration

I am rather excited about being able to do awesome things such as this to get the current configuration of your server:

drizzle> SELECT NAME,VALUE 
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION
    ->  WHERE NAME IN ("data_file_path", "data_home_dir");
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| data_file_path | NULL  | 
| data_home_dir  | ./    | 
+----------------+-------+
2 rows in set (0 sec)

drizzle> SELECT NAME,VALUE
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION
    -> WHERE NAME IN ("data_file_path", "data_home_dir");
+----------------+-------+
| NAME           | VALUE |
+----------------+-------+
| data_file_path | NULL  | 
| data_home_dir  | ./    | 
+----------------+-------+
2 rows in set (0 sec)

drizzle> SELECT NAME,VALUE 
    -> FROM DATA_DICTIONARY.INNODB_CONFIGURATION 
    -> WHERE NAME = "io_capacity";
+-------------+-------+
| NAME        | VALUE |
+-------------+-------+
| io_capacity | 200   | 
+-------------+-------+
1 row in set (0 sec)

Coming soon: status in a table.

(this is for the upcoming embedded_innodb plugin, which using the API provided by Embedded InnoDB to implement a Storage Engine for Drizzle)