Ghosts of MySQL past, part 8.1: Five Years

With many apologies to David Bowie, come 2009 it was my 5 year anniversary with Sun (well, MySQL AB and then Sun). Companies tend to like to talk about how they like to retain employees and that many employees stay with the company for a long time. It is very, very expensive to hire the right people, so this is largely a good plan.

In 2009, it was five years since I joined MySQL AB – something I didn’t quite originally expect (let’s face it, in the modern tech industry you’re always surprised when you’re somewhere for more than a few years).

The whole process of having been with sun for 5 years seemed rather impersonal… it largely felt like a form letter automatically sent out with a certificate and small badge (below). You also got to go onto a web site and choose from a variety of gifts. So, what did I choose? The one thing that would be useful at Burning Man: a camelbak.

5 year badge for Sun Microsystems

5 year badge for Sun Microsystems

Ghosts of MySQL Past, Part 8: The First Fork.

This is the 8th installment in the rather long series that started with Part 1 about a month ago.

Back in 2006, we were in the situation where MySQL 5.0 had taken forever, and the first “GA” release was not suitable for production. Looking towards MySQL 5.1, it was also unlikely to be out any time soon. The MySQL Cluster team had customers that needed new features in a stable release. The majority of users didn’t use the MySQL server at all, they directly used the C++ NDB API for the vast majority of queries – so the vast majority of release blocker bugs in the MySQL server would not affect the production readiness of MySQL Cluster for these customers.

So, the decision was wisely made to do separate releases from a separate tree for MySQL Cluster. This was named MySQL Cluster: Carrier Grade Edition and exists to this day.

The main use case for MySQL Cluster at this time was running telephone networks, specifically the Home Location Registry databases of GSM phone networks. Basically, you need to keep a database of which tower each subscriber is associated with so when you go to make a phone call (or SMS) the network can properly route the call. This means there’s some realtime response requirements and hardcore availabilty which demands a special type of database.

NDB has a long history (some of it detailed in a previous post), but for those kind of interested in internals, I’ll quote Frazer Clement (now a long time MySQL Cluster developer although I completely forget which year he joined the team, which is just slightly embarrassing):

…Erlang and Ndb Cluster share some Plex heritage, which can still be seen in their architectures today. Since Plex, Erlang has mated with Prolog, and Ndb Cluster was involved in a car crash with C++.

The customers for MySQL Cluster were not the ones who bought the $5000 support option… Typically, paying for the addition of a relatively major feature was considered quite okay and even “normal”. After all, the effort that goes into constructing the entire software stack of a large cell phone network is rather large, and MySQL Cluster would end up being a very small part of that.

Many major features went into MySQL Cluster first, sometimes years before they made the general MySQL Server: row based replication, circular replication with conflict resolution and online DDL (add/drop index and column). In fact, it kind of incredibly frustrates me that we solved online add column in NDB so many years ago and you still can’t add a column to an InnoDB table without some serious planning.

The key thing about MySQL Cluster releases? They happened. They were also regular, addressing customer issues and new major versions brought features that worked for the use cases of those who needed them.

Interestingly enough, you may recognize the person who ran the MySQL Cluster team as the same person who has overseen the now regular release cycle of the MySQL Server itself (and has now been in the MySQL world for over ten years).

Ghosts of MySQL Past, Part 7.1: Hey look, I found an old business card

Digging through piles of old stuff in the house, I came across a nice little artifact of MySQL history, an old business card. One benefit of a small company is that you can tend to get your first name @company.com, which of course, I had.

IMG_20140310_111500I wonder which other MySQL AB employees still have a business card laying around somewhere?

Ghosts of MySQL Past, Part 7: PBXT

Recently, I’ve been writing based on my linux.conf.au 2014 talk, which you can watch the recording of. Also see Part 1, Part 2, Part 3, Part 4, Part 5 and Part 6. My feed feel off Planet MySQL for a bit so you may have missed those posts – so feel free to go on a trip down memory lane before returning here for, well, more of a trip down memory lane.

At the start of 2005, Paul McCullagh started working on a new transactional storage engine for MySQL. He announced it in early 2006, so the majority of initial development was done before Oracle bought InnoBase Oy.

It’s at this point I should correct myself from my Part 5 where I was talking about when Maria started – it was actually at the start of 2005 when the project started, about 10 months before InnoDB Friday. I think this was mostly small scale work at first, before months later having a larger focus on it.

But anyway, let’s talk about PBXT! It had a different architecture than InnoDB and thus could have an interesting set of performance characteristics. The shortest way to describe the original architecture is “kind of log based”. Rows are written to a log file and there’s a handle file that points to where the rows are in the log.

Also, the design had both a fixed size and a variable sized part of the row. The fixed size part was stored alongside the handle in the record data file, so it was incredibly quick at getting the fixed size part of the row.

Not having to write data twice and having really quick access to some columns gave PBXT a quite decent performance advantage for many work loads.

The BLOB Streaming feature that was also worked on (originally just for PBXT and then for any engine) was quite ahead of its time. Think HandlerSocket but think of it done more correctly. HandlerSocket is an awful binary protocol that supports very limited operations and uses two TCP ports (one for reads, one for writes). The blob streaming used HTTP, something that anyone could use and manipulate and proxy and whatever you’d want.

If we look at Paul’s Top 5 Wishes for storage engines, an engine test suite and sane APIs (or at least documented) would clearly have helped. Having to do API tracing to discover when you should start a transaction is probably not the best way to attract developers – the amount of time that could have been saved given better interfaces in the server was immense (and not just for PBXT developers).

So, why aren’t we all using PBXT now? Well… if MySQL had shipped with PBXT, it would possibly have been a different story. There was (and indeed is) a very uneven playing field for MySQL storage engines. If you’re in the main MySQL tree then you’re everywhere. If not, then you’ve got a heck of a lot of work on packaging and keeping up to date with various API changes (not to mention ABI, which since we’re talking C++ and since we’re talking MySQL, changed a lot). With InnoDB being “good enough” for many and actually being forced to improve due to challengers such as PBXT (as well as being in the tree) it just continued to have the majority of the users… and it’s not easy making money as a storage engine vendor – and developing a storage engine does cost money.

With no more than a couple of people working on it at any one time, it’s amazing that PBXT had such a big influence – and we can likely credit many InnoDB performance improvement to PBXT being so much better in some areas.

An interesting side story: I was sending some build fixes/complier warning fixes and other minimal patches to Paul when MySQL belonged to Sun and he mentioned that perhaps it was time he had a contributor agreement. I pointed out the Sun contributor agreement as an example of one that could be used, and maybe he could just replace “Sun Microsystems” with “PrimeBase” and I could submit that to the Sun system – at the very least, it would be amusing. Paul said that sounded like a great idea and I submitted Sun’s contributor agreement (which it expected outside contributors to agree to) to Sun for them to agree to.

Well… a few weeks later I got a response, and Sun wasn’t exactly keen on it – until I pointed out a few times that it was in fact their agreement and then finally it was all okay.

If your company wants people to agree to a contributor agreement: see if they’d agree to it if it was for someone else. It was an interesting exercise.

Further reading:

Ghosts of MySQL Past, Part 6: The engine revs

This week I’ve been writing based on my linux.conf.au 2014 talk, which you can watch the recording of. Also see Part 1, Part 2, Part 3, Part 4 and Part 5. My feed feel off Planet MySQL for a bit so you may have missed those posts.

Netfrastructure was many other things apart from just a database engine, but the one part that was interesting to MySQL was the storage engine. One interesting thing about Netfrastructure was the inbuilt Java VM, which meant you could do all sorts of neat tricks right near the data and with Jim and Ann coming from much more feature complete databases, some parts of MySQL were going to be a little bit of a shock.

There was still work to be done on the Falcon engine itself, it wasn’t completely finished and ready to just drop in, even though you may have heard things like “to be completed later this year”.

There was also another experiment to see if MySQL could get a storage engine that wasn’t owned by somebody else: resurrect Gemini. Remember, it was GPL licensed now, and it could conceivably be updated to the latest MySQL versions. You can go and see the code for the Amira project on launchpad. Ultimately, this went nowhere, but it could have been an interesting story if this project got more resources. Ultimately, it was Falcon that got all the attention and effort.

This was going to be the ultimate triumph of the Pluggable Storage Engine Architecture, something that had been touted in presentations and marketing material for years. What really happened? Well, it took NDB (now MySQL Cluster) somewhere between two and four years before you could really say it was stable and free of many “obvious” bugs/unintended deviations in behavior. Would Falcon be any different?

In reality, the Falcon project unearthed all of the warts of the storage engine interface. There were many exchanges of “Is it really like that?”, “Yes, yes it is, sorry.”

The APIs (and I use the term loosely) between a storage engine and the database server were originally used to plug in ISAM and then MyISAM, and the first transactional store (InnoDB) required a bunch of hacks to work and the API was never cleaned up. When the third transactional engine came along (MySQL Cluster) – which was also a distributed engine – there were again no real fixes in the API, just more of the same work-arounds and oddness. So by the time Falcon came along, if you were lucky the API was merely just not what you’d expect. There were many, many situations where not only was the official documentation inaccurate, but you’d have to commit several gross layering violations just to get something rather fundamental to a transactional database working.

Arguably the worst thing was foreign keys, a requirement for the Falcon project. You had to implement your own SQL parser to get any information on foreign keys. Let’s not even get started on auto_increment – something which the correct behavior is pretty much completely undocumented.

Next, we’ll look at PBXT.

Ghosts of MySQL Past Part 5: The Era of Acquisitions

This week I’ve been writing based on my linux.conf.au 2014 talk, which you can watch the recording of.

Also see Part 1, Part 2, Part 3 and Part 4. My feed feel off Planet MySQL for a bit so you may have missed those posts.

Now we head into the era of acquisitions… there have been a few in MySQL history, and in 2005 came the second (the first was MySQL AB acquiring Alzato for NDB). In what was to be known as “InnoDB Friday”, the makers of InnoDB – Innobase Oy – was acquired by Oracle. That very same month….

MySQL 5.0 GA. The first GA release of MySQL 5.0 is infamous. It was nowhere near ready and everybody who tried to use 5.0 in the early GA days has a story about something obvious that was broken. Basically, the majority of the new features simply didn’t work. It took many point releases before people would consider 5.0 ready.

The real measure of 5.0 quality was that it took MySQL AB over a year before we started to use it for our support database.

At the end of 2005, the Maria project was started: a project to create a transactional storage engine. This should not be confused with MariaDB, which would come years later. This is Maria, now called Aria. The basic idea was to fork MyISAM and work on adding features. In hindsight, it’s easy to see that when you have a quality problem with your main product, you should probably not take a bunch of senior engineers and have them work on a different project. IIRC there was some initial estimate of a GA by the end of 2007. It’s now eight years since the project started and there’s still no stable release.

There were other efforts to get a transactional storage engine not owned by Oracle, and in 2006 MySQL AB acquired Netfrastructure and along with it Jim Starkey and Ann Harrison came to work for MySQL AB.

Originally named JSTAR, this would become known as Falcon (probably something to do with the Swedish beer by the same name).

Ghosts of MySQL Past: Part 4, A million features for Enterprise

Continuing on from Part 3….

SAP is all about Enterprises and as such, used all the Enterprise features of databases. This is a much different application than every user of MySQL so far (which were often web applications, and increasingly being used at scale). This was “MySQL focuses on Enterprise”.

This is 2003: before VIEWS, before stored procedures, before triggers, before cursors, before prepared statements, before precision math, before SQL access to logs, before any real deadlock info, before you could use a / in a table name, before any performance statistics, before full unicode and before partitioning, row level binary logs, fractional seconds in temporal types and before EXPLAIN for INSERT/UPDATE/DELETE (which we still don’t have). There was a long way to go before MySQL would fit with SAP. A long, long way.

But how were we going to run our cell phone networks on MySQL? There was an answer to that as MySQL AB acquired an Ericsson spinoff, Alzato for it’s clustered database server, NDB, which would become MySQL Cluster. This is separate from replication and originally designed for GSM HLR databases. Early versions had two types of availability: five nines (99.999%) or zero nines (0.000%) – it either worked or exploded. Taking a database engine that wasn’t fully mature and integrating it with MySQL is not a task for the faint of heart. It wouldn’t be the last time this was attempted and it would prove to be the most successful.

In October of 2004, MySQL 4.1 went GA – the first version with MySQL Cluster and incidentally, just before I joined MySQL AB on the MySQL Cluster team.

By this time, MySQL AB had hired just about all outside contributors, as, well, what better job interview than a patch? The struggle to have a good relationship with outside contributors started a long time ago, it’s certainly nothing new.

2005 was the year of MySQL 5.0 instability. At the time, 5.0 was the development version and it was plagued by instability. It was really hard for development teams (such as the SAP team) to work on deliverables when every time they’d rebase their work on a new 5.0 trunk version, the server would explode in new and interesting ways.

In fact, the situation was so bad that Continuous Integration was re-invented. Basically, it was feature++; stability–. The fact that at the same time even less stable development trees existed did not bode well.

Ghosts of MySQL Past: Part 3

See Part 1 and Part 2.

We rejoin our story with a lawsuit. While MySQL suing Progress NuSphere is not perhaps the first GPL lawsuit that comes to mind, it was the first time that the GPL was tested in court. Basically, the GEMINI storage engine was a proprietary storage engine bundled with a copy of MySQL. Guess what? The GPL was found to be valid and GEMINI was eventually GPLed, and it didn’t really go anywhere after that. Why? Probably some business reasons and also, InnoDB was actually rather good and there wasn’t a lawsuit to enforce the GPL there, making business relationships remarkably easier.

In 2003 there was a second round of VC funding. The development team increased in size. One thing that MySQL AB did was invest heavily in technology. I think this is what gave the company a lot of value, you need to spend money developing technology if you wish to be seen as a giant and if you wish to be able to provide a high level of quality service to customers.

MySQL 4.0 went GA in March 2003 while at the same time there were 4.1 and 5.0 development trees. Three concurrent development trees may seem too many – and of course, it was. But these were heady days of working on features that MySQL was missing and ever wanting to gain users and market share. Would all these extra features be able to be added to MySQL? Time would tell…

The big news of 2003 for MySQL? A partnership with SAP. There was this idea: “run SAP on MySQL” which would push the MySQL Server in a bit of an odd direction. For a start, the bootstrap SQL script for SAP created something like 10,000 tables and loaded gigabytes of data – before you even started setting it up. In 2003, on MySQL 4.0, this didn’t go so well. Why was SAP interested? Well, then you’d be able to run SAP without paying Oracle licenses!

Ghosts of MySQL Past: Part 2

This continues on from my post yesterday and also contains content from my linux.conf.au 2014 talk (view video here).

Way back in May in the year 2000, a feature was added to MySQL that would keep many people employed for many years – replication. In 3.23.15 you could replicate from one MySQL instance to another. This is commonly cited as the results of two weeks of work by one developer. The idea is simple: create a log of all the SQL queries that modify the database and then replay them on a slave. Remember, this is before there was concurrency and everything was ISAM or MyISAM, so this worked (for certain definitions of worked).

The key things to remember about MySQL replication are: it was easy to use, it was easy to set up and it was built into the MySQL Server. This is why it won. You have to fast forward to September in 2010 before PostgreSQL caught up! It was only with PostgreSQL 9.0 that you could have queryable read-only slaves with stock standard PostgreSQL.

If you want to know why MySQL was so much bigger than PostgreSQL, this built in and easy to use replication was a huge reason. There is the age of a decent scotch between read-only slaves for MySQL and PostgreSQL (although I don’t think I’ve ever pointed that out to my PostgreSQL friends when having scotch with them… I shall have to!)

In 2001, when space was an odyssey, the first GA (General Availability) release of MySQL 3.23 hit the streets (quite literally, this was back in the day of software that came in actual physical boxes, so it quite probably was literally hitting the streets).

For a good piece of trivia, it’s 3.23.22-beta that is the first release in the current bzr tree, which means that it was around this time that BitKeeper first came into use for MySQL source code.

We also saw the integration of InnoDB in 2001. What was supremely interesting is that the transactional storage engine was not from MySQL AB, it was from Innobase Oy. The internals of the MySQL server were certainly not set up for transactions, and for many years (in fact, to this day) we talk about how a transactional engine was shoehorned in there. Every transactional engine since has had to do the same odd things to, say, find out when a transaction was being started. The exception here is in Drizzle, where we finally cleaned up a bunch of this mess.

Having a major component of the MySQL server owned and controlled by another company was an interesting situation, and one that would prove interesting in a few years time.

We also saw MÃ¥rten Mickos become CEO in 2001, a role he would have through the Sun acquisition – an acquisition that definitively proved that you can build an open source company and sell it for a *lot* of money. It was also the year that saw MySQL AB accept its first round of VC funding, and this would (of course) have interesting implications: some good, some less ideal.

(We’ll continue tomorrow with Part 3!)

Past, Present and future of MySQL and variants Part 1: Ghosts of MySQL Past

You can watch the video of my linux.conf.au 2014 talk here: http://mirror.linux.org.au/linux.conf.au/2014/Wednesday/28-Past_Present_and_future_of_MySQL_and_variants_-_Stewart_Smith.mp4

But let’s talk about things in blog form rather than video form :)

Back in 1979, there was UNIREG. A text UI to records (rows) in a database (err, table). The reason I mention UNIREG is that it had FoRMs which as you may have guessed by my capitalization there is where the FRM file comes from.

In 1986, UNIREG came to UNIX. That’s right kids, the 80×24 VT100 interface to ISAM (Index Sequential Access Method – basically rows are written in insert order and indexes point to them) came to UNIX. There was no generic query language, just FoRMs and reports. In fact, to this day, that 80×24 text interface is stored in the FRM file by MySQL and never ever used (I’ve written about this before).

Then there was this mSQL thing around the 1990s, which was a small SQL server (with source) but not FOSS. Originally, Monty W plugged in his ISAM engine but it wasn’t quite the right fit… so in 1995, we had MySQL 1.0 and MySQL AB was founded.

Fast forward a bit and in 1996 we had MySQL 3.19 and development continued. It managed to gain features, performance, ports to different operating systems and CPU architectures and, of course, stability.

It wasn’t until the year 2000 that MySQL adopted the GPL. This turned out to be a huge step in the right direction for increased adoption. At the time, this was a huge risk for the company, essentially risking all the revenue of the company on making the software more free.

This was the birth of the dual licensing business model. You see, the client library (libmysql) was also GPL, which meant it was easy to use if your application was also GPL, but if you were going to distribute your application and it wasn’t under a GPL compatible license (there was also a FOSS exception so that things like PHP could use it) then you needed a license.

Revenue from licensing was to be significant throughout the entire history of MySQL AB.

(We’ll continue this in part 2 tomorrow)

and now for something completely different…

As many of you know, I’ve been working in the MySQL world for quite a while now. IN fact, it was nearly 10 years ago when I first started hacking on MySQL Cluster at MySQL AB.

Most recently, I was at Percona which was a wonderful journey where over my nearly three years there the company at least doubled in size, launched several new software products and greatly improved the quality and frequency of releases.

However the time has come for something completely different. The MySQL world is rather mature, the future of Percona software is bright and, well, I could do with poking into something rather different.

So a couple of weeks ago I started at IBM in the Linux Technology Centre working on KVM on POWER and related things. No doubt there’ll be interesting things to blog about as time goes on, but it’s about time I posted my change of employment :)

Converting MySQL trees to git

I have put up a set of scripts on github: https://github.com/stewartsmith/bzr-to-git-conversion-scripts. Why do I need these? Well… if only bzr fast-export|git fast-import worked flawlessly for large, complex and old trees. It doesn’t.

Basically, when you clone this repo you can run “./sync-BLAH.sh” and it’ll pull BZR trees for the project, convert to git and clean things up a bit. You will likely have to edit the sync-BLAH.sh scripts as I have them pointed at branches on my own machine (to speed up the process, not having to do fresh BZR branches of MySQL trees over the network is a feature – it’s never been fast.). You’ll also want to edit the git remotes to point where you want git trees to end up.

I’ve done it for:

What problems did I hit? Well… the first is performance, things are slow unless you tweak a bunch of knobs, and then it’s just rather slow rather than slow. So in the empty git repo I set core.compression=1, which makes zlib a whole lot faster.

I naturally give the correct incantation to bzr fast-export to munge tag names appropriately, set a git branch name (each BZR branch ends up as a git branch) and use a marks file (this speeds up incremental syncs).

For one of these branches I was importing, BZR had allowed the invalid committer of “billy-earney billy.earney@gmail.com\n <>” – yes, a newline in the committer. This messes up the fast-import format so I have to run the entire fast-export output through sed to clean it up.

We then use bzr fast-import-filter to apply a user map – which is me looking at the appropriate committers and cleaning them up so that we get better attribution in the resulting git trees as well as cleaning up some errors in the bzr tree so that Git likes them (most notably, missing < or (not and) > around email addresses). The user map is fairly Percona specific, but there’s at least one or two for Oracle committers too.

Next, I pass the output through pv(1) – to do two things: monitor the output to see that it’s still going, and to have a transfer buffer so that git fast-import doesn’t stall waiting for output – amazingly enough, this gave a decent speed boost to import speed.

Finally, when we’re done doing the import of all of the revisions for all of the bzr branches, if this is our first run, we set the HEAD ref to the last BZR branch name and then do a git repack. Through experimentation, I’ve found that “git repack -AdfF –depth=100 –window=500” is what gives me the smallest size possible.

My lca2014 talk video: Past, Present and Future of MySQL and variants

On last Wednesday morning I gave my talk at linux.conf.au 2014. You can now view and download the recording of it here:

http://mirror.linux.org.au/linux.conf.au/2014/Wednesday/28-Past_Present_and_future_of_MySQL_and_variants_-_Stewart_Smith.mp4

(hopefully more free formats will come soon, the all volunteer AV team has been absolutely amazing getting things up this quickly).

Hong Kong (OpenStack Summit)

I’ll be in Hong Kong for the upcoming OpenStack Summit Nov 5-8. I’d be thrilled to talk database things with others present, especially around Trove DBaaS (DataBase as a Service) and high availability MySQL for OpenStack deployments.

I was last in Hong Kong in 2010 when I worked for Rackspace. The closest office to me was in Hong Kong so that’s where I did my HR onboarding training. I remember telling friends on the Sunday night before leaving for Hong Kong that I may be able to make dinner later in the week purely depending on if somebody got back to me on if I was going to Hong Kong that week. I was, and I went. I took some photos while there.

Walking from the hotel where we were staying to the Rackspace office could be done pretty much entirely through buildings without going outside. There were bits of art around too, which is just kind of awesome – I’m always in favour of random art.
Statues in walkways

The photo below was the view from my hotel room. The OpenStack summit is just by the airport rather than in the middle of town, so the views will be decidedly different to this, but still probably quite spectacular if you’re around the right place (I plan to take camera gear, so shout if you want to journey too)
Hotel Window (Hong Kong)

There are some pretty awesome markets around Hong Kong offering just about everything you’d want, including a lot just out on the street.
Java Road
Hong Kong Street Market

Nightime was pretty awesome, having people from around the world journey out into the night was great.
Rackers walking Hong Kong at Night

I was there during the World Cup, and the streets were wonderfully decorated. I’m particularly proud of this photo as it was handheld, at night, after beer.
Hong Kong streetlife

The road to Percona Server 5.6

Over a year ago now, I announced the first Percona Server 5.6 alpha on the Percona MySQL Performance Blog (Announcing Percona Server 5.6 Alpha). That was way back on August 14th, 2012 and it was based on MySQL 5.6.5 released in April.

I’m really happy now to point to the release of the first GA release of Percona Server 5.6 along with some really interesting benchmarks. We’ve certainly come a long way from that first alpha and I’m really happy that we’ve also managed to continue to release Percona Server 5.5 and Percona Server 5.1 releases on time and of high quality.

Over the same time frame that we’ve been working on Percona Server 5.6 we’ve increased the size of the company, improved development practices and grown enough that we’ve reorganised how development of software is managed to make it scale better. One thing I’m really, really pleased about is a culture of quality we’ve managed to nurture.

Keeping a culture of quality alive is something that requires constant nurturing. All too often I’ve seen pressure to ship sooner rather than stabler (yes, I just invented that word), and yes, we initially planned the GA of PS 5.6 earlier than we ended up shipping it, but we instead took the time to round out features and stability to ship something much better.

Now comes the effort of continuing good releases, promoting it and writing a Webinar to give next week.

Pictures of Auckland (where OSDC 2013 is!)

It’s getting close to time to head to Auckland for OSDC and a few days ago I blogged about how I’m speaking there). I’ll be speaking on MySQL In the Cloud, As A Service and all of the challenges that can entail as well as on The Agony and Ecstasy of Continuous Integration. Both of these talks draw heavily on the experience of Percona (my employer) and with experience from helping customers with all sorts of MySQL deployments and in our experience in producing our own high quality software.

I was in Auckland earlier this year, so thought I’d share some pictures of the wonderful city in which OSDC is being held.

Firstly, New Zealand has some pretty awesome wildlife. This is possibly not the best example of it ever as there are way more odd looking birds than this one:

Auckland

The waterfront is quite nice, and when we were there earlier in the year it was awfully nice weather for it:
Auckland

I’m pretty sure there isn’t going to be a triathlon in Auckland for OSDC, but I’m still hoping to get out for a run while there (anybody else up for one?). We left home at something like 3:30 in the morning and got some silly early flight (6am or before) and were totally walking around the city a little like zombies, realising that we simultaneously wanted to go for a run and sleep.

Auckland Triathlon

We were meeting friends from Seattle and managed to spot this coffee place down by the water. I didn’t try it myself, but I’ve certainly had good coffee at other places in New Zealand.

Seattle coffee in Auckland, New Zealand

Streets at night:

Auckland@dusk

And if I haven’t already convinced you that Auckland would be a great place to be, here’s a crappy cell-phone snapshot of a variety of New Zealand beers – a tiny, tiny fraction of beer you can get in New Zealand (the microbrewery scene is amazing)

A selection of NZ beer

Go register for OSDC 2013 right now: http://osdc.org.nz/tickets/

The end of Bazaar

I’ve used the Bazaar (bzr) version control system since roughly 2005. The focus on usability was fantastic and the team at Canonical managed to get the entire MySQL BitKeeper history into Bazaar – facilitating the switch from BitKeeper to Bazaar.

There were some things that weren’t so great. Early on when we were looking at Bazaar for MySQL it was certainly not the fastest thing when it came to dealing with a repository as large as MySQL. Doing an initial branch over the internet was painful and a much worse experience than BitKeeper was. The work-around that we all ended up using was downloading a tarball of a recent Bazaar repository and then “bzr pull” to get the latest. This was much quicker than letting bzr just do it. Performance for initial branch improved a lot since then, but even today it’s still not great – but at least it isn’t terrible like it once was.
The integration with Launchpad was brilliant. We never really used it for MySQL but for Drizzle the combination was crucial and helped us get releases out the door, track tasks and bugs and do code review. Parts of launchpad saw great development (stability and performance improved immensely) and others did not (has anything at all changed in blueprints in the past 5+ years?). Not running your own bugs db was always a win and I’m really sad to say that I still think Launchpad is the best bug tracker out there.
For both Drizzle and Percona, Bazaar was the right option as it was what MySQL was using, so people in the community already knew the tools. These days however… Git is the tool that there’s large familiarity with – even to the extent that Twitter maintains their MySQL branch in Git rather than in bzr.Is Bazaar really no longer being developed? Here are graphs (from github actually) on the activity on Bazaar itself over the years:Screenshot from 2013-10-02 10:32:19Screenshot from 2013-10-02 10:33:41You can easily see the drop off in commits and code changes. The last commit to trunk was 2 months ago and although there was the 2.6.0 release in August, in my opinion it wasn’t a very strong one (the first one I’ve had problems with in years).So… git is the obvious successor and with such a strong community around GitHub, it kinda makes sense. I’m not saying that GitHub has caught up to Launchpad in terms of features or anything – it’s just that with Bazaar clearly no longer really being developed…. it may be the only option.In fact, in my experiment of putting a mirror of Percona Server on GitHub, we already have a pull request mere days after I blogged about it. Migrating all of Percona development over to Git and Github may take some time, but it’s certainly time that we kicked the tyres on it and worked out how we’d do it without interrupting releases or development.I’ve also thrown up a Drizzle tree and although it required some munging to get the conversion to happen, I’m kind of optimistic about it and I think that after a round of merging things, I’m tempted to very strongly advocate for us switching (which I don’t think there’ll be any opposition to).When will Oracle move over their MySQL development? This I cannot say (as I don’t know and don’t make that call for them). There is a lot of renewed interest in code contribution by Oracle and moving to Git and GitHub may well be a very good way to encourage people.
The downside of git? Well… With BZR you could get away with not understanding pretty much every single bit of the internals. With git, I wish I was so lucky.

Who is working on MySQL 5.7?

First I find out the first commit that is in 5.7 that isn’t in 5.6 (using bzr missing) and then look at the authors of all of those commits. Measuring the number of commits is a really poor metric as it does not measure the complexity of the code committed, and if your workflow is to revise a patchset before committing, you get much fewer commits than if you commit 10 times a day.

There are a good number of people who are committing a lot of code to the latest MySQL development tree. (Sorry for the annoying layout of “count. number-of-commits name”)

  1. 1022 Magnus Blaudd
  2. 723 Jonas Oreland
  3. 329 Marko Mäkelä
  4. 286 Krunal Bauskar
  5. 230 Tor Didriksen
  6. 218 John David Duncan
  7. 205 Vasil Dimov
  8. 197 Sunny Bains
  9. 166 Ole John Aske
  10. 141 Marc Alff
  11. 141 Frazer Clement
  12. 140 Jimmy Yang
  13. 131 Joerg Bruehe
  14. 129 Jon Olav Hauglid
  15. 125 Annamalai Gurusami
  16. 106 Martin Skold
  17. 104 Nuno Carvalho
  18. 103 Georgi Kodinov
  19. 102 Pekka Nousiainen

There’s also a good number who have 50-100 commits:

  1. 99 Mauritz Sundell
  2. 97 Bjorn Munch
  3. 92 Craig L Russell
  4. 85 Andrei Elkin
  5. 81 Mattias Jonsson
  6. 73 Nirbhay Choubey
  7. 71 Roy Lyseng
  8. 68 Kevin Lewis
  9. 66 Rohit Kalhans
  10. 65 Guilhem Bichot
  11. 61 Sayantan Dutta
  12. 59 Akhila Maddukuri
  13. 58 Jorgen Loland
  14. 57 Martin Zaun
  15. 56 Harin Vadodaria
  16. 55 Inaam Rana
  17. 53 Venkatesh Duggirala
  18. 53 Venkata Sidagam
  19. 52 Gleb Shchepa
  20. 51 Norvald H. Ryeng
  21. 51 Jan Wedvik
  22. 50 Tatjana Azundris Nuernberg

And there’s even more with less than 50:

  1. 49 Manish Kumar
  2. 49 Alexander Barkov
  3. 48 Shivji Kumar Jha
  4. 48 Martin Hansson
  5. 42 Maitrayi Sabaratnam
  6. 40 Satya Bodapati
  7. 39 Horst Hunger
  8. 38 Neeraj Bisht
  9. 34 Yasufumi Kinoshita
  10. 34 prabakaran thirumalai
  11. 34 Kristofer Pettersson
  12. 33 Evgeny Potemkin
  13. 33 Dmitry Lenev
  14. 33 Chaithra Gopalareddy
  15. 33 Alexander Nozdrin
  16. 31 Hemant Kumar
  17. 31 Allen lai
  18. 31 Aditya A
  19. 30 Nisha Gopalakrishnan
  20. 30 Anirudh Mangipudi
  21. 29 Tanjot Uppal
  22. 28 Christopher Powers
  23. 27 Sujatha Sivakumar
  24. 27 Ashish Agarwal
  25. 25 Olav Sandstaa
  26. 25 Mayank Prasad
  27. 24 Anitha Gopi
  28. 24 Ahmad Abdullateef
  29. 23 Hery Ramilison
  30. 22 Vamsikrishna Bhagi
  31. 22 Praveenkumar Hulakund
  32. 22 Pedro Gomes
  33. 20 Sergey Glukhov
  34. 20 Libing Song
  35. 19 Vinay Fisrekar
  36. 19 Harin Vadodaria
  37. 18 Raghav Kapoor
  38. 18 Luis Soares
  39. 18 Gopal Shankar
  40. 18 Astha Pareek
  41. 17 viswanatham gudipati
  42. 17 Thayumanavar
  43. 17 Ramil Kalimullin
  44. 16 Oystein Grovlen
  45. 15 Dmitry Shulga
  46. 15 Amit Bhattacharya
  47. 15 Akhil Mohan
  48. 14 Ravinder Thakur
  49. 14 Kent Boortz
  50. 13 Bernd Ocklin
  51. 12 Bill Qu
  52. 11 Shaohua Wang
  53. 10 Sven Sandberg

There’s also a good number with fewer than 10 (31 names actually), which is encouraging as it means that this means it’s likely people who are not involved every day in development of new code (maybe QA, build etc) which probably means that (at least internally) contributing code isn’t really a big problem (and as I’ve shown previously, the barriers to external contributions between Oracle MySQL and MariaDB appear to result in roughly the same amount of code from people outside those companies).

There are 125 names here in total, with 19 having over 100 commits, 22 with 50-100 commits, another 53 with 10-50 commits and 31 with <10. So it’s possible to say that there are at least 125 people at Oracle working on MySQL – and I know there are awesome people who are missing from this list as their work doesn’t result in committing code directly to the tree.

Who is working on MariaDB 10.0?

There was some suggestion after my previous post (Who works on MariaDB and MySQL?) that I look at MariaDB 10.0 – so I have. My working was very simple, in a current MariaDB 10.0 BZR tree (somewhat beyond 10.0.3), I ran the following command:

bzr log -n0 -rtag:mariadb-10.0.0..|egrep '(author|committer): '| \
  sed -e 's/^\s*//; s/committer: //; s/author: //'| \
  sort -u|grep -iv oracle

 

MariaDB foundation/MontyProgram/SkySQL:

  1. Alexander Barkov
  2. Alexey Botchkov
  3. Daniel Bartholomew
  4. Elena Stepanova
  5. Igor Babaev
  6. Jani Tolonen
  7. knielsen
  8. Michael Widenius
  9. sanja
  10. Sergei Golubchik
  11. Sergey Petrunya
  12. Sergey Vojtovich
  13. timour
  14. Vladislav Vaintroub

Elsewhere:

  1. Kentoku SHIBA (4 commits)
  2. Lixun Peng (1 commit)
  3. Olivier Bertrand (212 commits)

From Oracle (i.e. revisions merged from Oracle MySQL):

  • 81 names (which I won’t list here as 81 is a lot)

The results are no different if you go back to the first revision that is different between MariaDB 5.5 and 10.0 (found using bzr missing). Even when grepping through the bzr log for things such as “patch by”, “contribution” or “originally” I can only find 1 or two more names as original authors for patches (about the same as I can for patches going into the Oracle tree).

Please point me to revisions (revid is best way) that come from outside contributors as then I really can update this to show that there’s a larger developer community.

The current development version of Drizzle (7.2) has just as many contributors as the MariaDB development version (10.0) – although Drizzle does have fewer commits.