Archive for the 'mysql' Category

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

Friday, November 21st, 2008

Scaling MySQL on a 256-way T5440 server using Solaris ZFS and Java 1.7

*cough*

(and then wipe coffee off the computer)

Of course, the real aim should be to scale with one instance on the machine, as scaling with multiple instances on the one machine isn’t scaling at all. It’s scale out, but with more problems (now when one machine goes down, so do 1110202434 database instances).

Technology predictions

Sunday, November 9th, 2008

In 2 years (ish):

  • the majority of consumer bought machines (which will be laptops) will have SSD and not rotational media
  • At the same time, servers with larger storage requirements will use disk as we once used tape.
  • At least one Linux distribution will be shipping with btrfs as default
  • OpenSolaris will be looking interesting and not annoying to try out (a lot more “just work” and easy to get going).
  • Unless Sun puts ZFS under a GPL compatible license so it can make it into the Linux kernel, it will become nothing more than a Solaris oddity as other file systems will have caught up (and possibly surpassed).
  • There will be somebody developing a MySQL compatible release based off Drizzle
  • Somebody will have ported Drizzle back to Microsoft Windows… possibly Microsoft.
  • X will still be used for graphics on Linux, although yet another project will start up to “replace X with something modern”, get a lot of press and then fail.

In 5 years:

  • Apple will single-handedly control 1/3 of the mobile phone market
  • The other 2/3 will be divided between Blackberry (small), Windows Mobile and Android.
  • Linux desktop market share will be much higher than Apple’s

That’s all for now…

libmallocfail

Saturday, November 8th, 2008

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occasionally fail.

The aim was to use this to test bits of MySQL/Drizzle. However, since they’re libtool-based, the binary in the tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.
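To make the idea concrete, here is a minimal sketch of a malloc()-failing preload library. It is not the libmallocfail source: the environment variable name MALLOCFAIL_EVERY and the helper names are made up for illustration, the failure pattern is a simple "every Nth call" rather than whatever the real library does, and it leans on the glibc-specific __libc_malloc symbol to reach the real allocator. It is also not thread-safe.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* glibc-specific assumption: the real allocator behind malloc(). */
extern void *__libc_malloc(size_t size);

/* Pure decision function: fail every `every`-th call; 0 disables failures. */
int mallocfail_should_fail(unsigned long call_number, unsigned long every)
{
    return every != 0 && call_number % every == 0;
}

/* Read the (hypothetical) MALLOCFAIL_EVERY variable once and cache it. */
unsigned long mallocfail_every(void)
{
    static int loaded;
    static unsigned long every;

    if (!loaded) {
        const char *value = getenv("MALLOCFAIL_EVERY");
        every = value ? strtoul(value, NULL, 10) : 0;
        loaded = 1;
    }
    return every;
}

/* Interposed malloc(): occasionally returns NULL, otherwise passes through. */
void *malloc(size_t size)
{
    static unsigned long calls; /* not thread-safe; fine for a sketch */

    calls++;
    if (mallocfail_should_fail(calls, mallocfail_every())) {
        errno = ENOMEM;
        return NULL;
    }
    return __libc_malloc(size);
}
```

Built as a shared object (for example gcc -shared -fPIC -o libmallocfail.so mallocfail.c, file names hypothetical), it would be used as MALLOCFAIL_EVERY=100 LD_PRELOAD=./libmallocfail.so ./some_program. Because LD_PRELOAD is inherited through the environment, every child process gets the same treatment, which is exactly the libtool wrapper problem described above.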

I have found a bug in libc though :)

Goodbye FRM (or at least the steps to it)

Thursday, November 6th, 2008

Since before MySQL was MySQL, there has been the .FRM file. Of course, what it really wanted to be was “.form” -  a file that stored how to display a form on your (green) CRT. Jan blogged earlier in the year on this still being there, even in MySQL 5.1 (albeit not in any useful form).

So why do we want it to die?

Well… it’s not exactly very useful anymore.

There are a few things it’s used for….

If database/table.frm exists, the table exists (or, on Windows, you may also get database\table.frm). This is tested in a few bits in the code by a call to access(2).
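As a rough sketch of what such an existence check looks like (this is not the server code; the function name and the datadir parameter are assumptions for illustration):

```c
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if <datadir>/<database>/<table>.frm exists, 0 otherwise.
 * Hypothetical helper: the name and parameters are illustrative only. */
int frm_exists(const char *datadir, const char *database, const char *table)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s/%s.frm",
                     datadir, database, table);

    if (n < 0 || (size_t)n >= sizeof(path))
        return 0; /* path too long: treat as "no such table" */

    /* F_OK only asks "is there a file?", nothing about its contents. */
    return access(path, F_OK) == 0;
}
```

Note that the check says nothing about whether the engine's own data dictionary agrees that the table exists.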

Most engines have their own data dictionaries (InnoDB, PBXT, NDB, Falcon). Keeping these in sync with the FRMs can be problematic at best. This is especially true with distributed engines such as NDB.

The current solution is that on the SQL node that is creating the table, we create the FRM file, gzip it, and store it in the cluster. Then other nodes, if they go “err… no local frm”, first call ha_create_table_from_engine(), in which NDB goes and checks whether the table exists in the cluster. If so, it copies the FRM from the cluster to local disk and then the SQL server continues on its way with the standard way of opening a table (through the FRM). If you do DDL through the NDB-API (and not via SQL) then well… you get to keep both pieces.

As for if you crash during a table rename (with any engine with its own data dictionary.. e.g. InnoDB)… you again get to keep both pieces. (There is a bit of discussion on this over here)

Having FRM files also doesn’t especially lend itself to having multiple versions of table metadata co-existing in the server.

The fun part of reading a frm is open_binary_frm in table.cc. It reads in the frm into a TABLE_SHARE. If we only had some other way of filling out a TABLE_SHARE… one from the engine itself…

But what about any metadata that the engine data dictionary doesn’t have? For example, many server types may map to 1 engine type. An example of this is the GIS types in MySQL. For most engines, these just map to BLOBs. The engine itself has no knowledge about that, but we should fill out the table definition correctly…. so for this type of thing the engine may need to store some additional metadata. This is pretty easy for transactional engines: put it in a table! (although you then have your own problem about keeping this synchronised with any DDL). For engines that don’t have their own data dictionary, we can just provide a set of routines to store/read a frm type file (based on protobufs no doubt).

There also seems to be some entanglement with LOCK_open. Ahhh LOCK_open, the lock that nobody can possibly understand.

The tricky thing will be not rewriting every little bit from scratch all at once but rather go for the incremental bits….

Use MySQL, get elected President of the United States

Wednesday, November 5th, 2008

Jonathan puts it in slightly different words, and doesn’t guarantee The White House to everybody.

I do wonder when we’ll get a Drizzle or NDB using president though….

Singing in the Rain

Wednesday, November 5th, 2008

The past 3 years, 11 months I have worked full time on NDB (MySQL Cluster). It’s been awesome. Love the product and people. In the time I’ve been on the Cluster team, we’ve gone from a small group that would easily fit in the (old old) Stockholm office to one that requires large rooms to house us all in. It’s also been all about smart people (you have to be to work on a distributed database).

With MySQL Cluster 6.4 we’re getting in a bunch of features that have been on the “wide adoption” wishlist. With each release of NDB we’ve gained a wedge of applications that can be used with it - and 6.4 is no exception.

One of the biggest things that’s been worked on is multithreaded data nodes. Check out Jonas’ recent posts on 500,000 reads/sec and then a massive 700,000 reads/sec.

We’ve also got a Microsoft Windows port coming up, which a number of people have asked for over the years. Mostly I think this is an “I want to try it out” thing and not a deployment thing. (Can any sane person deploy a HA app on Win32?)

I’ve used “NDB$INFO” as the ultimate answer to any problem for a while now. It’s been the much-wanted monitoring interface. We have a lot of info inside NDB that currently isn’t easily user accessible (or only accessible through the magic DUMP interface or by gathering up many events in the cluster log). We have the start of NDB$INFO in 6.4 now and Martin will be continuing my work in making it truly awesome.

So go and grab the 6.4 tree and have a look - things are looking sweet.

What next for me?

Well… a while ago I started hacking on Drizzle. Why? Well… I thought we could move the database server in a new direction and make it a more modular, leaner, meaner query machine.

And now, I’m starting to work on it full time.

It’s exciting, and I’ll shortly be blogging on the first TODO, which is to remove the FRM file and switch to a full discovery method.

UPDATE: Yes, I’m working full time on Drizzle for Sun Microsystems (in the CTO group). While not spending work time on NDB anymore, no doubt you’ll still see fun-time patches.

on compiling with --disable-assert

Wednesday, October 15th, 2008

It’s like removing the brakes from your car. Yes, it will go faster (slightly less weight) but, dude, you just removed the brakes.

getarg calls srand() ???

Wednesday, October 8th, 2008

storage/ndb/test/src/getarg.c

Guess what? It calls srand(time(NULL)) in getarg(). Why, you ask? Well.. what you want to be able to do when specifying a flag is have it be true, false or it could “maybe” be set.

That’s right kids… maybe.

I’m sure it’s used somewhere in our test suite to get coverage on different things.. but umm.. yeah, interesting discovery for today.
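For illustration only, a tri-state flag of this kind could look roughly like the following. The enum and function names are invented for this sketch and are not the ones in getarg.c, and this version seeds the generator once rather than on every call.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical tri-state flag; not the actual getarg.c types. */
enum flag_setting { FLAG_FALSE, FLAG_TRUE, FLAG_MAYBE };

/* Resolve a flag to 0 or 1; FLAG_MAYBE is decided by a coin flip. */
int resolve_flag(enum flag_setting setting)
{
    static int seeded;

    if (!seeded) {
        srand((unsigned)time(NULL)); /* the seeding call found in getarg() */
        seeded = 1;
    }

    switch (setting) {
    case FLAG_TRUE:
        return 1;
    case FLAG_FALSE:
        return 0;
    default:
        return rand() % 2; /* "maybe": differs from run to run */
    }
}
```

The catch with this style of testing is reproducibility: unless the seed is logged somewhere, a failure that only shows up on the "maybe" path is hard to rerun.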

Visual Studio 2008 unreferenced local variable bug

Friday, September 19th, 2008

screenshot ’cause typing is for wusses

UPDATE: not actually VS bug. Nasty macro defining strtok_r to strtok on Win32. ouch.

NDB Windows port shaping up…

Thursday, September 18th, 2008

It’s getting there. The tree should now pretty much always compile, and (at least mostly) doesn’t break anything on other platforms. It even works on win32… at least basic functionality. There will be monsters (bugs.. but scarier, because it’s win32).

SetFileValidData Function (Windows) - Now with added FAIL

Monday, September 8th, 2008

SetFileValidData Function (Windows)

There seem to be two options on Win32 for preallocating disk space to files.

Basically, I want an equivalent to posix_fallocate or the ever wonderful xfsctl XFS_IOC_RESVSP64 call.

The idea being to (quickly) create a large file on disk that is stored efficiently (i.e. isn’t fragmented).

From SQL, you’d do something like “CREATE LOGFILE GROUP lg1 ADD UNDOFILE 'uf1' INITIAL_SIZE 1G;” and expect a 1GB file on disk. One way of getting this is calling write() (or WriteFile() on Win32) repeatedly until you’ve written a 1GB file full of zeros. This means you’re generating approximately 1GB of IO.

Except it’s worse than that: every time you extend the file, you’re going to be changing the metadata (file and free space information). If you’re lucky, you won’t be using a file system that writes a new transaction to the journal for each time you do this.

If your file system allocator doesn’t like you today (even more likely when you’ve got more than one process doing IO), you may end up with rather fragmented files as well - especially if you’re doing synchronous IO. So you want some method of saying “this file will be size X, please allocate disk space to it in the most efficient way for a file size of X” as it’s not possible to infer this from everyday IO calls (I guess the Win32 CopyFile and CopyFileEx calls could though).

It probably doesn’t do it, but having a CopyFile call would be neat for copy on write file systems and saving space… although I wonder how many Win32 apps would cope with ENOSPC on a write to an existing part of a file.

On IRIX we used the magic xfsctl() with the XFS_IOC_RESVSP64 argument. On Linux (with XFS), we use the same. On ext2/ext3 the only way to get the same has been to (with the file system unmounted), parse the file system and implement it yourself. Although (and this just in) the brand new fallocate() call should help with this. The posix_fallocate() call in GNU libc has just been a wrapper around the simple method of writing 0 to a file from start to end (albeit rather efficiently).
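On the POSIX side, asking for the space up front is a single call. The following is a minimal sketch, not the NDB code: the function names are made up for illustration, and whether the call is actually fast depends on the C library and file system, as described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or open) `path` and reserve `size` bytes for it.
 * Returns 0 on success, or an errno-style error number on failure.
 * Hypothetical helper name; `size` must be greater than zero. */
int preallocate_file(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return errno;

    /* posix_fallocate() returns the error number directly
     * rather than setting errno. */
    int rc = posix_fallocate(fd, 0, size);

    if (close(fd) != 0 && rc == 0)
        rc = errno;
    return rc;
}

/* Size of `path` in bytes, or -1 if it cannot be stat'ed. */
long long file_size(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long long)st.st_size : -1;
}
```

After a successful call the file reports the requested size, and later writes within that range should not fail for lack of disk space.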

XFS implements something called “unwritten extents”. An unwritten extent says “this range of blocks is allocated to this file. If reading from this range, return a zero page. If writing, split the unwritten extent into 3 parts: before, the newly written extent (which isn’t unwritten: i.e. now valid data), and the after extent.” Simple, rather efficient and gets really good allocation as XFS gets to search the free space btrees based on size.
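The three-way split is easy to picture as a data structure. This is a toy model of the idea only, not XFS code; the struct and function names are invented, and it assumes the write lies entirely inside one unwritten extent.

```c
#include <stddef.h>

/* Toy extent: a block range plus a "contains valid data" flag. */
struct extent {
    unsigned long long start; /* first block */
    unsigned long long len;   /* number of blocks */
    int written;              /* 0 = unwritten (reads return zeros) */
};

/* Split unwritten extent `ext` around a write of `wlen` blocks at `wstart`.
 * Fills `out` with up to 3 extents (before, written, after) and returns
 * how many were produced; returns 0 if the write does not fit inside `ext`
 * or `ext` is already written. */
size_t split_unwritten(struct extent ext, unsigned long long wstart,
                       unsigned long long wlen, struct extent out[3])
{
    unsigned long long ext_end = ext.start + ext.len;
    unsigned long long wend = wstart + wlen;
    size_t n = 0;

    if (ext.written || wlen == 0 || wstart < ext.start || wend > ext_end)
        return 0;

    if (wstart > ext.start) /* still-unwritten part before the write */
        out[n++] = (struct extent){ ext.start, wstart - ext.start, 0 };

    out[n++] = (struct extent){ wstart, wlen, 1 }; /* now valid data */

    if (wend < ext_end) /* still-unwritten part after the write */
        out[n++] = (struct extent){ wend, ext_end - wend, 0 };

    return n;
}
```

A write at the very start or end of the extent produces only two pieces, and a write covering the whole extent just flips it to written.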

So what to do on Win32 (apart from drink heavily to try and make it all go away)?

There’s SetFileValidData, but that needs special permissions and may expose previously deleted data from other users. i.e. massive security hole. FAIL

There’s SetEndOfFile which, quoting the MS docs: “If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined.” Not exactly reassuring… but introduced in W2k, so rather safe to use today. Doesn’t save you from having to fill the file with zeros as part of initialisation though.

There’s SetFileInformationByHandle, which looks like it may do exactly what I want… if you read between the lines of the documentation. But it’s only supported starting with Vista. Which you all use of course, so that’s not a problem.

Building MySQL on Windows - MySQL Forge Wiki

Monday, September 8th, 2008

Building MySQL on Windows - MySQL Forge Wiki

This one covers running mysqld in the VisualStudio debugger, which can be useful.

I have no special instructions or wisdom for running ndb_mgmd.exe or ndbd.exe in the debugger (at least when running them from mysql-test-run.pl). I’ve attached the debugger to already running ndb processes (started by mysql-test-run.pl), but haven’t made any changes to mtr to give it the mysqld-style “go and enter this” prompt.

MySQL Conference & Expo 2009 - CFP open

Friday, September 5th, 2008

Is it that time already? MySQL Conference & Expo 2009 has opened the CFP.

Submit (well) early and often. It’s always an exciting (and exhausting) conf. Good technical, relevant content is what makes it good. Getting to talk to people who do amazing things, people who use your software, people looking to use it, people who want to chat about how you can learn off each other.

Any suggestions for what you’d like to hear from me (Cluster, Drizzle et al) are welcome - either via private mail or comments here.

when the problem is likely a bug in the linker…

Thursday, August 28th, 2008

Windows FAIL.

It has been suggested the current thing I’m trying to fix is actually a bug in the Microsoft linker…. and I’m quite willing to believe that.

I wonder if I can expense rehab if this Windows port leads to a drinking problem….

Building MySQL Cluster on Windows (for Windows)

Wednesday, August 27th, 2008

You will need:

  • CMake (at least 2.4.7)
  • Bazaar (the newer the better - 1.6 was just released - at least use that)
  • GNU Bison
  • Visual Studio (Express works, but I’m talking about 2005 here)
  • … and all this installed on a Microsoft Windows machine.
  • … and to hate yourself, you are going to be using Windows after all.

Then, get and build it:

  1. Get the source:
    bzr branch lp:~mysql/mysql-server/mysql-5.1-telco-6.4-win
  2. Run CMake. The CMake GUI can now be used to select compile options! You’ll have to set the path “where is the source code” to where you put the source code in step 1.
  3. Hit “Configure” in CMake
  4. Select the target (i.e. the version of Visual Studio you’re going to use)
  5. Select the build options. HINT: WITH_NDBCLUSTER_STORAGE_ENGINE may be a useful one to enable
  6. Hit Configure again
  7. Hit Ok.
  8. CMake now generates the Visual Studio project. Use this time to drink some good scotch.
  9. Open Mysql.sln (which should launch Visual Studio)
  10. Go Build -> Build Solution (or hit F7)

Now you can go and have much whisky as this will take a few minutes. You should now have a set of built binaries for MySQL Cluster on Windows. Scary.

ndb_mgm.exe builds (and works) in mysql-5.1-telco-6.4-win

Friday, August 22nd, 2008

“MySQL Cluster 6.4 Windows tree” branch in Launchpad

(which really should have the -fail suffix… but anyway)

In what will (soon) be mirrored to launchpad, all but 17 targets (yeah, working on that… but it’s out of 130 or something) build.

Not only that, I’ve used the management client (ndb_mgm.exe) to monitor the cluster running my Bugzilla instance (which is now a rather old 6.3 build).

Getting closer to NDB on Windows.

Be afraid. Be very, very afraid.

“MySQL Cluster 6.4 Windows tree” branch in Launchpad

Thursday, August 21st, 2008

“MySQL Cluster 6.4 Windows tree” branch in Launchpad

That’s right folks, I’m pushing up patches for MySQL Cluster on Windows. This tree is incomplete, and no promises on when enough will be pushed for it to even compile on Windows.

Tree is updated when launchpad pulls from our internal tree.

Firefox on OpenSolaris fixed (and installed bzr)

Wednesday, August 6th, 2008

Thanks to Glynn for pointing me to the right thread on opensolaris.org (in a comment on my Good adventures with OpenSolaris post). The package verification thingy (pkg verify -v -f SUNWfirefox) did actually throw an error (indicating some sort of problem). So that’s pretty neat. The fact that it got into trouble in the first place isn’t good, but corruption detection is the next best thing.

I still occasionally hit the bug in VirtualBox where if you have 127.0.0.1 in your resolv.conf on your host (e.g. running a local caching nameserver), VirtualBox passes this through to the guest, so the guest tries to use the guest 127.0.0.1 as a nameserver - this usually doesn’t work so well.

The good news is, Firefox now works in my OpenSolaris VM.

The bad news is that even though I’ve gone and set my keyboard layout as DVORAK (with the Input Method Switcher applet), what should be ctrl-l (for location bar) in Firefox actually brings up the Print dialog (on DVORAK, L is where P is on QWERTY).

But, I’ve managed to download bazaar now, and the install was simple (just follow INSTALL in the bzr tarball). At some point I’ll badger someone to make an OpenSolaris package for it so you could do “pkg install bzr”, but you can’t do that yet.

The next challenge will be to branch repositories from the host onto a temp drive, build and test.

Good adventures with OpenSolaris

Tuesday, August 5th, 2008

First of all, thanks to everyone who commented on my previous OpenSolaris entry (which wasn’t really positive at all).

I recently tried again - this time starting with an ISO of build 93. I’d recommend completely ignoring the 2008.05 release and going straight for the build 93 image.

Installed easily in VirtualBox, adding the VirtualBox extensions was easy. Select “Devices -> Install Guest Additions” in the VirtualBox menu, then when logged into the OpenSolaris install, do the following:

su

pkgadd -d /media/VBOXADDITIONS_1.6.0_30421/VBoxSolarisAdditions.pkg

(You then say yes, I really do want to install it. Rather obvious. I had to do this step again after the “pkg image-update” below though.) Just logging out and then back in again gets you all the awesomeness you’d expect from running other guests (such as that system released by a large corporation in Redmond).

The “pkg image-update” went as expected, and I’m now running build 94.

I installed SunStudio Express (compilers) pretty easily - “pkg install sunstudio”. Unfortunately, this is all in /opt/SunStudioExpress and not in $PATH, which would have been much more useful. I guess there’s still a bit to go before usability nirvana. Also, there are no .desktop entries, so you have to explicitly run /opt/SunStudioExpress/bin/sunstudio to get the NetBeans GUI. Presumably if I add /opt/SunStudioExpress/bin to PATH, building random software packages will be nicer.

So, I then want bzr so I can pull source repositories. Monty Taylor informs me that the magic packages you want are: SUNWgcc, gcc-dv and SUNWtoo. Then you can build bzr as downloaded from the website. Installed these easily.

However, now trying to get the bzr source:
$ firefox
ld.so.1: firefox-bin: fatal: /usr/lib/firefox/libxul.so: corrupt or truncated file

and then symbol kPStaticModules: referenced symbol not found.

So maybe I shouldn’t have upgraded to build 94…..

It’s certainly in much better shape than the May release, but be warned: it’s still a work in progress and some things may sporadically not work (e.g. Firefox, right now).

Hopefully, some time soon I’ll get a MySQL build (well… really I want MySQL Cluster, and later Drizzle) going and will really be able to hammer these things with dtrace.

OSCON

Monday, July 21st, 2008

Arrived okay - long travel, but in one piece. Staying at the Doubletree.