libmallocfail

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occationally fail.

Aim was to use this to test bits of MySQL/Drizzle although since their libtool based stuf, the binary in tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.

I have found a bug in libc though :)

Goodbye FRM (or at least the steps to it)

Since before MySQL was MySQL, there has been the .FRM file. Of course, what it really wanted to be was “.form” -  a file that stored how to display a form on your (green) CRT. Jan blogged earlier in the year on this still being there, even in MySQL 5.1 (albeit not in any useful form).

So why do we want it to die?

Well… it’s not exactly very useful anymore.

There are a few things it’s used for….

If database/table.frm exists, the table exists (or, on Windows, you may also get databasetable.frm). This is tested in a few bits in the code by a call to access(2).

Most engines have their own data dictionaries (Innodb, PBXT, NDB, Falcon). Keeping these in sync with the FRMs can be problematic at best. This is especially true with distributed engines such as NDB.

The current solution is that on the SQL node that is creating the table, we create the FRM file, gzip it, and store it in the cluster. Then, other nodes, if they go “err… no local frm” first call ha_create_table_from_engine() which NDB will go and see if the table exists in the cluster. If so, it copies the FRM from the cluster to local disk and then the SQL server continues on its way with the standard way of opening a table (through the FRM). If you do DDL through the NDB-API (and not via SQL) then well… you get to keep both pieces.

As for if you crash during a table rename (with any engine with its own data dictionary.. e.g. InnoDB)… you again get to keep both pieces. (There is a bit of discussion on this over here)

Having FRM files also doesn’t especially lead to having multiple versions of table metadata co-existing in the server.

The fun part of reading a frm is open_binary_frm in table.cc. It reads in the frm into a TABLE_SHARE. If we only had some other way of filling out a TABLE_SHARE… one from the engine itself…

But what about any metadata that the engine data dictionary doesn’t have? For example, many server types may map to 1 engine type. An example of this is the GIS types in MySQL. For most engines, these just map to BLOBs. The engine itself has no knowledge about that, but we should fill out the table definition correctly…. so for this type of thing the engine may need to store some additional metadata. This is pretty easy for transactional engines: put it in a table! (although you then have your own problem about keeping this synchronised with any DDL). For engines that don’t have their own data dictionary, we can just provide a set of routines to store/read a frm type file (based on protobufs no doubt).

There also seems to be some entanglement with LOCK_open. Ahhh LOCK_open, the lock that nobody can possibly understand.

The tricky thing will be not rewriting every little bit from scratch all at once but rather go for the incremental bits….

Perhaps you’re not all stupid…. although…

As we (the rest of the world) heave a huge sigh of relief at the result of the US Presidential election it’s morning here and I’m happy that Obama is the president elect.

However, it looks like Proposition 8 will get up in California. EPIC FAIL. I hope the remaining ballots tip the result in the other direction, I really do.

I don’t see how the state (where state includes country and state… i.e. governments) has any right to legislate anything to do with human relationships.

What is marriage? Beyond the personal committment between consenting adults, it sets up a few things in case one party is incapacitated or dies. So why don’t we have a set of forms for this information?

In case I’m unable to make my own medical decisions, a list of people who (in order) I want to make them for me. If I’m not married, surely I still have a right to say who gets to make medical decisions for me.

On the “who gets my stuff when I die” front, a legal will pretty much covers that.

Some places seem to have tax benefits if you’re married… which to me is much like the government interfering in something that is none of its business. Surely being with a partner is its own reward, not something you do for tax incentives.

So you’re left with things like: shared property if relationship ends, right to adopt, access to fertility treatment etc

Shared property: How is this different from “two friends buy a house together, share it. they then fall out/decide to sell and there is dispute as one put more $$ into it”? It’s not, it’s just on a larger scale.

All I can say on who gets to reproduce is that it is surely a child is best raised by people who love them and can provide them with a safe, nurturing and positive environment.

Solution? Abolish Marriage.

Well… at least any legal definition thereof and everybody can live happily ever after.

Except that’s not what the issue is. It’s a debate between those who believe in personal freedom and right to live as you choose (on the proviso that it does not cause harm to others) and those who do not.

I have exactly zero time for those who think abortion should be illegal. When no child on this earth dies of hunger or easily preventative disease, I will then hear your argument. Odds are I will disagree with you, but until that day, you are just heartless.

But for the moment, we have until January 20th to see what more Bush/Cheney can do to make Americans constantly apologise for their country.

There is a great deal of hope for Obama, so I say this: don’t fuck up.

(Oh, and please don’t follow our new guys and try and erect a great internet firewall… because the club of countries that do that: Iran, Myanmar, North Korea, and Syria is one you really want to be part of).

I remain cautiously optimistic.

Use MySQL, get elected President of the United States

Jonathan puts it in slighty different words, and doesn’t gaurantee The White House to everybody.

I do wonder when we’ll get a Drizzle or NDB using president though….

Singing in the Rain

The past 3 years, 11 months I have worked full time on NDB (MySQL Cluster). It’s been awesome. Love the product and people. In the time I’ve been on the Cluster team, we’ve gone from a small group that would easily fit in the (old old) Stockholm office to one that requires large rooms to house us all in. It’s also been all about smart people (you have to be to work on a distributed database).

With MySQL Cluster 6.4 we’re getting in a bunch of features that have been on the “wide adoption” wishlist. With each release of NDB we’ve gained a wedge of applications that can be used with it – and 6.4 is no exception.

One of the biggest things that’s been worked on is multithreaded data nodes. If you check out Jonas‘ recent posts on 500,000 reads/sec and then a massive 700,000 reads/sec.

We’ve also got a Microsoft Windows port coming up, which a number of people have asked for over the years. Mostly I think this is a “I want to try it out” thing and not a deployment thing. (can any sane person deploy a HA app on Win32?)

I’ve used “NDB$INFO” as the ultimate answer to any problem for a while now. It’s been the much-wanted monitoring interface. We have a lot of info inside NDB that currently isn’t easily user accessible (or only accessible through the magic DUMP interface or by gathering up many events in the cluster log). We have the start of NDB$INFO in 6.4 now and Martin will be continuing my work in making it truly awesome.

So go and grab the 6.4 tree and have a look – things are looking sweet.

What next for me?

Well… a while ago I started hacking on Drizzle. Why? Well… I thought we could move the database server in a new direction and make it more modular, leaner, meaner query machine.

And now, I’m starting to work on it full time.

It’s exciting, and I’ll be blogging on the first TODO which is remove the FRM file and switch to a full discovery method shortly.

UPDATE: Yes, I’m working full time on Drizzle for Sun Microsystems (in the CTO group). While not spending work time on NDB anymore, no doubt you’ll still see fun-time patches.

Microsoft Calls Today Global Anti-Piracy Day

Slashdot | Microsoft Calls Today Global Anti-Piracy Day

Happy with my software freedom thanks, news at 11.

suspend works again!

yay – latest package updates for ubuntu made suspend and resume work again for me. yet to try some of the xrandr stuff to see if its fixed yet.

on compiling with –disable-assert

It’s like removing the brakes from your car. yes, it will go faster (slightly less weight) but, dude, you just removed the brakes.

MySQL Cluster (NDB) on Win32 progress

Many things have been happenning in the land of NDB on Win32 as of late.

I’ve fixed about 700 compiler warnings (some of which were real bugs) leaving about 161 to go on Win32 (VS2003). We’re getting a few more warnings on Win64 (some of which look merely semantic, while others could be real bugs), but the main focus now is getting 32bit going really well.

I fixed a number of bugs that were around preventing lots of things from working properly:

Disk Data (i.e. CREATE TABLESPACE, CREATE LOGFILE GROUP, and CREATE TABLE… TABLESPACE ts1 STORAGE DISK) now works. The main problem here was that our filesystem abstraction layer for the NDB kernel (ndbd) once had a Win32 port… which has sorely bitrotted over the years. As new features were introduced to the file IO interface, they (of course) weren’t also added to the Win32 abstraction. In the disk data case, the OM_INIT feature, which on FSOPENREQ (open a file) allows data to be passed in for initialising the file. Previously, I fixed this to allocate the file on disk and create a file of the same size, but i didn’t add the feature that writes initial data to the file. This caused bugs as soon as you tried to use the disk data tables (the files weren’t initialised, so you hit asserts on corrupt disk data files).

Paths in the server: for whatever bizarre and stupid reason, the MySQL server can end up having paths to a table as ./database/table OR .\database\table. The latter *never* shows up on non-Win32 platforms but can *sometimes* show up on Win32. Ick ick ick ick. Anyway, we (in the NDB handler) weren’t dealing with this properly, causing problems around some metadata ops.

Our pushbuild system takes each push to a source tree, builds it on a variety of platfroms and runs the mysql-test-run.pl test suite. The Win32 hosts are actually running on vmware. In order to make tests run faster, on Linux we use /dev/shm for the data files. Microsoft Windows doesn’t have a good ram disk, so we create a file on /dev/shm on the host and map that as a drive inside Windows (and format it as NTFS). This drive is only 1GB. This is not enough disk space for running all the clusters (yes, plural) started by the test suite (and everything would die with ENOSPC). The workaround I’ve come up with is that for debug builds, we simply enable NTFS file compression on files ndbd creates.

Win64 is also working! Pushbuild builds and runs on 64bit, and the Win64 host is building with NDB and passing about the same amount of tests as the Win32 hosts!

The bad news is that the NDB with replication tests are pretty much all failing… so I’m fairly confident that cluster replication is very broken on Win32 (and 64) at the moment.

I’ve had to do a fair amount of fixing on a bunch of the test cases (mainly to do with finding where various NDB utilities are). They’ve also prompted fixes in NDB (automatically converting / to \ in ndbd on Win32 for CREATE DATAFILE/UNDOFILE).

If you want to give it a go – you can get the source from launchpad. Either in the mysql-5.1-telco-6.4 tree, or if you want a few more things fixed, always have a look at the mysql-5.1-telco-6.4-win tree. Hopefully both are synced with the latest internal trees (i.e. plain 6.4 is working on win32) by the time you read this.

Iggy and I discussed installers for NDB on Windows in Riga, and we should have something soon-ish for those of you who don’t build from source.

No opt-out of filtered Internet

Computerworld – No opt-out of filtered Internet (via Chris)

EPIC FAIL. Looks like I’m going to have to reconsider which end of the ballot paper the ALP goes at (hint, it’s now towards the bottom).

Pay no attention to the criminals behind the curtain… we’re just going to let them do their thing and make sure it’s hard for you to see them.

With a false positive rate (incorrectly blocking something) of 10,000/1,000,000 (otherwise known as 1%) we’ll no doubt see things blocked that shouldn’t be.

What happened to my free country?

evolution-data-server even worse (is that possible?)

Just caught it using 713MB of resident memory. What the fuck? I don’t even have Evolution running! There’s only the clock applet (which does pull things out of calendar i guess…).

Does Evolution win the prize for worst piece of free software yet?

new NetworkManager VPNC not better at all (in fact, much worse)

I upgraded to Ubuntu 8.10 the other day, NetworkManager promptly forgot my wireless LAN key (grr… lucky I keep a copy in a text file) as well as my VPN configuration. It’s also changed the UI for entering what specific networks to route over the VPN (’cause the last thing you want is putting all your traffic through VPN when you have a perfectly good internet connection here… or even worse, I do *not* need to go via Sydney or the US to access the machine 2ft from me thank you very much).

Generally not happy with the new NetworkManager.

getarg calls srand() ???

storage/ndb/test/src/getarg.c

Guess what? It calls srand(time(NULL)) in getarg(). Why you ask? well.. what you want to be able to when specifying a flag is have it be true, false or it could “maybe” be set.

That’s right kids… maybe.

I’m sure it’s used somewhere in our test suite to get coverage on different things.. but umm.. yeah, interesting discovery for today.

Visual Studio 2008 unreferenced local variable bug

screenshot ’cause typing is for wusses

UPDATE: not actually VS bug. Nasty macro defining strtok_r to strtok on Win32. ouch.

NDB Windows port shaping up…

It’s getting there. The tree should now pretty much always compile, and (at least mostly) doesn’t break anything on other platforms. It even works on win32… at least basic functionality. There will be monsters (bugs.. but scarier, becuase it’s win32).

Best use of the word ‘fuck’ for today

A Web OS? Are You Dense? – Ted Dziuba

All this crap about moving everything into another 5 layers of abstraction down just means we’re never going to have a computer that’s visibly any faster than an Apple // to any user sitting in front of it… Sure, it can recalculate your spreadsheet faster (as long as it’s not an online spreadsheet running in a browser), but menus take longer to appear.

4GB of RAM and swapping

Evolution: why exactly do you need 89MB when you start yet 898MB now?

Somehow I feel that 4GB of physical memory and 4GB of swap (well.. 2GB used) should be enough to run an email client, web browser, office suite, play some tunes, use a calendar, IRC and IM.

GIMPshop.com GPL fail

GIMPshop.com – A GIMP hack by Scott Moschella
“For example, the GPL does not necessarily allow access to the original source code”

umm… a FAIL that’s perhaps EPIC.

(Courtesy of Mary Gardiner’s identi.ca feed)

SetFileValidData Function (Windows) – Now with added FAIL

SetFileValidData Function (Windows)

There seems to be two options on Win32 for preallocating disk space to files.

Basically, I want a equivilent to posix_fallocate or the ever wonderful xfsctl XFS_IOC_RESVSP64 call.

The idea being to (quickly) create a large file on disk that is stored efficiently (i.e. isn’t fragmented).

From SQL, you’d do something like “CREATE LOGFILE GROUP lg1 ADD UNDOFILE ‘uf1’ INITIAL_SIZE 1G;” and expect a 1GB file on disk. One way of getting this is calling write() (or WriteFile() on Win32) repeatedly until you’ve written a 1GB file full of zeros. This means you’re generating approximately 1GB of IO.

Except it’s worse than that: every time you extend the file, you’re going to be changing the metadata (file and free space information). If you’re lucky, you won’t be using a file system that writes a new transaction to the journal for each time you do this.

If your file system allocator doesn’t like you today (even more likely when you’ve got more than one process doing IO), you may end up with rather fragmented files as well – especially if you’re doing synchronous IO. So you want some method of saying “this file will be size X, please allocate disk space to it in the most efficient way for a file size of X” as it’s not possible to infer this from everyday IO calls (I guess the Win32 CopyFile and CopyFileEx calls could though).

It probably doesn’t do it, but having a CopyFile call would be neat for copy on write file systems and saving space… although I wonder how many Win32 apps would cope with ENOSPC on a write to an existing part of a file.

On IRIX we used the magic xfsctl() with the XFS_IOC_RESVSP64 argument. On Linux (with XFS), we use the same. On ext2/ext3 the only way to get the same has been to (with the file system unmounted), parse the file system and implement it yourself. Although (and this just in) the brand new fallocate() call should help with this. The posix_fallocate() call in GNU libc has just been a wrapper around the simple method of writing 0 to a file from start to end (albeit rather efficiently).

XFS implements something called “unwritten extents”. An unwritten extent says “this range of blocks is allocated to this file. If reading from this range, return a zero page. If writing, split the unwritten extent into 3 parts: before, the newly written extent (which isn’t unwritten: i.e. now valid data), and the after extent.” Simple, rather efficient and gets really good allocation as XFS gets to search the free space btrees based on size.

So what to do on Win32 (apart from drink heavily to try and make it all go away)?

There’s SetFileValidData, but that needs special permissions and may expose previously deleted data from other users. i.e. massive security hole. FAIL

There’s SetEndOfFile which, quoting the MS docs: “If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined.” Not exactly reassuring… but introduced in W2k, so rather safe to use today. Doesn’t save you from having to fill the file with zeros as part of initialisation though.

There’s SetFileInformationByHandle, which looks like it may do exactly what I want… if you read between the lines of the documentation. But it’s only supported starting with Vista. Which you all use of course, so that’s not a problem.

Building MySQL on Windows – MySQL Forge Wiki

Building MySQL on Windows – MySQL Forge Wiki

This one covers running mysqld in the VisualStudio debugger, which can be useful.

I have no special ndb_mgmd.exe or ndbd.exe in debugger instructions or wisdom (running them from mysql-test-run.pl at least). I’ve attached debugger to already running (started by mysql-test-run.pl) ndb processes, but haven’t made any changes to mtr to make it like the mysqld of “go and enter this”.