So, ten years ago (how is that even possible… it seems like it was just a couple of years ago), the first commit landed in the libeatmydata repository (now in git on GitHub rather than in bzr on Launchpad). The first implementation was literally just this:
```c
int fsync(int fd)
{
	return 0;
}
```
Soooo…. kind of incredibly simple. But, hey, it worked! Little did I know that these two lines of code were going to grow into 166 lines of C in order to do it a bit more “properly”.
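What does “properly” involve? As one hedged, simplified example of the kind of thing the grown-up version has to care about (this is my sketch, not the actual libeatmydata source): a convincing no-op fsync() should still fail with EBADF on a bad file descriptor, the way the real one does.

```c
#include <errno.h>
#include <fcntl.h>

int fsync(int fd)
{
	/* A believable no-op: real fsync() reports EBADF for an
	 * invalid descriptor, so a fake one should too. */
	if (fcntl(fd, F_GETFD) == -1) {
		errno = EBADF;
		return -1;
	}
	return 0;	/* your data is totally on disk. promise. */
}
```

Because LD_PRELOAD works by symbol interposition, defining a function with the same name and signature is all it takes for every fsync() call in the process to land here instead of in libc.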
My initial use case was making the MySQL test suite run faster: 30% faster back then! In fact, it was better than using tmpfs! It’s still used for that (even though I no longer hack on MySQL with any regularity), see github issue #1 for a recent bug that cropped up.
Since then, I’ve become aware of eatmydata being used to build entire operating systems and running in production in way too many places (on way too many machines). The probability that any given human who’s used a computer in the past 10 years has used libeatmydata, used a package built with it, or used a service with it running somewhere in production is so close to 1 that I don’t want to think about it.
Well… here’s to the next ten years of eating data!
I updated the web site for libeatmydata (woah!): http://flamingspork.com/projects/libeatmydata/ and the launchpad page: https://launchpad.net/libeatmydata to reflect this too.
New exciting things in the land of libeatmydata:
- sync_file_range is now wrapped (thanks to Phillip Susi)
- I now bundle the eatmydata helper script originally included in the debian packages
- the autotools foo now builds on Mac OS X
- I modified the eatmydata helper script to also set the right DYLD environment variables if it’s running on Darwin, i.e. the eatmydata helper script now runs on Mac OS X too (well, it should – please test)
- libeatmydata should now work just about everywhere that can LD_PRELOAD. Patches welcome.
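For the curious, wrapping a newer call like sync_file_range follows exactly the same pattern as fsync: define a matching symbol in a preloaded shared object. A minimal sketch (hypothetical simplification, not the actual libeatmydata code; the file name and build line are illustrative):

```c
/* nosync.c — a toy sync_file_range wrapper in the spirit of
 * libeatmydata (the real wrapper does more).
 * Build:  cc -shared -fPIC -o nosync.so nosync.c
 * Run:    LD_PRELOAD=./nosync.so your_program
 */
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

int sync_file_range(int fd, off64_t offset, off64_t nbytes,
                    unsigned int flags)
{
	(void)offset; (void)nbytes; (void)flags;

	/* Keep the error behaviour a caller would expect... */
	if (fcntl(fd, F_GETFD) == -1) {
		errno = EBADF;
		return -1;
	}
	return 0;	/* ...but never actually sync anything */
}
```

On Darwin the same trick needs DYLD_INSERT_LIBRARIES (and a dylib) instead of LD_PRELOAD, which is what the helper script change above is about.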
If anyone knows how to build a non-versioned shared library using autotools… I’d love to hear it. libeatmydata is totally not something that needs soname versioning. I guess it’s harmless though.
Read the following:
Linux has its fair share of dumb things with data too (ext3 not defaulting to write barriers is a good one). This one, however, is particularly nasty… I’d have really hoped there were some good tests in place for this.
This should also be a good warning to anybody implementing advanced storage systems: we database guys really do want to be able to write things reliably and you really need to make sure this works.
So, Stewart’s current list of stupid shit you have to do to ensure a 1MB disk write goes to disk in a portable way:
- You’re a database, so you’re using O_DIRECT
- Use < 32k disk writes
- write 32–64MB of sequential data to hopefully force everything out of the drive’s write cache and onto the platter so it survives power failure (because barriers may not be on). Increase this based on whatever caching system happens to be in place; if you think there may be battery-backed RAID, maybe 1GB or 2GB of data writes
- If you’re extending the file, don’t bother… that especially seems to be buggy. Create a new file instead.
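If you want to see what that checklist looks like as code, here’s a heavily hedged sketch. Every name and magic number below is mine, not gospel: 16KB chunks to stay under the 32k limit, a caller-chosen amount of filler to try to blow the drive cache out, and it assumes `len` is a multiple of the chunk size (O_DIRECT demands aligned, block-multiple I/O anyway).

```c
#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (16 * 1024)	/* stay under the 32k limit above */

/* Hypothetical helper: write `len` bytes of `src` to a brand-new file
 * (never extend an existing one), in small O_DIRECT chunks, then append
 * `flush_bytes` of sequential filler to try to push the drive's write
 * cache out. Assumes len and flush_bytes are multiples of CHUNK. */
static int paranoid_write(const char *path, const void *src, size_t len,
                          size_t flush_bytes)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL | O_DIRECT, 0644);
	if (fd == -1)
		return -1;

	void *buf;		/* O_DIRECT wants an aligned buffer */
	if (posix_memalign(&buf, 4096, CHUNK) != 0) {
		close(fd);
		return -1;
	}

	int rc = 0;
	/* the payload, in < 32k pieces */
	for (size_t off = 0; off < len && rc == 0; off += CHUNK) {
		memcpy(buf, (const char *)src + off, CHUNK);
		if (write(fd, buf, CHUNK) != CHUNK)
			rc = -1;
	}
	/* then sequential filler to (hopefully) evict the drive cache;
	 * in real life this might go to a separate scratch file */
	memset(buf, 0, CHUNK);
	for (size_t off = 0; off < flush_bytes && rc == 0; off += CHUNK)
		if (write(fd, buf, CHUNK) != CHUNK)
			rc = -1;

	free(buf);
	close(fd);
	return rc;
}
```

Note that open() with O_DIRECT can fail outright (EINVAL) on filesystems that don’t support it, which is its own portability headache.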
Of course you could just assume that the OS kind of gets it right…. *laugh*