Fuzzing Firmware – afl-fuzz + skiboot

In what is likely to be a series on how firmware makes some normal tools harder to use, first I’m going to look at american fuzzy lop – a tool for fuzz testing that if you’re not using then you most certainly have bugs it’ll find for you.

I first got interested in afl-fuzz during Erik de Castro Lopo’s excellent linux.conf.au 2016 in Geelong earlier this year: “Fuzz all the things!“. In a previous life, the Random Query Generator managed to find a heck of a lot of bugs in MySQL (and Drizzle). For randgen info, see Philip Stoev’s talk on it from way back in 2009, a recent (2014) blog post on how Tokutek uses it and some notes on how it was being used at Oracle from 2013. Basically, the randgen was a specialized fuzzer that (given a grammar) would randomly generate SQL queries, and then (if the server didn’t crash), compare the result to some other database server (e.g. your previous version).

The afl-fuzz fuzzer takes a different approach – it’s a much more generic fuzzer rather than a targeted tool. Also, while tools such as the random query generator are extremely powerful and find specialized bugs, they’re hard to get started with. A huge benefit of afl-fuzz is that it’s really, really simple to get started with.

Basically, if you have a binary that takes input on stdin or as a (relatively small) file, afl-fuzz will just work and find bugs for you – read the Quick Start Guide and you’ll be finding bugs in no time!

For firmware of course, we’re a little different than a simple command line program as, well, we aren’t one! Luckily though, we have unit tests. These are just standard binaries that include a bunch of firmware code and get run in user space as part of “make check”. Also, just like unit tests for any project, people do send me patches that break tests (which I reject).

Some of these tests act on data we get from a place – maybe reading other parts of firmware off PNOR or interacting with data structures we get from other bits of firmware. For testing this code, it can be relatively easy to (for the test), read these off disk.

For skiboot, there’s a data structure we get from the service processor on FSP machines called HDAT. Basically, it’s just like the device tree, but different. Because yet another binary format is always a good idea (yes, that is laced with a heavy dose of sarcasm). One of the steps in early boot is to parse the HDAT data structure and convert it to a device tree. Luckily, we structured our code so that creating a unit test that can run in userspace was relatively easy, we just needed to dump this data structure out from a running machine. You can see the test case here. Basically, hdat_to_dt is a binary that reads the HDAT structure out of a pair of files and prints out a device tree. One of the regression tests we have is that we always produce the same output from the same input.

So… throwing that into AFL yielded a couple of pretty simple bugs, especially around aborting out on invalid data (it’s better to exit the process with failure rather than hit an assert). Nothing too interesting here on my simple input file, but it does mean that our parsing code exits “gracefully” on invalid data.

Another utility we have is actually a userspace utility for accessing the gard records in the flash. A GARD record is a record of a piece of hardware that has been deconfigured due to a fault (or a suspected fault). Usually this utility operates on PNOR flash through /dev/mtd – but really what it’s doing is talking to the libflash library, that we also use inside skiboot (and on OpenBMC) to read/write from flash directly, via /dev/mtd or just from a file. The good news? I haven’t been able to crash this utility yet!

So I modified the pflash utility to read from a file to attempt to fuzz the partition reading code we have for the partitioning format that’s on PNOR. So far, no crashes – although to even get it going I did have to fix a bug in the file handling code in pflash, so that’s already a win!

But crashing bugs aren’t the only type of bugs – afl-fuzz has exposed several cases where we act on uninitialized data. How? Well, we run some test cases under valgrind! This is the joy of user space unit tests for firmware – valgrind becomes a tool that you can run! Unfortunately, these bugs have been sitting in my “todo” pile (which is, of course, incredibly long).

Where to next? Fuzzing the firmware calls themselves would be nice – although that’s going to require a targeted tool that knows about what to pass each of the calls. Another round of afl-fuzz running would also be good, I’ve fixed a bunch of the simple things and having a better set of starting input files would be great (and likely expose more bugs).

Carbon footprint of interpreted languages

Thought from a good discussion with at François at OSDC today, what is the carbon footprint of various languages? He mentioned that the carbon footprint of a new Haskell compiler release is remarkably non-trivial due to every Haskell package in Debian needing to be rebuilt.

So, I thought, what’s the impact of something like Python? (or Perl). Every machine running the code has to do the bytecode compilation/JIT/interpretation of that code so when, say, Ubuntu ships some new version of $random_desktop_thing_written_in_python, we’re actually compiling it well over 20 million times. That’s a remarkably non-trivial amount of CPU time (and thus CO2 emissions).

So, program in compiled languages such as C or C++ as doing so will save polar bears.

A better set of Boost m4 macros

I just replaced the old Pandora boost m4 macros in a project with boost.m4 from https://github.com/tsuna/boost.m4 and it basically just solved all my problems with Boost and the whole set of distributions that I build for (everything from CentOS/RHEL 5 to Debian unstable).

I like things that other people maintain.

Stewart’s dot twenty rule

I realised I haven’t written on this for a while and I was asked about it again today.

Stewart’s dot twenty rule is that a piece of software is never really mature until a dot twenty release.

This was a variant of “never use a dot zero release” which has been around the industry for a long time (i.e. always wait for X.0.1).

My first written observation on my variant on this rule was back in 2006:

This is a really stupid metric of software maturity. It is, however, disturbingly accurate.

It seems to continue to be both really stupid and disturbingly accurate. The first few point releases are still going to have rough edges and once you get to about 5 you likely have something that’s intensely usable for a good number of people, by dot 10 the more complex use cases should start to be okay and once you get to dot twenty, then you could say it’s mature.

A topic for another time is how releasing often is one thing but maintaining a release is quite another.

Previously:

Fun with Coverity found bugs (Episode 1)

Taking the inspiration of  great series of blog posts “Fun with Bugs” (and not http://funwithbugs.com/ which is about both caring for and eating bugs), and since I recently went and run Coverity against Drizzle, I thought I’d have a small series of posts on bugs that it has found (and I’ve fixed).

An idea that has been pervasive in the Drizzle project (and one that I rather subscribe to) is that there is two types of correct: correct and obviously correct. Being obviously correct is much, much better than merely being correct.

The first category of problems that Coverity found was kind of interesting, there was a warning that data_file_name and index_file_name in class ha_myisam weren’t initialized in the ha_myisam constructor nor in any function that it calls. It turns out that this was basically because the code wasn’t exactly optimal, and these variables were used kind of oddly. In fact, in writing this blog post I went back and found that there’s a bunch of extra dead code and these should just be removed, along with the code that “used” them.

The historical use for data_file_name and index_file_name were that (in MySQL) you could specify different paths for MyISAM data and index files, so that the FRM ended up in the server datadir, the data file ended up some other place and the index file was off behind the sofa. Since MyISAM is used only for temporary tables in Drizzle, this is entirely not needed.

Another place where a similar bug was found by Coverity was in the SQLExecutor class of the json_server plugin. The _err variable wasn’t initialized in the constructor. After some careful auditing, I think this was actually a false positive as it was set to something before being used, but it was pretty simple to prevent future bugs by initializing it.

Two instances of the same warning, one just found a bunch of code to delete (rather useful) and the other is rather minor but may help someone in the future.

Coming up next: total embarrassment bugs.

Limiting functions to 32k stack in Drizzle (and scoped_ptr)

I wonder if this comes under “Code Style” or not…

Anyway, Monty and I finished getting Drizzle ready for adding “-Wframe-larger-than=32768” as a standard compiler flag. This means that no function within the Drizzle source tree can use greater than 32kb stack – it’s a compiler warning – and with -Werror, it means that it’s a build error.

GCC is not perfect at detecting stack usage, but it’s pretty good.

Why have we done this?

Well, there is a little bit of recursion in the server… and we can craft queries to blow a small stack (not so good). On MacOS X, the default thread stack size is only 512kb. This gives not many frames if 32kb stack is a even remotely common.

I found some interesting places to throw a lot of things on the stack too – that would be rather far down on a callchain – leading to the possibility of blowing up in really strange ways.

We’d love to make it 16kb…. but that’s a fair bit more work, so something for the future.

We’ve used the Boost scoped_ptr to address a bunch of these situations as it provides pretty much minimal code change for the same effect (except that memory is dynamically allocated instead of as part of the stack frame).

A tale of a bug…

So I sometimes get asked if we funnel back bug reports or patches back to MySQL from Drizzle. Also, MariaDB adds some interest here as they are a lot closer (and indeed compatible with) to MySQL. With Drizzle, we have deviated really quite heavily from the MySQL codebase. There are still some common areas, but they’re getting rarer (especially to just directly apply a patch).

Back in June 2009, while working on Drizzle at Sun, I found a bug that I knew would affect both. The patch would even directly apply (well… close, but I made one anyway).

So the typical process of me filing a MySQL bug these days is:

  • Stewart files bug
  • In the next window of Sveta being awake, it’s verified.

This happened within a really short time.

Unfortunately, what happens next isn’t nearly as awesome.

Namely, nothing. For a year.

So a year later, I filed it in launchpad for MariaDB.

So, MariaDB is gearing up for a release, it’s a relatively low priority bug (but it does have a working, correct and obvious patch), within 2 months, Monty applied it and improved the error checking around it.

So MariaDB bug 588599 is Fix Committed (June 2nd 2010 – July 20th 2010), MySQL Bug 45377 is still Verified (July 20th 2009 – ….).

(and yes, this tends to be a general pattern I find)

But Mark says he gets things through… so yay for him.2

Shocked and Stunned (that code exists and does work)

#define READ_ALL		1	/* openfrm: Read all parameters */
#define EXTRA_RECORD		8	/* Reservera plats f|r extra record */

and later on….

  if (prgflag & (READ_ALL+EXTRA_RECORD))
    records++;

Feel free to think about that for a second.

(I have an urge to add this to questions asked in a job interview…)

stringstream is completely useless (and why C++ should have a snprintf)

  1. It’s easy to screw up thread safety.
    If you’re trying to format something for output (e.g. leading zeros, only 1 decimal place or whatever… you know, format specifiers in printf) you are setting a property on the stream, not on what you’re converting. So if you have a thread running that sets a format, adds something to the stream, and then unsets the format, you cannot have another thread able to come in and do something to that stream. Look out for thread unsafe cout code.
  2. You cannot use streams for any text that may need to be translated.
    gettext is what everybody uses. You cannot get a page into the manual before it tells you that translators may want to change the order of what you’re printing. This goes directly against stringstream.
  3. You need another reason? Number 2 rules it out for so much handling it’s not funny.

Progress in nofrm branch

“Ban FRM Now!” branch in Launchpad

Now we’re reading part of the table information out of the proto file on disk instead of the frm.

Not everything (yet) but a bit. Good first steps. Had to fix bugs along the way as well (and find weirdness in FRM file format…).

Progress is being made.

magic number super fun happy time

umm…..

int Field_timestamp::store(double nr)
{
  int error= 0;
  if (nr < 0 || nr > 99991231235959.0)
  {
    set_datetime_warning(DRIZZLE_ERROR::WARN_LEVEL_WARN,
                         ER_WARN_DATA_OUT_OF_RANGE,
                         nr, DRIZZLE_TIMESTAMP_DATETIME);
    nr= 0;					// Avoid overflow on buff
    error= 1;
  }
  error|= Field_timestamp::store((int64_t) rint(nr), false);
  return error;
}

(likely the same in mysql as well… haven’t checked though). these date and time things scare me.

Drizzle progress… (testing can be good)

We’ve been working on fixing up the remaining test cases so that they run with Drizzle. We’ve found: bugs in Drizzle, bugs in MySQL (one that seems to have been there for at least 10 years), bugs in the tests, tests that no longer apply and occationally, something like this:


/* Please god, will someone rewrite this to be readable :( */
if (to->pack_length() == from->pack_length() &&
!(to->flags & UNSIGNED_FLAG && !(from->flags & UNSIGNED_FLAG)) &&
to->real_type() != DRIZZLE_TYPE_ENUM &&
(to->real_type() != DRIZZLE_TYPE_NEWDECIMAL || (to->field_length == from->field_length && (((Field_num*)to)->dec == ((Field_num*)from)->dec))) &&
from->charset() == to->charset() &&
to->table->s->db_low_byte_first == from->table->s->db_low_byte_first &&
(!(to->table->in_use->variables.sql_mode & (MODE_NO_ZERO_DATE | MODE_INVALID_DATES)) || (to->type() != DRIZZLE_TYPE_DATE && to->type() != DRIZZLE_TYPE_DATETIME)) &&
(from->real_type() != DRIZZLE_TYPE_VARCHAR || ((Field_varstring*)from)->length_bytes == ((Field_varstring*)to)->length_bytes))
{ // Identical fields
/* This may happen if one does 'UPDATE ... SET x=x' */
if (to->ptr != from->ptr)
memcpy(to->ptr,from->ptr,to->pack_length());
return 0;
}

and no, I haven’t really changed the formatting.

libmallocfail

Bazaar branches of libmallocfail

Simple LD_PRELOAD library that will take parameters via environment variables and cause malloc() to occationally fail.

Aim was to use this to test bits of MySQL/Drizzle although since their libtool based stuf, the binary in tree is a libtool shell script, and I haven’t found a way to LD_PRELOAD only for mysqld and not the shell script and the other processes spawned by it.

I have found a bug in libc though :)

MemberDB speed improvements

So I finally installed the xdebug PHP extension and started doing some performance analysis of MemberDB using xdebug and kcachegrind. The upshot of which is a number of commits to the bzr tree that dramatically improve performance in several key areas. The answer? Caching.

I’m not even talking using memcached or caching things in database tables or anything like that – just about everything is still the same dynamically produced content as before, but I’m now caching some simple things avoiding many round-trips to the database while executing a script.

There were a few things that were taking a fair bit of execution time:

  1. The generation of the menu. In MemberDB, there’s a menu on the left. There’s also a powerful (read: non-trivial) permissions system allowing relatively fine grained granting of permissions. So, we need to check that the user has permission to go to the page before showing the page in the menu.
    Previously, for each item in the menu, we’d do a lookup to the database – checking if they have the permission or they are an admin. This ended up taking a bit of time – up to 30% of the time for the front page was taken up just generating the menu!
    So, now I cache the set of permissions for the user. One function to fetch it from the DB into a structure, another function to check the permissions of the user in that struct.
    While testing this, I actually used memcached to cache the menu to see how much of an improvement I could get… I’m about 69/70ths of the speed of using memcached with a purely PHP implementation caching the permissions info.
  2. Getting the information about a member is done in a variety of places. On some pages, you want information on the current logged in user (or just need to find their member ID). These are now cached for the duration of the script. Saved quite a few DB round trips
  3. When viewing an election (not the results, just the normal “view election” page that lists candidates), we need to get the membership information on a number of users (okay… so technically I should rewrite some of the queries to use joins in the DB… but this was easier). I now have a (limited) cache of membership info. So now, when a member has nominated multiple people, we only pull the member info out of the database once.
  4. Rewrite the “current_members” view. The old one was not as efficient as it could be. While the new one has slightly different semantics (can have duplicate rows, it turns out the use of DISTINCT was adding a bit of execution time, which for a bunch of queries is not needed) it’s significantly quicker.

I used the faithful Apache Bench (ab) to do benchmarks against the modified PHP code. I think the biggest improvement was the view election page which went from about 6seconds/page to 0.2seconds/page.

libeatmydata

Following my successful linux.conf.au talk “Eat My Data: How Everybody Gets POSIX File I/O Wrong“, I started to feel the need to easily be able to have my data eaten.

Okay, not quite. However, when you’ve written your software properly, so it uses fsync() correctly, opening files with O_SYNC or whatever – tests take longer as you’re having to wait for things to hit the rust.

So….. LD_PRELOAD=libeatmydata.so to the rescue! With a POSIX compliant fsync() (that does nothing) and filtering on open(2), it can take your test run times down dramatically.

The only time you shouldn’t use it for your tests is when you end up crashing the machine to test durability (i.e. when the OS doesn’t have the opportunity to cleanly write out the data to disk).

See the libeatmydata project page: http://www.flamingspork.com/projects/libeatmydata/

and the bazaar repository: http://www.flamingspork.com/src/libeatmydata

(it’s seemed to have saved somewhere between 20 and 30% of the time for innodb/ndb tests in mysql-test-run).

SCM performance

Linus is right when he talks about the performance of SCMs…. and that BitKeeper was about the first one to be worth using at all (really).

But as an interesting speed comparison… I’ve managed to pull the latest git (with git) and build it in less time than BitKeeper has taken to pull the latest NDB tree…. hrrm..

One of the reasons I’m so enjoying quilt for every day hacking is that it is blindingly fast.

adding a pluggable information schema table to a pluggable engine in mysql 5.1

Also now up is the patch series in my “ndb-work” tree which small patch for adding INFORMATION_SCHEMA.NDB_NODE_STATUS. It’s nearly useful… I haven’t brought in the nice “id to string” functions in the management client that make pretty printing nice… so not quite end user friendly :)

But it’s a nice patch to learn how to add an INFORMATION_SCHEMA table in a pluggable engine and put some engine specific information in it.

(kudos to the falcon code… which i looked at on how to do it).

Doesn’t take long – this was completed in less than 2hrs while watching and paying attention to sessions…. so should take next to no time if you actually concentrate on it.

Of course, this totally abuses the purity of the information schema.

Experimental NDB Patches

I’ve just put up the current “add node” patch… which is like, totally experimental and kills kittens… but could be interesting for people to have a look at as it progresses. Still lots of work before production ready – but people here at the MySQL Conf have said they’re interested in looking at the code for it.

You can grab a combined patch or the quilt series from:

http://saturn.flamingspork.com/~stewart/ndb-experimental/

Applies to 5.1… at least on a few weeks ago tree.