Fuzzing Firmware – afl-fuzz + skiboot

In what is likely to be a series on how firmware makes some normal tools harder to use, first I’m going to look at american fuzzy lop – a tool for fuzz testing that if you’re not using then you most certainly have bugs it’ll find for you.

I first got interested in afl-fuzz during Erik de Castro Lopo’s excellent linux.conf.au 2016 in Geelong earlier this year: “Fuzz all the things!“. In a previous life, the Random Query Generator managed to find a heck of a lot of bugs in MySQL (and Drizzle). For randgen info, see Philip Stoev’s talk on it from way back in 2009, a recent (2014) blog post on how Tokutek uses it and some notes on how it was being used at Oracle from 2013. Basically, the randgen was a specialized fuzzer that (given a grammar) would randomly generate SQL queries, and then (if the server didn’t crash), compare the result to some other database server (e.g. your previous version).

The afl-fuzz fuzzer takes a different approach – it’s a much more generic fuzzer rather than a targeted tool. Also, while tools such as the random query generator are extremely powerful and find specialized bugs, they’re hard to get started with. A huge benefit of afl-fuzz is that it’s really, really simple to get started with.

Basically, if you have a binary that takes input on stdin or as a (relatively small) file, afl-fuzz will just work and find bugs for you – read the Quick Start Guide and you’ll be finding bugs in no time!

For firmware of course, we’re a little different than a simple command line program as, well, we aren’t one! Luckily though, we have unit tests. These are just standard binaries that include a bunch of firmware code and get run in user space as part of “make check”. Also, just like unit tests for any project, people do send me patches that break tests (which I reject).

Some of these tests act on data we get from a place – maybe reading other parts of firmware off PNOR or interacting with data structures we get from other bits of firmware. For testing this code, it can be relatively easy to (for the test), read these off disk.

For skiboot, there’s a data structure we get from the service processor on FSP machines called HDAT. Basically, it’s just like the device tree, but different. Because yet another binary format is always a good idea (yes, that is laced with a heavy dose of sarcasm). One of the steps in early boot is to parse the HDAT data structure and convert it to a device tree. Luckily, we structured our code so that creating a unit test that can run in userspace was relatively easy, we just needed to dump this data structure out from a running machine. You can see the test case here. Basically, hdat_to_dt is a binary that reads the HDAT structure out of a pair of files and prints out a device tree. One of the regression tests we have is that we always produce the same output from the same input.

So… throwing that into AFL yielded a couple of pretty simple bugs, especially around aborting out on invalid data (it’s better to exit the process with failure rather than hit an assert). Nothing too interesting here on my simple input file, but it does mean that our parsing code exits “gracefully” on invalid data.

Another utility we have is actually a userspace utility for accessing the gard records in the flash. A GARD record is a record of a piece of hardware that has been deconfigured due to a fault (or a suspected fault). Usually this utility operates on PNOR flash through /dev/mtd – but really what it’s doing is talking to the libflash library, that we also use inside skiboot (and on OpenBMC) to read/write from flash directly, via /dev/mtd or just from a file. The good news? I haven’t been able to crash this utility yet!

So I modified the pflash utility to read from a file to attempt to fuzz the partition reading code we have for the partitioning format that’s on PNOR. So far, no crashes – although to even get it going I did have to fix a bug in the file handling code in pflash, so that’s already a win!

But crashing bugs aren’t the only type of bugs – afl-fuzz has exposed several cases where we act on uninitialized data. How? Well, we run some test cases under valgrind! This is the joy of user space unit tests for firmware – valgrind becomes a tool that you can run! Unfortunately, these bugs have been sitting in my “todo” pile (which is, of course, incredibly long).

Where to next? Fuzzing the firmware calls themselves would be nice – although that’s going to require a targeted tool that knows about what to pass each of the calls. Another round of afl-fuzz running would also be good, I’ve fixed a bunch of the simple things and having a better set of starting input files would be great (and likely expose more bugs).