Fast Reset, Trusted Boot and the security of /sbin/reboot

In OpenPOWER land, we’ve been doing some work on Secure and Trusted Boot while at the same time doing some work on what we call fast-reset (or fast-reboot, depending on exactly what mood someone was in at any particular time…. we should start being a bit more consistent).

The basic idea for fast-reset is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then cleanup a few bits of hardware (we do re-probe PCIe for example), and reload & restart the bootloader (petitboot).

What this means is that typing “reboot” on the command line goes from a ~90-120+ second affair (through firmware to petitboot, linux distros still take ages to shut themselves down) down to about a 20 second affair (to petitboot).

If you’re running a (very) recent skiboot, you can enable it with a special hidden NVRAM configuration option (although we’ll likely enable it by default pretty soon, it’s proving remarkably solid). If you want to know what that NVRAM option is… Use the source, Luke! (or git history, but I’ve yet to see a neat Star Wars reference referring to git commit logs).

So, there’s nothing like a demo. Here’s a demo with Ubuntu running off an NVMe drive on an IBM S822LC for HPC (otherwise known as Minsky or Garrison) which was running the HTX hardware exerciser, through fast-reboot back into Petitboot and then booting into Ubuntu and auto-starting the exerciser (HTX) again.

Apart from being stupidly quick when compared to a full IPL (Initial Program Load – i.e. boot), since we’re not rebooting out of band, we have no way to reset the TPM, so if you’re measuring boot, each subsequent fast-reset will result in a different set of measurements.

This may be slightly confusing, but it’s not really a problem. You see, if a machine is compromised, there’s nothing stopping me replacing /sbin/reboot with something that just prints things to the console that look like your machine rebooted but in fact left my rootkit running. Indeed, fast-reset and a full IPL should measure different values in the TPM.

It also means that if you ever want to re-establish trust in your OS, never do a reboot from the host – always reboot out of band (e.g. from a BMC). This, of course, means you’re trusting your BMC to not be compromised, which I wouldn’t necessarily do if you suspect your host has been.

Windows NT4 for PowerPC guest on OPAL on POWER8 in qemu

Sometimes, programming is just for fun. This is what PREPHV is for Andrei Warkentin. To quote the README:

“This is mostly a huge ugly hack, derived from my
ppc64le_hello code. The running philosophy here is
to throw things together late at night with my family
asleep and see how far I get without a real design
or without a real desire to implement boring things
like IDE (*sigh*) emulation”

Since my day job is maintaining the firmware that it runs on, I decided to have a go (it also ties in with the retro stuff I’ve been blogging about). So…

screenshot-from-2016-10-30-17-22-20and I’m off! (yes, this is the very latest qemu and skiboot):

screenshot-from-2016-10-30-17-23-32screenshot-from-2016-10-30-17-23-48Yes, prephv does clear all thirty two megabytes of guest memory

screenshot-from-2016-10-30-17-24-15A quick diversion, if you try Windows NT 3.51 for PowerPC, you get this:

screenshot-from-2016-10-30-18-17-35

But on NT4, you continue unharmed:

screenshot-from-2016-10-30-17-22-32A sign I needed to hack my filesystem of bits of NT installer bits a bit more:

screenshot-from-2016-10-30-17-22-45But, on my next try:

screenshot-from-2016-10-30-17-25-26Well… looks like there’s an instruction that needs to be emulated (and there’s no code to currently do that). Mind you… this is decently far into booting before we hit anything fatal, which is a pretty impressive effort – and it is tempting to continue and see if it’ll run on real hardware and if it could be made to work well enough to not find any disks :)

Workaround for opal-prd using 100% CPU

opal-prd is the Processor RunTime Diagnostics daemon, the userspace process that on OpenPower systems is responsible for some of the runtime diagnostics. Although a userspace process, it memory maps (as in mmap) in some code loaded by early firmware (Hostboot) called the HostBoot RunTime (HBRT) and runs it, using calls to the kernel to accomplish any needed operations (e.g. reading/writing registers inside the chip). Running this in user space gives us benefits such as being able to attach gdb, recover from segfaults etc.

The reason this code is shipped as part of firmware rather than as an OS package is that it is very system specific, and it would be a giant pain to update a package in every Linux distribution every time a new chip or machine was introduced.

Anyway, there’s a bug in the HBRT code that means if there’s an ECC error in the HBEL (HostBoot Error Log) partition in the system flash (“bios” or “pnor”… the flash where your system firmware lives), the opal-prd process may get stuck chewing up 100% CPU and not doing anything useful. There’s https://github.com/open-power/hostboot/issues/67 for this.

You will notice a problem if the opal-prd process is using 100% CPU and the last log messages are something like:

HBRT: ERRL:>>ErrlManager::ErrlManager constructor.
HBRT: ERRL:iv_hiddenErrorLogsEnable = 0x0
HBRT: ERRL:>>setupPnorInfo
HBRT: PNOR:>>RtPnor::getSectionInfo
HBRT: PNOR:>>RtPnor::readFromDevice: i_offset=0x0, i_procId=0 sec=11 size=0x20000 ecc=1
HBRT: PNOR:RtPnor::readFromDevice: removing ECC...
HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0

(the parameters to readFromDevice may differ)

Luckily, there’s a simple workaround to fix it all up! You will need the pflash utility. Primarily, pflash is meant only for developers and those who know what they’re doing. You can turn your computer into a brick using it.

pflash is packaged in Ubuntu 16.10 and RHEL 7.3, but you can otherwise build it from source easily enough:

git clone https://github.com/open-power/skiboot.git
cd skiboot/external/pflash
make

Now that you have pflash, you just need to erase the HBEL partition and write (ECC) zeros:

dd if=/dev/zero of=/tmp/hbel bs=1 count=147456
pflash -P HBEL -e
pflash -P HBEL -p /tmp/hbel

Note: you cannot just erase the partition or use the pflash option to do an ECC erase, you may render your system unbootable if you get it wrong.

After that, restart opal-prd however your distro handles restarting daemons (e.g. systemctl restart opal-prd.service) and all should be well.

Compiling your own firmware for Barreleye (OpenCompute OpenPOWER system)

Aaron Sullivan announced on the Rackspace Blog that you can now get your own Barreleye system! What’s great is that the code for the Barreleye platform is upstream in the op-build project, which means you can build your own firmware for them (just like garrison, the “IBM S822LC for HPC” system I blogged about a few days ago).

Remarkably, to build an image for the host firmware, it’s eerily similar to any other platform:

git clone --recursive https://github.com/open-power/op-build.git
cd op-build
. op-build-env
op-build barreleye_defconfig
op-build

…and then you wait. You can cross compile on x86.

You’ve been able to build firmware for these machines with upstream code since Feb/March (I wouldn’t recommend running with builds from then though, try the latest release instead).

Hopefully, someone involved in OpenBMC can write on how to build the BMC firmware.

Compiling your own firmware for the S822LC for HPC

IBM (my employer) recently announced  the new S822LC for HPC POWER8+NVLINK NVIDIA P100 GPUs server (press release, IBM Systems Blog, The Register). The “For HPC” suffix on the model number is significant, as the S822LC is a different machine. What makes the “for HPC” variant different is that the POWER8 CPU has (in addition to PCIe), logic for NVLink to connect the CPU to NVIDIA GPUs.

There’s also the NVIDIA Tesla P100 GPUs which are NVIDIA’s latest in an SXM2 form factor, but instead of delving into GPUs, I’m going to tell you how to compile the firmware for this machine.

You see, this is an OpenPOWER machine. It’s an OpenPOWER machine where the vendor (in this case IBM) has worked to get all the needed code upstream, so you can see exactly what goes into a firmware build.

To build the latest host firmware (you can cross compile on x86 as we use buildroot to build a cross compiler):

git clone --recursive https://github.com/open-power/op-build.git
cd op-build
. op-build-env
op-build garrison_defconfig
op-build

That’s it! Give it a while and you’ll end up with output/images/garrison.pnor – which is a firmware image to flash onto PNOR. The machine name is garrison as that’s the code name for the “S822LC for HPC” (you may see Minsky in the press, but that’s a rather new code name, Garrison has been around for a lot longer as a name).

Using Smatch static analysis on OpenPOWER OPAL firmware

For Skiboot, I’m always looking at new automated systems to find bugs in the code. A little while ago, I read about the Smatch tool developed by some folks at Oracle (they also wrote about using it on the Linux kernel).

I was eager to try it with skiboot to see if it could find anything.

Luckily, it was pretty easy. I built Smatch according to their documentation and then built skiboot:

make CHECK="/home/stewart/smatch/smatch" C=1 -j20 all check

Due to some differences in how we implement abort() and assert() in skiboot, I added “_abort”, “abort” and “assert_fail” to smatch_data/no_return_funcs in the Smatch source tree to silence some false positives.

It seems that there’s a few useful warnings there (some of which I’ve fixed in skiboot master already), along with some false positives around the preprocessor/complier tricks we do to ensure at compile time that an OPAL call definition has the correct number of arguments specified.

So far, so good though. Try it on your project!

Building OPAL firmware for POWER9

Recently, we merged into the op-build project (the build scripts for OpenPOWER Firmware) a defconfig for building OPAL for (certain) POWER9 simulators. I won’t bother linking over to articles on the POWER9 chip or schedule (there’s search engines for that), but with this commit – if you happen to be able to get your hands on a POWER9 simulator, you can now boot to the petitboot bootloader on it!

We’re using upstream Linux 4.7.0-rc3 and upstream skiboot (master), so all of this code is already upstream!

Now, by no means is this complete. There’s some fairly fundamental things that are missing (e.g. PCI) – but how many other platforms can you build open source firmware for before you can even get your hands on a simulator?

Fuzzing Firmware – afl-fuzz + skiboot

In what is likely to be a series on how firmware makes some normal tools harder to use, first I’m going to look at american fuzzy lop – a tool for fuzz testing that if you’re not using then you most certainly have bugs it’ll find for you.

I first got interested in afl-fuzz during Erik de Castro Lopo’s excellent linux.conf.au 2016 in Geelong earlier this year: “Fuzz all the things!“. In a previous life, the Random Query Generator managed to find a heck of a lot of bugs in MySQL (and Drizzle). For randgen info, see Philip Stoev’s talk on it from way back in 2009, a recent (2014) blog post on how Tokutek uses it and some notes on how it was being used at Oracle from 2013. Basically, the randgen was a specialized fuzzer that (given a grammar) would randomly generate SQL queries, and then (if the server didn’t crash), compare the result to some other database server (e.g. your previous version).

The afl-fuzz fuzzer takes a different approach – it’s a much more generic fuzzer rather than a targeted tool. Also, while tools such as the random query generator are extremely powerful and find specialized bugs, they’re hard to get started with. A huge benefit of afl-fuzz is that it’s really, really simple to get started with.

Basically, if you have a binary that takes input on stdin or as a (relatively small) file, afl-fuzz will just work and find bugs for you – read the Quick Start Guide and you’ll be finding bugs in no time!

For firmware of course, we’re a little different than a simple command line program as, well, we aren’t one! Luckily though, we have unit tests. These are just standard binaries that include a bunch of firmware code and get run in user space as part of “make check”. Also, just like unit tests for any project, people do send me patches that break tests (which I reject).

Some of these tests act on data we get from a place – maybe reading other parts of firmware off PNOR or interacting with data structures we get from other bits of firmware. For testing this code, it can be relatively easy to (for the test), read these off disk.

For skiboot, there’s a data structure we get from the service processor on FSP machines called HDAT. Basically, it’s just like the device tree, but different. Because yet another binary format is always a good idea (yes, that is laced with a heavy dose of sarcasm). One of the steps in early boot is to parse the HDAT data structure and convert it to a device tree. Luckily, we structured our code so that creating a unit test that can run in userspace was relatively easy, we just needed to dump this data structure out from a running machine. You can see the test case here. Basically, hdat_to_dt is a binary that reads the HDAT structure out of a pair of files and prints out a device tree. One of the regression tests we have is that we always produce the same output from the same input.

So… throwing that into AFL yielded a couple of pretty simple bugs, especially around aborting out on invalid data (it’s better to exit the process with failure rather than hit an assert). Nothing too interesting here on my simple input file, but it does mean that our parsing code exits “gracefully” on invalid data.

Another utility we have is actually a userspace utility for accessing the gard records in the flash. A GARD record is a record of a piece of hardware that has been deconfigured due to a fault (or a suspected fault). Usually this utility operates on PNOR flash through /dev/mtd – but really what it’s doing is talking to the libflash library, that we also use inside skiboot (and on OpenBMC) to read/write from flash directly, via /dev/mtd or just from a file. The good news? I haven’t been able to crash this utility yet!

So I modified the pflash utility to read from a file to attempt to fuzz the partition reading code we have for the partitioning format that’s on PNOR. So far, no crashes – although to even get it going I did have to fix a bug in the file handling code in pflash, so that’s already a win!

But crashing bugs aren’t the only type of bugs – afl-fuzz has exposed several cases where we act on uninitialized data. How? Well, we run some test cases under valgrind! This is the joy of user space unit tests for firmware – valgrind becomes a tool that you can run! Unfortunately, these bugs have been sitting in my “todo” pile (which is, of course, incredibly long).

Where to next? Fuzzing the firmware calls themselves would be nice – although that’s going to require a targeted tool that knows about what to pass each of the calls. Another round of afl-fuzz running would also be good, I’ve fixed a bunch of the simple things and having a better set of starting input files would be great (and likely expose more bugs).

TianoCore (UEFI) ported to OpenPower

Recently, there’s been (actually two) ports of TianoCore (the reference implementation of UEFI firmware) to run on POWER on top of OPAL (provided by skiboot) – and it can be run in the Qemu PowerNV model.

More details:

1 Million SQL Queries per second: GA MariaDB 10.1 on POWER8

A couple of days ago, MariaDB announced that MariaDB 10.1 is stable GA – around 19 months since the GA of MariaDB 10.0. With MariaDB 10.1 comes some important scalabiity improvements, especially for POWER8 systems. On POWER, we’re a bit unique in that we’re on the higher end of CPUs, have many cores, and up to 8 threads per core (selectable at runtime: 1, 2, 4 or 8/core) – so a dual socket system can easily be a 160 thread machine.

Recently, we (being IBM) announced availability of a couple of new POWER8 machines – machines designed for Linux and cloud environments. They are very much OpenPower machines, and more info is available here: http://www.ibm.com/marketplace/cloud/commercial-computing/us/en-us

Combine these two together, with Axel Schwenke running some benchmarks and you get 1 Million SQL Queries per second with MariaDB 10.1 on POWER8.

Having worked a lot on both MySQL for POWER and the firmware that ships in the S882LC, I’m rather happy that 1 Million queries per second is beyond what it was in June 2014, which was a neat hack on MySQL 5.7 that showed the potential of MySQL on POWER8 but wasn’t yet a product. Now, you can run a GA release of MariaDB on GA POWER8 hardware designed for scale-out cloud environments and get 1 Million SQL queries/second (with fewer cores than my initial benchmark last year!)

What’s even more impressive is that this million queries per second is in a KVM guest!

Running OPAL in qemu – the powernv platform

Ben has a qemu tree up with some work-in-progress patches to qemu to support the PowerNV platform. This is the “bare metal” platform like you’d get on real POWER8 hardware running OPAL, and it allows us to use qemu like my previous post used the POWER8 Functional Simulator – to boot OpenPower firmware.

To build qemu for this, follow these steps:

apt-get -y install gcc python g++ pkg-config libz-dev libglib2.0-dev \
  libpixman-1-dev libfdt-dev git
git clone https://github.com/ozbenh/qemu.git
cd qemu
./configure --target-list=ppc64-softmmu
make -j `grep -c processor /proc/cpuinfo`

This will leave you with a ppc64-softmmu/qemu-system-ppc64 binary. Once you’ve built your OpenPower firmware to run in a simulator, you can boot it!

Note that this qemu branch is under development, and is likely to move/change or even break.

I do it like this:

cd ~/op-build/output/images;  # so skiboot.lid is in pwd
~/qemu/ppc64-softmmu/qemu-system-ppc64 -m 1G -M powernv \
-kernel zImage.epapr -nographic \
-cdrom ~/ubuntu-vivid-ppc64el-mini.iso

and this lets me test that we launch the Ubunut vivid installer correctly.

You can easily add other qemu options such as additional disks or networking and verify that it works correctly. This way, you can do development on some skiboot functionality or a variety of kernel and op-build userspace (such as the petitboot bootloader) without needing either real hardware or using the simulator.

This is useful if, say, you’re running on ppc64el, for which the POWER8 functional simulator is currently not available on.

OPAL firmware specification, conformance and documentation

Now that we have an increasing amount of things that run on top of OPAL:

  1. Linux
  2. hello_world (in skiboot tree)
  3. ppc64le_hello (as I wrote about yesterday)
  4. FreeBSD

and that the OpenPower ecosystem is rapidly growing (especially around people building OpenPower machines), the need for more formal specification, conformance testing and documentation for OPAL is increasing rapidly.

If you look at the documentation in the skiboot tree late last year, you’d notice a grand total of seven text files. Now, we’re a lot better (although far from complete).

I’m proud to say that I won’t merge new code that adds/modifies an OPAL API call or anything in the device tree that doesn’t come with accompanying documentation, and this has meant that although it may not be perfect, we have something that is a decent starting point.

We’re in the interesting situation of starting with a working system, with mainline Linux kernels now for over a year (maybe even 18 months) being able to be booted by skiboot and run on powernv hardware (the more modern the kernel the better though).

So…. if anyone loves going through deeply technical documentation… do I have a project you can contribute to!

FreeBSD on OpenPower

There’s been some work on porting FreeBSD over to run natively on top of OPAL, that is, on bare metal OpenPower machines (not just under KVM).

This is one of four possible things to run natively on an OPAL system:

  1. Linux
  2. hello_world (in skiboot tree)
  3. ppc64le_hello (as I wrote about yesterday)
  4. FreeBSD

It’s great to see that another fully featured OS is getting ported to POWER8 and OPAL. It’s not yet at a stage where you could say it was finished or anything (PCI support is pretty preliminary for example, and fancy things like disks and networking live on PCI).

hello world as ppc66le OPAL payload!

While the in-tree hello-world kernel (originally by me, and Mikey managed to CUT THE BLOAT of a whole SEVENTEEN instructions down to a tiny ten) is very, very dumb (and does one thing, print “Hello World” to the console), there’s now an alternative for those who like to play with a more feature-rich Hello World rather than booting a more “real” OS such as Linux. In case you’re wondering, we use the hello world kernel as a tiny test that we haven’t completely and utterly broken things when merging/developing code.

https://github.com/andreiw/ppc64le_hello is a wonderful example of a small (INTERACTIVE!) starting point for a PowerNV (as it’s called in Linux) or “bare metal” (i.e. non-virtualised) OS on POWER.

What’s more impressive is that this was all developed using the simulator rather than real hardware (although I think somebody has tried it on some now).

Kind of neat!

gcov code coverage for OpenPower firmware

For skiboot (which provides the OPAL boot and runtime firmware for OpenPower machines), I’ve been pretty interested at getting some automated code coverage data for booting on real hardware (as well as in a simulator). Why? Well, it’s useful to see that various test suites are actually testing what you think they are, and it helps you be able to define more tests to increase what you’re covering.

The typical way to do code coverage is to make GCC build your program with GCOV, which is pretty simple if you’re a userspace program. You build with gcov, run program, and at the end you’re left with files on disk that contain all the coverage information for a tool such as lcov to consume. For the Linux kernel, you can also do this, and then extract the GCOV data out of debugfs and get code coverage for all/part of your kernel. It’s a little bit more involved for the kernel, but not too much so.

To achieve this, the kernel has to implement a bunch of stub functions itself rather than link to the gcov library as well as parse the GCOV data structures that GCC generates and emit the gcda files in debugfs when read. Basically, you replace the part of the GCC generated code that writes the files out. This works really nicely as Linux has fancy things like a VFS and debugfs.

For skiboot, we have no such things. We are firmware, we don’t have a damn file system interface. So, what do we do? Write a userspace utility to parse a dump of the appropriate region of memory, easy! That’s exactly what I did, a (relatively) simple user space app to parse out the gcov gcda files from a skiboot memory image – something we can easily dump out of the simulator, relatively easily (albeit slower) from the FSP on an IBM POWER system and even just directly out of a running system (if you boot a linux kernel with the appropriate config).

So, we can now get a (mostly automated) code coverage report simply for the act of booting to petitboot: https://open-power.github.io/skiboot/boot-coverage-report/ along with our old coverage report which was just for the unit tests (https://open-power.github.io/skiboot/coverage-report/). My current boot-coverage-report is just on POWER7 and POWER8 IBM FSP based systems – but you can see that a decent amount of code both is (and isn’t) touched simply from the act of booting to the bootloader.

The numbers we get are only approximate for any code run on more than one CPU as GCC just generates code that does a load/add/store rather than using an atomic increment.

One interesting observation was that (at least on smaller systems, which are still quite large by many people’s standards), boot time was not really noticeably increased.

For more information on running with gcov, see the in-tree documentation: https://github.com/open-power/skiboot/blob/master/doc/gcov.txt

Building OpenPower firmware for use in POWER8 Simulator

Previously, I blogged on how to Run skiboot (OPAL) on the POWER8 Simulator. If you want to build the full Open Power firmware environment, including the Petitboot bootloader and kernel, you can now do so!

My pull request for an op-build target for the simulator has been merged, so you can now do the following three steps to compile a kernel+initramfs to use with your built skiboot for development purposes:

git clone --recursive git@github.com:open-power/op-build.git
cd op-build
. op-build-env
op-build mambo_defconfig && op-build

Then you wait for a whole bunch of time while everything compiles! Afterwards, you should be left with a zImage.epapr in output/images/ that you can copy into your skiboot directory.

With zImage.epapr in your skiboot directory, when you run “make check”, the skiboot test suite will actually launch the simulator to verify that your skiboot code boots all the way to the petitboot prompt!

We now have two boot tests as part of “make check” for skiboot!

More OpenPower Firmware code released: OCC

Inside the IBM POWER8 chip there’s another processor! That’s right folks, you get another CPU for no extra cost (It’s a lot funnier if you say these previous two sentences as if you were presenting an informercial for a special TV offer).

It is, however, not what you’d consider a general purpose processor. It is, in fact, a PowerPC 405 – so your POWER8 processor also has another PowerPC chip in it. What’s the purpose of this chip? It’s named the On Chip Controller and it has the job of helping make the main processor (the POWER8) work.

It has two jobs:

  • Monitor temperature and keep the system thermally safe
  • Monitor power usage and keep the system power safe

It runs a hard Real Time OS which has just been released up on github.com/open-power/occ

There’s more complete documentation on OCC here.

It’s fairly exciting to see more of the software that runs on every POWER8 system make it out into the world.

skiboot-4.1

I just posted this to the mailing list, but I’ve tagged skiboot-4.1, so we have another release! There’s a good amount of changes since 4.0 nearly a month ago and this is the second release since we hit github back in July.

For the full set of changes, “git log” is your friend, but a summary of them follows:

  • We now build with -fstack-protector and -Werror
  • Stack checking extensions when built with STACK_CHECK=1
  • Reduced stack usage in some areas, -Wstack-usage=1024 now.
    • Some functions could use 2kb stack, now all are <1kb
  • Unsafe libc functions such as sprintf() have been removed
  • Symbolic backtraces
  • expose skiboot symbol map to OS (via device-tree)
  • removed machine check interrupt patching in OPAL
  • occ/hbrt: Call stopOCC() for implementing reset OCC command from FSP
  • occ: Fix the low level ACK message sent to FSP on receiving {RESET/LOAD}_OCC
  • hardening to errors of various FSP code
    • fsp: Avoid NULL dereference in case of invalid class_resp bits
    • abort if device tree parsing fails
    • FSP: Validate fsp_msg in fsp_queue_msg
    • fsp-elog: Add various NULL checks
  • Finessing of when to use error log vs prerror()
  • More i2c work
  • Can now run under Mambo simulator (see external/mambo/skiboot.tcl) (commonly known as “POWER8 Functional Simulator”)
  • Document skiboot versioning scheme
  • opal: Handle more TFAC errors.
    • TB_RESIDUE_ERR, FW_CONTROL_ERR and CHIP_TOD_PARITY_ERR
  • ipmi: populate FRU data
  • rtc: Add a generic rtc cache
  • ipmi/rtc: use generic cache
  • Error Logging backend for bmc based machines
  • PSI: Drive link down on HIR
  • occ: Fix clearing of OCC interrupt on remote fix

So, who worked on this release? We had 84 csets from 17 developers. A total of 3271 lines were added, 1314 removed (delta 1957).

Developers with the most changesets
Stewart Smith 24 28.6%
Benjamin Herrenschmidt 17 20.2%
Alistair Popple 8 9.5%
Vasant Hegde 6 7.1%
Ananth N Mavinakayanahalli 5 6.0%
Neelesh Gupta 4 4.8%
Mahesh Salgaonkar 4 4.8%
Cédric Le Goater 3 3.6%
Wei Yang 3 3.6%
Anshuman Khandual 2 2.4%
Shilpasri G Bhat 2 2.4%
Ryan Grimm 1 1.2%
Anton Blanchard 1 1.2%
Shreyas B. Prabhu 1 1.2%
Joel Stanley 1 1.2%
Vaidyanathan Srinivasan 1 1.2%
Dan Streetman 1 1.2%
Developers with the most changed lines
Benjamin Herrenschmidt 1290 35.1%
Alistair Popple 963 26.2%
Stewart Smith 344 9.4%
Mahesh Salgaonkar 308 8.4%
Ananth N Mavinakayanahalli 198 5.4%
Neelesh Gupta 186 5.1%
Vasant Hegde 122 3.3%
Shilpasri G Bhat 39 1.1%
Vaidyanathan Srinivasan 24 0.7%
Joel Stanley 21 0.6%
Wei Yang 20 0.5%
Anshuman Khandual 15 0.4%
Cédric Le Goater 12 0.3%
Shreyas B. Prabhu 9 0.2%
Ryan Grimm 3 0.1%
Anton Blanchard 2 0.1%
Dan Streetman 2 0.1%
Developers with the most lines removed
Mahesh Salgaonkar 287 21.8%
Developers with the most signoffs (total 54)
Stewart Smith 44 81.5%
Vasant Hegde 4 7.4%
Benjamin Herrenschmidt 4 7.4%
Vaidyanathan Srinivasan 2 3.7%
Developers with the most reviews (total 2)
Vasant Hegde 2 100.0%

Running skiboot (OPAL) on the POWER8 Simulator

skiboot is open source boot and runtime firmware for OpenPOWER. On real POWER8 hardware, you will also need HostBoot to do this (basically, to make the chip work) but in a functional simulator (such as this one released by IBM) you don’t need a bunch of hardware procedures to make hardware work, so we can make do with just skiboot.

The POWER8 Functional Simulator is free to use but not open source and is only supported on limited platforms. But you can always run it all in a VM! I have it running this way on my laptop right now.

To go from a bare Ubuntu 14.10 VM on x86_64 to running skiboot in the simulator, I did the following:

  • apt-get install vim git emacs wget xterm # xterm is needed by the simulator. wget and editors are useful things.
  • (download systemsim-p8…deb from above URL)
  • dpkg -i systemsim-p8*deb # now the simulator is installed
  • git clone https://github.com/open-power/skiboot.git # get skiboot source
  • wget https://www.kernel.org/pub/tools/crosstool/files/bin/x86_64/4.8.0/x86_64-gcc-4.8.0-nolibc_powerpc64-linux.tar.xz # get a compiler to build it with
  • apt-get install make gcc valgrind # get build tools (skiboot unittests run on the host, so get a gcc and valgrind)
  • tar xfJ x86_64-gcc-4.8.0-nolibc_powerpc64-linux.tar.xz
  • mkdir -p /opt/cross
  • mv gcc-4.8.0-nolibc /opt/cross/ # now you have a powerpc64 cross compiler
  • export PATH=/opt/cross/gcc-4.8.0-nolibc/powerpc64-linux/bin/:$PATH # add cross compiler to path
  • cd skiboot
  • make # this should build a bunch of things, leaving you with skiboot.lid (and other things). If you have many CPUs, feel free to make -j128.
  • make check # run the unit tests. Everything should pass.
  • cd external/mambo
  • /opt/ibm/systemsim-p8/run/pegasus/power8 -f skiboot.tcl # run the simulator

The last step there will barf as you unlikely have a /tmp/zImage.epapr sitting around that’s suitable. If you use op-build to build a full set of OpenPower foo, you’ll likely be able to extract it from there. Basically, the skiboot.tcl script is adding a payload for skiboot to execute. On real hardware, this ends up being a Linux kernel with a small userspace and petitboot (link is to IBM documentation for IBM POWER8 systems). For the simulator, you could boot any tiny zImage.epapr you like, it should detect OPALv3 and boot!

Even if you cannot be bothered building a kernel or petitboot environment, if you comment out the associated lines in skiboot.tcl, you should be able to run the simulator and see the skiboot console message come up that says we couldn’t load a kernel.

At this point, congratulations, you can now become an OpenPower firmware hacker without even possessing any POWER8 hardware!

skiboot/OPAL versioning

skiboot is boot and runtime firmware for OpenPower systems. There are other components that make up all the firmware you need, but if you’re, say, a Linux kernel, you’re going to be interacting with skiboot.

I recently committed doc/versioning.txt to skiboot to try and explain our current thoughts on versioning releases.

It turns out that picking version numbers is a bit harder than you’d expect, especially when you want to construct a version string to display in places that has semantic meaning. In fact, the writing on Semantic Versioning influenced us heavily.

Since we’re firmware, making incompatible API changes is something we should basically never, ever do. Old kernels should must boot and work on new firmware and new kernels should boot and function on old firmware (and if they don’t, it plainly be a kernel bug). So, ignore the Major version parts of Semantic Versioning for us :)

For each new release, we plan to bump the minor version for mostly bug fix releases, while bump the major version for added functionality. Any additional information is to describe the version on that particular platform – as everybody shipping OPAL is likely to build it themselves with possibly some customizations (e.g. YOUR COMPANY NAME HERE, support for some on board RAID card or on-board automated coffee maker). See doc/versioning.txt for details.

You may wonder why we started at 4.0 for our first real version number. Well… this is purely a cunning plan to avoid confusion with other things, the details of which will only be extracted out of my when plied with a suitable amount of excellent craft beer (because if I’m going to tell a boring story, I may as well have awesome craft beer).