Fast Reset, Trusted Boot and the security of /sbin/reboot

In OpenPOWER land, we’ve been doing some work on Secure and Trusted Boot while at the same time doing some work on what we call fast-reset (or fast-reboot, depending on exactly what mood someone was in at any particular time…. we should start being a bit more consistent).

The basic idea for fast-reset is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then cleanup a few bits of hardware (we do re-probe PCIe for example), and reload & restart the bootloader (petitboot).

What this means is that typing “reboot” on the command line goes from a ~90-120+ second affair (through firmware to petitboot, linux distros still take ages to shut themselves down) down to about a 20 second affair (to petitboot).

If you’re running a (very) recent skiboot, you can enable it with a special hidden NVRAM configuration option (although we’ll likely enable it by default pretty soon, it’s proving remarkably solid). If you want to know what that NVRAM option is… Use the source, Luke! (or git history, but I’ve yet to see a neat Star Wars reference referring to git commit logs).

So, there’s nothing like a demo. Here’s a demo with Ubuntu running off an NVMe drive on an IBM S822LC for HPC (otherwise known as Minsky or Garrison) which was running the HTX hardware exerciser, through fast-reboot back into Petitboot and then booting into Ubuntu and auto-starting the exerciser (HTX) again.

Apart from being stupidly quick when compared to a full IPL (Initial Program Load – i.e. boot), since we’re not rebooting out of band, we have no way to reset the TPM, so if you’re measuring boot, each subsequent fast-reset will result in a different set of measurements.

This may be slightly confusing, but it’s not really a problem. You see, if a machine is compromised, there’s nothing stopping me replacing /sbin/reboot with something that just prints things to the console that look like your machine rebooted but in fact left my rootkit running. Indeed, fast-reset and a full IPL should measure different values in the TPM.

It also means that if you ever want to re-establish trust in your OS, never do a reboot from the host – always reboot out of band (e.g. from a BMC). This, of course, means you’re trusting your BMC to not be compromised, which I wouldn’t necessarily do if you suspect your host has been.

Windows NT4 for PowerPC guest on OPAL on POWER8 in qemu

Sometimes, programming is just for fun. This is what PREPHV is for Andrei Warkentin. To quote the README:

“This is mostly a huge ugly hack, derived from my
ppc64le_hello code. The running philosophy here is
to throw things together late at night with my family
asleep and see how far I get without a real design
or without a real desire to implement boring things
like IDE (*sigh*) emulation”

Since my day job is maintaining the firmware that it runs on, I decided to have a go (it also ties in with the retro stuff I’ve been blogging about). So…

screenshot-from-2016-10-30-17-22-20and I’m off! (yes, this is the very latest qemu and skiboot):

screenshot-from-2016-10-30-17-23-32screenshot-from-2016-10-30-17-23-48Yes, prephv does clear all thirty two megabytes of guest memory

screenshot-from-2016-10-30-17-24-15A quick diversion, if you try Windows NT 3.51 for PowerPC, you get this:

screenshot-from-2016-10-30-18-17-35

But on NT4, you continue unharmed:

screenshot-from-2016-10-30-17-22-32A sign I needed to hack my filesystem of bits of NT installer bits a bit more:

screenshot-from-2016-10-30-17-22-45But, on my next try:

screenshot-from-2016-10-30-17-25-26Well… looks like there’s an instruction that needs to be emulated (and there’s no code to currently do that). Mind you… this is decently far into booting before we hit anything fatal, which is a pretty impressive effort – and it is tempting to continue and see if it’ll run on real hardware and if it could be made to work well enough to not find any disks :)

Failed Retro emulation attempts

For reasons that should escape everybody, I went back and looked at some old Operating Systems a little while ago: OS/2 Warp, Windows 3.11 and Microsoft Chicago. So, I went on a little adventure this weekend, largely in failure though.

Windows NT 3.51

This was the first version (err… no, second I think) of Windows NT that I ever used.

Lesson 1: qemu doesn’t expose a SCSI adapter that isn’t virtio-scsi (and I have a feeling there aren’t Windows NT 3.51 installer driver floppies for virtio-scsi)

screenshot_winnt3-1_2016-10-29_191707Lesson 2: OMG I’m so glad I don’t have to wait for things to be read off floppy disks anymore:

screenshot_winnt3-51_2016-10-29_193833Lesson 3: I’d forgotten that the Windows directory on NT 3.51 was different to every other Windows NT version, being \WINNT35

screenshot_winnt3-51_2016-10-29_194040screenshot_winnt3-51_2016-10-29_194139Lesson 3: Yeah, sometimes there’s just fail.

screenshot_winnt3-51_2016-10-29_194147

Windows NT 4.0

This brought the UI of Windows 95 to Windows NT. It was a thing. It required a fairly beefy PC for the day, but it could use two CPUs if you were that amazingly rich (dual Pentium Pro was a thing)

Lesson 1: Windows NT 4 does not like 8GB disks. My idea of “creating a small disk for a VM for an old OS as it probably won’t work well with a 20GB disk” needs to be adjusted. I’m writing this on a system with 8 times more RAM than what I ended up using for a disk for Windows NT 4.

screenshot_winnt4-0_2016-10-29_192840But hey, back to \WINNT rather than \WINNT35 or \WINDOWS

screenshot_winnt4-0_2016-10-29_192937

Lesson 2: Sometimes, full system emulation turns out to be a better idea:

screenshot_winnt4-0_2016-10-29_192418screenshot_winnt4-0_2016-10-29_192304Lesson 3: Remember when Windows couldn’t actually format NTFS in the installer and it installed to FAT and then converted to NTFS? No? Well, aren’t you lucky.

screenshot_winnt4-0_2016-10-29_193102screenshot_winnt4-0_2016-10-29_193235Apple Rhapsody DR2

Before there was MacOS X, there was a project called Rhapsody. This was to take NeXTStep (from NeXT, which Apple bought to get both NeXTStep and Steve Jobs as every internal “let’s replace the aging MacOS” project had utterly failed for the past ten years). Rhapsody was not going to be backwards compatible until everybody said that was a terrible idea and the Blue Box was added (known as Classic) – basically, a para-virtualized VM running the old MacOS 9.

Anyway, for the first two developer releases, it was also available on x86 (not just PowerPC). This was probably because a PowerPC port to Macs was a lot newer than the x86 port.

So, I dusted off the (virtual) Boot and Driver floppies and fired up qemu…..

screenshot_rhapsodydr2_2016-10-29_172732Yeah, MacOS X got a better installer…

screenshot_rhapsodydr2_2016-10-29_173227Hopeful!

screenshot_rhapsodydr2_2016-10-29_182218This was after I decided that using KVM was a bad Idea:

screenshot_rhapsodydr2_2016-10-29_172143

screenshot_rhapsodydr2_2016-10-29_173333Nope… and this is where we stop. There seems to be some issue with ATA drivers? I honestly can’t be bothered to debug it (although… for the PowerPC version… maybe).

MacOS 9.2

Well.. this goes a lot better now thanks to a whole bunch of patches hitting upstream Qemu recently (thanks Ben!)

screenshot-from-2016-10-29-20-37-24screenshot-from-2016-10-29-20-42-10Yeah, I was kind of tempted to set up Outlook Express to read my email…. But running MacOS 9 was way too successful, so I had to stop there :)

Workaround for opal-prd using 100% CPU

opal-prd is the Processor RunTime Diagnostics daemon, the userspace process that on OpenPower systems is responsible for some of the runtime diagnostics. Although a userspace process, it memory maps (as in mmap) in some code loaded by early firmware (Hostboot) called the HostBoot RunTime (HBRT) and runs it, using calls to the kernel to accomplish any needed operations (e.g. reading/writing registers inside the chip). Running this in user space gives us benefits such as being able to attach gdb, recover from segfaults etc.

The reason this code is shipped as part of firmware rather than as an OS package is that it is very system specific, and it would be a giant pain to update a package in every Linux distribution every time a new chip or machine was introduced.

Anyway, there’s a bug in the HBRT code that means if there’s an ECC error in the HBEL (HostBoot Error Log) partition in the system flash (“bios” or “pnor”… the flash where your system firmware lives), the opal-prd process may get stuck chewing up 100% CPU and not doing anything useful. There’s https://github.com/open-power/hostboot/issues/67 for this.

You will notice a problem if the opal-prd process is using 100% CPU and the last log messages are something like:

HBRT: ERRL:>>ErrlManager::ErrlManager constructor.
HBRT: ERRL:iv_hiddenErrorLogsEnable = 0x0
HBRT: ERRL:>>setupPnorInfo
HBRT: PNOR:>>RtPnor::getSectionInfo
HBRT: PNOR:>>RtPnor::readFromDevice: i_offset=0x0, i_procId=0 sec=11 size=0x20000 ecc=1
HBRT: PNOR:RtPnor::readFromDevice: removing ECC...
HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0

(the parameters to readFromDevice may differ)

Luckily, there’s a simple workaround to fix it all up! You will need the pflash utility. Primarily, pflash is meant only for developers and those who know what they’re doing. You can turn your computer into a brick using it.

pflash is packaged in Ubuntu 16.10 and RHEL 7.3, but you can otherwise build it from source easily enough:

git clone https://github.com/open-power/skiboot.git
cd skiboot/external/pflash
make

Now that you have pflash, you just need to erase the HBEL partition and write (ECC) zeros:

dd if=/dev/zero of=/tmp/hbel bs=1 count=147456
pflash -P HBEL -e
pflash -P HBEL -p /tmp/hbel

Note: you cannot just erase the partition or use the pflash option to do an ECC erase, you may render your system unbootable if you get it wrong.

After that, restart opal-prd however your distro handles restarting daemons (e.g. systemctl restart opal-prd.service) and all should be well.