Tracing flash reads (and writes) during boot

On OpenPOWER POWER9 systems, we typically talk to the flash chips that hold firmware for the host (i.e. the POWER9) processor through a daemon running on the BMC (aka service processor) rather than directly.

We have host firmware map “windows” on the LPC bus to parts of the flash chip. This flash chip can in fact be a virtual one, constructed dynamically from files on the BMC.

Since we’re mapping windows into this flash address space, we have some knowledge as to what IO the host is doing to/from the pnor. We can use this to output data in the blktrace format and feed into existing tools used to analyze IO patterns.

So, with a bit of learning of the data format and learning how to drive the various tools, I was ready to patch the BMC daemon (mboxbridge) to get some data out.

An initial bit of data is a graph of the windows into PNOR opened up during an normal boot (see below).

PNOR windows created over the course of a normal boot.

This shows us that over the course of the boot, we open a bunch of windows, and switch them around a fair bit early on. This makes sense as early in boot we do not yet have DRAM working and page in firmware on-demand into L3 cache.

Later in boot, you can see the loading of larger chunks of firmware into memory. It’s also possible to see that this seems to take longer than it should – and indeed, we have a bug there.

Next, by modifying the code again, I introduced recording of when we used a window that the BMC had already cached. While the host will only see one window at a time, the BMC can keep around the ones it prepared earlier in order to avoid IO to the actual flash chips (which are SPI flash, so aren’t incredibly fast).

Here we can see that we’re likely not doing the most efficient things during boot, and there’s probably room for some optimization.

Normal boot but including re-used Windows rather than just created ones

Finally, in order to get finer grained information, I reduced the window size from one megabyte down to 4096 bytes. This will impose a heavy speed penalty as it’ll mean we will have to create a lot more windows to do the same amount of IO, but it means that since we’re using the page size of hostboot, we’ll see each individual page in/out operation that it does during boot.

So, from the next graph, we can see that there’s several “hot” areas of the image, and on the whole it’s not too many pages. This gives us a hint that a bit of effort to reduce binary image size a little bit could greatly reduce the amount of IO we have to do.

4096 byte (i.e. page) size window, capturing the bits of flash we need to read in several times due to being low on memory when we’re L3 cache constrained.

The iowatcher tool also can construct a video of the boot and what “blocks” are being read.

Video of what blocks are read from flash during booting

So, what do we get from this adventure? Well, we get a good list of things to look into in order to improve boot performance, and we get to back these up with data rather than guesswork. Since this also works on unmodified host firmware, we’re measuring what we really boot rather than changing it in order to measure it.

What you need to reproduce this:

Switching to iPhone: Part 1

I have used Android phones since the first one: the G1. I’m one of the (relatively) few people who has used Android 1.0. I’ve had numerous Android phones since then, mostly the Google flagship.

I have fond memories of the Nexus One and Galaxy Nexus, as well as a bunch of time running Cyanogen (often daily builds, because YOLO) to get more privacy preserving features (or a more recent Android). I had a Sony Z1 Compact for a while which was great bang for buck except for the fact the screen broke whenever you looked at it sideways. Great kudos to the Sony team for being so friendly to custom firmware loads.

I buy my hardware from physical stores. Why? Well, it means that the NSA and others get to spend extra effort to insert hardware modifications (backdoors), as well as the benefit of having a place to go to/set the ACCC on to get my rights under Australian Consumer Law.

My phone before last was a Nexus 5X. There were a lot of good things about this phone; the promise of fast charging via USB-C was one, as was the ever improving performance of the hardware and Android itself. Well… it just got progressively slower, and slower, and slower – as if it was designed to get near unusable by the time of the next Google phone announcement.

Inevitably, my 5X succumbed to the manufacturing defect that resulted in a boot loop. It would start booting, and then spontaneously reboot, in a loop, forever. The remedy? Replace it under warranty! That would take weeks, which isn’t a suitable timeframe in this day and age to be without a phone, so I mulled over buying a Google Pixel or my first ever iPhone (my iPhone owning friends assured me that if such a thing happens with an iPhone that Apple would have swapped it on the spot). Not wanting to give up a lot of the personal freedom that comes with the Android world, I spent the $100 more to get the Pixel, acutely aware that having a phone was now a near $1000/year habit.

The Google Pixel was a fantastic phone (except the price, they should have matched the iPhone price). The camera was the first phone camera I actually went “wow, I’m impressed” over. The eye-watering $279 to replace a cracked screen, the still eye-watering cost of USB-C cables, and the seat to process the HDR photos were all forgiven. It was a good phone. Until, that is, less than a year in, the battery was completely shot. It would power off when less than 40% and couldn’t last the trip from Melbourne airport to Melbourne city.

So, with a flagship phone well within the “reasonable quality” time that consumer law would dictate, I contacted Google after going through all the standard troubleshooting. Google agreed this was not normal and that the phone was defective. I was told that they would mail me a replacement, I could transfer my stuff over and then mail in the broken one. FANTASTIC!! This was soooo much better than the experience with the 5X.

Except that it wasn’t. A week later, I rang back to ask what was going on as I hadn’t received the replacement; it turns out Google had lied to me, I’d have to mail the phone to them and then another ten business days later I’d have a replacement. Errr…. no, I’ve been here before.

I rang the retailer, JB Hi-Fi; they said it would take them at least three weeks, which I told them was not acceptable nor a “reasonable timeframe” as dictated by consumer law.

So, with a bunch of travel imminent, I bought a big external USB-C battery and kept it constantly connected as without it the battery percentage went down faster than the minutes ticked over. I could sort it out once I was back from travel.

So, I’m back. In fact, I drove back from a weekend away and finally bit the bullet – I went to pick up a phone who’s manufacturer has a reputation of supporting their hardware.

I picked up an iPhone.

I figured I should write up how, why, my reasons, and experiences in switching phone platforms. I think my next post will be “Why iPhone and not a different Android”.

pwnm-sync: Synchronizing Patchwork and Notmuch

One of the core bits of infrastructure I use as a maintainer is Patchwork (I wrote about making it faster recently). Patchwork tracks patches sent to a mailing list, allowing me as a maintainer to track the state of them (New|Under Review|Changes Requested|Accepted etc), combine them into patch bundles, look at specific series, test results etc.

One of the core bits of software I use is my email client, notmuch. Most other mail clients are laughably slow and clunky, or just plain annoying for absorbing a torrent of mail and being able to deal with it or just plain ignore it but have it searchable locally.

You may think your mail client is better than notmuch, but you’re wrong.

A key feature of notmuch is tagging email. It doesn’t do the traditional “folders” but instead does tags (if you’ve used gmail, you’d be somewhat familiar).

One of my key work flows as a maintainer is looking at what patches are outstanding for a project, and then reviewing them. This should also be a core part of any contributor to a project too. You may think that a tag:unread and to:project-list@foo query would be enough, but that doesn’t correspond with what’s in patchwork.

So, I decided to make a tool that would add tags to messages in notmuch corresponding with the state of the patch in patchwork. This way, I could easily search for “patches marked as New in patchwork” (or Under Review or whatever) and see what I should be reviewing and looking at merging or commenting on.

Logically, this wouldn’t be that hard, just use the (new) Patchwork REST API to get the state of everything and issue the appropriate notmuch commands.

But just going one way isn’t that interesting, I wanted to be able to change the tags in notmuch and have them sync back up to Patchwork. So, I made that part of the tool too.

Introducing pwnm-sync: a tool to sync patchwork and notmuch.

notmuch-hello tag counts for pwnm-sync tagged
patches in patchwork
notmuch-hello tag counts for pwnm-sync tagged
patches in patchwork

With this tool I can easily see the patchwork state of any patch that I have in my notmuch database. For projects that I’m a maintainer on (i.e. can change the state of patches), If I update the patches of that email and run pwnm-sync again, it’ll update the state in patchwork.

I’ve been using this for a few weeks myself and it’s made my maintainer workflow significantly nicer.

It may also be useful to people who want to find what patches need some review.

The sync time is mostly dependent on how fast your patchwork instance is for API requests. Unfortunately, we need to make some improvements on the Patchwork side of things here, but a full sync of the above takes about 4 minutes for me. You can also add a –epoch option (with a date/time) to say “only fetch things from patchwork since that date” which makes things a lot quicker for incremental syncs. For me, I typically run it with an epoch of a couple of months ago, and that takes ~20-30 seconds to run. In this case, if you’ve locally updated a old patch, it will still sync that change up to patchwork.

Implementation Details

It’s a python3 script using the notmuch python bindings, the requests-futures module for asynchronous HTTP requests (so we can have the patchwork server assemble the next page of results while we process the previous one), and a local sqlite3 database to store state in so we can work out what changed locally / server side.

Getting it

Head to https://github.com/stewart-ibm/pwnm-sync or just:

git clone https://github.com/stewart-ibm/pwnm-sync.git

Optimizing database access in Django: A patchwork story

tl;dr: I made Patchwork a lot faster by looking at what database queries were being generated and optimizing them either by making Django produce better queries or by adding better indexes.

Introduction to Patchwork

One of the key bits of infrastructure a bunch of maintainers of Open Source Software use is a tool called Patchwork. We use it for a bunch of OpenPOWER firmware development, several Linux subsystems use it as well as freedesktop.org.

The purpose of Patchwork is to supplement the patches-to-a-mailing-list development work flow. It allows a maintainer to see all the patches that have been posted on the list, How many Acked-by/Reviewed-by/Tested-by replies they have, delegate responsibility for the patch to a co-maintainer, track (and change) the state of the patch (e.g. to “Under Review”, “Changes Requested”, or “Accepted”), and create bundles of patches to help in review and testing.

Since patchwork is an open source project itself, there’s several instances of it out there in common use. One of the main instances is https://patchwork.ozlabs.org/ which is (funnily enough) used by a bunch of people connected to OzLabs for projects that are somewhat connected to OzLabs. e.g. the linuxppc-dev project and the skiboot and petitboot projects. There’s also a kernel.org instance, which is used by some kernel subsystems.

Recent versions of Patchwork have added some pretty cool features such as the ability to integrate with CI systems such as Snowpatch which helps maintainers see if patches submitted are likely to break things.

Unfortunately, there’s also been some complaints that recent version of patchwork have gotten slower than previous ones. This may well be the case, or it could just be that the volume of patches is much higher and there’s load on the database. Anyway, after asking a few questions about what the size and scope was of the patchwork database on ozlabs.org, I went “hrm… this sounds like it shouldn’t really be a problem… perhaps I should look into this”.

Attacking the problem…

Every so often it is revealed that I know a little bit about databases.

Getting a development environment up for Patchwork is amazingly easy thanks to Docker and the great work of the Patchwork maintainers. The only thing you need to load in is an example dataset. I started by importing mail from a few mailing lists I’m subscribed to, which was Good Enough(TM) for an initial look.

Due to how Django forces us to design a database schema though, the suggested method of getting a sample data set will not mirror what occurs in a production system with multiple lists. It’s for this reason that I ended up using a copy of a live dataset for much of my work rather than constructing an artificial one.

Patchwork supports both a MySQL and PostgreSQL database backend. Since the one on ozlabs.org is backed by PostgreSQL, I ended up loading a large dataset into PostgreSQL for most of my work, although I also did some testing with MySQL.

The current patchwork.ozlabs.org instance has a database of around 13GB in side, with about a million patches. You may think this is big, my database brain goes “no, this is actually quite small and everything should be a lot faster than it is even on quite limited hardware”

The problem with ORMs

It turns out that Patchwork is written in Django, an ORM (Object-Relational Mapping) framework in Python – and thus something that pretty effectively obfuscates application code from the SQL being run.

There is one thing that Django misses that could be a pretty big general performance boost to many applications: it doesn’t support composite primary keys. For some databases (e.g. MySQL’s InnoDB engine) the PRIMARY KEY is a clustered index – that is, the physical layout of the rows on disk reflect primary key order. You can use this feature to your advantage and have much higher cache hits of your database pages.

Unfortunately though, we cannot do that with Django, so we lose a bunch of possible performance because of it (especially for queries that are going to have to bring in data from disk). In fact, we’re forced to use an ID field that’ll scatter our rows all over the place rather than do something efficient. You can somewhat get back some of the performance by creating covering indexes, but this costs in terms of index maintenance and disk space.

It should be noted that PostgreSQL doesn’t have a similar concept, although there is a (locking) CLUSTER statement that can (as an offline operation for the table) re-arrange existing rows to be in index order. In my testing, this can give a bit of a boost to performance of some of the Patchwork queries.

With MySQL, you’d look at a bunch of statistics on what pages are being brought in and paged out of the InnoDB buffer pool. With PostgreSQL it’s a bit more complex as it relies heavily on the OS page cache.

My main experience is with MySQL like environment, so I’ve had to re-learn a bunch of PostgreSQL things in this work which was kind of fun. It may be “because of my upbringing” but it seems as if there’s a lot more resources and documentation out in the wild about optimizing MySQL environments than PostgreSQL ones, especially when it comes to documentation around a bunch of things inside the database server. A lot of credit should go to the MySQL Documentation team – I wish the PostgreSQL documentation was up to the same standard.

Another issue is that fetching BLOBs is generally an expensive operation that you want to avoid unless you’re going to use them. Thus, fetching the whole “object” at once isn’t always optimal. The Django query generation appears to be somewhat buggy when it comes to “hey, don’t fetch these columns, I don’t need them”, so you do have to watch what query is produced not just what query you expect to be produced. For example, [01/11] Improve patch listing performance (~3x).

Another issue with Django is how you go from your Python code to an actual SQL query, especially when the produced SQL query is needlessly complex or inefficient. I’ve seen Django always produce an ORDER BY for one table, even when not needed, I’ve also seen it always join tables even when you’re getting no columns from one of them and there’s no way you’re asking for it. In fact, I had to revert to raw SQL for one of my performance improvements as I just couldn’t beat it into submission: [10/11] Be sensible computing project patch counts.

An ORM can be great for getting apps out quickly, or programming in a familiar way. But like many things, an understanding of what is going on underneath is key for extracting maximum performance.

Also, if you ever hear something like “ORM $x doesn’t scale” then maybe that person just hasn’t looked at how to use the ORM better. The same goes for if they say “database $y doesn’t scale”- especially if it’s a long existing relational database such as MySQL or PostgreSQL.

Speeding up listing current patches for a project

17 SQL queries in 4477ms
More than 4 seconds in the database
does not make page load time great.

Fortunately though, the Django development environment lets you really easily dive into what queries are being generated and (at least roughly) where they’re being generated from. There’s a sidebar in your browser that shows how many SQL queries were needed to generate the page and how long they took. The key to making your application go faster is to run fewer queries in less time.

I was incredibly impressed with how easy it was to see what queries were run, where they were run from, and the EXPLAIN output for them.

By clicking on that SQL button on the right side of your browser, you get this wonderful chart of what queries were executed, when, and how long they took. From this, it is incredibly obvious which query is the most problematic: the one that took more than four seconds!

In the dim dark days of web development, you’d have to turn on a Slow Query Log on the database server and then grep through your source code or some other miserable activity. I am so glad I didn’t have to do that.

More than four seconds for a single database query does not make for a nice UX.

This particular query was a real hairy one, the EXPLAIN output from PostgreSQL (and MySQL) was certainly long and involved and would most certainly not feature in the first half of an “Introduction to query optimization” day long workshop. If you haven’t brushed up on various bits of documentation on understanding EXPLAIN, you should! The MySQL EXPLAIN FORMAT=JSON is especially fantastic for getting deep details as to what’s going on with query execution.

The big performance gain here was to have the database be able to execute the query in a much more efficient way by creating two covering indexes for part of the query. To work out what indexes to create, one has to look at the EXPLAIN output and work out why the database is choosing to do either a sequential scan of a large table, or use an index that doesn’t exclude that many rows. In this case, I tweaked the code to slightly change the query that was generated as well as adding a covering index. What we ended up with is something that is dramatically faster.

The main query is ~350x faster than before

You’ll notice that it appears that the first query there takes a lot more time but it doesn’t, it just takes a lot more time relative to the main query.

In fact, this particular page is one that people have mentioned at being really, really slow to load. With the main query now about 350 times faster than it was originally, it shouldn’t be a problem anymore.

A future improvement would be to cache the COUNT() for the common case, as it’s pretty easily computed when new patches come in or states change.

The patches that help this particular page have been submitted upstream here:

Making viewing a patch with comments faster

Now that we can list patches faster, can we make other pages that Patchwork has quicker?

One such page is viewing a patch (or cover letter) that has a lot of comments on it. One feature of Patchwork is that it will display all the email replies to a patch or cover letter in the Web UI. But… this seemed slow

On one of the most commented patches I could find, we ended up executing one hundred and seventy seven SQL queries to view it! If we dove into it, a bunch of the queries looked really really similar…

I’ve got 99 queries where I only need 1.

The problem here is that the Patchwork UI is wanting to find out the name of each person who submitted a comment, and is doing that by querying the ID from a table. What it should be doing instead is a SQL JOIN on the original query and just fetching all that information in one go: make the database server do the work, it’s really good at it.

My patch [02/11] 4x performance improvement for viewing patch with many comments   does just that by using the select_related() method correctly, as well as being explicit about what information we want to retrieve.

We’re now only a few milliseconds to grab all the comments

With that patch, we’re down to a constant number of queries and around a 3x-7x faster time executing them depending if we have a warm cache or not.

The one time I had to use raw SQL

When viewing a project page (such as https://patchwork.ozlabs.org/project/qemu-devel/ ) it displays the number of patches (archived and not archived) for the project. By looking at what SQL queries are executed to collect these numbers, you’ll notice two things. First, here are the queries:

COUNT() queries can be expensive

First thing you’ll notice is that they took a loooooong time to execute (more than a second each). The second thing, if you look closer, is that they contain a join which is completely unneeded.

I spent a good long while trying to make Django behave, and I just could not. I believe it’s due to the model having some inheritance in it. Writing the query by hand ended up being the best solution, and it gave a significant performance improvement:

Unfortunately, only 4x faster.

Arguably, a better way would be to precompute the count for the archived/non-archived patches and just display them. I (or someone else who knows more about Django) may want to look at that for a future improvement.

Conclusion and final thoughts

There’s a few more places where there could be some optimizations, but currently I cannot get any single page to take more than between 40-400ms in the database when running on my laptop – and that’s Good Enough(TM) for now.

The next steps are getting these patches through a round or two of review, and then getting them into a Patchwork release and deployed out on patchwork.ozlabs.org and see if people can find any new ways to make things slow.

If you’re interested, the full patchset with cover letter is here: [00/11] Performance for ALL THE THINGS!

The diffstat is interesting, as most of the added code is auto-generated by Django for database migrations (adding of indexes).

 .../migrations/0027_add_comment_date_index.py | 23 +++++++++++++++++
 .../0028_add_list_covering_index.py           | 19 ++++++++++++++
 .../0029_add_submission_covering_index.py     | 19 ++++++++++++++
 patchwork/models.py                           | 21 ++++++++++++++--
 patchwork/templates/patchwork/submission.html | 16 ++++++------
 patchwork/views/__init__.py                   |  8 +++++-
 patchwork/views/cover.py                      |  5 ++++
 patchwork/views/patch.py                      |  7 ++++++
 patchwork/views/project.py                    | 25 ++++++++++++++++---
 9 files changed, 128 insertions(+), 15 deletions(-)
 create mode 100644 patchwork/migrations/0027_add_comment_date_index.py
 create mode 100644 patchwork/migrations/0028_add_list_covering_index.py
 create mode 100644 patchwork/migrations/0029_add_submission_covering_index.py

I think the lesson is that making dramatic improvements to performance of your Django based app does not mean you have to write a lot of code or revert to raw SQL or abandon your ORM. In fact, use it properly and you can get a looong way. It’s just that to use it properly, you’re going to have to understand the layer below the ORM, and not just treat the database as a magic black box.

ccache and op-build

You may have heard of ccache (Compiler Cache) which saves you heaps of real world time when rebuilding a source tree that is rather similar to one you’ve recently built before. It’s really useful in buildroot based projects where you’re building similar trees, or have done a minor bump of some components.

In trying to find a commit which introduced a bug in op-build (OpenPOWER firmware), I noticed that hostboot wasn’t being built using ccache and we were always doing a full build. So, I started digging into it.

It turns out that a bunch of the perl scripts for parsing the Machine Readable Workbook XML in hostboot did a bunch of things like foreach $key (%hash) – which means that the code iterates over the items in hash order rather than an order that would produce predictable output such as “attribute name” or something. So… much messing with that later, I had hostboot generating the same output for the same input on every build.

Next step was to work out why I was still getting a lot of CCACHE misses. It turns out the default ccache size is 5GB. A full hostboot build uses around 7.1GB of that.

So, if building op-build with CCACHE, be sure to set both BR2_CCACHE=y in your config as well as something like BR2_CCACHE_INITIAL_SETUP="--max-size 20G"

Hopefully my patches hit hostboot and op-build soon.

A (simplified) view of OpenPOWER Firmware Development

I’ve been working on trying to better document the whole flow of code that goes into a build of firmware for an OpenPOWER machine. This is partially to help those not familiar with it get a better grasp of the sheer scale of what goes into that 32/64MB of flash.

I also wanted to convey the components that we heavily re-used from other Open Source projects, what parts are still “IBM internal” (as they relate to the open source workflow) and which bits are primarily contributed to by IBMers (at least at this point in time).

As such, let’s start with the legend of the diagram:

Now, the diagram:

Simplified development flow for OpenPOWER firmware

The end thing that a user with a machine will download and apply (or that comes shipped with a box) is the purple “Installable Firmware Release” nodes (bottom center). In this diagram, there are 4 of them. One for POWER9 systems such as the just-announced AC922 system (this is the “OP910 Release” node, which is the witherspoon_defconfig in the op-build tree); one for the p9dsu platform (p9dsu_defconfig in op-build) and one is for IBM FSP based systems such as the S812L and S822L systems (or S812/S822 in OPAL mode).

There are more platforms out there, but this diagram is meant to be simplified. The key difference with the p9dsu platform is that this is produced by somebody other than IBM.

All of these releases are based off the upstream op-build project, op-build is the light blue box in the center of the diagram. We do regular X.Y releases and sometimes do X.Y.Z releases. It’s primarily a pull request based workflow currently, so everything goes via a pull request. The op-build project brings together all the POWER specific firmware components (pretty much everything in every other light blue/blue box) along with a Linux kernel and buildroot.

The kernel and buildroot are the two big yellow boxes on the top right. Buildroot brings together a lot of open source components that are in our firmware image (including some power specific ones that we get through upstream buildroot).

For Linux, this is a pretty simplified view of the process, but we primarily ship the stable tree (with maybe up to half a dozen patches).

The skiboot and petitboot components both use a mailing list based workflow (similar to kernel) as well as X.Y and X.Y.Z releases (again, similar to the linux kernel).

On the far left of the diagram, we have Hostboot, SBE and OCC. These are three firmware components that come from the traditional IBM POWER Firmware group, and are shared with the IBM non-OpenPOWER POWER systems (“traditional” POWER). These components have part of their code from from an (internal) repository called “ekb” which also goes into a (very) low level debug tool and the FSP based systems. There’s also an (internal) gerrit instance that’s the primary place where code review/development discussions are for these components.

In future posts, I’ll probably delve into more specifics of the current development process, and how we may try and change things for the better.

How I do email (at work)

Recently, I blogged on my home email setup and in that post, I hinted that my work setup was rather different. I have entirely separate computing devices for work and personal, a setup I strongly recommend. This also lets me “go home” from work even when working from home, I use a different physical machine!

Since I work for IBM I have (at least) two email accounts for work: a Lotus Notes one and a internet standards compliant one. It’s “easy” enough to get the Notes one to forward to the standards compliant one, from which I can just use fetchmail or similar to pull down mail.

I run mail through a rather simple procmail script: it de-mangles some URL mangling that can happen in the current IBM email infrastructure, runs things through SpamAssassin and deliver to a date based Maildir (or one giant pile for spam).

My ~/.procmailrc looks something like this:

LOGFILE=$HOME/mail_log
LOGABSTRACT=yes

DATE=`date +"%Y%m"`
MAILDIR=Maildir/INBOX
DEFAULT=$DATE/

:0fw
| magic_script_to_unmangle_things

:0fw
| spamc

:0
* ^X-Spam-Status: Yes
$HOME/Maildir/junkmail/incoming/

I use tail -f mail_log as a really dumb kind of biff replacement.

Now, what do I read and write mail with? Notmuch! It is the only thing that even comes close to being able to deal with a decent flow of mail. I have a couple of saved searches just to track how much mail I pull in a day/week. Today (on Monday), it says 442 today and 10,403 over the past week.

For the most part, my workflow is kind of INBOX-ZERO like, except that I currently view victory as INBOX 2000. Most mail does go into my INBOX, the notable exceptions are two main mailing lists I’m subscribed to mostly as FYI and to search/find things when needed. Those are the Linux Kernel Mailing List (LKML) and the buildroot mailing list. Why notmuch rather than just searching the web for mailing list archives? Notmuch can return the result of a query in less time it takes light to get to and from the United States in ideal conditions.

For work, I don’t sync my mail anywhere. It’s just on my laptop. Not having it on my phone is a feature. I have a notmuch post-new hook that does some initial tagging of mail, and as such I have this in my ~/.notmuch-config:

[new]
tags=new;

My post-new hook looks like this:

#!/bin/bash

# immediately archive all messages from "me"
notmuch tag -new -- tag:new and from:stewart@linux.vnet.ibm.com

# tag all message from lists
notmuch tag +devicetree +list -- tag:new and to:devicetree@vger.kernel.org
notmuch tag +inbox +unread -new -- tag:new and tag:devicetree and to:stewart@linux.vnet.ibm.com
notmuch tag +listinbox +unread +list -new -- tag:new and tag:devicetree and not to:stewart@linux.vnet.ibm.com

notmuch tag +linuxppc +list -- tag:new and to:linuxppc-dev@lists.ozlabs.org
notmuch tag +linuxppc +list -- tag:new and cc:linuxppc-dev@lists.ozlabs.org
notmuch tag +inbox +unread -new -- tag:new and tag:linuxppc
notmuch tag +openbmc +list -- tag:new and to:openbmc@lists.ozlabs.org
notmuch tag +inbox +unread -new -- tag:new and tag:openbmc

notmuch tag +lkml +list -- tag:new and to:linux-kernel@vger.kernel.org
notmuch tag +inbox +unread -new -- tag:new and tag:lkml and to:stewart@linux.vnet.ibm.com
notmuch tag +listinbox +unread -new -- tag:new and tag:lkml and not tag:linuxppc and not to:stewart@linux.vnet.ibm.com

notmuch tag +qemuppc +list -- tag:new and to:qemu-ppc@nongnu.org
notmuch tag +inbox +unread -new -- tag:new and tag:qemuppc and to:stewart@linux.vnet.ibm.com
notmuch tag +listinbox +unread -new -- tag:new and tag:qemuppc and not to:stewart@linux.vnet.ibm.com

notmuch tag +qemu +list -- tag:new and to:qemu-devel@nongnu.org
notmuch tag +inbox +unread -new -- tag:new and tag:qemu and to:stewart@linux.vnet.ibm.com
notmuch tag +listinbox +unread -new -- tag:new and tag:qemu and not to:stewart@linux.vnet.ibm.com

notmuch tag +buildroot +list -- tag:new and to:buildroot@buildroot.org
notmuch tag +buildroot +list -- tag:new and to:buildroot@busybox.net
notmuch tag +buildroot +list -- tag:newa nd to:buildroot@uclibc.org
notmuch tag +inbox +unread -new -- tag:new and tag:buildroot and to:stewart@linux.vnet.ibm.com
notmuch tag +listinbox +unread -new -- tag:new and tag:buildroot and not to:stewart@linux.vnet.ibm.com

notmuch tag +ibmbugzilla -- tag:new and from:bugzilla@us.ibm.com

# finally, retag all "new" messages "inbox" and "unread"
notmuch tag +inbox +unread -new -- tag:new

This leaves me with both an inbox and a listinbox. I do not look at the overwhelming majority of mail that hits the listinbox – It’s mostly for following up on individual things. If I started to need to care more about specific topics, I’d probably add something in there for them so I could easily find them.

My notmuch emacs setup has a bunch of saved searches, so my notmuch-hello screen looks something like this:

This gets me a bit of a state-of-the-world-of-email-to-look-at view for the day. I’ll often have meetings first thing in the morning that may reference email I haven’t looked at yet, and this generally lets me quickly find mail related to the problems of the day and start being productive.

How I do email (at home)

I thought I might write something up on how I’ve been doing email both at home and at work. I very much on purpose keep the two completely separate, and have slightly different use cases for both of them.

For work, I do not want mail on my phone. For personal mail, it turns out I do want this on my phone, which is currently an Android phone. Since my work and personal email is very separate, the volume of mail is really, really different. Personal mail is maybe a couple of dozen a day at most. Work is… orders of magnitude more.

Considering I generally prefer free software to non-free software, K9 Mail is the way I go on my phone. I have it set up to point at the IMAP and SMTP servers of my mail provider (FastMail). I also have a google account, and the gmail app works fine for the few bits of mail that go there instead of my regular account.

For my mail accounts, I do an INBOX ZERO like approach (in reality, I’m pretty much nowhere near zero, but today I learned I’m a lot closer than many colleagues). This means I read / respond / do / ignore mail and then move it to an ARCHIVE folder. K9 and Gmail both have the ability to do this easily, so it works well.

Additionally though, I don’t want to care about limits on storage (i.e. expire mail from the server after X days), nor do I want to rely on “the cloud” to be the only copy of things. I also don’t want to have to upload any of past mail I may be keeping around. I also generally prefer to use notmuch as a mail client on a computer.

For those not familiar with notmuch, it does tags on mail in Maildir, is extremely fast and can actually cope with a quantity of mail. It also has this “archive”/INBOX ZERO workflow which I like.

In order to get mail from FastMail and Gmail onto a machine, I use offlineimap. An important thing to do is to set “status_backend = sqlite” for each Account. It turns out I first hacked on sqlite for offlineimap status a bit over ten years ago – time flies. For each Account I also set presynchook = ~/Maildir/maildir-notmuch-presync (below) and a postsynchook = notmuch new. The presynchook is run before we sync, and its job is to move files around based on the tags in notmuch and the postsynchook lets notmuch catch any new mail that’s been fetched.

My maildir-notmuch-presync hook script is:

#!/bin/bash
notmuch search --output=files not tag:inbox and folder:fastmail/INBOX|xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Archive/cur/"

notmuch search --output=files folder:fastmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Spam/cur/"
ARCHIVE_DIR=$HOME/Maildir/INBOX/`date +"%Y%m"`/cur/
mkdir -p $ARCHIVE_DIR
notmuch search --output=files folder:fastmail/Archive and date:..90d and not tag:flagged | xargs -I'{}' mv '{}' "$ARCHIVE_DIR"

# Gmail
notmuch search --output=files not tag:inbox and folder:gmail/INBOX|grep 'INBOX/gmail/INBOX/' | xargs -I'{}' rm '{}'
notmuch search --output=files folder:gmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/gmail/[Gmail].Spam/cur/"

So This keeps 90 days of mail on the fastmail server, and archives older mail off into month based archive dirs. This is simply to keep directory sizes not too large, you could put everything in one directory… but at some point that gets a bit silly.

I don’t think this is all the most optimal setup I could have, but it does let me read and answer mail on my phone and desktop (as well as use a web client if I want to). There is a bit of needless copying of messages by offlineimap under certain circumstances, but I don’t get enough personal mail for it to be a problem.

Ten years of libeatmydata!

So, ten years ago (how is that even possible… it seems like it was just a couple of years ago), there was the first commit in the libeatmydata repository (now in git on github rather than in bzr on launchpad). The first implementation was literally just this:

#include <sys/types.h>
#include 
#include 
#include <sys/stat.h>
#include 

int errno;

int fsync(int fd)
{
       errno=0;
       return 0;
}

Soooo…. kind of incredibly simple. But, hey, it worked! Little did I know, that these two lines of code were going to grow into 166 lines of C in order to do it a bit more “properly”.

My initial use case was making the MySQL test suite run faster: 30% faster back then! In fact, it was better than using tmpfs! It’s still used for that (even though I no longer hack on MySQL with any regularity), see github issue #1 for a recent bug that cropped up.

Since then, I’m aware of eatmydata being used to build entire operating systems and in production in way too many places (on way too many machines). The probability that any given human who’s used a computer in the past 10 years has used libeatmydata, used a package built with it or used a service with it running somewhere in production is so close to 1 that I don’t want to think about it.

Well… here’s to the next ten years of eating data!

Installing Windows on a USB key

For some unknown reason, the Windows installer doesn’t let you install to a USB key. Luckily, there’s a simple workaround. It turns out that only the very first step of installation cares about that. So, if you can fool it (say, by running in qemu), you can have a USB key with a Windows install rather than having to dual boot on your hard disk (e.g. if you run Linux and want all that fast in-built SSD space for Linux)

  1. Download the Windows Installer: https://www.microsoft.com/en-au/software-download/windows10ISO
  2. Start the installer in a VM, with the USB key passed through to the VM as the hard disk (or use a disk image the same size as your USB key for transfer with a utility such as ‘dd’ later). e.g. do:
    qemu-img create -f raw win-installed.img 50G
    
    qemu-system-x86_64 --enable-kvm -m 8G -cdrom Downloads/Win10_1709_English_x64.iso -hda win-installed.img -boot d
  3. At the first reboot of the installer, instead of letting it boot, stop the VM. You are going to copy the install at this state to the USB key.
  4. Boot from the USB key, go through the rest of the installer. You’re done!

Updating Intel Management Engine firmware on a Lenovo without a Windows Install

This is how I updated my Intel ME firmware on my Lenovo X1 Carbon Gen 4 (reports say this also has worked for Gen5 machines). These instructions are pretty strongly inspired by https://news.ycombinator.com/item?id=15744152

Why? Intel security advisory and CVE-2017-5705, CVE-2017-5708, CVE-2017-5711, and CVE-2017-5712 should be reason enough.

You will need:

  • To download about 3.5GB of stuff
  • A USB key
  • Linux installed
  • WINE or a Windows box to run two executables (because self extracting archives are a thing on Windows apparently)
  • A bit of technical know-how. A shell prompt shouldn’t scare you too hard.

Steps:

  1. Go to https://www.microsoft.com/en-au/software-download/windows10ISO and download the 32-bit ISO.
  2. Mount the ISO as a loopback device (e.g. by right clicking and choosing to mount, or by doing ‘sudo mount -o loop,ro file.iso /mnt’
  3. Go to Lenovo web site for Drivers & Software for your laptop, under Chipset, there’s ME Firmware and Software downloads You will need both. It looks like this:
  4. Run both exe files with WINE or on a windows box to extract the archives, you do not need to run the installers at the end.
  5. you now need to extract the management engine drivers. You can do this in  ~/.wine/drive_c/DRIVERS/WIN/AMT, with “cabextract SetupME.exe” or (as suggested in the comments) you can use the innoextract utility (from linux) to extract them (a quick check shows this to work)
  6. Save off HECI_REL folder, it’s the only extracted thing you’ll need.
  7. Go and install https://wimlib.net/ – we’re going to use it to create the boot disk. (it may be packaged for your distro).
    If you don’t have the path /usr/lib/syslinux/modules/bios on your system but you do have /usr/share/syslinux/modules/bios – you will need to change a bit of the file programs/mkwinpeimg.in to point to the /usr/share locations rather than /usr/lib before you install wimlib. This probably isn’t needed if you’re installing from packages, but may be requried if you’re on, say, Fedora.
  8. Copy ~/.wine/drive_c/DRIVERS to a new folder, e.g. “winpe_overlay” (or copy from the Windows box you extracted things on)
  9. Use mkwinpeimg to create the boot disk, pointing it to the mounted Windows 10 installer and the “winpe_overlay”:
    mkwinpeimg -W /path/to/mounted/windows10-32bit-installer/ -O winpe_overlay disk.img
  10. Use ‘dd’ to write it to your USB key
  11. Reboot, go into BIOS and turn Secure Boot OFF, Legacy BIOS ON, and AMT ON.
  12. Boot off the USB disk you created.
  13. In the command prompt of the booted WinPE environment, run the following to start the update:
    cd \
    cd HECI_REL\win10
    drvload heci.inf
    cd \
    cd win\me
    MEUpdate.cmd

    It should look something like this:

  14. Reboot, go back into BIOS and change your settings back to how you started.

ZMODEM saves the day! Or, why my firmware for a machine with a CPU from 2017 contains a serial file transfer protocol from the 1980s

Recently, I added the package lrzsz to op-build in this commit. This package provides the rz and sz commands – for receive zmodem and send zmodem respectively. For those who don’t know, op-build builds a firmware image for OpenPOWER machines, and adding this package adds the commands to the petitboot shell (the busybox environment you get when you “exit to shell” from the boot menu).

For those who aren’t familiar with ZMODEM, you should first get off my lawn, and secondly, know that it’s a method for sending/receiving files over something akin to a serial port, e.g. a modem. The basic protocol is “I want to send you this file named FOO”, “okay, I would like to receive it”, “here’s some data and a checksum, did you get it and does it match the checksum?”, “yes!”, “okay, great, here’s the next bit” until the file is transferred. Importantly, it has a provision for “no, I didn’t get that right” and for bits to be resent.

The one thing that pretty much always somewhat works on a computer is a serial port (or something that looks like a serial port to software). When you connect to the IPMI console (“ipmitool sol activate”), the host sees this as a serial port that it pumps bytes over. With OpenBMC, you can actually connect to this serial port via SSH.

When diagnosing weird problems during firmware bringup such as “why doesn’t PCI work” or “why does my network adapter not work” (or, perhaps, somebody helpfully didn’t plug the network cable in), it can be useful to copy off a bunch of debug information from the machine.

You may say “can’t you just print the log file to the screen and save it?” and you’d be right, you can do that for text – it’s really annoying for binary data though. Plus, there have been bugs in the console implementations on pretty much every BMC I’ve ever used that makes them not as reliable as you’d like.

So, how could we transfer a file over the serial connection we have to the machine? The same way we did on a BBS! Enter ZMODEM. The error recovery properties are perfect in this situation.

So, how do you use it? I’ve found two ways that work well: GNU screen and zssh. For GNU Screen, you’ll want to configure it to catch zmodem traffic by doing “control-a:zmodem catch<ENTER>” (you need the colon there). After that if you execute “sz” on the host and the rest should be obvious! If you wanted to send a file to the host, run “rz” rather than “sz”.

op-test-framework: Let’s break the console!

One of the things I’ve been working on fairly quietly is the test suite for OpenPOWER firmware: op-test-framework. I’ve approach things I’m hacking on from the goal of “when I merge patches into skiboot, can I be confident that I haven’t merged something that’s broken existing functionality?”

By testing host firmware, we incidentally (as well as on purpose) test a whole bunch of BMC functionality. One bit of functionality we rely on a lot is the host “serial” console. Typically, this is exposed to the user over IPMI Serial Over LAN (SOL), or on OpenBMC it’s also exposed as something you can ssh to (which proves to be both faster and more reliable than IPMI, not to mention there’s some remote semblance of security).

When running through some tests, I noticed something pretty odd, it appeared as though we were sometimes missing some console output on larger IOs. This usually isn’t a problem as when we’re using expect(1) (or the python equivalent pexpect) we end up having all sorts of delays here there and everywhere to work around all the terrible things you hope you never learn. So… how could I test that? Well.. what about checking the output of something like dd if=/dev/zero bs=1024 count=16|hexdump -C to see if we get the full output?

Time to add a test to op-test-framework! Adding such a test is pretty easy. If we look at the source of the test I added, we can see what happens (source here).

class Console():
    bs = 1024
    count = 8
    def setUp(self):
        conf = OpTestConfiguration.conf
        self.bmc = conf.bmc()
        self.system = conf.system()

    def runTest(self):
        self.system.goto_state(OpSystemState.PETITBOOT_SHELL)
        console = self.bmc.get_host_console()
        self.system.host_console_unique_prompt()
        bs = self.bs
        count = self.count
        self.assertTrue( (bs*count)%16 == 0, "Bug in test writer. Must be multiple of 16 bytes: bs %u count %u / 16 = %u" % (bs, count, (bs*count)%16))
        try:
            zeros = console.run_command("dd if=/dev/zero bs=%u count=%u|hexdump -C -v" % (bs, count), timeout=120)
        except CommandFailed as cf:
            self.assertEqual(cf.exitcode, 0)
        self.assertTrue( len(zeros) == 3+(count*bs)/16, "Unexpected length of zeros %u" % (len(zeros)))

First thing you’ll notice is that this looks like a Python unittest. It’s because it is. The unittest infrastructure was a path of least resistance, so we started with it. This class isn’t the one that’s actually run, we do a little bit of inheritance magic in order to run the same test with different parameters (see https://github.com/open-power/op-test-framework/blob/6c74fb0fb0993ae8ae1a7aa62ec58e57c0080686/testcases/Console.py#L50)

class Console8k(Console, unittest.TestCase):
    bs = 1024
    count = 8

class Console16k(Console, unittest.TestCase):
    bs = 1024
    count = 16

class Console32k(Console, unittest.TestCase):
    bs = 1024
    count = 32

The setUp() function is pure boiler plate, we grab some objects from the configuration of the test run, namely information about the BMC and the system itself, so we can do things to both. The real magic happens in runTest().

op-test-framework tracks the state of the machine being tested across each test. This means that if we’re executing 101 tests in the petitboot shell, we don’t need to do 101 separate boots to petitboot. The self.system.goto_state(OpSystemState.PETITBOOT_SHELL) statement says “Please ensure the system is booted to the petitboot shell”. Other states include OFF (obvious) and OS, which is when the machine is booted to the OS.

The next two lines ensure we can run commands on the console (where console is IPMI Serial over LAN or other similar connection, such as the SSH console provided by OpenBMC):

console = self.bmc.get_host_console()
self.system.host_console_unique_prompt()

The host_console_unique_prompt() call is a bit ugly, and I’m hoping we fix the APIs so that this isn’t needed. Basically, it sets things up so that pexpect will work properly.

The bit that does the work is the try/except block along with the assertTrue. We don’t currently check that the content is all correct, we just check we got the right *amount* of content.

It turns out, this check is enough to reveal a bug that turns out to be deep in the core Linux TTY layer, and has caused Jeremy some amount of fun (for certain values of fun).

Want to know more about how the console works? Jeremy blogged on it.

Fedora 25 + Lenovo X1 Carbon 4th Gen + OneLink+ Dock

As of May 29th 2017, if you want to do something crazy like use *both* ports of the OneLink+ dock to use monitors that aren’t 640×480 (but aren’t 4k), you’re going to need a 4.11 kernel, as everything else (for example 4.10.17, which is the latest in Fedora 25 at time of writing) will end you in a world of horrible, horrible pain.

To install, run this:

sudo dnf install \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-core-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-cross-headers-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-devel-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-modules-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-tools-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-tools-libs-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/perf-4.11.3-200.fc25.x86_64.rpm

This grabs a kernel that’s sitting in testing and isn’t yet in the main repositories. However, I can now see things on monitors, rather than 0 to 1 monitor (most often 0). You can also dock/undock and everything doesn’t crash in a pile of fail.

I remember a time when you could fairly reliably buy Intel hardware and have it “just work” with the latest distros. It’s unfortunate that this is no longer the case, and it’s more of a case of “wait six months and you’ll still have problems”.

Urgh.

(at least Wayland and X were bug for bug compatible?)

API, ABI and backwards compatibility are a hard necessity

Recently, I was reading a thread on LKML on a proposal to change the behavior of the open system call when confronted with unknown flags. The thread is worth a read as the topic of augmenting things that exist probably by accident to be “better” is always interesting, as is the definition of “better”.

Keeping API and/or ABI compatibility is something that isn’t a new problem, and it’s one that people are pretty good at sometimes messing up.

This problem does not go away just because “we have cloud now”. In any distributed system, in order to upgrade it (or “be agile” as the kids are calling it), you by definition are going to have either downtime or at least two versions running concurrently. Thus, you have to have your interfaces/RPCs/APIs/ABIs/protocols/whatever cope with changes.

You cannot instantly upgrade the world, it happens gradually. You also have to design for at least three concurrent versions running. One is the original, the second is your upgrade, your third is the urgent fix because the upgrade is quite broken in some new way you only discover in production.

So, the way you do this? Never ever EVER design for N-1 compatibility only. Design for going back a long way, much longer than you officially support. You want to have a design and programming culture of backwards compatibility to ensure you can both do new and exciting things and experiment off to the side.

It’s worth going and rereading Rusty’s API levels posts from 2008:

j-core + Numato Spartan 6 board + Fedora 25

A couple of changes to http://j-core.org/#download_bitstream made it easy for me to get going:

  • In order to make ModemManager not try to think it’s a “modem”, create /etc/udev/rules.d/52-numato.rules with the following content:
    # Make ModemManager ignore Numato FPGA board
    ATTRS{idVendor}=="2a19", ATTRS{idProduct}=="1002", ENV{ID_MM_DEVICE_IGNORE}="1"
  • You will need to install python3-pyserial and minicom
  • The minicom command line i used was:
    sudo stty -F /dev/ttyACM0 -crtscts && minicom -b 115200 -D /dev/ttyACM0

and along with the instructions on j-core.org, I got it to load a known good build.

Books referenced in my Organizational Change talk at LCA2017

All of these are available as Kindle books, but I’m sure you can get 3D copies too:

The Five Dysfunctions of a Team: A Leadership Fable by Patrick M. Lencioni
Leading Change by John P. Kotter
Who Says Elephants Can’t Dance? Louis V. Gerstner Jr.
Nonviolent Communication: A language of Life by Marshall B. Rosenberg and Arun Gandhi

Fast Reset, Trusted Boot and the security of /sbin/reboot

In OpenPOWER land, we’ve been doing some work on Secure and Trusted Boot while at the same time doing some work on what we call fast-reset (or fast-reboot, depending on exactly what mood someone was in at any particular time…. we should start being a bit more consistent).

The basic idea for fast-reset is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then cleanup a few bits of hardware (we do re-probe PCIe for example), and reload & restart the bootloader (petitboot).

What this means is that typing “reboot” on the command line goes from a ~90-120+ second affair (through firmware to petitboot, linux distros still take ages to shut themselves down) down to about a 20 second affair (to petitboot).

If you’re running a (very) recent skiboot, you can enable it with a special hidden NVRAM configuration option (although we’ll likely enable it by default pretty soon, it’s proving remarkably solid). If you want to know what that NVRAM option is… Use the source, Luke! (or git history, but I’ve yet to see a neat Star Wars reference referring to git commit logs).

So, there’s nothing like a demo. Here’s a demo with Ubuntu running off an NVMe drive on an IBM S822LC for HPC (otherwise known as Minsky or Garrison) which was running the HTX hardware exerciser, through fast-reboot back into Petitboot and then booting into Ubuntu and auto-starting the exerciser (HTX) again.

Apart from being stupidly quick when compared to a full IPL (Initial Program Load – i.e. boot), since we’re not rebooting out of band, we have no way to reset the TPM, so if you’re measuring boot, each subsequent fast-reset will result in a different set of measurements.

This may be slightly confusing, but it’s not really a problem. You see, if a machine is compromised, there’s nothing stopping me replacing /sbin/reboot with something that just prints things to the console that look like your machine rebooted but in fact left my rootkit running. Indeed, fast-reset and a full IPL should measure different values in the TPM.

It also means that if you ever want to re-establish trust in your OS, never do a reboot from the host – always reboot out of band (e.g. from a BMC). This, of course, means you’re trusting your BMC to not be compromised, which I wouldn’t necessarily do if you suspect your host has been.

Windows NT4 for PowerPC guest on OPAL on POWER8 in qemu

Sometimes, programming is just for fun. This is what PREPHV is for Andrei Warkentin. To quote the README:

“This is mostly a huge ugly hack, derived from my
ppc64le_hello code. The running philosophy here is
to throw things together late at night with my family
asleep and see how far I get without a real design
or without a real desire to implement boring things
like IDE (*sigh*) emulation”

Since my day job is maintaining the firmware that it runs on, I decided to have a go (it also ties in with the retro stuff I’ve been blogging about). So…

screenshot-from-2016-10-30-17-22-20and I’m off! (yes, this is the very latest qemu and skiboot):

screenshot-from-2016-10-30-17-23-32screenshot-from-2016-10-30-17-23-48Yes, prephv does clear all thirty two megabytes of guest memory

screenshot-from-2016-10-30-17-24-15A quick diversion, if you try Windows NT 3.51 for PowerPC, you get this:

screenshot-from-2016-10-30-18-17-35

But on NT4, you continue unharmed:

screenshot-from-2016-10-30-17-22-32A sign I needed to hack my filesystem of bits of NT installer bits a bit more:

screenshot-from-2016-10-30-17-22-45But, on my next try:

screenshot-from-2016-10-30-17-25-26Well… looks like there’s an instruction that needs to be emulated (and there’s no code to currently do that). Mind you… this is decently far into booting before we hit anything fatal, which is a pretty impressive effort – and it is tempting to continue and see if it’ll run on real hardware and if it could be made to work well enough to not find any disks :)