Booting temporary firmware on the Raptor Blackbird

In a future post, I’ll detail how to build my ported-to-upstream Blackbird firmware. Here though, we’ll explore booting some firmware temporarily to experiment.

Step 1: Copy your new PNOR image over to the BMC.
Step 2: …
Step 3: Profit!

Okay, not really, once you’ve copied over your image, ensure the computer is off and then you can tell the daemon that provides firmware to the host to use a file backend for it rather than the PNOR chip on the motherboard (i.e. yes, you can boot your system even when the firmware chip isn’t there – although I’ve not literally tried this).

root@blackbird:~# mboxctl --backend file:/tmp/blackbird.pnor 
SetBackend: Success
root@blackbird:~# obmcutil poweron

If we look at the serial console (ssh to the BMC port 2200) we’ll see Hostboot start, realise there’s newer SBE code, flash it, and reboot:

--== Welcome to Hostboot hostboot-b284071/hbicore.bin ==--

  3.02606|secure|SecureROM valid - enabling functionality
  5.14678|Booting from SBE side 0 on master proc=00050000
  5.18537|ISTEP  6. 5 - host_init_fsi
  5.47985|ISTEP  6. 6 - host_set_ipl_parms
  5.54476|ISTEP  6. 7 - host_discover_targets
  6.56106|HWAS|PRESENT> DIMM[03]=8080000000000000
  6.56108|HWAS|PRESENT> Proc[05]=8000000000000000
  6.56109|HWAS|PRESENT> Core[07]=1511540000000000
  6.61373|ISTEP  6. 8 - host_update_master_tpm
  6.61529|SECURE|Security Access Bit> 0x0000000000000000
  6.61530|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  6.61543|ISTEP  6. 9 - host_gard
  7.20987|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
  7.20988|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  7.20989|HWAS|FUNCTIONAL> Core[07]=1511540000000000
  7.21299|ISTEP  6.11 - host_start_occ_xstop_handler
  8.28965|ISTEP  6.12 - host_voltage_config
  8.47973|ISTEP  7. 1 - mss_attr_cleanup
  9.07674|ISTEP  7. 2 - mss_volt
  9.35627|ISTEP  7. 3 - mss_freq
  9.63029|ISTEP  7. 4 - mss_eff_config
 10.35189|ISTEP  7. 5 - mss_attr_update
 10.38489|ISTEP  8. 1 - host_slave_sbe_config
 10.45332|ISTEP  8. 2 - host_setup_sbe
 10.45450|ISTEP  8. 3 - host_cbs_start
 10.45574|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 10.48675|ISTEP  8. 5 - host_attnlisten_proc
 10.50338|ISTEP  8. 6 - host_p9_fbc_eff_config
 10.50771|ISTEP  8. 7 - host_p9_eff_config_links
 10.53338|ISTEP  8. 8 - proc_attr_update
 10.53634|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 10.55234|ISTEP  8.10 - proc_xbus_scominit
 10.56202|ISTEP  8.11 - proc_xbus_enable_ridi
 10.57788|ISTEP  8.12 - host_set_voltages
 10.59421|ISTEP  9. 1 - fabric_erepair
 10.65877|ISTEP  9. 2 - fabric_io_dccal
 10.66048|ISTEP  9. 3 - fabric_pre_trainadv
 10.66665|ISTEP  9. 4 - fabric_io_run_training
 10.66860|ISTEP  9. 5 - fabric_post_trainadv
 10.67060|ISTEP  9. 6 - proc_smp_link_layer
 10.67503|ISTEP  9. 7 - proc_fab_iovalid
 11.10386|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 11.15103|ISTEP 10. 1 - proc_build_smp
 11.27537|ISTEP 10. 2 - host_slave_sbe_update
 11.68581|sbe|System Performing SBE Update for PROC 0, side 0
 34.50467|sbe|System Rebooting To Complete SBE Update Process
 34.50595|IPMI: Initiate power cycle
 34.54671|Stopping istep dispatcher
 34.68729|IPMI: shutdown complete

One of the improvements is we now get output from the SBE! This means that when we do things like mess up secure boot and non secure boot firmware (I’ll explain why/how this is a thing later), we’ll actually get something useful out of a serial port:

--== Welcome to SBE - CommitId[0x8b06b5c1] ==--
istep 3.19
istep 3.20
istep 3.21
istep 3.22
istep 4.1
istep 4.2
istep 4.3
istep 4.4
istep 4.5
istep 4.6
istep 4.7
istep 4.8
istep 4.9
istep 4.10
istep 4.11
istep 4.12
istep 4.13
istep 4.14
istep 4.15
istep 4.16
istep 4.17
istep 4.18
istep 4.19
istep 4.20
istep 4.21
istep 4.22
istep 4.23
istep 4.24
istep 4.25
istep 4.26
istep 4.27
istep 4.28
istep 4.29
istep 4.30
istep 4.31
istep 4.32
istep 4.33
istep 4.34
istep 5.1
istep 5.2
SBE starting hostboot

And then we’re back into normal Hostboot boot (which we’ve all seen before) and end up at a newer petitboot!

Petitboot 1.11 on a Raptor Blackbird

One notable absence from that screenshot is my installed Fedora is missing. This is because there appears to be a bug in the 5.3.7 kernel that’s currently upstream, and if we drop to the shell and poke at lspci and dmesg, we can work out what could be the culprit:

Exiting petitboot. Type 'exit' to return.
You may run 'pb-sos' to gather diagnostic data
No password set, running as root. You may set a password in the System Configuration screen.
# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
# dmesg|grep -i nvme
[    2.991038] nvme nvme0: pci function 0001:01:00.0
[    2.991088] nvme 0001:01:00.0: enabling device (0140 -> 0142)
[    3.121799] nvme nvme0: Identify Controller failed (19)
[    3.121802] nvme nvme0: Removing after probe failure status: -5
# uname -a
Linux skiroot 5.3.7-openpower1 #2 SMP Sat Dec 14 09:06:20 PST 2019 ppc64le GNU/Linux

If for some reason the device didn’t show up in lspci, then I’d look at the skiboot firmware log, which is /sys/firmware/opal/msglog.

Looking at upstream stable kernel patches, it seems like 5.3.8 has a interesting looking patch when you realize that ppc64le uses a 64k page size:

commit efac0f186ea654e8389f5017c7f643ef48cb4b93
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Oct 18 10:53:14 2019 +0800

    nvme-pci: Set the prp2 correctly when using more than 4k page
    
    commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream.
    
    In the current code, the nvme is using a fixed 4k PRP entry size,
    but if the kernel use a page size which is more than 4k, we should
    consider the situation that the bv_offset may be larger than the
    dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
    cause the command can't be executed correctly.
    
    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

So, time to go try 5.3.8. My yaks are getting quite smooth.

Oh, and when you’re done with your temporary firmware, either fiddle with mboxctl or restart the systemd service for it, or reboot your BMC or… well, I gotta leave you something to work out on your own :)

Building OpenPOWER firmware on Fedora 31

One of the challenges with Fedora 31 is that /usr/bin/python is now Python 3 rather than Python 2. Just about every python script in existence relies on /usr/bin/python being Python 2 and not anything else. I can’t really recall, but this probably happened with the 1.5 to 2 transition as well (although IIRC that was less breaking).

What this means is that for projects that are half-way through converting to python 3, everything breaks.

op-build is one of these projects.

So, we need:

After all that, you can actually build a pnor image on Fedora 31. Even on Fedora 31 ppc64le, which is literally what I’ve just done.

Blackbird (singing in the dead of night..)

Way back when Raptor Computer Systems was doing pre-orders for the microATX Blackboard POWER9 system, I put in a pre-order. Since then, I’ve had a few life changes (such as moving to the US and starting to work for Amazon rather than IBM), but I’ve finally gone and done (most of) the setup for my own POWER9 system on (or under) my desk.

An 8 core POWER9 CPU, in bubble wrap and plastic packaging.

Everything came in a big brown box, all rather well packed. I had the board, CPU, heatsink assembly and the special tool to attach the heatsink to the board. Although unique to POWER9, the heatsink/fan assembly was one of the easier ones I’ve ever attached to a board.

The board itself looks pretty much as you’d expect – there’s a big spot for the CPU, a couple of PCI slots, a couple of DIMM slots and some SATA connectors.

The bits that are a bit unusual for a micro-ATX board are the big space reserved for FlexVer, the ASPEED BMC chip and the socketed flash. FlexVer is something I’m not ever going to use, and instead wish that there was an on-board m2 SSD slot instead, even if it was just PCIe. Having to sacrifice a PCIe slot just for a SSD is kind of a bummer.

The Blackbird POWER9 board
The POWER9 chip in socket

One annoying thing is my DIMMs are taking their sweet time in getting here, so I couldn’t actually populate the board with any memory.

Even without memory though, you can start powering it on and see that everything else works okay (i.e. it’s not completely boned). So, even without DIMMs, I could plug it in, and observe the Hostboot firmware complaining about insufficient hardware to IPL the box.

It Lives!

Yep, out the console (via ssh) you clearly see where things fail:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.03104|secure|SecureROM valid - enabling functionality
  6.67619|Booting from SBE side 0 on master proc=00050000
  6.85100|ISTEP  6. 5 - host_init_fsi
  7.23753|ISTEP  6. 6 - host_set_ipl_parms
  7.71759|ISTEP  6. 7 - host_discover_targets
 11.34738|HWAS|PRESENT> Proc[05]=8000000000000000
 11.34739|HWAS|PRESENT> Core[07]=1511540000000000
 11.69077|ISTEP  6. 8 - host_update_master_tpm
 11.73787|SECURE|Security Access Bit> 0x0000000000000000
 11.73787|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
 11.76276|ISTEP  6. 9 - host_gard
 11.96654|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
 11.96655|HWAS|FUNCTIONAL> Core[07]=1511540000000000
 12.07554|================================================
 12.07554|Error reported by hwas (0x0C00) PLID 0x90000007
 12.10289|  checkMinimumHardware found no functional dimm cards.
 12.10290|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.10291|  ReasonCode 0x0c06 RC_SYSAVAIL_NO_MEMORY_FUNC
 12.10292|  UserData1  HUID of node : 0x0002000000000000
 12.10293|  UserData2  number of present, non-functional dimms : 0x0000000000000000
 12.10294|------------------------------------------------
 12.10417|  Callout type             : Procedure Callout
 12.10417|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.10418|  Priority                 : SRCI_PRIORITY_HIGH
 12.10419|------------------------------------------------
 12.10420|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.10421|================================================
 12.51718|================================================
 12.51719|Error reported by hwas (0x0C00) PLID 0x90000007
 12.51720|  Insufficient hardware to continue.
 12.51721|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.51722|  ReasonCode 0x0c04 RC_SYSAVAIL_INSUFFICIENT_HW
 12.54457|  UserData1   : 0x0000000000000000
 12.54458|  UserData2   : 0x0000000000000000
 12.54458|------------------------------------------------
 12.54459|  Callout type             : Procedure Callout
 12.54460|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.54461|  Priority                 : SRCI_PRIORITY_HIGH
 12.54462|------------------------------------------------
 12.54462|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.54463|================================================
 12.73660|System shutting down with error status 0x90000007
 12.75545|================================================
 12.75546|Error reported by istep (0x1700) PLID 0x90000007
 12.77991|  IStep failed, see other log(s) with the same PLID for reason.
 12.77992|  ModuleId   0x01 MOD_REPORTING_ERROR
 12.77993|  ReasonCode 0x1703 RC_FAILURE
 12.77994|  UserData1  eid of first error : 0x9000000800000c04
 12.77995|  UserData2  Reason code of first error : 0x0000000100000609
 12.77996|------------------------------------------------
 12.77996|  host_gard
 12.77997|------------------------------------------------
 12.77998|  Callout type             : Procedure Callout
 12.77998|  Procedure                : EPUB_PRC_HB_CODE
 12.77999|  Priority                 : SRCI_PRIORITY_LOW
 12.78000|------------------------------------------------
 12.78001|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.78002|================================================

Looking forward to getting some DIMMs to show/share more.

AWS Welcomes Stewart

A little over a month ago now, I started a new role at Amazon Web Services (AWS) as a Principal Engineer with Amazon Linux. Everyone has been wonderfully welcoming and helpful. I’m excited about the future here, the team, and our mission.

Thanks to all my IBM colleagues over the past five and a half and a bit years too, I really enjoyed working with you on OpenPOWER and hope it continues to gain traction. I have my Blackbird now and am eagerly waiting for a spare 20 minutes to assemble it.

CVE-2019-6260: Gaining control of BMC from the host processor

This is details for CVE-2019-6260 – which has been nicknamed “pantsdown” due to the nature of feeling that we feel that we’ve “caught chunks of the industry with their…” and combined with the fact that naming things is hard, so if you pick a bad name somebody would have to come up with a better one before we publish.

I expect OpenBMC to have a statement shortly.

The ASPEED ast2400 and ast2500 Baseboard Management Controller (BMC) hardware and firmware implement Advanced High-performance Bus (AHB) bridges, which allow arbitrary read and write access to the BMC’s physical address space from the host, or from the network if the BMC console uart is attached to a serial concentrator (this is atypical for most systems).

Common configuration of the ASPEED BMC SoC’s hardware features leaves it open to “remote” unauthenticated compromise from the host and from the BMC console. This stems from AHB bridges on the LPC and PCIe buses, another on the BMC console UART (hardware password protected), and the ability of the X-DMA engine to address all of the BMC’s M-Bus (memory bus).

This affects multiple BMC firmware stacks, including OpenBMC, AMI’s BMC, and SuperMicro. It is independent of host processor architecture, and has been observed on systems with x86_64 processors IBM POWER processors (there is no reason to suggest that other architectures wouldn’t be affected, these are just the ones we’ve been able to get access to)

The LPC, PCIe and UART AHB bridges are all explicitly features of Aspeed’s designs: They exist to recover the BMC during firmware development or to allow the host to drive the BMC hardware if the BMC has no firmware of its own. See section 1.9 of the AST2500 Software Programming Guide.

The typical consequence of external, unauthenticated, arbitrary AHB access is that the BMC fails to ensure all three of confidentiality, integrity and availability for its data and services. For instance it is possible to:

  1. Reflash or dump the firmware of a running BMC from the host
  2. Perform arbitrary reads and writes to BMC RAM
  3. Configure an in-band BMC console from the host
  4. “Brick” the BMC by disabling the CPU clock until the next AC power cycle

Using 1 we can obviously implant any malicious code we like, with the impact of BMC downtime while the flashing and reboot take place. This may take the form of minor, malicious modifications to the officially provisioned BMC image, as we can extract, modify, then repackage the image to be re-flashed on the BMC. As the BMC potentially has no secure boot facility it is likely difficult to detect such actions.

Abusing 3 may require valid login credentials, but combining 1 and 2 we can simply change the locks on the BMC by replacing all instances of the root shadow password hash in RAM with a chosen password hash – one instance of the hash is in the page cache, and from that point forward any login process will authenticate with the chosen password.

We obtain the current root password hash by using 1 to dump the current flash content, then using https://github.com/ReFirmLabs/binwalk to extract the rootfs, then simply loop-mount the rootfs to access /etc/shadow. At least one BMC stack doesn’t require this, and instead offers “Press enter for console”.

IBM has internally developed a proof-of-concept application that we intend to open-source, likely as part of the OpenBMC project, that demonstrates how to use the interfaces and probes for their availability. The intent is that it be added to platform firmware test
suites as a platform security test case. The application requires root user privilege on the host system for the LPC and PCIe bridges, or normal user privilege on a remote system to exploit the debug UART interface. Access from userspace demonstrates the vulnerability of systems in bare-metal cloud hosting lease arrangements where the BMC
is likely in a separate security domain to the host.

OpenBMC Versions affected: Up to at least 2.6, all supported Aspeed-based platforms

It only affects systems using the ASPEED ast2400, ast2500 SoCs. There has not been any investigation into other hardware.

The specific issues are listed below, along with some judgement calls on their risk.

iLPC2AHB bridge Pt I

State: Enabled at cold start
Description: A SuperIO device is exposed that provides access to the BMC’s address-space
Impact: Arbitrary reads and writes to the BMC address-space
Risk: High – known vulnerability and explicitly used as a feature in some platform designs
Mitigation: Can be disabled by configuring a bit in the BMC’s LPC controller, however see Pt II.

iLPC2AHB bridge Pt II

State: Enabled at cold start
Description: The bit disabling the iLPC2AHB bridge only removes write access – reads are still possible.
Impact: Arbitrary reads of the BMC address-space
Risk: High – we expect the capability and mitigation are not well known, and the mitigation has side-effects
Mitigation: Disable SuperIO decoding on the LPC bus (0x2E/0x4E decode). Decoding is controlled via hardware strapping and can be turned off at runtime, however disabling SuperIO decoding also removes the host’s ability to configure SUARTs, System wakeups, GPIOs and the BMC/Host mailbox

PCIe VGA P2A bridge

State: Enabled at cold start
Description: The VGA graphics device provides a host-controllable window mapping onto the BMC address-space
Impact: Arbitrary reads and writes to the BMC address-space
Risk: Medium – the capability is known to some platform integrators and may be disabled in some firmware stacks
Mitigation: Can be disabled or filter writes to coarse-grained regions of the AHB by configuring bits in the System Control Unit

DMA from/to arbitrary BMC memory via X-DMA

State: Enabled at cold start
Description: X-DMA available from VGA and BMC PCI devices
Impact: Misconfiguration can expose the entirety of the BMC’s RAM to the host
AST2400 Risk: High – SDK u-boot does not constrain X-DMA to VGA reserved memory
AST2500 Risk: Low – SDK u-boot restricts X-DMA to VGA reserved memory
Mitigation: X-DMA accesses are configured to remap into VGA reserved memory in u-boot

UART-based SoC Debug interface

State: Enabled at cold start
Description: Pasting a magic password over the configured UART exposes a hardware-provided debug shell. The capability is only exposed on one of UART1 or UART5, and interactions are only possible via the physical IO port (cannot be accessed from the host)
Impact: Misconfiguration can expose the BMC’s address-space to the network if the BMC console is made available via a serial concentrator.
Risk: Low
Mitigation: Can be disabled by configuring a bit in the System Control Unit

LPC2AHB bridge

State: Disabled at cold start
Description: Maps LPC Firmware cycles onto the BMC’s address-space
Impact: Misconfiguration can expose vulnerable parts of the BMC’s address-space to the host
Risk: Low – requires reasonable effort to configure and enable.
Mitigation: Don’t enable the feature if not required.
Note: As a counter-point, this feature is used legitimately on OpenPOWER systems to expose the boot flash device content to the host

PCIe BMC P2A bridge

State: Disabled at cold start
Description: PCI-to-BMC address-space bridge allowing memory and IO accesses
Impact: Enabling the device provides limited access to BMC address-space
Risk: Low – requires some effort to enable, constrained to specific parts of the BMC address space
Mitigation: Don’t enable the feature if not required.

Watchdog setup

State: Required system function, always available
Description: Misconfiguring the watchdog to use “System Reset” mode for BMC reboot will re-open all the “enabled at cold start” backdoors until the firmware reconfigures the hardware otherwise. Rebooting the BMC is generally possible from the host via IPMI “mc reset” command, and this may provide a window of opportunity for BMC compromise.
Impact: May allow arbitrary access to BMC address space via any of the above mechanisms
Risk: Low – “System Reset” mode is unlikely to be used for reboot due to obvious side-effects
Mitigation: Ensure BMC reboots always use “SOC Reset” mode

The CVSS score for these vulnerabilities is: https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator?vector=3DAV:A/AC:L/PR:=N/UI:N/S:U/C:H/I:H/A:H/E:F/RL:U/RC:C/CR:H/IR:H/AR:M/MAV:L/MAC:L/MPR:N/MUI:N=/MS:U/MC:H/MI:H/MA:H

There is some debate on if this is a local or remote vulnerability, and it depends on if you consider the connection between the BMC and the host processor as a network or not.

The fix is platform dependent as it can involve patching both the BMC firmware and the host firmware.

For example, we have mitigated these vulnerabilities for OpenPOWER systems, both on the host and BMC side. OpenBMC has a u-boot patch that disables the features:

https://gerrit.openbmc-project.xyz/#/c/openbmc/meta-phosphor/+/13290/

Which platforms can opt into in the following way:

https://gerrit.openbmc-project.xyz/#/c/openbmc/meta-ibm/+/17146/

The process is opt-in for OpenBMC platforms because platform maintainers have the knowledge of if their platform uses affected hardware features. This is important when disabling the iLPC2AHB bridge as it can be a bit of a finicky process.

See also https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/11164 for a WIP OpenBMC Security Architecture document which should eventually contain all these details.

For OpenPOWER systems, the host firmware patches are contained in op-build v2.0.11 and enabled for certain platforms. Again, this is not by default for all platforms as there is BMC work required as well as per-platform changes.

Credit for finding these problems: Andrew Jeffery, Benjamin
Herrenschmidt, Jeremy Kerr, Russell Currey, Stewart Smith. There have been many more people who have helped with this issue, and they too deserve thanks.

Switching to iPhone Part 2: Seriously?

In which I ask of Apple, “Seriously?”.

That was pretty much my reaction with Apple sticking to Lightning connectors rather than going with the USB-C standard. Having USB-C around the place for my last two (Android) phones was fantastic. I could charge a phone, external battery, a (future) laptop, all off the same wall wart and with the same cable. It is with some hilarity that I read that the new iPad Pro has USB-C rather than Lightning.

But Apple’s dongle fetish reigns supreme, and so I get a multitude of damn dongles all for a wonderfully inflated price with an Australia Tax whacked on top.

The most egregious one is the Lightning-to-3.5mm dongle. In the office, I have a good set of headphones. The idea is to block out the sound of an open plan office so I can actually get some concentrating done. With tiny dedicated MP3 players and my previous phones, these sounded great. The Apple dongle? It sounds terrible. Absolutely terrible. The Lighting-to-3.5mm adapter might be okay for small earbuds but it is nearly completely intolerable for any decent set of headphones. I’m now in the market for a Bluetooth headphone amplifier. Another bunch of money to throw at another damn dongle.

Luckily, there seems to be a really good Bluetooth headphone amplifier on Amazon. The same Amazon that no longer ships to Australia. Well, there’s an Australian seller, for six times the price.

Urgh.

Switching to iPhone: Part 1

I have used Android phones since the first one: the G1. I’m one of the (relatively) few people who has used Android 1.0. I’ve had numerous Android phones since then, mostly the Google flagship.

I have fond memories of the Nexus One and Galaxy Nexus, as well as a bunch of time running Cyanogen (often daily builds, because YOLO) to get more privacy preserving features (or a more recent Android). I had a Sony Z1 Compact for a while which was great bang for buck except for the fact the screen broke whenever you looked at it sideways. Great kudos to the Sony team for being so friendly to custom firmware loads.

I buy my hardware from physical stores. Why? Well, it means that the NSA and others get to spend extra effort to insert hardware modifications (backdoors), as well as the benefit of having a place to go to/set the ACCC on to get my rights under Australian Consumer Law.

My phone before last was a Nexus 5X. There were a lot of good things about this phone; the promise of fast charging via USB-C was one, as was the ever improving performance of the hardware and Android itself. Well… it just got progressively slower, and slower, and slower – as if it was designed to get near unusable by the time of the next Google phone announcement.

Inevitably, my 5X succumbed to the manufacturing defect that resulted in a boot loop. It would start booting, and then spontaneously reboot, in a loop, forever. The remedy? Replace it under warranty! That would take weeks, which isn’t a suitable timeframe in this day and age to be without a phone, so I mulled over buying a Google Pixel or my first ever iPhone (my iPhone owning friends assured me that if such a thing happens with an iPhone that Apple would have swapped it on the spot). Not wanting to give up a lot of the personal freedom that comes with the Android world, I spent the $100 more to get the Pixel, acutely aware that having a phone was now a near $1000/year habit.

The Google Pixel was a fantastic phone (except the price, they should have matched the iPhone price). The camera was the first phone camera I actually went “wow, I’m impressed” over. The eye-watering $279 to replace a cracked screen, the still eye-watering cost of USB-C cables, and the seat to process the HDR photos were all forgiven. It was a good phone. Until, that is, less than a year in, the battery was completely shot. It would power off when less than 40% and couldn’t last the trip from Melbourne airport to Melbourne city.

So, with a flagship phone well within the “reasonable quality” time that consumer law would dictate, I contacted Google after going through all the standard troubleshooting. Google agreed this was not normal and that the phone was defective. I was told that they would mail me a replacement, I could transfer my stuff over and then mail in the broken one. FANTASTIC!! This was soooo much better than the experience with the 5X.

Except that it wasn’t. A week later, I rang back to ask what was going on as I hadn’t received the replacement; it turns out Google had lied to me, I’d have to mail the phone to them and then another ten business days later I’d have a replacement. Errr…. no, I’ve been here before.

I rang the retailer, JB Hi-Fi; they said it would take them at least three weeks, which I told them was not acceptable nor a “reasonable timeframe” as dictated by consumer law.

So, with a bunch of travel imminent, I bought a big external USB-C battery and kept it constantly connected as without it the battery percentage went down faster than the minutes ticked over. I could sort it out once I was back from travel.

So, I’m back. In fact, I drove back from a weekend away and finally bit the bullet – I went to pick up a phone who’s manufacturer has a reputation of supporting their hardware.

I picked up an iPhone.

I figured I should write up how, why, my reasons, and experiences in switching phone platforms. I think my next post will be “Why iPhone and not a different Android”.

pwnm-sync: Synchronizing Patchwork and Notmuch

One of the core bits of infrastructure I use as a maintainer is Patchwork (I wrote about making it faster recently). Patchwork tracks patches sent to a mailing list, allowing me as a maintainer to track the state of them (New|Under Review|Changes Requested|Accepted etc), combine them into patch bundles, look at specific series, test results etc.

One of the core bits of software I use is my email client, notmuch. Most other mail clients are laughably slow and clunky, or just plain annoying for absorbing a torrent of mail and being able to deal with it or just plain ignore it but have it searchable locally.

You may think your mail client is better than notmuch, but you’re wrong.

A key feature of notmuch is tagging email. It doesn’t do the traditional “folders” but instead does tags (if you’ve used gmail, you’d be somewhat familiar).

One of my key work flows as a maintainer is looking at what patches are outstanding for a project, and then reviewing them. This should also be a core part of any contributor to a project too. You may think that a tag:unread and to:project-list@foo query would be enough, but that doesn’t correspond with what’s in patchwork.

So, I decided to make a tool that would add tags to messages in notmuch corresponding with the state of the patch in patchwork. This way, I could easily search for “patches marked as New in patchwork” (or Under Review or whatever) and see what I should be reviewing and looking at merging or commenting on.

Logically, this wouldn’t be that hard, just use the (new) Patchwork REST API to get the state of everything and issue the appropriate notmuch commands.

But just going one way isn’t that interesting, I wanted to be able to change the tags in notmuch and have them sync back up to Patchwork. So, I made that part of the tool too.

Introducing pwnm-sync: a tool to sync patchwork and notmuch.

notmuch-hello tag counts for pwnm-sync tagged
patches in patchwork
notmuch-hello tag counts for pwnm-sync tagged
patches in patchwork

With this tool I can easily see the patchwork state of any patch that I have in my notmuch database. For projects that I’m a maintainer on (i.e. can change the state of patches), If I update the patches of that email and run pwnm-sync again, it’ll update the state in patchwork.

I’ve been using this for a few weeks myself and it’s made my maintainer workflow significantly nicer.

It may also be useful to people who want to find what patches need some review.

The sync time is mostly dependent on how fast your patchwork instance is for API requests. Unfortunately, we need to make some improvements on the Patchwork side of things here, but a full sync of the above takes about 4 minutes for me. You can also add a –epoch option (with a date/time) to say “only fetch things from patchwork since that date” which makes things a lot quicker for incremental syncs. For me, I typically run it with an epoch of a couple of months ago, and that takes ~20-30 seconds to run. In this case, if you’ve locally updated a old patch, it will still sync that change up to patchwork.

Implementation Details

It’s a python3 script using the notmuch python bindings, the requests-futures module for asynchronous HTTP requests (so we can have the patchwork server assemble the next page of results while we process the previous one), and a local sqlite3 database to store state in so we can work out what changed locally / server side.

Getting it

Head to https://github.com/stewartsmith/pwnm-sync or just:

git clone https://github.com/stewartsmith/pwnm-sync.git

ccache and op-build

You may have heard of ccache (Compiler Cache) which saves you heaps of real world time when rebuilding a source tree that is rather similar to one you’ve recently built before. It’s really useful in buildroot based projects where you’re building similar trees, or have done a minor bump of some components.

In trying to find a commit which introduced a bug in op-build (OpenPOWER firmware), I noticed that hostboot wasn’t being built using ccache and we were always doing a full build. So, I started digging into it.

It turns out that a bunch of the perl scripts for parsing the Machine Readable Workbook XML in hostboot did a bunch of things like foreach $key (%hash) – which means that the code iterates over the items in hash order rather than an order that would produce predictable output such as “attribute name” or something. So… much messing with that later, I had hostboot generating the same output for the same input on every build.

Next step was to work out why I was still getting a lot of CCACHE misses. It turns out the default ccache size is 5GB. A full hostboot build uses around 7.1GB of that.

So, if building op-build with CCACHE, be sure to set both BR2_CCACHE=y in your config as well as something like BR2_CCACHE_INITIAL_SETUP="--max-size 20G"

Hopefully my patches hit hostboot and op-build soon.

How I do email (at home)

I thought I might write something up on how I’ve been doing email both at home and at work. I very much on purpose keep the two completely separate, and have slightly different use cases for both of them.

For work, I do not want mail on my phone. For personal mail, it turns out I do want this on my phone, which is currently an Android phone. Since my work and personal email is very separate, the volume of mail is really, really different. Personal mail is maybe a couple of dozen a day at most. Work is… orders of magnitude more.

Considering I generally prefer free software to non-free software, K9 Mail is the way I go on my phone. I have it set up to point at the IMAP and SMTP servers of my mail provider (FastMail). I also have a google account, and the gmail app works fine for the few bits of mail that go there instead of my regular account.

For my mail accounts, I do an INBOX ZERO like approach (in reality, I’m pretty much nowhere near zero, but today I learned I’m a lot closer than many colleagues). This means I read / respond / do / ignore mail and then move it to an ARCHIVE folder. K9 and Gmail both have the ability to do this easily, so it works well.

Additionally though, I don’t want to care about limits on storage (i.e. expire mail from the server after X days), nor do I want to rely on “the cloud” to be the only copy of things. I also don’t want to have to upload any of past mail I may be keeping around. I also generally prefer to use notmuch as a mail client on a computer.

For those not familiar with notmuch, it does tags on mail in Maildir, is extremely fast and can actually cope with a quantity of mail. It also has this “archive”/INBOX ZERO workflow which I like.

In order to get mail from FastMail and Gmail onto a machine, I use offlineimap. An important thing to do is to set “status_backend = sqlite” for each Account. It turns out I first hacked on sqlite for offlineimap status a bit over ten years ago – time flies. For each Account I also set presynchook = ~/Maildir/maildir-notmuch-presync (below) and a postsynchook = notmuch new. The presynchook is run before we sync, and its job is to move files around based on the tags in notmuch and the postsynchook lets notmuch catch any new mail that’s been fetched.

My maildir-notmuch-presync hook script is:

#!/bin/bash
notmuch search --output=files not tag:inbox and folder:fastmail/INBOX|xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Archive/cur/"

notmuch search --output=files folder:fastmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Spam/cur/"
ARCHIVE_DIR=$HOME/Maildir/INBOX/`date +"%Y%m"`/cur/
mkdir -p $ARCHIVE_DIR
notmuch search --output=files folder:fastmail/Archive and date:..90d and not tag:flagged | xargs -I'{}' mv '{}' "$ARCHIVE_DIR"

# Gmail
notmuch search --output=files not tag:inbox and folder:gmail/INBOX|grep 'INBOX/gmail/INBOX/' | xargs -I'{}' rm '{}'
notmuch search --output=files folder:gmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/gmail/[Gmail].Spam/cur/"

So This keeps 90 days of mail on the fastmail server, and archives older mail off into month based archive dirs. This is simply to keep directory sizes not too large, you could put everything in one directory… but at some point that gets a bit silly.

I don’t think this is all the most optimal setup I could have, but it does let me read and answer mail on my phone and desktop (as well as use a web client if I want to). There is a bit of needless copying of messages by offlineimap under certain circumstances, but I don’t get enough personal mail for it to be a problem.

Installing Windows on a USB key

For some unknown reason, the Windows installer doesn’t let you install to a USB key. Luckily, there’s a simple workaround. It turns out that only the very first step of installation cares about that. So, if you can fool it (say, by running in qemu), you can have a USB key with a Windows install rather than having to dual boot on your hard disk (e.g. if you run Linux and want all that fast in-built SSD space for Linux)

  1. Download the Windows Installer: https://www.microsoft.com/en-au/software-download/windows10ISO
  2. Start the installer in a VM, with the USB key passed through to the VM as the hard disk (or use a disk image the same size as your USB key for transfer with a utility such as ‘dd’ later). e.g. do:
    qemu-img create -f raw win-installed.img 50G
    
    qemu-system-x86_64 --enable-kvm -m 8G -cdrom Downloads/Win10_1709_English_x64.iso -hda win-installed.img -boot d
  3. At the first reboot of the installer, instead of letting it boot, stop the VM. You are going to copy the install at this state to the USB key.
  4. Boot from the USB key, go through the rest of the installer. You’re done!

Updating Intel Management Engine firmware on a Lenovo without a Windows Install

This is how I updated my Intel ME firmware on my Lenovo X1 Carbon Gen 4 (reports say this also has worked for Gen5 machines). These instructions are pretty strongly inspired by https://news.ycombinator.com/item?id=15744152

Why? Intel security advisory and CVE-2017-5705, CVE-2017-5708, CVE-2017-5711, and CVE-2017-5712 should be reason enough.

You will need:

  • To download about 3.5GB of stuff
  • A USB key
  • Linux installed
  • WINE or a Windows box to run two executables (because self extracting archives are a thing on Windows apparently)
  • A bit of technical know-how. A shell prompt shouldn’t scare you too hard.

Steps:

  1. Go to https://www.microsoft.com/en-au/software-download/windows10ISO and download the 32-bit ISO.
  2. Mount the ISO as a loopback device (e.g. by right clicking and choosing to mount, or by doing ‘sudo mount -o loop,ro file.iso /mnt’
  3. Go to Lenovo web site for Drivers & Software for your laptop, under Chipset, there’s ME Firmware and Software downloads You will need both. It looks like this:
  4. Run both exe files with WINE or on a windows box to extract the archives, you do not need to run the installers at the end.
  5. you now need to extract the management engine drivers. You can do this in  ~/.wine/drive_c/DRIVERS/WIN/AMT, with “cabextract SetupME.exe” or (as suggested in the comments) you can use the innoextract utility (from linux) to extract them (a quick check shows this to work)
  6. Save off HECI_REL folder, it’s the only extracted thing you’ll need.
  7. Go and install https://wimlib.net/ – we’re going to use it to create the boot disk. (it may be packaged for your distro).
    If you don’t have the path /usr/lib/syslinux/modules/bios on your system but you do have /usr/share/syslinux/modules/bios – you will need to change a bit of the file programs/mkwinpeimg.in to point to the /usr/share locations rather than /usr/lib before you install wimlib. This probably isn’t needed if you’re installing from packages, but may be requried if you’re on, say, Fedora.
  8. Copy ~/.wine/drive_c/DRIVERS to a new folder, e.g. “winpe_overlay” (or copy from the Windows box you extracted things on)
  9. Use mkwinpeimg to create the boot disk, pointing it to the mounted Windows 10 installer and the “winpe_overlay”:
    mkwinpeimg -W /path/to/mounted/windows10-32bit-installer/ -O winpe_overlay disk.img
  10. Use ‘dd’ to write it to your USB key
  11. Reboot, go into BIOS and turn Secure Boot OFF, Legacy BIOS ON, and AMT ON.
  12. Boot off the USB disk you created.
  13. In the command prompt of the booted WinPE environment, run the following to start the update:
    cd \
    cd HECI_REL\win10
    drvload heci.inf
    cd \
    cd win\me
    MEUpdate.cmd

    It should look something like this:

  14. Reboot, go back into BIOS and change your settings back to how you started.

op-test-framework: Let’s break the console!

One of the things I’ve been working on fairly quietly is the test suite for OpenPOWER firmware: op-test-framework. I’ve approach things I’m hacking on from the goal of “when I merge patches into skiboot, can I be confident that I haven’t merged something that’s broken existing functionality?”

By testing host firmware, we incidentally (as well as on purpose) test a whole bunch of BMC functionality. One bit of functionality we rely on a lot is the host “serial” console. Typically, this is exposed to the user over IPMI Serial Over LAN (SOL), or on OpenBMC it’s also exposed as something you can ssh to (which proves to be both faster and more reliable than IPMI, not to mention there’s some remote semblance of security).

When running through some tests, I noticed something pretty odd, it appeared as though we were sometimes missing some console output on larger IOs. This usually isn’t a problem as when we’re using expect(1) (or the python equivalent pexpect) we end up having all sorts of delays here there and everywhere to work around all the terrible things you hope you never learn. So… how could I test that? Well.. what about checking the output of something like dd if=/dev/zero bs=1024 count=16|hexdump -C to see if we get the full output?

Time to add a test to op-test-framework! Adding such a test is pretty easy. If we look at the source of the test I added, we can see what happens (source here).

class Console():
    bs = 1024
    count = 8
    def setUp(self):
        conf = OpTestConfiguration.conf
        self.bmc = conf.bmc()
        self.system = conf.system()

    def runTest(self):
        self.system.goto_state(OpSystemState.PETITBOOT_SHELL)
        console = self.bmc.get_host_console()
        self.system.host_console_unique_prompt()
        bs = self.bs
        count = self.count
        self.assertTrue( (bs*count)%16 == 0, "Bug in test writer. Must be multiple of 16 bytes: bs %u count %u / 16 = %u" % (bs, count, (bs*count)%16))
        try:
            zeros = console.run_command("dd if=/dev/zero bs=%u count=%u|hexdump -C -v" % (bs, count), timeout=120)
        except CommandFailed as cf:
            self.assertEqual(cf.exitcode, 0)
        self.assertTrue( len(zeros) == 3+(count*bs)/16, "Unexpected length of zeros %u" % (len(zeros)))

First thing you’ll notice is that this looks like a Python unittest. It’s because it is. The unittest infrastructure was a path of least resistance, so we started with it. This class isn’t the one that’s actually run, we do a little bit of inheritance magic in order to run the same test with different parameters (see https://github.com/open-power/op-test-framework/blob/6c74fb0fb0993ae8ae1a7aa62ec58e57c0080686/testcases/Console.py#L50)

class Console8k(Console, unittest.TestCase):
    bs = 1024
    count = 8

class Console16k(Console, unittest.TestCase):
    bs = 1024
    count = 16

class Console32k(Console, unittest.TestCase):
    bs = 1024
    count = 32

The setUp() function is pure boiler plate, we grab some objects from the configuration of the test run, namely information about the BMC and the system itself, so we can do things to both. The real magic happens in runTest().

op-test-framework tracks the state of the machine being tested across each test. This means that if we’re executing 101 tests in the petitboot shell, we don’t need to do 101 separate boots to petitboot. The self.system.goto_state(OpSystemState.PETITBOOT_SHELL) statement says “Please ensure the system is booted to the petitboot shell”. Other states include OFF (obvious) and OS, which is when the machine is booted to the OS.

The next two lines ensure we can run commands on the console (where console is IPMI Serial over LAN or other similar connection, such as the SSH console provided by OpenBMC):

console = self.bmc.get_host_console()
self.system.host_console_unique_prompt()

The host_console_unique_prompt() call is a bit ugly, and I’m hoping we fix the APIs so that this isn’t needed. Basically, it sets things up so that pexpect will work properly.

The bit that does the work is the try/except block along with the assertTrue. We don’t currently check that the content is all correct, we just check we got the right *amount* of content.

It turns out, this check is enough to reveal a bug that turns out to be deep in the core Linux TTY layer, and has caused Jeremy some amount of fun (for certain values of fun).

Want to know more about how the console works? Jeremy blogged on it.

Fedora 25 + Lenovo X1 Carbon 4th Gen + OneLink+ Dock

As of May 29th 2017, if you want to do something crazy like use *both* ports of the OneLink+ dock to use monitors that aren’t 640×480 (but aren’t 4k), you’re going to need a 4.11 kernel, as everything else (for example 4.10.17, which is the latest in Fedora 25 at time of writing) will end you in a world of horrible, horrible pain.

To install, run this:

sudo dnf install \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-core-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-cross-headers-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-devel-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-modules-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-tools-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/kernel-tools-libs-4.11.3-200.fc25.x86_64.rpm \
https://kojipkgs.fedoraproject.org//packages/kernel/4.11.3/200.fc25/x86_64/perf-4.11.3-200.fc25.x86_64.rpm

This grabs a kernel that’s sitting in testing and isn’t yet in the main repositories. However, I can now see things on monitors, rather than 0 to 1 monitor (most often 0). You can also dock/undock and everything doesn’t crash in a pile of fail.

I remember a time when you could fairly reliably buy Intel hardware and have it “just work” with the latest distros. It’s unfortunate that this is no longer the case, and it’s more of a case of “wait six months and you’ll still have problems”.

Urgh.

(at least Wayland and X were bug for bug compatible?)

j-core + Numato Spartan 6 board + Fedora 25

A couple of changes to http://j-core.org/#download_bitstream made it easy for me to get going:

  • In order to make ModemManager not try to think it’s a “modem”, create /etc/udev/rules.d/52-numato.rules with the following content:
    # Make ModemManager ignore Numato FPGA board
    ATTRS{idVendor}=="2a19", ATTRS{idProduct}=="1002", ENV{ID_MM_DEVICE_IGNORE}="1"
  • You will need to install python3-pyserial and minicom
  • The minicom command line i used was:
    sudo stty -F /dev/ttyACM0 -crtscts && minicom -b 115200 -D /dev/ttyACM0

and along with the instructions on j-core.org, I got it to load a known good build.

Books referenced in my Organizational Change talk at LCA2017

All of these are available as Kindle books, but I’m sure you can get 3D copies too:

The Five Dysfunctions of a Team: A Leadership Fable by Patrick M. Lencioni
Leading Change by John P. Kotter
Who Says Elephants Can’t Dance? Louis V. Gerstner Jr.
Nonviolent Communication: A language of Life by Marshall B. Rosenberg and Arun Gandhi

Fast Reset, Trusted Boot and the security of /sbin/reboot

In OpenPOWER land, we’ve been doing some work on Secure and Trusted Boot while at the same time doing some work on what we call fast-reset (or fast-reboot, depending on exactly what mood someone was in at any particular time…. we should start being a bit more consistent).

The basic idea for fast-reset is that when the OS calls OPAL reboot, we gather all the threads in the system using a combination of patching the reset vector and soft-resetting them, then cleanup a few bits of hardware (we do re-probe PCIe for example), and reload & restart the bootloader (petitboot).

What this means is that typing “reboot” on the command line goes from a ~90-120+ second affair (through firmware to petitboot, linux distros still take ages to shut themselves down) down to about a 20 second affair (to petitboot).

If you’re running a (very) recent skiboot, you can enable it with a special hidden NVRAM configuration option (although we’ll likely enable it by default pretty soon, it’s proving remarkably solid). If you want to know what that NVRAM option is… Use the source, Luke! (or git history, but I’ve yet to see a neat Star Wars reference referring to git commit logs).

So, there’s nothing like a demo. Here’s a demo with Ubuntu running off an NVMe drive on an IBM S822LC for HPC (otherwise known as Minsky or Garrison) which was running the HTX hardware exerciser, through fast-reboot back into Petitboot and then booting into Ubuntu and auto-starting the exerciser (HTX) again.

Apart from being stupidly quick when compared to a full IPL (Initial Program Load – i.e. boot), since we’re not rebooting out of band, we have no way to reset the TPM, so if you’re measuring boot, each subsequent fast-reset will result in a different set of measurements.

This may be slightly confusing, but it’s not really a problem. You see, if a machine is compromised, there’s nothing stopping me replacing /sbin/reboot with something that just prints things to the console that look like your machine rebooted but in fact left my rootkit running. Indeed, fast-reset and a full IPL should measure different values in the TPM.

It also means that if you ever want to re-establish trust in your OS, never do a reboot from the host – always reboot out of band (e.g. from a BMC). This, of course, means you’re trusting your BMC to not be compromised, which I wouldn’t necessarily do if you suspect your host has been.

Failed Retro emulation attempts

For reasons that should escape everybody, I went back and looked at some old Operating Systems a little while ago: OS/2 Warp, Windows 3.11 and Microsoft Chicago. So, I went on a little adventure this weekend, largely in failure though.

Windows NT 3.51

This was the first version (err… no, second I think) of Windows NT that I ever used.

Lesson 1: qemu doesn’t expose a SCSI adapter that isn’t virtio-scsi (and I have a feeling there aren’t Windows NT 3.51 installer driver floppies for virtio-scsi)

screenshot_winnt3-1_2016-10-29_191707Lesson 2: OMG I’m so glad I don’t have to wait for things to be read off floppy disks anymore:

screenshot_winnt3-51_2016-10-29_193833Lesson 3: I’d forgotten that the Windows directory on NT 3.51 was different to every other Windows NT version, being \WINNT35

screenshot_winnt3-51_2016-10-29_194040screenshot_winnt3-51_2016-10-29_194139Lesson 3: Yeah, sometimes there’s just fail.

screenshot_winnt3-51_2016-10-29_194147

Windows NT 4.0

This brought the UI of Windows 95 to Windows NT. It was a thing. It required a fairly beefy PC for the day, but it could use two CPUs if you were that amazingly rich (dual Pentium Pro was a thing)

Lesson 1: Windows NT 4 does not like 8GB disks. My idea of “creating a small disk for a VM for an old OS as it probably won’t work well with a 20GB disk” needs to be adjusted. I’m writing this on a system with 8 times more RAM than what I ended up using for a disk for Windows NT 4.

screenshot_winnt4-0_2016-10-29_192840But hey, back to \WINNT rather than \WINNT35 or \WINDOWS

screenshot_winnt4-0_2016-10-29_192937

Lesson 2: Sometimes, full system emulation turns out to be a better idea:

screenshot_winnt4-0_2016-10-29_192418screenshot_winnt4-0_2016-10-29_192304Lesson 3: Remember when Windows couldn’t actually format NTFS in the installer and it installed to FAT and then converted to NTFS? No? Well, aren’t you lucky.

screenshot_winnt4-0_2016-10-29_193102screenshot_winnt4-0_2016-10-29_193235Apple Rhapsody DR2

Before there was MacOS X, there was a project called Rhapsody. This was to take NeXTStep (from NeXT, which Apple bought to get both NeXTStep and Steve Jobs as every internal “let’s replace the aging MacOS” project had utterly failed for the past ten years). Rhapsody was not going to be backwards compatible until everybody said that was a terrible idea and the Blue Box was added (known as Classic) – basically, a para-virtualized VM running the old MacOS 9.

Anyway, for the first two developer releases, it was also available on x86 (not just PowerPC). This was probably because a PowerPC port to Macs was a lot newer than the x86 port.

So, I dusted off the (virtual) Boot and Driver floppies and fired up qemu…..

screenshot_rhapsodydr2_2016-10-29_172732Yeah, MacOS X got a better installer…

screenshot_rhapsodydr2_2016-10-29_173227Hopeful!

screenshot_rhapsodydr2_2016-10-29_182218This was after I decided that using KVM was a bad Idea:

screenshot_rhapsodydr2_2016-10-29_172143

screenshot_rhapsodydr2_2016-10-29_173333Nope… and this is where we stop. There seems to be some issue with ATA drivers? I honestly can’t be bothered to debug it (although… for the PowerPC version… maybe).

MacOS 9.2

Well.. this goes a lot better now thanks to a whole bunch of patches hitting upstream Qemu recently (thanks Ben!)

screenshot-from-2016-10-29-20-37-24screenshot-from-2016-10-29-20-42-10Yeah, I was kind of tempted to set up Outlook Express to read my email…. But running MacOS 9 was way too successful, so I had to stop there :)

Microsoft Chicago – retro in qemu!

So, way back when (sometime in the early 1990s) there was Windows 3.11 and times were… for Workgroups. There was this Windows NT thing, this OS/2 thing and something brewing at Microsoft to attempt to make the PC less… well, bloody awful for a user.

Again, thanks to abandonware sites, it’s possible now to try out very early builds of Microsoft Chicago – what would become Windows 95. With the earliest build I could find (build 56), I set to work. The installer worked from an existing Windows 3.11 install.

I ended up using full system emulation rather than normal qemu later on, as things, well, booted in full emulation and didn’t otherwise (I was building from qemu master… so it could have actually been a bug fix).

chicago-launch-setupMmmm… Windows 3.11 File Manager, the fact that I can still use this is a testament to something, possibly too much time with Windows 3.11.

chicago-welcome-setupchicago-setupUnfortunately, I didn’t have the Plus Pack components (remember Microsoft Plus! ?- yes, the exclamation mark was part of the product, it was the 1990s.) and I’m not sure if they even would have existed back then (but the installer did ask).

chicago-install-dirObviously if you were testing Chicago, you probably did not want to upgrade your working Windows install if this was a computer you at all cared about. I installed into C:\CHICAGO because, well – how could I not!

chicago-installingThe installation went fairly quickly – after all, this isn’t a real 386 PC and it doesn’t have of-the-era disks – everything was likely just in the linux page cache.

chicago-install-networkI didn’t really try to get network going, it may not have been fully baked in this build, or maybe just not really baked in this copy of it, but the installer there looks a bit familiar, but not like the Windows 95 one – maybe more like NT 3.1/3.51 ?

But at the end… it installed and it was time to reboot into Chicago:
chicago-bootSo… this is what Windows 95 looked like during development back in July 1993 – nearly exactly two years before release. There’s some Windows logos that appear/disappear around the place, which are arguably much cooler than the eventual Windows 95 boot screen animation. The first boot experience was kind of interesting too:
Screenshot from 2016-08-07 20-57-00Luckily, there was nothing restricting the beta site ID or anything. I just entered the number 1, and was then told it needed to be 6 digits – so beta site ID 123456 it is! The desktop is obviously different both from Windows 3.x and what ended up in Windows 95.

Screenshot from 2016-08-07 20-57-48Those who remember Windows 3.1 may remember Dr Watson as an actual thing you could run, but it was part of the whole diagnostics infrastructure in Windows, and here (as you can see), it runs by default. More odd is the “Switch To Chicago” task (which does nothing if opened) and “Tracker”. My guess is that the “Switch to Chicago” is the product of some internal thing for launching the new UI. I have no ideawhat the “Tracker” is, but I think I found a clue in the “Find File” app:

Screenshot from 2016-08-13 14-10-10Not only can you search with regular expressions, but there’s “Containing text”, could it be indexing? No, it totally isn’t. It’s all about tracking/reporting problems:

Screenshot from 2016-08-13 14-15-19Well, that wasn’t as exciting as I was hoping for (after all, weren’t there interesting database like file systems being researched at Microsoft in the early 1990s?). It’s about here I should show the obligatory About box:
Screenshot from 2016-08-07 20-58-10It’s… not polished, and there’s certainly that feel throughout the OS, it’s not yet polished – and two years from release: that’s likely fair enough. Speaking of not perfect:

Screenshot from 2016-08-07 20-59-03When something does crash, it asks you to describe what went wrong, i.e. provide a Clue for Dr. Watson:

Screenshot from 2016-08-13 12-09-22

But, most importantly, Solitaire is present! You can browse the Programs folder and head into Games and play it! One odd tihng is that applications have two >> at the end, and there’s a “Parent Folder” entry too.

Screenshot from 2016-08-13 12-01-24Solitair itself? Just as I remember.

Screenshot from 2016-08-07 21-21-27Notably, what is missing is anything like the Start menu, which is probably the key UI element introduced in Windows 95 that’s still with us today. Instead, you have this:

Screenshot from 2016-08-13 11-55-15That’s about the least exciting Windows menu possible. There’s the eye menu too, which is this:

Screenshot from 2016-08-13 11-56-12More unfinished things are found in the “File cabinet”, such as properties for anything:
Screenshot from 2016-08-13 12-02-00But let’s jump into Control Panels, which I managed to get to by heading to C:\CHICAGO\Control.sys – which isn’t exactly obvious, but I think you can find it through Programs as well.Screenshot from 2016-08-13 12-02-41Screenshot from 2016-08-13 12-05-40The “Window Metrics” application is really interesting! It’s obvious that the UI was not solidified yet, that there was a lot of experimenting to do. This application lets you change all sorts of things about the UI:

Screenshot from 2016-08-13 12-05-57My guess is that this was used a lot internally to twiddle things to see what worked well.

Another unfinished thing? That familiar Properties for My Computer, which is actually “Advanced System Features” in the control panel, and from the [Sample Information] at the bottom left, it looks like we may not be getting information about the machine it’s running on.

Screenshot from 2016-08-13 12-06-39

You do get some information in the System control panel, but a lot of it is unfinished. It seems as if Microsoft was experimenting with a few ways to express information and modify settings.

Screenshot from 2016-08-13 12-07-13But check out this awesome picture of a hard disk for Virtual Memory:

Screenshot from 2016-08-13 12-07-47The presence of the 386 Enhanced control panel shows how close this build still was to Windows 3.1:

Screenshot from 2016-08-13 12-08-08At the same time, we see hints of things going 32 bit – check out the fact that we have both Clock and Clock32! Notepad, in its transition to 32bit, even dropped the pad and is just Note32!

Screenshot from 2016-08-13 12-11-10Well, that’s enough for today, time to shut down the machine:
Screenshot from 2016-08-13 12-15-45

Windows 3.11 nostalgia

Because OS/2 didn’t go so well… let’s try something I’m a lot more familiar with. To be honest, the last time I in earnest used Windows on the desktop was around 3.11, so I kind of know it back to front (fun fact: I’ve read the entire Windows 3.0 manual).

It turns out that once you have MS-DOS installed in qemu, installing Windows 3.11 is trivial. I didn’t even change any settings for Qemu, I just basically specced everything up to be very minimal (50MB RAM, 512mb disk).

win31-setupwin31-disk4win31-installedWindows 3.11 was not a fun time as soon as you had to do anything… nightmares of drivers, CONFIG.SYS and AUTOEXEC.BAT plague my mind. But hey, it’s damn fast on a modern processor.