My POWER9 CPU Core Layout

So, following on from my post on Sensors on the Blackbird (and thus Power9), I mentioned that when you look at the temperature sensors for each CPU core in my 8-core POWER9 chip, they’re not linear numbers. Let’s look at what that means….

stewart@blackbird9$ sudo ipmitool sensor | grep core
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na        

You can see I have eight CPU cores in my Blackbird system. The reason the 8 CPU cores are core 3, 5, 7, 11, 15, 17, 19, and 21 rather than 0-8 or something is that these represent the core numbers on the physical die, and the die is a 24 core die. When you’re making a chip as big and as complex as modern high performance CPUs, not all of the chips coming out of your fab are going to be perfect, so this is how you get different models in the line with only one production line.

Weirdly, the output from the hwmon sensors and why there’s a “core 24” and a “core 28”. That’s just… wrong. What it is, however, is right if you think of 8*4=32. This is a product of Linux thinking that Thread=Core in some ways. So, yeah, this numbering is the first thread of each logical core.

[stewart@blackbird9 ~]$ sensors|grep -i core
 Chip 0 Core 0:            +39.0°C  (lowest = +25.0°C, highest = +71.0°C)
 Chip 0 Core 4:            +39.0°C  (lowest = +26.0°C, highest = +66.0°C)
 Chip 0 Core 8:            +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 12:           +39.0°C  (lowest = +26.0°C, highest = +67.0°C)
 Chip 0 Core 16:           +39.0°C  (lowest = +25.0°C, highest = +67.0°C)
 Chip 0 Core 20:           +39.0°C  (lowest = +26.0°C, highest = +69.0°C)
 Chip 0 Core 24:           +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 28:           +39.0°C  (lowest = +27.0°C, highest = +64.0°C)

But let’s ignore that, go from the IPMI sensors (which also match what the OCC shows with “occtoolp9 -LS” (see below).

$ ./occtoolp9 -SL
Sensor Details: (found 86 sensors, details only for Status of 0x00)                                           
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type 
....
   0x00ED TEMPC03………     47      29     47 C    0x00 0x00037CF2 0x00007D00 0x00000100 0x0040 0x0008
   0x00EF TEMPC05………     37      26     39 C    0x00 0x00014E53 0x00007D00 0x00000100 0x0040 0x0008
   0x00F1 TEMPC07………     46      28     46 C    0x00 0x0001A777 0x00007D00 0x00000100 0x0040 0x0008
   0x00F5 TEMPC11………     44      27     45 C    0x00 0x00018402 0x00007D00 0x00000100 0x0040 0x0008
   0x00F9 TEMPC15………     36      25     43 C    0x00 0x000183BC 0x00007D00 0x00000100 0x0040 0x0008
   0x00FB TEMPC17………     38      28     41 C    0x00 0x00015474 0x00007D00 0x00000100 0x0040 0x0008
   0x00FD TEMPC19………     43      27     44 C    0x00 0x00016589 0x00007D00 0x00000100 0x0040 0x0008
   0x00FF TEMPC21………     36      30     40 C    0x00 0x00015CA9 0x00007D00 0x00000100 0x0040 0x0008

So what does that mean for physical layout? Well, like all modern high performance chips, the POWER9 is modular, with a bunch of logic being replicated all over the die. The most notable duplicated parts are the core (replicated 24 times!) and cache structures. Less so are memory controllers and PCI hardware.

P9 chip layout from page 31 of the POWER9 Register Specification

See that each core (e.g. EC00 and EC01) is paired with the cache block (EC00 and EC01 with EP00). That’s two POWER9 cores with one 512KB L2 cache and one 10MB L3 cache.

You can see the cache layout (including L1 Instruction and Data caches) by looking in sysfs:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; done
 1 32K Data
 1 32K Instruction
 2 512K Unified
 3 10240K Unified

So, what does the layout of my POWER9 chip look like? Well, thanks to the power of graphics software, we can cross some cores out and look at the topology:

My 8-core POWER9 CPU in my Raptor Blackbird

If I run some memory bandwidth benchmarks, I can see that you can see the L3 cache capacity you’d assume from the above diagram: 80MB (10MB/core). Let’s see:

[stewart@blackbird9 lmbench3]$ for i in 5M 10M 20M 30M 40M 50M 60M 70M 80M 500M; \
  do echo -n "$i   "; \
  ./bin/bw_mem -N 100  $i rd; \
done
  5M    5.24 63971.98
 10M   10.49 31940.14
 20M   20.97 17620.16
 30M   31.46 18540.64
 40M   41.94 18831.06
 50M   52.43 17372.03
 60M   62.91 16072.18
 70M   73.40 14873.42
 80M   83.89 14150.82
 500M 524.29 14421.35

If all the cores were packed together, I’d expect that cliff to be a lot sooner.

So how does this compare to other machines I have around? Well, let’s look at my Ryzen 7. Specifically, a “AMD Ryzen 7 1700 Eight-Core Processor”. The cache layout is:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; \
done
 1 32K Data
 1 64K Instruction
 2 512K Unified
 3 8192K Unified

And then the performance benchmark similar to the one I ran above on the POWER9 (lower numbers down low as 8MB is less than 10MB)

$ for i in 4M 8M 16M 24M 32M 40M 48M 56M 64M 72M 80M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  4M    4.19 61111.04
  8M    8.39 28596.55
 16M   16.78 21415.12
 24M   25.17 20153.57
 32M   33.55 20448.20
 40M   41.94 20940.11
 48M   50.33 20281.39
 56M   58.72 21600.24
 64M   67.11 21284.13
 72M   75.50 20596.18
 80M   83.89 20802.40
 500M 524.29 21489.27

And my laptop? It’s a four core part, specifically a “Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz” with a cache layout like:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
   do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
     echo; \
   done
   1 32K Data
   1 32K Instruction
   2 256K Unified
   3 6144K Unified 
$ for i in 3M 6M 12M 18M 24M 30M 36M 42M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  3M    3.15 48500.24
  6M    6.29 27144.16
 12M   12.58 18731.80
 18M   18.87 17757.74
 24M   25.17 17154.12
 30M   31.46 17135.87
 36M   37.75 16899.75
 42M   44.04 16865.44
 500M 524.29 16817.10

I’m not sure what performance conclusions we can realistically draw from these curves, apart from “keeping workload to L3 cache is cool”, and “different chips have different cache hardware”, and “I should probably go and read and remember more about the microarchitectural characteristics of the cache hardware in Ryzen 7 hardware and 10th gen Intel Core hardware”.

OCC and Sensors on the Raptor Blackbird (and other POWER9 systems)

This post we’re going to look at three different ways to look at various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is completely open source firmware, so you can (like I have): build your own firmware. So, this is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).

Sensors over IPMI

One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the blackbird), or over the network.

stewart@blackbird9$ sudo ipmitool sensor |head
occ                      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
 occ0                     | 0x0        | discrete   | 0x0200| na        | na        | na        | na        | na        | na        
 occ1                     | 0x0        | discrete   | 0x0100| na        | na        | na        | na        | na        | na        
 p0_core0_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core1_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core2_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core3_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core4_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core5_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core6_temp            | na         |            | na    | na        | na        | na        | na        | na        | na    

It’s kind of annoying to read there, so standard unix tools to the rescue!

stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
 occ                      | na                                                                                                               
 occ0                     | 0x0                                                                                                              
 occ1                     | 0x0                                                                                                              
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na          
 p0_vdd_temp              | 40.000 
 dimm0_temp               | 35.000      
 dimm1_temp               | na          
 dimm2_temp               | na          
 dimm3_temp               | na          
 dimm4_temp               | 38.000      
 dimm5_temp               | na          
 dimm6_temp               | na          
 dimm7_temp               | na          
 dimm8_temp               | na          
 dimm9_temp               | na          
 dimm10_temp              | na          
 dimm11_temp              | na          
 dimm12_temp              | na          
 dimm13_temp              | na          
 dimm14_temp              | na          
 dimm15_temp              | na          
 fan0                     | 1200.000    
 fan1                     | 1100.000    
 fan2                     | 1000.000    
 p0_power                 | 33.000      
 p0_vdd_power             | 5.000       
 p0_vdn_power             | 9.000       
 cpu_1_ambient            | 30.600      
 pcie                     | 27.000      
 ambient                  | 26.000  

You can see that I have 3 fans, two DIMMs (although why it lists 16 possible DIMMs for a two DIMM slot board is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.

The code path for reading these sensors is interesting, it’s all from the BMC, so we’re having the OCC inside the P9 read things, which the BMC then reads, and then passes back to the P9. On the P9 itself, each sensor is a call all the way to firmware and back! In fact, we can look at it in perf:

$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
“ipmitool sensors” perf report

What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks it in sysfs:

[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map 
[sudo] password for stewart: 
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector

So we can easily look up exactly where this is:

[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map 
 0000000000018e20 t .__try_lock.isra.0
 0000000000018e68 t .add_lock_request

So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack of what’s going on in firmware isn’t so easy, but we can find the bt_add_ipmi_msg call there which is probably how everything starts:

[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map   0000000000051614 t .bt_add_ipmi_msg_head  0000000000051688 t .bt_add_ipmi_msg  00000000000516fc t .bt_poll

OCCTOOL

This is the most not-what-you’re-meant-to-use method of getting access to sensors! It’s using a debug tool for the OCC firmware! There’s a variety of tools in the OCC source repositiory, and one of them (occtoolp9) can be used for a variety of things, one of which is getting sensor data out of the OCC.

$ sudo ./occtoolp9 -SL
     Sensor Type: 0xFFFF
 Sensor Location: 0xFFFF
     (only displaying non-zero sensors)
 Sending 0x53 command to OCC0 (via opal-prd)…
   MFG Sub Cmd: 0x05  (List Sensors)
   Num Sensors: 50
     [ 1] GUID: 0x0000 / AMEintdur…….  Sample:     20  (0x0014)
     [ 2] GUID: 0x0001 / AMESSdur0…….  Sample:      7  (0x0007)
     [ 3] GUID: 0x0002 / AMESSdur1…….  Sample:      3  (0x0003)
     [ 4] GUID: 0x0003 / AMESSdur2…….  Sample:     23  (0x0017)

The odd thing you’ll see is “via opal-prd” – and this is because it’s doing raw calls to the opal-prd binary to talk to the OCC firmware running things like “opal-prd --expert-mode htmgt-passthru“. Yeah, this isn’t a in-production thing :)

Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call will. There’s a full OCC/Host firmware interface spec to read. But it’s insanely inefficient way to monity sensors, a long bash script shelling out to a whole bunch of other processes… Think ~14.4 billion cycles versus ~367million cycles for the ipmitool option above.

But there are some interesting sensors at the end of the list:

Sensor Details: (found 86 sensors, details only for Status of 0x00)                                                  
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type   
....
   0x014A MRDM0………..    688       3  15015 GBs  0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..    480       3  14739 GBs  0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..    560       4  16605 GBs  0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..    360       4  16597 GBs  0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200

is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:

0x014A MRDM0………..  15165       3  17994 GBs  0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..  17145       3  18016 GBs  0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..   8063       4  24280 GBs  0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..   1138       4  24215 GBs  0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200

It looks like it! Are these exposed elsewhere? Well, another blog post at some point in the future is where I should look at that.

lm-sensors

$ rpm -qf /usr/bin/sensors
 lm_sensors-3.5.0-6.fc31.ppc64le

Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are just exposed over the standard interface that we’ve been using since ISA was a thing.

[stewart@blackbird9 ~]$ sensors                                                                  
 ibmpowernv-isa-0000                                       
 Adapter: ISA adapter                                      
 Chip 0 Vdd Remote Sense:  +1.02 V  (lowest =  +0.72 V, highest =  +1.02 V)
 Chip 0 Vdn Remote Sense:  +0.67 V  (lowest =  +0.67 V, highest =  +0.67 V)
 Chip 0 Vdd:               +1.02 V  (lowest =  +0.73 V, highest =  +1.02 V)
 Chip 0 Vdn:               +0.68 V  (lowest =  +0.68 V, highest =  +0.68 V)
 Chip 0 Core 0:            +47.0°C  (lowest = +25.0°C, highest = +71.0°C)            
 Chip 0 Core 4:            +47.0°C  (lowest = +26.0°C, highest = +66.0°C)            
 Chip 0 Core 8:            +48.0°C  (lowest = +27.0°C, highest = +67.0°C)            
 Chip 0 Core 12:           +48.0°C  (lowest = +26.0°C, highest = +67.0°C)            
 Chip 0 Core 16:           +47.0°C  (lowest = +25.0°C, highest = +67.0°C)                      
 Chip 0 Core 20:           +47.0°C  (lowest = +26.0°C, highest = +69.0°C)            
 Chip 0 Core 24:           +48.0°C  (lowest = +27.0°C, highest = +67.0°C)                     
 Chip 0 Core 28:           +51.0°C  (lowest = +27.0°C, highest = +64.0°C)                     
 Chip 0 DIMM 0 :           +40.0°C  (lowest = +34.0°C, highest = +44.0°C)                     
 Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)                     
 Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 12 :          +43.0°C  (lowest = +36.0°C, highest = +47.0°C)
 Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 Nest:              +48.0°C  (lowest = +27.0°C, highest = +64.0°C)
 Chip 0 VRM VDD:           +47.0°C  (lowest = +39.0°C, highest = +66.0°C)
 Chip 0 :                  44.00 W  (lowest =  31.00 W, highest = 132.00 W)
 Chip 0 Vdd:               15.00 W  (lowest =   4.00 W, highest = 104.00 W)
 Chip 0 Vdn:               10.00 W  (lowest =   8.00 W, highest =  12.00 W)
 Chip 0 :                 227.11 kJ
 Chip 0 Vdd:               44.80 kJ
 Chip 0 Vdn:               58.80 kJ
 Chip 0 Vdd:              +21.50 A  (lowest =  +6.50 A, highest = +104.75 A)
 Chip 0 Vdn:              +14.88 A  (lowest = +12.63 A, highest = +18.88 A)

The best thing? It’s really quick! The hwmon interface is fast and efficient.

Yet another near-upstream Raptor Blackbird firmware build

In what is coming a month occurance, I’ve put up yet another firmware build for the Raptor Blackbird with close-to-upstream firmware (see here and here for previous ones).

Well, I’ve done another build! It’s current op-build (as of yesterday), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, the SBE speedup patch is now upstream. The machine-xml which is straight from Raptor but in my repo.

Here’s the current versions of everything:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-228-g82aed17a-p4360f95"
bmc-firmware-version
                 "0.00"
occ              "3ab2921"
hostboot         "acdff8a-pe7e80e1"
buildroot        "2019.05.3-15-g3a4fc2a888"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
                 "hw013120a.opmst"
sbe              "c318ab0-p1ddf83c"
hcode            "hw030220a.opmst"
petitboot        "v1.12"
phandle          0000064c (1612)
version          "blackbird-v2.4-514-g62d1a941"
linux            "5.4.22-openpower1-pdbbf8c8"
name             "ibm,firmware-versions"

If we compare this to the last build I put up, we have:

Componentoldnew
skibootv6.5-209-g179d53df-p4360f95v6.5-228-g82aed17a-p4360f95
linux5.4.13-openpower1-pa361bec5.4.22-openpower1-pdbbf8c8
occ3ab2921no change
hostboot779761d-pe7e80e1acdff8a-pe7e80e1
buildroot2019.05.3-14-g17f117295f2019.05.3-15-g3a4fc2a888
capp-ucodep9-dd2-v4no change
machine-xmlsite_local-stewart-a0efd66no change
hostboot-binarieshw011120a.opmsthw013120a.opmst
sbe166b70c-p06fc80cc318ab0-p1ddf83c
hcodehw011520a.opmsthw030220a.opmst
petitbootv1.11v1.12
versionblackbird-v2.4-415-gb63b36efblackbird-v2.4-514-g62d1a941

So, what do those changes mean? Not too much changed over the past month. Kernel bump, new petitboot (although I can’t find release notes but it doesn’t look like there’s a lot of changes), and slight bumps to other firmware components.

Grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-4-images/ and give it a whirl!

To flash it, copy blackbird.pnor to your Blackbird’s BMC in /tmp/ (important! the /tmp filesystem has enough room, the home directory for root does not), and then run:

pflash -E -p /tmp/blackbird.pnor

Which will ask you to confirm and then flash:

About to erase chip !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Erasing... (may take a while)
[==================================================] 99% ETA:1s      
done !
About to program "/tmp/blackbird.pnor" at 0x00000000..0x04000000 !
Programming & Verifying...
[==================================================] 100% ETA:0s   

Another close-to-upstream Blackbird Firmware Build

A few weeks ago (okay, close to six), I put up a firmware build for the Raptor Blackbird with close-to-upstream firmware (see here).

Well, I’ve done another build! It’s current op-build (as of this morning), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, as is the SBE speedup patch. Current kernel (works fine with my hardware), current petitboot, and the machine-xml which is straight from Raptor but in my repo.

Versions of everything are:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-209-g179d53df-p4360f95"
bmc-firmware-version
		 "0.01"
occ              "3ab2921"
hostboot         "779761d-pe7e80e1"
buildroot        "2019.05.3-14-g17f117295f"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
		 "hw011120a.opmst"
sbe              "166b70c-p06fc80c"
hcode            "hw011520a.opmst"
petitboot        "v1.11"
phandle          000005d0 (1488)
version          "blackbird-v2.4-415-gb63b36ef"
linux            "5.4.13-openpower1-pa361bec"
name             "ibm,firmware-versions"

You can download all the bits (including debug tarball) from https://www.flamingspork.com/blackbird/stewart-blackbird-2-images/ and follow the instructions for trying it out or flashing blackbird.pnor.

Again, would love to hear how it goes for you!

Speeding up Blackbird boot: the SBE

The Self Boot Engine (SBE) is a small embedded PPE42 core inside the POWER9 CPU which has the unenvious job of getting a single POWER9 core ready enough to start executing instructions out of L3 cache, and poking some instructions into said cache for the core to start executing.

It’s called the “Self Boot Engine” as in generations prior to POWER8, it was the job of the FSP (Service Processor) to do all of the booting for the CPU. On POWER8, there was still an SBE, but it was a custom instruction set (this was the Power On Reset Engine – PORE), while the PPE42 is basically a 32bit powerpc core cut straight down the middle (just the way to make it awkward for toolchains).

One of the things I noted in my post on Booting temporary firmware on the Raptor Blackbird is that we got serial console output from the SBE. It turns out one of thing things explicitly not enabled by Raptor in their build was this output as “it made the SBE boot much slower”. I’d actually long suspected this, but hadn’t really had the time to delve into it.

Since for POWER9, the firmware for the SBE is now open source code, as is the ppe42-binutils and ppe42-gcc toolchain for it. This means we can hack on it!

WARNING: hacking on your SBE firmware can be relatively dangerous, as it’s literally the first thing that needs to work in order to boot the system, and there isn’t (AFAIK) publicly documented easy way to re-flash your SBE firmware if you mess it up.

Seeing as we saw a regression in boot time with the UART output enabled, we need to look at the uartPutChar() function in sbeConsole.C (error paths removed for clarity):

static void uartPutChar(char c)
{
    #define SBE_FUNC "uartPutChar"
    uint32_t rc = SBE_SEC_OPERATION_SUCCESSFUL;
    do {
        static const uint64_t DELAY_NS = 100;
        static const uint64_t DELAY_LOOPS = 100000000;

        uint64_t loops = 0;
        uint8_t data = 0;
        do {
            rc = readReg(LSR, data);
...
            if(data == LSR_BAD || (data & LSR_THRE))
            {
                break;
            }
            delay(DELAY_NS, 1000000);
        } while(++loops < DELAY_LOOPS);

...
        rc = writeReg(THR, c);
...
    } while(0);

    #undef SBE_FUNC
}

One thing you may notice if you’ve spent some time around serial ports is that it’s not using the transmit FIFO! While according to Wikipedia the original 16550 had a broken FIFO, but we’re certainly not going to be hooked up to an original rev of that silicon.

To compare, let’s look at the skiboot code, which is all in hw/lpc-uart.c:

static void uart_check_tx_room(void)
{
	if (uart_read(REG_LSR) & LSR_THRE) {
		/* FIFO is 16 entries */
		tx_room = 16;
		tx_full = false;
	}
}

The uart_check_tx_room() function is pretty simple, it checks if there’s room in the FIFO and knows that there’s 16 entries. Next, we have a busy loop that waits until there’s room again in the FIFO:

static void uart_wait_tx_room(void)
{
	while (!tx_room) {
		uart_check_tx_room();
		if (!tx_room) {
			smt_lowest();
			do {
				barrier();
				uart_check_tx_room();
			} while (!tx_room);
			smt_medium();
		}
	}
}

Finally, the bit of code that writes the (internal) log buffer out to a serial port:

/*
 * Internal console driver (output only)
 */
static size_t uart_con_write(const char *buf, size_t len)
{
	size_t written = 0;

	/* If LPC bus is bad, we just swallow data */
	if (!lpc_ok() && !mmio_uart_base)
		return written;

	lock(&uart_lock);
	while(written < len) {
		if (tx_room == 0) {
			uart_wait_tx_room();
			if (tx_room == 0)
				goto bail;
		} else {
			uart_write(REG_THR, buf[written++]);
			tx_room--;
		}
	}
 bail:
	unlock(&uart_lock);
	return written;
}

The skiboot code ends up being a bit more complicated thanks to a number of reasons, but the basic algorithm could be applied to the SBE code, and rather than busy waiting for each character to be written out before sending the other into the FIFO, we could just splat things down there and continue with life. So, I put together a patch to try out.

Before (i.e. upstream SBE code): it took about 15 seconds from “Welcome to SBE” to “Booting Hostboot”.

Now (with my patch): Around 10 seconds.

It’s a full five seconds (33%) faster to get through the SBE stage of booting. Wow.

Hopefully somebody looks at the pull request sometime soon, as it’s probably useful to a lot of people doing firmware and Operating System development.

So, Happy New Year for Blackbird owners (I’ll publish a build with this and other misc improvements “soon”).

A close-to-upstream firmware build for the Raptor Blackbird

UPDATE: A newer version is available here

It goes without saying that using this build is a At Your Own Risk and I make zero warranty. AFAIK it can’t physically destroy your system.

My GitHub op-build branch stewart-blackbird-v1 has all the changes built into this build (the VERSION displayed in firmware will be slightly weird as I did the tagging afterwards… this is not meant to be “howto release firmware to the public”). Follow op-build pull 3341 for the state of upstreaming everything.

Binaries are over at https://www.flamingspork.com/blackbird/stewart-blackbird-v1-images/ (see the git branch of op-build for source).

To flash it (temporarily), grab blackbird.pnor, get it to /tmp on your BMC and follow the instructions I posted the other day.

I’d be interested in any feedback on what does/does not work.

Are you Fans of the Blackbird? Speak up, I can’t hear you over the fan.

So, as of yesterday, I started running a pretty-close-to-upstream op-build host firmware stack on my Blackbird. Notable yak-shaving has included:

Apart from that, I was all happy as Larry. Except then I went into the room with the Blackbird in it an went “huh, that’s loud”, and since it was bedtime, I decided it could all wait until the morning.

It is now the morning. Checking fan speeds over IPMI, one fan stood out (fan2, sitting at 4300RPM). This was a bit of a surprise as what’s silkscreened on the board is that the rear case fan is hooked up to ‘fan2″, and if we had a “start from 0/1” mix up, it’d be the front case fan. I had just assumed it’d be maybe OCC firmware dying or something, but this wasn’t the case (I checked – thanks occtoolp9!)

After a bit of digging around, I worked out this mapping:

IPMI fan0Rear Case FanMotherboard Fan 2
IPMI fan1Front Case FanMotherboard Fan 3
IPMI fan2CPU FanMotherboard Fan 1

Which is about as surprising and confusing as you’d think.

After a bunch of digging around the Raptor ports of OpenBMC and Hostboot, it seems that the IPL Observer which is custom to Raptor controls if the BMC decides to do fan control or not.

You can get its view of the world from the BMC via the (incredibly user friendly) poking at DBus:

busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_status; busctl get-property org.openbmc.status.IPL /org/openbmc/status/IPL org.openbmc.status.IPL current_istep

Which if you just have the Hostboot patch in (like I first did) you end up with:

s "IPL_RUNNING"
s "21,3"

Which is where Hostboot exits the IPL process (as you see on the screen) and hands over to skiboot. But if you start digging through their op-build tree, you find that there’s a signal_linux_start_complete script which calls pnv-lpc to write two values to LPC ports 0x81 and 0x82. The pnv-lpc utility is the external/lpc/ binary from skiboot, and these two ports are the “extended lpc port 80h” state.

So, to get back fan control? First, build the lpc utility:

git clone git@github.com:open-power/skiboot.git
cd skiboot/external/lpc
make

and then poke the magic values of “IPL complete and linux running”:

$ sudo ./lpc io 0x81.b=254
[io] W 0x00000081.b=0xfe
$ sudo ./lpc io 0x82.b=254
[io] W 0x00000082.b=0xfe

You get a friendly beep, and then your fans return to sanity.

Of course, for that to work you need to have debugfs mounted, as this pokes OPAL debugfs to do direct LPC operations.

Next up: think of a smarter way to trigger that than “stewart runs it on the command line”. Also next up: work out the better way to determine that fan control should be on and patch the BMC.

Booting temporary firmware on the Raptor Blackbird

In a future post, I’ll detail how to build my ported-to-upstream Blackbird firmware. Here though, we’ll explore booting some firmware temporarily to experiment.

Step 1: Copy your new PNOR image over to the BMC.
Step 2: …
Step 3: Profit!

Okay, not really, once you’ve copied over your image, ensure the computer is off and then you can tell the daemon that provides firmware to the host to use a file backend for it rather than the PNOR chip on the motherboard (i.e. yes, you can boot your system even when the firmware chip isn’t there – although I’ve not literally tried this).

root@blackbird:~# mboxctl --backend file:/tmp/blackbird.pnor 
SetBackend: Success
root@blackbird:~# obmcutil poweron

If we look at the serial console (ssh to the BMC port 2200) we’ll see Hostboot start, realise there’s newer SBE code, flash it, and reboot:

--== Welcome to Hostboot hostboot-b284071/hbicore.bin ==--

  3.02606|secure|SecureROM valid - enabling functionality
  5.14678|Booting from SBE side 0 on master proc=00050000
  5.18537|ISTEP  6. 5 - host_init_fsi
  5.47985|ISTEP  6. 6 - host_set_ipl_parms
  5.54476|ISTEP  6. 7 - host_discover_targets
  6.56106|HWAS|PRESENT> DIMM[03]=8080000000000000
  6.56108|HWAS|PRESENT> Proc[05]=8000000000000000
  6.56109|HWAS|PRESENT> Core[07]=1511540000000000
  6.61373|ISTEP  6. 8 - host_update_master_tpm
  6.61529|SECURE|Security Access Bit> 0x0000000000000000
  6.61530|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  6.61543|ISTEP  6. 9 - host_gard
  7.20987|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
  7.20988|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  7.20989|HWAS|FUNCTIONAL> Core[07]=1511540000000000
  7.21299|ISTEP  6.11 - host_start_occ_xstop_handler
  8.28965|ISTEP  6.12 - host_voltage_config
  8.47973|ISTEP  7. 1 - mss_attr_cleanup
  9.07674|ISTEP  7. 2 - mss_volt
  9.35627|ISTEP  7. 3 - mss_freq
  9.63029|ISTEP  7. 4 - mss_eff_config
 10.35189|ISTEP  7. 5 - mss_attr_update
 10.38489|ISTEP  8. 1 - host_slave_sbe_config
 10.45332|ISTEP  8. 2 - host_setup_sbe
 10.45450|ISTEP  8. 3 - host_cbs_start
 10.45574|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 10.48675|ISTEP  8. 5 - host_attnlisten_proc
 10.50338|ISTEP  8. 6 - host_p9_fbc_eff_config
 10.50771|ISTEP  8. 7 - host_p9_eff_config_links
 10.53338|ISTEP  8. 8 - proc_attr_update
 10.53634|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 10.55234|ISTEP  8.10 - proc_xbus_scominit
 10.56202|ISTEP  8.11 - proc_xbus_enable_ridi
 10.57788|ISTEP  8.12 - host_set_voltages
 10.59421|ISTEP  9. 1 - fabric_erepair
 10.65877|ISTEP  9. 2 - fabric_io_dccal
 10.66048|ISTEP  9. 3 - fabric_pre_trainadv
 10.66665|ISTEP  9. 4 - fabric_io_run_training
 10.66860|ISTEP  9. 5 - fabric_post_trainadv
 10.67060|ISTEP  9. 6 - proc_smp_link_layer
 10.67503|ISTEP  9. 7 - proc_fab_iovalid
 11.10386|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 11.15103|ISTEP 10. 1 - proc_build_smp
 11.27537|ISTEP 10. 2 - host_slave_sbe_update
 11.68581|sbe|System Performing SBE Update for PROC 0, side 0
 34.50467|sbe|System Rebooting To Complete SBE Update Process
 34.50595|IPMI: Initiate power cycle
 34.54671|Stopping istep dispatcher
 34.68729|IPMI: shutdown complete

One of the improvements is we now get output from the SBE! This means that when we do things like mess up secure boot and non secure boot firmware (I’ll explain why/how this is a thing later), we’ll actually get something useful out of a serial port:

--== Welcome to SBE - CommitId[0x8b06b5c1] ==--
istep 3.19
istep 3.20
istep 3.21
istep 3.22
istep 4.1
istep 4.2
istep 4.3
istep 4.4
istep 4.5
istep 4.6
istep 4.7
istep 4.8
istep 4.9
istep 4.10
istep 4.11
istep 4.12
istep 4.13
istep 4.14
istep 4.15
istep 4.16
istep 4.17
istep 4.18
istep 4.19
istep 4.20
istep 4.21
istep 4.22
istep 4.23
istep 4.24
istep 4.25
istep 4.26
istep 4.27
istep 4.28
istep 4.29
istep 4.30
istep 4.31
istep 4.32
istep 4.33
istep 4.34
istep 5.1
istep 5.2
SBE starting hostboot

And then we’re back into normal Hostboot boot (which we’ve all seen before) and end up at a newer petitboot!

Petitboot 1.11 on a Raptor Blackbird

One notable absence from that screenshot is my installed Fedora is missing. This is because there appears to be a bug in the 5.3.7 kernel that’s currently upstream, and if we drop to the shell and poke at lspci and dmesg, we can work out what could be the culprit:

Exiting petitboot. Type 'exit' to return.
You may run 'pb-sos' to gather diagnostic data
No password set, running as root. You may set a password in the System Configuration screen.
# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0001:01:00.0 Non-Volatile memory controller: Intel Corporation Device f1a8 (rev 03)
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9235 PCIe 2.0 x2 4-port SATA 6 Gb/s Controller (rev 11)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
# dmesg|grep -i nvme
[    2.991038] nvme nvme0: pci function 0001:01:00.0
[    2.991088] nvme 0001:01:00.0: enabling device (0140 -> 0142)
[    3.121799] nvme nvme0: Identify Controller failed (19)
[    3.121802] nvme nvme0: Removing after probe failure status: -5
# uname -a
Linux skiroot 5.3.7-openpower1 #2 SMP Sat Dec 14 09:06:20 PST 2019 ppc64le GNU/Linux

If for some reason the device didn’t show up in lspci, then I’d look at the skiboot firmware log, which is /sys/firmware/opal/msglog.

Looking at upstream stable kernel patches, it seems like 5.3.8 has a interesting looking patch when you realize that ppc64le uses a 64k page size:

commit efac0f186ea654e8389f5017c7f643ef48cb4b93
Author: Kevin Hao <haokexin@gmail.com>
Date:   Fri Oct 18 10:53:14 2019 +0800

    nvme-pci: Set the prp2 correctly when using more than 4k page
    
    commit a4f40484e7f1dff56bb9f286cc59ffa36e0259eb upstream.
    
    In the current code, the nvme is using a fixed 4k PRP entry size,
    but if the kernel use a page size which is more than 4k, we should
    consider the situation that the bv_offset may be larger than the
    dev->ctrl.page_size. Otherwise we may miss setting the prp2 and then
    cause the command can't be executed correctly.
    
    Fixes: dff824b2aadb ("nvme-pci: optimize mapping of small single segment requests")
    Cc: stable@vger.kernel.org
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Kevin Hao <haokexin@gmail.com>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

So, time to go try 5.3.8. My yaks are getting quite smooth.

Oh, and when you’re done with your temporary firmware, either fiddle with mboxctl or restart the systemd service for it, or reboot your BMC or… well, I gotta leave you something to work out on your own :)

Building OpenPOWER firmware on Fedora 31

One of the challenges with Fedora 31 is that /usr/bin/python is now Python 3 rather than Python 2. Just about every python script in existence relies on /usr/bin/python being Python 2 and not anything else. I can’t really recall, but this probably happened with the 1.5 to 2 transition as well (although IIRC that was less breaking).

What this means is that for projects that are half-way through converting to python 3, everything breaks.

op-build is one of these projects.

So, we need:

After all that, you can actually build a pnor image on Fedora 31. Even on Fedora 31 ppc64le, which is literally what I’ve just done.

Upstreaming Blackbird firmware (step 1: skiboot)

Now that I can actually boot the machine, I could test and send my patch upstream for Blackbird support in skiboot. One thing I noticed with the current firmware from Raptor is that the PCIe slot names were wrong. While a pretty minor point, it’s a bit funny that there’s only two slots and the names were wrong.

The PCIe slot names are used to call out the physical location of PCIe cards in the system, so if you, say, hit a bunch of errors, OS/firmware can say “It’s this card in the slot labeled BLAH on the board”.

With my patch, the slot table from skiboot is spat out looking like this:

[   64.296743001,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=SLOT1 PCIE 4.0 X16 
 [   64.296875483,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=SLOT2 PCIE 4.0 X8 
 [   64.297054197,5] PHB#0001:01:00.0 [EP  ] 8086 f1a8 R:03 C:010802 (  mass-storage) LOC_CODE=SLOT2 PCIE 4.0 X8
 [   64.297285067,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin SATA 
 [   64.297411565,5] PHB#0002:01:00.0 [LGCY] 1b4b 9235 R:11 C:010601 (          sata) LOC_CODE=Builtin SATA
 [   64.297554540,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB 
 [   64.297732049,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
 [   64.297848624,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet 
 [   64.298026870,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
 [   64.298212291,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
 [   64.298424962,5] PHB#0004:01:00.2 [EP  ] 14e4 1657 R:01 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
 [   64.298587848,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC 
 [   64.298722540,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
 [   64.298850009,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 (           vga) LOC_CODE=BMC

If you want to give it a go, grab the patch, build skiboot, and flash it on. Alternatively, you can download a built skiboot here. To flash it, do this:

# Copy to your BMC for the Blackbird
scp skiboot-v6.5-146-g376bed3f.lid.xz.stb root@blackbird:/tmp/

# then, ssh to the BMC
$ ssh root@blackbird

# ensure the machine is off
obmcutil poweroff --wait

# Now, make a backup copy (remember to copy it off /tmp on the bmc)
pflash -P PAYLOAD -r /tmp/skiboot-backup

# and flash the new skiboot:
pflash -e -P PAYLOAD -p /tmp/skiboot.lid.xz.stb

# now, power on the box
obmcutil poweron

Black(bird) boots!

Well, after the half false start of not having RAM so really not being able to do much (yeah yeah, I hear you – I’m weak for not just running Linux in L3), my RAM arrived today. Putting the sticks in was easy (of course), although does not make for an exciting photo.

One DIMM in the Blackbird

After that, I SSH’d the the BMC and then did “obmcutil poweron” (as is traditional) and started looking at the console via conneting via SSH to port 2200 on the BMC. I was then greeted by the (by this time in my life rather familiar) Hostboot:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
 3.02902|secure|SecureROM valid - enabling functionality
   7.15613|Booting from SBE side 0 on master proc=00050000
   7.19697|ISTEP  6. 5 - host_init_fsi
   7.54226|ISTEP  6. 6 - host_set_ipl_parms
   8.06280|ISTEP  6. 7 - host_discover_targets
   9.19791|HWAS|PRESENT> DIMM[03]=8080000000000000
   9.19792|HWAS|PRESENT> Proc[05]=8000000000000000
   9.19794|HWAS|PRESENT> Core[07]=1511540000000000
   9.55305|ISTEP  6. 8 - host_update_master_tpm
   9.60521|SECURE|Security Access Bit> 0x0000000000000000
   9.60522|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   9.63093|ISTEP  6. 9 - host_gard
   9.89867|HWAS|Blocking Speculative Deconfig
   9.90128|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   9.90129|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   9.90130|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   9.90329|ISTEP  6.11 - host_start_occ_xstop_handler
  11.19092|ISTEP  6.12 - host_voltage_config
  11.30246|ISTEP  7. 1 - mss_attr_cleanup
  12.61924|ISTEP  7. 2 - mss_volt
  12.92705|ISTEP  7. 3 - mss_freq
  13.67475|ISTEP  7. 4 - mss_eff_config
  14.95827|ISTEP  7. 5 - mss_attr_update
  14.97307|ISTEP  8. 1 - host_slave_sbe_config
  15.05372|ISTEP  8. 2 - host_setup_sbe
  15.10258|ISTEP  8. 3 - host_cbs_start
  15.10381|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  15.11144|ISTEP  8. 5 - host_attnlisten_proc
  15.11213|ISTEP  8. 6 - host_p9_fbc_eff_config
  15.13552|ISTEP  8. 7 - host_p9_eff_config_links
  15.20087|ISTEP  8. 8 - proc_attr_update
  15.20191|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  15.21891|ISTEP  8.10 - proc_xbus_scominit
  15.22929|ISTEP  8.11 - proc_xbus_enable_ridi
  15.24717|ISTEP  8.12 - host_set_voltages
  15.26620|ISTEP  9. 1 - fabric_erepair
  15.42123|ISTEP  9. 2 - fabric_io_dccal
  15.42436|ISTEP  9. 3 - fabric_pre_trainadv
  15.42887|ISTEP  9. 4 - fabric_io_run_training
  15.43207|ISTEP  9. 5 - fabric_post_trainadv
  15.44893|ISTEP  9. 6 - proc_smp_link_layer
  15.45454|ISTEP  9. 7 - proc_fab_iovalid
  15.87126|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  15.89174|ISTEP 10. 1 - proc_build_smp
  16.54194|ISTEP 10. 2 - host_slave_sbe_update
  18.63876|sbe|System Performing SBE Update for PROC 0, side 0
  41.69727|sbe|System Rebooting To Complete SBE Update Process
  41.72189|IPMI: Initiate power cycle
  42.40652|IPMI: shutdown complete

The first IPL updated the Self Boot Engine firmware on the chip, so it automatically applied the new firmware and rebooted to finish applying it. This is perfectly normal, it just shows itself as a longer boot time. Booting continues:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--
 3.02810|secure|SecureROM valid - enabling functionality
   6.07331|Booting from SBE side 0 on master proc=00050000
   6.11485|ISTEP  6. 5 - host_init_fsi
   6.60361|ISTEP  6. 6 - host_set_ipl_parms
   6.98640|ISTEP  6. 7 - host_discover_targets
   7.53975|HWAS|PRESENT> DIMM[03]=8080000000000000
   7.53976|HWAS|PRESENT> Proc[05]=8000000000000000
   7.53977|HWAS|PRESENT> Core[07]=1511540000000000
   7.79123|ISTEP  6. 8 - host_update_master_tpm
   7.79263|SECURE|Security Access Bit> 0x0000000000000000
   7.79264|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
   7.82684|ISTEP  6. 9 - host_gard
   8.26609|HWAS|Blocking Speculative Deconfig
   8.26865|HWAS|FUNCTIONAL> DIMM[03]=8080000000000000
   8.26866|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
   8.26867|HWAS|FUNCTIONAL> Core[07]=1511540000000000
   8.27142|ISTEP  6.11 - host_start_occ_xstop_handler
   9.69606|ISTEP  6.12 - host_voltage_config
   9.81183|ISTEP  7. 1 - mss_attr_cleanup
  10.95130|ISTEP  7. 2 - mss_volt
  11.39875|ISTEP  7. 3 - mss_freq
  12.15655|ISTEP  7. 4 - mss_eff_config
  13.63504|ISTEP  7. 5 - mss_attr_update
  13.65162|ISTEP  8. 1 - host_slave_sbe_config
  13.78039|ISTEP  8. 2 - host_setup_sbe
  13.78143|ISTEP  8. 3 - host_cbs_start
  13.78247|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
  13.79015|ISTEP  8. 5 - host_attnlisten_proc
  13.79114|ISTEP  8. 6 - host_p9_fbc_eff_config
  13.79734|ISTEP  8. 7 - host_p9_eff_config_links
  13.85128|ISTEP  8. 8 - proc_attr_update
  13.85783|ISTEP  8. 9 - proc_chiplet_fabric_scominit
  13.87991|ISTEP  8.10 - proc_xbus_scominit
  13.89056|ISTEP  8.11 - proc_xbus_enable_ridi
  13.91122|ISTEP  8.12 - host_set_voltages
  13.93077|ISTEP  9. 1 - fabric_erepair
  14.05235|ISTEP  9. 2 - fabric_io_dccal
  14.13131|ISTEP  9. 3 - fabric_pre_trainadv
  14.13616|ISTEP  9. 4 - fabric_io_run_training
  14.13934|ISTEP  9. 5 - fabric_post_trainadv
  14.14087|ISTEP  9. 6 - proc_smp_link_layer
  14.14656|ISTEP  9. 7 - proc_fab_iovalid
  14.59454|ISTEP  9. 8 - host_fbc_eff_config_aggregate
  14.61811|ISTEP 10. 1 - proc_build_smp
  15.24074|ISTEP 10. 2 - host_slave_sbe_update
  17.16022|sbe|System Performing SBE Update for PROC 0, side 1
  40.16808|ISTEP 10. 4 - proc_cen_ref_clk_enable
  40.27866|ISTEP 10. 5 - proc_enable_osclite
  40.31297|ISTEP 10. 6 - proc_chiplet_scominit
  40.55805|ISTEP 10. 7 - proc_abus_scominit
  40.57942|ISTEP 10. 8 - proc_obus_scominit
  40.58078|ISTEP 10. 9 - proc_npu_scominit
  40.60704|ISTEP 10.10 - proc_pcie_scominit
  40.66572|ISTEP 10.11 - proc_scomoverride_chiplets
  40.66874|ISTEP 10.12 - proc_chiplet_enable_ridi
  40.68407|ISTEP 10.13 - host_rng_bist
  40.75548|ISTEP 10.14 - host_update_redundant_tpm
  40.75785|ISTEP 11. 1 - host_prd_hwreconfig
  41.15067|ISTEP 11. 2 - cen_tp_chiplet_init1
  41.15299|ISTEP 11. 3 - cen_pll_initf
  41.15544|ISTEP 11. 4 - cen_pll_setup
  41.18530|ISTEP 11. 5 - cen_tp_chiplet_init2
  41.18762|ISTEP 11. 6 - cen_tp_arrayinit
  41.19050|ISTEP 11. 7 - cen_tp_chiplet_init3
  41.19286|ISTEP 11. 8 - cen_chiplet_init
  41.19553|ISTEP 11. 9 - cen_arrayinit
  41.19986|ISTEP 11.10 - cen_initf
  41.20215|ISTEP 11.11 - cen_do_manual_inits
  41.20497|ISTEP 11.12 - cen_startclocks
  41.20802|ISTEP 11.13 - cen_scominits
  41.21171|ISTEP 12. 1 - mss_getecid
  42.25709|ISTEP 12. 2 - dmi_attr_update
  42.30382|ISTEP 12. 3 - proc_dmi_scominit
  42.32572|ISTEP 12. 4 - cen_dmi_scominit
  42.32798|ISTEP 12. 5 - dmi_erepair
  42.35000|ISTEP 12. 6 - dmi_io_dccal
  42.35218|ISTEP 12. 7 - dmi_pre_trainadv
  42.35489|ISTEP 12. 8 - dmi_io_run_training
  42.37076|ISTEP 12. 9 - dmi_post_trainadv
  42.39541|ISTEP 12.10 - proc_cen_framelock
  42.40772|ISTEP 12.11 - host_startprd_dmi
  42.41974|ISTEP 12.12 - host_attnlisten_memb
  42.44506|ISTEP 12.13 - cen_set_inband_addr
  42.58832|ISTEP 13. 1 - host_disable_memvolt
  43.67808|ISTEP 13. 2 - mem_pll_reset
  43.75070|ISTEP 13. 3 - mem_pll_initf
  43.85043|ISTEP 13. 4 - mem_pll_setup
  43.87372|ISTEP 13. 6 - mem_startclocks
  43.88970|ISTEP 13. 7 - host_enable_memvolt
  43.89177|ISTEP 13. 8 - mss_scominit
  45.10013|ISTEP 13. 9 - mss_ddr_phy_reset
  45.38105|ISTEP 13.10 - mss_draminit
  45.95447|ISTEP 13.11 - mss_draminit_training
  47.20963|ISTEP 13.12 - mss_draminit_trainadv
  47.32161|ISTEP 13.13 - mss_draminit_mc
  47.49186|ISTEP 14. 1 - mss_memdiag
  69.53224|ISTEP 14. 2 - mss_thermal_init
  69.66891|ISTEP 14. 3 - proc_pcie_config
  69.71959|ISTEP 14. 4 - mss_power_cleanup
  69.72385|ISTEP 14. 5 - proc_setup_bars
  69.83889|ISTEP 14. 6 - proc_htm_setup
  69.84748|ISTEP 14. 7 - proc_exit_cache_contained
  69.89430|ISTEP 15. 1 - host_build_stop_image
  73.08679|ISTEP 15. 2 - proc_set_pba_homer_bar
  73.12352|ISTEP 15. 3 - host_establish_ex_chiplet
  73.13714|ISTEP 15. 4 - host_start_stop_engine
  73.19059|ISTEP 16. 1 - host_activate_master
  74.44590|ISTEP 16. 2 - host_activate_slave_cores
  74.53820|ISTEP 16. 3 - host_secure_rng
  74.54651|ISTEP 16. 4 - mss_scrub
  74.56565|ISTEP 16. 5 - host_load_io_ppe
  74.78752|ISTEP 16. 6 - host_ipl_complete
  75.50085|ISTEP 18.11 - proc_tod_setup
  75.94190|ISTEP 18.12 - proc_tod_init
  75.97575|ISTEP 20. 1 - host_load_payload
  77.12340|ISTEP 20. 2 - host_load_hdat
  78.05195|ISTEP 21. 1 - host_runtime_setup
  83.87001|htmgt|OCCs are now running in ACTIVE state
  89.72649|ISTEP 21. 2 - host_verify_hdat
  89.77252|ISTEP 21. 3 - host_start_payload
 [   90.400516933,5] OPAL skiboot-c81f9d6 starting…

The rest of the skiboot log was also spat out, and then the familiar Petitboot screen:

Welcome to Petitboot!

It lives! I even had a bit of a look at the sensors to see power consumption and temperatures. All looks good:

ipmitool sdr|grep -v ns
 occ0             | 0x00              | ok
 occ1             | 0x00              | ok
 p0_core3_temp    | 51 degrees C      | ok
 p0_core5_temp    | 49 degrees C      | ok
 p0_core7_temp    | 50 degrees C      | ok
 p0_core11_temp   | 49 degrees C      | ok
 p0_core15_temp   | 50 degrees C      | ok
 p0_core17_temp   | 50 degrees C      | ok
 p0_core19_temp   | 50 degrees C      | ok
 p0_core21_temp   | 50 degrees C      | ok
 dimm0_temp       | 36 degrees C      | ok
 dimm4_temp       | 39 degrees C      | ok
 fan0             | 1300 RPM          | ok
 fan1             | 1200 RPM          | ok
 fan2             | 1000 RPM          | ok
 p0_power         | 60 Watts          | ok
 p0_vdd_power     | 31 Watts          | ok
 p0_vdn_power     | 10 Watts          | ok
 cpu_1_ambient    | 30.90 degrees C   | ok
 pcie             | 27 degrees C      | ok
 ambient          | 25.40 degrees C   | ok

Next up? I guess I should install an OS.

Blackbird (singing in the dead of night..)

Way back when Raptor Computer Systems was doing pre-orders for the microATX Blackboard POWER9 system, I put in a pre-order. Since then, I’ve had a few life changes (such as moving to the US and starting to work for Amazon rather than IBM), but I’ve finally gone and done (most of) the setup for my own POWER9 system on (or under) my desk.

An 8 core POWER9 CPU, in bubble wrap and plastic packaging.

Everything came in a big brown box, all rather well packed. I had the board, CPU, heatsink assembly and the special tool to attach the heatsink to the board. Although unique to POWER9, the heatsink/fan assembly was one of the easier ones I’ve ever attached to a board.

The board itself looks pretty much as you’d expect – there’s a big spot for the CPU, a couple of PCI slots, a couple of DIMM slots and some SATA connectors.

The bits that are a bit unusual for a micro-ATX board are the big space reserved for FlexVer, the ASPEED BMC chip and the socketed flash. FlexVer is something I’m not ever going to use, and instead wish that there was an on-board m2 SSD slot instead, even if it was just PCIe. Having to sacrifice a PCIe slot just for a SSD is kind of a bummer.

The Blackbird POWER9 board
The POWER9 chip in socket

One annoying thing is my DIMMs are taking their sweet time in getting here, so I couldn’t actually populate the board with any memory.

Even without memory though, you can start powering it on and see that everything else works okay (i.e. it’s not completely boned). So, even without DIMMs, I could plug it in, and observe the Hostboot firmware complaining about insufficient hardware to IPL the box.

It Lives!

Yep, out the console (via ssh) you clearly see where things fail:

--== Welcome to Hostboot hostboot-3beba24/hbicore.bin ==--

  3.03104|secure|SecureROM valid - enabling functionality
  6.67619|Booting from SBE side 0 on master proc=00050000
  6.85100|ISTEP  6. 5 - host_init_fsi
  7.23753|ISTEP  6. 6 - host_set_ipl_parms
  7.71759|ISTEP  6. 7 - host_discover_targets
 11.34738|HWAS|PRESENT> Proc[05]=8000000000000000
 11.34739|HWAS|PRESENT> Core[07]=1511540000000000
 11.69077|ISTEP  6. 8 - host_update_master_tpm
 11.73787|SECURE|Security Access Bit> 0x0000000000000000
 11.73787|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
 11.76276|ISTEP  6. 9 - host_gard
 11.96654|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
 11.96655|HWAS|FUNCTIONAL> Core[07]=1511540000000000
 12.07554|================================================
 12.07554|Error reported by hwas (0x0C00) PLID 0x90000007
 12.10289|  checkMinimumHardware found no functional dimm cards.
 12.10290|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.10291|  ReasonCode 0x0c06 RC_SYSAVAIL_NO_MEMORY_FUNC
 12.10292|  UserData1  HUID of node : 0x0002000000000000
 12.10293|  UserData2  number of present, non-functional dimms : 0x0000000000000000
 12.10294|------------------------------------------------
 12.10417|  Callout type             : Procedure Callout
 12.10417|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.10418|  Priority                 : SRCI_PRIORITY_HIGH
 12.10419|------------------------------------------------
 12.10420|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.10421|================================================
 12.51718|================================================
 12.51719|Error reported by hwas (0x0C00) PLID 0x90000007
 12.51720|  Insufficient hardware to continue.
 12.51721|  ModuleId   0x03 MOD_CHECK_MIN_HW
 12.51722|  ReasonCode 0x0c04 RC_SYSAVAIL_INSUFFICIENT_HW
 12.54457|  UserData1   : 0x0000000000000000
 12.54458|  UserData2   : 0x0000000000000000
 12.54458|------------------------------------------------
 12.54459|  Callout type             : Procedure Callout
 12.54460|  Procedure                : EPUB_PRC_FIND_DECONFIGURED_PART
 12.54461|  Priority                 : SRCI_PRIORITY_HIGH
 12.54462|------------------------------------------------
 12.54462|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.54463|================================================
 12.73660|System shutting down with error status 0x90000007
 12.75545|================================================
 12.75546|Error reported by istep (0x1700) PLID 0x90000007
 12.77991|  IStep failed, see other log(s) with the same PLID for reason.
 12.77992|  ModuleId   0x01 MOD_REPORTING_ERROR
 12.77993|  ReasonCode 0x1703 RC_FAILURE
 12.77994|  UserData1  eid of first error : 0x9000000800000c04
 12.77995|  UserData2  Reason code of first error : 0x0000000100000609
 12.77996|------------------------------------------------
 12.77996|  host_gard
 12.77997|------------------------------------------------
 12.77998|  Callout type             : Procedure Callout
 12.77998|  Procedure                : EPUB_PRC_HB_CODE
 12.77999|  Priority                 : SRCI_PRIORITY_LOW
 12.78000|------------------------------------------------
 12.78001|  Hostboot Build ID: hostboot-3beba24/hbicore.bin
 12.78002|================================================

Looking forward to getting some DIMMs to show/share more.

Looking at the state of Blackbird firmware

Having been somewhat involved in OpenPOWER firmware, I have a bunch of experience and opinions on maintaining firmware trees for products, what working with upstream looks like and all that.

So, with my new Blackbird system I decided to take a bit of a look as to what the firmware situation was like.

There’s two main parts of firmware: BMC and Host. The BMC firmware runs purely on the ASPEED AST2500 and is based on OpenBMC while the host firmware is what runs on the POWER9 and is based off of OpenPOWER Firmware as assembled by op-build.

Initial impressions on the BMC is that there doesn’t seem to be any web based UI for it, which is kind of disappointing, as the Web UI being developed upstream has some nice qualities, and I’d say I even enjoyed using it when it was built into BMC firmware for systems we had when I was at IBM.

Looking at the git trees, the raptor-v1.00 tag is OpenBMC 2.7.0-dev-533-g386e5602e while current master is 2.8.0-dev-960-g10f7830bd. The spot where it split off was 2.7.0-dev-430-g7443ee80b, from April 2019 – so it’s not too old, but I’m also not convinced there should have been some security patches since then.

I’m not sure if any of the OpenBMC code is upstream, I haven’t looked.

Unfortunately, none of the host firmware is upstream.

On the host firmware side, v2.3-rc2-67-ga6a5f142 is the Raptor tag, and that compares with current master of v2.4-305-g54d8daf4, the place where Raptor forked was v2.3-rc2-9-g7b556015, again in April of 2019. Considering there was an upstream release in May of 2019 (v2.3), and again in July (v2.4), it could have easily have made it into an upstream release.

Unfortunately, there doesn’t seem to have been an upstream op-build release since v2.4 back in July (when I made it shortly before leaving IBM).

The skiboot component of host firmware has had an upstream release since I left (v6.5 in mid-August 2019), so the (rather trivial) platform support could have easily made it. I have a cleaned up and ready to upstream patch for it, I just need some DIMMs to actually test with before I send the patch.

As the current firmware situation stands, producing another build with updated upstream code is tricky due to the out-of-tree nature of the Blackbird patches, and a straight “git merge” is probably doable by some people, but not everybody.

On my TODO list is to get all the code into a state I can upstream it, assess vulnerability to CVE-2019-6260, and work out how I want to make it do Secure Boot (something that isn’t in upstream firmware yet, and currently would require a TPM, which I do not have).

AWS Welcomes Stewart

A little over a month ago now, I started a new role at Amazon Web Services (AWS) as a Principal Engineer with Amazon Linux. Everyone has been wonderfully welcoming and helpful. I’m excited about the future here, the team, and our mission.

Thanks to all my IBM colleagues over the past five and a half and a bit years too, I really enjoyed working with you on OpenPOWER and hope it continues to gain traction. I have my Blackbird now and am eagerly waiting for a spare 20 minutes to assemble it.

Switching to iPhone Part 2: Seriously?

In which I ask of Apple, “Seriously?”.

That was pretty much my reaction with Apple sticking to Lightning connectors rather than going with the USB-C standard. Having USB-C around the place for my last two (Android) phones was fantastic. I could charge a phone, external battery, a (future) laptop, all off the same wall wart and with the same cable. It is with some hilarity that I read that the new iPad Pro has USB-C rather than Lightning.

But Apple’s dongle fetish reigns supreme, and so I get a multitude of damn dongles all for a wonderfully inflated price with an Australia Tax whacked on top.

The most egregious one is the Lightning-to-3.5mm dongle. In the office, I have a good set of headphones. The idea is to block out the sound of an open plan office so I can actually get some concentrating done. With tiny dedicated MP3 players and my previous phones, these sounded great. The Apple dongle? It sounds terrible. Absolutely terrible. The Lighting-to-3.5mm adapter might be okay for small earbuds but it is nearly completely intolerable for any decent set of headphones. I’m now in the market for a Bluetooth headphone amplifier. Another bunch of money to throw at another damn dongle.

Luckily, there seems to be a really good Bluetooth headphone amplifier on Amazon. The same Amazon that no longer ships to Australia. Well, there’s an Australian seller, for six times the price.

Urgh.

Switching to iPhone: Part 1

I have used Android phones since the first one: the G1. I’m one of the (relatively) few people who has used Android 1.0. I’ve had numerous Android phones since then, mostly the Google flagship.

I have fond memories of the Nexus One and Galaxy Nexus, as well as a bunch of time running Cyanogen (often daily builds, because YOLO) to get more privacy preserving features (or a more recent Android). I had a Sony Z1 Compact for a while which was great bang for buck except for the fact the screen broke whenever you looked at it sideways. Great kudos to the Sony team for being so friendly to custom firmware loads.

I buy my hardware from physical stores. Why? Well, it means that the NSA and others get to spend extra effort to insert hardware modifications (backdoors), as well as the benefit of having a place to go to/set the ACCC on to get my rights under Australian Consumer Law.

My phone before last was a Nexus 5X. There were a lot of good things about this phone; the promise of fast charging via USB-C was one, as was the ever improving performance of the hardware and Android itself. Well… it just got progressively slower, and slower, and slower – as if it was designed to get near unusable by the time of the next Google phone announcement.

Inevitably, my 5X succumbed to the manufacturing defect that resulted in a boot loop. It would start booting, and then spontaneously reboot, in a loop, forever. The remedy? Replace it under warranty! That would take weeks, which isn’t a suitable timeframe in this day and age to be without a phone, so I mulled over buying a Google Pixel or my first ever iPhone (my iPhone owning friends assured me that if such a thing happens with an iPhone that Apple would have swapped it on the spot). Not wanting to give up a lot of the personal freedom that comes with the Android world, I spent the $100 more to get the Pixel, acutely aware that having a phone was now a near $1000/year habit.

The Google Pixel was a fantastic phone (except the price, they should have matched the iPhone price). The camera was the first phone camera I actually went “wow, I’m impressed” over. The eye-watering $279 to replace a cracked screen, the still eye-watering cost of USB-C cables, and the seat to process the HDR photos were all forgiven. It was a good phone. Until, that is, less than a year in, the battery was completely shot. It would power off when less than 40% and couldn’t last the trip from Melbourne airport to Melbourne city.

So, with a flagship phone well within the “reasonable quality” time that consumer law would dictate, I contacted Google after going through all the standard troubleshooting. Google agreed this was not normal and that the phone was defective. I was told that they would mail me a replacement, I could transfer my stuff over and then mail in the broken one. FANTASTIC!! This was soooo much better than the experience with the 5X.

Except that it wasn’t. A week later, I rang back to ask what was going on as I hadn’t received the replacement; it turns out Google had lied to me, I’d have to mail the phone to them and then another ten business days later I’d have a replacement. Errr…. no, I’ve been here before.

I rang the retailer, JB Hi-Fi; they said it would take them at least three weeks, which I told them was not acceptable nor a “reasonable timeframe” as dictated by consumer law.

So, with a bunch of travel imminent, I bought a big external USB-C battery and kept it constantly connected as without it the battery percentage went down faster than the minutes ticked over. I could sort it out once I was back from travel.

So, I’m back. In fact, I drove back from a weekend away and finally bit the bullet – I went to pick up a phone who’s manufacturer has a reputation of supporting their hardware.

I picked up an iPhone.

I figured I should write up how, why, my reasons, and experiences in switching phone platforms. I think my next post will be “Why iPhone and not a different Android”.

How I do email (at home)

I thought I might write something up on how I’ve been doing email both at home and at work. I very much on purpose keep the two completely separate, and have slightly different use cases for both of them.

For work, I do not want mail on my phone. For personal mail, it turns out I do want this on my phone, which is currently an Android phone. Since my work and personal email is very separate, the volume of mail is really, really different. Personal mail is maybe a couple of dozen a day at most. Work is… orders of magnitude more.

Considering I generally prefer free software to non-free software, K9 Mail is the way I go on my phone. I have it set up to point at the IMAP and SMTP servers of my mail provider (FastMail). I also have a google account, and the gmail app works fine for the few bits of mail that go there instead of my regular account.

For my mail accounts, I do an INBOX ZERO like approach (in reality, I’m pretty much nowhere near zero, but today I learned I’m a lot closer than many colleagues). This means I read / respond / do / ignore mail and then move it to an ARCHIVE folder. K9 and Gmail both have the ability to do this easily, so it works well.

Additionally though, I don’t want to care about limits on storage (i.e. expire mail from the server after X days), nor do I want to rely on “the cloud” to be the only copy of things. I also don’t want to have to upload any of past mail I may be keeping around. I also generally prefer to use notmuch as a mail client on a computer.

For those not familiar with notmuch, it does tags on mail in Maildir, is extremely fast and can actually cope with a quantity of mail. It also has this “archive”/INBOX ZERO workflow which I like.

In order to get mail from FastMail and Gmail onto a machine, I use offlineimap. An important thing to do is to set “status_backend = sqlite” for each Account. It turns out I first hacked on sqlite for offlineimap status a bit over ten years ago – time flies. For each Account I also set presynchook = ~/Maildir/maildir-notmuch-presync (below) and a postsynchook = notmuch new. The presynchook is run before we sync, and its job is to move files around based on the tags in notmuch and the postsynchook lets notmuch catch any new mail that’s been fetched.

My maildir-notmuch-presync hook script is:

#!/bin/bash
notmuch search --output=files not tag:inbox and folder:fastmail/INBOX|xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Archive/cur/"

notmuch search --output=files folder:fastmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/fastmail/Spam/cur/"
ARCHIVE_DIR=$HOME/Maildir/INBOX/`date +"%Y%m"`/cur/
mkdir -p $ARCHIVE_DIR
notmuch search --output=files folder:fastmail/Archive and date:..90d and not tag:flagged | xargs -I'{}' mv '{}' "$ARCHIVE_DIR"

# Gmail
notmuch search --output=files not tag:inbox and folder:gmail/INBOX|grep 'INBOX/gmail/INBOX/' | xargs -I'{}' rm '{}'
notmuch search --output=files folder:gmail/INBOX and tag:spam |xargs -I'{}' mv '{}' "$HOME/Maildir/INBOX/gmail/[Gmail].Spam/cur/"

So This keeps 90 days of mail on the fastmail server, and archives older mail off into month based archive dirs. This is simply to keep directory sizes not too large, you could put everything in one directory… but at some point that gets a bit silly.

I don’t think this is all the most optimal setup I could have, but it does let me read and answer mail on my phone and desktop (as well as use a web client if I want to). There is a bit of needless copying of messages by offlineimap under certain circumstances, but I don’t get enough personal mail for it to be a problem.

Ten years of libeatmydata!

So, ten years ago (how is that even possible… it seems like it was just a couple of years ago), there was the first commit in the libeatmydata repository (now in git on github rather than in bzr on launchpad). The first implementation was literally just this:

#include <sys/types.h>
#include 
#include 
#include <sys/stat.h>
#include 

int errno;

int fsync(int fd)
{
       errno=0;
       return 0;
}

Soooo…. kind of incredibly simple. But, hey, it worked! Little did I know, that these two lines of code were going to grow into 166 lines of C in order to do it a bit more “properly”.

My initial use case was making the MySQL test suite run faster: 30% faster back then! In fact, it was better than using tmpfs! It’s still used for that (even though I no longer hack on MySQL with any regularity), see github issue #1 for a recent bug that cropped up.

Since then, I’m aware of eatmydata being used to build entire operating systems and in production in way too many places (on way too many machines). The probability that any given human who’s used a computer in the past 10 years has used libeatmydata, used a package built with it or used a service with it running somewhere in production is so close to 1 that I don’t want to think about it.

Well… here’s to the next ten years of eating data!

j-core + Numato Spartan 6 board + Fedora 25

A couple of changes to http://j-core.org/#download_bitstream made it easy for me to get going:

  • In order to make ModemManager not try to think it’s a “modem”, create /etc/udev/rules.d/52-numato.rules with the following content:
    # Make ModemManager ignore Numato FPGA board
    ATTRS{idVendor}=="2a19", ATTRS{idProduct}=="1002", ENV{ID_MM_DEVICE_IGNORE}="1"
  • You will need to install python3-pyserial and minicom
  • The minicom command line i used was:
    sudo stty -F /dev/ttyACM0 -crtscts && minicom -b 115200 -D /dev/ttyACM0

and along with the instructions on j-core.org, I got it to load a known good build.

Windows NT4 for PowerPC guest on OPAL on POWER8 in qemu

Sometimes, programming is just for fun. This is what PREPHV is for Andrei Warkentin. To quote the README:

“This is mostly a huge ugly hack, derived from my
ppc64le_hello code. The running philosophy here is
to throw things together late at night with my family
asleep and see how far I get without a real design
or without a real desire to implement boring things
like IDE (*sigh*) emulation”

Since my day job is maintaining the firmware that it runs on, I decided to have a go (it also ties in with the retro stuff I’ve been blogging about). So…

screenshot-from-2016-10-30-17-22-20and I’m off! (yes, this is the very latest qemu and skiboot):

screenshot-from-2016-10-30-17-23-32screenshot-from-2016-10-30-17-23-48Yes, prephv does clear all thirty two megabytes of guest memory

screenshot-from-2016-10-30-17-24-15A quick diversion, if you try Windows NT 3.51 for PowerPC, you get this:

screenshot-from-2016-10-30-18-17-35

But on NT4, you continue unharmed:

screenshot-from-2016-10-30-17-22-32A sign I needed to hack my filesystem of bits of NT installer bits a bit more:

screenshot-from-2016-10-30-17-22-45But, on my next try:

screenshot-from-2016-10-30-17-25-26Well… looks like there’s an instruction that needs to be emulated (and there’s no code to currently do that). Mind you… this is decently far into booting before we hit anything fatal, which is a pretty impressive effort – and it is tempting to continue and see if it’ll run on real hardware and if it could be made to work well enough to not find any disks :)