Two Photos from Healseville Sanctuary

If you’re near Melbourne, you should go to Healseville Sanctuary and enjoy the Australian native animals. I’ve been a number of times over the years, and here’s a couple of photos from a (relatively, as in, the last couple of years) trip.

Leah trying to photograph a much too close bird
Koalas seem to always look like they’ve just woken up. I’m pretty convinced this one just had.

Why you should use `nproc` and not grep /proc/cpuinfo

There’s something really quite subtle about how the nproc utility from GNU coreutils works. If you look at the man page, it’s even the very first sentence:

Print the number of processing units available to the current process, which may be less than the number of online processors.

So, what does that actually mean? Well, just because the computer some code is running on has a certain number of CPUs (and here I mean “number of hardware threads”) doesn’t necessarily mean that you can spawn a process that uses that many. What’s a simple example? Containers! Did you know that when you invoke docker to run a container, you can easily limit how much CPU the container can use? In this case, we’re looking at the --cpuset-cpus parameter, as the --cpus one works differently.

$ nproc
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# nproc
2
bash-4.2# exit

$ docker run --cpuset-cpus=0-2 --rm=true -it  amazonlinux:2
bash-4.2# nproc
3

As you can see, nproc here gets the right bit of information, so if you’re wanting to do a calculation such as “Please use up to the maximum available CPUs” as a parameter to the configuration of a piece of software (such as how many threads to run), you get the right number.

But what if you use some of the other common methods?

$ /usr/bin/lscpu -p | grep -c "^[0-9]"
8
$ grep -c 'processor' /proc/cpuinfo 
8

$ docker run --cpuset-cpus=0-1 --rm=true -it  amazonlinux:2
bash-4.2# yum install -y /usr/bin/lscpu
......
bash-4.2# /usr/bin/lscpu -p | grep -c "^[0-9]"
8
bash-4.2# grep -c 'processor' /proc/cpuinfo 
8
bash-4.2# nproc
2

In this case, if you base your number of threads off grepping lscpu you take another dependency (on the util-linux package), which isn’t needed. You also get the wrong answer, as you do by grepping /proc/cpuinfo. So, what this will end up doing is just increase the number of context switches, possibly also adding a performance degradation. It’s not just in docker containers where this could be an issue of course, you can use the same mechanism that docker uses anywhere you want to control resources of a process.

Another subtle thing to watch out for is differences in /proc/cpuinfo content depending on CPU architecture. You may not think it’s an issue today, but who wants to needlessly debug something?

tl;dr: for determining “how many processes to run”: use nproc, don’t grep lscpu or /proc/cpuinfo

Photos from Tasmania (2017)

On the random old photos train, there’s some from spending time in Tasmania post linux.conf.au 2017 in Hobart.

All of these are Kodak E100VS film, which was no doubt a bit out of date by the time I shot it (and when they stopped making Ektachrome for a while). It was a nice surprise to be reminded of a truly wonderful Tassie trip, taken with friends, and after the excellent linux.conf.au.

Photos from long ago….

It’s strange to get unexpected photos from a while ago. It’s also joyous.

These photos above are from a park down the street from where we used to live. I believe it was originally a quarry, and a number of years ago the community got together and turned it into a park. It’s a quite decent size (Parkrun is held there), and there’s plenty of birds (and ducks!) to see.

Moorabbin Station

It’s a very strange feeling seeing photos from both the before time, and from where I used to live. I’m sure that if the world wasn’t the way it was now, and there wasn’t a pandemic, it would feel different.

All of the above were shot on a Nikon F80 with 35mm Fuji Velvia 50 film.

op-build v2.5 firmware for the Raptor Blackbird

Well, following on from my post where I excitedly pointed out that Raptor Blackbird support: all upstream in op-build v2.5, that means I can do another in my series of (close to) upstream Blackbird firmware builds.

This time, the only difference from straight upstream op-build v2.5 is my fixes for buildroot so that I can actually build it on Fedora 32.

So, head over to https://www.flamingspork.com/blackbird/op-build-v2.5-blackbird-images/ and grab blackbird.pnor to flash it on your blackbird, let me know how it goes!

Refurbishing my Macintosh Plus

Somewhere in the mid to late 1990s I picked myself up a Macintosh Plus for the sum of $60AUD. At that time there were still computer Swap Meets where old and interesting equipment was around, so I headed over to one at some point (at the St Kilda Town Hall if memory serves) and picked myself up four 1MB SIMMs to boost the RAM of it from the standard 1MB to the insane amount of 4MB. Why? Umm… because I could? The RAM was pretty cheap, and somewhere in the house to this day, I sometimes stumble over the 256KB SIMMs as I just can’t bring myself to get rid of them.

This upgrade probably would have cost close to $2,000 at the system’s release. If the Macintosh system software were better at disk caching you could have easily held the whole 800k of the floppy disk in memory and still run useful software!

One of the annoying things that started with the Macintosh was odd screws and Apple gear being hard to get into. Compare to say, the Apple ][ which had handy clips to jump inside whenever. In fitting my massive FOUR MEGABYTES of RAM back in the day, I recall using a couple of allen keys sticky-taped together to be able to reach in and get the recessed Torx screws. These days, I can just order a torx bit off Amazon and have it arrive pretty quickly. Well, two torx bits, one of which is just too short for the job.

My (dusty) Macintosh Plus

One thing had always struck me about it, it never really looked like the photos of the Macintosh Plus I saw in books. In what is an embarrassing number of years later, I learned that a lot can be gotten from the serial number printed on the underside of the front of the case.

So heading over to the My Old Mac Serial Number Decoder I can find out:

Manufactured in: F => Fremont, California, USA
Year of production: 1985
Week of production: 14
Production number: 3V3 => 4457
Model ID: M0001WP => Macintosh 512K (European Macintosh ED)

Your Macintosh 512K (European Macintosh ED) was the 4457th Mac manufactured during the 14th week of 1985 in Fremont, California, USA.

Pretty cool! So it is certainly a Plus as the logic board says that, but it’s actually an upgraded 512k! If you think it was madness to have a GUI with only 128k of RAM in the original Macintosh, you’d be right. I do not envy anybody who had one of those.

Some time a decent (but not too many, less than 10) years ago, I turn on the Mac Plus to see if it still worked. It did! But then… some magic smoke started to come out (which isn’t so good), but the computer kept working! There’s something utterly bizarre about looking at a computer with smoke coming out of it that continues to function perfectly fine.

Anyway, as the smoke was coming out, I decided that it would be an opportune time to turn it off, open doors and windows, and put it away until I was ready to deal with it.

One Global Pandemic Later, and now was the time.

I suspected it was going to be a capacitor somewhere that blew, and figured that I should replace it, and probably preemptively replace all the other electrolytic capacitors that could likely leak and cause problems.

First thing’s first though: dismantle it and clean everything. First, taking the case off. Apple is not new to the game of annoying screws to get into things. I ended up spending $12 on this set on Amazon, as the T10 bit can actually reach the screws holding the case on.

Cathode Ray Tubes are not to be messed with. We’re talking lethal voltages here. It had been many years since electricity went into this thing, so all was good. If this all doesn’t work first time when reassembling it, I’m not exactly looking forward to discharging a CRT and working on it.

The inside of my Macintosh Plus, with lots of grime.

You can see there’s grime everywhere. It’s not the worst in the world, but it’s not great (and kinda sticky). Obviously, this needs to be cleaned! The best way to do that is take a lot of photos, dismantle everything, and clean it a bit at a time.

There’s four main electronic components inside a Macintosh Plus:

  1. The CRT itself
  2. The floppy disk drive
  3. The Logic Board (what Mac people call what PC people call the motherboard)
  4. The Analog Board

There’s also some metal structure that keeps some things in place. There’s only a few connectors between things, which are pretty easy to remove. If you don’t know how to discharge a CRT and what the dangers of them are you should immediately go and find out through reading rather than finding out by dying. I would much prefer it if you dyed (because creative fun) rather than died.

Once the floppy connector and the power connector is unplugged, the logic board slides out pretty easily. You can see from the photo below that I have the 4MB of RAM installed and the resistor you need to snip is, well, snipped (but look really closely for that). Also, grime.

Macintosh Plus Logic Board

Cleaning things? Well, there’s two ways that I have used (and considering I haven’t yet written the post with “hurray, it all works”, currently take it with a grain of salt until I write that post). One: contact cleaner. Two: detergent.

Macintosh Plus Logic Board (being washed in my sink)

I took the route of cleaning things first, and then doing recapping adventures. So it was some contact cleaner on the boards, and then some soaking with detergent. This actually all worked pretty well.

Logic Board Capacitors:

  • C5, C6, C7, C12, C13 = 33uF 16V 85C (measured at 39uF, 38uF, 38uF, 39uF)
  • C14 = 1uF 50V (measured at 1.2uF and then it fluctuated down to around 1.15uF)

Analog Board Capacitors

  • C1 = 35V 3.9uF (M) measured at 4.37uF
  • C2 = 16V 4700uF SM measured at 4446uF
  • C3 = 16V 220uF +105C measured at 234uF
  • C5 = 10V 47uF 85C measured at 45.6uF
  • C6 = 50V 22uF 85C measured at 23.3uF
  • C10 = 16V 33uF 85C measured at 37uF
  • C11 = 160V 10uF 85C measured at 11.4uF
  • C12 = 50V 22uF 85C measured at 23.2uF
  • C18 = 16V 33uF 85C measured at 36.7uF
  • C24 = 16V 2200uF 105C measured at 2469uF
  • C27 = 16V 2200uF 105C measured at 2171uF (although started at 2190 and then went down slowly)
  • C28 = 16V 1000uF 105C measured at 638uF, then 1037uF, then 1000uF, then 987uF
  • C30 = 16V 2200uF 105C measured at 2203uF
  • C31 = 16V 220uF 105C measured at 236uF
  • C32 = 16V 2200uF 105C measured at 2227uF
  • C34 = 200V 100uF 85C measured at 101.8uF
  • C35 = 200V 100uF 85C measured at 103.3uF
  • C37 = 250V 0.47uF measured at <exploded>. wheee!
  • C38 = 200V 100uF 85C measured at 103.3uF
  • C39 = 200V 100uF 85C mesaured at 99.6uF (with scorch marks from next door)
  • C42 = 10V 470uF 85C measured at 556uF
  • C45 = 10V 470uF 85C measured at 227uF, then 637uF then 600uF

I’ve ordered an analog board kit from https://console5.com/store/macintosh-128k-512k-plus-analog-pcb-cap-kit-630-0102-661-0462.html and when trying to put them in, I learned that the US Analog board is different to the International Analog board!!! Gah. Dammit.

Note that C30, C32, C38, C39, and C37 were missing from the kit I received (probably due to differences in the US and International boards). I did have an X2 cap (for C37) but it was 0.1uF not 0.47uF. I also had two extra 1000uF 16V caps.

Macintosh Repair and Upgrade Secrets (up to the Mac SE no less!) holds an Appendix with the parts listing for both the US and International Analog boards, and this led me to conclude that they are in fact different boards rather than just a few wires that are different. I am not sure what the “For 120V operation, W12 must be in place” and “for 240V operation, W12 must be removed” writing is about on the International Analog board, but I’m not quite up to messing with that at the moment.

So, I ordered the parts (linked above) and waited (again) to be able to finish re-capping the board.

I found https://youtu.be/H9dxJ7uNXOA video to be a good one for learning a bunch about the insides of compact Macs, I recommend it and several others on his YouTube channel. One interesting thing I learned is that the X2 cap (C37 on the International one) is before the power switch, so could blow just by having the system plugged in and not turned on! Okay, so I’m kind of assuming that it also applies to the International board, and mine exploded while it was plugged in and switched on, so YMMV.

Additionally, there’s an interesting list of commonly failing parts. Unfortunately, this is also for the US logic board, so the tables in Macintosh Repair and Upgrade Secrets are useful. I’m hoping that I don’t have to replace anything more there, but we’ll see.

But, after the Nth round of parts being delivered….

Note the lack of an exploded capacitor

Yep, that’s where the exploded cap was before. Cleanup up all pretty nicely actually. Annoyingly, I had to run it all through a step-up transformer as the board is all set for Australian 240V rather than US 120V. This isn’t going to be an everyday computer though, so it’s fine.

Macintosh Plus booting up (note how long the memory check of 4MB of RAM takes. I’m being very careful as the cover is off. High, and possibly lethal voltages exposed.

Woohoo! It works. While I haven’t found my supply of floppy disks that (at least used to) work, the floppy mechanism also seems to work okay.

Macintosh Plus with a seemingly working floppy drive mechanism. I haven’t found a boot floppy yet though.

Next up: waiting for my Floppy Emu to arrive as it’ll certainly let it boot. Also, it’s now time to rip the house apart to find a floppy disk that certainly should have made its way across the ocean with the move…. Oh, and also to clean up the mouse and keyboard.

Raptor Blackbird support: all upstream in op-build

Thanks to my most recent PR being merged, op-build v2.5 will have full support for the Raptor Blackbird! This includes support for the “IPL Monitor” that’s required to get fan control going.

Note that if you’re running Fedora 32 then you need some patches to buildroot to have it build, but if you’re building on something a little older, then upstream should build and work straight out of the box (err… git tree).

I also note that the work to get Secure Boot for an OS Kernel going is starting to make its way out for code reviews, so that’s something to look forward to (although without a TPM we’re going to need extra code).

A op-build v2.5-rc1 based Raptor Blackbird Build

I have done a few builds of firmware for the Raptor Blackbird since I got mine, each of them based on upstream op-build plus a few patches. The previous one was Yet another near-upstream Raptor Blackbird firmware build that I built a couple of months ago. This new build is based off the release candidate of op-build v2.5. Here’s what’s changed:

PackageOld VersionNew Version
hcodehw030220a.opmsthw050520a.opmst
hostbootacdff8a390a2654dd52fed67bdebe2b5
kexec-lite18ec88310c4134e6b0130b3c1ea489e
libflashv6.5-228-g82aed17av6.6
linuxv5.4.22v5.4.33
linux-headersv5.4.22v5.4.33
machine-xml17e9e84d504582c88e782e30829e0d6be
occ3ab29212518e65740ab4dc96fd6cf584c42
openpower-pnor6fb8d914134d544a84175f00d9c6dc395faf3
sbec318ab00116d92f08c78fb7838495ad0aab7
skibootv6.5-228-g82aed17av6.6
Changes in my latest Blackbird build

Go grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-6-images/, and give it a go! Just scp it to your BMC, and flash it:

pflash -E -p /tmp/blackbird.pnor

There’s two differences from upstream op-build: my pull request to op-build, and the fixing of the (old) buildroot so that it’ll build on Fedora 32. From discussions on the openpower-firmware mailing list, it seems that one hopeful thing is to have all the Blackbird support merged in before the final op-build v2.5 is tagged. The previous op-build release (v2.4) was tagged in July 2019, so we’re about 10 months into what was a 2 month release cycle, so speculating on when that final release will be is somewhat difficult.

My POWER9 CPU Core Layout

So, following on from my post on Sensors on the Blackbird (and thus Power9), I mentioned that when you look at the temperature sensors for each CPU core in my 8-core POWER9 chip, they’re not linear numbers. Let’s look at what that means….

stewart@blackbird9$ sudo ipmitool sensor | grep core
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na        

You can see I have eight CPU cores in my Blackbird system. The reason the 8 CPU cores are core 3, 5, 7, 11, 15, 17, 19, and 21 rather than 0-8 or something is that these represent the core numbers on the physical die, and the die is a 24 core die. When you’re making a chip as big and as complex as modern high performance CPUs, not all of the chips coming out of your fab are going to be perfect, so this is how you get different models in the line with only one production line.

Weirdly, the output from the hwmon sensors and why there’s a “core 24” and a “core 28”. That’s just… wrong. What it is, however, is right if you think of 8*4=32. This is a product of Linux thinking that Thread=Core in some ways. So, yeah, this numbering is the first thread of each logical core.

[stewart@blackbird9 ~]$ sensors|grep -i core
 Chip 0 Core 0:            +39.0°C  (lowest = +25.0°C, highest = +71.0°C)
 Chip 0 Core 4:            +39.0°C  (lowest = +26.0°C, highest = +66.0°C)
 Chip 0 Core 8:            +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 12:           +39.0°C  (lowest = +26.0°C, highest = +67.0°C)
 Chip 0 Core 16:           +39.0°C  (lowest = +25.0°C, highest = +67.0°C)
 Chip 0 Core 20:           +39.0°C  (lowest = +26.0°C, highest = +69.0°C)
 Chip 0 Core 24:           +39.0°C  (lowest = +27.0°C, highest = +67.0°C)
 Chip 0 Core 28:           +39.0°C  (lowest = +27.0°C, highest = +64.0°C)

But let’s ignore that, go from the IPMI sensors (which also match what the OCC shows with “occtoolp9 -LS” (see below).

$ ./occtoolp9 -SL
Sensor Details: (found 86 sensors, details only for Status of 0x00)                                           
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type 
....
   0x00ED TEMPC03………     47      29     47 C    0x00 0x00037CF2 0x00007D00 0x00000100 0x0040 0x0008
   0x00EF TEMPC05………     37      26     39 C    0x00 0x00014E53 0x00007D00 0x00000100 0x0040 0x0008
   0x00F1 TEMPC07………     46      28     46 C    0x00 0x0001A777 0x00007D00 0x00000100 0x0040 0x0008
   0x00F5 TEMPC11………     44      27     45 C    0x00 0x00018402 0x00007D00 0x00000100 0x0040 0x0008
   0x00F9 TEMPC15………     36      25     43 C    0x00 0x000183BC 0x00007D00 0x00000100 0x0040 0x0008
   0x00FB TEMPC17………     38      28     41 C    0x00 0x00015474 0x00007D00 0x00000100 0x0040 0x0008
   0x00FD TEMPC19………     43      27     44 C    0x00 0x00016589 0x00007D00 0x00000100 0x0040 0x0008
   0x00FF TEMPC21………     36      30     40 C    0x00 0x00015CA9 0x00007D00 0x00000100 0x0040 0x0008

So what does that mean for physical layout? Well, like all modern high performance chips, the POWER9 is modular, with a bunch of logic being replicated all over the die. The most notable duplicated parts are the core (replicated 24 times!) and cache structures. Less so are memory controllers and PCI hardware.

P9 chip layout from page 31 of the POWER9 Register Specification

See that each core (e.g. EC00 and EC01) is paired with the cache block (EC00 and EC01 with EP00). That’s two POWER9 cores with one 512KB L2 cache and one 10MB L3 cache.

You can see the cache layout (including L1 Instruction and Data caches) by looking in sysfs:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; done
 1 32K Data
 1 32K Instruction
 2 512K Unified
 3 10240K Unified

So, what does the layout of my POWER9 chip look like? Well, thanks to the power of graphics software, we can cross some cores out and look at the topology:

My 8-core POWER9 CPU in my Raptor Blackbird

If I run some memory bandwidth benchmarks, I can see that you can see the L3 cache capacity you’d assume from the above diagram: 80MB (10MB/core). Let’s see:

[stewart@blackbird9 lmbench3]$ for i in 5M 10M 20M 30M 40M 50M 60M 70M 80M 500M; \
  do echo -n "$i   "; \
  ./bin/bw_mem -N 100  $i rd; \
done
  5M    5.24 63971.98
 10M   10.49 31940.14
 20M   20.97 17620.16
 30M   31.46 18540.64
 40M   41.94 18831.06
 50M   52.43 17372.03
 60M   62.91 16072.18
 70M   73.40 14873.42
 80M   83.89 14150.82
 500M 524.29 14421.35

If all the cores were packed together, I’d expect that cliff to be a lot sooner.

So how does this compare to other machines I have around? Well, let’s look at my Ryzen 7. Specifically, a “AMD Ryzen 7 1700 Eight-Core Processor”. The cache layout is:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
  do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
  echo; \
done
 1 32K Data
 1 64K Instruction
 2 512K Unified
 3 8192K Unified

And then the performance benchmark similar to the one I ran above on the POWER9 (lower numbers down low as 8MB is less than 10MB)

$ for i in 4M 8M 16M 24M 32M 40M 48M 56M 64M 72M 80M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  4M    4.19 61111.04
  8M    8.39 28596.55
 16M   16.78 21415.12
 24M   25.17 20153.57
 32M   33.55 20448.20
 40M   41.94 20940.11
 48M   50.33 20281.39
 56M   58.72 21600.24
 64M   67.11 21284.13
 72M   75.50 20596.18
 80M   83.89 20802.40
 500M 524.29 21489.27

And my laptop? It’s a four core part, specifically a “Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz” with a cache layout like:

$ for i in /sys/devices/system/cpu/cpu0/cache/index*/; \
   do echo -n $(cat $i/level) $(cat $i/size) $(cat $i/type); \
     echo; \
   done
   1 32K Data
   1 32K Instruction
   2 256K Unified
   3 6144K Unified 
$ for i in 3M 6M 12M 18M 24M 30M 36M 42M 500M; \
  do echo -n "$i   "; ./bin/x86_64-linux-gnu/bw_mem -N 10  $i rd;\
done
  3M    3.15 48500.24
  6M    6.29 27144.16
 12M   12.58 18731.80
 18M   18.87 17757.74
 24M   25.17 17154.12
 30M   31.46 17135.87
 36M   37.75 16899.75
 42M   44.04 16865.44
 500M 524.29 16817.10

I’m not sure what performance conclusions we can realistically draw from these curves, apart from “keeping workload to L3 cache is cool”, and “different chips have different cache hardware”, and “I should probably go and read and remember more about the microarchitectural characteristics of the cache hardware in Ryzen 7 hardware and 10th gen Intel Core hardware”.

OCC and Sensors on the Raptor Blackbird (and other POWER9 systems)

This post we’re going to look at three different ways to look at various sensors in the Raptor Blackbird system. The Blackbird is a single socket uATX board for the POWER9 processor. One advantage of the system is completely open source firmware, so you can (like I have): build your own firmware. So, this is my Blackbird running my most recent firmware build (the BMC is running the 2.00 release from Raptor).

Sensors over IPMI

One way to get the sensors is over IPMI. This can be done either in-band (as in, from the OS running on the blackbird), or over the network.

stewart@blackbird9$ sudo ipmitool sensor |head
occ                      | na         | discrete   | na    | na        | na        | na        | na        | na        | na        
 occ0                     | 0x0        | discrete   | 0x0200| na        | na        | na        | na        | na        | na        
 occ1                     | 0x0        | discrete   | 0x0100| na        | na        | na        | na        | na        | na        
 p0_core0_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core1_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core2_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core3_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core4_temp            | na         |            | na    | na        | na        | na        | na        | na        | na        
 p0_core5_temp            | 38.000     | degrees C  | ok    | na        | -40.000   | na        | 78.000    | 90.000    | na        
 p0_core6_temp            | na         |            | na    | na        | na        | na        | na        | na        | na    

It’s kind of annoying to read there, so standard unix tools to the rescue!

stewart@blackbird9$ sudo ipmitool sensor | cut -d '|' -f 1,2
 occ                      | na                                                                                                               
 occ0                     | 0x0                                                                                                              
 occ1                     | 0x0                                                                                                              
 p0_core0_temp            | na                                                                                                               
 p0_core1_temp            | na                                                                                                               
 p0_core2_temp            | na                                                                                                               
 p0_core3_temp            | 38.000                                                                                                           
 p0_core4_temp            | na          
 p0_core5_temp            | 38.000      
 p0_core6_temp            | na          
 p0_core7_temp            | 38.000      
 p0_core8_temp            | na          
 p0_core9_temp            | na          
 p0_core10_temp           | na          
 p0_core11_temp           | 37.000      
 p0_core12_temp           | na          
 p0_core13_temp           | na          
 p0_core14_temp           | na          
 p0_core15_temp           | 37.000      
 p0_core16_temp           | na          
 p0_core17_temp           | 37.000      
 p0_core18_temp           | na          
 p0_core19_temp           | 39.000      
 p0_core20_temp           | na          
 p0_core21_temp           | 39.000      
 p0_core22_temp           | na          
 p0_core23_temp           | na          
 p0_vdd_temp              | 40.000 
 dimm0_temp               | 35.000      
 dimm1_temp               | na          
 dimm2_temp               | na          
 dimm3_temp               | na          
 dimm4_temp               | 38.000      
 dimm5_temp               | na          
 dimm6_temp               | na          
 dimm7_temp               | na          
 dimm8_temp               | na          
 dimm9_temp               | na          
 dimm10_temp              | na          
 dimm11_temp              | na          
 dimm12_temp              | na          
 dimm13_temp              | na          
 dimm14_temp              | na          
 dimm15_temp              | na          
 fan0                     | 1200.000    
 fan1                     | 1100.000    
 fan2                     | 1000.000    
 p0_power                 | 33.000      
 p0_vdd_power             | 5.000       
 p0_vdn_power             | 9.000       
 cpu_1_ambient            | 30.600      
 pcie                     | 27.000      
 ambient                  | 26.000  

You can see that I have 3 fans, two DIMMs (although why it lists 16 possible DIMMs for a two DIMM slot board is a good question!), and eight CPU cores. More on why the layout of the CPU cores is the way it is in a future post.

The code path for reading these sensors is interesting, it’s all from the BMC, so we’re having the OCC inside the P9 read things, which the BMC then reads, and then passes back to the P9. On the P9 itself, each sensor is a call all the way to firmware and back! In fact, we can look at it in perf:

$ sudo perf record -g ipmitool sensor
$ sudo perf report --no-children
“ipmitool sensors” perf report

What are the 0x300xxxxx addresses? They’re the OPAL firmware (i.e. skiboot). We can look up the symbols easily, as the firmware exposes them to the kernel, which then plonks it in sysfs:

[stewart@blackbird9 ~]$ sudo head /sys/firmware/opal/symbol_map 
[sudo] password for stewart: 
0000000000000000 R __builtin_kernel_end
0000000000000000 R __builtin_kernel_start
0000000000000000 T __head
0000000000000000 T _start
0000000000000010 T fdt_entry
00000000000000f0 t boot_sem
00000000000000f4 t boot_flag
00000000000000f8 T attn_trigger
00000000000000fc T hir_trigger
0000000000000100 t sreset_vector

So we can easily look up exactly where this is:

[stewart@blackbird9 ~]$ sudo grep '18e.. ' /sys/firmware/opal/symbol_map 
 0000000000018e20 t .__try_lock.isra.0
 0000000000018e68 t .add_lock_request

So we’re managing to spend a whole 12% of execution time spinning on a spinlock in firmware! The call stack of what’s going on in firmware isn’t so easy, but we can find the bt_add_ipmi_msg call there which is probably how everything starts:

[stewart@blackbird9 ~]$ sudo grep '516.. ' /sys/firmware/opal/symbol_map   0000000000051614 t .bt_add_ipmi_msg_head  0000000000051688 t .bt_add_ipmi_msg  00000000000516fc t .bt_poll

OCCTOOL

This is the most not-what-you’re-meant-to-use method of getting access to sensors! It’s using a debug tool for the OCC firmware! There’s a variety of tools in the OCC source repositiory, and one of them (occtoolp9) can be used for a variety of things, one of which is getting sensor data out of the OCC.

$ sudo ./occtoolp9 -SL
     Sensor Type: 0xFFFF
 Sensor Location: 0xFFFF
     (only displaying non-zero sensors)
 Sending 0x53 command to OCC0 (via opal-prd)…
   MFG Sub Cmd: 0x05  (List Sensors)
   Num Sensors: 50
     [ 1] GUID: 0x0000 / AMEintdur…….  Sample:     20  (0x0014)
     [ 2] GUID: 0x0001 / AMESSdur0…….  Sample:      7  (0x0007)
     [ 3] GUID: 0x0002 / AMESSdur1…….  Sample:      3  (0x0003)
     [ 4] GUID: 0x0003 / AMESSdur2…….  Sample:     23  (0x0017)

The odd thing you’ll see is “via opal-prd” – and this is because it’s doing raw calls to the opal-prd binary to talk to the OCC firmware running things like “opal-prd --expert-mode htmgt-passthru“. Yeah, this isn’t a in-production thing :)

Amazingly (and interestingly), this doesn’t go through host firmware in the way that an IPMI call will. There’s a full OCC/Host firmware interface spec to read. But it’s insanely inefficient way to monity sensors, a long bash script shelling out to a whole bunch of other processes… Think ~14.4 billion cycles versus ~367million cycles for the ipmitool option above.

But there are some interesting sensors at the end of the list:

Sensor Details: (found 86 sensors, details only for Status of 0x00)                                                  
     GUID Name             Sample     Min    Max U    Stat   Accum     UpdFreq   ScaleFactr   Loc   Type   
....
   0x014A MRDM0………..    688       3  15015 GBs  0x00 0x0144AE6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..    480       3  14739 GBs  0x00 0x01190930 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..    560       4  16605 GBs  0x00 0x014C61FD 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..    360       4  16597 GBs  0x00 0x014AE231 0x00001901 0x000080FB 0x0008 0x0200

is that memory bandwidth? Well, if I run the STREAM benchmark in a loop and look again:

0x014A MRDM0………..  15165       3  17994 GBs  0x00 0x0C133D6C 0x00001901 0x000080FB 0x0008 0x0200
   0x014E MRDM4………..  17145       3  18016 GBs  0x00 0x0BF501D6 0x00001901 0x000080FB 0x0008 0x0200
   0x0156 MWRM0………..   8063       4  24280 GBs  0x00 0x07C98B88 0x00001901 0x000080FB 0x0008 0x0200
   0x015A MWRM4………..   1138       4  24215 GBs  0x00 0x07CE82AF 0x00001901 0x000080FB 0x0008 0x0200

It looks like it! Are these exposed elsewhere? Well, another blog post at some point in the future is where I should look at that.

lm-sensors

$ rpm -qf /usr/bin/sensors
 lm_sensors-3.5.0-6.fc31.ppc64le

Ahhh, old faithful lm-sensors! Yep, a whole bunch of sensors are just exposed over the standard interface that we’ve been using since ISA was a thing.

[stewart@blackbird9 ~]$ sensors                                                                  
 ibmpowernv-isa-0000                                       
 Adapter: ISA adapter                                      
 Chip 0 Vdd Remote Sense:  +1.02 V  (lowest =  +0.72 V, highest =  +1.02 V)
 Chip 0 Vdn Remote Sense:  +0.67 V  (lowest =  +0.67 V, highest =  +0.67 V)
 Chip 0 Vdd:               +1.02 V  (lowest =  +0.73 V, highest =  +1.02 V)
 Chip 0 Vdn:               +0.68 V  (lowest =  +0.68 V, highest =  +0.68 V)
 Chip 0 Core 0:            +47.0°C  (lowest = +25.0°C, highest = +71.0°C)            
 Chip 0 Core 4:            +47.0°C  (lowest = +26.0°C, highest = +66.0°C)            
 Chip 0 Core 8:            +48.0°C  (lowest = +27.0°C, highest = +67.0°C)            
 Chip 0 Core 12:           +48.0°C  (lowest = +26.0°C, highest = +67.0°C)            
 Chip 0 Core 16:           +47.0°C  (lowest = +25.0°C, highest = +67.0°C)                      
 Chip 0 Core 20:           +47.0°C  (lowest = +26.0°C, highest = +69.0°C)            
 Chip 0 Core 24:           +48.0°C  (lowest = +27.0°C, highest = +67.0°C)                     
 Chip 0 Core 28:           +51.0°C  (lowest = +27.0°C, highest = +64.0°C)                     
 Chip 0 DIMM 0 :           +40.0°C  (lowest = +34.0°C, highest = +44.0°C)                     
 Chip 0 DIMM 1 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)                     
 Chip 0 DIMM 2 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 3 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 4 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 5 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 6 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 7 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 8 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 9 :            +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 10 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 11 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 12 :          +43.0°C  (lowest = +36.0°C, highest = +47.0°C)
 Chip 0 DIMM 13 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 14 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 DIMM 15 :           +0.0°C  (lowest =  +0.0°C, highest =  +0.0°C)
 Chip 0 Nest:              +48.0°C  (lowest = +27.0°C, highest = +64.0°C)
 Chip 0 VRM VDD:           +47.0°C  (lowest = +39.0°C, highest = +66.0°C)
 Chip 0 :                  44.00 W  (lowest =  31.00 W, highest = 132.00 W)
 Chip 0 Vdd:               15.00 W  (lowest =   4.00 W, highest = 104.00 W)
 Chip 0 Vdn:               10.00 W  (lowest =   8.00 W, highest =  12.00 W)
 Chip 0 :                 227.11 kJ
 Chip 0 Vdd:               44.80 kJ
 Chip 0 Vdn:               58.80 kJ
 Chip 0 Vdd:              +21.50 A  (lowest =  +6.50 A, highest = +104.75 A)
 Chip 0 Vdn:              +14.88 A  (lowest = +12.63 A, highest = +18.88 A)

The best thing? It’s really quick! The hwmon interface is fast and efficient.

Yet another near-upstream Raptor Blackbird firmware build

In what is coming a month occurance, I’ve put up yet another firmware build for the Raptor Blackbird with close-to-upstream firmware (see here and here for previous ones).

Well, I’ve done another build! It’s current op-build (as of yesterday), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, the SBE speedup patch is now upstream. The machine-xml which is straight from Raptor but in my repo.

Here’s the current versions of everything:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-228-g82aed17a-p4360f95"
bmc-firmware-version
                 "0.00"
occ              "3ab2921"
hostboot         "acdff8a-pe7e80e1"
buildroot        "2019.05.3-15-g3a4fc2a888"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
                 "hw013120a.opmst"
sbe              "c318ab0-p1ddf83c"
hcode            "hw030220a.opmst"
petitboot        "v1.12"
phandle          0000064c (1612)
version          "blackbird-v2.4-514-g62d1a941"
linux            "5.4.22-openpower1-pdbbf8c8"
name             "ibm,firmware-versions"

If we compare this to the last build I put up, we have:

Componentoldnew
skibootv6.5-209-g179d53df-p4360f95v6.5-228-g82aed17a-p4360f95
linux5.4.13-openpower1-pa361bec5.4.22-openpower1-pdbbf8c8
occ3ab2921no change
hostboot779761d-pe7e80e1acdff8a-pe7e80e1
buildroot2019.05.3-14-g17f117295f2019.05.3-15-g3a4fc2a888
capp-ucodep9-dd2-v4no change
machine-xmlsite_local-stewart-a0efd66no change
hostboot-binarieshw011120a.opmsthw013120a.opmst
sbe166b70c-p06fc80cc318ab0-p1ddf83c
hcodehw011520a.opmsthw030220a.opmst
petitbootv1.11v1.12
versionblackbird-v2.4-415-gb63b36efblackbird-v2.4-514-g62d1a941

So, what do those changes mean? Not too much changed over the past month. Kernel bump, new petitboot (although I can’t find release notes but it doesn’t look like there’s a lot of changes), and slight bumps to other firmware components.

Grab blackbird.pnor from https://www.flamingspork.com/blackbird/stewart-blackbird-4-images/ and give it a whirl!

To flash it, copy blackbird.pnor to your Blackbird’s BMC in /tmp/ (important! the /tmp filesystem has enough room, the home directory for root does not), and then run:

pflash -E -p /tmp/blackbird.pnor

Which will ask you to confirm and then flash:

About to erase chip !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Erasing... (may take a while)
[==================================================] 99% ETA:1s      
done !
About to program "/tmp/blackbird.pnor" at 0x00000000..0x04000000 !
Programming & Verifying...
[==================================================] 100% ETA:0s   

Another close-to-upstream Blackbird Firmware Build

A few weeks ago (okay, close to six), I put up a firmware build for the Raptor Blackbird with close-to-upstream firmware (see here).

Well, I’ve done another build! It’s current op-build (as of this morning), but my branch with patches for the Raptor Blackbird. The skiboot patch is there, as is the SBE speedup patch. Current kernel (works fine with my hardware), current petitboot, and the machine-xml which is straight from Raptor but in my repo.

Versions of everything are:

$ lsprop /sys/firmware/devicetree/base/ibm,firmware-versions/
skiboot          "v6.5-209-g179d53df-p4360f95"
bmc-firmware-version
		 "0.01"
occ              "3ab2921"
hostboot         "779761d-pe7e80e1"
buildroot        "2019.05.3-14-g17f117295f"
capp-ucode       "p9-dd2-v4"
machine-xml      "site_local-stewart-a0efd66"
hostboot-binaries
		 "hw011120a.opmst"
sbe              "166b70c-p06fc80c"
hcode            "hw011520a.opmst"
petitboot        "v1.11"
phandle          000005d0 (1488)
version          "blackbird-v2.4-415-gb63b36ef"
linux            "5.4.13-openpower1-pa361bec"
name             "ibm,firmware-versions"

You can download all the bits (including debug tarball) from https://www.flamingspork.com/blackbird/stewart-blackbird-2-images/ and follow the instructions for trying it out or flashing blackbird.pnor.

Again, would love to hear how it goes for you!