upgraded mythtv box to breezy

Finally got around to upgrading the mythtv box. Less RAM being used is good – it’s only got 512MB and some of the recordings were getting pretty fragmented due to having to flush things to disk a bit too often. Hopefully things will be a bit better now.

Annoying things were:

  • Having to reboot into the old kernel to get the wireless driver to download the new kernel source package
  • having to reboot again to get gcc-3.4 to build the rt2570 module with the gcc-3.4 compiled kernel
  • rebuilding lirc again (but that’s easy)
  • still having to manually load the sound card driver.
  • no fancy boot screen (at least on my lilo setup)
  • no automagic convert to grub now that it seems to work with XFS

But a lot less painful than previous big upgrades. The debian way between releases is so nice it’s not funny.

Things left to do:

  • rebuild mythtv with proper lirc support so i can stop using irxevent and get lirc going with an auto launching xine/mplayer and use the box properly as a DVD player too. (this is not actually related to the ubuntu upgrade, it’s just a thing i’ve wanted to do for a while – and it’s my own fault i didn’t do it right in the first place)
  • upgrade from mysql 4.0 to 5.0 (hey, maybe 5.1 ndb dd just to really be eating our own dogfood :) – again, not related to the ubuntu upgrade.

I’ve also set it up to tape the tennis, although at a low priority so my lovely flatmate doesn’t get annoyed if she misses her shows :)

Oh how I need more tuner cards.

Thanks goes to jdub for quickly asking what the upgrade issue was. No thanks goes to me for making my things left to do list look like it was related to the upgrade.

Mangoes are good

that is all.

err… okay, a bit more. Breakfast this morning consisted of doing the dishes (good first step), some toast with jam and vegemite and a mango. Yum. All the time listening to the recording of the company-wide conf call from the other day (2am was just a little bit late that night).

These conf calls are really good actually – being able to throw questions directly at the top (and have them answered) is a great thing. Also getting to know what is going on from a higher perspective is really valuable.

Microsoft’s file system patent upheld: ZDNet Australia: News: Software

Saying any part of the FAT file system is “novel and non-obvious” is rather like saying being stabbed in the eye with a fork is “novel and a good way to spend a sunday afternoon”.

Seriously – what the?

I’m really glad I work for a company that opposes software patents.

Thanks to Pia for the links.

Bug 15695 and NDB initial start

The process for starting up a cluster is pretty interesting. Where, of course, “interesting” is translated to “complex”. There are a lot of things you have to watch out for (namely you want one cluster, not two or ten or anything). You also want to actually start a cluster, not just wait forever for everybody to show up.

Except in some situations. For example, initial start. With an initial start, you really want to have all the nodes present (you don’t want to run the risk of starting up two separate clusters!).

Bug 15695 is a bug to do with Initial Start. If you have three nodes (a management node and two data nodes) and break the network connection just between the two data nodes, and then reconnect it (at the wrong time – where the wrong time means you trigger the bug) the cluster will never start. A workaround is to restart one of the data nodes and everything comes up.

Note that this is just during initial start so it’s not a HA bug or anything. Just really annoying.

This seems to get hit when people have firewalls stopping the nodes talking to each other and then fix the firewall (without shutting down the cluster).

As is documented in the bug, you can replicate this with some iptables foo.

One of the main blocks involved in starting the cluster (and managing it once it’s up) is the Quorum manager – QMGR. You’ll find the code in ndb/src/kernel/blocks/qmgr/. You’ll also find some in the older CMVMI (Cluster Manager Virtual Machine Interface).

A useful thing to do is to define DEBUG_QMGR_START in your build. This gives you some debugging output printed to the ndb_X_out.log file.

The first bit of code in QmgrMain.cpp is the heartbeat code. execCM_HEARTBEAT simply resets the number of outstanding heartbeats for the node that sent the heartbeat. Really simple signal there.

During SR (System Restart) there is a timeout period for which we try to wait for nodes to start. This means we’ll be starting the cluster with the largest number of nodes present (it’s much easier doing a SR with as many nodes as possible than doing NR – Node Recovery – on lots of nodes). NR requires copying of data over the wire. SR probably doesn’t. Jonas is working on optimised node recovery which is going to be really needed for disk data. This will only copy the changed/updated data over the wire instead of all the data that the node needs. Pretty neat stuff.

We implement the timeout by sending a delayed signal to ourself. Every 3 seconds we check how the election of a president is going. If we reach our limit (30 seconds) we try to start the cluster anyway (not allowing other nodes to join).
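A rough way to picture that timer is the toy Python model below. It is only a sketch, not the QMGR code: the function name, the constants and the `nodes_seen_at` input are all made up for illustration, and only the 3 second interval and the 30 second limit come from the description above.

```python
CHECK_INTERVAL_S = 3   # how often the delayed signal fires
START_LIMIT_S = 30     # stop waiting for missing nodes after this long


def wait_for_nodes(expected, nodes_seen_at):
    """Simulate the periodic check until every node shows up or the limit hits.

    expected:      set of node ids we would like to start with
    nodes_seen_at: {simulated_time_in_seconds: set_of_node_ids_appearing_then}
    Returns (elapsed_seconds, nodes_started_with, partial_start).
    """
    elapsed = 0
    seen = set()
    while True:
        # Pick up every node that has appeared by the current simulated time.
        for when, nodes in nodes_seen_at.items():
            if when <= elapsed:
                seen |= nodes
        if seen >= expected:
            return elapsed, seen, False   # everybody is here
        if elapsed >= START_LIMIT_S:
            return elapsed, seen, True    # give up and start with what we have
        elapsed += CHECK_INTERVAL_S       # "send the delayed signal" again
```

With `expected={1, 2}` and node 2 never appearing, the model gives up at the 30 second mark and starts with node 1 alone, which is the situation that sets the bug up.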

Current problem is that each node in this two node not-quite-yet cluster thinks it has won the election and so switches what state it’s in to ZRUNNING (see Qmgr.h) hence stopping the search for other nodes. When the link between the two nodes is brought back up – hugs and puppies do not ensue.

I should have a patch soon too.

For a more complete explanation on the stages of startup, have a look at the text files in ndb/src/kernel/blocks. Start.txt is a good one to read.

at the pub last night

Well, went out for food with jp, rupak and owen before heading down to a bar to meet with others for beer.

Slow start to this morning. Anyway, got some piccies. some people even smiled!

I wish there was a way to bulk transfer photos from my phone to my computer via bluetooth. It’s a real pain going “Send” on each friggin photo.

Pity phone cameras really suck, but hey – i guess they’re pretty portable. Notice how the really noisy pictures are taken with the phone’s “Night Mode”. Which really just means introduce more noise into the photo.

Picture(12).jpg
Picture(13).jpg

Picture(14).jpg Picture(15).jpg Picture(16).jpg Picture(17).jpg Picture(18).jpg

Picture(20).jpg Picture(19).jpg Picture(21).jpg Picture(22).jpg

uh oh, goodbye secure internet banking

Westpac seem to have lost the plot. My housemate signed into her internet banking a few minutes ago (and then I did, just to confirm) to be greeted with one message.

A portion of which is below.

Westpac loses brain
Do I really need to point out the problems with this?

Followup: I’m on the phone to them now. The woman on the other end of the phone wasn’t aware of the move. She’s just gotten the same message and is now confirming it. I’ve asked if this is some kind of April Fools joke. I hope it is.

Followup 2: Apparently it’s no joke. “Early 2006”. I’m putting in a complaint. You can too – and please do!

new desktop background

a nice big 1.3MB image

I took this while on holiday in Apollo Bay. We were on a treetop walk – although this fern isn’t anywhere near the top of the trees. For some reason I like ferns.

Hope you like.

New Years Eve

Picture(7).jpg Picture(6).jpg Picture(8).jpg Picture(10).jpg Picture(9).jpg

I’m sure there are more photos hiding on somebody’s camera and stuff – but these are the ones on my phone.

I’m so impressed that wordpress allows me to upload them now (without pain). Brilliant!

Happy New Year! 

i wonder if image uploads or something actually works…

I wonder if wordpress has gotten better and allows me to easily upload photos.

Picture(24).jpg

hrrm… looks like it!

WordPress 2.0

ahh… new WordPress version. Seems to be going okay at the moment – we’ll see if anything hits the fan.

linksys better than netgear

Bought a linksys ADSL2 Modem/router today – an AG241. Works out of the box (plus a firmware upgrade). Everything going without problems.

Yay linksys, sucks to netgear.

fuck netgear

Fuck them right in the ear.

The web UI sucks, doesn’t actually work properly under a free browser (errr… okay, Gecko based) – namely the page where you can change the IP of the router.

In modem mode, it seems to still be able to do PPPoA authentication – which makes really weird shit happen. Like a netmask of 255.255.255.255 – which I’m not actually convinced is an ADSL modem problem, possibly internode instead. Think about how the hell you’re meant to access your gateway.

Oh, and the port forwarding doesn’t work!!!

It’s a Netgear DG632 ADSL Modem Router.

Although it does run linux (or at least some GPL and some LGPL software). Why the hell can’t i just get a console into the darn thing.

just what i wanted to spend time on after a day moving.

Slashdot | Sun Open-Sourcing UltraSPARC Design

This is pretty ultra-cool news. Especially in academic circles and for upcoming chip designers.

I’m sure that there are decent business models in place so that they don’t cannibalise their hardware sales.

What would be very cool is if cheap manufacturers pick up slightly older chip generations and produce them for dirt-cheap prices. This means more commodity hardware – which is very good for humanity.

It also makes it possible to run an even more open platform where even the source to your CPU is available!

new .emacs snippet

for the non lisp hackers – this sets some c mode options depending on the name of the path to the source file.


;; run this for mysql source
(defun mysql-c-mode-common-hook ()
  (setq indent-tabs-mode nil))

;; linux kernel style (uses cc-mode's built-in "linux" style; calling a
;; full major mode from inside c-mode-common-hook would recurse)
(defun linux-c-mode-common-hook ()
  (c-set-style "linux"))

;; default settings; a defun rather than a lambda stored in a variable,
;; so that funcall on the symbol below actually finds a function
(defun my-c-mode-common-hook ()
  (turn-on-font-lock)
  (setq comment-column 48))

;; predicates to check
(defvar my-style-selective-mode-hook nil)

;; (or (buffer-file-name) "") keeps string-match happy in buffers with no file
(add-hook 'my-style-selective-mode-hook
          '((string-match "MySQL" (or (buffer-file-name) "")) . mysql-c-mode-common-hook))

(add-hook 'my-style-selective-mode-hook
          '((string-match "linux" (or (buffer-file-name) "")) . linux-c-mode-common-hook))

;; default hook (appended, so it is checked last)
(add-hook 'my-style-selective-mode-hook
          '(t . my-c-mode-common-hook) t)

;; find which hook to run depending on predicate
(defun my-style-selective-mode-hook-function ()
  "Run each PREDICATE in `my-style-selective-mode-hook' to see if the
HOOK in the pair should be executed. If the PREDICATE evaluate to non
nil HOOK is executed and the rest of the hooks are ignored."
  (let ((h my-style-selective-mode-hook))
    (while (not (eval (caar h)))
      (setq h (cdr h)))
    (funcall (cdar h))))

;; Add the selective hook to the c-mode-common-hook
(add-hook 'c-mode-common-hook 'my-style-selective-mode-hook-function)

disk space allocation (part 4: allocating an extent)

For XFS, in normal operation, an extent is only allocated when data has to be written to disk. This is called delayed allocation. If we are extending a file by 50MB – that space is deducted from the total free space on the filesystem, but no decision on where to place that data is made until we start writing it out – either due to memory pressure or because the kernel automatically starts writing the dirty pages out (the sync once every 5 seconds on linux).

When an extent needs to be allocated, XFS looks it up in one of two b+trees it has of free space. There is one sorted by starting block number (so you can search for “an extent near here”) and one by size (so you can search for “an extent of x size”).
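To make the two lookups concrete, here is a small Python sketch. It is not how XFS implements anything: plain sorted lists and `bisect` stand in for the b+trees, and the `FreeSpace` class and its method names are invented for this example.

```python
import bisect


class FreeSpace:
    """Free extents indexed two ways (sorted lists standing in for b+trees)."""

    def __init__(self, extents):
        # extents: list of (start_block, length) tuples describing free space
        self.by_start = sorted(extents)
        self.by_size = sorted((length, start) for start, length in extents)

    def near(self, block):
        """Return the free extent whose start is closest to `block`."""
        i = bisect.bisect_left(self.by_start, (block, 0))
        # Only the neighbours on either side of the insertion point can win.
        candidates = self.by_start[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda e: abs(e[0] - block))

    def at_least(self, length):
        """Return the smallest free extent with >= `length` blocks, or None."""
        i = bisect.bisect_left(self.by_size, (length, 0))
        if i == len(self.by_size):
            return None
        size, start = self.by_size[i]
        return (start, size)
```

For example, with free extents `[(100, 8), (500, 64), (900, 16)]`, `near(520)` finds the extent starting at block 500 and `at_least(10)` finds the 16-block extent at 900. Keeping both orderings means neither kind of search has to scan all of the free space.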

The ideal situation being that you want as large an extent as possible as close to the tail end of the file as possible (i.e. just making the current extent bigger).

The worst-case scenario is having to allocate extents to multiple files at once with all of them being written out synchronously (O_SYNC or memory pressure) as this will cause lots of small extents to be created.
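The worst case is easy to reproduce with a deliberately naive model. The Python sketch below is an assumption-laden toy, not XFS behaviour: the hypothetical `allocate_immediately` function just hands out the next free block for every write as it arrives, which is roughly what being forced to place data immediately looks like.

```python
def allocate_immediately(write_order):
    """Give each written block the next free disk block, in arrival order.

    write_order: sequence of file names, one entry per block written.
    Returns {file_name: [(start_block, length), ...]}.
    """
    extents = {}
    next_free = 0
    for name in write_order:
        runs = extents.setdefault(name, [])
        if runs and runs[-1][0] + runs[-1][1] == next_free:
            # Contiguous with this file's last extent, so just grow it.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # Another file got the block in between: start a new extent.
            runs.append((next_free, 1))
        next_free += 1
    return extents
```

Writing two files alternately (`["a", "b"] * 4`) leaves each with four one-block extents, while writing them one after the other (`["a"] * 4 + ["b"] * 4`) gives each a single four-block extent. Delaying the placement decision lets the file system get closer to the second layout.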

disk space allocation (part 3: storing extents on disk)

Here I’m going to talk about how file systems store what part of the disk a part of the file occupies. If your database files are very fragmented, performance will suffer. How much depends on a number of things however.

XFS can store some extents directly in the inode (see xfs_dinode.h). If I’m reading things correctly, this can be 2 extents per fork (data fork and attribute fork). If more than this number of extents are needed, a btree is used instead.

HFS/HFS+ can store up to 8 extents directly in the catalog file entry (see Apple TechNote 1150 – which was updated in March 2004 with information on the journal format). If the file has more than 8 extents, a lookup then needs to be done into the extents overflow file. Interestingly enough, in MacOS X 10.4 and above (i think it was 10.4… may have been 10.3 as well) if a file is less than 20MB and has more than 8 extents, on an open, the OS will automatically try to defragment that file. Arguably you should just fix your allocation strategy, but hey – maybe this does actually help.

File systems such as ext2, ext3 and reiserfs just store a list of block numbers. In the case of ext2 and ext3, the further into a file you are, the more steps are required to find the disk block number associated with that block in the file.
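The "more steps" for ext2 and ext3 come from the classic layout of 12 direct block pointers in the inode, followed by single, double and triple indirect blocks. The Python sketch below counts how many indirect blocks sit between the inode and the data; the function name is made up, and the 4KB block size (so 1024 four-byte block numbers per indirect block) is an assumption.

```python
def ext2_lookup_depth(file_block, block_size=4096):
    """Number of indirect blocks to read before reaching a file's data block."""
    ptrs = block_size // 4   # 32-bit block numbers per indirect block
    direct = 12              # block pointers stored in the inode itself
    if file_block < direct:
        return 0
    file_block -= direct
    for depth in (1, 2, 3):  # single, double, triple indirect
        if file_block < ptrs ** depth:
            return depth
        file_block -= ptrs ** depth
    raise ValueError("beyond the triple-indirect range")
```

With these numbers the first 48KB of a file needs no extra reads, the next 4MB needs one indirect block, the next 4GB needs two, and everything after that needs three.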

So what does an extent actually look like? Well, for XFS, the following excerpt from xfs_bmap_btree.h is interesting:

#define ISUNWRITTEN(x) ((x)->br_state == XFS_EXT_UNWRITTEN)

typedef struct xfs_bmbt_irec
{
    xfs_fileoff_t br_startoff;   /* starting file offset */
    xfs_fsblock_t br_startblock; /* starting block number */
    xfs_filblks_t br_blockcount; /* number of blocks */
    xfs_exntst_t  br_state;      /* extent state */
} xfs_bmbt_irec_t;

It’s also rather self-explanatory. Holes (for sparse files) in XFS don’t have extents, and an extent doesn’t have to have been written to disk. This allows you to preallocate space in chunks without having written anything to it. Reading from an unwritten extent gets you zeros (otherwise it would be a security hole!).
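As a sketch of how such a record gets used, here is a Python model of a read path. It mirrors the four fields of the struct above but everything else is assumed for illustration: the `Extent` class, the `read_block` helper, and a "disk" that is a dict holding one byte per block.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    startoff: int            # starting file offset, in blocks
    startblock: int          # starting disk block
    blockcount: int          # number of blocks
    unwritten: bool = False  # preallocated but never written


def read_block(extents, file_block, disk):
    """Return one file block; holes and unwritten extents read back as zeros."""
    for e in extents:
        if e.startoff <= file_block < e.startoff + e.blockcount:
            if e.unwritten:
                # Never hand back whatever stale data sits on disk here.
                return b"\0"
            return disk[e.startblock + (file_block - e.startoff)]
    return b"\0"  # no extent covers this offset, so it is a hole
```

The unwritten check is the security point: the disk blocks behind a preallocated extent may still hold somebody else's old data, so the flag has to win over the on-disk contents.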

Sweden!

I’m in the Stockholm office at the moment. on the network, grabbing mail and all that foo.

Spent yesterday in London with Leandra, finally having arrived after the plane was delayed for four hours in Melbourne. Urgh.

There’s snow! it’s cool.

will have to check how photos look at some point.

and oh, good to be back in europe – good jam.

disk space allocation (part 2: examining your database files)

memberdb/log.MYD:
 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..943]:        5898248..5899191  3 (36536..37479)     944
   1: [944..1023]:     6071640..6071719  3 (209928..210007)    80
   2: [1024..1127]:    6093664..6093767  3 (231952..232055)   104
   3: [1128..1279]:    6074800..6074951  3 (213088..213239)   152
   4: [1280..1407]:    6074672..6074799  3 (212960..213087)   128
   5: [1408..1423]:    6074264..6074279  3 (212552..212567)    16
memberdb/log.MYI:
 EXT: FILE-OFFSET      BLOCK-RANGE        AG AG-OFFSET        TOTAL
   0: [0..7]:          10165832..10165839  5 (396312..396319)     8

The interesting thing about this is that the log table grows very slowly. This table stores a bunch of debugging output for my memberdb application. It should possibly be a partitioned ARCHIVE table (and probably will be in the future).

The thing about a file growing slowly over time is that it’s more likely to have more than 1 extent (I’ll examine why in the near future).

My InnoDB data and log files only have 1 extent. I think I’ve done an xfs_fsr on my file system though.

disk space allocation (part 1: seeing what’s happened)

(a little while ago I was writing a really long entry on everything possible. I realised that this would be a long read for people and that fewer people would look at it, so I’ve split it up).

This sprang out of doing work on the NDB disk data tree. Anything where efficient use of the filesystem is concerned tickles my fancy, so I went to have a look at what was going on.

Filesystems store what part of the disk belongs to what file in one of two ways. The first is to keep a list of every disk block (typically 4kb) that’s being used by the file. A 400kb file will have 100 block numbers. The second way is to store a range (extent). That is, a 400kb file could use 100 blocks starting at disk block number 1000.
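The two representations carry the same information, which a few lines of Python can show. This is just an illustration with a made-up helper name, not code from any file system.

```python
def blocks_to_extents(blocks):
    """Collapse a list of disk block numbers into (start, length) extents."""
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # Block follows straight on from the previous run: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            extents.append((b, 1))
    return extents
```

The 400kb file from the example, `blocks_to_extents(list(range(1000, 1100)))`, collapses from 100 block numbers to the single extent `(1000, 100)`. A fragmented file needs one extent per contiguous run.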

XFS has a tool called xfs_bmap. It gives you a list of the extents allocated to a file.

So, let’s have a look at what it tells us about some recordings on my MythTV box.

myth@orpheus:~$ ls -lah myth-recordings/10_20050912183000_20050912190000.nuv
 -rw-r--r--  1 myth myth 452M 2005-09-12 19:00 myth-recordings/10_20050912183000_20050912190000.nuv
myth@orpheus:~$ xfs_bmap -v myth-recordings/10_20050912183000_20050912190000.nuv
myth-recordings/10_20050912183000_20050912190000.nuv:
 EXT: FILE-OFFSET       BLOCK-RANGE          AG AG-OFFSET             TOTAL
   0: [0..639]:         228712176..228712815  7 (21106232..21106871)    640
   1: [640..1663]:      83674040..83675063    2 (24358056..24359079)   1024
   2: [1664..923519]:   83675368..84597223    2 (24359384..25281239) 921856
   3: [923520..924031]: 84631272..84631783    2 (25315288..25315799)    512

Just to make things fun, this is all in 512-byte blocks. But anyway, the really interesting thing is the number of extents. Ideally, every file would have one extent as this means that we avoid disk seeks – *the* most expensive disk operation.
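If you want the extent count and size without doing the 512-byte arithmetic by hand, something like the Python sketch below works on the text that `xfs_bmap -v` prints. The function name and the regular expression are my own, and it assumes the output looks like the listing above (one extent per line, offsets in 512-byte units).

```python
import re

# Matches lines such as "   2: [1664..923519]:   83675368..84597223 ..."
EXTENT_RE = re.compile(r"^\s*\d+:\s*\[(\d+)\.\.(\d+)\]:")


def summarise_bmap(output):
    """Return (extent_count, size_in_bytes) from `xfs_bmap -v` text."""
    count = 0
    last_block = -1
    for line in output.splitlines():
        m = EXTENT_RE.match(line)
        if not m:
            continue  # file name and column header lines
        last_block = max(last_block, int(m.group(2)))
        if "hole" not in line:
            count += 1  # holes occupy file offsets but are not extents
    return count, (last_block + 1) * 512
```

Fed the listing above, this reports 4 extents and 473,104,384 bytes (924,032 blocks of 512 bytes), which is about 451MB and consistent with the 452M that ls shows.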

XFS also provides the xfs_fsr tool (File System Repacker) that can defragment files (even on a mounted file system). On IRIX this used to run out of cron – fun when a bunch of machines hit a CXFS volume all at the same time.