Rusty on floating point (and keeping neat code)

Rusty talks about the “fun” of floating point and how this all ties into Wesnoth.

Platform consistency is certainly a good thing – so I’m guessing the attack_prediction code isn’t run by each node in a network game in a way where machines could disagree on the outcome.

This does, however, bring up an interesting question. What if, in the future, it were run on a per-node basis and people wanted it to be consistent? How do you warn that this isn’t the case (to somebody who is really just reading the docs on this function)?

Is it easy (or is there even a good way) to separate code that runs on one machine from code that runs on every one? In NDB we have some protocols where some things are done on a master and others on the slaves (and sometimes, when we go back to refactor the code, we move some of this stuff around – e.g. some work on the BACKUP block that I did a while ago).

In NDB we rely on separate documentation (a diagram showing what signals go where and from whom) and keep the code for executing the signals together in the source. We require the coder to think, when they’re changing things, about where the code is going to be executed (on the master, the slave or both).

We’ve also started to get into better habits in naming structures according to whether they are filled out only on the master, only on the slave, or on both. Writing code that looks at the wrong thing has been a source of bugs (especially while hacking on something) that are annoying to track down.
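As a rough sketch of what going one step beyond naming could look like: wrap each role-specific field so that reading it needs a token for the matching role. This is not how NDB does it. Every name here (MasterRole, SlaveRole, OnlyOn, BackupRecord and the two handlers) is made up for illustration, and so is the assumption that a backup id of 0 means "no backup".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical role tags. Only code running in that role should hold one.
struct MasterRole { explicit MasterRole() {} };
struct SlaveRole  { explicit SlaveRole() {} };

// Wraps a field that is only filled out on one role. Reading or writing
// it requires a tag of that role, so the wrong handler will not compile.
template <typename Role, typename T>
class OnlyOn {
public:
    explicit OnlyOn(T value) : value_(value) {}
    T& get(const Role&) { return value_; }
    const T& get(const Role&) const { return value_; }
private:
    T value_;
};

// Hypothetical record, loosely modelled on a backup in progress.
struct BackupRecord {
    std::uint32_t backup_id;                            // filled out on both
    OnlyOn<MasterRole, std::uint32_t> replies_pending;  // master only
    OnlyOn<SlaveRole, std::uint64_t> bytes_written;     // slave only
};

// Slave-side handler: it holds a SlaveRole, so it can reach bytes_written
// but has no way to call replies_pending.get().
void slave_wrote(BackupRecord& rec, const SlaveRole& role, std::uint64_t bytes) {
    assert(rec.backup_id != 0u);  // assumption: 0 means "no backup"
    rec.bytes_written.get(role) += bytes;
}

// Master-side handler: counts down outstanding replies and reports
// whether every slave has now answered.
bool master_got_reply(BackupRecord& rec, const MasterRole& role) {
    std::uint32_t& pending = rec.replies_pending.get(role);
    assert(pending > 0u);
    --pending;
    return pending == 0u;
}
```

Nothing stops somebody from constructing a MasterRole in slave code, so this is a speed bump rather than a guarantee. What it does buy is that looking at the wrong field becomes a visible act in the source instead of a silent one.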

So how do we handle functions that in some cases shouldn’t be used (e.g. when consistency across platforms is important, or when they should only be used on the slave side of a distributed protocol)? Or rather, how do we keep others (and ourselves) from getting it wrong in the future?
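One possibility, sketched here under made-up names, is to put the warning in the function signature so the caller has to acknowledge it at every call site. PlatformDependentResult and hit_chance_after are hypothetical and are not taken from Wesnoth's attack_prediction code. The formula is just a stand-in for "some floating point calculation".

```cpp
#include <cassert>
#include <cmath>

// Hypothetical acknowledgment tag. The user-provided explicit constructor
// means a caller cannot pass a bare {} and has to spell the name out.
struct PlatformDependentResult {
    explicit PlatformDependentResult() {}
};

// Hypothetical stand-in for a floating point prediction routine:
// the chance of at least one hit in `swings` independent swings.
// The last bits of the result are not promised to match across platforms,
// which is what the third parameter makes the caller admit to.
double hit_chance_after(double chance_per_swing, int swings,
                        PlatformDependentResult) {
    assert(chance_per_swing >= 0.0 && chance_per_swing <= 1.0);
    assert(swings >= 0);
    return 1.0 - std::pow(1.0 - chance_per_swing, swings);
}
```

A call then reads `hit_chance_after(0.5, 2, PlatformDependentResult{})`, which is ugly on purpose. Unlike a comment, the tag cannot quietly drift away from the function it describes, although it still only says what the author believed when the signature was written.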

Is the ultimate answer just that “you should read the code and understand it before you use it”? Probably, because any comments are going to be out of date anyway….

I now look forward to some sort of discussion.

One thought on “Rusty on floating point (and keeping neat code)”

  1. I think there’s a reason why we haven’t heard of “round to base value” before. How does a rounding scheme work where 6-0.5 rounds to 6, but 5+0.5 rounds to 5?

    As for managing consistency of computations on a heterogeneous cluster, you are right that IEEE 754 does not guarantee identical results all the way down to the last bit. The solution is surely either to “pick a compute node” (no good if it might go away), or share a standard FP library between all nodes?
