Why “returns -1 on error” is bad

(a general note on what’s good practice)

In C, 0 is false and !0 is true.

In the dim past there was an elsewhere where 0 was true and !0 was false. Why? Because there can be more than one error state and this is usually more interesting than how many ways success could have been acheived.

Well, that sucks too – there’s information on success that could be useful (e.g. we succeeded, but only n bytes worth instead of the m you asked for).

So, the way of <0 on failure and else success came about for packing the maximum amount of information into the int that we commonly return from functions (and usually fits nicely in a register and it all leads to hugs, puppies and a warm feeling inside).

So what do most people do on error? Return -1.

Hrrmm… this casually (if not totally) defeats the point. In any function that does any real work, there’s going to be more than one place where failure could occur (even if it’s an error path that should never really happen… it will, but never to you… always to a guy somewhere in a country that you didn’t know existed and knows less $native_language than you have digits).

So if you get a bug report in with a log message (because you do print log messages when errors occur! – especially non-totally-fatal ones!) about a failure, and you go to look at that function and go “aha! this function must have returned -1!” Well, it just so happens that there are five places that could return -1. Where did your program fail? Without a core dump or something, you will never know.

So, what if these five places returned different error codes (which, of course, you wrote to the log)? Then you’d be able to narrow down the search for buggy code!

It doesn’t have to be a unique number, or even user understandable (especially when these are places that shouldn’t fail – or so you think) but it makes your job a hell of a lot easier if you can quickly jump to the bit of code you should look at.

In cluster, we have this great system where when really bad stuff happens, we get these nice trace logs of what signals have been cruising around the cluster recently. This greatly helps with debugging. It sort of makes you go “wow” when you first see a crash reported, trace file follows, and then a patch a few hrs later that fixes the problem. This is because it’s an aid in tracking down exactly where to look for the problem.

“It crashed” is never a useful bug report. But only having the facilities in your software for only being able to say “it crashed” unless you’re a developer guru dude isn’t very useful either.

The various backtrace reporting tools do a bit to help. As always, the more information the better. This is certainly the case when you look at the backtrace and go “how on earth did we ever get there?” or the stack is just completely hosed and you have no hope of finding your arse from your elbow (although these days valgrind will help you here).

Here endith the lesson.

One thought on “Why “returns -1 on error” is bad

  1. I’ve used -ERRCODE in the distant past, which works rather well.
    So >= 0 is a positive result, and

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.