Avoid unsigned types by default

This post is very old. Please bear in mind that information here might be incorrect or obsolete, and links can be broken. If something seems wrong, please feel free to comment or contact me and I'll update the post.

For some reason or another, unsigned types are heavily overused in C++. I know that Java doesn’t have them, and many students who learn programming with Java think this is a really serious deficiency which they have to make up for by using unsigned types everywhere. However, it turns out that you will rarely need unsigned types, and in most cases they do more harm than good.

First of all, there are some cases where you need unsigned types. Pretty much all of these cases can be grouped into two categories:

  • Interop: Someone somewhere expects an unsigned value, so you have to convert your stuff at some point. You should try to convert it as late as possible, but with some sanity check. For instance, if you call an API which expects an unsigned size, and you have a signed size in your application, convert only after sanity-checking your signed value!
  • I/O: Disk or network I/O often requires the use of unsigned data types, as you typically want to work on the raw bit pattern of the value and not on the actual value. In this case, unsigned types fit and should be used.
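The "convert as late as possible, but with some sanity check" advice from the interop case could be sketched roughly like this. The helper name to_unsigned_size and the choice to throw exceptions are assumptions made for illustration; an assert or an error code would work just as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper (not from the post): convert a signed size to size_t
// right at the API boundary, rejecting values that cannot be valid sizes.
inline std::size_t to_unsigned_size(std::int64_t size)
{
    // A negative size is always a bug in the caller; report it instead of
    // letting it wrap around to a huge unsigned value.
    if (size < 0) {
        throw std::invalid_argument("size must not be negative");
    }
    // Guard against targets where size_t is narrower than 64 bits.
    if (static_cast<std::uint64_t>(size) > std::numeric_limits<std::size_t>::max()) {
        throw std::out_of_range("size does not fit into size_t");
    }
    return static_cast<std::size_t>(size);
}
```

A caller would keep its sizes signed internally and only write something like fwrite(data, 1, to_unsigned_size(byte_count), file) at the very last moment.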

Everywhere else, you should really try hard to avoid them. Especially when you design an API, stay away from unsigned types. There’s a very good article by Scott Meyers on this topic which explains the problems with unsigned types and interfaces. The main problem is that an unsigned parameter makes this kind of error/sanity checking impossible: a negative value has already wrapped around to a huge positive one by the time the function sees it. If you have a function like memset(void* t, int value, size_t size) and a pod like

struct pod {
    unsigned char flags;
    Vector3 coords;
};

and you pass on something like memset(&p, 0, sizeof (pod) - sizeof (Vector4)); for some instance p, because you vaguely remembered that pod has a Vector4 and a flags field in it, you’re going to clear around 18446744073709551613 bytes worth of data if running on a 64-bit system (assuming a tightly packed, 13-byte pod and a 16-byte Vector4). With signed types, a quick assert(size >= 0); would have failed with size == -3.
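To make the wrap-around concrete, here is a small sketch that compares the same subtraction in unsigned and signed arithmetic. It assumes the sizes implied by the numbers above (13 bytes for pod, 16 bytes for Vector4); with padding, a real compiler may well report different sizes. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Unsigned subtraction wraps around modulo 2^N, so 13 - 16 becomes a huge
// positive number (2^64 - 3 when size_t is 64 bits wide).
inline std::size_t size_difference_unsigned(std::size_t a, std::size_t b)
{
    return a - b;
}

// Signed subtraction keeps the negative result, which is easy to detect.
inline std::ptrdiff_t size_difference_signed(std::ptrdiff_t a, std::ptrdiff_t b)
{
    return a - b;
}

// The "quick assert" from the text, expressed as a predicate.
inline bool is_valid_size(std::ptrdiff_t size)
{
    return size >= 0;
}
```

With these, size_difference_signed(13, 16) yields -3 and fails the is_valid_size check, while the unsigned variant yields a value that looks like a perfectly legal, if enormous, size.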

This happens far more often than you would expect. Since I started using signed types as much as possible, I’ve found many more places where I can sanity-check values using asserts and pre-condition checks than before. Even for typical use cases like I/O routines (fwrite) or memory stuff (malloc), you can get away with one less bit in practically every case. I don’t remember ever having seen a call which would allocate or write more than 2^31 elements on a 32-bit machine, let alone 2^63 elements on a 64-bit machine. So, next time before you start typing unsigned type size, think twice whether you’re really ever going to need that extra bit, or whether you are willing to trade it off for error checking.
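As an illustration of such a pre-condition check, a signed-size wrapper around memset might look like the sketch below. The name clear_bytes, the extra capacity parameter and the bool return value are assumptions for this example; an assert would be the more typical choice in debug builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical wrapper: zero the first `count` bytes of a buffer holding
// `capacity` bytes. Because both sizes are signed, nonsensical negative
// values can be rejected before std::memset ever sees them.
inline bool clear_bytes(void* target, std::ptrdiff_t capacity, std::ptrdiff_t count)
{
    if (target == nullptr || capacity < 0 || count < 0 || count > capacity) {
        return false;  // pre-condition violated, nothing is written
    }
    std::memset(target, 0, static_cast<std::size_t>(count));
    return true;
}
```

The conversion to size_t happens only after the checks, so the wrapped-around value from a miscalculated size never reaches the C library.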

Note: This does not imply that you don’t have to think about valid value ranges. The only help you get from signed types is that overflows are much harder to exploit if you sanitize most values in between. You should however still use things like SafeInt to make sure your arithmetic stays in range.
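The kind of range check meant here can be sketched by hand for a single operation. This is not the SafeInt API, just a minimal illustration of the idea; the function name and the out-parameter convention are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply two sizes and report failure instead of
// overflowing. Returns false if either input is negative or if the product
// would not fit into a 64-bit signed integer.
inline bool checked_size_multiply(std::int64_t a, std::int64_t b, std::int64_t& result)
{
    if (a < 0 || b < 0) {
        return false;  // sizes must not be negative
    }
    // Test with a division first, so the multiplication itself can never
    // overflow (signed overflow is undefined behaviour in C++).
    if (a != 0 && b > std::numeric_limits<std::int64_t>::max() / a) {
        return false;
    }
    result = a * b;
    return true;
}
```

A library like SafeInt packages checks of this sort behind ordinary arithmetic operators, which is far less error-prone than writing them out at every call site.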


12 Responses to Avoid unsigned types by default

  1. martinsm says:

    You are talking about how to pass int values to unsigned int’s. That is ok, I understand that.
    But if you have some API that returns unsigned int? How then you must use its value? Assign to int or what?
    For example, you have your own API – Array::Array(int size); And then you want to initialize Array object with size from std::vector. It returns size_t which is unsigned int. How you must pass this value to your own Array to avoid compiler warning about truncating value?

  2. Anteru says:

    You could for instance have something like …
    [source lang="cpp"]
    long size = static_cast<long> (vector.size());
    assert (size >= 0); // Overflow otherwise
    [/source]

    This assert should actually never fire. Of course, you can write a small template for this:
    [source lang="cpp"]
    // There is some unsigned2signed template specialization
    template <typename Unsigned>
    typename unsigned2signed<Unsigned>::result U2S (const Unsigned u)
    {
        typename unsigned2signed<Unsigned>::result s =
            static_cast<typename unsigned2signed<Unsigned>::result>(u);
        assert (s >= 0);
        return s;
    }
    [/source]

  3. Creating an array with “negative” size will lead to failed allocation, there’s typically an assertion for that or an exception thrown, if we’re not talking about games.

    Trying to call memset/memcpy/fread/fwrite/etc. with “negative” size will lead to access violation 100% of the time.

    The only problem here is that the resulting error report is slightly less comprehensible than “your size is negative”, i.e. it requires the programmer to be aware of such problems.

    I do not think I’ve met a bug which could be detected by simple assertion with signed arithmetic but instead took some sizable amount of time to track down. Have you?

  4. Anteru says:

    Yeah, I have. The problem was often some silent corruption in an image accessor. I was passing x, y coordinates, and x turned out to be negative due to some other calculation error, and what happened is that I was reading from the wrong locations. Resulted in a slightly corrupted image, and of course the bug was not obvious at all. Think of code like this:
    [source lang="cpp"]
    unsigned x = 512-(4*127+5), y = 12;

    // In range, but invalid
    // operator[] internals
    unsigned o = y * 2048 + x;[/source]
    In such a case, sanity checking for >= 0 is already helpful (or full bound checking.)

    Second: True, calling memset with a “negative” size will lead to an access violation, but this is surely not desirable behaviour, is it? The main point here is to make code more robust against user errors, and an access violation is not really helpful. I’d be much more happy to see fwrite () fail instead with something like an “invalid size” error, which makes it clear that it’s the size that is incorrect, and not one of {file-descriptor, input-pointer, input-size}.

    Robust and easy-to-use code saves time in the long run, and signed types make it easier to write robust code, so it’s a clear decision for me.

  5. Kai Schröder says:

    especially for images i prefer unsigned type because i just need to check

    [source lang="cpp"]BOOST_ASSERT( size != 0 && size < width * height )[/source]

  6. Kai Schröder says:

    hm for some reason half of my text is missing

    lets try it again:

    With unsigned types I just need to check for size = 0 and < width * height

  7. Kai Schröder says:

    ok, text only:

    I just need to check for less than width times height and not additionally for greater than zero

  8. @Anteru
    assert(x < width && y < height); would've caught this, no?

    As for memset, uh… Supposing memset had signed arguments, what would you like it to do in case of negative input? I think it would assert in debug, and fill a lot of memory (the same amount as with unsigned arithmetic!) in release.

    If you do your own checking outside of memset, you're free to bounds-check the arguments, since memset can't do it for you anyway! i.e. assert(size <= size_of_source && size <= size_of_dest).

  9. Paulo says:

    Hello there,
    I’m sorry was just passing by but I just can’t leave without making a comment.
    Don’t want to get this to the religious side but when talking about C++ and then writing things like memsets and pods “a lá” C this make some C++ programmers nervous.

    C++ offers you syntactic sugar to avoid these situations.
    Another thing just because you can access the C API such as fread and the kind, it does not mean that that is C++.
    I/O access in (pure) C++ is done using the iostream family and its very nice operator<< and operator>>.

    “with signed types, a quick assert(size >= 0); would have failed with size == -3.”
    This reminds me of the -1 checks usual in C.
    How about an assert(size < sizeof(pod)) that would make sense right?
    In the presented case doesn't even make sense to use memset, but even if we assume that we are clearing memory for a buffer.

    In C++ we could use:

    #include <vector>
    typedef std::vector Buffer;
    Buffer buffer(some_desired_size, some_default_value);

    or if we already have the Buffer

    buffer.assign(buffer.size(), some_default_value);

    or even

    #include <algorithm>
    std::fill(buffer.begin(), buffer.end(), some_default_value);
    which of course also works for normal pointers.

    In conclusion, no such problem in C++.

    PS: Hey actually C is my favorite language after C++. (JAVA sucks)

  10. Paulo says:

    #include
    typedef std::vector Buffer;

    should be for example:

    #include
    typedef std::vector Buffer;

    I was nervous =)

  11. Paulo says:

    Ermm ok got it, this thing removes the “<" hope you get the idea anyway.

  12. Anteru says:

    To the commenters who argue that using unsigned types and range-checking on those is enough: Often that’s true, but I’ve seen enough cases where, for instance, intermediates were negative and everything just seemed to work, but the end result was wrong and it was quite difficult to figure out what was going on. With signed types, I can usually sanity-check those cases _without_ the exact context (i.e. the proper bounds), as long as I know that some value should be positive. This makes it dead easy to sanity-check throughout your whole application.

    Besides, what do you gain from the one additional bit of range? There’s no way you’re ever going to actually use it; or at least I’ve never seen real-world code which used the whole 32-bit range (usually, once you get close to the 32-bit limit, you’ll switch to 64-bit types.) So all you “gain” is the fact that you can only do checks like x < range, instead of being able to check both x >= 0 (a trivial assert you can place everywhere) and x < range.
