Package loading

Loading packages turned out to be a tougher problem than I thought ;). But: it's working! Hooray! My first truly parallel serialization system (with the first truly parallel & concurrent GC) went live today. It'll take at least two weeks to clean up the code until it is ready for production, and then it'll be tested first with the nivenPackageCompiler, which will probably take around a month or so as I want the nPkgComp to be really rock-solid.

Back

Some of you might have noticed that I've been pretty busy the last two weeks. I was working for a company, and now I'm "free" again ;) During the next weeks I'm preparing for my exams, but I hope to get a chance to finish off the package loader. Moreover, with a bit of luck, I'll be able to run some tests on a dual-core computer soon.

Multithreaded issues

Here it is, the first blog post which has been requested by one of you, my fellow readers! Today, we dive into a multi-threaded queue and discover nasty concurrency problems.

The problem

The problem we want to look at is a multithread-ready queue. It shall support the following operations:

  • Insert: A non-blocking insert which returns immediately.
  • Get: A get which blocks until there is something to return.
  • Wait: A blocking wait which - who would believe it - waits until the queue is empty again.

So here we go. We'll take a single mutex to guard access to the member variables, and two conditions. One condition signals that the queue has become empty; we will use it to block the wait call. The other condition signals that the queue is non-empty and will be used to block the get call.

Implementation

We'll implement it in Python, so we don't have to implement threading primitives on our own.

from threading import Lock, Condition

class MT_Queue:
    def __init__ (self):
        self.queue = list ()
        self.mainLock = Lock ()
        self.queueEmptyCondition = Condition (self.mainLock)
        self.queueFullCondition = Condition (self.mainLock)

    def get (self):
        self.mainLock.acquire ()

        # straightforward?
        if (len (self.queue) == 0):
            self.queueFullCondition.wait ()

        item = self.queue.pop (0)

        if (len (self.queue) == 0):
            self.queueEmptyCondition.notifyAll ()

        self.mainLock.release ()

        return item

    def insert (self, item):
        self.mainLock.acquire ()

        self.queue.append (item)
        # wake up one thread that is blocked inside get
        self.queueFullCondition.notify ()

        self.mainLock.release ()

    def wait (self):
        self.mainLock.acquire ()

        # re-check in a loop: the queue may have been refilled
        # before we were scheduled again
        while (len (self.queue) != 0):
            self.queueEmptyCondition.wait ()

        self.mainLock.release ()

Seems pretty simple, yet we have a really big problem as soon as one thread is blocked on the get call, another is running somewhere else, and a third thread inserts a new item. (You may say this is rather unlikely, but try it for yourself; most probably you'll run into this problem immediately.)

The error

The error we will get is that a thread tries to pop from an empty list. How can that be, you might ask? Let me walk you through the situation. The start situation is like this:

  • Thread #1 is blocked inside get on the wait condition.
  • Thread #2 is working somewhere.
  • Thread #3 is about to insert an item.

Shame on us, we've just written a race condition! Thread #3 inserts the item and notifies the condition, so thread #1 is scheduled to wake up. But before thread #1 reacquires the mutex, thread #2 calls get, sees a non-empty queue, pops the item, and releases the lock. Thread #1 then resumes after its wait, assumes an item is available, and pops from an empty list. In short: if thread #2 is faster than thread #1, everything fails! Let's see how to fix this problem:

    def get (self):
        self.mainLock.acquire ()

        # fixed: re-check the predicate after every wakeup
        while (len (self.queue) == 0):
            self.queueFullCondition.wait ()

        item = self.queue.pop (0)

        if (len (self.queue) == 0):
            self.queueEmptyCondition.notifyAll ()

        self.mainLock.release ()

        return item

In case we've been waiting but someone else was faster, we'll simply go back to waiting on the condition once more. This way, nothing bad can happen.
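To convince ourselves the fix works, here is a small self-contained sketch. The class is the fixed listing condensed (with the notify added to insert, and using `with` blocks instead of explicit acquire/release); the thread and item counts are arbitrary choices of mine:

```python
from threading import Thread, Lock, Condition

class MT_Queue:
    def __init__(self):
        self.queue = []
        self.mainLock = Lock()
        self.queueEmptyCondition = Condition(self.mainLock)
        self.queueFullCondition = Condition(self.mainLock)

    def get(self):
        with self.mainLock:
            # re-check the predicate after every wakeup
            while len(self.queue) == 0:
                self.queueFullCondition.wait()
            item = self.queue.pop(0)
            if len(self.queue) == 0:
                self.queueEmptyCondition.notify_all()
            return item

    def insert(self, item):
        with self.mainLock:
            self.queue.append(item)
            self.queueFullCondition.notify()

    def wait(self):
        with self.mainLock:
            while len(self.queue) != 0:
                self.queueEmptyCondition.wait()

q = MT_Queue()
results = []
resultsLock = Lock()

def consumer():
    # each consumer fetches exactly one item
    item = q.get()
    with resultsLock:
        results.append(item)

threads = [Thread(target=consumer) for _ in range(8)]
for t in threads:
    t.start()          # all consumers block inside get
for i in range(8):
    q.insert(i)        # each insert wakes one consumer
q.wait()               # blocks until the queue is drained
for t in threads:
    t.join()

print(sorted(results))  # → [0, 1, 2, 3, 4, 5, 6, 7]
```

Every inserted item arrives exactly once, no matter how the threads interleave - the while loop in get makes a stolen wakeup harmless.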

Conclusion

Multithreading has many very subtle points one has to be aware of. When using a condition, always check in which order threads may wake up, and be aware that two threads waiting on the same mutex have an equal chance to get it: there is no preference for the thread which waits on a condition. If you have questions, feel free to ask in the comments!

Thread-pool

It is a really hot summer here, so I decided to build my own pool, and here it is: the niven thread-pool ;) (sorry for the bad joke). Anyway, the first working thread-pool is done - the last milestone towards the multithreaded package loader. I'll give it a few more revisions to mature a bit, but I expect to have it fully running and debugged within the next three weeks.

Exception(al?) performance

Read on for an exception-performance review with Visual C++: the x86 ABI vs. the new x86_64 ABI.

Test setup

All tests were compiled using Visual C++ 8.0 in the default release mode. The host system was a Turion64 with 1.8 GHz running Windows XP x64. Here are the raw performance timings:

Base-line performance without any kind of exception handling

ABI      Time (in sec.)
x86      0.281
x86_64   0.156

Obviously, the x86_64 ABI is already better in this simple test case; let's see how well it performs when it comes to exception handling.

Throwing exceptions

Let's start off with the unusual case - we'll throw millions of exceptions in tight loops, rethrow them in recursive functions, and catch them at the highest level. After that, we'll only throw at the deepest recursion level and catch at the top-most level.

Worst-case: Exceptions thrown at deepest level, captured and rethrown inside the function, captured in main

ABI      Time (in sec.)
x86      47.25
x86_64   208.234

Really horrible, isn't it? Roughly 170 times slower on x86, and more than 1300 times slower on x86_64. Note, however, that in this example 12.5 million exceptions have been thrown, of which 10 million have been rethrown and 2.5 million have been caught somewhere.

Exceptions thrown at deepest level, not handled inside the function, captured in main

ABI      Time (in sec.)
x86      12
x86_64   23.469

Much, much better already - again, 2.5 million exceptions thrown and captured. Obviously the stack unwinding itself is fast, but the x86 ABI is able to find the right handler almost twice as fast as the x86_64 one.

Not throwing exceptions

Let's take a look at what happens when we handle exceptions basically everywhere, but don't throw them.

No exceptions thrown, captured and rethrown inside the function, captured in main

ABI      Time (in sec.)
x86      0.484
x86_64   0.218

This is an interesting result, isn't it? When no exception is thrown, the overhead stays well below a factor of two compared to the completely exception-free version (0.484 vs. 0.281 seconds on x86, 0.218 vs. 0.156 on x86_64).

Best-case: No exceptions thrown, not handled inside the function, captured in main

ABI      Time (in sec.)
x86      0.421
x86_64   0.203

This is probably the usual case: high-level code catches exceptions which might be thrown somewhere in the low-level code - actually the situation exceptions have been primarily designed for ;). And here, the x86_64 ABI really pays off. Seems those folks did a good job: the exception-handling version became merely 30% slower, while the x86 one slowed down by 50%. Please note that the test function was rather pathetic - a single fmul/fdiv and an add. In real-world code, exception handling won't slow things down that much (unless you start adding exception handling to high-performance code inside the tightest loops); I've observed slowdowns in the range of 1-2% on larger functions.

Conclusion

Exception performance is not as bad as many think, and most importantly, it's getting better with the new x86_64 ABI. Finally you pay for the exceptions you throw and not for those you don't, and this is how it was meant to be from the beginning.
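The same qualitative trend - a handler that never fires is cheap, an exception actually raised is expensive - can be illustrated outside C++ as well. Here is a quick analogous micro-benchmark in Python (my own sketch, not the C++ harness used above; Python's exception mechanism is entirely different, so only the qualitative comparison carries over):

```python
import timeit

def guarded_no_throw(x):
    # handler present, never triggered
    try:
        return x + 1
    except ValueError:
        return 0

def guarded_throw(x):
    # handler present and triggered on every call
    try:
        raise ValueError(x)
    except ValueError:
        return 0

t_quiet = timeit.timeit(lambda: guarded_no_throw(1), number=100_000)
t_noisy = timeit.timeit(lambda: guarded_throw(1), number=100_000)
print(f"no throw: {t_quiet:.3f}s, throw: {t_noisy:.3f}s")
```

On any machine I've tried, the throwing variant is several times slower, while the never-triggered handler costs next to nothing - the same "pay only for what you throw" principle.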