
Intel Larrabee instruction set is ready to use

Intel has just released a C++ header that lets developers use the Larrabee instructions with current compilers: it simply implements the future intrinsics in plain C code. Some interesting bits:

  • 512-bit vector types (8x64-bit double, 16x32-bit float, or 16x32-bit integer)
  • Lots of 3-operand instructions, such as a = a*b + c
  • Most combinations of +, -, * and / are provided; that is, you can choose between a*b - c, c - a*b and so forth.
  • Some instructions have built-in constants, e.g. 1 - a*b
  • Many instructions take a predicate mask, which lets you operate on just a subset of the lanes of a vector
  • 32-bit integer multiplications which return the lower 32 bits of the result
  • Lots of trigonometric functions. [Update] Intel doesn't say which of them map directly to instructions; they are provided only for the sake of completeness.
  • Bit helper functions (scan for a set bit, etc.)
  • Explicit cache control functions (load a line, evict a line; that would have been helpful on a project I once worked on)
  • Horizontal reduce functions: add all elements of a vector, multiply them all, get the minimum, or combine them logically (or, and, etc.)
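To get a feel for the masked three-operand semantics listed above, they can be modelled in scalar C++. This is only a sketch: `Vec16f`, `Mask16`, `madd_masked` and `one_minus_mul_masked` are hypothetical names, not the intrinsics from Intel's header, and the loops spell out what the hardware would do for all 16 lanes at once. The destination comes first, and in this model lanes whose mask bit is cleared keep their previous value (an assumption about the merge behaviour).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of a 512-bit register holding 16 floats.
// Illustrative names only; NOT the names used in Intel's header.
using Vec16f = std::array<float, 16>;
using Mask16 = std::uint16_t; // one mask bit per lane

// Three-operand multiply-add, destination first: a = a*b + c,
// applied only in lanes whose mask bit is set. Other lanes keep a's old value.
inline void madd_masked(Vec16f& a, Mask16 mask, const Vec16f& b, const Vec16f& c)
{
    for (int lane = 0; lane < 16; ++lane)
    {
        if (mask & (1u << lane))
            a[lane] = a[lane] * b[lane] + c[lane];
    }
}

// Variant with a built-in constant: a = 1 - a*b, masked the same way.
inline void one_minus_mul_masked(Vec16f& a, Mask16 mask, const Vec16f& b)
{
    for (int lane = 0; lane < 16; ++lane)
    {
        if (mask & (1u << lane))
            a[lane] = 1.0f - a[lane] * b[lane];
    }
}
```

With a mask of `0x00FF`, only the lower eight lanes are updated; the upper eight pass through untouched.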

The reduce functions look especially interesting, as they are more general than the dot-product instruction available in SSE. Nothing revolutionary, but all in all it looks like a very nice and useful instruction set. I was hoping for 8-bit instructions as well, though: with 8-bit components and RGBA pixels, one 512-bit register holds 16 pixels, so you could process a 4x4 pixel block at once, which would be a real killer for image processing.
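The generality is easy to see in a scalar model: a dot product (what SSE4.1's DPPS computes) is just a lane-wise multiply followed by an add-reduction, and the same reduction step can just as well use multiply, minimum, or a bitwise OR. The sketch below uses hypothetical names (`reduce_add`, `reduce_mul`, `reduce_min`, `reduce_or`, `dot16`) and is not Intel's API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 16-lane vector types (scalar model, not real intrinsics).
using Vec16f = std::array<float, 16>;
using Vec16i = std::array<std::uint32_t, 16>;

// Horizontal reductions: collapse all 16 lanes into a single scalar.
inline float reduce_add(const Vec16f& v)
{
    float sum = 0.0f;
    for (float x : v) sum += x;
    return sum;
}

inline float reduce_mul(const Vec16f& v)
{
    float product = 1.0f;
    for (float x : v) product *= x;
    return product;
}

inline float reduce_min(const Vec16f& v)
{
    return *std::min_element(v.begin(), v.end());
}

inline std::uint32_t reduce_or(const Vec16i& v)
{
    std::uint32_t r = 0;
    for (std::uint32_t x : v) r |= x;
    return r;
}

// A dot product is a lane-wise multiply followed by reduce_add.
inline float dot16(const Vec16f& a, const Vec16f& b)
{
    Vec16f prod{};
    for (int i = 0; i < 16; ++i) prod[i] = a[i] * b[i];
    return reduce_add(prod);
}
```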

[Update] The instructions have three operands only, so the result is always stored in the first operand (as in a = a*b + c)!

March 2009 DirectX SDK released

I just noticed that the March 2009 DirectX SDK has been released. New stuff included:

  • Direct2D, DirectWrite and DXGI 1.1. Of these, DXGI 1.1 sounds the most interesting; gotta take a look at it.
  • XNA Math: a cross-platform math library with SSE2-optimized code.

Still downloading at the moment ;)

Virtual texture mapping, part 2

Over the last few months, I've written a virtual texture mapping implementation as part of my student research work. Some people have already received a copy to read (you know who you are ;) ); rest assured that I'll continue to work on this stuff. I'm going to post about it on this blog as soon as the work becomes a bit more mature. Currently, the framework is in an early alpha stage, and we are working on a better content creation pipeline. Our artist, although very talented, had a hard time producing demo content, so we (that is, a co-developer and I) have to write some tools to help him.

My solution is basically a reimplementation of Sean Barrett's "Sparse Virtual Textures" (which I have already blogged about), this time with DX10, though I didn't use anything DX10-specific. I measured lots of things and tweaked the implementation based on the results, and there are still many more things I want to try and measure. The implementation supports 4:1 anisotropic hardware filtering and needs roughly 5x as many texels in the cache as there are pixels in the framebuffer (for a framebuffer with 400k pixels, you would need a 2M-texel cache). No special shader tricks are needed, and a lookup costs fewer than 10 cycles. Most of that is fixed overhead, so the cost per lookup drops when a shader performs several lookups.
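To make the sizing rule and the lookup a bit more concrete, here is a minimal sketch of the page-table indirection that sparse virtual texturing is built on. It is not the implementation described above: it assumes a single mip level, no page borders (which real anisotropic filtering needs), and that every requested page is already resident in the cache. All names (`cache_texels`, `PageEntry`, `PageTable`, `translate`) are hypothetical, and the factor of 5 is just the rule of thumb quoted in the text.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rule of thumb from the text: cache texels ~= 5x framebuffer pixels.
inline std::uint64_t cache_texels(std::uint64_t framebuffer_pixels, std::uint64_t factor = 5)
{
    return framebuffer_pixels * factor;
}

// Hypothetical page-table entry: where a virtual page lives inside the
// physical cache texture, measured in pages.
struct PageEntry
{
    std::uint16_t cache_x;
    std::uint16_t cache_y;
};

struct PageTable
{
    int pages_x;                     // virtual texture width in pages
    int pages_y;                     // virtual texture height in pages
    std::vector<PageEntry> entries;  // one entry per virtual page, row-major
};

struct CacheUV { float u, v; };

// Translate a virtual texture coordinate in [0,1) into a coordinate in the
// cache texture: find the page, look it up, then add the offset inside the page.
inline CacheUV translate(const PageTable& table, int cache_pages_x, int cache_pages_y,
                         float u, float v)
{
    float px = u * table.pages_x;  // position in page units
    float py = v * table.pages_y;
    int page_x = static_cast<int>(px);
    int page_y = static_cast<int>(py);
    const PageEntry& e = table.entries[page_y * table.pages_x + page_x];
    float fx = px - page_x;  // offset inside the page, in [0,1)
    float fy = py - page_y;
    return { (e.cache_x + fx) / cache_pages_x, (e.cache_y + fy) / cache_pages_y };
}
```

In a pixel shader, the same steps would roughly become one fetch from a page-table texture plus a few multiply-adds, which is why no special shader tricks are required.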

Not all is lost, though: you can use the comments to ask specific questions, and I'll try to answer them.

New theme

As you have probably noticed (unless you only read my page through a feed reader ;) ), I have a new theme. If you spot any problems (usability or display), please leave a comment and tell me which browser/OS combination you use. So far, it works fine with Opera/Windows, Firefox/Windows & Linux, Chrome/Windows (except for small issues with code display), and Konqueror/Linux (same problems as with Chrome, plus some with the title bar text).

[Update] Fixed the Gravatar display in the comments; my gravatar should now be displayed properly. Also made the check for comments written by the admin more reliable.

Pimpl your C++ code

Today we'll take a look at the Pimpl (private implementation) pattern, which is especially useful for larger projects where compile times become a problem. Pimpl decouples the interface from the implementation, to the point where nearly every class can be used through a forward declaration alone. This can reduce compile times dramatically. Another use of Pimpl is to hide large or ugly include files (windows.h, anyone?) from clients.

So how does it work? The idea is to forward declare an inner class and store only a pointer to it. Let us take a look at an example:

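// container.h
#ifndef CONTAINER_H
#define CONTAINER_H

#include <cstddef>
#include <boost/tr1/memory.hpp>

class Container
{
public:
    Container (const size_t size);

    Container (const Container& other);
    Container& operator = (const Container& other);

    int& operator [] (const int index);
    const int& operator [] (const int index) const;

private:
    class Impl;  // only forward declared; defined in container.cpp
    std::tr1::shared_ptr<Impl> impl_;
};

#endif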

This is our public class interface; note that we don't expose the container type at all. In this case, we'll use a standard vector. Our implementation looks like this:

// Implementation (container.cpp)
#include "container.h"
#include <vector>

class Container::Impl
{
public:
    Impl (const size_t size)
    {
        vec.resize (size);
    }

    std::vector<int> vec;
};

Container::Container (const size_t size)
: impl_ (new Impl (size))
{
}

// We need our own copy constructor and assignment operator; otherwise,
// copies would share the same Impl (and thus the same state) through
// the shared_ptr. For most classes, it is best to make them
// noncopyable anyway.
Container::Container (const Container& other)
: impl_ (new Impl (other.impl_->vec.size ()))
{
    impl_->vec = other.impl_->vec;
}

Container& Container::operator = (const Container& other)
{
    impl_->vec = other.impl_->vec;

    return *this;
}

int& Container::operator [] (const int index)
{
    return impl_->vec [index];
}

const int& Container::operator [] (const int index) const
{
    return impl_->vec [index];
}
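The note in the implementation suggests making most classes noncopyable. As a sketch of what that could look like with C++11 (which the original code predates), the same interface can hold a `std::unique_ptr` instead of the `shared_ptr`: copying is simply deleted, so no hand-written deep copy is needed. The one catch is that the destructor must be defined out of line, after `Impl` is complete, because `unique_ptr` needs the full type to delete it. Header and source are shown in one file here; the comments mark where the split would go.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <vector>

// --- container.h (sketch) ---
class Container
{
public:
    explicit Container(std::size_t size);
    ~Container();  // defined in the .cpp, where Impl is a complete type

    // Noncopyable: no shared state and no manual deep copy.
    Container(const Container&) = delete;
    Container& operator=(const Container&) = delete;

    int& operator[](std::size_t index);
    const int& operator[](std::size_t index) const;

private:
    class Impl;
    std::unique_ptr<Impl> impl_;
};

// --- container.cpp (sketch) ---
class Container::Impl
{
public:
    explicit Impl(std::size_t size) : vec(size) {}  // elements start at 0
    std::vector<int> vec;
};

Container::Container(std::size_t size) : impl_(new Impl(size)) {}
Container::~Container() = default;  // Impl is complete here

int& Container::operator[](std::size_t index)
{
    assert(index < impl_->vec.size());
    return impl_->vec[index];
}

const int& Container::operator[](std::size_t index) const
{
    assert(index < impl_->vec.size());
    return impl_->vec[index];
}

static_assert(!std::is_copy_constructible<Container>::value, "Container is noncopyable");
```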

That's it! We pay with one additional memory allocation (which can be avoided with semi-portable trickery), but we gain a lot at compile time. Qt is a large library that makes extensive use of Pimpl, and it pays off: its headers include the bare minimum required to compile, and nothing more. For the sake of completeness, here is a small usage example:

#include "container.h"
#include <iostream>

int main ()
{
    Container c (23);
    c [13] = 37;

    std::cout << c [13] << std::endl;  // prints 37

    Container copy = c;
    copy [13] = 4711;

    // Still prints 37: the copy owns its own Impl.
    std::cout << c [13] << std::endl;

    return 0;
}
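As for the extra allocation: one common form of that semi-portable trickery, often called the "fast Pimpl", constructs `Impl` with placement new inside a fixed-size buffer stored in the object itself. The sketch below assumes C++17 (for `std::launder`) and a guessed buffer size of 64 bytes; that guess is exactly the non-portable part, so `static_assert`s reject any platform where `Impl` doesn't fit. `FastContainer` and `kStorageSize` are hypothetical names, and copying is deleted to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// --- header (sketch) ---
class FastContainer
{
public:
    explicit FastContainer(std::size_t size);
    ~FastContainer();
    FastContainer(const FastContainer&) = delete;
    FastContainer& operator=(const FastContainer&) = delete;

    int& operator[](std::size_t index);

private:
    struct Impl;
    Impl* impl();

    // Assumed upper bound for sizeof(Impl); checked in the .cpp.
    static const std::size_t kStorageSize = 64;
    alignas(std::max_align_t) unsigned char storage_[kStorageSize];
};

// --- .cpp (sketch) ---
struct FastContainer::Impl
{
    std::vector<int> vec;
};

FastContainer::Impl* FastContainer::impl()
{
    // The Impl object lives inside storage_; launder the cast pointer.
    return std::launder(reinterpret_cast<Impl*>(storage_));
}

FastContainer::FastContainer(std::size_t size)
{
    static_assert(sizeof(Impl) <= kStorageSize, "increase kStorageSize");
    static_assert(alignof(Impl) <= alignof(std::max_align_t), "Impl alignment too strict");
    // Construct Impl in place: no heap allocation for the Impl itself.
    ::new (static_cast<void*>(storage_)) Impl{std::vector<int>(size)};
}

FastContainer::~FastContainer()
{
    impl()->~Impl();  // placement new requires an explicit destructor call
}

int& FastContainer::operator[](std::size_t index)
{
    assert(index < impl()->vec.size());
    return impl()->vec[index];
}
```

The price is that the header now encodes an upper bound on the implementation's size, so growing `Impl` past that bound forces a header change and a recompile of all clients after all.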