
Build systems: MSBuild

Last week we covered Make -- a ubiquitous build tool in the Linux world. There's a variant of Make for Windows as well (nmake), but the real workhorse on Windows is MSBuild. MSBuild is the default build system used by Visual Studio for .NET and C++. Originally a part of the .NET framework, it has grown over time to cover more use cases and is now maintained as an independent project.


MSBuild is actually multiple projects in one. At its core, it's essentially Make, but many layers and functions have been added on top to cater to the many different use cases. Before we look at all the extra functionality, I want to show you the core concept of MSBuild, which is the same as for Make or any other build system, really: transforming inputs into outputs.

In MSBuild, the build files are written in XML. A target node defines the inputs and outputs, and contains multiple task nodes which do the transformation. Here's a simple example:

<Target Name="MyTarget" Inputs="file.cpp;file.h" Outputs="library.lib">
    <CL Sources="file.cpp"/>
    <LIB Sources="file.obj" OutputFile="library.lib"/>
</Target>

The CL and LIB elements are tasks. You can think of them as built-in function calls. If you squint a bit, doesn't this resemble Make a lot? Just as a reminder, this is how it would look in Make:

library.lib : file.cpp file.h
    cl.exe /c file.cpp
    lib.exe /OUT:library.lib file.obj

This is the core foundation of both Make and MSBuild. However, where Make only adds some convenience functionality on top, and relies on other tools to generate the build files, MSBuild integrates both this low-level core and many high-level concepts into one framework.


Like Make, MSBuild supports variables, but it also adds logic on top of them through conditions. Conditions can be applied nearly everywhere and allow you to execute targets and tasks only if some condition holds: all tasks and targets accept a Condition attribute. On top of that, more complex constructs can be created using the Choose, When and Otherwise elements. This enables complex switch statements and other logic to be evaluated as part of the build process. Typically, this logic will end up populating properties and item lists which are then consumed elsewhere.
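As a sketch of how this looks in practice (the BuildType and CompilerFlags property names here are invented for illustration), a Condition attribute can supply a default value, and a Choose/When/Otherwise block can populate properties based on it:

```xml
<PropertyGroup>
  <!-- Give BuildType a default if the caller didn't set it. -->
  <BuildType Condition="'$(BuildType)' == ''">Debug</BuildType>
</PropertyGroup>
<Choose>
  <When Condition="'$(BuildType)' == 'Debug'">
    <PropertyGroup>
      <CompilerFlags>/Od /Zi</CompilerFlags>
    </PropertyGroup>
  </When>
  <Otherwise>
    <PropertyGroup>
      <CompilerFlags>/O2</CompilerFlags>
    </PropertyGroup>
  </Otherwise>
</Choose>
```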

MSBuild doesn't stop at logic embedded in conditional expressions and the XML, though. MSBuild is built on .NET, and that provides another source of interesting functionality -- the ability to call .NET methods. It's not quite C#, but it allows you to do things like @(theItem->IndexOf('r')). Or you can initialize a property using $([System.DateTime]::Now). The way it's integrated doesn't allow for large-scale scripting, but it makes it easy to call a function here and there.
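A couple of small examples of what such calls look like (the property names are made up; the syntax is MSBuild's property-function syntax):

```xml
<PropertyGroup>
  <!-- Call a static .NET method during property evaluation. -->
  <BuildDate>$([System.DateTime]::Now.ToString('yyyy-MM-dd'))</BuildDate>
  <!-- Call an instance method on an existing property. -->
  <LoudConfig>$(Configuration.ToUpper())</LoudConfig>
</PropertyGroup>
```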


Thanks to .NET and easy class loading, MSBuild also provides a very wide range of pre-made tasks. Instead of specifying the command line of a tool directly, there's a task for everything from file copying to C# manifest resource name creation. The tasks are also a good example of where MSBuild has grown on a per-client basis instead of being designed into shape: there's very specific functionality like the RequiresFramework35SP1Assembly task, which can do things like writing a desktop shortcut -- functionality I would have expected to be provided outside of the core MSBuild distribution.
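For instance, instead of shelling out to xcopy, file copying is a built-in task (the item and property names below are assumptions, not part of the sample project):

```xml
<Target Name="DeployHeaders">
  <!-- Copy is one of the many pre-made tasks shipped with MSBuild. -->
  <Copy SourceFiles="@(PublicHeaders)" DestinationFolder="$(OutDir)include" />
</Target>
```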

On the other hand, there's a large chunk of functionality which I care about that is actually external -- the ability to build C++ projects. C++ support was added to MSBuild gradually and, unfortunately, it never ended up being a first-class citizen. Even today, the solution file is not an MSBuild file, and there's special magic to handle it (if you want to see what's happening, set the environment variable MSBuildEmitSolution=1 and then invoke MSBuild on the solution).

C++ integration

C++ integration happened through the addition of a slew of new tasks, like the CL task, which is a wrapper around cl.exe. It literally wraps every single parameter you can pass to CL into an attribute. Unlike the C# integration, however, output handling was omitted from the C++ tasks. The Csc task (which calls the C# compiler) provides TaskParameter values which allow you to specify the output easily (you can write the output file name into a property, for example). Unfortunately, that's not the case for the CL task, which requires you to specify the outputs manually.
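To give a flavor of that wrapping, here's a sketch of a CL invocation where command-line switches turn into attributes (the attribute values are given to the best of my knowledge):

```xml
<!-- Roughly the equivalent of: cl.exe /c /O2 /W4 /MD file.cpp -->
<CL Sources="file.cpp"
    Optimization="MaxSpeed"
    WarningLevel="Level4"
    RuntimeLibrary="MultiThreadedDLL" />
```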

There's a lot more to MSBuild than I covered here, but the things mentioned so far are enough to get us started with building our sample project. Without further ado, let's get ready for some building!

Sample project build

Let's walk through the sample project build file. Unlike with all the other tools, I didn't manage to split up the project into multiple self-contained files -- everything is in the top-level build.xml.

We're starting with some setup code:

<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<Import Project="$(VCTargetsPath)\Microsoft.CppCommon.targets" />

This tells MSBuild to include the C++ tasks; without it, there's no CL task. I also define three item groups so I can reference those files more easily:

    <StaticLibraryFiles Include="statlib/StaticLibrarySource.cpp;statlib/"/>
    <DynamicLibraryFiles Include="dynlib/DynamicLibrarySource.cpp;dynlib/DynamicLibraryHeader.h"/>
    <ExecutableFiles Include="executable/ExecutableSource.cpp"/>

Now come the individual targets. The most interesting one is the static library, which calls Python.

<Target Name="StatLib" Inputs="@(StaticLibraryFiles)" Outputs="statlib.lib">
    <Exec Command="python statlib/ > table.cpp" Outputs="table.cpp">
        <Output TaskParameter="Outputs" ItemName="GeneratedFiles"/>
    </Exec>
    <CL Sources="statlib/StaticLibrarySource.cpp"/>
    <CL Sources="@(GeneratedFiles)"/>
    <LIB Sources="StaticLibrarySource.obj;table.obj" OutputFile="statlib.lib"/>
</Target>

The static library is a target; it depends on the static library files -- I'm referencing the item group here -- and it then executes the tasks provided inside, in order. There's no searching for modules in MSBuild, so I again just hard-coded the call to Python. I'm also wiring the output of the Exec command into an ItemName, which I can then reference below in a CL task -- that's the @(GeneratedFiles). Finally, I'm calling the linker, specifying all file names manually.

If there's a dependency between targets, it has to be specified manually inside the Target element. This way, MSBuild can build the whole dependency graph and then execute one target after the other. MSBuild cannot see into individual targets when scheduling, and inside each target everything is executed serially, which is why it can't compile all C++ files first and then link the libraries as they become ready. For in-target parallelism, you have to rely on the task handling this internally, for instance through the /MP option, which compiles multiple C++ files in parallel.
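Declaring such a dependency is done via the DependsOnTargets attribute; a sketch for the sample project might look like this (the DynLib and Executable target names are my assumption):

```xml
<Target Name="Executable" DependsOnTargets="StatLib;DynLib"
        Inputs="@(ExecutableFiles)" Outputs="executable.exe">
  <!-- Compile and link tasks go here; they only run after StatLib and DynLib. -->
</Target>
```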

In my opinion, MSBuild is a curious piece of software. It's firmly rooted in the well-designed and highly-consistent world of .NET, which is visible in the "original" parts of the application. At the same time, it was clearly used by many different clients and grew in basically all directions without the guidance of an architect. In the end, we get a large bag of functions and tools to solve the software build problem -- but it could be that this is a reflection of the mess that is needed to build any piece of large software.

Build systems: Make

As mentioned last week, we're going to cover Make today, one of the most widely used tools to build software. Make is a good example of what I described in the intro as the core of a build system: a rule-based system which transforms data and takes care of dependencies. Because Make is conceptually very simple, I'm going to cover the inner workings in more detail before diving into the sample project.

Short aside: I'll be covering Linux only. Make itself doesn't care much about the platform, but for C++, it's much easier to explain things on Linux, where a simple g++ foo.cpp produces a binary without having to deal with SDK paths and so on.

The prototypical build file for Make is called a Makefile, which contains all the targets and rules for a given project. Here's an example:

file.o : file.cpp file.h
    g++ -c file.cpp

This describes a very simple rule which produces file.o if file.cpp or file.h changes. It uses the command g++ -c file.cpp to generate file.o. We can test that this works as expected:

$ touch file.cpp file.h
$ cat > Makefile <<EOL
file.o : file.cpp file.h
	g++ -c file.cpp
EOL
$ make
g++ -c file.cpp

If we run Make again, it'll notice there's nothing to do, because neither dependency has changed since the output was built.

$ make
make: 'file.o' is up to date.

Let's touch the header file -- note that both files are empty, so the contents don't change, but as Make is timestamp-based, it will consider the dependency out of date:

$ touch file.h
$ make
g++ -c file.cpp

This dependency checking means that Make needs to run stat on every file. We can verify this easily by using strace make 2>&1 | grep stat | grep '"file', which yields:

stat("file.o", {st_mode=S_IFREG|0664, st_size=936, ...}) = 0
stat("file.cpp", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
stat("file.h", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0

Here we can see that Make starts at the target, then checks the inputs, and if one of them is newer, it goes ahead and executes the build rule. That's really all there is to Make. You get a couple of goodies on top -- fancy ways to write rules, variables, and so on -- but the end result will always be an output, one or more inputs, and rules to transform the inputs into the output.
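One of those goodies is pattern rules combined with automatic variables; a minimal sketch (not taken from the sample project):

```makefile
CXXFLAGS = -O2 -Wall

# One recipe covers every .cpp -> .o transformation:
# $< is the first prerequisite (the .cpp), $@ is the target (the .o).
%.o : %.cpp
	$(CXX) $(CXXFLAGS) -c $< -o $@
```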

Sample project build

Now that we know how Make works, how does it help us build our sample project? The answer is: unfortunately, not much. Make doesn't know about C++ libraries, header files and such. The only parts of Make we can use directly are the built-in rules, which allow us to skip writing our own rules to transform .cpp into .o files, and a few pre-configured variables like CFLAGS. You can see those variables at work in the DynamicLibrary project, where the #define as well as the linking are set through variables:

LDFLAGS += -shared -L../statlib -lStaticLibrary

The += specifies that we're appending to those variables. That's important, as a user might want to supply custom flags -- say, -Os in CXXFLAGS -- and we don't want to override those. Everything is left to the user. I've set up the project with four Makefiles in total: one for each binary, and a global Makefile to glue everything together. The individual project files are very similar and follow the same template. Let's walk through the static library as an example:


Just as mentioned above, we're adding some options to a variable. As mentioned in the intro, the -fPIC option is the tricky one: it must be set for a static library which will be linked into a dynamic library. Given that Make knows nothing about C++ projects, nor that the dynamic library even exists, we have to specify it manually here.
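The snippet in question isn't shown above; assuming the flag ends up in CXXFLAGS (the variable consumed by Make's built-in .cpp rule), it presumably looks like this:

```makefile
# -fPIC: position-independent code, needed because this static library
# will later be linked into a dynamic library.
CXXFLAGS += -fPIC
```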

OBJ = StaticLibrarySource.o table.o
TARGET = libStaticLibrary.a
.DEFAULT_GOAL := all

We set up a couple of variables to reduce repetition. OBJ contains all object files; the corresponding .cpp files are automatically picked up through the built-in rule. We're also setting the special variable .DEFAULT_GOAL to indicate what to build when invoked without any parameters. By default, Make picks the first rule, which is not the one we want here.

table.cpp :
    python3 > table.cpp

This is our code generator. Notice that there's really nothing to it: a file table.cpp is generated from a dependency by executing python3 > table.cpp. We could use automatic variables here to avoid repeating the dependency and output name if we wanted to. This would change the rule to python3 $< > $@. The automatic variable $@ references the target of the rule and $< references the first prerequisite. This is mostly useful when writing implicit rules, and for clarity, I've just skipped it here.

Note that I've hardcoded the path to Python. Make also doesn't have a way to find existing libraries or programs. The usual method -- short of generating the Makefiles from other tools -- is to specify environment variables or have a special settings file which is included into all Makefiles.

$(TARGET) : $(OBJ)
    ar rcs $(TARGET) $(OBJ)

all : $(TARGET)

These are the main rules. $(TARGET) produces the library by invoking ar, which packs the object files into a static library.

clean :
    rm -f table.cpp $(OBJ) $(TARGET)

.PHONY : clean all

Finally, the cleanup rule. We need to repeat everything here: Make doesn't know which files it generated or what side effects may have occurred, so we just spell it out. We also use the special .PHONY target to indicate that clean and all don't produce files.

The other targets look virtually the same, so the only interesting file left is the global driver file, which glues together the individual projects. It calls Make recursively for each directory, in the right order. As recursive Make is complicated, I'm not even striving for an optimal setup; rather, I'm using the simplest setup I could find which shows the basic idea. We set up a new target for every directory and provide a rule which invokes Make for that target. Similarly, a new target is specified for cleaning, which just invokes Make and passes clean along.
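A minimal sketch of such a driver Makefile (directory names assumed from the project layout):

```makefile
SUBDIRS = statlib dynlib executable

all : $(SUBDIRS)

# Invoke Make recursively in each directory.
$(SUBDIRS) :
	$(MAKE) -C $@

# Inter-project build order, expressed at the global scope.
dynlib : statlib
executable : dynlib

clean :
	for dir in $(SUBDIRS); do $(MAKE) -C $$dir clean; done

.PHONY : all clean $(SUBDIRS)
```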

Any inter-project dependencies need to be expressed at a global scope, and I've hard-coded all include directories into the respective projects. With a lot of work, you can make everything variable and pass those around to recursive invocations. At this point, it should become obvious that you probably should generate the files from some other, higher-level build system which can express all those cross-project dependencies. Make is a low-level tool which executes rules, and in good Unix tradition, that's all it does -- and it does it well.

As usual, you can find the complete setup in the repository. Note that it has been tested only on Linux, where everything is set up to work with Make out of the box.

Build systems: Intro

Hi and welcome to a brand-new blog series! Over the coming weeks, I'll be presenting build systems to you, and today I want to set the stage for this. I'll be looking at the topic through a C++ lens, as my sample project will be written in C++, but that's just because I'm a C++ person and that's what is most interesting for me. That's just a minor detail though -- we'll cover multiple build systems which support many different languages. Before we look into the project I want to build, let's first take a look at what we're going to discuss.

Build systems

A build system solves a rather simple problem at its core: it transforms data using rules and takes dependencies into account. You can think of any build system as a graph traversal engine. Every node in that graph has one or more inputs, produces one or more outputs, and the node itself encodes the rules for transforming the inputs into the desired outputs. That's it: once everything is laid out this way, you just need to find out which inputs have changed, and then rebuild the nodes which (transitively) depend on those inputs.

Unfortunately, that's a really hard problem, as finding all dependencies is tough -- the set of dependencies includes the compilers, the system SDKs, any library which happens to be in your project, the specific settings you pass to your compiler, and any file that's read by one of your transformation steps (and more!). Spelling out all those dependencies is not realistic, and thus most build systems provide shortcuts to approximate them reasonably. That's where the language support of a build system comes in handy, as it allows you to express fewer of those dependencies manually and let the build system figure out the details.

The language support part can also extend to dependency management. Strictly speaking, it's not part of the core business, but a build system has a lot of knowledge about the things it builds, and that makes it convenient to build more than just the shared library file you asked for. For instance, a build system could also deploy all headers you need to include said shared library, the linker options you need to link against it, the version, and so on, up to the point that you can consume it in another project just "as if" it was built locally.

Sample project

With this background, let's look at the sample project I want to build. It's a small project consisting of three separate parts:

- a static library, which contains a source file generated by a Python script
- a dynamic library, which links against the static library
- an executable, which links against the dynamic library

Let's look at the tricky parts for each of them. The static library has two interesting requirements for a build system. First, the build needs to find Python before it can generate the source file; if the build system provides some way to locate a dependency or binary, that's a big plus, because it makes this part more reliable. The other requirement is a tricky one -- on Linux, a static library which goes into a dynamic library must be compiled with -fPIC, i.e. as position-independent code. This can only be discovered by looking at the consumers of the static library. A C++-aware build system should at least notice this and warn you.

The dynamic library is straightforward, except that it requires a preprocessor definition to build. That definition must be set only for the dynamic library, not for the consumer of it.

The executable itself has no extra requirements; it just includes the dynamic library header and links against it. Includes are a good topic, though -- there's one external constraint I'm putting on the build systems: each part must have its own build file, to show how the build system handles modularity. This also means the build system needs to set up the include directories, as by default, the individual parts just #include the headers of their dependencies. That's it -- a small but not trivial project, and the "benchmark" I'll be using going forward.

You can find the whole setup -- without build system files -- in my build system repository on Bitbucket. With every new blog post, I'll be adding one new set of build files so you can follow along. As a small teaser, we're going to start next week with Make!

Introducing kyla 2.0

One year has passed, and I've been using kyla quite a bit for my personal projects. As it turns out, it still looks like I need to write everything three times before it works properly. Even though it's called 2.0, it's actually the third major iteration of kyla, as there was one rewrite before 1.0 happened.

So what happened? The original goals I set for kyla remain, but while trying to re-introduce encryption, I ran into a couple of design issues which resulted in more and more ugly code. At the same time, the database schema turned out not to be as forward-looking as I thought it would be, so that had to be updated as well. Finally, kyla 1.0 shipped with only a very basic UI, and for kyla 2.0 I've revamped the default UI significantly to make it possible to just ship with it instead of having to write your own.

Introducing 2.0


The new interface installing the ndk.

A lot of things changed for 2.0 -- check out the changelog for details. All of this has been driven by my primary use case, which is deploying the ndk. The ndk is the SDK for my home framework, and consists of roughly 15,000 files, totaling 1 GiB of data.

The biggest change is probably the updated database schema. It should be easy to extend now, and has proper layering. At the core, we still have the features, but the association to file contents is looser. Eventually, adding shortcuts, for instance, should be possible, unlike in the previous revision, which was very much hard-coded to be content-only.

The second part is the UI -- it's been completely redone to allow a feature tree (that is, nested features) and provides a much more polished experience. UI information is provided in a separate layer and is not deployed. This means that once installed, you still have to use the feature ids just as in kyla 1.0, but as long as the source repository is present, the UI will remain available for configuration, updates and downgrades.

The future

For me, kyla 2.0 solves all my problems in a much more robust way than kyla 1.0 could. In particular, the new UI is something I can ship to people. I've also implemented various under-the-hood performance improvements so it's no longer doing really embarrassing things :).

Will there be future updates? Maybe. At the moment, it just works, so there's no need to fix things, but I'll look into issues as I run into them, just as I rewrote kyla once I hit some roadblocks with 1.0. Just make sure to tell me in case you run into an issue. As before, you can find the source code on both Bitbucket and Github. Enjoy!

Work is all about predictability

As a programmer, you might get home and continue tinkering on your home projects -- and guess what, I bet it looks very much like what you do at work. For me at least, I do the same stuff at home as at work; I even run my own bug tracker and wiki at home on my own server, manage project releases, write documentation and take care of user feedback.

Recently, it dawned on me that the work bit in what you do at work is about only one thing: predictability. If you think about it, what's the main difference between a bug report you send me at work compared to one for my private projects? It's the fact that at work, I know I will have time to look at it, and I can give you an estimate of when it'll be done -- and that's about it. If I get asked when my Python Clang bindings will be updated to the latest Clang shipping in Ubuntu, the answer is "as soon as possible", but you shouldn't rely on this at all. I try my best to spend a fixed amount of time on my personal projects each week, but it's really hard to commit hours to it -- and that's the main difference between work and "hobby projects" for me.

Oh, and before someone brings up the money argument -- there are plenty of private projects that make money, so I'm not taking this as the main difference. Money makes you a professional, but that's all. I could very well have some ads on this page (and maybe I'll add some, eventually) and this blog would make money; would it become work at that moment? No -- unless I start devoting fixed amounts of time to it, with a predictable schedule and so on.

So next time you think about work, think about "how predictable is it?". Because that's what you should be during the hours you sit around at the office. That doesn't mean you can't have creative solutions, but it means the people who interact with you can work off the assumption that you're going to spend a particular number of cycles on your backlog every day and make forward progress. You can't rely on me doing things at home in a predictable way, and that's fine -- because, you know, it's fun, not work :)