Build systems: Make

As mentioned last week, we’re going to cover Make today, one of the most widely used tools to build software. Make is a good example of what I mentioned in the intro as the core of a build system: a rule-based system which transforms data and takes care of dependencies. Because Make is conceptually very simple, I’m going to cover the inner workings in more detail before diving into the sample project.

Short aside: I’ll be covering Linux only. Make itself doesn’t care much about the platform, but for C++, it’s much easier to explain things on Linux, where a simple g++ foo.cpp produces a binary without having to deal with SDK paths and so on.

The prototypical build file for Make is called a Makefile, which contains all the targets and rules for a given project. Here’s an example:

file.o : file.cpp file.h
	g++ -c file.cpp

This describes a very simple rule which produces file.o if file.cpp or file.h changes. It uses the command g++ -c file.cpp to generate file.o (by default, g++ -c derives the output name from the input). We can test that this works as expected:

$ touch file.cpp file.h
$ cat > Makefile <<EOL
file.o : file.cpp file.h
	g++ -c file.cpp
EOL
$ make
g++ -c file.cpp

If we run Make again, it’ll notice there’s nothing to do, because neither dependency is newer than the target.

$ make
make: 'file.o' is up to date.

Let’s touch the header file — notice both files are empty, so the contents don’t change, but as Make is timestamp-based, it will consider the dependency out of date:

$ touch file.h
$ make
g++ -c file.cpp

This dependency checking means that Make needs to run stat on every file. We can verify this easily by using strace make 2>&1 | grep stat | grep '"file', which yields:

stat("file.o", {st_mode=S_IFREG|0664, st_size=936, ...}) = 0
stat("file.cpp", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
stat("file.h", {st_mode=S_IFREG|0664, st_size=0, ...}) = 0

Here we can see that Make starts at the target, then checks the inputs, and if one of them is newer, it goes ahead and executes the build rule. That’s really all there is to Make. You get a couple of goodies on top — fancier ways to write rules, variables, and so on — but the end result will always be an output, one or more inputs, and rules to transform the inputs into the output.

Sample project build

Now that we know how Make works, how does it help us build our sample project? The answer is: unfortunately, not much. Make doesn’t know about C++ libraries, header files and such. The only parts of Make we can use directly are the built-in rules, which allow us to skip writing our own rules to transform .cpp into .o files, and a few pre-configured variables like CXXFLAGS. You can see those variables at work in the DynamicLibrary project, where it sets the #define as well as the linking through variables:

LDFLAGS += -shared -L../statlib -lStaticLibrary
CPPFLAGS += -I../statlib -DBUILD_DYNAMIC_LIBRARY=1

The += specifies that we’re appending to those variables. That’s important, as someone might want to use custom CPPFLAGS — let’s say, -Os — and we don’t want to override those. Everything is left to the user.
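As a quick check of that behavior: under GNU Make, += appends to a value inherited from the environment, while a variable assigned on the command line overrides whatever the Makefile does with it.

$ CPPFLAGS=-Os make      # environment variable: += appends, so -Os and our flags combine
$ make CPPFLAGS=-Os      # command-line assignment: overrides the Makefile’s +=

I’ve set up the project with four Makefiles in total: one for each binary, and a global Makefile to glue everything together. The individual project files are very similar and follow the same template. Let’s walk through the static library as an example: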

CPPFLAGS += -fPIC

Just as mentioned above, we’re adding some options to a variable. The -fPIC option is the tricky one from the intro: it must be set for a static library which will be linked into a dynamic library. Given that Make doesn’t know anything about C++ projects, nor that the dynamic library even exists, we have to specify it manually here.

OBJ = StaticLibrarySource.o table.o
TARGET = libStaticLibrary.a
.DEFAULT_GOAL := $(TARGET)

We set up a couple of variables to reduce repetition. OBJ contains all object files; the corresponding .cpp files are automatically picked up through the built-in rule. We also set the special variable .DEFAULT_GOAL to indicate what to build if Make is invoked without any arguments. By default, Make picks the first rule, which is not the one we want here.
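If you’re curious what that built-in rule actually looks like, you can dump all of them with make -p. Slightly simplified, it boils down to this ($@ and $< are automatic variables, which I’ll get to in a moment):

%.o : %.cpp
	$(CXX) $(CXXFLAGS) $(CPPFLAGS) -c -o $@ $<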

table.cpp : tablegen.py
    python3 tablegen.py > table.cpp

This is our code generator. Notice that there’s really nothing to it: a file table.cpp is generated from a dependency tablegen.py by executing python3 tablegen.py > table.cpp. We could use automatic variables here to avoid repeating the dependency and output names if we wanted to. This would change the recipe to python3 $< > $@. The automatic variable $< references the first prerequisite of the rule and $@ references the target. This is mostly useful when writing implicit rules, and for clarity, I’ve just skipped it here.
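For completeness, this is what the rule would look like with automatic variables:

table.cpp : tablegen.py
	python3 $< > $@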

Note that I’ve hardcoded the path to Python. Make also doesn’t have a way to find existing libraries or programs. The usual method — short of generating the Makefiles from other tools — is to specify environment variables or have a special settings file which is included into all Makefiles.
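A minimal sketch of that settings-file pattern; the file name settings.mk and the PYTHON variable are made up for illustration:

# settings.mk at the repository root
PYTHON ?= python3    # ?= assigns only if PYTHON isn’t set already

# in each project Makefile:
include ../settings.mk

table.cpp : tablegen.py
	$(PYTHON) tablegen.py > table.cpp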

$(TARGET) : $(OBJ)
    ar rcs $(TARGET) $(OBJ)

all : $(TARGET)

These are the main rules. $(TARGET) produces the library by invoking ar, which packs the object files into a static library. all is just a conventional alias for it.

clean :
    rm -f table.cpp $(OBJ) $(TARGET)

.PHONY : clean all

Finally, the cleanup rule. We need to spell everything out here, as Make doesn’t track what files it generated or what side effects may have occurred. We also mark those targets as .PHONY to indicate that they don’t produce a file of the same name.
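Running it simply echoes and executes the expanded command:

$ make clean
rm -f table.cpp StaticLibrarySource.o table.o libStaticLibrary.a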

The other projects’ Makefiles look virtually the same, so the only interesting file left is the global driver file which glues everything together. It works by calling Make recursively for each directory, in the right order. As recursive Make is complicated, I’m not even striving for an optimal setup; rather, I’m using the simplest one I could find which shows the basic idea. We set up a new target for every directory and provide a rule which invokes Make for that target (see the sketch below). Similarly, a separate target is specified for cleaning, which just invokes Make and passes clean to it.
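Here’s a minimal sketch of such a driver; the directory names are assumptions for illustration, and the actual file in the repository may differ:

# the subdirectory names double as targets; .PHONY ensures the
# existing directories don’t count as up-to-date files
SUBDIRS = statlib dynlib bin

all : $(SUBDIRS)

$(SUBDIRS) :
	$(MAKE) -C $@

# inter-project dependencies, expressed at the global scope
dynlib : statlib
bin : dynlib

clean :
	for dir in $(SUBDIRS); do $(MAKE) -C $$dir clean; done

.PHONY : all clean $(SUBDIRS)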

Any inter-project dependencies need to be expressed at a global scope, and I’ve hard-coded all include directories into the respective projects. With a lot of work, you can make everything variable and pass those around to recursive invocations. At this point, it should become obvious that you probably should generate the files from some other, higher-level build system which can express all those cross-project dependencies. Make is a low-level tool which executes rules, and in good Unix tradition, that’s all it does — and it does it well.

As usual, you can find the complete setup in the repository. Note that it has been only tested on Linux, where everything is set up to work correctly with Make out of the box.

Build systems: Intro

Hi and welcome to a brand-new blog series! Over the coming weeks, I’ll be presenting build systems to you, and today I want to set the stage for it. I’ll be looking at the topic through a C++ lens, as my sample project will be written in C++, but that’s just because I’m a C++ person and that’s what is most interesting to me. That’s just a minor detail though — we’ll cover multiple build systems which support many different languages. Before we look into the project I want to build, let’s first take a look at what we’re going to discuss.

Build systems

A build system solves a rather simple problem at its core: it transforms data using rules and takes dependencies into account. You can think of any build system as a graph traversal engine. Every node in that graph has one or more inputs, produces one or more outputs, and the node itself encodes the rules for transforming the inputs into the desired outputs. That’s it — once everything is laid out this way, you just need to find out which inputs have changed, and then rebuild the nodes which (transitively) depend on those inputs.
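For instance, in Make, which we’ll look at first, every node is spelled out as a rule; the file names here are made up:

# two nodes: a.o is produced from a.cpp, and app is produced from a.o
a.o : a.cpp
	g++ -c a.cpp

app : a.o
	g++ -o app a.o

Touch a.cpp and both nodes get rebuilt; touch nothing and Make does nothing.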

Unfortunately, that’s a really hard problem, as finding all dependencies is a tough task — the set of dependencies includes the compilers, system SDKs, any library which happens to be in your project, the specific settings you pass to your compiler, and any file that’s read by one of your transformation steps (and more!). Spelling out all those dependencies is not realistic, and thus most build systems provide shortcuts to reasonably approximate them. That’s where the language support of a build system comes in handy: it allows you to express fewer of those dependencies manually and lets the build system figure out the details.

The language support part can also extend to dependency management. Strictly speaking, that’s not part of the core business, but a build system has a lot of knowledge about the things it builds, and that makes it convenient to provide more than just the shared library file you asked for. For instance, a build system could also deploy all the headers you need to consume said shared library, the linker options you need to link against it, the version, and so on, up to the point that you can consume it in another project just “as if” it was built locally.

Sample project

With this background, let’s look at the sample project I want to build. It’s a small project consisting of three separate parts:

- A static library, which contains a source file generated by a Python script.
- A dynamic library, which links in the static library.
- An executable, which uses the dynamic library.

Let’s look at the tricky parts for each of them. The static library has two interesting requirements for a build system. It needs to find Python before it can actually generate the source file; if the build system provides some way to find a dependency or binary, that’s a big plus, because it makes this part more reliable. The other requirement is a tricky one — on Linux, a static library which goes into a dynamic library must be compiled with -fPIC, i.e. as position-independent code. This can only be discovered by looking at the consumers of the static library, so a C++-aware build system should at least notice this and warn you.

The dynamic library is straightforward, except that it requires a preprocessor definition to build. That definition must be set only for the dynamic library, not for its consumers.

The executable itself has no extra requirements; it just includes the dynamic library’s header and links against it. Speaking of includes — there’s one external constraint I’m putting on the build systems: each part must have its own build file, to show how the build system handles modularity. This also means the build system needs to set up the include directories, as by default, the individual parts just #include the headers of their dependencies. That’s it — a small but not trivial project, and the “benchmark” I’ll be using going forward.

You can find the whole setup — without build system files — in my build system repository on Bitbucket. With every new blog post, I’ll be adding one new set of build files so you can follow along. As a small teaser, we’re going to start next week with Make!

Introducing kyla 2.0

One year has passed, and I’ve been using kyla quite a bit for my personal projects. As it turns out, it still looks like I need to write everything three times before it works properly: even though this release is called 2.0, it’s actually the third major version of kyla, as there was one rewrite before 1.0 happened.

So what happened? The original goals I set for kyla still remain, but while trying to re-introduce encryption, I ran into a couple of design issues which resulted in more and more ugly code. At the same time, the database schema turned out to be not as forward-looking as I thought it would be, so that had to be updated as well. Finally, kyla 1.0 only shipped with a very basic UI, and for kyla 2.0 I’ve revamped the default UI significantly to make it possible to just ship with it instead of having to write your own.

Introducing 2.0

/images/2017/kyla2_ndk2.png

The new interface installing the ndk.

A lot of things changed for 2.0 — check out the changelog for details. All of this has been driven by my primary use case, which is deploying the ndk. The ndk is the SDK for my home framework, and consists of roughly 15,000 files, totaling 1 GiB of data.

The biggest change is probably the updated database schema. It should be easy to extend now, and has proper layering. At the core, we still have the features, but the association with file contents is looser. This should eventually make it possible to add things like shortcuts, unlike the previous revision, which was very much hard-coded to be content-only.

The second part is the UI. It has been completely redone to allow a feature tree — that is, nested features — and provides a much more polished experience. UI information is provided in a separate layer and is not deployed. This means that once installed, you still have to use the feature ids just as in kyla 1.0, but as long as the source repository is present, the UI will remain available for configuration, updates and downgrades.

The future

For me, kyla 2.0 solves all my problems in a much more robust way than kyla 1.0 could. In particular, the new UI is something I can ship to people. I’ve also implemented various under-the-hood performance improvements so it’s no longer doing really embarrassing things :).

Will there be future updates? Maybe. At the moment, it just works, so there’s no need to fix things, but I’ll look into issues as I run into them — just as I rewrote kyla once I hit some roadblocks with 1.0. Just make sure to tell me in case you run into an issue. As before, you can find the source code both on Bitbucket and GitHub. Enjoy!

Work is all about predictability

As a programmer, you might get home and continue tinkering on your home projects — guess what: I bet it looks very much like what you do at work. I, at least, do the same stuff at home as I do at work; I even run my own bug tracker and wiki at home on my own server, manage releases of projects, write documentation, and take care of user feedback.

Recently, it dawned on me that the work bit in what you do at work is only about one thing: predictability. If you think about it, what’s the main difference between a bug report you send me at work and one for my private projects? It’s the fact that at work, I know I will have time to look at it and can give you an estimate of when it’ll be done — and that’s about it. If I get asked when my Python Clang bindings will be updated to the latest Clang shipping in Ubuntu, the answer is “as soon as possible”, but you shouldn’t rely on that at all. I try my best to spend a fixed amount of time on my personal projects each week, but it’s really hard to commit hours to it — and that’s the main difference between work and “hobby projects” for me.

Oh, and before someone brings up the money argument — there are plenty of private projects that make money, so I’m not taking that as the main difference. Money makes you a professional, but that’s all. I could very well put some ads on this page (and maybe I will, eventually) and this blog would make money; would it become work at that moment? No — unless I start devoting fixed amounts of time to it, with a predictable schedule and so on.

So next time you think about work, think about “how predictable is it?”. Because that’s what you should be during the hours you sit at the office. That doesn’t mean you can’t have creative solutions, but it means the people you interact with can work off the assumption that you’re going to spend a particular number of cycles on your backlog every day and make forward progress. You can’t rely on me doing things at home in a predictable way, and that’s fine — because, you know, it’s fun, not work :)