Build systems: Conclusion

We’ve made it — the last post in this series! It’s been quite a ride: over the last few weeks, we’ve investigated nine different build systems. Let’s summarize what we’ve seen. I’ve broken the features down into a few broad categories:

Overview

  • Portable: Is the build definition portable across different compilers? This requires some level of abstraction from the underlying compiler, for instance, include directories being passed through variables.
  • Modular: Does the build system scale to larger builds? Can you write parts of the build independently of each other and then combine them into a bigger solution?
  • Ubiquitous: Can you expect this build system to be available?
  • Type: Does the build tool actually build on its own (in that case, you’ll see “Build” specified), or does it rely on other tools (“Generator”)?
Tool        Portable   Modular   Ubiquitous   Type
Make        No         No        Yes          Build
MSBuild     No         No        Yes          Build
SCons       Yes        No        No           Build
Waf         Yes        No        No           Build
FASTBuild   No         No        No           Build
Bazel       Yes        Yes       No           Build
Buck        Yes        Yes       No           Build
Premake     Yes        No        No           Generator
CMake       Yes        Yes       Yes          Generator

Survey results

Last week I also started a survey to get a better idea of what people are using out there “in the wild”. From the responses, the most popular build system by far is MSBuild (including Visual Studio), followed by CMake and Make. Given that over 95% of respondents indicated they use Windows, this should be no surprise, as MSBuild is the default system for the extremely popular Visual Studio IDE. Still, CMake is showing up as a very strong competitor, second to MSBuild in both availability and usage.

I also asked which language is used for the “build system glue”, and the answers are interesting as well. Half of the respondents use either whatever language their build system uses, Python, or shell scripts. Combining all the shell script dialects makes shell the most popular solution for the extra tasks the build systems don’t cover. The interesting bit here is that although Python is really popular in this category, Python-based build systems don’t seem to appeal to developers.

Wrapping up

We’ve looked at various build systems over the last couple of weeks, and if there’s one thing we’ve learned, it’s this: There’s no “one size fits all” solution, and for C++, there may well never be one until C++ gets a standard ABI which makes library reuse a reality. CMake has tackled this problem head-on with the “find module” concept, which is in my opinion one of the main reasons for its popularity. It’s going to be interesting to see whether other projects will embrace CMake’s approach or simply migrate to CMake. Microsoft has invested heavily in CMake, providing a C++ package manager dubbed vcpkg which is completely based on CMake. At the same time, large projects like Unreal Engine 4 use multiple fully customized build tools like the Unreal Build Tool. I’m personally very curious to see how the ecosystem will evolve going forward, and what features and concepts future build systems will bring to the table.

In any case, I hope you got a good idea of what we have today in terms of build tools and their concepts, so you’ll be ready for the future. That’s it for this series, thanks a lot for reading, and I’d also like to thank my tireless reviewers for their help, in particular, Baldur and Jasper. Thanks guys!

Build systems: CMake

Welcome to the last build system we’re going to look at in this series! As hinted last week, there’s one problem left: How can we find binaries and other dependencies already present in the host environment? This is a tricky question, as we have to solve two problems at the same time. First, we need to be able to find things in a cross-platform manner, which requires primitives like “invoke a binary” to query version numbers, for instance. Second, we need a rather precise search to ensure that our build system gets a complete view of the dependency we’re about to integrate. If it’s just a binary, this is usually not a problem, but a C++ library might require special compile definitions, a link library, an include path and more to be set on the target project.

Fortunately, I’ve got just the build system for you to demonstrate this — CMake. CMake is in many ways the union of the various build systems we’ve seen so far. It’s a build system generator, just like Premake. It comes with its own scripting language, just like Bazel. Its build description calls into it “as if” it were a library, just like SCons. And on top of all of this, it provides two new concepts: find modules and imported targets.

Find modules

One of the two big features of CMake is find modules, which allow you to find an existing library or binary with a single command. Instead of hard-coding paths to the Python interpreter, we can just call find_package(PythonInterp 3.5 REQUIRED) and CMake will handle everything. After the command has executed, CMake will have populated a few variables so we can consume Python directly. In the case of the module above, PYTHON_EXECUTABLE will point to the Python binary, and we can immediately invoke it in our generator step.
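As a small illustration, here’s a minimal sketch of consuming the find module’s result at configure time (the version check is purely illustrative):

find_package(PythonInterp 3.5 REQUIRED)

# PYTHON_EXECUTABLE was populated by the find module above.
execute_process(
    COMMAND ${PYTHON_EXECUTABLE} --version
    OUTPUT_VARIABLE _python_version
    OUTPUT_STRIP_TRAILING_WHITESPACE)
message(STATUS "Using ${_python_version}")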

Find modules don’t end with binaries, though. You can also search for libraries, and even for individual library components. This is super helpful when dealing with libraries like Boost or Qt, which are very large and require setting up many paths. The search invocation is just the same, but there are two ways in which the results can be returned. The “old” way is to populate a couple of variables, which you then manually add to your project. The “new” way is to provide an imported target, which handles all of the required settings in a transparent way.
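To make the difference concrete, here’s a sketch using CMake’s stock FindZLIB module, which supports both styles (app is a hypothetical executable target):

find_package(ZLIB REQUIRED)

# "Old" way: wire up the result variables by hand.
target_include_directories(app PRIVATE ${ZLIB_INCLUDE_DIRS})
target_link_libraries(app PRIVATE ${ZLIB_LIBRARIES})

# "New" way: link the imported target; includes and libraries come along automatically.
target_link_libraries(app PRIVATE ZLIB::ZLIB)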

Imported targets

An imported target in CMake is nothing other than a “normal” target, but assembled manually instead of being generated as part of the current CMake build. Let me elaborate a bit on this. In CMake, adding a new target, using for example add_library, produces a named object you can reference elsewhere — just like we saw in Waf. CMake allows you to specify settings on a target which affect its consumers. For example, you can specify an include directory which is automatically added to everyone linking against this target.

An imported target, then, is a target which is imported from elsewhere, but internally behaves “as if” it were a normal CMake target. That is, it gets all the goodness of injecting include directories, compile settings and more that we mentioned above. There are two ways to get an imported target. The simple one is to let CMake handle it: If your project is set up correctly for installation, CMake can generate a project description which can be imported directly. This is somewhat similar to how Bazel handles large builds, except that with CMake, you bundle up the project output together with the build definition.
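A rough sketch of that route, assuming the statlib target from the sample project below; the generated statlib-targets.cmake file can then be included by a consumer, which links against statlib::statlib:

install(TARGETS statlib EXPORT statlib-targets
    ARCHIVE DESTINATION lib)
install(EXPORT statlib-targets
    NAMESPACE statlib::
    DESTINATION lib/cmake/statlib)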

The other way to create an imported target is to assemble it by hand, which is what most of the modern find modules do. What happens is that an imported target is created out of thin air, and its various components are then specified manually. If you’re curious about the details, the add_library documentation and the sample FindModule have you covered!
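Condensed to its essence, a hand-written find module for a hypothetical library foo boils down to something like this sketch:

# Locate the pieces on disk first.
find_path(FOO_INCLUDE_DIR foo.h)
find_library(FOO_LIBRARY foo)

# Assemble an imported target out of thin air.
add_library(Foo::Foo UNKNOWN IMPORTED)
set_target_properties(Foo::Foo PROPERTIES
    IMPORTED_LOCATION "${FOO_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${FOO_INCLUDE_DIR}")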

Sample project

Let’s get started with our sample project then, and the way we generate our lookup table:

find_package(PythonInterp 3.5 REQUIRED)

add_custom_command(
    OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/lookup_table.cpp
    COMMAND ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/tablegen.py > ${CMAKE_CURRENT_BINARY_DIR}/lookup_table.cpp
    DEPENDS tablegen.py)

In the first line, we let CMake find the Python interpreter. The next command registers a custom command which will be executed to produce the lookup_table.cpp file. Note that the file needs to be consumed somewhere for this to make sense; otherwise, it’s just a leaf in the build graph with nothing depending on it. For a custom command, we have to specify the dependencies manually, as CMake doesn’t try to run commands in a sandbox and log I/O accesses or anything like that.

Next, we’ll use that generated file in our static library definition:

add_library(statlib STATIC
    StaticLibrarySource.cpp ${CMAKE_CURRENT_BINARY_DIR}/lookup_table.cpp)

We still need to set up the include directories. For this project, we only have one include directory — the current directory — but targets consuming the static library need to use that include directory as well. In CMake, that’s called a PUBLIC include directory:

target_include_directories(statlib
    PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})

We’re nearly done; there’s only one bit left, which is specifying the -fPIC flag. Unfortunately, CMake doesn’t handle this automatically, nor does it warn when we try to consume a static library without it enabled. On the upside, CMake provides a compiler-agnostic way of specifying it:

set_property(TARGET statlib PROPERTY POSITION_INDEPENDENT_CODE TRUE)

With that, our project is complete. In the dynamic library, we have a case where we need to set a compile definition but only for the library. This is a PRIVATE setting — just as we had PUBLIC above:

target_compile_definitions(dynlib
    PRIVATE BUILD_DYNAMIC_LIBRARY=1)

This way, we get very fine-grained control over the visibility of settings, though we can’t restrict project visibility as we could in Bazel. As usual, you can find the whole project in the sample repository.
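For completeness, the dynamic library is tied to the static one with a single call. A sketch of how this presumably looks in the sample; PRIVATE keeps statlib out of dynlib’s own link interface:

add_library(dynlib SHARED DynamicLibrarySource.cpp)
target_link_libraries(dynlib PRIVATE statlib)

Thanks to the PUBLIC include directory on statlib, dynlib picks up the right include path without any further configuration.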

Before we wrap up, some small remarks regarding the various features. Module finding is primarily aimed at Linux environments, which provide most libraries in standard locations. On Windows, you typically have to specify the path to each library yourself. In practice this is not a big problem, as you can pass the various paths on the first invocation of CMake.
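Such a first invocation might look like this (the paths are hypothetical); CMAKE_PREFIX_PATH is consulted by find_package and friends:

$ cmake -DCMAKE_PREFIX_PATH="C:/libs/zlib;C:/libs/qt" path/to/source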

Another thing to keep in mind is that unlike Premake, which generates portable build files using relative paths to the source, CMake bakes in absolute paths. The generated build files are yet another build artifact and should be treated as such, so CMake is not a tool you can use to generate Visual Studio project files for shipping. Every developer must have CMake installed, and there’s no easy way to package CMake up into a single executable, as there is with Premake.

With this, we’re nearly at the end of the series! For the last blog post, I’d like to gather some information about what build systems you use, and to this end, I’ve set up a survey. It should only take a minute or so to fill it out. Thanks for your help and see you again next week for the grand finale!

Build systems: Premake

Welcome back to another post in the build system series! So far, we’ve looked at many different build systems, but all of them had one thing in common: They built the project all by themselves; no other tool was required. I hinted a bit at generating build files in the Make entry, and today, we’re going to look at a build tool which is actually a “meta” build tool — all it does is generate build files for other build tools. I’ve picked Premake to showcase this new concept, as it’s literally the name of the game here ☺

Build generators?

It might seem a bit weird at first to build a build tool which then requires another build tool to be invoked, but if we look a bit closer, there are good reasons for this separation. As I mentioned in the intro, the goal of a build system is to transform some data using some commands. We saw how Make and MSBuild focus on implementing that goal and not much else. By separating the build execution from the build script generation, we can have two focused tools, instead of mixing high- and low-level concepts together. The other build tools we looked at all have some task and dependency engine somewhere, but it might not be directly accessible, making it hard to understand what exactly is being executed in the end.

There are various advantages to going “meta”, one being portability between systems. On Linux, Make is super popular; on Windows, MSBuild is the default; and if you want to target both, it makes sense to generate the files from a single source. Especially when it comes to supporting multiple versions of the same tool — Visual Studio 2013, 2015 and 2017, for example — being able to just generate the files reduces the maintenance burden a lot.

Another major advantage of splitting the generation from the execution is performance. We already saw build systems splitting the initial configuration into a separate step from the actual build when we looked at Waf. In general, the build description changes only rarely, and by keeping the generated build files minimal, we can improve the daily build experience. In fact, this very idea is what led to the development of Ninja, a super low-level tool similar to Make, which is designed to execute machine-generated build scripts.

Sample project

Let’s see how Premake differs from the other tools we’ve used so far. On the build file side, Premake uses a scripting language to allow for easy high-level customization. Here, it’s Lua, and the build files are actually Lua scripts which are interpreted at generation time. After generation, Premake writes a build file for the tool you’ve selected, which then has to be invoked to perform the actual build. One feature of Premake is that it writes portable build files, so you can use it as a generator for build files which are then shipped, removing the need to invoke Premake on the developer’s machine. Next week, we’ll see a different approach where the generated build files are treated as an intermediate output only.
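Generating the build files is a single command per target tool. Assuming Premake 5, it looks roughly like this:

$ premake5 vs2017   # writes a Visual Studio 2017 solution and projects
$ premake5 gmake    # writes GNU makefiles for the same build description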

The actual build description looks very similar to what we’ve seen before. As usual, we’ll start with the static library here:

project "statlib"
    kind "StaticLib"
    language "C++"

We tell Premake we’re building a static library in C++, nothing special so far. Next up, we define the source files:

files {"StaticLibrarySource.cpp", "%{cfg.buildtarget.directory}/table.cpp"}

Here we have our generated file, specified just “as if” it already existed. We don’t specify the actual generation rule for it; instead, we use a “pre-build” command which will be executed before the build runs to generate this file. This way, we side-step the issue of specifying build-time dependencies inside Premake. Premake will write a build description which assumes the file exists, and ask the underlying build system to produce it in a pre-build step, but Premake is not going to provide the actual dependency to the build system. This means the file will get rebuilt no matter what. This can also be seen from how we specify it:

p = path.getabsolute(os.getcwd())
prebuildcommands { "\"C:\\Program Files\\Python36\\python.exe\" " .. p .. "/tablegen.py > %{cfg.buildtarget.directory}/table.cpp" }

We use Lua to get the current working directory, so we can invoke the Python executable with the correct path. We don’t tell Premake what is required to build the table or what outputs it produces; we just pass on the raw command line and ask Premake to execute it.

That’s it for the static library — unsurprisingly, Premake requires just as much information as any other tool we’ve looked at to generate correct build files. The remainder of the build description will look very familiar if you’ve been following this series so far. As always, head to the sample repository for the build definition.

With this, what is left in terms of build systems? There’s still one problem none of the systems we’ve seen has tackled: finding binaries and dependencies already present on the machine. Next week, we’re going to investigate this issue in a lot of detail — stay tuned!

Build systems: Buck

Last week we looked at Bazel, and today we’re going to look at one of the build systems in the Bazel family: Buck. Buck is Facebook’s variant of Bazel, so most of what was written last week still applies. The interesting bit we’re going to highlight today is how Buck handles include directories. Before we start though, let’s first understand why this is something worth writing about!

The include mess

What’s wrong with #include files? Well, what isn’t — especially when it comes to building bigger projects. The main issue is the collision of include file names. Assume you have two libraries A and B, both with a utility.h, which you want to consume from a single project. As we’ve seen before in Bazel and Waf, some build systems allow you to transitively forward include directories, which is great as long as nothing is ambiguous. But in the setup just described, you’ll have a collision between the two utility.h files, and if you don’t have full control over the layout of your project, you can’t actually resolve it! The only way to disambiguate is to find a parent directory from which the utility.h files have a unique path, as the sketch below shows.
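A minimal sketch of the problem, with hypothetical library layouts:

// Both libraries ship a header with the same name:
//   libA/include/utility.h
//   libB/include/utility.h
// With -IlibA/include -IlibB/include on the command line, this is ambiguous:
#include "utility.h"        // whichever include path is searched first wins

// Disambiguation needs a unique parent prefix for each file:
#include "libA/utility.h"   // requires the include roots to point one level up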

Interestingly, this is something which C++ modules are supposed to fix eventually, but in the meantime, the solution space is rather sparse. In my own framework, I decided to use what I call canonical include files. Qt, another huge C++ library, uses a similar approach. In both cases, an indirection layer is used to solve the problem, but the indirection layer is created manually.

Buck & includes

Buck provides tooling to solve the problem once and for all. Just like in Bazel, we define the public #include files, but Buck now automatically copies them into a new directory, and then forces this as the include directory — with the project name attached to it. What we previously included using #include "StaticLibraryHeader.h" now becomes #include "statlib/StaticLibraryHeader.h". When building a project, you’ll notice that the buck-out/gen/statlib folder contains a new folder named statlib#default,private-headers, which holds all the headers we marked for export from the static library.

The only difference compared to Bazel — where we specified an includes directive which contained the current directory — is that we need to list the exported headers manually:

cxx_library(
    name = "statlib",
    srcs = ["StaticLibrarySource.cpp", ":gentable"],
    headers = ["StaticLibraryHeader.h"],
    visibility = ["//dynlib:dynlib"],
    exported_headers = ["StaticLibraryHeader.h"],
    link_style = "static_pic"
)

As long as your library names don’t collide, this seems like a very nice solution to this age-old C++ problem. Going forward, it should allow a clean transition to modules, which will cause other build tools quite some headache as they introduce yet another build target and dependency per C++ library.

Otherwise, Buck behaves very similarly to Bazel, with some changes to the command names, but nothing earth-shattering. See for yourself in the sample repository! Thanks for reading and see you again next week!

Build systems: Bazel

As hinted last week, we’re going for a really high-level build system today: Bazel. Bazel is Google’s build tool, designed for scale, 100% robust builds, and fast execution. The motto is “{Fast, Correct} - Choose two”, and today we’re going to find out how Bazel achieves this goal. Besides the features, we’re also covering Bazel because the new concepts it introduced spawned a whole family of build systems, like Please, Pants and Buck.

Overview

Unlike the systems we’ve seen so far, Bazel wants to guarantee that every build is repeatable. It thus requires very explicit dependencies, and even runs the build in a sandbox to ensure that only the specified files get accessed. This part is hidden from the user, and Bazel manages not to require any additional work over the other tools we’ve seen so far. Reliable builds are an important puzzle piece for building projects at scale. The other major component is composing builds from individual parts, which we’re going to look at next.

One major new concept Bazel introduces to help with building at scale is project visibility. Each folder with a BUILD file defines a package — or a project — which can define how it may be consumed. For instance, you can have some shared library which should be visible to anyone, typically a common library. At the same time, some other library might only be really useful inside your small corner of the build tree, and Bazel allows you to limit its visibility to exactly that part. From the outside, it’s impossible to link against such a project, even though it’s part of the build. This fine-grained visibility makes it easy to compose large builds, because everyone can have their own private utility library and there will be no conflict, nor any ambiguity about which one to use.

The visibility is expressed through a path syntax: //path/to/project:rule. It’s possible to compose multiple projects into a workspace which introduces another top-level hierarchy changing the path to @project//path/to/project:rule, but that’s already quite advanced use of Bazel — check the documentation if this kind of scaling is interesting for you.
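As a small sketch with made-up package names, a utility library private to a renderer subtree could look like this:

cc_library(
    name = "util",
    srcs = ["util.cpp"],
    hdrs = ["util.h"],
    # Only packages below //renderer may depend on this target.
    visibility = ["//renderer:__subpackages__"],
)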

Along with the global view of all projects, Bazel also comes with a very powerful query language to inspect the build. Let’s say, for instance, I want to know all the files our static library creates. This is a query which was impossible to express in any of the build systems we’ve looked at so far, but in Bazel, it’s really simple:

$ bazel query 'kind("generated file", //statlib:*)'
//statlib:table.cpp

And voilà, we found out that our statlib project has one rule producing a generated file called table.cpp. The power of the query language doesn’t stop here though; we can also get the full dependency graph easily.
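The graph below can be produced along these lines: bazel query emits Graphviz output, which the dot tool then renders.

$ bazel query --output graph 'deps(//executable:executable)' > graph.dot
$ dot -Tsvg graph.dot -o graph.svg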

[Figure: the full dependency graph for our sample project]

Bazel comes with its own programming language, called Skylark. It looks like Python, and you might be tempted to say it’s the same setup as in Waf and SCons, but in fact, Skylark is a separate language which just shares the Python syntax for legacy reasons. Many moons ago, before Google open-sourced Bazel, they used its predecessor, called Blaze, and that one had a Python interpreter which ran over the build files as a preprocessing step. Due to this heritage, the Python syntax remained part of Bazel, but there’s no Python interpreter anymore in modern Bazel.

Sample project

Time to look at our sample project with Bazel. This time, we have three separate and independent BUILD files. As usual, we’re starting with our static library, and the rule to generate the lookup table:

genrule(
    name = "gentable",
    srcs = ["tablegen.py"],
    outs = ["table.cpp"],
    cmd = "python3 $< > $@"
)

genrule is the most basic level at which you can specify rules for Bazel. Here, we have to specify the inputs and outputs, as well as the command to execute (in the command, $< expands to the single input and $@ to the single output). Bazel will produce a node in its execution graph which consumes tablegen.py and produces table.cpp.

We can use it right away in the next rule in the project:

cc_library(
    name = "statlib",
    srcs = ["StaticLibrarySource.cpp", "table.cpp"],
    hdrs = ["StaticLibraryHeader.h"],
    visibility = ["//dynlib:__pkg__"],
    includes = ["."],
    linkstatic = 1
)

Here things get interesting. Just like we saw with Waf, we can specify the include directory for our clients, which in our case is the current directory. Without this, a library linking against statlib would not see its headers at all. We also specify linkstatic, but lo and behold, Bazel will take care of the -fPIC story for us! Finally, we make the static library visible to our dynamic library through the visibility statement.

The dynamic library project is just what we’d expect:

cc_library(
    name = "dynlib",
    srcs = ["DynamicLibrarySource.cpp"],
    hdrs = ["DynamicLibraryHeader.h"],
    deps = ["//statlib"],
    includes = ["."],
    visibility = ["//visibility:public"],
    copts = ["-DBUILD_DYNAMIC_LIBRARY=1"]
)

We link against the static library using the deps statement, and we make it visible to everyone. As usual, you can inspect the full build files in the source repository.

Before we close, there’s one more interesting bit about Bazel — the ability to run your program directly. This is not as trivial as it sounds, as the dynamic library needs to be in the search path for the executable. Bazel takes care of this using the bazel run command:

$ bazel run executable
INFO: Found 1 target...
Target //executable:executable up-to-date:
bazel-bin/executable/executable
INFO: Elapsed time: 0.141s, Critical Path: 0.00s

INFO: Running command line: bazel-bin/executable/executable
94

That’s quite impressive if you ask me! Bazel is a really interesting project, as it tackles the issues of scaling your build system and making it robust at the same time, without sacrificing much on the readability or maintainability side. The query language is very interesting, as it extracts lots of information about your build and allows you to express very precise queries, instead of just printing the whole dependency graph and letting you figure out the details. You might think that this is as far as build systems go in terms of complexity, but next week, we’re going to look at one of the variants of Bazel which refines the visibility concept even further. Thanks for reading and I hope to see you again next week!