Build systems: Buck

Last week we looked at Bazel, and today we're going to look at one of the build systems in the Bazel family: Buck. Buck is Facebook's variant of Bazel, so most of what was written last week still applies. The interesting bit we're going to highlight today is how Buck handles include directories. Before we start though, let's first understand why this is something worth writing about!

The include mess

What's wrong with #include files? Well, what isn't -- especially when it comes to building bigger projects. The main issue is collision of include file names. Assume you have two libraries A and B, both with a utility.h, which you want to consume from a single project. As we've seen before with Bazel and Waf, some build systems allow you to transitively forward include directories, which is great as long as nothing is ambiguous. But in the setup just described, you'll have a collision between the two utility.h files, and if you don't have full control over the layout of your project, you can't actually resolve it! The only way to disambiguate is to find a parent directory from which the utility.h files have a unique path.

Interestingly, this is something which C++ modules are supposed to fix eventually, but in the meantime, the solution space is rather sparse. In my own framework, I decided to use what I call canonical include files. Qt, another huge C++ library, uses a similar approach. In both cases, an indirection layer is used to solve the problem, but the indirection layer is created manually.

Buck & includes

Buck provides tooling to solve the problem once and for all. Just like in Bazel, we define the public #include files, but what Buck does now is automatically copy them into a new directory, and then force this as the include directory -- with the project name attached to it. What we included previously using #include "StaticLibraryHeader.h" now becomes #include "statlib/StaticLibraryHeader.h". When building a project, you'll notice that the buck-out/gen/statlib folder contains a new folder named statlib#default,private-headers, and this contains all headers we marked as exported from the static library.

The only difference compared to Bazel -- where we specified an includes directive which contained the current directory -- is that we need to list the exported headers manually:

cxx_library(
    name = "statlib",
    # ':gentable' references the rule which generates the lookup table
    srcs = ["StaticLibrarySource.cpp", ":gentable"],
    headers = ["StaticLibraryHeader.h"],
    visibility = ["//dynlib:dynlib"],
    # Consumers include this as "statlib/StaticLibraryHeader.h"
    exported_headers = ["StaticLibraryHeader.h"],
    link_style = "static_pic"
)
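
On the consuming side, nothing extra is needed beyond the dependency itself -- the remapped headers arrive automatically. A hypothetical BUCK file for the dynamic library might look like this (the target names follow the sample project, the rest is a sketch):

cxx_library(
    name = "dynlib",
    srcs = ["DynamicLibrarySource.cpp"],
    # Headers from statlib are now included as "statlib/StaticLibraryHeader.h"
    deps = ["//statlib:statlib"]
)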

As long as your library names don't collide, this seems like a very nice solution to this age-old C++ problem. Going forward, it should also allow a clean transition to modules, which will cause other build tools quite some headache, as modules introduce yet another build target and dependency per C++ library.

Otherwise, Buck behaves very similarly to Bazel, with some changes to the command names, but nothing earth-shattering. See for yourself in the sample repository! Thanks for reading and see you again next week!

Build systems: Bazel

As hinted last week, we're going for a really high-level build system today: Bazel. Bazel is Google's internal build tool, designed for scale, 100% robust builds, and fast execution. The motto is "{Fast, Correct} - Choose two", and today we're going to find out how Bazel achieves this goal. Besides the features, we're also covering Bazel because the concepts it introduced spawned a whole family of build systems like Please, Pants, and Buck.

Overview

Unlike the systems we've seen so far, Bazel wants to guarantee that every build is repeatable. It thus requires very explicit dependencies, and even runs the build in a sandbox to ensure that only the specified files get accessed. This part is hidden from the user, and Bazel manages to not require additional work over the other tools we've covered. Reliable builds are an important puzzle piece for building projects at scale. The other major component is composing builds from individual parts, which we're going to look at next.

One major new concept Bazel introduces to help with building at scale is project visibility. Each folder with a BUILD file defines a package -- or a project -- which can define how it may be consumed. For instance, you can have some shared library which should be visible to anyone, typically a common library. At the same time, some other library might only be really useful inside your small corner of the build tree, and Bazel allows you to limit the project's visibility to exactly that part. From the outside, it's impossible to link against this project, even though it's part of the build. This fine-grained visibility makes it easy to compose large builds, because everyone can have their own private utility library and there will be no conflict, nor any ambiguity about which one to use.

Visibility is expressed through a path syntax: //path/to/project:rule. It's possible to compose multiple projects into a workspace, which introduces another top-level hierarchy, changing the path to @project//path/to/project:rule -- but that's already quite advanced use of Bazel; check the documentation if this kind of scaling is interesting for you.
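
As a hypothetical example of restricting visibility (the target and path names are made up, but the visibility labels are standard Bazel):

cc_library(
    name = "private_util",
    srcs = ["util.cpp"],
    hdrs = ["util.h"],
    # Only packages underneath //tools may depend on this library
    visibility = ["//tools:__subpackages__"],
)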

Along with the global view of all projects, Bazel also comes with a very powerful query language to inspect the build. Let's say, for instance, I want to know all files our static library creates. This is a query which was impossible to express in any of the build systems we've looked at so far, but in Bazel, it's really simple:

$ bazel query 'kind("generated file", //statlib:*)'
//statlib:table.cpp

And voilà, we found out that our statlib project has one rule producing a generated file called table.cpp. The power of the query language doesn't stop here though, we can also get the full dependency graph easily:
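
Bazel can emit the dependency graph in GraphViz format, which dot then renders; the graph below was produced with something along these lines (the exact invocation is an assumption):

$ bazel query 'deps(//executable)' --output graph | dot -Tsvg > dep-graph.svg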

[Figure: The full dependency graph for our sample project. (/images/2017/bazel-build-dep-graph.svg)]

Bazel comes with its own programming language, called Skylark. It looks like Python, and you might be tempted to say it's the same setup as in Waf and SCons, but in fact, Skylark is a separate language which just shares the Python syntax for legacy reasons. Many moons ago, before Google open sourced Bazel, they used its predecessor, called Blaze, and that one had a Python interpreter which ran over the build files as a pre-process. Due to this heritage, the Python syntax remained part of Bazel, but there's no Python interpreter any more in modern Bazel.

Sample project

Time to look at our sample project with Bazel. This time, we have three separate and independent BUILD files. As usual, we're starting with our static library, and the rule to generate the lookup table:

genrule(
    name = "gentable",
    srcs = ["tablegen.py"],
    outs = ["table.cpp"],
    cmd = "python3 $< > $@"
)

genrule is the most basic level at which you can specify rules for Bazel. Here, we have to specify the inputs and outputs, as well as the command which will be executed; $< expands to the (single) input and $@ to the (single) output. Bazel will produce a node in its execution graph which consumes tablegen.py and produces table.cpp.

We can use it right away in the next rule in the project:

cc_library(
    name = "statlib",
    srcs = ["StaticLibrarySource.cpp", "table.cpp"],
    hdrs = ["StaticLibraryHeader.h"],
    visibility = ["//dynlib:__pkg__"],
    includes = ["."],
    linkstatic = 1
)

Here things get interesting. Just like we saw with Waf, we can specify the include directory for our clients, which in our case is the current directory. Without this, a library linking against statlib will not see the includes at all. We also specify linkstatic, but lo and behold, Bazel will take care of the -fPIC story for us! Finally, we make the static library visible to our dynamic library through the visibility statement.

The dynamic library project is just what we'd expect:

cc_library(
    name = "dynlib",
    srcs = ["DynamicLibrarySource.cpp"],
    hdrs = ["DynamicLibraryHeader.h"],
    deps = ["//statlib"],
    includes = ["."],
    visibility = ["//visibility:public"],
    copts = ["-DBUILD_DYNAMIC_LIBRARY=1"]
)

We link against the static library using the deps statement, and we make it visible to everyone. As usual, you can inspect the full build files in the source repository.
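
The executable's BUILD file isn't shown in this post, but a minimal sketch might look like this (main.cpp is an assumed file name; the target names follow the sample project):

cc_binary(
    name = "executable",
    srcs = ["main.cpp"],
    # dynlib is publicly visible, so anyone may depend on it
    deps = ["//dynlib"],
)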

Before we close, there's one more interesting bit about Bazel -- the ability to run your program directly. This is not as trivial as it sounds, as the dynamic library needs to be in the search path for the executable. Bazel takes care of this using the bazel run command:

$ bazel run executable
INFO: Found 1 target...
Target //executable:executable up-to-date:
bazel-bin/executable/executable
INFO: Elapsed time: 0.141s, Critical Path: 0.00s

INFO: Running command line: bazel-bin/executable/executable
94

That's quite impressive if you ask me! Bazel is a really interesting project as it tackles the issues of scaling your build system and making it robust at the same time, without sacrificing much on the readability or maintainability side. The query language is very useful for extracting lots of information about your build, and it allows you to express very precise queries instead of just printing the whole dependency graph and letting you figure out the details. You might think that's it for build system complexity, but next week, we're going to look at one of the forks of Bazel which refines the visibility concept even further. Thanks for reading and hope to see you again next week!

Build systems: FASTBuild

The last weeks we've been steadily increasing the complexity of our build tools -- from bare-bones Make to Waf, which had high-level concepts like include directory export. This week, we're going back to the roots again with FASTBuild -- you might have seen it in Ubisoft's presentation at CppCon 2014.

Overview

FASTBuild is in many ways similar to Make. It's a very focused tool for low-level execution, without much fluff on top of it. The main difference between FASTBuild and the other tools we've looked at so far is the highly hierarchical configuration setup. In FASTBuild, you build up your configuration step by step, adding all options directly like in Make.

Project files & syntax

FASTBuild uses a custom language for build files. It's a very bare-bones language, providing structured programming through #if, similar to the C preprocessor, and various ways to manipulate variables.

The key concepts are inheritance and composition. This allows scaling to large builds by reusing a lot of the configuration, without having global state that applies to all projects. Let's look at how you specify two different target platforms in FASTBuild. Instead of setting a single option, you'd specify multiple configurations:

.ConfigX86 =
[
    .Compiler = "compilers/x86/cl.exe"
    .ConfigName = "x86"
]

.ConfigX64 =
[
    .Compiler = "compilers/x64/cl.exe"
    .ConfigName = "x64"
]

.Configs =
{
    .ConfigX86,
    .ConfigX64
}

This sets up two structures and an array Configs containing them both. Later on, you'd pass the configuration array around to your targets and iterate over it, building each target once per configuration. This approach allows FASTBuild to build multiple platforms and configurations at the same time.
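
FASTBuild has a ForEach construct for exactly this kind of iteration; a rough sketch might look like this (the target and path names are made up, and the compiler options are assumed to come from an enclosing scope):

ForEach( .Config in .Configs )
{
    Using( .Config ) // brings .Compiler and .ConfigName into scope

    // One object list per configuration, e.g. statlib-objs-x86
    ObjectList( "statlib-objs-$ConfigName$" )
    {
        .CompilerInputFiles = { "statlib/StaticLibrarySource.cpp" }
        .CompilerOutputPath = "out/$ConfigName$/"
    }
}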

Functions

Functions provide the actual node graph in FASTBuild. For instance, calling the function Executable will register a new executable to be built, which can depend on different libraries. The build graph is built from those dependencies and cannot be expressed directly in the build file language -- there's no way to create a new kind of build node without resorting to C++ and extending the core of FASTBuild.
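
For illustration, a hypothetical executable node might look like this (.ExeOptions is a made-up options structure, analogous to the .DLLOptions we'll see in the sample project below):

Executable( "app" )
{
    Using( .ExeOptions ) // pulls in linker settings from a shared structure

    .LinkerOutput = "app.exe"
    .Libraries    = { "statlib", "dynlib" } // dependencies, referenced by node name
}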

... and more

So far, FASTBuild sounds like a very bare-bones replacement for Make, but there are various interesting capabilities built in: distribution, parallel execution and caching. All of these are related: as FASTBuild knows all dependencies precisely, and has a global understanding of the project, it can automatically distribute the build across multiple cores and even nodes in a network, and it can also cache things which don't need to be rebuilt.
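
As far as I know, both distribution and caching are opt-in via command-line switches, along these lines:

$ fbuild -dist -cache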

Sample project

For FASTBuild, we're going to start with the root build file which contains all the project configuration. As mentioned above, FASTBuild is very explicit about all settings, and it requires quite a bit of setup before it can get going. The sample project supports only Windows, and only Visual Studio 2015, but all those settings would live in a separate file for a production build. Building a project on Windows requires a lot of options to be passed to the compiler; I'm picking just the compiler options here as an example:

.CompilerOptions    = '"%1"' // Input
                + ' /Fo"%2"' // Output
                + ' /Z7' // Debug format (in .obj)
                + ' /c' // Compile only
                + ' /nologo' // No compiler spam
                + ' /W4' // Warning level 4
                + ' /WX' // Warnings as errors

Here we can see build-time variable substitutions at work: FASTBuild will automatically replace %1 with the name of the input file and %2 with the name of the output file.

Slightly further down, you'll notice the first place where I'm taking advantage of the structured nature of FASTBuild, to specify the DLLOptions. Those are simply the default options, but with minor tweaks:

.DLLOptions =
[
    .CompilerOptions        + ' /DLL /MT'
    .LinkerOptions          + ' /DLL'
]

Later on, we'll see how this way of setting things comes in handy, but let's start with the static library. It consists of two nodes -- an Exec node which invokes Python to generate the table, and a Library node which requires the table to be generated already. The Exec node is very similar to a basic Make rule:

Exec ("tablegen")
{
    .ExecExecutable = "C:\Program Files\Python 3.5\python.exe"
    .ExecInput = "statlib\tablegen.py"
    .ExecOutput = "statlib\table.cpp"
    .ExecArguments = "%1"
    .ExecUseStdOutAsOutput = true
}

The dynamic library is the first one where we're going to use a structure to pass in parameters. Instead of setting the linker options directly, we just pull in the DLLOptions we defined above using the Using command:

DLL("dynlib")
{
    Using (.DLLOptions)

    .LinkerOutput = "dynlib.dll"
    .Libraries = {"statlib", "dynlib-obj"}
}

We could have written .LinkerOptions + ' /DLL' here as well, but then we'd have to duplicate it everywhere in our project where we want to build a shared library. Notice that the dependency on the static library is established directly by name, but we still need to set the include path manually, as there's no communication between targets by default.

Finally, the executable itself holds no surprises any more, and our sample project is complete (and as usual, available in the repository for your viewing pleasure.) I'm a bit on the fence regarding FASTBuild -- I like the fact that it's very focused on building C++ code fast, but I wish it allowed for some more flexibility and extensibility. For instance, it would be interesting to be able to define new functions in the build language. Even if slower than the built-in nodes, this would make it possible to build more complex tasks going beyond the simple property setting & inheritance which is at the core right now.

That's all for this week, I hope you liked it, and next week we'll go in the completely opposite direction and look at a very high-level build tool. Stay tuned!

Build systems: Waf

Another week, another build system -- this week, we're going to look at Waf. It's another Python-based build system, and it's one with a few new concepts, so let's dive into it right away.

Overview

The key idea behind Waf is to have the build split into separate phases: configure, then build. Waf is not the first build system ever with a configure step -- that honor probably goes to autoconf -- but it's the first in this series, and as such, we'll take the opportunity to investigate what configure is good for.

The idea behind a separate configure step is that there's a lot of setup code which only needs to run once to identify the platform. Think of searching for header files and setting build definitions based on the result, which usually requires firing up the compiler just to check whether something works, and storing the outcome. The same goes for platform-specific file generation -- what's the name of the operating system, what configuration options did the user specify, and so on.

Instead of running this every time you build, configure runs once, does all the expensive work and persists it, and subsequent build calls don't even need to execute the configuration steps.

Project files

Waf project files are plain Python scripts, same as we saw in SCons last week. The key difference is that we need to expose a couple of functions per script which are called by Waf, instead of typing in our commands directly and calling Waf on our own.

We need at least a configure and a build function, and both get a context object passed in. During configure, we'll do the usual things like setting up options for the compiler. This is straightforward, as the environment exposes the flags directly and we can manipulate them using Python, which is very well suited for handling dictionaries and lists.
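
To make this concrete, here's a minimal wscript sketch (the flags and file names are just examples):

def configure(ctx):
    ctx.load('compiler_cxx')                      # expensive compiler detection runs once
    ctx.env.append_value('CXXFLAGS', ['-O2'])     # results are persisted for later builds

def build(ctx):
    ctx.program(source='main.cpp', target='app')  # a simple executable target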

Task oriented

Waf's execution graph is task-based -- each task is the atomic unit of work which takes some inputs and transforms them into outputs. Once the tasks are created, the scheduler takes over and runs them. Integrating new custom build steps is done through new tasks, as we'll see below in the sample project. One great feature of the Waf scheduler is that it automatically extracts execution dependencies: if we specify a task which produces files, and we have another task which consumes them, Waf will take care of scheduling them in order without extra work.

C++ support

Waf provides C++ support alongside other languages like D and Fortran, and it's integrated through language-specific tasks. The language integration provides basic build targets, and it has first-class support for C++-specific concepts like include paths. Of all the build systems we've looked at so far, Waf is the first one which allows for true modularity of the build files, as it exports & imports include directories across targets.

In one file, we can create a new shared library which specifies the export_includes directory, and if we consume that library elsewhere using use, Waf will take care of setting the correct include directory at the use site. We can thus move projects around freely, and things will just work -- no need to hard-code paths any more.
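
Expressed as wscript code, the pattern looks roughly like this (the target names follow the sample project below):

# Producer: exports its include directory to consumers
ctx.stlib(source='StaticLibrarySource.cpp', target='statlib', export_includes='.')

# Consumer: 'use' pulls in statlib's exported includes and links against it
ctx.shlib(source='DynamicLibrarySource.cpp', target='dynlib', use='statlib')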

Sample project

Let's start with the static library project, which requires some code generation. As mentioned above, we'll use a task to generate the lookup table. There are two parts to a task: the definition, and then the invocation. Here's the definition:

from waflib.Task import Task

class tablegen(Task):
    def run(self):
        return self.exec_command ('python {} > {}'.format (
            self.inputs [0].abspath (),
            self.outputs [0].abspath ()
        ))

What we notice right away is that there's no need for special escape variables: the inputs & outputs are available as member variables. This also implies the task is instantiated for every execution, instead of being reused. This happens very explicitly in the build step:

tg = tablegen (env=ctx.env)
tg.set_inputs (ctx.path.find_resource ('tablegen.py'))
tg.set_outputs (ctx.path.find_or_declare ('table.cpp'))

ctx.add_to_group(tg)

We specify both the inputs and outputs, which initializes the self.inputs and self.outputs members we saw above. Finally, we add it to the current build group, which is the list of tasks that gets executed for this project. Thanks to the auto-discovery of dependencies, it's enough to declare the output table.cpp and use it in the subsequent stlib call to get the execution order right.

The rest is very simple:

ctx.stlib (
    source = 'StaticLibrarySource.cpp table.cpp',
    includes = '.',
    export_includes = '.',
    target = 'statlib',
    cxxflags=ctx.env.CXXFLAGS_cxxshlib)

This adds a static library, uses the specified include directory and source files, gets a name, and exports include directories as explained above. The last line needs a bit more explanation: static libraries have the usual -fPIC problem on Linux, and unfortunately Waf cannot resolve this automatically. We need to specify the flags manually -- the recommended solution is to reuse the cxxshlib flags for the static library, and that's exactly what the last line does.

The remainder of the sample project is rather boring -- Waf doesn't require a lot of magic to set things up. One of the highlights of Waf is that it builds out-of-source by default. All build files, temporaries, and targets get placed into a build folder, which is trivial to ignore in source-control systems and avoids polluting the source tree with intermediate files. Even the lookup table we generated above gets built there. There's only one minor caveat: Waf doesn't copy shared libraries into the same folder as the binary, nor does it provide an easy command line which would set up all the paths for you.

As usual, the sample project is online, and it should work on both Linux and Windows. That's all for this week, see you again next week!

Build systems: SCons

This week we'll cover SCons, a Python-based build system. SCons is a full build system, with built-in support for different programming languages, automated dependency discovery, and other higher-level utilities. Some of you might have heard of it as it has been used in Doom 3 for the Linux builds. That's at least how I heard about SCons for the first time ☺ Apparently, at some point, Frostbite was using SCons as well.

Overview

SCons is written in Python -- and the build files are actually Python scripts. This is great, as there's no "built-in" scripting language: you can just do whatever you want in Python, and you call SCons like any other library.

At its core, SCons is similarly bare-bones to Make. You have node objects internally which represent the build graph. Usually, those nodes get generated under the hood -- for example, you write an SCons file like this:

Program(['main.cpp'])

Here, main.cpp is an Object (as in a C++ object file, not something related to object-oriented programming, mind you). The code above is equivalent to:

Program ([Object ('main.cpp')])

This creates one C++ Program node, with one C++ Object as the input. You can look at the dependency tree using scons --tree=all. For the example above, it would produce something like this:

+-.
+-Sconstruct
+-main
| +-main.o
| | +-main.cpp
| | +-/usr/bin/g++
| +-/usr/bin/g++
+-main.cpp
+-main.o
  +-main.cpp
  +-/usr/bin/g++

That's the full dependency tree. SCons is not timestamp-based (at least by default); the files above will be hashed and checked as dependencies when you ask for a build. When you run the build, you'll see how SCons processes the tree bottom-up:

scons: Building targets ...
g++ -o main.o -c main.cpp
g++ -o main main.o
scons: done building targets.

That's it for the core functionality of SCons. Notice that SCons is aware of all intermediate targets, which means that it can also clean everything by default.
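
Cleaning reuses the same dependency information -- scons -c (short for --clean) removes all the targets and intermediates SCons knows about, with no clean rule to write:

$ scons -c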

C++ integration

C++ is yet another target language for SCons, and shares functionality with C, D and Fortran. All of them have the same targets and a similar build model. You specify a Program, and based on the list of provided source files, SCons figures out what compiler to use.

There's limited support for discovering libraries and functions. SCons will search the default paths, but I haven't seen a way to register custom search modules. Of course, that's only a minor limitation, as you can just write the search script in Python, but it's a bit disappointing that SCons doesn't come with a set of search scripts.
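
The checks that do exist run inside a Configure context; a small sketch (zlib is just an example):

# The Configure context fires up the compiler to probe the environment
conf = Configure(env)
if not conf.CheckLib('z'):
    print('zlib not found -- continuing without it')
env = conf.Finish()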

SCons can optionally generate Visual Studio project files. Optionally, as there's no simple "generate project files" step for a given SCons node graph. Instead, you build them in a declarative way, just like any other target. What you get from SCons is the actual XML file generation. This makes it easier than writing the files from scratch, but it still requires some repetition -- for instance, the source files need to be provided to the target you want to build, and then again to the Visual Studio project. The Visual Studio project generator doesn't query them from the target; this part is left to the user. If you're interested in the details, the documentation has some example code which shows a simple project setup.
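
For reference, the declarative style looks roughly like this, based on the documented MSVSProject builder (the file names are made up):

prog = env.Program('main', ['main.cpp'])

# Note how the sources are listed again -- they are not queried from 'prog'
env.MSVSProject(
    target = 'main' + env['MSVSPROJECTSUFFIX'],
    srcs = ['main.cpp'],
    incs = ['main.h'],
    buildtarget = prog,
    variant = 'Release')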

Sample project

Let's see how SCons fares with our sample project. On the good news side, it's the first system we're looking at which supports both Linux and Windows with the same build script.

import os

# This is needed so we get the python from PATH
env = Environment(ENV = os.environ)

# include path is relative to root directory (indicated as #)
env.Append (CPPPATH='#')
p = os.path.abspath ('./tablegen.py')

This is just some basic setup so that we can later use the same Python executable we used to invoke SCons. SCons doesn't inherit the environment of the surrounding shell by default, so we need to do this manually. This matters on Windows in particular, so we can actually find a Python interpreter if it's in the PATH of the calling shell.

pyexec = 'python' if os.name == 'nt' else 'python3'
env.Command ('table.cpp', 'tablegen.py', '{} {} > $TARGET'.format (pyexec, p))

Now we're finally coming to the meat of this build:

env.StaticLibrary('statlib', [
    # Building as SharedObject adds -fPIC in a portable way
    env.SharedObject ('StaticLibrarySource.cpp'),
    env.SharedObject ('table.cpp')])

Here we're using the nodes directly, as otherwise we'd have to edit the compiler configuration manually to add -fPIC. Notice that SCons doesn't notice on its own that this static library is consumed by a shared library, so we need to handle this manually. From here on, we can reference the target statlib in other project files inside the same build, which simplifies the linking -- no hard-coded paths. However, we can't export/import include paths directly, so our dynamic library project still ends up specifying the include path manually:

SharedLibrary ('dynlib', ['DynamicLibrarySource.cpp'],
  LIBS=['statlib'], LIBPATH=['../statlib'],
  CPPPATH=['.', '../statlib'], CPPDEFINES = ['BUILD_DYNAMIC_LIBRARY=1'])

That's it for SCons and this week! As always, make sure to look at the sample project for a full, working example.

In my opinion, the main advantages of SCons are twofold. First, it lets you freely mix various languages -- you can build a combined D, C++ and Java program without breaking a sweat. Second, by virtue of being a Python module, it easily integrates with the typical "glue" code in a build without having to change languages. I currently have a lot of Python code running as part of my build, and having the build system written in Python and using Python for the build files would simplify many things. It also removes the context switch between the build system language and the utility language. Personally, I'd probably consider it for very heterogeneous builds, as SCons makes customization and scripting really simple compared to the other tools we've looked at so far -- and also compared to quite a few tools that are yet to come. Enough teasing for today, see you again next week!