Scratching your itch: Side projects

Everyone has them: small side projects which somehow never want to get finished. That small tool you wrote to convert some ancient file format into a newer one. The tiny hack to an image library to add support for scanline offset caching to improve TGA loading performance. Small things which make a library much more usable for a particular use case, or tiny tools which help with corner cases that only a few people run into.

What do these side projects have in common? First of all, I guess the majority of them never get released, which means we’re all going to have our own small hack to some image library, our own small converter for ancient file formats and other tiny pieces of code on which we hack occasionally. Well, not really occasionally, but every time we run into a new bug or find a new use case. Second, these side projects take away increasing amounts of time and mental capacity, as you have this lingering feeling at the back of your head that you should really “finish” this at some point, but you simply don’t have the time to do it “properly”.

I know this feeling very well. Doing it “properly” means having all known issues fixed, writing good documentation, porting it to all platforms under the sun and ideally having 100% test coverage. After all, it’s a side project, so at least here we can do everything right, right? Here, we can be the programmer we want everyone to believe we are, writing perfect code.

So how can we solve this dilemma? The first step is to understand that even non-perfect code can solve problems, especially if the problem domain is very small. If all you need is to decompress DXT images, a bunch of C functions with inline comments might not be the nicely packaged library that you would like to see, but it does solve the problem for people. If anyone has the urgent need to decompress DXT, he will use that library, and chances are high he’ll contribute support for that one more format he cares about. This assumes that the code is out there in the first place!

The five-minute guide to releasing your side project is quite easy, but you do need to prepare yourself to spend some time on the release itself. Not polishing the code, not fixing crazy corner cases, but doing the stuff that really matters:

  • Sign up at a code hosting site: Bitbucket or GitHub; everything else doesn’t matter.
  • Get familiar with Mercurial or git. If your code uses a different revision control system, export and reimport now. Currently, only those two revision control systems matter, with a strong bias towards git.
  • Decide on the license to use. BSD or MIT is the license of choice if you want people to use your code. GPL may be acceptable for Python or other scripting languages where you have to release the whole source anyway, but BSD or MIT is still better.
  • Write a readme: What problems does this code solve, on what systems does it run, and how do I compile it? The readme is crucial for search engines to find your code. Use something like Markdown or reStructuredText for it so the plain text can be parsed easily.
  • Decide on a name: You don’t want to rename your project and lose your search engine rank. Check first whether the name is already in use; calling your SQL database my-SQL-DB might not work as expected.
  • Write docs if needed: Don’t waste time on docs unless they are really needed. If you do need them, use Sphinx or something else which is readable in plain text if people don’t build the docs. Sphinx is great as there are web services like ReadTheDocs which you can point people to.
  • If you wrote a sufficiently self-contained library and there’s a distribution system for your language, do the extra legwork to publish your library on the packaging system. For Python, that would be PyPI; for C#, you probably want NuGet; and for JavaScript, npm is your friend. For very small projects, you can skip this.
  • Mark the current version as 1.0 (see the tagging sketch after this list). If you don’t feel like 1.0, fix the most urgent bugs and push. If your stuff works and doesn’t crash at every corner, go ahead with 1.0. I know this might sound a bit crazy (hey, 1.0 means stable, right?), but the sad truth about your pet project is that the current state is probably as stable as it will get (remember? You only fix critical bugs anyway), and there won’t be a “future” proper release. So you can go ahead and call it 1.0 just as well, and other people are more likely to use it. If you see a project hanging around at 0.1 for 3 years, you assume it’s dead, wasn’t used for anything, and someone simply forgot to delete it.
  • Most importantly: Ship it! Your code is ready, don’t waste time. If it’s useful but not perfect, people will tell you, and then you can improve the stuff that really matters.
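For the 1.0 step above, tagging the release is all it takes once the code lives in git. A minimal sketch (the tag name and remote are just the usual defaults, adjust to taste):

$ git tag -a 1.0 -m "First public release"
$ git push origin master --tags

With Mercurial, hg tag 1.0 followed by hg push does the same job.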

The steps above will likely take you something on the order of a few hours for your first project, and less than an hour later on. If you are spending a significant amount of time on writing docs, packaging or fixing bugs, then your side project is probably quite big and not really solving just one problem any more. Then you’re in framework or application development, which is an area where people are much less likely to use your code snippet, and you really need to nail a lot of things before you can release stuff. In this case, this post is not for you!

One great example of such a small, reusable library is the stb lib. It’s a bunch of solutions to common problems which you can easily integrate into your own application. However, I’m sure you all have similar code lying around just waiting to get pushed to the web for the benefit of others! So go ahead, give it the small “release polishing” and share it with all of us — thank you!

Building your own home server, part #4

Finishing touches

The last thing that remains to be done is to hook up the UPS to the server so it shuts down once battery power runs low. Fortunately, there’s already a package which does this for us, called apcupsd. You can fetch it using:

$ apt-get install apcupsd apcupsd-cgi

It needs a bit of configuration to work with our UPS. Before you continue, make sure you have the USB cable connected to the server. First of all, you have to open the configuration and set the device type:

$ nano /etc/apcupsd/apcupsd.conf

Find the lines which contain UPSTYPE and DEVICE and change them to:

UPSTYPE usb
DEVICE

Now we need to edit another configuration file so the daemon knows it’s configured. Edit /etc/default/apcupsd and set

ISCONFIGURED=yes

You can restart it now using service apcupsd restart. One nice thing about the APC UPS daemon is that it also comes with a web interface:

Web site showing battery load and other power usage metrics.
The apcupsd web interface.
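If you just want to check from the command line that the daemon can actually talk to the UPS, apcupsd also ships with the apcaccess tool:

$ apcaccess status

This prints the UPS model, line voltage, battery charge and the estimated remaining runtime.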

We’ll use the Apache 2 web server to host the interface. This requires us to install the server, map the cgi-bin directory and then enable the CGI module. The following few commands will accomplish this:

$ apt-get install apache2
$ echo 'ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/' >> /etc/apache2/apache2.conf
$ a2enmod cgi
$ sudo service apache2 restart
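You can do a quick check directly on the server that the CGI scripts are wired up correctly before switching to your desktop (wget is part of the default Ubuntu install):

$ wget -qO- http://localhost/cgi-bin/apcupsd/multimon.cgi | head

If you get HTML back instead of an error page, the interface is up.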

You can now navigate to your server’s IP address and open the /cgi-bin/apcupsd/multimon.cgi page to get the UPS overview. There’s only one thing left to do, which is to pull the power cable to check that the UPS does actually work. The apcupsd documentation has exactly the right words for this step:

To begin the test, pull the power plug from the UPS. The first time that you do this, psychologically it won’t be easy, but after you have pulled the plug a few times, you may even come to enjoy it.

I couldn’t have said it better myself.

Power usage & performance

A watt meter measuring the power usage.
Measuring the power usage of the whole PC without UPS.

I’ve measured the total system power usage both with the UPS and without. Idle usage of the server alone is around 28-29W, and goes up to 34W under full load — that is, all CPUs busy and maximum usage of the disk drives. With the UPS, you can expect around 33W while idle, and 36W or so under load. Keep in mind that 99% of the time, the server will in fact be idle.

Performance-wise, I get sustained write rates onto the ZFS file system of roughly 150 MiB/s. This includes the time to generate the checksums and write to both disk drives. You can test this easily by writing a file full of zeros:

$ dd if=/dev/zero of=/tank/dummyfile count=8192 bs=1048576
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 52.9725 s, 162 MB/s
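To get a rough read number as well, you can read the file back; drop the page cache first so you’re not just measuring RAM (note that ZFS additionally keeps its own ARC cache, so treat this as a ballpark figure only):

$ sync && echo 3 > /proc/sys/vm/drop_caches
$ dd if=/tank/dummyfile of=/dev/null bs=1048576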

Over the network, I could easily reach more than 900 MBit/s over my home gigabit Ethernet, which never quite reaches the full 1000 MBit/s even under the best circumstances. I assume that with a proper, server-grade switch you’ll be able to get close to 1000 MBit/s — if you have some numbers, please get in touch!
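I can’t say which tool produced the numbers above, but a simple way to measure the raw throughput between two machines is iperf, which is available in the Ubuntu repositories:

$ apt-get install iperf      # on the server (and on a Linux client)
$ iperf -s                   # run this on the server
$ iperf -c your-server-name  # run this on the client

After roughly ten seconds, the client prints the achieved bandwidth.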

Closing remarks

That’s all folks, you now have a full-blown Linux server at home at your disposal. There’s no end to the things you can run on it, ranging from DLNA servers like MediaTomb, through databases like PostgreSQL, to virtual machine hosts using KVM. You’ll probably want to set up a backup solution as well, which can be easily done with ZFS, as you can back up a snapshot while the file system is being mutated by the users. I hope you enjoyed this guide, and if you have any questions, comment or drop me a line!
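One quick addendum on the backup point: here’s a minimal sketch of what a snapshot-based backup could look like (the backup host and target pool names are made up, and in practice you’d automate this and use incremental sends):

$ zfs snapshot tank/Shared@backup-1
$ zfs send tank/Shared@backup-1 | ssh backup-host zfs receive backuppool/Shared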

The server described here has found its new location at my friend’s home, so I can’t run additional tests on it. The deployment was very simple: I plugged in the cable and it immediately showed up on the network. Interestingly, after swapping the network cable to the monitoring port and back, I was also able to access the management console even though the network cable was plugged into the “client” port. Otherwise, nothing special was needed to get it working in a different network.

Building your own home server, part #3

Software

Ok, our home server hardware is ready, but what software are we going to run? That totally depends on your use case, but I guess at least you’ll want to run a file server on it. In this post, we’ll set up a not-so-basic Samba file server using Ubuntu Linux.

With a Samba file server, you can serve both Windows and Linux clients, with fine-grained access right management. As the file system, I’ll be setting up the super-robust ZFS, which is a next-gen file system with extremely high reliability and some cool features. I’ll also set up automatic snapshots and integrate them with Windows shadow copies, so Windows clients will be able to restore accidentally deleted files on their own.

As the operating system, we’ll be using a long-term support release of Ubuntu Linux. You can use any other Linux you want of course, but the installation instructions here will be for Ubuntu 14.04, which does support ZFS and our hardware, and is available for free.

Server grade hardware

And now comes the seriously cool part. As we bought server-grade hardware, we can take advantage of server-grade management tools. In particular, here’s what we won’t need:

  • Spare monitor
  • Keyboard
  • Graphics card
  • USB thumb drive

Instead, we’ll forward the screen output from the server over the network to our desktop machine, mount the installation media over the network, and even restart the machine without getting up from our desk!

All you need to do is figure out which IP address has been assigned to the management port and point your browser at it. You’ll get the management dashboard, from which you can redirect the console.
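If you’re not sure which address the management port picked up, the easiest way is to look at your router’s DHCP lease table; alternatively, a quick ping scan from your desktop will show the new device (this assumes your home network is 192.168.0.0/24 and that you have nmap installed; adjust as needed):

$ nmap -sn 192.168.0.0/24

The management controller has its own MAC address, so it shows up as a separate device from the server’s regular network ports.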

Browser window showing the server status.
The server management console. All of this runs on the management port.

The login is “ADMIN”/”ADMIN”, just in case. What’s seriously cool is that we can now open a “remote console” here which will forward the display output to our desktop, even while the machine is starting up. In fact, you can even get the BIOS welcome screen:

Window showing the BIOS welcome screen
The BIOS welcome screen, forwarded through the management console.

From here on, it’s smooth sailing, or as my administrator friend MoepMan likes to say: “Stuff works just as in the advertisement!” You plug in your Ubuntu ISO using virtual media, start the installer as always and follow the on-screen instructions, and in less than 20 minutes, the machine will boot into Linux. There are only three things you need to double-check during the setup:

  • Make sure to install to the SSD drive. When the installer asks you which drive to use, take the 40 GiB one, and just use the guided partitioning.
  • Double-check that the first network port is selected as the default.
  • When you can select which software to install, pick OpenSSH server and Samba.

That’s it: some waiting and a reboot later, you’re all set.

Window showing the Ubuntu installation progress bar.
Installing Ubuntu from an ISO mounted from the host. This should take only a few minutes.

Network administration

Once the installer has finished, it’s time to log in using SSH. On Linux, SSH is built in, so you can just run ssh your-server-name and log in; on Windows, you’ll need to get an SSH client like PuTTY.

With SSH, you get a console on the server, pretty much the same as if you had logged in sitting in front of it. In fact, you could run the whole installation by logging in on the server through the console forwarding, but I’ll use SSH because it’s quite a bit more comfortable to be able to copy and paste into my console window.

As we’re setting up the server, we’ll be running lots of commands with administrator rights. The best approach is to elevate ourselves once and then just do everything as administrator. On Ubuntu, simply use sudo -i once logged in to become root (that is, the administrator).

The first step should be to update all installed packages, which you do using:

$ apt-get update && apt-get upgrade

You’ll probably have to reboot at this point, so just type in reboot and log in again. On Ubuntu, you can’t log in as root, so make sure you log in as your normal user and then switch to root.

ZFS

The first thing we want to set up is ZFS. Unfortunately, due to licensing restrictions, it’s not shipped by default with Ubuntu, so we need to register a repository and fetch it from there. That’s actually not that complicated:

$ apt-add-repository ppa:zfs-native/stable
$ apt-get update
$ apt-get install ubuntu-zfs zfs-auto-snapshot

This will take quite some time to build the kernel modules, so be patient. Now we can create our first pool. ZFS works in two layers: There are pools, which group hard drives, and then there are file systems which are created inside a pool. We’ll be using a mirrored pool over our two hard drives and create two file systems inside it.

Before we can do this, though, we have to check our hard drives; in particular, we want to know the sector size. Modern hard drives have sectors of 4096 bytes, but for legacy reasons, they often advertise 512-byte sectors, and that mismatch can cost us some performance. Let’s check using fdisk -l, which will print output similar to this:

Disk /dev/sda: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Bingo, our hard drive uses 4096-byte sectors. Now we’re ready to create our first pool. We want to add our hard drives by their disk ID, so the pool will survive if we swap the cables. You can see all drives by ID if you call ls /dev/disk/by-id. The two Western Digital drives should be easy to spot, their names starting with ata-WDC_WD20EFRX.

To create the pool, call:

$ zpool create tank -o ashift=12 mirror ata-WDC_WD20EFRX-1 ata-WDC_WD20EFRX-2

The ashift=12 option tells ZFS to use blocks of \(2^{12}=4096\) bytes. We’re calling the pool tank, because this is what the ZFS documentation always uses, and because it doesn’t matter much :)

If you get an error like this, don’t worry:

does not contain an EFI label but it may contain partition

Just use the -f flag as recommended, after double-checking that you are using the right drives. I got this for one of the disk drives for whatever reason, but as I don’t care about the data on it, we can just go ahead and ignore this. ZFS will then take ownership of the drive and simply destroy anything that is written on it.
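Once the pool has been created, you can also ask ZFS itself about its health and size:

$ zpool status tank
$ zpool list tank

The status output should show the mirror with both drives listed as ONLINE.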

You can now go to /tank and see that the pool is up and running. We’ll also want to create a few file systems. Let’s say we’ll have two users on our server (Markus and Raymund — you can create users using adduser username), and we want a shared file system. Nothing easier than that:

$ zfs create tank/Markus
$ zfs create tank/Raymund
$ zfs create tank/Shared

In ZFS, you should create one file system for every use case, as many settings are per file system (compression, deduplication, sharing, etc.). Moreover, file systems don’t cost you anything.
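For example, turning on compression just for the shared file system is a one-line, per-file-system change (a sketch; lz4 requires a reasonably recent ZFS on Linux version):

$ zfs set compression=lz4 tank/Shared
$ zfs get compression tank/Shared

The second command just confirms the setting; existing data is not rewritten, only new writes get compressed.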

In case your system disk fails, you’ll want to reimport the pool instead of recreating it. The command to do this is:

$ zpool import -f tank

All that remains to be done is to set the access rights for the file systems, which are simply mounted as directories below /tank and behave just like ordinary directories. We’ll assume that each user owns his folder:

$ chown -R raymund /tank/Raymund
$ chown -R markus /tank/Markus

This sets the folder owners; from there on, the users can log in and set the permissions to their liking.

Samba

Samba is the Linux implementation of the SMB protocol used by Windows for file sharing. Setting up Samba is very simple, as its configuration is contained in a single file. We’ll set up three shares: two for the users, which can only be used with a valid log-in, and a public share for the Shared folder, which can be read without logging in to the server. For writing into Shared, a valid account will still be required.

All we need is to edit the /etc/samba/smb.conf file and add the following lines at the end:

[Shared]
path = /tank/Shared
public = yes
writable = yes
create mask = 0775
directory mask = 0775

# Duplicate this for Raymund
[markus]
path = /tank/Markus
public = no
valid users = markus
writable = yes

The part within the brackets is the name of the share, and the rest should be self-explanatory. On the public shared directory, we set the file access masks such that everyone can read the data, but only the user who created a file can modify it again. One quick restart of the Samba server using service smbd restart, and you should see the network shares from Windows.
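If the shares don’t show up, a good first check is testparm, which ships with Samba and validates smb.conf before you restart the service:

$ testparm /etc/samba/smb.conf

It prints the parsed share definitions, so you can verify that the [Shared] and [markus] sections look as intended.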

Volume shadow copies for Samba using ZFS

One major feature of ZFS is zero-cost snapshots. Unlike other file systems, ZFS is always copy-on-write, so you can store the state of the file system at a particular moment in time for free by simply creating a snapshot. Later, if you find that you want to restore a file, you just open the snapshot and take it from there. This is a bit similar to Windows’ “file history”, but works at the file-system level instead of on individual files. The cool thing is that we can expose ZFS snapshots to Windows clients through the file history interface right in their Explorer.

The setup is straightforward, but there is one catch: using the zfs-auto-snapshot script is not enough, as Samba requires the snapshot names in a particular format. Each snapshot must contain the date and time in UTC, with a uniform prefix. So we just roll our own script to do this: zfs-snapshot. This script must be run regularly (every 15 minutes, for example), and what it’ll do is create a snapshot in the right format and also automatically delete old snapshots. Using the default settings, it will keep only hourly snapshots for one week, then daily snapshots for a month, then monthly ones for a year and so forth — that is, the older the snapshot, the lower the frequency. I’ve stored the script as /usr/local/bin/zfs-snapshot.py. Now let’s set up a cron job — basically, a simple timer which will call our script regularly:

$ cat >> /etc/cron.d/zfs-snapshot << EOL
> PATH="/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin"
> 
> */15 * * * * root zfs-snapshot.py
> EOL
$ chmod +x /usr/local/bin/zfs-snapshot.py
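To verify that everything works, you can either wait for the next quarter hour or run the script once by hand, and then list the snapshots. The names should follow the UTC naming scheme described above; the exact timestamps will of course differ from this made-up example:

$ zfs-snapshot.py
$ zfs list -t snapshot | head

You should see entries like tank/Markus@shadow_copy-2015.06.01-14.30.00 for each file system.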

This will run the zfs-snapshot script every 15 minutes. All that is left is integrating it with Samba, so the snapshots actually show up in Windows. For each share, append the following lines:

vfs objects = shadow_copy2
shadow: snapdir = .zfs/snapshot
shadow: sort = desc
shadow: format = shadow_copy-%Y.%m.%d-%H.%M.%S

These lines are the same for every network share, as each share is hosted on its own ZFS file system and hence its snapshots are in the .zfs folder. Yet another reason to use a separate file system per share! That’s it: one more restart and you should see snapshots showing up in Windows.

At this stage, the rest totally depends on your needs. We have basic file sharing set up, on a robust file system with automatic snapshots. Next time, we’ll look at power usage and how to integrate the APC UPS.