Richard Crowley’s blog

A blueprint is not a diff but it doesn’t matter

The support we at DevStructure have seen since opening the source to Blueprint in February has been fantastic and addicting — thank you all.  As a first introduction to how Blueprint works, I want to address an analogy I’ve seen many times: that a blueprint is a diff of your server.  It’s a perfect analogy and something instantly understood by developers and operators alike.

The funny thing is: it isn’t a diff.  Even better: it doesn’t matter.

Blueprint aims to describe the absolute state of a server in a concise format that is both human- and machine-readable.  That’s the JSON format you see if you run blueprint show name without a format argument (-P, -C, or -S).  Many things that are part of the absolute state of a server can be omitted for brevity without sacrificing correctness, and Blueprint takes advantage of several such omissions.

Building Blueprint around the notion of a diff was a non-starter because it would impose an order of operations on the configuration process.  To diff between the current and some past state of a server, one could rely on file modification timestamps or explicitly invoke blueprint start-paying-attention.  Either option requires knowing the time at which you began the configuration process.  Worse still, the latter requires you to know ahead of time and install Blueprint first.

When faced with the prospect of figuring out what you did to configure a server, "You should have installed Blueprint before you started," is not the answer I want to give.  Blueprint is just as happy given a server you’ve been tweaking since 2007 as it is on a pristine machine.  This is because Blueprint deals in absolutes.

An image or a giant tarball certainly declares the absolute state of a system but misses the boat entirely on human readability and conciseness.  To have our cake and eat it, too, we needed to rise above the abstraction of files and think about packages, the more useful building blocks of Linux systems.

Reducing the noise

Of course, there are hundreds of packages on any given Linux server and most of them can be assumed and therefore omitted from a blueprint.  Packages like coreutils, grep, and libc6 are essential to a Linux userland — Blueprint itself won’t even work without them.  The system package managers like APT and Yum are easy to interrogate and can be asked about exactly these essential packages directly.  In addition, packages like ubuntu-standard and all their dependencies are omitted for the same reasons.
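To make the idea of interrogating the package manager concrete, here is a minimal sketch for a Debian-style system.  It is not Blueprint’s actual code.  The function names and the rule "essential, or priority required/important" are assumptions made for illustration; only the dpkg-query format fields (Package, Essential, Priority) are real.

```python
import subprocess

# Tab-separated output, one package per line. ${Package}, ${Essential} and
# ${Priority} are standard dpkg-query format fields.
DPKG_FORMAT = "${Package}\t${Essential}\t${Priority}\n"


def assumed_packages(dpkg_output, priorities=("required", "important")):
    """Return the names of packages that can be taken for granted.

    The selection rule here (Essential: yes, or one of the given
    priorities) is an illustrative assumption, not Blueprint's rule.
    """
    assumed = set()
    for line in dpkg_output.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip anything that is not name/essential/priority
        name, essential, priority = parts
        if essential == "yes" or priority in priorities:
            assumed.add(name)
    return assumed


def query_dpkg():
    """Ask dpkg-query about every installed package (Debian/Ubuntu only)."""
    return subprocess.check_output(
        ["dpkg-query", "-W", "-f", DPKG_FORMAT], text=True
    )
```

On a real machine, assumed_packages(query_dpkg()) yields the set to leave out; subtracting it from the full list of installed packages leaves the candidates worth recording.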

/etc is home to thousands of configuration files, the vast majority of which simply don’t matter to a blueprint.  By examining file metadata and comparing the MD5 sums of file content to both the manifest of files in installed packages and a secondary list of files maintained within Blueprint, almost all of those files may be confidently omitted from a blueprint, leaving only new and modified files.
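The MD5 comparison can be sketched roughly as follows.  This is an illustration under stated assumptions rather than Blueprint’s implementation: known_md5s is a hypothetical dictionary mapping each path to the checksum its package shipped, and how that dictionary gets built (from package manifests or a secondary list) is left out.

```python
import hashlib
import os


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_worth_keeping(root, known_md5s):
    """Return the sorted paths under root that are new or modified.

    known_md5s is a hypothetical {path: md5 hex digest} mapping of files
    as shipped. A path missing from it counts as new; a path whose
    current digest differs counts as modified.
    """
    kept = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            # Only regular files are compared; symlinks are skipped here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if md5_of(path) != known_md5s.get(path):
                kept.append(path)
    return sorted(kept)
```

Files whose content still matches what their package shipped drop out, which is what keeps the list short even when the directory holds thousands of files.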

/usr/local is a special directory.  It is the de facto standard home of programs compiled and installed from source.  Sort of.  Plenty of other files end up there too, because alternative package managers (like Python’s easy_install and pip) also place files in /usr/local.  Blueprint trims away the fat and manages only the files that are truly important so you can compile from source knowing that Blueprint will faithfully package your build up for later.

Sort of a diff, after all

Blueprints certainly look and feel like diffs from the outside, and that’s largely by design.  The noise reduction techniques used to keep blueprints concise and understandable are very much in the spirit of diffing tools but operate outside the notion of points in time.

A blueprint is a bit like a diff between the minimal Linux installation, as best can be introspected from the installation itself, and the current, running state of the system.  Create blueprints from your perfectly configured development environment and deploy with confidence.