Richard Crowley’s blog

Dependency management for grown ups

Sprouting a package manager seems to be a rite of passage for programming languages.  Perl has CPAN.  PHP has PEAR and PECL.  Python has setuptools/distribute and PyPI.  Ruby has RubyGems.  Node.js has npm.  Go has goinstall.  Haskell has Cabal.  I’m sure there are more but I’m tired of searching and I made my point.

They each bring to the table a subtly different command-line grammar, new rules for dependency resolution (“~>”, for example) and a new set of repositories to fail.  Different grammars are annoyances.  RubyGems’ “~>” syntax is ugly but important: it allows any version greater than or equal to the one specified, but only within the same series, so “~> 1.4.2” accepts 1.4.9 but not 1.5.0.  The new repositories are, for better and worse, unsupervised by the Debian maintainers or any equivalent body.

The fatal flaw in every single case is the systemic inability of each of these package managers to fully describe the dependencies of their packages.  The nokogiri gem is a typical representative case through no fault of its own.  It depends on libxslt1-dev and libxml2-dev and, to complicate matters further, those are the Debian/Ubuntu names for packages known as libxslt-devel and libxml2-devel in the Red Hat world.

I will not be endorsing a utopian future in which Debian and Red Hat have unified their package naming scheme such that RubyGems could programmatically specify and satisfy these dependencies in a portable manner.  I will also not be rallying everyone to create Debian packages and RPMs, run apt and yum repositories, and generally do everything twice.  Finally, I realize it would be a waste of breath calling on everyone to standardize on one package format and manager.

A new tool appears to be the only pragmatic way forward and others seem to agree.  The curious thing is the recent introduction of several promising contenders that each fall short for exactly the same reason.  Ruby’s Bundler and RVM as well as Python’s pip/virtualenv all isolate sets of packages and generate dependency lists from their isolated installations.

A bundle, an RVM gemset, and a pip/virtualenv requirements.txt file are each an incomplete list of dependencies, plagued by the same systemic inability to fully describe dependencies that plagues the package managers on which they’re based.  What makes these better than the language’s package format?  [1]

The answer is invariably the installation isolation they can each provide.  Isolation begets precision and precision enables confident, rapid deployment.  In all of these cases, the isolation is sabotaged by the underlying package manager and fails to provide complete isolation for a set of dependencies.  You really want isolated environments on the same box?  UNIX has had a way to do this since 1979.  It’s called chroot(2).  [2]

The pragmatic new tool isn’t Bundler, RVM, pip/virtualenv, or anything else with language-blinders on.  It’s configuration management.  This is a natural, responsible extension of your Gemfile, .gems or requirements.txt with the added bonus of completeness.  It can take the form of a well-written shell script, a Puppet manifest, a Chef recipe, or the equivalent in your configuration management tool of choice.  All that really matters is that you maintain it.

Typical configuration management comes in client-server architectures but that’s not necessary to dip your toe in.  Both Puppet and Chef support a standalone mode where manifests/recipes are read from a local file.

Take for example the following Puppet manifest which implements the Nokogiri example from above for CentOS and Ubuntu.  Other operating systems can be added easily or the whole thing can be trimmed down if you only use one.  For greatest predictability, versions should be specified rather than left to latest.

$build = $operatingsystem ? {
    "centos" => ["autoconf", "automake", "gcc", "glibc-devel"],
    "ubuntu" => "build-essential",
}
$xml = $operatingsystem ? {
    "centos" => "libxml2-devel",
    "ubuntu" => "libxml2-dev",
}
$xslt = $operatingsystem ? {
    "centos" => "libxslt-devel",
    "ubuntu" => "libxslt1-dev",
}
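# The Ruby headers are packaged as ruby-devel on CentOS and ruby-dev on Ubuntu.
$rubydev = $operatingsystem ? {
    "centos" => "ruby-devel",
    "ubuntu" => "ruby-dev",
}
package {
    $build:
        ensure => latest;
    "ruby":
        ensure => latest;
    $rubydev:
        ensure  => latest,
        require => Package[$build];
    "rubygems":
        ensure  => latest,
        require => Package[$rubydev];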
    $xml:
        ensure => latest;
    $xslt:
        ensure => latest;
    "nokogiri":
        ensure   => latest,
        provider => gem,
        require  => Package["rubygems", $xml, $xslt];
}

Run the file with puppet apply example.pp.
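
The manifest above tracks latest and knows only two operating systems.  As a rough sketch of the two refinements suggested earlier (more platforms, pinned versions), something like the following could work.  The version string is a placeholder I made up for illustration, not a recommendation, and the fragment assumes Ruby and RubyGems are already managed as in the manifest above.  It uses a case statement rather than a selector so that several operating systems can share one branch and unknown ones fail loudly.

```puppet
# Sketch only: the version string below is a placeholder, not a tested release.
case $operatingsystem {
    "centos", "redhat": {
        $xml  = "libxml2-devel"
        $xslt = "libxslt-devel"
    }
    "ubuntu", "debian": {
        $xml  = "libxml2-dev"
        $xslt = "libxslt1-dev"
    }
    default: {
        # Fail rather than guess package names on an unknown platform.
        fail("No package names known for ${operatingsystem}")
    }
}
package {
    [$xml, $xslt]:
        ensure => installed;
    "nokogiri":
        ensure   => "1.4.4",  # placeholder; pin the version you have actually tested
        provider => gem,
        require  => Package[$xml, $xslt];
}
```

The operating system packages can be pinned the same way by replacing installed with the exact version string your distribution reports.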

The bottom line

The language-specific choices out there provide incomplete dependency management and a false sense of isolation.  Complete dependency management can be had through Puppet, Chef, or any other configuration management software.  Complete isolation can be had through virtualization, chroot(2) or cgroups on newer kernels, or by using DevStructure.

Language-specific package managers may be here to stay but that’s no reason to let complete dependency management slip away.  Learn about configuration management.  Your deploys will thank you.

Disclosure: I’m one of the founders of DevStructure.