During the recent Australian bushfire season, I'd been administering four Raspberry Pis around the house for air quality monitoring. The usual progression of sysadminning automation followed:

  • SSH'ing in and running commands by hand
  • Gradually making shellscripts for some common tasks
  • Copy/pasting the shellscripts to set different parameters for different Pis
  • Running those shellscripts across all the Pis by creating many terminals and running them in parallel
  • Adding error handling
  • Making different scripts for install and update when the big shellscript gets too slow...

I ended up with a bunch of disastrous scripts like this one, for installing Prometheus node-exporter:

#!/bin/bash
ssh pi@pi4b "\
curl -SL https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-armv7.tar.gz\
> node_exporter.tar.gz \
&& sudo tar -xvf node_exporter.tar.gz -C /usr/local/bin/ --strip-components=1 \
&& sudo ln -sf /home/pi/dc/nodeexporter.service /etc/systemd/system/nodeexporter.service \
&& sudo systemctl daemon-reload \
&& sudo systemctl enable nodeexporter \
&& sudo systemctl start nodeexporter \
"

Ugh, inline bash being passed to SSH.

Or this script, to pull new docker images, restart docker, and provision the daemon.json config file:

#!/bin/bash
rsync -v -r $(dirname "$0")/ pi@pi4b:dc/
ssh pi@pi4b "cd dc;\
  docker-compose up --remove-orphans -d ;\
  docker kill --signal=HUP dc_prometheus_1 dc_blackbox_1 ;\
  sudo cp daemon.json /etc/docker/daemon.json"

You get the idea. It'd probably work fine if I worked at it, and used the right combinations of set -o pipefail, set -x, etc. The error handling in particular can be very difficult to get right, and I'm not confident that I know enough shell to do it right.

I also didn't want to copy-paste these for every Raspberry Pi, but I didn't feel confident using variables in the shell. I've heard horror stories, particularly around quoting and spaces, and I'm not confident I'd get it right. It feels like there's got to be a better tool for this kind of thing.

I asked around. A very smart colleague recommended Ansible. His take on infrastructure-as-code tools was:

  • Ansible is imperative enough to get things done for a small installation. At my scale, the pure-functional "declare the deps of everything and we'll figure out what needs to change" approach would probably annoy me more than it would help when migrating from shell scripts.
  • Unlike some other deployment tools, Ansible is agentless: you don't need a deployment daemon running on the target machine, just SSH and a Python interpreter. Ansible's smarts live on the laptop, where you want the smarts to live, and Ansible copies the scripts over to the remote machine. This simplicity is appealing to me - I don't really want extra daemons if I can avoid it.
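
To make the agentless point concrete, here is a minimal sketch of a YAML inventory, which is about all Ansible needs to know to reach a machine over SSH. The file name, the group name pis and the second host are made up for illustration; only pi4b and the pi user come from my scripts above.

```yaml
# inventory.yml (hypothetical file name)
all:
  children:
    pis:                    # hypothetical group name for the Raspberry Pis
      hosts:
        pi4b:               # hostname used in the shell scripts above
        pi3a:               # hypothetical second Pi, add one entry per machine
      vars:
        ansible_user: pi    # same SSH user as in pi@pi4b
        ansible_python_interpreter: /usr/bin/python3   # assumed interpreter path on the Pi
```

With something like this in place, `ansible -i inventory.yml pis -m ping` should check that every Pi is reachable and has a usable Python, with nothing installed on the Pis beforehand.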

So I gave Ansible a try for replacing my scripts that SSH'd into the Raspberry Pi boxes and set them up, and I kinda liked it.

Ansible seems to be a collection of pre-made tasks. Each task is backed by a Python module that wraps some common sysadmin job, often as a higher-level API to a command-line tool. For example:

  • ensure this apt / brew / yum package is at the latest version
  • ensure this file from src is at dest
  • ensure this git repo is checked out
  • ensure a user exists in this group

Note that these are all phrased to do nothing if the condition is already met: most tasks are idempotent, which can save a lot of time avoiding unnecessary re-provisioning.

You can string tasks like these together into plays, which are really just lists of tasks that run one after the other against a set of hosts, much like the && chains in my shell scripts above. A playbook is a file holding one or more plays.

Perhaps an example playbook that installs docker on an Ubuntu machine will make this clear:

#!/usr/bin/env ansible-playbook -v

# A play needs a host pattern and a task list. "all" and become (sudo) are
# assumptions here: adjust them to match your inventory.
- hosts: all
  become: true
  tasks:

    # apt-get update && apt-get dist-upgrade
    - name: Update apt packages to the latest version
      apt:
        update_cache: yes
        upgrade: full

    # apt-get install python3-pip docker-ce
    - name: install docker-ce and pip
      apt:
        name:
        - python3-pip
        - docker-ce

    # copy from host to remote, only if the file's changed, setting permissions
    - name: copy /etc/docker/daemon.json
      copy:
        src: daemon.json
        dest: /etc/docker/daemon.json
        mode: '0644'
        backup: true

    # ensure the user pi is in group docker
    - name: add user to group docker
      user:
        name: "pi"
        groups:
        - docker
        append: true

    # sudo systemctl daemon-reload
    # sudo systemctl enable docker
    # sudo systemctl start docker
    - name: enable & start docker service
      systemd:
        name: docker
        state: started
        enabled: true
        daemon_reload: yes

    # pip install --user docker docker-compose
    # (run as the login user, not root, so --user targets that user's home)
    - name: pip install python libraries
      become: false
      pip:
        name:
        - docker
        - docker-compose
        extra_args: --user

There's a bunch of neat advantages here:

  • Error handling is clear: by default, execution on a host stops at the first failed task in the playbook.
  • Most commands are idempotent: they won't do anything if the system is already as-desired.
  • You can run the commands in parallel across a cluster of machines.
  • Although I have mixed feelings about YAML, it's higher-level than direct shellscripts
  • I don't have to figure out the
  • It's great being able to add comments.
  • I didn't get into it above, but Ansible has a robust templating system using jinja2 (a templating system modelled on the Django web framework's templates) - I'm far more confident with jinja2 than I am with shell variable expansion.
  • The task docs are great, with examples of how to use every task which you can mostly just copy/paste (this is absolutely key)
  • It's pretty mature software - lots of Stack Overflow answers from 2013 that still answer the questions you have today.
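
To tie the templating and run-across-every-Pi points back to where I started, here is a rough sketch of how the node-exporter install script from the top of this post might look as a play. Treat it as an illustration under assumptions rather than my actual playbook: the file name, the pis inventory group, the variable names and the /tmp download location are all made up, and I haven't checked how well the unarchive step detects an already-unpacked archive when --strip-components is used.

```yaml
# install-node-exporter.yml (hypothetical file name)
- hosts: pis                          # hypothetical inventory group holding all the Pis
  become: true                        # the original script used sudo for most steps
  vars:
    node_exporter_version: "0.18.1"   # version taken from the shell script above
    # hypothetical helper variable, built with a jinja2 expression
    node_exporter_tarball: "node_exporter-{{ node_exporter_version }}.linux-armv7.tar.gz"
  tasks:
    - name: download the node_exporter release tarball
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/{{ node_exporter_tarball }}"
        dest: "/tmp/{{ node_exporter_tarball }}"   # assumed location; the script used the home directory

    - name: unpack the tarball into /usr/local/bin
      unarchive:
        src: "/tmp/{{ node_exporter_tarball }}"
        dest: /usr/local/bin/
        remote_src: true              # the archive is already on the Pi
        extra_opts:
          - --strip-components=1

    - name: link the systemd unit file into place
      file:
        src: /home/pi/dc/nodeexporter.service
        dest: /etc/systemd/system/nodeexporter.service
        state: link
        force: true

    - name: enable & start the nodeexporter service
      systemd:
        name: nodeexporter
        state: started
        enabled: true
        daemon_reload: yes
```

The version number now lives in one variable instead of appearing twice in a URL, and the quoting is YAML's and jinja2's problem rather than a shell string nested inside an SSH command.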

The disadvantages:

  • Ansible is quite slow, particularly on slow Raspberry Pis! I've tried to drill down into this, but the profiling tools seem limited to timing the entire run of the command: a sampling profiler would be nice, or more tracepoints. I have another post coming on tracking down a problem with its speed, so I'll save that for later.
  • I have mixed feelings about YAML, and the syntax has bitten me a lot while migrating.
  • The docs for how to structure your project are very high-level. I didn't understand roles at all until I saw their fantastic ansible-examples GitHub repository.
  • Ansible's probably not the future of provisioning. Consensus on Twitter seems to be that Terraform is the most likely winner of the next few years. It's possible that what I've learned here will soon be superseded by some new hotness, but that's OK for my little Raspberry Pi farm - boring is good! At least it'll be easier to migrate from Ansible than from shellscripts. It also seems just as likely that in 5 years we'll all be using some tool that doesn't exist at all today.
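
On the roles point in the list above, the idea that finally clicked for me can be sketched in a few lines: a role is a directory with a conventional layout, and a top-level playbook just lists which roles apply to which hosts. The role names, the pis group and the site.yml file name below are hypothetical; the roles/NAME/tasks/main.yml convention is Ansible's own.

```yaml
# site.yml (hypothetical top-level playbook)
#
# Assumed layout next to this file:
#   roles/docker/tasks/main.yml            a plain list of tasks (apt, copy, systemd, ...)
#   roles/docker/files/daemon.json         files the copy module can find by bare name
#   roles/node_exporter/tasks/main.yml
#   roles/node_exporter/handlers/main.yml  e.g. a "restart the service" handler
- hosts: pis              # hypothetical inventory group
  become: true
  roles:
    - docker              # hypothetical role names, applied in this order
    - node_exporter
```

Each role's tasks file is only a list of tasks, so splitting a long playbook into roles is mostly a matter of moving tasks into the right directory.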

All in all, I'm very happy with using Ansible - it's saved me from my own terrible shellscripts, and seems to scale pretty well for my small setup.