
Hugo Osvaldo Barrera

Software Developer. Python Lover. IT Consultant.

Performing backups the right way

For years I’d had a single task sitting on my TO-DO list: back up photos. I had an awful solution years ago, and only recently did I set up a permanent, proper one.

Doing backups the right way means taking several factors into consideration, and should not be done lightly. Trusting a poor backup solution gives you a false sense of security: you might lose everything suddenly, and not even realize it until it’s too late!

These are the items you should keep in mind when designing your backup solution: automation, transparency, fail-safety, efficiency, security and maintainability.

What I did

Automation, transparency and fail-safety are all covered by cron. My backups run daily, I get an email in case of error, and none of it interferes with my work in any way. cron emails whatever a job prints to the user the job runs as; my backup script is silent on success but prints any errors, so any output means trouble. By simply creating an MX record for my machine’s hostname and an alias on my mail server, those emails land in my inbox.
If you don’t have your own domain (or don’t control the host’s domain), you can run a local SMTP daemon and have it forward the emails to yourself.
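The "silent on success" rule is what keeps cron’s emails meaningful. rsync gets there with -q, but a command without a quiet flag can be wrapped instead. This is a minimal POSIX sh sketch that assumes only that mktemp is available; `run_quiet` is a hypothetical helper, not part of the backup script in this post:

```sh
# run_quiet CMD [ARGS...]
# Runs a command with stdout and stderr captured in a temporary file.
# On success nothing is printed; on failure the captured output is written
# to stderr. The command's exit status is passed through either way.
# (log and status are plain globals: POSIX sh has no "local".)
run_quiet() {
    log=$(mktemp) || return 1
    if "$@" >"$log" 2>&1; then
        status=0
    else
        status=$?          # exit status of the failed command
    fi
    if [ "$status" -ne 0 ]; then
        cat "$log" >&2     # only failures produce output, so only failures get emailed
    fi
    rm -f "$log"
    return "$status"
}
```

In a cron script, `run_quiet some-command args` would then stay silent until some-command fails.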

Efficiency is covered by rsync. rsync is very frugal with bandwidth: it only transfers changes over the network, not the entire 17GiB of photos.
As for disk usage, with the --link-dest flag rsync creates the full directory tree for each day, but hardlinks files which have not changed since the previous day. The result is a 3.4MiB increase in disk usage per day, while each day’s entire file tree remains available. I can also delete any day at random, and all other backups are unaffected.
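Deleting a day is safe because of how hardlinks work: each snapshot holds a name for the same inode, and the data is only freed once the last name is gone. The sketch below imitates by hand, with `ln`, what `--link-dest` does for a file that is unchanged compared with the link-dest directory, so it runs without rsync or a second machine. The directory and file names are made up for the illustration:

```sh
# Two daily "snapshots" in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/day1" "$demo/day2"

# Day 1: the file is copied normally.
printf 'pixels\n' > "$demo/day1/photo.jpg"

# Day 2: the file is unchanged, so it is hardlinked instead of copied
# (what rsync --link-dest does for files matching the link-dest tree).
ln "$demo/day1/photo.jpg" "$demo/day2/photo.jpg"

# Both names share one inode, so the data is stored only once.
inode1=$(ls -i "$demo/day1/photo.jpg" | awk '{print $1}')
inode2=$(ls -i "$demo/day2/photo.jpg" | awk '{print $1}')
links=$(ls -l "$demo/day2/photo.jpg" | awk '{print $2}')

# Deleting day 1 only removes one name; day 2 still has the full file.
rm -r "$demo/day1"
content=$(cat "$demo/day2/photo.jpg")

echo "same inode: $([ "$inode1" = "$inode2" ] && echo yes || echo no)"
echo "link count before delete: $links"
echo "day 2 after deleting day 1: $content"
rm -r "$demo"
```

It should report the same inode, a link count of 2, and day 2 still reading "pixels" after day 1 is gone.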

Security is also covered by rsync, simply by having it use ssh for transfers (which is actually the default). Of course, using ssh means creating a dedicated ssh key pair and making the private key available to cron unencrypted, i.e. without a passphrase. You’ll probably want full-disk encryption to protect it.
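A sketch of creating such a key, assuming OpenSSH on both ends and version 7.2 or later on the server (for the `restrict` option). `make_backup_key` is a hypothetical helper name; `ssh-keygen -N ''` is what leaves the private key without a passphrase:

```sh
# make_backup_key PATH
# Creates a dedicated ed25519 key pair at PATH and PATH.pub with an empty
# passphrase (-N ''), so cron can use it without any prompt, then prints
# an authorized_keys line for the backup account on the server.
make_backup_key() {
    ssh-keygen -q -t ed25519 -N '' -C 'cron photo backup' -f "$1" || return 1
    # "restrict" turns off port, agent and X11 forwarding and pty
    # allocation for this key; rsync over ssh needs none of them.
    printf 'restrict %s\n' "$(cat "$1.pub")"
}
```

For example, `make_backup_key "$HOME/.ssh/backup@hades"` would create a key at the path the script uses; the printed line goes into the backup user’s `~/.ssh/authorized_keys` on the server.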

As for maintainability, I had to handle that myself. Instead of an over-complicated solution, I wrote a 26-line shell script, of which 6 lines are blank and 13 are comments. Pretty easy to maintain.

How I did it

First of all, here’s the above-mentioned script:

# Backs up photos into the fileserver.
# Pre-existing files are hardlinked to yesterday's copy, so the size
# increase equals the size of new files, while keeping daily snapshots.
set -e

# Since other scripts sync cron tasks across machines, I want to make sure
# this one only runs on hyperion
if [ "$(hostname -s)" != hyperion ]; then exit 1; fi

SSH="ssh -i $HOME/.ssh/backup@hades"
TODAY=$(date -I)

# Using -H is too expensive, and I don't use hardlinks in this directory.
# Use -x to avoid copying .enc.mount directories (fuse-mounted encrypted
# data).

# Sync the files:
rsync -aqx -e "$SSH" /home/hugo/photos/ $REMOTE:data/photos/$TODAY/ \

# Point "latest" at today's snapshot (rm -f so the very first run works too):
$SSH "$REMOTE" "rm -f data/photos/latest && ln -sf $TODAY data/photos/latest"

Stupidly simple, right?
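One subtlety in the last line is that the `rm` before `ln -sf` matters: when `latest` already points to a directory, `ln -sf` follows it and creates the new link inside the old snapshot instead of replacing `latest`. A throwaway demonstration with made-up directory names (GNU and BSD `ln` both behave this way without their non-POSIX `-n` flag):

```sh
demo=$(mktemp -d)
mkdir "$demo/2015-01-01" "$demo/2015-01-02"
ln -s 2015-01-01 "$demo/latest"

# Naive update: latest resolves to a directory, so ln places the new link
# inside it ($demo/2015-01-01/2015-01-02) and leaves latest unchanged.
ln -sf 2015-01-02 "$demo/latest"
naive=$(readlink "$demo/latest")
if [ -L "$demo/2015-01-01/2015-01-02" ]; then stray=yes; else stray=no; fi

# Removing the old link first, as the script does, really moves it.
rm -f "$demo/latest"
ln -s 2015-01-02 "$demo/latest"
fixed=$(readlink "$demo/latest")

echo "naive: $naive (stray link inside old snapshot: $stray)"
echo "with rm: $fixed"
rm -r "$demo"
```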

And, of course, the single line in crontab:

35   5    *    *    *    /bin/sh /home/hugo/.config/cron/scripts/

A final item was securing the backup itself. I own the destination server, and the $HOME of the backup user lives on an encrypted drive. Should that server reboot, I’d get an email, and would need to log in and re-mount that drive manually. At worst this might cost a single backup, but I’d get two emails if it came to that. And a server having been offline is an indicator of a more serious problem anyway.
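If the encrypted drive isn’t mounted, the failure only shows up as whatever error ssh or rsync happens to print. One way to make that case explicit, sketched here as an assumption rather than part of the setup above, is to keep a marker file on the encrypted volume and refuse to sync when it is missing. The `.backup-target` file name and the `require_marker` helper are hypothetical:

```sh
# require_marker DIR
# Succeeds only if DIR contains the (hypothetical) marker file .backup-target,
# which is created once on the encrypted volume and is therefore only visible
# while that volume is mounted. On failure it prints a message, so cron emails it.
require_marker() {
    if [ ! -e "$1/.backup-target" ]; then
        echo "backup target $1 is not mounted (no .backup-target marker)" >&2
        return 1
    fi
}
```

On the client side, the equivalent remote check before the rsync call would be something like `$SSH "$REMOTE" test -e data/.backup-target || { echo "backup target not mounted" >&2; exit 1; }`. The explicit echo matters, since a silent non-zero exit would produce no output and therefore no email.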
