Creating Local Backups using `rdiff-backup`

rdiff-backup provides an easy way to maintain reverse-incremental backups of your data. Reverse-incremental backups differ from normal incremental backups: instead of storing a full backup followed by forward diffs, the full backup is kept up to date and reverse diffs are stored for the files that changed. An example illustrates this best. Consider backups taken on three consecutive days:
1. Full backup (1st day).
2. Full backup (2nd day), reverse-diff: 2nd -> 1st.
3. Full backup (3rd day), reverse diffs: 3rd -> 2nd, 2nd -> 1st.

Compare that with the regular incremental backup model, which would be:
1. Full backup (1st day).
2. Diff: 1st -> 2nd, full backup (1st day).
3. Diffs: 2nd -> 3rd, 1st -> 2nd, full backup (1st day).

This makes purging old backups especially easy: because newer backups never depend on older ones, expired reverse diffs can simply be deleted. In contrast, in the regular incremental model each incremental backup depends on every prior backup in the chain, going back to the full backup, so the full backup can't be removed until all the incremental backups that depend on it have expired as well. In practice this means keeping more than one full backup most of the time, which takes up precious disk space.
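The dependency structure described above can be sketched in a few lines of Python. This is a toy model for illustration only (plain values stand in for rdiff-backup's actual binary rdiff deltas):

```python
# Toy model of reverse-incremental storage: the mirror always holds the
# newest version, and each entry in reverse_diffs steps one version back.
class ReverseIncrementalStore:
    def __init__(self, initial):
        self.mirror = initial      # full copy of the latest backup
        self.reverse_diffs = []    # newest-first list of new -> old steps

    def backup(self, new_version):
        # Keep a reverse "diff" (here simply the old value) and
        # synthetically update the mirror to the new full backup.
        self.reverse_diffs.insert(0, self.mirror)
        self.mirror = new_version

    def restore(self, steps_back=0):
        if steps_back > len(self.reverse_diffs):
            raise ValueError("that version has expired")
        # Walk backwards from the mirror through the reverse diffs.
        version = self.mirror
        for diff in self.reverse_diffs[:steps_back]:
            version = diff
        return version

    def purge_older_than(self, keep):
        # Expiring old backups just drops entries from the tail;
        # newer backups never depend on the removed diffs.
        self.reverse_diffs = self.reverse_diffs[:keep]

store = ReverseIncrementalStore("day1")
store.backup("day2")
store.backup("day3")
assert store.restore(2) == "day1"   # oldest version still reachable
store.purge_older_than(1)           # expire the 1st-day data
assert store.restore(1) == "day2"   # newer restores are unaffected
```

Note how `purge_older_than` only truncates the tail of the diff list; the mirror and the newer diffs are untouched, which is exactly why expiry is cheap in this model.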

rdiff-backup has some disadvantages as well:
1. Backups are not encrypted, making it unsuitable as-is for remote backups.
2. Only the reverse diffs are compressed.

The advantages of rdiff-backup make it suitable for creating local Time Machine-like backups.

The following script, set via cron to run daily, can be used to back up your home directory:

#! /bin/sh

SOURCE="/home/user/"
TARGET="/home/user/backups/rdiff-home/"

## Backup
rdiff-backup \
    --exclude-if-present .nobackup \
    --exclude-globbing-filelist /home/user/backups/home-exclude \
    --print-statistics \
    "$SOURCE" "$TARGET"

## Remove old data
rdiff-backup --remove-older-than 1M --force --print-statistics "$TARGET"

where /home/user/backups/home-exclude should look like:

+ /home/user/Desktop
+ /home/user/Documents
+ /home/user/Music
+ /home/user/Pictures
+ /home/user/Videos
+ /home/user/.vim
+ /home/user/.vimrc
+ /home/user/.ssh
+ /home/user/.gnupg
- **

This selects only the listed files and directories for backup: the final `- **` line excludes everything that isn't explicitly included.

The `--exclude-if-present .nobackup` option lets you exclude a directory simply by placing a `.nobackup` file in it. The `--force` argument when purging old backups allows removing more than one expired increment in a single run.
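The effect of `--exclude-if-present` can be illustrated with a short Python sketch of the assumed semantics (this is not rdiff-backup's code, just the idea: prune any subtree containing the marker file):

```python
import os
import tempfile

def files_to_back_up(root, marker=".nobackup"):
    """Walk root and collect files, skipping any directory with a marker file."""
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        if marker in filenames:
            dirnames[:] = []   # don't descend into this subtree
            continue
        selected.extend(os.path.join(dirpath, f) for f in filenames)
    return selected

# Demo on a throwaway tree: cache/ is marked with .nobackup and is skipped.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "cache"))
open(os.path.join(root, "keep.txt"), "w").close()
open(os.path.join(root, "cache", ".nobackup"), "w").close()
open(os.path.join(root, "cache", "big.tmp"), "w").close()
names = [os.path.relpath(p, root) for p in files_to_back_up(root)]
assert names == ["keep.txt"]
```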

Listing backup chains:

$ rdiff-backup -l ~/backups/rdiff-home/

Restoring files from the most recent backup is simple. Because rdiff-backup keeps the latest backup as a normal mirror on disk, you can simply copy the file you need out of the backup directory. To restore older files:

$ rdiff-backup --restore-as-of 10D ~/backups/rdiff-home/.vimrc restored_vimrc

Incremental WordPress Backups using Duply (Duplicity)

This post outlines how to create encrypted incremental backups for WordPress using duplicity and duply. The general method, as you will see, is pretty generic, and I’ve been using it successfully to back up Django sites and MediaWiki installations as well. You can use this method to make secure backups to almost any kind of service imaginable: ftp, sftp, Amazon S3, rsync, Rackspace Open Cloud, Ubuntu One, Google Drive, and whatever else you can think of (as long as the duplicity folks implemented it :-)). If you prefer a simpler solution, and don’t care about incremental or encrypted backups, see my Improved FTP Backup for WordPress or my WordPress Backup to Amazon S3 Script.
Continue reading Incremental WordPress Backups using Duply (Duplicity)

Gmail backup: getmail vs. OfflineIMAP

I’m currently reviewing my backup plans and decided it’s a good occasion to finally start backing up my Gmail account. Firstly, I didn’t seriously consider desktop clients as the main backup tool, as they are hard to automate. The two main options are OfflineIMAP and getmail. Both are available from Ubuntu’s repositories, so installation is easy with both, and both have good tutorials: Matt Cutts’ getmail and EnigmaCurry’s OfflineIMAP.

OfflineIMAP claims to be faster, but I haven’t really checked it (and I’m not sure how important that is, given that it runs in the background). From what I saw, configuring them is mainly a task of cut-and-paste, but getmail requires you to list every label you want to back up, which I consider a major downside. As both are able to save the mail in maildir format, it should be easy to back it up using duplicity.
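For reference, a minimal `~/.offlineimaprc` for Gmail looks roughly like this (account name, local path, and credentials are placeholders; consult the OfflineIMAP documentation for the full set of options):

```ini
[general]
accounts = Gmail

[Account Gmail]
localrepository = Local
remoterepository = Remote

[Repository Local]
type = Maildir
localfolders = ~/mail/gmail

[Repository Remote]
type = Gmail
remoteuser = you@gmail.com
remotepass = your-password
```

Note that with the `Gmail` repository type all labels are synced by default, which is the difference from getmail mentioned above.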

Conclusion: This was a short comparison, mainly to guide me in choosing the right backup for me. You may have different opinions (which, of course, I would gladly hear). I finally chose OfflineIMAP, mainly due to the labels issue.

Note on desktop clients: It seems that every decent one can be configured to work with a local maildir, so you can use them to read the backups. As I prefer Gmail’s interface, I will only use desktop clients in case I’m offline, so read-only access from a desktop client seems good enough for me.

Automated Encrypted Backups to S3 Using Duplicity

This tutorial will hopefully guide you through making automated encrypted backups to Amazon’s S3 using duplicity. It was written as a follow-up to Using Duplicity and Amazon S3 – Notes and Examples, in order to organize all the necessary information into a simple tutorial.

We’ll start by creating a simple wrapper for duplicity:

#! /usr/bin/python
import sys
import os

duplicity_bin = '/usr/bin/duplicity'

# Start from the current environment and add the credentials to it,
# so the values below always take effect.
env = dict(os.environ)
env.update({
    'AWS_ACCESS_KEY_ID':     'PUT YOUR KEY ID HERE',
    'AWS_SECRET_ACCESS_KEY': 'PUT YOUR SECRET ACCESS KEY HERE',
    'PASSPHRASE':            'PUT YOUR ENCRYPTION PASSPHRASE HERE',
})

os.execve(duplicity_bin, sys.argv, env)

Save this under duplicity-wrapper.py and chmod 0500 it so only you will be able to read and execute it.

Note: You’ll want to write down the passphrase and store it in a safe location (preferably in two separate locations). That way, in case you need to restore the backups, you won’t have useless encrypted files.

Now edit your crontab and add a line like the following:

10 1 * * 0 /path/to/duplicity-wrapper.py /path/to/folder/ s3+http://bucket-name/somefolder >> ~/log/backups.log 2>&1

This will create a weekly backup for /path/to/folder. The backup will be encrypted with whatever passphrase you’ve given in the duplicity-wrapper.py. The output of the backup process will be saved in ~/log/backups.log.

You should also run

/path/to/duplicity-wrapper.py full /path/to/folder/ s3+http://bucket-name/somefolder

in order to create full backups from time to time. You might also want to periodically check on your backups:

/path/to/duplicity-wrapper.py collection-status s3+http://bucket-name/somefolder
/path/to/duplicity-wrapper.py verify s3+http://bucket-name/somefolder /path/to/folder/

The first command lists the backup sets in the collection; the second verifies the backup against the local files.

And last but not least, in case you ever need the backups, you can restore them using:

/path/to/duplicity-wrapper.py restore s3+http://bucket-name/somefolder /path/to/folder/

Security Considerations

As I know some people will comment on storing the encryption passphrase in plain text, I will explain my reasoning. I use the above encryption to secure my files in case of data leakage from Amazon S3. To read my backups, or silently tamper with them, someone would have to get the passphrase from my machine. While that isn't impossible, it is unlikely. Furthermore, anyone with access that allows reading files from my computer doesn't need the backups; they can access the files directly.

I’ve given some thought to making the backups more secure, but it seems you always have to compromise on either automation or incremental backups. But, as I wrote, the current solution seems to me strong enough given the circumstances. Nonetheless, if you’ve got a better solution, it would be nice to hear.

Django Backup Script

This is a backup script for Django projects. It's able to automate backups of both the database and files to a local folder and a remote FTP server. It is somewhat old and has a few limitations: it supports only MySQL, and it doesn't support the new way of specifying databases introduced in Django 1.2.

It’s loosely based on my WordPress backup script and inspired the database settings auto-detection found in the newer WordPress backup script.

Usage is simple:

$ django_backup /path/to/my/proj
$ django_backup --db-only /path/to/my/proj

The latter command only backs up the database.

The script uses a few configuration variables at the top of the script to set the folder where the local backups are kept and the remote FTP server settings. The database settings are extracted directly from the settings.py of the backed-up project.
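The settings auto-detection can be sketched as follows. This is a hypothetical illustration, not the script's actual code: it loads the project's settings.py as a module and reads the old-style (pre-Django 1.2) database variables.

```python
import importlib.util
import tempfile

def read_db_settings(settings_path):
    """Load a settings.py file and extract the old-style database settings."""
    spec = importlib.util.spec_from_file_location("proj_settings", settings_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return {
        "name": mod.DATABASE_NAME,
        "user": mod.DATABASE_USER,
        "password": mod.DATABASE_PASSWORD,
        "host": mod.DATABASE_HOST or "localhost",  # empty string means localhost
    }

# Demo on a throwaway settings.py:
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("DATABASE_NAME = 'blog'\n"
            "DATABASE_USER = 'wp'\n"
            "DATABASE_PASSWORD = 'secret'\n"
            "DATABASE_HOST = ''\n")
settings = read_db_settings(f.name)
assert settings["name"] == "blog"
assert settings["host"] == "localhost"
```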
Continue reading Django Backup Script

Improved FTP Backup for WordPress

This script backs up both the database and files of a WordPress blog to a remote FTP server (while keeping a local copy). It’s an update of my WordPress Backup to FTP script. The main changes are auto-detecting database settings and better support for caching plugins (specifically WP-Cache). The new version makes it easier to back up multiple WordPress blogs to the same FTP server.
Continue reading Improved FTP Backup for WordPress

WordPress Backup to FTP

Update: A newer version of the script is available.

This script allows you to easily back up your WordPress blog to an FTP server. It’s actually a modification of my WordPress Backup to Amazon S3 Script, but instead of saving the backup to Amazon S3, it uploads it to an FTP server. Another update is that now the SQL dump includes the database creation instructions, so you don’t need to create it manually before restoring from the backup.

Although I’ve written it with WordPress in mind (to create backups of my blog), it isn’t WordPress-specific. It can be used to back up any website that consists of a MySQL database and files. I’ve successfully used it to back up a MediaWiki installation.
Continue reading WordPress Backup to FTP

Back Up a SourceForge-Hosted SVN Repository – sf-svn-backup

SourceForge urges its users to back up their projects’ code repositories. As I have several projects hosted on SourceForge, I should do it too. Making the backups isn’t complicated at all, but because it isn’t properly automated, I’ve been lazy about it.

sf-svn-backup was written to simply automate the process. The script is pretty simple to use: just pass the project name as the first argument, and the script will write the dump file to stdout.

For example:

sf-svn-backup openyahtzee > openyahtzee.dump

The project name should be the project's UNIX name (e.g. openyahtzee, not Open Yahtzee). Because the script writes the dump directly to stdout, it's easy to pipe the output through a compression program such as gzip (e.g. `sf-svn-backup openyahtzee | gzip > openyahtzee.dump.gz`) to create compressed SVN dump files.

s3backup – Easy Backups of Folders to Amazon S3

This is an updated version of my previous backup script – Backup Directories to Amazon S3 Script. The new script works much better and is safer. Unlike the old script, the new one creates the tarballs in a temporary file under /tmp and allows more control over the backup process.
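The "tarball in a temporary file" approach mentioned above can be sketched like this (an illustration of the idea, not the actual s3backup code):

```python
import os
import tarfile
import tempfile

def make_tarball(source_dir):
    # Build the archive in a temporary file under /tmp, so a failed run
    # never leaves a partial tarball at the final destination.
    tmp = tempfile.NamedTemporaryFile(suffix=".tar.gz", delete=False)
    with tarfile.open(fileobj=tmp, mode="w:gz") as tar:
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    tmp.close()
    return tmp.name  # caller uploads this file, then deletes it

# Demo: archive a throwaway directory.
src = tempfile.mkdtemp(prefix="data")
open(os.path.join(src, "file.txt"), "w").close()
path = make_tarball(src)
with tarfile.open(path) as tar:
    members = tar.getnames()
assert any(m.endswith("file.txt") for m in members)
```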

Continue reading s3backup – Easy Backups of Folders to Amazon S3