Remote backups with rsnapshot

What type of backup strategy do you employ for your solution? Do you have backups within your datacenter, or are you utilizing your hosting provider's backup infrastructure, if one is available? Both are good starting points for preparing your solution for disaster.

Now, what do you have in place for remote backups? Remote backups are critical in the event something were to happen to your primary datacenter. What if there was a fire, or a major natural disaster took out the datacenter?

Perhaps as a more common scenario, maybe your existing backup solution was having problems and you weren't aware of it. When the time comes to restore from backup, you find that the backups are corrupted and unusable. This happens more often than people think.

When you deploy a new solution, you make sure it is redundant and highly available. It is important to do the same with your backup architecture. Having an on-site backup allows you to perform a speedy recovery should something go wrong. Including an off-site backup solution allows you to plan for the worst case scenario, and also gives you the peace of mind that your data is stored outside of that datacenter, under your control.

When having solution architecture discussions with clients, I strongly encourage the following:
– Use all available backup solutions offered by the hosting provider
– Have an off-site backup solution that is managed by you or by a different provider

You can never have enough backups. Your data took weeks, months, or sometimes even years to develop and fine tune. If there are concerns about how much a remote backup solution will cost, here is a more important cost consideration: How much will it cost your business and reputation to rebuild all of your website and database content from scratch?

As you can probably tell, I am very paranoid about my clients' data. Now that I have hopefully given you some food for thought, I'll show you one inexpensive way I like to perform remote backups for smaller solutions (under 500G). Please keep in mind that there are many backup solutions available; this is just one of the options I present to my clients.

Welcome rsnapshot. Taken from their website, http://www.rsnapshot.org:

"rsnapshot is a filesystem snapshot utility for making backups of local and remote systems.  Using rsync and hard links, it is possible to keep multiple, full backups instantly available. The disk space required is just a little more than the space of one full backup, plus incrementals. 

Depending on your configuration, it is quite possible to set up in just a few minutes. Files can be restored by the users who own them, without the root user getting involved. 

There are no tapes to change, so once it's set up, your backups can happen automatically untouched by human hands. And because rsnapshot only keeps a fixed (but configurable) number of snapshots, the amount of disk space used will not continuously grow."

Many of the more common questions, such as "How do I restore a backup?", are answered in their FAQ, which is located here:
http://www.rsnapshot.org/faq.html

I strongly encourage you to review their documentation so you can decide if this software is a good fit for your solution. I like this solution because it essentially allows you to simply rsync or SCP the needed information from your remote backup server back to your production servers when you need it. There are no complicated tools required to get your critical data back onto your solution.

So, what do you need to set this up? You simply need a Linux/UNIX based computer that is running off-site, maybe even at your office if it is in a secure location, and enough hard drive space to store your backups. Installation is quick and easy, as I'll outline below. For this example, I am using a Rackspace Cloud CentOS 6 server with 2x 200G Cloud Block Storage volumes set up in a RAID 1, encrypted using LUKS, and mounted under /opt/storage01. I outlined how to set this up in an older article: http://www.stephenlang.net/2012/12/encryption-block-storage-in-the-cloud/.

My setup is a bit more elaborate, but again, I am just paranoid about data. A simple server with enough free hard drive space will work just as well. Just make sure it is in a secured location.

Procedure

Without further ado, here is how I personally set up rsnapshot. Please note that you have to enable the EPEL repository on your server to install rsnapshot with yum. You can enable the EPEL repo as follows:

CentOS 5

wget http://dl.fedoraproject.org/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
sudo rpm -Uvh epel-release-5*.rpm

CentOS 6

wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -Uvh epel-release-6*.rpm

Now, install rsnapshot:

yum install rsnapshot

The rest of our setup will take place in /etc/rsnapshot.conf. Make a quick backup of the configuration:

cp /etc/rsnapshot.conf /etc/rsnapshot.conf.orig

Modify the configuration to meet our needs:

vi /etc/rsnapshot.conf

Set the following to specify where you want your backups to be stored. I put in my preference, but you can change this to anything you like. Just be sure it's in a location that is only accessible to root, and note that fields in rsnapshot.conf must be separated by tabs, not spaces:

snapshot_root	/opt/storage02/snapshots/

Now uncomment cmd_ssh as we’ll be rsyncing over SSH:

cmd_ssh	/usr/bin/ssh

Define the backup intervals. Here is what I use:

interval        hourly  6
interval        daily   7
interval        weekly  4
interval        monthly 3

All that is left is to configure which remote servers you will be backing up. You will have to be sure that you set up SSH keys so rsnapshot can SSH into the remote servers without a passphrase.
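One straightforward way to do this is something like the following, run as root on the backup server (adjust the user and hostnames to match your environment):

# Generate a key pair on the backup server (accept the default path, leave the passphrase empty)
ssh-keygen -t rsa -b 4096

# Copy the public key to each server being backed up
ssh-copy-id [email protected]
ssh-copy-id [email protected]

# Verify passwordless login works
ssh [email protected] hostname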

As a side note, when backing up your databases, be sure to back up your MySQL dumps (or the dumps from whatever database software you are using) rather than the live data files. If you back up the live database files, you will likely find severe corruption if you ever need to restore them.
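For example, a simple nightly dump job on the database server keeps a fresh, consistent dump in /var/lib/mysqlbackup for rsnapshot to pick up. This is only a rough sketch; the mysqldump options, schedule, and credential handling (here assumed to come from /root/.my.cnf) will vary with your environment:

# On db01.example.com: dump all databases nightly so rsnapshot grabs the dump, not the live data files
crontab -e
0 2 * * * /usr/bin/mysqldump --all-databases --single-transaction | gzip > /var/lib/mysqlbackup/all-databases-$(date +\%F).sql.gz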

For our example, I am backing up 2 servers:
– db01.example.com (192.168.2.2) : /etc, /var/lib/mysqlbackup
– web01.example.com (192.168.2.3) : /etc, /var/www, and excluding /var/www/example.com/file/big_log_file.log

# db01.example.com (192.168.2.2)
backup  [email protected]:/etc/  db01.example.com/
backup  [email protected]:/var/lib/mysqlbackup/  db01.example.com/

# web01.example.com (192.168.2.3)
backup  [email protected]:/etc/  web01.example.com/
backup  [email protected]:/var/www  web01.example.com/ exclude=file/big_log_file.log

Finally, setup the cron jobs:

crontab -e
0 */4 * * * /usr/bin/rsnapshot hourly
30 8 * * * /usr/bin/rsnapshot daily
55 8 * * 1 /usr/bin/rsnapshot weekly
15 9 1 * * /usr/bin/rsnapshot monthly
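Before the first run, it is also worth having rsnapshot validate the configuration syntax, which will catch common mistakes before they bite:

/usr/bin/rsnapshot configtest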

Test to ensure everything works accordingly:

/usr/bin/rsnapshot hourly

– Check the directory to ensure your content was saved:

ls /opt/storage02/snapshots/

– Check the log file to ensure there are no errors:

less /var/log/rsnapshot
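If the runs complete successfully, you will see one directory per interval accumulating under the snapshot root, with each server's content underneath it. With the settings above, after a few runs it will look something like this:

ls /opt/storage02/snapshots/
hourly.0  hourly.1  hourly.2

ls /opt/storage02/snapshots/hourly.0/
db01.example.com  web01.example.com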

Most importantly, you must check regularly to ensure that your backup system is functioning properly. You will want to periodically test your backups, at least every 90 days, to ensure that your team is familiar with the restore process and that everything is okay with your backups. Backups are not 'set it and forget it'. Always verify your data's integrity; otherwise you may have a really bad time the day you find you need to restore from backups!
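As mentioned earlier, a restore is nothing more than copying the files you need out of the appropriate snapshot directory. For instance, to pull a single configuration file from the most recent hourly snapshot back to the web server, something like the following (run from the backup server) would do; the httpd.conf path here is purely for illustration:

scp /opt/storage02/snapshots/hourly.0/web01.example.com/etc/httpd/conf/httpd.conf [email protected]:/etc/httpd/conf/httpd.conf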

Duplicity manager

Coming up with a secure and cost effective backup solution can be a daunting task, as there are many considerations that must be taken into account. Some of the more basic items to think about are:

- Where will you store your backups?
- Is the storage medium redundant?
- How will data retention be handled?
- How will the data at rest be encrypted?

A tool that I prefer for performing encrypted, bandwidth efficient backups to a variety of remote backends such as Rackspace Cloud Files, Amazon S3, and many others is Duplicity.

Taken from Duplicity’s site (http://duplicity.nongnu.org): Duplicity backs up directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.
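To give you an idea of what duplicity looks like on its own (a generic illustration only, not part of duplicity-manager), an encrypted backup of /etc to a local target and a restore from it are as simple as:

# Encrypted full/incremental backup of /etc (swap the file:// URL for a remote backend such as s3+http://bucket or cf+http://container)
export PASSPHRASE=YOUR_PASSPHRASE
duplicity /etc file:///opt/backups/etc

# Restore the most recent backup of /etc into /tmp/etc-restore
duplicity restore file:///opt/backups/etc /tmp/etc-restore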

Duplicity-manager was created to act as a wrapper script for the tasks I commonly perform with Duplicity.

Features

- Simple invocation from cron for nightly backups.
- All in one script for performing backups, restores, and searching for content from a specific time period.
- Provides an optional menu driven interface to make backups as painless as possible.

Configuration

The currently configurable options are listed below:

# Configuring either Rackspace Cloud Files or Amazon S3 backends

# List of directories to backup
INCLUDE_LIST=( /etc /var/www /var/lib/mysqlbackup )

# GPG Passphrase for encrypting data at rest
# You can use the following to generate a decent GPG passphrase, just be sure
# to store it somewhere secure off of this server.
# < /dev/urandom tr -dc _A-Z-a-z-0-9 | head -c64
export PASSPHRASE=YOUR_PASSPHRASE

# Backup Retention 
retention_type=remove-older-than
retention_max=14D
 
# Number of full backups to keep (alternative to above)
# retention_type=remove-all-but-n-full
# retention_max=3

# Force Full Backup Every XX Days
full_backup_days=7D

# Restore Directory
restore=/tmp
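The backend itself is configured with a destination URL and credentials. The exact variable names the script expects are documented in its comments; on the duplicity side, credentials for the two backends mentioned above are typically supplied through environment variables like the following (shown here only as a reference):

# Amazon S3 (destination URLs look like s3+http://your-bucket-name)
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY

# Rackspace Cloud Files (destination URLs look like cf+http://your-container-name)
export CLOUDFILES_USERNAME=YOUR_USERNAME
export CLOUDFILES_APIKEY=YOUR_API_KEY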

Usage

./duplicity-manager.sh 

Options:

--backup:                      runs a normal backup based off retention settings
--backup-force-full:           forces a full backup
--list-files [age]:            lists the files currently stored in backups
--restore-all [age]:           restores everything to restore directory
--restore-single [age] [path]: restores a specific file/dir to restore directory
--show-backups:                lists full and incremental backups in the archive
--menu:                        user friendly menu driven interface

Examples:

duplicity-manager.sh --list-files 0D              Lists the most recent files in archive
duplicity-manager.sh --restore-all 2D             Restores everything from 2 days ago
duplicity-manager.sh --restore-single 0D var/www/ Restores /var/www from latest backup

Implementation

Download the script to the desired directory and set it to be executable:

# Linux based systems
cd /root
git clone https://github.com/stephenlang/duplicity-manager
chmod +x /root/duplicity-manager/duplicity-manager.sh

After configuring the tunables in the script (see above), create a cron job to execute the script once a day:

# Linux based systems
crontab -e
10 3 * * * /root/duplicity-manager/duplicity-manager.sh

As with any backup solution, it is critical that you test your backups often to ensure your data is recoverable in the event a restore is needed.
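Using the options above, a quick restore test might look something like the following; where exactly the restored files land under the restore directory depends on the script, so inspect /tmp afterwards:

# Restore /etc from the most recent backup into the restore directory configured above
/root/duplicity-manager/duplicity-manager.sh --restore-single 0D etc/

# Inspect the restored copy (it should appear under /tmp in this example)
ls /tmp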