Rsync backups

Created: — modified: — tags: bash

How I organized backups of my files

Update: since January 2019, I've moved to a new backup system, which is hosted on GitHub. I believe it's better for numerous reasons, so please use that!


Introduction

Until recently I was using tar to back up my files. It worked well, especially for syncing backups remotely, since each run produced only 2-3 new files, which could easily be copied to remote destinations. But this method had a few caveats:

So recently I stumbled upon a different idea: use rsync's --link-dest command-line parameter to keep folders with different versions of files (snapshots of all backed-up files).

Using hard links to keep the same file in different folders reduces the space used while keeping exact snapshots - to see what a file looked like yesterday, I just need to look in yesterday's folder!

Also, the last folder always holds the most recent backup, the folder before it holds the snapshot just before that, and by deleting the oldest folders I can free up space as needed.
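To make the hard-link idea concrete, here is a minimal sketch that does not need rsync at all. It assumes GNU cp and stat; the folder and file names are made up for the demo. It creates one "snapshot", makes a hard-linked copy of it, and shows that both names share one inode, so the data is stored only once.

```shell
#!/bin/bash
# Hypothetical demo folders, created in a temporary location
work="$(mktemp -d)"
mkdir "$work/2019-01-01"
echo "hello" >"$work/2019-01-01/notes.txt"

# Create a second "snapshot" whose files are hard links to the first one
cp -al "$work/2019-01-01" "$work/2019-01-02"

# Both names point at the same inode, so the content occupies space once
inode_old="$(stat -c %i "$work/2019-01-01/notes.txt")"
inode_new="$(stat -c %i "$work/2019-01-02/notes.txt")"
link_count="$(stat -c %h "$work/2019-01-02/notes.txt")"
echo "inodes: $inode_old $inode_new, link count: $link_count"

# Deleting the older snapshot leaves the newer one intact
rm -rf "$work/2019-01-01"
remaining="$(cat "$work/2019-01-02/notes.txt")"
echo "after deleting the old snapshot: $remaining"
```

This is why deleting the oldest folders is safe: a file's data is only freed when the last folder that links to it is gone.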

A good introduction to using rsync --link-dest for backups is available here (archived version), and a more advanced version (which detects folder renames) - here (archived), with the resulting script available here on GitHub.

Basic script

So here is the most basic version:

#!/bin/bash

#remember: no slashes at the end of $DST!
SRC="/home /etc"
DST=/backup-local
TODAY="$DST/$(date +%Y-%m-%d_%H-%M-%S)"

LAST="$(ls -d $DST/*/ 2>/dev/null | tail -n 1)"
test -z "$LAST" && LAST="$(mktemp -d)"

#Main sync operation
rsync -a --hard-links --no-inc-recursive --human-readable --stats --verbose --link-dest="$LAST" $SRC "$TODAY"

It has four options:

And then it runs rsync.

Pretty easy, huh?

Note that I didn't put $SRC in quotes because I really want it to be expanded into two arguments if it contains a space (say, "/home /etc"), and I did quote "$DST" because I want it to be one argument even if it contains spaces.
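The quoting behaviour is easy to check in isolation. The sketch below uses a helper, count_args, which is just a name I made up for the demo. The array at the end is an alternative, not what the script above does, and would be worth considering if a source path itself contains spaces.

```shell
#!/bin/bash
# Hypothetical helper: prints how many arguments it received
count_args() { echo "$#"; }

SRC="/home /etc"

unquoted="$(count_args $SRC)"     # word splitting: two arguments
quoted="$(count_args "$SRC")"     # no splitting: one argument
echo "unquoted: $unquoted, quoted: $quoted"

# Alternative (an assumption, not used in the script): a bash array keeps
# every element intact, even one containing a space
SRC_LIST=("/home" "/etc" "/srv/my files")
as_array="$(count_args "${SRC_LIST[@]}")"
echo "array: $as_array"
```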

Check if there is something to do

Let's avoid creating folders if no files were changed.

To do it, we first run rsync in --dry-run mode, and check if it transfers any files.

Also, let's move the most-used rsync options to a separate variable. The relevant part of the script now looks like this:

DRY_RUN_OUTPUT="/tmp/dry-run"
RSYNC="rsync -a --hard-links --no-inc-recursive"

#Check if there is something to do
$RSYNC --dry-run --stats $SRC $LAST >$DRY_RUN_OUTPUT
grep -q 'Number of regular files transferred: 0' $DRY_RUN_OUTPUT && exit 0

#Main sync operation
$RSYNC --human-readable --stats --verbose --link-dest="$LAST" $SRC "$TODAY"
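The "nothing to do" test can be tried without running rsync. In this sketch the two stats lines are typed by hand as sample input (they are not captured from a real run), and nothing_to_do is a function name I introduced for the demo.

```shell
#!/bin/bash
DRY_RUN_OUTPUT="$(mktemp)"

# Hypothetical helper wrapping the same grep as in the script
nothing_to_do() {
    grep -q 'Number of regular files transferred: 0' "$1"
}

# Hand-written sample: a dry run that found no changes
cat >"$DRY_RUN_OUTPUT" <<'EOF'
Number of files: 10 (reg: 8, dir: 2)
Number of regular files transferred: 0
EOF
if nothing_to_do "$DRY_RUN_OUTPUT"; then idle=yes; else idle=no; fi

# Hand-written sample: a dry run that would transfer three files
cat >"$DRY_RUN_OUTPUT" <<'EOF'
Number of files: 10 (reg: 8, dir: 2)
Number of regular files transferred: 3
EOF
if nothing_to_do "$DRY_RUN_OUTPUT"; then busy=no; else busy=yes; fi

echo "first sample idle: $idle, second sample busy: $busy"
rm -f "$DRY_RUN_OUTPUT"
```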

Excluding files

Probably you want to exclude some files from backup (say, private SSH keys, cache, trash, etc). To do this, just create an exclude file with content like this:

*backup*
*cache*
*Cache*
home/*/.local/share/Trash*
home/*/.ssh/id_rsa

and reference it in the RSYNC variable like this:

RSYNC="rsync -a --hard-links --no-inc-recursive --exclude-from=/backup-local/.exclude"

Error checking

Checking for errors is good.

Instead of writing directly to the $TODAY folder, let's create a temporary $TODAY.wip folder, and rename it to $TODAY only if the main rsync run finished successfully.

We also need the LAST variable to ignore such wip folders:

LAST="$(ls -d $DST/*/ 2>/dev/null | grep -v wip | tail -n 1)"

The main rsync command will then look like this:

$RSYNC --human-readable --stats --verbose --link-dest="$LAST" $SRC "$TODAY.wip"
STATUS=$?
if test $STATUS -ne 0; then
       echo something went wrong
       exit $STATUS
fi
mv "$TODAY.wip" "$TODAY"
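Here is a small dry-land sketch of the same pattern. The folder names are invented, and do_backup is a stand-in for the real rsync call. I also tightened the filter to match only a trailing ".wip/", which is my assumption about what is wanted, so that a "wip" elsewhere in the path does not hide a good snapshot.

```shell
#!/bin/bash
DST="$(mktemp -d)"
# Two finished snapshots and one left over from a failed run (made-up names)
mkdir "$DST/2019-01-01_10-00-00" "$DST/2019-01-02_10-00-00" \
      "$DST/2019-01-03_10-00-00.wip"

# Names sort chronologically, so the last finished folder is the newest one
LAST="$(ls -d "$DST"/*/ 2>/dev/null | grep -v '\.wip/$' | tail -n 1)"
echo "last finished snapshot: $LAST"

TODAY="$DST/2019-01-04_10-00-00"

# Stand-in for the main rsync operation
do_backup() { mkdir -p "$1" && echo data >"$1/file.txt"; }

# Publish the snapshot under its final name only after a successful run
if do_backup "$TODAY.wip"; then
    mv "$TODAY.wip" "$TODAY"
    result=published
else
    result=failed
fi
echo "result: $result"
```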

Checking disk space

Let's make the script always try to keep at least 10% of disk space free (both in bytes and in inodes). Also, while cleaning up, it will leave at least the last two snapshots (if the disk doesn't have enough free space even with only the last two folders, it will stop). Let's add these variables to the config section of our script:

EXTRA_PERCENT=10
KEEP_DIRS=2

And this block before the main rsync operation:

TRANSFER_SIZE=$(sed '/Total transferred file size/!d;s/[^0-9]//g' $DRY_RUN_OUTPUT)
TOTAL_SPACE=$(df -B1 --output=size $DST | sed '1d')
let EXTRA_FREE_SPACE=TOTAL_SPACE*EXTRA_PERCENT/100
let FREE_SPACE_NEEDED=TRANSFER_SIZE+EXTRA_FREE_SPACE

CREATED_DIRS=$(sed '/Number of files/!d;s/.* dir: \([^ ]*\).*/\1/;s/[^0-9]//g' $DRY_RUN_OUTPUT)
CREATED_FILES=$(sed '/Number of regular files transferred/!d;s/[^0-9]//g' $DRY_RUN_OUTPUT)
TOTAL_INODES=$(df -B1 --output=itotal $DST | sed '1d')
let EXTRA_INODES=TOTAL_INODES*EXTRA_PERCENT/100
let INODES_NEEDED=CREATED_DIRS+CREATED_FILES+EXTRA_INODES

function check_space {
    FREE_SPACE_AVAILABLE=$(df -B1 --output=avail $DST | sed '1d')
    INODES_AVAILABLE=$(df --output=iavail $DST | sed '1d')
    [ $FREE_SPACE_AVAILABLE -lt $FREE_SPACE_NEEDED -o $INODES_AVAILABLE -lt $INODES_NEEDED ]
    # return status of 0 (success) means "need to clear space"
}

function check_dirs {
    HISTORY_DIRS=$(ls -d $DST/*/ | wc -l)
    [ $HISTORY_DIRS -gt $KEEP_DIRS ]
    # return status of 0 (success) means we can delete one
}

while check_space; do
    if check_dirs; then
        OLDEST=$(ls $DST/ | head -n 1)
        rm -rf $DST/$OLDEST
    else
        echo "can't clean up"
        exit 1
    fi
done
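To see what the sed expressions and the arithmetic produce, here is a sketch that feeds them a hand-written sample of rsync stats (not real output) and a made-up disk size instead of calling df. need_cleanup is a hypothetical, simplified version of check_space that takes the available bytes as an argument.

```shell
#!/bin/bash
DRY_RUN_OUTPUT="$(mktemp)"
# Hand-written sample of the stats lines the script parses
cat >"$DRY_RUN_OUTPUT" <<'EOF'
Number of files: 1,234 (reg: 1,000, dir: 234)
Number of regular files transferred: 12
Total transferred file size: 1,500,000 bytes
EOF

EXTRA_PERCENT=10
TOTAL_SPACE=200000000    # made-up disk size in bytes, instead of df output

# Same sed expressions as above: keep the matching line, then only digits
TRANSFER_SIZE=$(sed '/Total transferred file size/!d;s/[^0-9]//g' "$DRY_RUN_OUTPUT")
CREATED_DIRS=$(sed '/Number of files/!d;s/.* dir: \([^ ]*\).*/\1/;s/[^0-9]//g' "$DRY_RUN_OUTPUT")
CREATED_FILES=$(sed '/Number of regular files transferred/!d;s/[^0-9]//g' "$DRY_RUN_OUTPUT")

EXTRA_FREE_SPACE=$(( TOTAL_SPACE * EXTRA_PERCENT / 100 ))
FREE_SPACE_NEEDED=$(( TRANSFER_SIZE + EXTRA_FREE_SPACE ))
echo "transfer=$TRANSFER_SIZE dirs=$CREATED_DIRS files=$CREATED_FILES needed=$FREE_SPACE_NEEDED"

# Exit status 0 means "need to clear space", so it can drive a while loop
need_cleanup() { [ "$1" -lt "$FREE_SPACE_NEEDED" ]; }

if need_cleanup 15000000; then tight=cleanup; else tight=ok; fi
if need_cleanup 50000000; then roomy=cleanup; else roomy=ok; fi
echo "15 MB free: $tight, 50 MB free: $roomy"
rm -f "$DRY_RUN_OUTPUT"
```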

Waiting for other copy of same script to finish

If you're running this script from a cron job, you might want to avoid situations where a new backup job starts while the previous one hasn't finished.

Checking for the .wip extension in the last folder name is not good - it might be there both while a job is still in progress and after a job finished with an error. Looking at the log file is not an option for the same reason.

We should use file locks:

FLOCK_FILE="$DST/.lock"

exec 200>"$FLOCK_FILE"
flock -n 200 || exit 200
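The locking can be tested in a single terminal. In this sketch (assuming the flock utility from util-linux) the lock file is a temporary file, and a subshell that opens the file again plays the role of the second copy of the script.

```shell
#!/bin/bash
FLOCK_FILE="$(mktemp)"

# First "instance": open the lock file on fd 200 and take the lock
exec 200>"$FLOCK_FILE"
if flock -n 200; then first=acquired; else first=busy; fi

# Second "instance": a subshell opening the same file must be refused
if ( exec 201>"$FLOCK_FILE"; flock -n 201 ); then second=acquired; else second=busy; fi

# Closing the descriptor (or exiting the script) releases the lock
exec 200>&-
if ( exec 201>"$FLOCK_FILE"; flock -n 201 ); then third=acquired; else third=busy; fi

echo "first: $first, second: $second, after release: $third"
rm -f "$FLOCK_FILE"
```

The nice property is that the lock goes away by itself when the process dies, so a crashed run cannot block the next one the way a stale marker file would.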

Limiting script running time

The above is good, but if the script gets "stuck", no more backups will run. That's not good. To fix this, we'll run a small function in parallel with our main script, which will kill it after, say, 30 minutes.

TIMEOUT=30m

function timeout_monitor() {
    sleep "$1" &
    pid="$!"
    trap "kill $pid; exit" SIGTERM
    wait "$pid"
    echo "Had to kill due to timeout. Please cleanup process group $2. Bye"
    kill "$2"
}
timeout_monitor "$TIMEOUT" "$$" &
TIMEOUT_MONITOR_PID=$!
trap "kill $TIMEOUT_MONITOR_PID" EXIT

This function is greatly inspired by this answer on Stack Overflow.

Note, however, that we use trap instead of putting the kill command at the end of the script (last line in the code above), because there are many "exit points" in our script.

Also note that we kill the sleep process inside the timeout_monitor function (also with trap) to clean it up, too. During normal execution the main script finishes long before the timeout expires, so it's better to clean up. If something goes wrong, it's more acceptable to leave hanging processes (note the echo command). We could consider killing the whole process group, but that usually means kill might kill itself, and I haven't found a way to change the process group from within the bash script itself.
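For limiting a single command rather than the whole script, the timeout utility from GNU coreutils is a simpler tool (the big script further down uses it for individual rsync calls). A minimal sketch, with sleep standing in for a stuck command and one second standing in for the real limit:

```shell
#!/bin/bash
# "sleep 5" stands in for a stuck command; the limit is 1 second
status=0
timeout 1 sleep 5 || status=$?
echo "stuck command exit status: $status"

# A command that finishes in time keeps its own exit status
ok_status=0
timeout 5 true || ok_status=$?
echo "fast command exit status: $ok_status"
```

GNU timeout reports exit status 124 when it had to kill the command, which makes a timeout easy to tell apart from an ordinary failure.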

First run

Running the script for the first time (when the destination directory is empty) is a rare case, so in order to avoid running the script against an empty directory by mistake, let's add an option which forbids it by default. Also, during the first run we should ignore the timeout (since the first backup usually takes a lot of time, and that's normal) and delete the temporary empty folder at the end of the script.

So we replace the test -z "$LAST" && LAST="$(mktemp -d)" line with this code:

ALLOW_FIRST_RUN=0

LAST="$(ls -d "$DST"/*/ 2>/dev/null | grep -v wip | tail -n 1)"
if [ -z "$LAST" ]; then
    if [ "$ALLOW_FIRST_RUN" -eq 1 ]; then
        LAST="$(mktemp -d)"
        kill "$TIMEOUT_MONITOR_PID"
        trap "rmdir $LAST" EXIT
    else
        echo "First run not allowed, but [$DST] is empty. bye"
        exit 3;
    fi
fi

Note that the trap ... EXIT command here overwrites the previously defined one.
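The overwriting is easy to demonstrate. The sketch below runs two tiny throwaway scripts through bash -c: in the first, the second trap replaces the first one. The second shows one possible way (my suggestion, not what the script does) to keep several cleanup steps, by putting them into a single function.

```shell
#!/bin/bash
# Only the handler registered last runs when the shell exits
replaced="$(bash -c 'trap "echo first" EXIT; trap "echo second" EXIT')"
echo "with two traps: $replaced"

# Possible alternative: one cleanup function that performs both steps
combined="$(bash -c '
cleanup() { echo first; echo second; }
trap cleanup EXIT
')"
echo "with one cleanup function:"
echo "$combined"
```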

Redirect output

If you're running this script from a cron job, you might want to redirect all output (both stdout and stderr) to a file, and "important" messages (from your script) - to stdout (so cron would send you an email if something goes wrong).

But when running the script interactively, you might want to see all output on your screen.

To do this, add this somewhere at the top (before first echo command):

# redirect output to a file if running non-interactively
if test -t 1; then
    exec 3>&1 &> >(tee "$TODAY.log")
else
    exec 5>&1
    exec 1>"$TODAY.log" 2>&1 3> >(sed "1s|^|$DST:\n|" >&5)

fi

and then, all "error" messages can be redirected to stream 3 in order to be emailed when the script is run via cron.

sed also adds a header - this is useful if one cron task is used to run several backup tasks.
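The descriptor juggling is the confusing part, so here is a stripped-down sketch of the non-interactive branch. run_job and the messages are invented for the demo, and the command substitution plays the role of cron capturing stdout.

```shell
#!/bin/bash
log_file="$(mktemp)"

# Hypothetical job: routine output goes to the log, important output to fd 3
run_job() {
    exec 3>&1                  # fd 3 remembers the original stdout
    exec 1>"$log_file" 2>&1    # from now on stdout and stderr go to the log
    echo "routine message"
    echo "routine error" >&2
    echo "important message" >&3
}

# The command substitution stands in for cron collecting the job's stdout
visible="$(run_job)"
logged="$(cat "$log_file")"

echo "cron would mail: $visible"
echo "log file contains:"
echo "$logged"
rm -f "$log_file"
```

The order matters: fd 3 has to be duplicated from stdout before stdout is pointed at the log file.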

Listing all changes

Running rsync with --verbose parameter shows us copied (added or modified) files, but not deleted. To have a better analysis of changes in your log file, add this command at the very end of the script:

$RSYNC --dry-run --delete --itemize-changes "$TODAY/" "$LAST"

Check if there is something to do - different way

In some cases (when the destination is on a slow and noisy disk, for example), we would rather avoid touching it when there is nothing to do. So instead of running rsync in --dry-run mode, we can run it in "list" mode, in which it lists all the files and their properties, and compare this run with the previous one:

if test ! -z "$FILE_LIST"; then
    $RSYNC $SRC >"$FILE_LIST.new"
    diff -q "$FILE_LIST" "$FILE_LIST.new" &>/dev/null && exit 0
    mv "$FILE_LIST.new" "$FILE_LIST"
fi

Note that we still need to run rsync --dry-run to estimate the size of the transfer (and the amount of disk space needed).

This is a purely optional step - it removes one rsync operation from the target device at the cost of an extra rsync operation on the source. You choose what's better!
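The compare-with-previous-listing logic can be sketched without rsync. Here GNU find is a stand-in for rsync's list mode (so the listing format differs from the real thing), and list_files and has_changes are names I made up for the demo.

```shell
#!/bin/bash
src="$(mktemp -d)"
file_list="$(mktemp)"          # starts empty, like before the very first run
echo one >"$src/a.txt"

# Stand-in for "$RSYNC $SRC": print path, size and mtime of every file
list_files() { (cd "$1" && find . -type f -printf '%p %s %T@\n' | sort); }

# Exit status 0 means the listing differs from the one saved last time
has_changes() {
    list_files "$src" >"$file_list.new"
    if diff -q "$file_list" "$file_list.new" >/dev/null 2>&1; then
        rm -f "$file_list.new"
        return 1               # identical listings: nothing to do
    fi
    mv "$file_list.new" "$file_list"
    return 0
}

if has_changes; then first=changed; else first=same; fi
if has_changes; then second=changed; else second=same; fi
echo two >>"$src/a.txt"        # the file grows, so its listed size changes
if has_changes; then third=changed; else third=same; fi

echo "first: $first, second: $second, after edit: $third"
```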

Cleaning up in multiple folders

If you run this script every hour (or even every ten minutes), then you will soon start deleting some (older) backups, and might want to keep daily, weekly, and monthly versions with a simple command:

cp -al $LAST "/backup-daily/$(date +%Y-%m-%d)"

Or maybe you back up several remote machines to one "backup server".

Anyway, you might end up in a situation where you want to clean up several directories, not only the one you copy to.

Most probably, you want to keep the same number of snapshots in all of them.

To do this, add these options at the top of the script:

ALLOW_DELETE=1
CLEAN_DIRS="/backup/*"
PRESERVE_DIRS_LIST="$DST/.preserve"
DELETE_DIRS_LIST="$DST/.delete"
TO_SORT_LIST="$DST/.to_sort"
DELETE_LOG=/var/log/backup-delete.log

and replace the simple while check_space; do loop from above with this behemoth:

while check_space; do
    echo need to clean up
    if [ "$ALLOW_DELETE" -eq 0 ]; then
        echo 'Need to clean up, but deleting files forbidden. Clean up manually, bye'>&3
        exit 11
    fi
    # find a dir to delete
    for i in $CLEAN_DIRS; do
        ls -d "$i"/*/ | fgrep -v wip | tail -n $KEEP_DIRS >$PRESERVE_DIRS_LIST
        ls -d "$i"/*/ | fgrep -v -f $PRESERVE_DIRS_LIST >$DELETE_DIRS_LIST
        test -s $DELETE_DIRS_LIST || continue # print nothing if there's nothing to delete
        echo -n "$(wc -l <$DELETE_DIRS_LIST) " # note space at the end
        cat $DELETE_DIRS_LIST | head -n 1
    done >$TO_SORT_LIST
    OLDEST_FILE="$(cat $TO_SORT_LIST | sort -n | tail -n 1 | sed 's/^[0-9]* *//')"
    if test -z "$OLDEST_FILE"; then
        echo 'Deleted all i could, but still not enough space. Buy more disks, bye'>&3
        exit 12
    fi
    # some logging before actual deleting
    echo rm -rf "$OLDEST_FILE"
    echo === $TODAY >>$DELETE_LOG
    cat $TO_SORT_LIST | sort -n >>$DELETE_LOG
    echo rm -rf "$OLDEST_FILE" >>$DELETE_LOG
    rm -rf "$OLDEST_FILE"
    if test -d "$OLDEST_FILE"; then
        echo "Could not delete [$OLDEST_FILE]. Check permissions, bye">&3
        exit 13
    fi
done

Note that this also adds a variable $ALLOW_DELETE which can be set to 0 to forbid deletions. This might be useful if you know that destination should not run out of space - in this case deletion would be considered an error.

Also, it doesn't delete the *.log files which are usually kept in the same directory - you can delete them periodically by running this script:

for f in /backup/*/*.log; do
    [ -d "${f%.log}" ] || rm "$f"
done

It will delete all *.log files for which the corresponding directory does not exist.

Maybe it's not the most elegant way, but deleting log files in the same loop as directories might delete the very log file we're writing to!

Also, the cleanup loop checks that the directory was indeed deleted - otherwise, you might end up in an infinite loop. Nasty stuff, happened to me once.
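The log cleanup relies on the ${f%.log} expansion, which strips the .log suffix to get the name of the matching snapshot folder. A small sketch in a temporary tree (the host1 folder and dates are made up), with one extra guard for the case where the glob matches nothing:

```shell
#!/bin/bash
backup_root="$(mktemp -d)"
# One snapshot that still exists, plus logs for it and for a deleted one
mkdir -p "$backup_root/host1/2019-01-02"
touch "$backup_root/host1/2019-01-01.log" "$backup_root/host1/2019-01-02.log"

for f in "$backup_root"/*/*.log; do
    [ -e "$f" ] || continue          # glob matched nothing: skip the pattern
    # ${f%.log} is the path without the suffix, i.e. the snapshot folder
    [ -d "${f%.log}" ] || rm "$f"
done

ls "$backup_root/host1"
```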

Cygwin considerations

If you happen to have Windows, you most probably want to install cygwin on it, in order to use SSH, rsync, and other good things.

But when using Windows systems, you should be aware of some limitations:

This will be shown in the big script below.

Resulting script (big)

With some debug output added.

#!/bin/bash

test -z "$1" || SRC="$1"
test -z "$2" || DST="$2"

test -z "$SRC" && { echo 'SRC not defined'; exit 1; }
test -d "$DST" || { echo 'DST is not a dir'; exit 2; }
DST="${DST%/}" # remove trailing slash

test -z "$ALLOW_FIRST_RUN" && ALLOW_FIRST_RUN=0

test -z "$ALLOW_DELETE" && ALLOW_DELETE=1
test -z "$EXTRA_PERCENT" && EXTRA_PERCENT=10
test -z "$CLEAN_DIRS" && CLEAN_DIRS="$DST"
test -z "$KEEP_DIRS" && KEEP_DIRS=10

test -z "$NTFS_DST" && NTFS_DST=0
test -z "$IGNORE_23" && IGNORE_23=0

test -z "$TIMEOUT" && TIMEOUT=30m
test -z "$TIMEOUT_SMALL" && TIMEOUT_SMALL=10m

FLOCK_FILE="$DST/.lock"
#FILE_LIST=
DRY_RUN_OUTPUT="$DST/.dry-run"
PRESERVE_DIRS_LIST="$DST/.preserve"
DELETE_DIRS_LIST="$DST/.delete"
TO_SORT_LIST="$DST/.to_sort"
DELETE_LOG=/var/log/backup-delete.log

test -z "$DATE_FORMAT" && DATE_FORMAT="%F_%T"
test -z "$TODAY" && TODAY="$DST/$(date "+$DATE_FORMAT")"
test -z "$RSYNC" && RSYNC="rsync -a --no-inc-recursive --hard-links --fake-super"

# http://stackoverflow.com/a/28930451
function timeout_monitor() {
    sleep "$1" &
    pid="$!"
    trap "kill $pid; exit" SIGTERM
    wait "$pid"
    echo "Had to kill due to timeout. Please cleanup process group $2. Bye"
    kill "$2"
}
timeout_monitor "$TIMEOUT" "$$" &
TIMEOUT_MONITOR_PID=$!
trap "kill $TIMEOUT_MONITOR_PID" EXIT

test "$NTFS_DST" -eq 1 && RSYNC="$RSYNC --no-perms --no-owner --no-group"

function NOT_CYGWIN() {
    test "$(uname -o)" != "Cygwin"
}

IS_FIRST_RUN=0
LAST="$(ls -d "$DST"/*/ 2>/dev/null | grep -v wip | tail -n 1)"
if [ -z "$LAST" ]; then
    if [ "$ALLOW_FIRST_RUN" -eq 1 ]; then
        LAST="$(mktemp -d)"
        kill "$TIMEOUT_MONITOR_PID"
        trap "rmdir $LAST" EXIT
    else
        echo "First run not allowed, but [$DST] is empty. bye"
        exit 3;
    fi
fi

# check that no other copy of this script is running
exec 200>"$FLOCK_FILE"
flock -n 200 || exit 200


# check if there was any change in the source
# this is useful when we don't want to touch $DST
# (for example, when it's on slow HDD)
if test ! -z "$FILE_LIST"; then
    timeout "$TIMEOUT_SMALL" $RSYNC "$SRC" >"$FILE_LIST.new"
    diff -q "$FILE_LIST" "$FILE_LIST.new" &>/dev/null && exit 0
    mv "$FILE_LIST.new" "$FILE_LIST"
fi

# check if there is something to do
# Note that it's still needed even if we have a check above,
# because there are commands below that are using $DRY_RUN_OUTPUT
timeout "$TIMEOUT_SMALL" $RSYNC --dry-run --stats "$SRC" "$LAST" >$DRY_RUN_OUTPUT
grep -q 'Number of regular files transferred: 0' $DRY_RUN_OUTPUT && exit 0

# redirect output to a file if running non-interactively
if test -t 1; then
    exec 1> >(tee "$TODAY.log") 2>&1 3>&1
else
    exec 5>&1 # save original stdout, which gets emailed by cron
    #exec 1>"$TODAY.log" 2>&1 3> >(tee >(sed "1s|^|$DST:\n|" >&5))
    exec 1>"$TODAY.log" 2>&1 3> >(sed "1s|^|$DST:\n|" >&5)
fi

# start logging
echo "started $TODAY"

# TODO: check that target device has enough space at all

echo INFO: check disk space
TRANSFER_SIZE=$(sed '/Total transferred file size/!d;s/[^0-9]//g' $DRY_RUN_OUTPUT)
TOTAL_SPACE=$(df -B1 --output=size "$DST" | sed '1d')
let EXTRA_FREE_SPACE=TOTAL_SPACE*EXTRA_PERCENT/100
let FREE_SPACE_NEEDED=TRANSFER_SIZE+EXTRA_FREE_SPACE

echo $TRANSFER_SIZE bytes in new files
echo $EXTRA_FREE_SPACE extra free bytes
echo $FREE_SPACE_NEEDED total space needed

if NOT_CYGWIN; then # no inode info on Cygwin, sorry
    CREATED_DIRS=$(sed '/Number of files/!d;s/.* dir: \([^ ]*\).*/\1/;s/[^0-9]//g' $DRY_RUN_OUTPUT)
    CREATED_FILES=$(sed '/Number of regular files transferred/!d;s/[^0-9]//g' $DRY_RUN_OUTPUT)
    TOTAL_INODES=$(df -B1 --output=itotal "$DST" | sed '1d')
    let EXTRA_INODES=TOTAL_INODES*EXTRA_PERCENT/100
    let INODES_NEEDED=CREATED_DIRS+CREATED_FILES+EXTRA_INODES

    echo $CREATED_DIRS dirs created
    echo $CREATED_FILES files created
    echo $EXTRA_INODES extra inodes
    echo $INODES_NEEDED total inodes needed
fi

# return status of 0 (success) means "need to clear space"
function check_space {
    FREE_SPACE_AVAILABLE=$(df -B1 --output=avail "$DST" | sed '1d')
    INODES_AVAILABLE=$(df --output=iavail "$DST" | sed '1d')
    echo $FREE_SPACE_AVAILABLE space available
    echo $INODES_AVAILABLE inodes available
    if NOT_CYGWIN; then
        test $FREE_SPACE_AVAILABLE -lt $FREE_SPACE_NEEDED -o $INODES_AVAILABLE -lt $INODES_NEEDED
        return $?
    else
        test $FREE_SPACE_AVAILABLE -lt $FREE_SPACE_NEEDED
        return $?
    fi
}

while check_space; do
    echo need to clean up
    if [ "$ALLOW_DELETE" -eq 0 ]; then
        echo 'Need to clean up, but deleting files forbidden. Clean up manually, bye'>&3
        exit 11
    fi
    # find a dir to delete
    for i in $CLEAN_DIRS; do
        ls -d "$i"/*/ | fgrep -v wip | tail -n $KEEP_DIRS >$PRESERVE_DIRS_LIST
        ls -d "$i"/*/ | fgrep -v -f $PRESERVE_DIRS_LIST >$DELETE_DIRS_LIST
        test -s $DELETE_DIRS_LIST || continue # print nothing if there's nothing to delete
        echo -n "$(wc -l <$DELETE_DIRS_LIST) " # note space at the end
        cat $DELETE_DIRS_LIST | head -n 1
    done >$TO_SORT_LIST
    OLDEST_FILE="$(cat $TO_SORT_LIST | sort -n | tail -n 1 | sed 's/^[0-9]* *//')"
    if test -z "$OLDEST_FILE"; then
        echo 'Deleted all i could, but still not enough space. Buy more disks, bye'>&3
        exit 12
    fi
    # some logging before actual deleting
    echo rm -rf "$OLDEST_FILE"
    echo === $TODAY >>$DELETE_LOG
    cat $TO_SORT_LIST | sort -n >>$DELETE_LOG
    echo rm -rf "$OLDEST_FILE" >>$DELETE_LOG
    rm -rf "$OLDEST_FILE"
    if test -d "$OLDEST_FILE"; then
        echo "Could not delete [$OLDEST_FILE]. Check permissions, bye">&3
        exit 13
    fi
done

echo INFO: main sync operation
$RSYNC --human-readable --stats --link-dest="$LAST" "$SRC" "$TODAY.wip"

STATUS=$?
echo INFO: exit if something is wrong
if test ! \( $STATUS -eq 0 -o \( $STATUS -eq 23 -a $IGNORE_23 -eq 1 \) \); then
    echo "something went wrong. Please see log file for details. bye" >&3
    echo "$TODAY.log" >&3
    exit $STATUS
fi

sleep 5
mv "$TODAY.wip" "$TODAY"

echo INFO: list of changes
# note: unlike "main sync operation" above, this will list deleted files, too
# (--delete is needed for that; with --dry-run nothing is actually removed)
timeout "$TIMEOUT_SMALL" $RSYNC --dry-run --delete --itemize-changes "$TODAY/" "$LAST"

echo "INFO: finish at $(date "+$DATE_FORMAT")"

This script is supposed to be universal and to be called by other scripts like this:

/usr/local/bin/backup.sh /2backup/ /backup-local
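Since the big script only assigns a default when a variable is empty, a calling script can override any setting through the environment. The sketch below shows the same defaulting with the shorter ${VAR:=default} form, which is my rewrite of the test -z lines, not something the script itself uses. The values are made up.

```shell
#!/bin/bash
# Pretend the caller exported EXTRA_PERCENT but not KEEP_DIRS
unset KEEP_DIRS
EXTRA_PERCENT=25

# Assign the default only when the variable is unset or empty
: "${KEEP_DIRS:=10}"
: "${EXTRA_PERCENT:=10}"

echo "KEEP_DIRS=$KEEP_DIRS EXTRA_PERCENT=$EXTRA_PERCENT"
```

A wrapper could then call the script with something like KEEP_DIRS=30 ALLOW_DELETE=0 /usr/local/bin/backup.sh /2backup/ /backup-local, where the values are only an illustration.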

Remote backups

Thanks to rsync magic, remote backups are as easy as local ones, with a few considerations:

Alternative view

One might want to have an alternative view on the backup tree:

To get them, one could use this script (it creates symlinks, so no space considerations needed):

LAST="$(ls -d $DST/*/ 2>/dev/null | grep -v wip | tail -n 1)"
FILES_LIST="/tmp/interesting-files"

rm -rf /backup-older
rm -rf /backup-prev

time find "$LAST" -type f -printf '%P\n' | sed 's/\([ \o47()"&;\\]\)/\\\1/g;s/\o15/\\r/g' | sed 's!\(.*\)!ls -U -i /backup-local/*/\1!' | sh | sed 's/ /$/g;s!\([0-9$]*\)$/backup-local/\([0-9_-]*\)\(.*\)/\([!/]*\)!\2 \1 \3 \4!' | uniq -f 1 | uniq -f 2 -D >"$FILES_LIST"

#/backup-older
cat "$FILES_LIST" | sed 's!\([^ ]*\) \([^ ]*\) \([^ ]*\) \(.*\)!mkdir -p "/backup-older\3/\4"; ln -s "/backup-local/\1\3/\4" "/backup-older\3/\4/\1-\4"!;s/\$/ /g' | sh

#/backup-prev
cat "$FILES_LIST" | tac | uniq -f 2 | sed 's!\([^ ]*\) \([^ ]*\) \([^ ]*\) \(.*\)!mkdir -p "/backup-prev\3"; ln -s "/backup-local/\1\3/\4" "/backup-prev\3/\4"!;s/\$/ /g' | sh

One might think about using rsync --backup-dir instead, but it has a slight drawback of forever keeping deleted files.

History

This post will be updated periodically. To be informed on updates, please write me (email address at the bottom of this page) a short free-form message.