Andrew Wilkinson

Random Ramblings on Programming

Rsync backups to Amazon S3

with 17 comments

Having recently got married I wanted to make sure all the photos taken at the event are safely stored for posterity. I thought I’d take the opportunity to make sure that all the rest of my photos are safely backed up too, and that any new ones are backed up without me needing to do anything.

One of the simplest places to keep your backups is Amazon S3. There is essentially an unlimited amount of space available, and it’s pretty cheap. rsync is a great tool to use when backing up because it only copies files, and parts of files, that have changed, so it keeps the amount of data transferred to a minimum. With S3 you not only pay for the data stored but also for the data transferred, so rsync is perfect. So, how do we use rsync to transfer data to S3?

I won’t go through setting up an Amazon S3 account or creating a bucket; the Amazon documentation covers that just fine.

The first thing to do is download and install s3fs. This is a tool that uses FUSE to mount your S3 account as if it were an ordinary part of your filesystem. Once you’ve got it installed you need to configure it with your access key ID and secret access key. You’ll be given these when you set up your S3 account. Create a file called .passwd-s3fs in your home directory and chmod it so it has no group or other permissions.
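
The file is a single line with your access key ID and secret access key separated by a colon (the format is also described in the comments below). For example, with obvious placeholder values:

echo "ACCESSKEYID:SECRETACCESSKEY" > ~/.passwd-s3fs   # replace with your own keys
chmod 600 ~/.passwd-s3fs                              # removes group and other permissions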

Mounting your S3 bucket is simple, just run:

s3fs bucket_name /mount/point

Any file operations you conduct in /mount/point will now be mirrored to S3 automatically. Neat!
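
When you want to unmount the bucket again, s3fs behaves like any other FUSE filesystem:

fusermount -u /mount/point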

To copy the files across, we need to run rsync:

rsync -av --delete /backup/directory /mount/point

This will copy all the files from /backup/directory to /mount/point, and so to S3. The -a option means archive mode, which sets the correct options for performing a backup. -v is verbose, so you can see how far it has got, while --delete means that files will be deleted from /mount/point if they’ve been deleted from the directory you’re backing up.
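
If you want to see what would be transferred before committing to a long upload, adding rsync’s -n (dry run) flag lists the changes without actually copying anything:

rsync -avn --delete /backup/directory /mount/point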

On your initial backup you’ll likely be transferring multiple gigabytes of data, and that will saturate the upload on your Internet connection. This prevents you from doing pretty much anything else until it’s finished, so let’s look at limiting how fast the backup runs.

trickle is a very useful program that limits the bandwidth a single program can consume. We don’t want to limit rsync, because as far as it’s concerned it’s running locally; it’s the s3fs program, which actually transfers the data, that we want to limit, so alter the mount command to:

trickle -u 256 s3fs bucket_name /mount/point

This will only allow s3fs to consume a maximum of 256KB/s of upload, allowing you to continue to browse Facebook while the backup is in progress. Simply change the upload number depending on how fast your Internet connection is, to get the right balance between a usable connection and the speed of the backup.

To automate the backup, just add the two commands to a script file and add it to your crontab like so:

@daily /home/username/bin/s3_backup
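
The script itself only needs to mount the bucket and run the rsync. A minimal sketch of what /home/username/bin/s3_backup might contain, using the same example bucket, mount point and directory as above:

#!/bin/sh
# Mount the S3 bucket through trickle to cap the upload at 256KB/s
trickle -u 256 s3fs bucket_name /mount/point
# Copy new and changed files across, removing anything deleted locally
rsync -av --delete /backup/directory /mount/point

If you’d rather not leave the bucket mounted between runs, you could add the fusermount -u line from earlier to the end of the script.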

Written by Andrew Wilkinson

January 14, 2011 at 12:40 pm

Posted in unix

17 Responses

  1. FUSE sucks… we have been using s3fs for 3 days now and twice we had to remount to fix “[Errno 107] Transport endpoint is not connected” errors.

    Harro

    January 14, 2011 at 12:55 pm

  2. [...] This post was mentioned on Twitter by Rich Adams. Rich Adams said: RT @andrew_j_w: Rsync backups to Amazon S3: http://wp.me/pkxET-53 [...]

  3. “Create a file .passwd-s3fs in your home directory and chmod it so it has no group or other permissions.”

    Can you provide configuration instructions for the password file?

    Austin Watkins

    June 10, 2011 at 9:36 pm

    • Hi Austin,

      The file should be created with the format “accessKeyId:secretAccessKey”

      Hope that helps,
      Andrew

      Andrew Wilkinson

      August 3, 2011 at 1:38 pm

      • if you have more than one bucket, prefix the entry with the bucket’s name like so:
        “bucketname:accessKeyId:secretAccessKey”

        Harry French

        August 1, 2012 at 3:12 pm

    • chmod og-rwx

      Chuck LeDuc Díaz (@celeduc)

      December 21, 2011 at 9:54 am

  4. Hi,

    It didn’t work well for me. The backup time was very long; it always re-uploaded the modified files all over again.
    A better way is to use your own rsync server in Amazon. I ended up using s3rsymc.com. Modified files are only partly uploaded, so my backup is much faster.

    ziv

    June 25, 2011 at 3:54 pm

  5. Why not use the rsync argument --bwlimit to limit the bandwidth?

    Peter Risdon

    August 3, 2011 at 12:25 pm

    • Hi Peter,

      rsync doesn’t know it’s copying the files remotely; it thinks it’s doing a local copy. You need to limit the bandwidth of the FUSE filesystem, as that is what is actually transferring the data over the network.

      Andrew

      Andrew Wilkinson

      August 3, 2011 at 1:37 pm

      • I didn’t realise the --bwlimit only worked on remote filesystems. There’s nothing in the manpage to that effect.

        BTW s3cmd is a good tool that’s quite like rsync when used with --synch

        Peter Risdon

        August 3, 2011 at 3:14 pm

  6. [...] Rsync backups to Amazon S3 AWS Amazon S3 s3fs Google Code Page This entry was posted in Ecosystem and tagged backup, debian, linux, s3, sync, synchronization, windows. [...]

    • Hi Andrew,

      Going back to what we were talking about the other day, s3fs doesn’t currently support server-side encryption out-of-the-box, but someone has created a patch for it which enables it with a command-line option:

      http://code.google.com/p/s3fs/issues/detail?id=226

      From the comments it doesn’t appear to work with RRS at the moment.

      Alister

      May 18, 2012 at 5:55 pm

      • That’s an interesting find. How does the encryption work? Is it encrypted by Amazon or by s3fs? If it’s done by Amazon then presumably they can decrypt your data, and if it’s done by s3fs how do you ensure you can access your backups from a different machine?

        Andrew Wilkinson

        May 19, 2012 at 9:13 am

  7. Using the -a flag will cause a comparison of the timestamps of the files at source and destination. Since S3FS does not support timestamp modification on the S3 bucket, this will cause rsync to find all the files to be different and it will recopy all of them every time. The solution is to use --size-only and -r instead of -a.
    rsync -v -r --size-only /path/to/sourcefolder/ /path/to/destfolder/
    (-v is for verbose output, -r is for recursive directory copying, --size-only only uses file size to determine whether files are different or not).

    You can verify the difference yourself by doing an rsync dry run with the -n flag (rsync test mode, which only lists which files are going to be synced but does not actually copy any files):
    rsync -n -v -r --size-only /path/to/sourcefolder/ /path/to/destfolder/

    Robert Collier

    August 31, 2012 at 6:46 pm

  8. You should also be using the --inplace flag if using S3FS. Keep in mind that by default rsync uploads to a temporary file prefixed with a hidden ‘.’ and then executes a mv. Because it doesn’t know the destination is remote, that mv means a full download/upload cycle *after* the new file has already been transferred.
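
    Putting that together with the --size-only suggestion above, the command would look something like this (just a sketch, using the same paths as the post):

    rsync -rv --size-only --inplace /backup/directory /mount/point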

    Brad Quellhorst

    March 19, 2014 at 2:25 am

  9. Was wanting to try this, including the trickle part, on an amazon beanstalk instance but trickle isn’t there and “sudo yum install trickle” fails. Downloaded the trickle source but the configure step fails not finding libevent, yet “sudo yum install libevent” says libevent is already installed. Any tips on how I’d install trickle on amazon linux?

    da

    May 7, 2014 at 5:56 pm

  10. If anybody sees my question above: in case it helps others, I did figure it out. It was simply that the “Extra Packages” (EPEL) repo was not enabled on this beanstalk instance in /etc/yum.repos.d/epel.repo
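
    With that repo enabled, something like this should work (assuming the repo id is “epel”):

    sudo yum --enablerepo=epel install trickle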

    da

    May 7, 2014 at 8:38 pm

