Using Amazon S3 to backup Media Temple’s Grid (gs)
Proper backups are like eating your vegetables: we all say we'll do it and that it is a good idea, but it is so much easier NOT to do it and eat Oreo cookies instead. Then one day you wake up 25 years old, a really picky eater who annoys your boyfriend by refusing to go to the Indian place he loves, the one with no menu that only serves vegetarian stuff that scares you. And the people at Subway give you dirty looks when you tell them you don't want anything on your sandwich. Don't risk losing your website because you didn't bother backing up.
Update: I posted a video tutorial that walks through all of these steps here. I still recommend reading through this page because the video tutorial assumes that you will be following these steps.
This is a tutorial for creating an automated backup system for (mt) Media Temple's (gs) Grid Service. Although it should work on other servers and configurations, it is written for Grid users who want an easy way to automate their backups. I feel most comfortable having my most important files backed up offsite, so I use Amazon's S3 service. S3 is fast, super cheap (you only pay for what you use) and reliable. I use S3 to store my website backups and my most important computer files. I spend about $1.50 a month, and that is for nearly 10 GB of storage.
You can alter the script to simply store the data in a separate location on your server (where you can then just FTP or SSH in and download the compressed archive), but this guide assumes that you are using both the (gs) and S3.
This tutorial assumes that you know how to log in to your (gs) via SSH, using either Terminal on OS X or Linux or PuTTY on Windows. If SSH is still confusing, check out (mt)'s Knowledge Base article and take a deep breath. It looks scarier than it really is.
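The backup script in this guide already writes every archive to its s3backup folder before s3sync runs. Leaving out the final s3sync lines therefore gives you a local-only backup. The sketch below shows that idea as a standalone illustration. It is a minimal example, not the author's script. backup_local and the example paths are hypothetical. Unlike the main script, it uses tar's -C option, so the archives hold relative paths. If you go this route, keep the backup folder outside any folder your web server serves.

```shell
#!/bin/sh
# Sketch of a local-only backup: archive each listed site into a folder on
# the server and skip the s3sync upload. backup_local is a hypothetical name.
backup_local() {
  src=$1    # directory that holds the site folders (domaindir in the script)
  dest=$2   # where the .tar.gz archives are written
  shift 2   # any remaining arguments are site folder names
  mkdir -p "$dest" || return 1
  for site in "$@"; do
    # -C makes tar store paths relative to $src, e.g. site1.com/index.html
    tar -czf "$dest/$site.tar.gz" -C "$src" "$site" || return 1
  done
}

# Example (placeholders as in the main script):
# backup_local /home/xxxxx/users/.home/domains /home/xxxxx/data/backups site1.com site2.com
```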
Acknowledgements
I would be remiss if I didn't give a GIGANTIC shout-out to David at Stress Free Zone and Paul Stamatiou (I met Paul at the Tweet-up in March) who both wrote great guides to backing stuff up server side to S3. I blatantly stole from both of them and rolled my own script that is a combination of the two. Seriously, thank you both for your awesome articles.
Furthermore, none of this would even be possible without the brilliant S3Sync Ruby utility.
Installing S3Sync
Although PHP and Perl scripts exist to connect to the S3 servers, the Ruby solution that the S3Sync dudes created is much, much better.
The (gs) already has Ruby on it (version 1.8.5 as of this writing), which is up-to-date enough for S3Sync.
OK, so log in to your (gs) via SSH. My settings (and, I assume, the (gs) defaults) place you in the .home directory as soon as you log in.
Once you are at the command line, type in the following command:
wget http://s3.amazonaws.com/ServEdge_pub/s3sync/s3sync.tar.gz
This will download the latest S3Sync tarball to your .home folder.
tar xvzf s3sync.tar.gz
This uncompresses the archive to its own directory.
rm s3sync.tar.gz
cd s3sync
mkdir certs
cd certs
wget http://mirbsd.mirsolutions.de/cvs.cgi/~checkout~/src/etc/ssl.certs.shar
sh ssl.certs.shar
cd ..
mkdir s3backup
That will delete the compressed archive and make a directory for certificates (certs). It then downloads a shell archive of SSL root certificates and unpacks it by running it with sh. Finally, it creates a backup directory called "s3backup" inside the s3sync directory.
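Before editing the config files, it may be worth confirming that the steps above produced the expected folders. This is a minimal sketch. check_s3sync_layout is a hypothetical helper, and the sketch assumes s3sync.rb sits at the top level of the unpacked s3sync folder.

```shell
#!/bin/sh
# Sketch: confirm the folders created by the install steps exist.
# check_s3sync_layout is a hypothetical helper; pass your s3sync folder path.
check_s3sync_layout() {
  base=$1
  missing=0
  for d in "$base" "$base/certs" "$base/s3backup"; do
    if [ ! -d "$d" ]; then
      echo "missing directory: $d"
      missing=1
    fi
  done
  # s3sync.rb is the script the backup job calls
  if [ ! -f "$base/s3sync.rb" ]; then
    echo "missing file: $base/s3sync.rb"
    missing=1
  fi
  return $missing
}

# Example:
# check_s3sync_layout /home/xxxxx/users/.home/s3sync && echo "layout looks fine"
```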
Now, all you need to do is edit two files in your newly created s3sync folder. You can use TextEdit, TextMate, Notepad or any other text editor to edit these files. You are only going to change a few of the values.
I edited the files via Transmit, but you can use vi straight from the command line if you are comfortable.
The first file you want to edit is called s3config.yml.sample.
Edit that file so that the aws_access_key_id and aws_secret_access_key fields contain the keys from your S3 account. You can find them in the Access Information area after logging in to Amazon.com's Web Services page.
Make sure that ssl_cert_dir has the following value (if you created your s3sync folder in the .home directory):
/home/xxxxx/users/.home/s3sync/certs
where xxxxx is your server number.
You can get your full path by typing
pwd
at the command line.
Save that file as s3config.yml
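If you'd rather skip the text editor, a here-document can write s3config.yml directly. This is a sketch under assumptions. write_s3config is a hypothetical helper. The key names (aws_access_key_id, aws_secret_access_key, ssl_cert_dir) are the ones I believe s3config.yml.sample uses, so compare them against your copy of the sample before relying on this. Keys typed on the command line also end up in your shell history.

```shell
#!/bin/sh
# Sketch: create s3config.yml from the command line instead of a text editor.
# write_s3config is a hypothetical helper. The key names are assumed to match
# s3config.yml.sample; check your copy of the sample before relying on them.
write_s3config() {
  dir=$1          # your s3sync folder, e.g. /home/xxxxx/users/.home/s3sync
  access_key=$2   # Access Key ID from your AWS account
  secret_key=$3   # Secret Access Key from your AWS account
  cat > "$dir/s3config.yml" <<EOF
aws_access_key_id: $access_key
aws_secret_access_key: $secret_key
ssl_cert_dir: $dir/certs
EOF
  # The file contains your secret key, so make it readable only by you
  chmod 600 "$dir/s3config.yml"
}

# Example, run from inside the s3sync folder so $(pwd) gives the full path:
# write_s3config "$(pwd)" 'YOUR-ACCESS-KEY' 'YOUR-SECRET-KEY'
```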
The next step is something I had to do to get the S3 part of the script to connect. It may not be required on every server setup, but it was on the (gs).
Edit the s3config.rb file so that the area that says
confpath = [xxxxx]
looks like this
confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"]
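As I understand s3config.rb, the "./" entry tells s3sync to look for s3config.yml in the current directory. That is also why the backup script changes into the s3sync folder before calling s3sync.rb. If you prefer, the edit can be applied with sed. add_cwd_to_confpath is a hypothetical helper, and the sketch assumes the line begins with `confpath = [` as shown above.

```shell
#!/bin/sh
# Sketch: apply the confpath change from the command line.
# add_cwd_to_confpath is a hypothetical helper; it puts "./" first in the
# confpath list and leaves a copy of the original file as s3config.rb.bak.
add_cwd_to_confpath() {
  file=$1   # path to s3config.rb
  # Do nothing if "./" is already the first entry
  if grep -qF 'confpath = ["./",' "$file"; then
    return 0
  fi
  sed -i.bak 's|confpath = \[|confpath = ["./", |' "$file"
}

# Example:
# add_cwd_to_confpath /home/xxxxx/users/.home/s3sync/s3config.rb
```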
Writing the backup script (or editing mine)
OK, that was the hard part. The rest is pretty simple.
I created the following backup script, called "backup_server.sh". It backs up the contents of the domain directories you specify (because if you are like me, some of your domain folders are really just symlinks) and all of your MySQL databases. It then uploads each directory and database, each in its own compressed archive, to the S3 Bucket of your choice. Bucket names are unique across all of S3, so use S3Fox, Transmit or another S3 manager to create a Bucket specifically for your website.
This is the content of the script:
#!/bin/sh

# A list of website directories to back up
websites="site1.com site2.com site3.com"
# The destination directory to back up the files to
destdir=/home/xxxxx/users/.home/s3sync/s3backup
# The directory where all website domain directories reside
domaindir=/home/xxxxx/users/.home/domains
# The MySQL database hostname
dbhost=internal-db.sxxxxx.gridserver.com
# The MySQL database username - requires read access to databases
dbuser=dbxxxxx
# The MySQL database password
dbpassword=xxxxxxx

echo "$(date) : Beginning backup process..." > "$destdir/backup.log"

# Remove old backups (-f avoids an error on the first run, when none exist)
rm -f "$destdir"/*.tar.gz

# Back up databases
for dbname in $(echo 'show databases;' | /usr/bin/mysql -h "$dbhost" -u"$dbuser" -p"$dbpassword")
do
  # Skip the "Database" column header that mysql prints first
  if [ "$dbname" != "Database" ]; then
    echo "$(date) : Backing up database $dbname..." >> "$destdir/backup.log"
    /usr/bin/mysqldump --opt -h "$dbhost" -u"$dbuser" -p"$dbpassword" "$dbname" > "$destdir/$dbname.sql"
    tar -czf "$destdir/$dbname.sql.tar.gz" "$destdir/$dbname.sql"
    rm "$destdir/$dbname.sql"
  fi
done

# Back up web content
echo "$(date) : Backing up web content..." >> "$destdir/backup.log"
for website in $websites
do
  echo "$(date) : Backing up website $website..." >> "$destdir/backup.log"
  tar -czf "$destdir/$website.tar.gz" "$domaindir/$website"
done
echo "$(date) : Backup process complete." >> "$destdir/backup.log"

# The directory where s3sync is installed
s3syncdir=/home/xxxxx/users/.home/s3sync
# The directory where the backup archives are stored
backupdir=/home/xxxxx/users/.home/s3sync/s3backup
# The S3 bucket a.k.a. directory to upload the backups into
s3bucket=BUCKET-NAME

# Run s3sync from its own folder so it finds its files and s3config.yml
cd "$s3syncdir" || exit 1
./s3sync.rb "$backupdir/" "$s3bucket:"
For (mt) Media Temple (gs) Grid Server users, you just need to change the "site1.com" values to your own domains (you can list as many as you want). Replace every "xxxxx" with your server number (again, you can find this by entering "pwd" at the command line). Then fill in your database password, which is visible in the (mt) control panel under the "Database" module.
Make sure you change "BUCKET-NAME" near the end of the script to the name of the S3 Bucket you want to store your backups in.
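Before the first run, a quick grep can catch any placeholders you forgot to replace. check_placeholders is a hypothetical helper. Its patterns (xxxxx, BUCKET-NAME, site1.com to site3.com) are simply the placeholder values used in this guide's script.

```shell
#!/bin/sh
# Sketch: list any lines of the backup script that still contain the
# placeholder values from this guide. check_placeholders is a hypothetical name.
check_placeholders() {
  script=$1
  [ -f "$script" ] || { echo "no such file: $script" >&2; return 2; }
  # grep -n prints each matching line with its line number
  if grep -n -e 'xxxxx' -e 'BUCKET-NAME' -e 'site[123]\.com' "$script"; then
    echo "placeholders remain in $script" >&2
    return 1
  fi
  return 0
}

# Example:
# check_placeholders backup_server.sh
```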
Now that you have edited the script, upload it to your /data directory.
Make the script executable, either via SSH:
chmod a+x backup_server.sh
or by setting its permissions to 755 in your FTP client.
Now, test the script.
At the command line, type:
cd data
./backup_server.sh
And watch the magic. Assuming everything was entered correctly, a compressed archive of each domain directory you listed and of each of your MySQL databases will be put in the "s3backup" folder and then uploaded directly to your S3 Bucket. The next time you run the script, the backup files will be replaced.
Check to make sure that the script is working the way you want it to work.
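One way to check the local half of the job is to confirm that backup.log ends with the completion message and that every archive is a valid gzip file. This sketch uses a hypothetical verify_backups helper. The script logs "Backup process complete." before s3sync starts, so passing this check does not prove the upload worked. You should still look in your Bucket with S3Fox or Transmit.

```shell
#!/bin/sh
# Sketch: sanity-check the s3backup folder after a run.
# verify_backups is a hypothetical helper; pass the folder holding backup.log.
verify_backups() {
  dir=$1
  # The script writes this line after all archives are made (before upload)
  tail -n 1 "$dir/backup.log" | grep -q 'Backup process complete\.' || {
    echo "backup.log does not end with the completion message"
    return 1
  }
  rc=0
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || { echo "no archives found in $dir"; return 1; }
    # gzip -t tests the compressed data without extracting anything
    gzip -t "$f" 2>/dev/null || { echo "corrupt archive: $f"; rc=1; }
  done
  return $rc
}

# Example:
# verify_backups /home/xxxxx/users/.home/s3sync/s3backup
```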
Automate the script
You can either run the script manually from the command line or set it to run automatically. I've set mine to run each night at midnight. To set up the cron job, click the Cron Jobs button in the (mt) Admin area
and set your parameters. The path for your script is /home/xxxxx/data/backup_server.sh.
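The (mt) Cron Jobs panel builds the schedule for you, using the standard five cron fields. Your account may also let you edit a crontab directly, though that is an assumption this guide does not cover. In that case, the midnight schedule corresponds to the line this sketch prints. cron_entry is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build the crontab line for "run every night at midnight".
# cron_entry is a hypothetical helper. The five fields are minute, hour,
# day of month, month and day of week, followed by the command to run.
cron_entry() {
  script=$1
  minute=${2:-0}   # default: minute 0
  hour=${3:-0}     # default: hour 0 (midnight)
  echo "$minute $hour * * * $script"
}

# Print the entry for the path used in this guide
cron_entry /home/xxxxx/data/backup_server.sh
# prints: 0 0 * * * /home/xxxxx/data/backup_server.sh
```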
Enjoy your backups!
One note: the compressed domain archives retain their entire directory structure, so they include a .home directory that may not appear in Finder or Windows Explorer unless you have hidden files turned on. Don't worry, all your data is still in those archives.
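To see that structure, or to pull a site back out, tar can list and extract the archives. This is a minimal sketch. list_archive and restore_archive are hypothetical helpers. Extract into an empty scratch folder rather than over your live site.

```shell
#!/bin/sh
# Sketch: inspect or unpack one of the website archives.
# list_archive and restore_archive are hypothetical helper names.
list_archive() {
  # Entries keep the full path minus the leading "/", including .home
  tar -tzf "$1"
}

restore_archive() {
  archive=$1
  target=$2   # an empty scratch folder, never your live domains folder
  mkdir -p "$target" || return 1
  tar -xzf "$archive" -C "$target"
}

# Example:
# list_archive site1.com.tar.gz | head
# restore_archive site1.com.tar.gz "$HOME/restore-test"
```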
Update (7/27/2008):
If you are getting an error that says something like
Permanent redirect received. Try setting AWS_CALLING_FORMAT to SUBDOMAIN
add the following line to your s3config.yml file:
AWS_CALLING_FORMAT: SUBDOMAIN
The error is either because your bucket is in the EU or there is something else funky with its URL structure. Changing that value should allow the script to perform as intended.
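You can also add that line from the command line without risking a duplicate entry. set_calling_format is a hypothetical helper. It appends the setting only if no line starting with AWS_CALLING_FORMAT: is already present. If the file does not end with a newline, it adds one first.

```shell
#!/bin/sh
# Sketch: add AWS_CALLING_FORMAT to s3config.yml only if it is missing.
# set_calling_format is a hypothetical helper name.
set_calling_format() {
  yml=$1
  if grep -q '^AWS_CALLING_FORMAT:' "$yml"; then
    return 0
  fi
  # Make sure the new setting starts on its own line
  if [ -s "$yml" ] && [ -n "$(tail -c 1 "$yml")" ]; then
    echo >> "$yml"
  fi
  echo 'AWS_CALLING_FORMAT: SUBDOMAIN' >> "$yml"
}

# Example:
# set_calling_format /home/xxxxx/users/.home/s3sync/s3config.yml
```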
47 people have left comments
Christina said:
Thanks for your original tutorial and article Paul -- it helped me out tremendously when I was setting the whole thing up. And yes, S3 is the bees knees as they say.
George Ornbo said:
Great write up. The Amazon S3 service is perfect for backing up a web server remotely, safely and cheaply. I'm also using it to do off site backups for machines inside the network. For a small business it is a great solution.
Mike Marley said:
Neat article Christina.
/me adds it to list of KB articles that need to be written.
Mike Marley
Senior Technical Support
(mt) Media Temple, Inc.
Ross said:
Nice post....very very detailed. Hopefully you put it in the how to part of the MT forum?
Michael said:
Excellent resource, thank you. One note on my experience when configuring on the gs:
I was receiving errors like "-o is not a valid variable" or something like this, when the script was trying to execute the mysql dump. I changed it to --opt (vs. -opt in your script).
Thanks again!
Christina said:
Thanks for the info Michael! The script actually DOES say --opt, but the way that “code” is displayed on this page didn’t show the dashes clearly (I’ll have to try to change that); the downloadable script has the correct dashes too. I’m glad that it worked for you and I appreciate the feedback on the --opt thing. I’ll do my best to change the text display now.
duivesteyn said:
I’ve modified this to better suit CPanel based sites with sql support at http://duivesteyn.net/2008/amazon-s3-backup-for-webserver-public_html-sql-bash/
hope it helps someone
Christina said:
duivesteyn said: I’ve modified this to better suit CPanel based sites with sql support at http://duivesteyn.net/2008/amazon-s3-backup-for-webserver-public_html-sql-bash/
hope it helps someone
Oh that's awesome! Thanks for posting your script!
Thejesh GN said:
S3 is amazing. I also use it to deliver images and other media files to my blog since it's fast.
Karl Hardisty said:
Christina,
Thanks for putting in the time and effort to not only come up with the script, but to make it robust/structured/pretty enough for sharing. It's good to know that someone has tested it and put it to public scrutiny, and that it's worthwhile. You've saved me (and plenty others I'm sure) a lot of time.
Much appreciated.
Host Disciple » Blog Archive » Inside the hosting mind of a blogger said:
[...] hosted there (calm down fellas she is dating someone already). Christina wrote a very informative how-to on backing up your MediaTemple GS [...]
Patrick Sikorski said:
Can I borrow your wisdom? I'm getting some weird errors:
warning.com@cl01:~/data$ ./backup_server.sh
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
You didn't set up your environment variables; see README.txt
s3sync.rb [options] version 1.2.6
--help -h --verbose -v --dryrun -n
--ssl -s --recursive -r --delete
--public-read -p --expires="" --cache-control=""
--exclude="" --progress --debug -d
--make-dirs --no-md5
One of or must be of S3 format, the other a local path.
Reminders:
* An S3 formatted item with bucket 'mybucket' and prefix 'mypre' looks like:
mybucket:mypre/some/key/name
* Local paths should always use forward slashes '/' even on Windows
* Whether you use a trailing slash on the source path makes a difference.
* For examples see README.
Not sure where my problem is, do you have any idea?
Awesome article by the way!
Christina said:
Patrick Sikorski said: Can I borrow your wisdom? I'm getting some weird errors:
warning.com@cl01:~/data$ ./backup_server.sh
tar: Removing leading `/' from member names
tar: Removing leading ...
Patrick,
OK, this was a problem I had in the beginning, and I had to change the s3config.rb file so that confpath = ["./", "#{ENV['S3CONF']}", "#{ENV['HOME']}/.s3conf", "/etc/s3conf"] -- make sure that has been changed and try again.
As for the tar: removing leading "/" from member names, that's fine.
Hope this helps!
Patrick Sikorski said:
Did this go as smoothly for you as it did for me lol. Now for some reason I'm getting this.
./s3sync.rb:28:in `require': ./s3config.rb:19: syntax error, unexpected tIDENTIFIER, expecting ']' (SyntaxError)
config = YAML.load_file("#{path}/s3config.yml")
^
./s3config.rb:25: syntax error, unexpected kEND, expecting $end from ./s3sync.rb:28
After you told me about that code, I realized that I didn't copy it right. This is probably something just as stupid.
Christina said:
Patrick,
Did you rename the s3config.yml.sample file to s3config.yml?
If you did, I'll have to check the codebase (it is possible a new version of S3sync was released since I've written the article) and investigate.
We'll get this working!
This might be the sort of thing I should do a screencast of, from start to finish, to supplement the written guide. Hmm...
Patrick Sikorski said:
Yes I renamed the file. I guess a new version could have been released... but you didn't write the article that long ago. Update your version of it and see if it breaks (backup first lol). A screencast would be cool....!
Christina said:
OK -- a new version has NOT been released, so I'm thinking this is probably as simple as a mis-typed comma or period somewhere.
I'll make a screencast today, going from start to finish.
Patrick Sikorski said:
Awesome, I'll delete everything and start over when you make the screen cast. Thanks!
Video Tutorial: Automate Media Temple (gs) backups with Amazon S3 | www.ChristinaWarren.com said:
[...] This entry was posted on Sun, July 13th, 2008. [...]
Matt said:
The backups are going fine, but the s3sync portion keeps giving me:
Connection reset: Connection reset by peer
99 retries left, sleeping for 30 seconds
Connection reset: Connection reset by peer
98 retries left, sleeping for 30 seconds
Connection reset: Connection reset by peer
97 retries left, sleeping for 30 seconds
Connection reset: Connection reset by peer
...
and so on
Any ideas?
Matt said:
Hmmm... I guess that's normal? I checked my S3 bucket and it had all the files there and in the right size. So are those messages just there to say that it is still working?
Christina said:
Hmm, I don't get any of those Matt, but if the copies are transferring over correctly, I guess it's fine.
Cedric said:
I tried 2 times, but it doesn't work at all for me :
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Permanent redirect received. Try setting AWS_CALLING_FORMAT to SUBDOMAIN
S3 ERROR: #
./s3sync.rb:290:in `+': can't convert nil into Array (TypeError)
from ./s3sync.rb:290:in `s3TreeRecurse'
from ./s3sync.rb:346:in `main'
from ./thread_generator.rb:79:in `call'
from ./thread_generator.rb:79:in `initialize'
from ./thread_generator.rb:76:in `new'
from ./thread_generator.rb:76:in `initialize'
from ./s3sync.rb:267:in `new'
from ./s3sync.rb:267:in `main'
from ./s3sync.rb:735
Maxwell Scott-Slade said:
I have set it up exactly the same as your blog (btw, thanks for all this) but I also get this error:
tar: Removing leading `/' from member names
tar: Removing leading `/' from member names
Permanent redirect received. Try setting AWS_CALLING_FORMAT to SUBDOMAIN
S3 ERROR: #
./s3sync.rb:290:in `+': can't convert nil into Array (TypeError)
from ./s3sync.rb:290:in `s3TreeRecurse'
from ./s3sync.rb:346:in `main'
from ./thread_generator.rb:79:in `call'
from ./thread_generator.rb:79:in `initialize'
from ./thread_generator.rb:76:in `new'
from ./thread_generator.rb:76:in `initialize'
from ./s3sync.rb:267:in `new'
from ./s3sync.rb:267:in `main'
from ./s3sync.rb:735
Christina said:
OK, so both Cedric and Maxwell are getting the same error. I looked up that error and it appears to be associated with EU buckets. Are either of you using buckets in the EU?
To change this, you need to add this line to your s3config.yml file:
AWS_CALLING_FORMAT: SUBDOMAIN
Christina said:
Ced -- no, I just double-checked. I think the issue is either with using an EU bucket name or something else in the bucket-name not being correct. EU Bucket names cannot contain capital letters (so they are not case-sensitive), whereas US bucket names can.
Make sure your bucket name is correct in the script. I think adding the AWS_CALLING_FORMAT parameter to the yml file will solve the problem.
Christina said:
Ced,
Glad to hear it! I've updated the post with that information in case anyone else runs into the same issue.
Maxwell Scott-Slade said:
Thanks Christina, that's all working fine now. It's so awesome to know that the site is getting backed up everyday to a safe place 100%. It's nice to turn on the email feature in the Cron job section so you know it's all done.
A guide to remember. I never used SSH before, now that I have I feel pretty happy it all works!
Faire un backup de son blog sur Amazon S3 | 64k said:
[...] with this service, but I hadn't really taken the time to try it. I came across a post by Christina Warren that convinced me, since it describes the whole procedure for backing up a [...]
Easily Backup Your Entire Website to S3 | HighEdWebTech said:
[...] sent me a great link to a Ruby script that will backup your website and push that backup file to Amazon S3 for safe, [...]
Philip said:
So does this backup everything or just the content that's changed since the last backup?
Links of Interest - CSS-Tricks said:
[...] Warren has a comprehensive and excellent tutorial on creating a Ruby script to back up your entire web server, including databases, and upload them [...]
Christina said:
Philip,
I was having some problems having effective recursive backups, so it's just doing everything. Your comment reminds me to re-investigate the best/most effective way to do it recursively though (it would be like a few character changes to the script), so I'll do that later this week and post an update with my findings. Realistically, your databases are going to be changing more frequently than your actual directories, so you can always set a separate CRON job to run the databases and certain folders every day, and other folders less frequently. That's what I do anyway -- my blogs and databases are backed up daily and a few domains that are just basically image storing for right now get updated once a week or once a month.
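The split schedule described above (daily for blogs and databases, weekly for rarely changing domains) comes down to two cron entries. This is a sketch under assumptions. backup_daily.sh and backup_weekly.sh are hypothetical copies of backup_server.sh, each with its own websites list and its own destdir. Separate folders keep one run's rm of old archives from deleting the other's files. The times shown are only examples.

```shell
#!/bin/sh
# Sketch: crontab lines for two backup schedules.
# backup_daily.sh and backup_weekly.sh are hypothetical copies of
# backup_server.sh, each with its own websites= list and its own destdir.
schedule_lines() {
  dir=$1   # folder holding the scripts, e.g. /home/xxxxx/data
  echo "0 0 * * * $dir/backup_daily.sh"    # every night at midnight
  echo "0 3 * * 0 $dir/backup_weekly.sh"   # Sundays at 3:00 a.m.
}

schedule_lines /home/xxxxx/data
```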
Matt said:
The backup part works great for me, but not the s3sync. Cron job doesn't even bother with copying anything over. When I do the site copy to S3 manually, it usually dies after copying just a few sites. Wish I could get that part working as that is the important part!
Christina said:
Matt,
Are you getting an error of any kind? About how much data is being copied over before it dies? I've successfully copied over more than 500 megabytes before using the script (a test case and also a backup of some photographs I uploaded to my gs temporarily when at my parent's house). Let's see if we can figure out why it isn't working.
Matthew Barker said:
Nope, not getting any errors that matter it seems. Earlier I reported this error:
Connection reset: Connection reset by peer
99 retries left, sleeping for 30 seconds
98 retries left, sleeping for 30 seconds
...
But that doesn't seem to really matter. I am backing up 10 sites, with only 1 being larger than 500 MB; the gzipped/tarred file is currently 1.3 GB in size. The odd thing about all of this is that sometimes everything works when I do it manually, but that is only sometimes. It generally quits when transferring the 1.3 GB file to Amazon, with no error messages encountered. But with the cron job running, it sometimes quits when tarring the 1.3 GB site, but generally tars everything just fine, but doesn't transfer a thing. That's the hard part about trying to troubleshoot this problem; sometimes it works, sometimes it doesn't; and when it doesn't work, it doesn't die at the same place every time.
Peter said:
I was hoping I could download your full backup script and customize it to my needs, but it looks like access is denied on the file from S3. Is there any chance you could make it public?
Marcus McCurdy said:
Thanks for this great write up. I went through it last night and it worked like a champ and I learned a thing or two in the process. This really makes backing up a media temple site a breeze.
Christina said:
Matt,
I remember you had that error. I'll do what I can to investigate why this doesn't seem to be working. It might be necessary to create two separate Cron jobs, one for the biggest site and one for the others, to see if that is an acceptable workaround.
Peter -- OK, try it now -- I changed the URL structure. It was working fine, I'm not sure what could have changed. If you still have issues, let me know.
Marcus -- yay! I'm so happy this worked for you!
Paul Stamatiou said:
S3 is great, huh? That s3config.rb path thing must have been a change in the recent s3sync, as I had to do the same thing when I just upgraded; I didn't have to do that when I published my article. Anyway, thanks for bringing that to my attention, I've updated my article.