Backup and Restore
Very simple backup (and restore!) script(s). Since I tend to run the backup at the end of sessions, and just leave it running, it does not matter how inefficient and slow it is, but once it gets going, it should not expect me to answer questions.
Running unattended does not sit well with encryption tools: they generally don't like passwords/passphrases to be provided from anything but the keyboard. However, my script needs the passphrase three times (once for validation, once for encryption, and once for the test of the restore). The script takes a while to run (see the sample run below), and I don't want to sit there and wait for it to ask for my password yet again.
The backup script produces one file, that I can keep in whichever safe place I please: USB drive, online storage, or even in my home directory, as a safety net when I'm making changes that I might want to undo.
Objectives
- Why roll my own? There are many solutions to this problem out there, but I wanted something that was "one-click" (actually, "single-command"), low-tech, (vendor- and tool-) independent, and with few dependencies on the environment - neither a network connection nor anything more than a simple console should be required. It can't get much simpler than bash, tar and bzip2.
- Adding GnuPG encryption made things quite a bit less simple, but seeing as the purpose of the backup is to keep it outside of my encrypted home directory, encrypting the backup really is a must.
- A simple text terminal has to be sufficient to run the backup. This might be all that is available if the system breaks unexpectedly, in which case, backing up data is the first thing to do.
- The restore has to work, no matter what happens. With a simple archive / compress / encrypt strategy, a manual restore is no problem: all that is required is the backup file and appropriate GPG credentials. The restore script is more convenient, but not necessary.
Saying that "the restore has to work" sounds obvious, but it's not. I have worked on several projects where regular backups were considered an obvious must, but nobody bothered doing regular test restores. This makes for interesting times when something goes wrong, and a backup exists, but the restore is not working...
- Every possible effort should be made to ensure that no errors go undetected during the backup.
- Last but not least, the script must be testable. This was one of the reasons why GPG was a good solution for encryption - although discouraged, it lets you script everything if you insist.
The Backup Script
The essence of the script is (pseudo-code):
    tar rvf tar-file files-to-backup
    bzip2 tar-file
    gpg --encrypt tar-file.bz2
Rigorous input parameter checking, internal sanity checks, and validation of the output make the script balloon up to 150+ lines.
- Check GPG credentials
- Since the script is likely to run for a while before actually needing the credentials, it starts out by checking that the credentials will work with GnuPG to encrypt/decrypt a file. No reason for the script to first create the archive and then fail because there was a credential typo.
- Check that the specified files to be archived (backup.txt) exist
- Also, to fail-fast, the patterns specified in backup.txt are checked, to make sure they expand to actual files. A typo in that file could mean that important files are not included in the backup, which would be a bad thing.
- Create the archive
- Add all specified files to an appropriately timestamped tar-file.
- Compress the archive
- Use bzip2 to compress the archive. This is a separate step to facilitate swapping in another compression-tool.
- Check the compressed file
- Check that the compressed file is OK. There is no reason why it wouldn't be, but better safe than sorry.
- Encrypt the compressed file
- Since the backup can be stored pretty much anywhere - it's just a file, that's the whole point - encryption is necessary.
- Perform a test restore of the encrypted file
- Once the compressed archive has been encrypted, the backup itself is done. It is crucial that the restore script works, yet I expect to use it far less often than the backup script. Hence, I decided to make validation of the restore procedure part of the backup procedure. This slows things down considerably, which, in this case, is worth it.
After running the restore, the script checks that the restored version of a file which is known to be in the archive (the backup script) is identical to the original.
- Cleanup
- Delete any files that have been generated and are no longer needed.
Errors are written to stderr.
Progress info is intentionally kept sparse. The System V banner utility is used to provide regular feedback which is readable from far away. This is practical since I tend to do other things away from my machine while the backup is running.
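The archive, compress and check steps from the list above, together with the stderr and banner conventions, could be sketched roughly as follows. This is not the actual backup.sh: every function name and the banner labels are made up for illustration, and encryption is left out. The timestamp format is inferred from the file names in the sample run.

```bash
#!/usr/bin/env bash
# Sketch of the archive, compress and check steps (encryption omitted).
# All function names and banner labels are hypothetical, not taken from backup.sh.

error() {
    # Errors go to stderr so they stand out from the sparse progress output.
    echo "ERROR: $*" >&2
}

progress() {
    # Prefer banner for output readable from far away; fall back to plain text.
    if command -v banner > /dev/null 2>&1; then
        banner "$1"
    else
        echo "=== $1 ==="
    fi
}

timestamped_name() {
    # Builds a name such as <prefix>-2010.11.16-21.19.tar from the current time.
    echo "$1-$(date +%Y.%m.%d-%H.%M).tar"
}

create_compressed_archive() {
    local tar_file=$1
    shift
    if [ "$#" -eq 0 ]; then
        error "no files to archive"
        return 1
    fi
    if [ -e "$tar_file" ] || [ -e "$tar_file.bz2" ]; then
        error "$tar_file or $tar_file.bz2 already exists"
        return 1
    fi

    progress "Tar"
    tar -cf "$tar_file" "$@" || { error "tar failed"; return 1; }

    progress "Bzip2"
    # bzip2 replaces the tar file with $tar_file.bz2.
    bzip2 "$tar_file" || { error "bzip2 failed"; return 1; }

    # Integrity test of the compressed file: better safe than sorry.
    bzip2 -t "$tar_file.bz2" || { error "$tar_file.bz2 is corrupt"; return 1; }
}

verify_known_file() {
    # Extract into a scratch directory and compare one known file
    # (given as a relative path) with the original.
    local bz2_file=$1 known_file=$2 scratch status=0
    scratch=$(mktemp -d) || return 1
    bzip2 -dc "$bz2_file" | tar -xf - -C "$scratch" || status=1
    cmp -s "$known_file" "$scratch/$known_file" || status=1
    rm -rf "$scratch"
    return "$status"
}
```

Keeping compression in its own function mirrors the reasoning above: swapping bzip2 for another tool only touches two lines.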
Input: backup.txt
This file contains patterns matching the files that will be included in the backup:
    Documents
    .mozilla-thunderbird
    .bashrc
    .m2/settings.xml
These can be any patterns that expand to filenames in the directory where the Backup Script is run. The only limitation is that every pattern must match at least one existing file - otherwise, the backup script will fail. This is a simple safeguard against a typo in a pattern silently causing files to be left out of the backup.
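A fail-fast check of this kind might look like the sketch below. The function name check_patterns is hypothetical, and the sketch assumes that the patterns are separated by whitespace, contain no spaces themselves, and that bash runs with its default globbing options (nullglob and failglob off).

```bash
#!/usr/bin/env bash
# Sketch: verify that every pattern in a pattern file matches an existing file.
# check_patterns is a hypothetical helper, not taken from backup.sh.

check_patterns() {
    local pattern_file=$1 entry status=0
    if [ ! -r "$pattern_file" ]; then
        echo "Cannot read pattern file: $pattern_file" >&2
        return 2
    fi
    # Unquoted on purpose: the shell splits the file into words and expands
    # each word as a glob. A pattern without any match is left unexpanded,
    # so the existence test below fails for it.
    for entry in $(cat "$pattern_file"); do
        if [ ! -e "$entry" ]; then
            echo "No file matches pattern: $entry" >&2
            status=1
        fi
    done
    return "$status"
}
```

The function reports every failing pattern before returning, so a single run reveals all typos in the file rather than just the first one.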
Arguments: GPG Credentials
The script optionally accepts the GPG username and password (a.k.a. "passphrase") as commandline arguments.
This is needed since I want to write automated tests for the scripts - there's just no way to automate testing of a script that prompts you for a password.
Convenience sometimes beats security - it is generally a bad idea to pass passwords as arguments, and unnecessarily storing them in variables is to be avoided as well. In this case, I only pass the test credentials as arguments (making sure they don't end up in the bash history, of course).
If the credentials are not given as arguments, the script prompts. The advantage of having the backup script prompt rather than leaving it up to GPG is that my script only prompts for the credentials once. They are then saved, quite unsafely, to memory, for the duration of the backup. The convenience of only having to enter my passphrase once, while still ensuring that I didn't mistype it, was just too good to pass up. GPG would only be prompting after the backup archive was created, which might be hours after the script was launched. One typo, and it would be back to square one.
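The prompt-once-then-verify idea can be sketched as follows. This is an illustration rather than the code in backup.sh: the function and variable names are invented, and the GnuPG call assumes GnuPG 2.1 or later (for --pinentry-mode loopback) and a key pair for the given user in the local keyring.

```bash
#!/usr/bin/env bash
# Sketch: take credentials from the arguments or prompt once, then verify them.
# read_credentials, check_credentials, GPG_USER and GPG_PASSPHRASE are
# hypothetical names, not taken from backup.sh.

read_credentials() {
    if [ "$#" -ge 2 ]; then
        # Test credentials passed on the command line.
        GPG_USER=$1
        GPG_PASSPHRASE=$2
    else
        read -r -p "GPG user: " GPG_USER || return 1
        # -s keeps the passphrase off the screen; IFS= preserves its whitespace.
        IFS= read -r -s -p "GPG passphrase: " GPG_PASSPHRASE || return 1
        echo >&2
    fi
}

check_credentials() {
    # Round trip: encrypt a short message for the user, then decrypt it with
    # the passphrase supplied on file descriptor 3. Assumes GnuPG 2.1+.
    local user=$1 passphrase=$2
    echo "credential check" \
        | gpg --batch --yes --recipient "$user" --encrypt \
        | gpg --batch --pinentry-mode loopback --passphrase-fd 3 --decrypt \
              3<<< "$passphrase" > /dev/null 2>&1
}

# Possible use at the top of a backup script:
#   read_credentials "$@" || exit 1
#   check_credentials "$GPG_USER" "$GPG_PASSPHRASE" || { echo "Bad GPG credentials" >&2; exit 1; }
```

Passing the passphrase on a separate file descriptor keeps it out of the process list, although it still lives in a shell variable for the duration of the run.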
The Restore Script
This is simply the Backup script in reverse order (also pseudo-code):
    gpg --decrypt backup-file.tar.bz2.gpg
    bunzip2 backup-file.tar.bz2
    tar xvf backup-file.tar
There's a lot less sanity checking and validation than in the Backup script: either the restore works or it doesn't.
Errors and progress info are written to stderr.
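For reference, a manual restore along the lines of the pseudo-code could be written as the sketch below. The function name manual_restore and its return codes are assumptions of this sketch; GnuPG is left to prompt for the passphrase itself.

```bash
#!/usr/bin/env bash
# Sketch of a manual restore: decrypt, decompress, extract.
# manual_restore is a hypothetical name, not taken from restore.sh.

manual_restore() {
    local encrypted=$1
    case "$encrypted" in
        *.tar.bz2.gpg) ;;
        *) echo "Expected a .tar.bz2.gpg file: $encrypted" >&2; return 2 ;;
    esac
    if [ ! -r "$encrypted" ]; then
        echo "Cannot read $encrypted" >&2
        return 2
    fi

    local compressed=${encrypted%.gpg}    # backup-file.tar.bz2
    local archive=${compressed%.bz2}      # backup-file.tar

    gpg --output "$compressed" --decrypt "$encrypted" || return 1
    bunzip2 "$compressed" || return 1
    # Extracts into the current directory; the file list goes to stderr.
    tar -xvf "$archive" >&2 || return 1
}
```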
The Automated Test Script
Once you get used to unit testing, it becomes uncomfortable to write code without testing it. It is just so much more efficient to have a lean little test that will tell you whether your code is working. I experimented with bash unit testing on the molk.ch build project, and it worked really well.
Hence, when writing something as important as the script that will ensure that my data does not go away if my system does, automated testing was not optional. My actual backup is quite big, and while fiddling with the gory details of making the backup script encrypt the backup file, it was very efficient to have a test script which gave me feedback within seconds of making a change.
A major advantage over just backing up a couple of files manually on the commandline - which would have been another way to get quick feedback - is that the tests also work as executable documentation. The next time I modify the script, which might be years from now, the tests will tell me why the script works the way it does.
The coverage of the test script is far from perfect. Some of the more exotic things that can go wrong, like the compressed file being corrupted, I did not bother to reproduce. The tests cover the most likely scenarios - the important part is not to test everything, but to test the parts that matter.
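To give an idea of what such a lean bash test can look like, here is a minimal sketch of an exit-status assertion helper. It is not the content of backup-test.sh: the helper name, the counters and the example calls in the comments are all hypothetical, and the examples assume that backup.sh exits with a non-zero status on failure.

```bash
#!/usr/bin/env bash
# Sketch of a tiny test helper that checks a command's exit status.
# expect_status, TESTS_RUN and TESTS_FAILED are hypothetical names.

TESTS_RUN=0
TESTS_FAILED=0

# Usage: expect_status EXPECTED DESCRIPTION COMMAND [ARGS...]
expect_status() {
    local expected=$1 description=$2 actual=0
    shift 2
    "$@" > /dev/null 2>&1 || actual=$?
    TESTS_RUN=$((TESTS_RUN + 1))
    if [ "$actual" -eq "$expected" ]; then
        echo "PASS: $description"
    else
        echo "FAIL: $description (expected exit status $expected, got $actual)"
        TESTS_FAILED=$((TESTS_FAILED + 1))
    fi
}

# Example calls (assumed behaviour of backup.sh, shown for illustration only):
#   expect_status 0 "backup succeeds with the test credentials" ./backup.sh test1 test1
#   expect_status 1 "backup fails with a wrong passphrase" ./backup.sh test1 wrong
#   [ "$TESTS_FAILED" -eq 0 ] || exit 1
```

The descriptions double as the executable documentation mentioned above: each one states an expectation about how the script is supposed to behave.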
The files
These are the files making up the backup/restore solution:
- backup.sh
- This is the script that performs the backup.
- restore.sh
- This is the script that performs the restore.
- backup-test.sh
- Verifies that the backup-restore scripts are working.
- backup.txt
- This is the file containing the patterns describing the files to backup.
Sample output
The following is a sample run of the backup script ($ is the prompt, the command after it is typed by the user; the large banner output is abbreviated to [banner]):
    $ ./backup.sh test1 test1
    Backing up: backup.sh backup-test.sh backup.txt backup-test.key restore.sh .bashrc
    [banner]
    Creating archive ml-backup-2010.11.16-21.19.tar...
    Adding 'backup.sh'
    Adding 'backup-test.sh'
    Adding 'backup.txt'
    Adding 'backup-test.key'
    Adding 'restore.sh'
    Adding '.bashrc'
    Created ml-backup-2010.11.16-21.19.tar: 20K
    [banner]
    Compressing ml-backup-2010.11.16-21.19.tar to ml-backup-2010.11.16-21.19.tar.bz2...
    ml-backup-2010.11.16-21.19.tar: 3.181:1, 2.515 bits/byte, 68.56% saved, 20480 in, 6439 out.
    Testing compressed file...
    Created ml-backup-2010.11.16-21.19.tar.bz2: 6.3K
    [banner]
    Encrypting ml-backup-2010.11.16-21.19.tar.bz2 to ml-backup-2010.11.16-21.19.tar.bz2.gpg...
    Created ml-backup-2010.11.16-21.19.tar.bz2.gpg: 6.9K
    [banner]
    Testing restore of ml-backup-2010.11.16-21.19.tar.bz2.gpg...
    [banner]