Backup (and Backup Management) is a Big Deal and should not be taken lightly.
There are many paid-for solutions of varying quality. There are also various backup strategies. I won’t cover those but I’ll just say that you should have both an on-site and an off-site solution. On-site for frequent backups (and high-speed transfers) and off-site in case, you know, your site burns down (or, less dramatically, if somebody steals your backup drive). In all cases the data should be encrypted before the transfer, with a key known only to you.
For my off-site solution I use an external drive that I back up to monthly and hand over to a friend. While this is one of the best solutions, it’s a bit of a hassle. So instead I wanted to back up to the proverbial Cloud and started looking for programs (free and/or open source but also paid for). I tried these:
- S3 Backup: AWS kept returning an error message regarding the Cipher Suite used. I couldn’t start any backup.
- Cloud Berry: Great features and reasonable price ($30) but horrible UI and it didn’t seem to run in the background at all, or be able to stick to the tray, so no go.
- Arq: Looks horrible, especially for the price they ask ($50).
- Duplicati: Tried the 2.0 beta and kept running into errors. That’s expected from a beta but I wanted a stable solution.
- Carbonite: It ignored my audio files and was generally too picky with which files it accepted to back up. I couldn’t find a way to specify “grab everything” so I didn’t like that.
- restic: The one I picked.
restic is great
I tried restic because a friend I trust a lot on such matters insisted several times that it’s really good. The fact that it was all CLI bothered me because I wanted a built-in scheduling capability. But I found a workaround. Let’s first introduce the software.
restic is free, open source software written in Go. Its goal is to be fast, efficient and secure. No code audit has been performed on the source as far as I know, but the References page of the documentation shows that a great deal of thought was given to the design. People should read that page but, in short, they use strong encryption, signatures and hashes a lot. There can always be implementation errors leading to security flaws, I guess, but at least the design seems solid.
restic is all CLI. You grab the binary from their GitHub repo (since it’s written in Go there are native binaries for tons of different OSes and CPU architectures), throw that in your OS’s PATH and off you go.
The workflow is pretty simple. You initialize a “repository” (a target for the backup) and provide an encryption password for it. restic creates the file hierarchy it needs to operate and the repo is ready. The repo can be a local drive (however you mount it: SATA, USB, SMB/CIFS, iSCSI, …) or a remote service (Amazon S3 or S3 API-compatible services, OpenStack Swift services, SFTP, or Backblaze B2). It makes no difference as far as you’re concerned; restic operates the same.
Let’s init a repo on a local drive. I’m using PowerShell on Windows 10 Pro but the commands are identical on all OSes:
PS E:\restic> restic init --repo E:\restic\
enter password for new backend: ****
enter password again: ****
created restic backend 4a37b462c0 at E:\restic\
Please note that knowledge of your password is required to access the repository.
Losing your password means that your data is irrecoverably lost.
This took less than a second to execute. Remote repositories take longer because of the network round trips, but no more than 2-3 seconds.
Now we can back up some files and target that repo:
PS E:\restic> restic -r E:\restic\ backup C:\Users\Arnaud\bin\
enter password for repository:
scan [C:\Users\Arnaud\bin]
scanned 2 directories, 7 files in 0:00
[0:01] 100.00%  28.929 MiB/s  28.929 MiB / 28.929 MiB  9 / 9 items  0 errors  ETA 0:00
duration: 0:01, 19.54MiB/s
snapshot b18aa7fa saved
I’ll add a file to my bin directory and run the command again.
PS E:\restic> restic -r E:\restic\ backup C:\Users\Arnaud\bin\
enter password for repository:
using parent snapshot b18aa7fa
scan [C:\Users\Arnaud\bin]
scanned 2 directories, 8 files in 0:00
[0:00] 100.00%  0B/s  28.934 MiB / 28.934 MiB  10 / 10 items  0 errors  ETA 0:00
duration: 0:00, 215.77MiB/s
snapshot 47481576 saved
We can see that the extra file has been grabbed. But what’s really interesting is that restic tells us it’s using a “parent snapshot.” What’s that?
According to restic’s documentation, this is what a snapshot is:
A Snapshot stands for the state of a file or directory that has been backed up at some point in time. The state here means the content and meta data like the name and modification time for the file or the directory and its contents.
restic hashes the content of each file and uses this as a basis for comparison. But it doesn’t process the whole file at once; files are sliced into (encrypted) blocks. By comparing against the parent snapshot, restic can easily detect which blocks of a file changed (essentially a diff). That means the integrity of the files is checked, but also that only the blocks that changed need to be sent, all while keeping everything encrypted even before transit. This is fantastic.
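To make the idea of hashed blocks more concrete, here is a minimal sketch of content-addressed storage in Python. It is a deliberate simplification and not restic’s actual implementation: the chunks are fixed-size (restic picks chunk boundaries based on content), nothing is encrypted, and the names `split_chunks`, `backup`, `restore` and `store` are made up for this illustration.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut data into fixed-size pieces (an assumption made for simplicity)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(data: bytes, store: dict) -> tuple:
    """Store chunks not seen before, keyed by their SHA-256 digest.

    Returns the snapshot (ordered list of digests) and the number of
    chunks that actually had to be added to the store.
    """
    snapshot = []
    new_chunks = 0
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # a real tool would encrypt the chunk first
            new_chunks += 1
        snapshot.append(digest)
    return snapshot, new_chunks


def restore(snapshot: list, store: dict) -> bytes:
    """Rebuild the original data by concatenating the chunks of a snapshot."""
    return b"".join(store[digest] for digest in snapshot)


store = {}
first, added_first = backup(b"AAAABBBBCCCCDDDD", store)
second, added_second = backup(b"AAAABBBBXXXXDDDD", store)  # one block changed
print(added_first, added_second)  # prints: 4 1
```

The second backup only adds the one block that changed, yet both snapshots can still be restored in full because unchanged blocks are shared between them.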
As the documentation explains there are snapshots for files and directories but also for the whole repository. It’s possible to list them like so:
PS E:\restic> restic -r E:\restic\ snapshots
enter password for repository:
ID        Date                 Host     Tags  Directory
----------------------------------------------------------------------
b18aa7fa  2017-08-24 14:34:42  CLAVAIN        C:\Users\Arnaud\bin
47481576  2017-08-24 14:35:50  CLAVAIN        C:\Users\Arnaud\bin
We can see all the snapshots for that specific repo. Of course you can roll back to previous snapshots etc. This is very much like Git or maybe even Docker. I should add that it’s entirely possible to save multiple source directories into one common repository. If I wanted to add my Desktop I could:
PS E:\restic> restic -r E:\restic\ backup C:\Users\Arnaud\Desktop\
enter password for repository:
scan [C:\Users\Arnaud\Desktop]
scanned 1 directories, 3 files in 0:00
[0:00] 100.00%  0B/s  9.264 KiB / 9.264 KiB  4 / 4 items  0 errors  ETA 0:00
duration: 0:00, 0.09MiB/s
snapshot 2389bc9c saved
PS E:\restic> restic -r E:\restic\ snapshots
enter password for repository:
ID        Date                 Host     Tags  Directory
----------------------------------------------------------------------
b18aa7fa  2017-08-24 14:34:42  CLAVAIN        C:\Users\Arnaud\bin
47481576  2017-08-24 14:35:50  CLAVAIN        C:\Users\Arnaud\bin
2389bc9c  2017-08-24 14:45:12  CLAVAIN        C:\Users\Arnaud\Desktop
And you can also back up both source directories at once by simply appending the extra directories:
restic -r E:\restic\ backup C:\Users\Arnaud\Desktop\ C:\Users\Arnaud\bin\
My “real” set of files to back up is a little over 400 GB. Even when nothing has changed between two backups, restic still has to go through all those files (precisely to check whether anything has changed). This operation takes about two and a half minutes on my fairly decent CPU (Intel i5-4690K @ 3.5 GHz). As far as I can tell, this is the time it takes to scan the files and compare them against the parent snapshot. I feel this is pretty good: 2:30 for 400 GB.
Even the best backup solution in the world is useless if restoring doesn’t work well. As a reminder, you should simulate a data loss and test your restore process on a regular basis (I myself am guilty of not doing this nearly enough).
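As a back-of-the-envelope check, the figures above translate into the following effective rate. This sketch assumes “400 GB” means decimal gigabytes and rounds the size down to exactly 400.

```python
size_bytes = 400 * 10**9   # "a little over 400 GB", taken as decimal gigabytes
duration_s = 2 * 60 + 30   # 2:30 expressed in seconds

throughput_gb_s = size_bytes / duration_s / 10**9
print(f"{throughput_gb_s:.2f} GB/s")  # prints: 2.67 GB/s
```

That is well above what a single SATA drive can deliver, which suggests that an unchanged run does not read every byte again but relies mostly on comparing against the parent snapshot. I have not verified this in restic’s source, so treat it as an inference.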
I deleted two files from my desktop and will now ask restic to restore them.
PS E:\restic> restic -r E:\restic\ restore 2389bc9c --target C:\Users\Arnaud\
enter password for repository:
restoring <Snapshot 2389bc9c of [C:\Users\Arnaud\Desktop] at 2017-08-24 14:45:12.863773 +0200 CEST by CLAVAIN\Arnaud@CLAVAIN> to C:\Users\Arnaud\
ignoring error for C:\Users\Arnaud\Desktop\desktop.ini: OpenFile: open \\?\C:\Users\Arnaud\Desktop\desktop.ini: Access is denied.
There were 1 errors
There was an error because desktop.ini was locked by Windows. Apart from that, the files were restored. Note the --target argument: it’s mandatory. I wish restic would restore to the location saved in the snapshot (“C:\Users\Arnaud\Desktop” as we saw before when listing the snapshots) if the target location is omitted.
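One way to make a test restore more than a visual check is to restore into a scratch directory and compare it with the live files. The sketch below is only an illustration: `file_digest`, `tree_digests` and `compare_trees` are hypothetical helper names, and the demo uses temporary directories instead of a real restic restore. With a real restore, the two roots have to point at corresponding directories, since the restored tree may sit a few levels below the target.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB blocks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1024 * 1024), b""):
            h.update(block)
    return h.hexdigest()


def tree_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(original: Path, restored: Path) -> dict:
    """Report files that are missing, unexpected, or different after a restore."""
    a, b = tree_digests(original), tree_digests(restored)
    return {
        "missing": sorted(set(a) - set(b)),
        "extra": sorted(set(b) - set(a)),
        "different": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }


# Demo with throwaway directories: b.txt is deliberately "not restored".
with tempfile.TemporaryDirectory() as tmp:
    original, restored = Path(tmp, "original"), Path(tmp, "restored")
    original.mkdir()
    restored.mkdir()
    (original / "a.txt").write_text("hello")
    (original / "b.txt").write_text("world")
    (restored / "a.txt").write_text("hello")
    report = compare_trees(original, restored)

print(report)  # prints: {'missing': ['b.txt'], 'extra': [], 'different': []}
```

An empty report means every file came back byte for byte. Files that legitimately changed since the snapshot will show up under "different", so this works best right after a backup.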
Now comes the little downside of restic compared to other solutions. There’s no scheduling included since restic is just a CLI program. Under Linux it’s pretty easy since users can just crontab a script. Windows can do the same but it’s a bit more convoluted than on Linux. We must do the following:
- Write a simple PowerShell script
- Put the repository’s password into a text file (*)
- Create a Task in the Windows Task Scheduler to execute the script
(*) Yes, we’re putting a password in clear text in a text file. restic’s assumption is that its host system is safe; it’s therefore fine to store passwords there in files or environment variables. The encryption is used to ensure nobody on the remote repository can decrypt the files (since nothing can prevent the hosting provider from snooping if they are shady and want to). Anyone with access to your system already has the files, so hiding the password from them is moot.
My script is C:\Users\Arnaud\restic-usbdrive.ps1 and its content is just the backup command from before, plus an extra flag to point to the password file:
restic -r E:\restic\ -p C:\Users\Arnaud\.s1kr3t\restic-usbdrive backup C:\Users\Arnaud\Desktop\ C:\Users\Arnaud\bin\
So of course I need to create that “restic-usbdrive” file now and put the repo password in it. Now we just need to go to the Windows Task Scheduler and set up a basic task.
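Choose the options and frequency relevant to you but make sure you pick “Start a program” as the task to perform. Then simply point to your script (in practice the program to start is usually powershell.exe, with the script’s path passed through its -File argument).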
Windows allows for a lot of conditions to run tasks: the frequency, but also events like when you log on or off. It can prevent duplicates, so if the task is long-running another one won’t start. It can start the task if a scheduled event was missed (for instance because the computer was asleep; it can also wake it up if you so choose).
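For anyone who would rather schedule the same thing from a cross-platform script, the one-line PowerShell script shown earlier could be sketched in Python as follows. It uses the same restic flags as that script (`-r`, `-p`, `backup`); the function names `build_backup_command` and `run_backup` are hypothetical, and restic is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path


def build_backup_command(repo, password_file, sources):
    """Assemble the restic invocation as an argument list (no shell quoting needed)."""
    return ["restic", "-r", str(repo), "-p", str(password_file),
            "backup", *[str(s) for s in sources]]


def run_backup(repo, password_file, sources):
    """Run the backup and return restic's exit code (0 means success)."""
    if not Path(password_file).is_file():
        raise FileNotFoundError(f"password file not found: {password_file}")
    return subprocess.run(build_backup_command(repo, password_file, sources)).returncode


# Paths taken from the examples in this post; adjust them to your own setup.
command = build_backup_command(
    r"E:\restic",
    r"C:\Users\Arnaud\.s1kr3t\restic-usbdrive",
    [r"C:\Users\Arnaud\Desktop", r"C:\Users\Arnaud\bin"],
)
print(command)
```

Calling `run_backup(...)` with the same arguments performs the actual backup. Returning the exit code lets the scheduler (cron or Task Scheduler) record whether the run failed.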