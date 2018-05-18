Photo by Glenn Fleishman
Roll Your Own Cloud Backups with Arq and B2
It’s surprising Apple still doesn’t offer iCloud backups for macOS. Time Machine requires a separate external drive or partition, making it feel long in the tooth. And it doesn’t help that Apple just killed the Time Capsule (see “RIP: Apple AirPort, 1999–2018,” 27 April 2018). Despite Apple’s commitment to iCloud and the availability of up to 2 terabytes of storage, the company offers no set-and-forget backup option for the Mac. It’s a bizarre omission because Apple has every other piece in place to make it an offering.
Paid cloud services such as Backblaze (a TidBITS sponsor) can readily fill this gap, but you can also now roll your own cloud service at a reasonable price by combining Haystack Software’s Arq backup app for macOS with Backblaze’s B2 on-demand, usage-based cloud storage service. I reviewed Arq for Macworld in March 2017, and found it generally good, although it needs more refinement in its restore process; Arq added B2 support a year ago.
Backblaze B2 competes with Amazon’s Simple Storage Service (S3) and Google Cloud Storage, its two biggest competitors in this space. All cloud storage companies regularly lower their prices, and a recent price drop from B2 now makes it a reasonable option for your own backup.
This article provides a roadmap for how you can roll your own cloud backup and not give up anything in the process. Expect more options to arise in the future.
Why Build Your Own Solution?
Cloud-based backups predate even the term “cloud” for distributed online storage. Mozy was one of the first in 2005, and Code42’s CrashPlan followed in 2007. (Code42 is in the process of exiting the personal backup business; see “CrashPlan Discontinues Consumer Backups,” 22 August 2017.) The advantage in the early days was not having to manage a server, pay for specific amounts of storage, or find software reliable enough to transfer data routinely and automatically.
The rise of on-demand, usage-based cloud storage and its precipitous price drop since Amazon S3 first appeared make it possible to consider the benefits of rolling your own cloud-backup solution. That would let you control the entire backup process, paying only for ongoing archival storage and downloading data when you need to restore files. Plus, you could manage the security of your archived data through client-side encryption, an area of increasing concern.
Arq makes all of this feasible, and I’ll explain how to set it up in the how-to section below. But first, where should you store your data?
The Best Storage Option for Your Money
Currently, B2’s pricing is cheaper than similar storage from Amazon S3. With its recent price drop, B2 now charges $0.005 per GB per month for storage. Its only data-transfer charge is for downloads, at $0.01 per GB, which you incur almost entirely when you’re restoring files from a backup. (It’s free to upload data.) Amazon and Google have tiers of service. Their standard “fast access” tiers cost much more than B2 for storage and retrieval, and while their deep-storage options compete more closely with B2, they can still wind up being more expensive for storage or retrieval on restores. (I went into excessive depth about these tiers in “Investigating ChronoSync 4.7 for Cloud Backup,” 22 December 2016.)
B2 support has only recently become widespread in macOS software, which means price and opportunity finally intersect for many users. If you could limit your total archive to 1 TB, you’d pay $5 per month in storage ($60 per year); at 5 TB, that’s $25 per month ($300 per year). For a single machine, most unlimited hosted backup services will be as cheap or cheaper, but for multiple computers, rolling your own could cost less or about the same, as you’re only paying for the total data stored among all your backups. Restoring data costs $1 per 100 GB, so a typical restore won’t cost much.
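The arithmetic above is easy to check for your own data. The following Python sketch uses only the rates quoted in this article ($0.005 per GB per month for storage, $0.01 per GB downloaded) and assumes 1 TB = 1,000 GB. The function names are made up for illustration, and Backblaze's rates may have changed since this was written.

```python
from decimal import Decimal

# Rates quoted in this article (May 2018); check Backblaze's pricing page
# for current figures before relying on them.
STORAGE_PER_GB_MONTH = Decimal("0.005")
DOWNLOAD_PER_GB = Decimal("0.01")


def monthly_storage_cost(stored_gb):
    """Dollars per month to keep stored_gb gigabytes in B2."""
    return Decimal(stored_gb) * STORAGE_PER_GB_MONTH


def restore_cost(download_gb):
    """Dollars to download download_gb gigabytes during a restore."""
    return Decimal(download_gb) * DOWNLOAD_PER_GB


for tb in (1, 5):
    monthly = monthly_storage_cost(tb * 1000)  # assumes 1 TB = 1,000 GB
    print(f"{tb} TB: ${monthly:.2f}/month, ${monthly * 12:.2f}/year")

print(f"Restoring 100 GB: ${restore_cost(100):.2f}")
```

Plug in the combined size of every computer you plan to back up, since you pay for the total stored across all of them.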
If you have a lot of data to restore relative to your broadband connection, Backblaze is testing the B2 Snapshot Return Refund Program, which will charge you the standard download fees and then ship you a drive for a refundable fee ($99 for up to 128 GB; $189 for up to 4 TB) and return shipping costs.
You could also save money by using a sync or storage service that you’re already paying for and that has unused capacity:
- Dropbox: Dropbox’s lowest-tier paid service includes 1 TB of cloud storage, and Arq can talk directly to Dropbox’s API. You can use Dropbox’s Selective Sync or Smart Sync to prevent those backups from being unnecessarily synced to a desktop computer.
- Amazon Drive: If you’re paying for 1 TB or more on Amazon Drive, you might have hundreds of gigabytes available, and Arq can store files directly there.
- Server: If you happen to have a real or virtual server at a data center with spare storage and data transfer capacity, Arq lets you transfer via SFTP.
With these prices in mind, let’s look at how to make this happen.
Set up Your B2 Account
Start by creating an account for Backblaze B2 and obtaining the credentials you need:
- Visit the B2 signup page and sign up for an account. (I highly recommend enabling two-factor authentication when prompted.)
- Backblaze includes 10 GB of storage for free, but fill out the Billing section if you want to store more than that immediately.
- Click Buckets on the left, and then click Show Account ID and Application Key, which you’ll need to plug into your archiving app—Arq, in this case.
- In the Account ID & Application Key screen, click Create Application Key.
- Copy both the account ID and the application key, and store them securely. Someone might be able to derive your account ID, but wouldn’t be able to access your stored data without the application key. (Encryption, as described below, also helps protect your data.)
- At this point, you can either choose to create a “bucket,” or you can do it in Arq.
What’s a bucket? You can think of it as a folder in a cloud-storage system. Unlike a folder on your Mac’s drive, every bucket name has to be unique across the entire cloud system! Your backup software can generate one randomly, or you can smash down on the keyboard to create one.
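If you would rather not mash the keyboard, a few lines of Python can generate a name that is very unlikely to collide with anyone else's. This is only a sketch: the `arq-backup` prefix is a hypothetical label, and the checks encode B2's naming rules as I understand them (6 to 50 characters; letters, digits, and hyphens only; names beginning with `b2-` reserved), so confirm them against Backblaze's documentation.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_bucket_name(prefix="arq-backup", suffix_length=16):
    """Return prefix plus a random suffix for use as a bucket name.

    The prefix is a hypothetical label, not something Arq or B2 requires.
    """
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(suffix_length))
    name = f"{prefix}-{suffix}"
    # Assumed B2 naming rules: 6-50 characters, letters/digits/hyphens only,
    # and no "b2-" prefix (reserved).
    if not 6 <= len(name) <= 50:
        raise ValueError("bucket name must be 6 to 50 characters long")
    if name.startswith("b2-"):
        raise ValueError('bucket names starting with "b2-" are reserved')
    allowed = ALPHABET + string.ascii_uppercase + "-"
    if not all(ch in allowed for ch in name):
        raise ValueError("bucket name may contain only letters, digits, and hyphens")
    return name


print(random_bucket_name())
```

A random name only makes a collision improbable. B2 still rejects a name that is already taken, in which case you can simply generate another.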
With a B2 account in hand, let’s configure Arq.
Configure an Arq Backup
Arq has a one-time $50 license fee—it includes perpetual updates—and offers a 30-day trial, so you can experiment with it before being locked in. Arq can back up folders or entire volumes from internal or external drives attached to the computer on which Arq runs, or from mounted network volumes, avoiding the need for an Arq license for each backed-up computer. Be aware that it has a stripped-down interface, which doesn’t look much more advanced than a screen-based terminal app, but it’s fairly powerful within those parameters.
To set up your backup, follow these steps after launching Arq:
- Choose Arq > Preferences.
- Click the plus (+) sign in the lower-left corner.
- Select Backblaze B2, and click Continue. (The “Which destination is best for me?” help that comes up offers good price comparisons.)
- Enter the B2 account ID and application key that you copied earlier, and then click Continue.
- At this point, either name a new bucket (see the details above about naming limitations) or use one you’ve already created. Then click Continue.
Every destination uses the same parameters for encryption (see the passphrase step below), schedule, budget, and scripts. You can modify all but the encryption parameters by selecting the destination in Preferences, clicking Edit, and setting the options in the Schedule, Budget, and Before and After Backup tabs.
The Budget tab is the most interesting one for managing costs. You set a maximum total size for backups, and as long as it’s larger than a complete set of your files, Arq automatically thins older archived files to keep you within that amount. (It always maintains a single full set of all files, no matter the budget.) With Amazon S3 and others, you can set a maximum monthly dollar amount, which Arq calculates based on Amazon’s rates. Arq can also remove “unreferenced,” or locally deleted, items every 30 days or at a rate you set.
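To make the idea of a size budget concrete, here is a toy model in Python of oldest-first thinning. It is not Arq's actual algorithm, which this article doesn't describe in detail. It assumes each snapshot has a known, independent size (real backups share data between snapshots), drops the oldest snapshots first, and never drops the newest one, mirroring the rule that one full set is always kept. All names are hypothetical.

```python
from typing import List, Tuple

Snapshot = Tuple[str, int]  # (ISO date, size in GB)


def thin_to_budget(snapshots: List[Snapshot], budget_gb: int) -> List[Snapshot]:
    """Drop the oldest snapshots until the total fits the budget.

    The newest snapshot is always kept, even if it alone exceeds the budget.
    """
    kept = sorted(snapshots)  # ISO dates sort chronologically
    while len(kept) > 1 and sum(size for _, size in kept) > budget_gb:
        kept.pop(0)  # remove the oldest remaining snapshot
    return kept


history = [("2018-03-01", 400), ("2018-04-01", 350), ("2018-05-01", 300)]
print(thin_to_budget(history, budget_gb=700))
```

In this example, the three snapshots total 1,050 GB, so the March snapshot is dropped to bring the total to 650 GB, under the 700 GB budget.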
Next, you use the main Arq window to add the folders you want to back up:
- Under Configure Backups, select To B2, and then click Add a Folder to Backups.
- Select a folder. With your startup volume, I recommend picking folders like your home folder and the Applications folder individually to avoid backing up system files and logs. For external drives that don’t have system files, you can select the entire volume as a “folder.”
- When Arq prompts you, set a passphrase for encryption; I’ll explain more about this later. Use a password manager like 1Password or LastPass to generate and store a relatively long passphrase, like 15 to 20 characters. Do not do this by hand because the passphrase cannot be recovered if lost. Arq creates a local file that contains necessary encryption details. A warning reminds you to write down the password, which is poor advice in the modern age—use a secure password manager. The dialog has a single button labeled “I Wrote It Down,” which is unhelpful because at this stage you’ve already set the password and can’t back up to change it.
- Click the folder under To B2 and then click Edit Backup Selections to modify which folders and files Arq will monitor for changes.
As I noted earlier, restoring files is not as simple as setting up backups:
- In the Restore Files section, click From B2, and select your computer.
- Underneath, in the list of backed-up folders, click one of these items to expand it.
- Under the expansion, where Arq lists a snapshot for each archived operation, select a snapshot.
- In the list of available files on the right (which includes the last modification date for each file), select a single item and either click Restore or drag it to a location in the Finder.
Unfortunately, you can restore only a single file or folder at a time; there’s no provision to make multiple selections at once. If there are conflicting items at the restore location, you’re prompted with Do Not Overwrite, Cancel, and Overwrite. But Do Not Overwrite doesn’t selectively replace files. Instead, it creates a nested local directory with the full set of restored files.
Handling Encryption in Arq
Arq uses its own encryption system, relying on standard libraries. Arq’s developer, Haystack Software, documents it fully on its Web site (in a text file!), and notes that it uses an encryption approach similar to the one used by the Git file-versioning system.
Arq transforms your passphrase into a number of encryption keys, which are stored in a local file that’s encrypted directly using your passphrase. While Arq is in use, it keeps the encryption keys available for itself, which is true for all backup software with client-side encryption and decryption.
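For readers curious what turning a passphrase into keys can look like, here is a generic sketch using PBKDF2 from Python's standard library. It is an illustration under stated assumptions, not Arq's documented scheme: the salt size, the iteration count, and the split into two 32-byte keys are all choices made for this example.

```python
import hashlib
import os


def derive_keys(passphrase: str, salt: bytes, iterations: int = 200_000):
    """Stretch a passphrase into two 32-byte keys with PBKDF2-HMAC-SHA256.

    Illustrative parameters only; they are not taken from Arq.
    """
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=64
    )
    encryption_key, mac_key = material[:32], material[32:]
    return encryption_key, mac_key


salt = os.urandom(16)  # random salt, stored alongside the encrypted data
enc_key, mac_key = derive_keys("correct horse battery staple", salt)
print(len(enc_key), len(mac_key))
```

The same passphrase and salt always yield the same keys, and a different passphrase yields unrelated ones. That makes the passphrase the one secret you have to keep safe.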
The encryption keys are never transmitted to a server in any fashion, which is the best behavior if you want the highest level of control over your archived files, and the least possibility that any unwanted party—personal, criminal, or governmental—could gain access to those files.
Backblaze’s consumer backup solution keeps your encryption key private until and unless you have to restore data, at which point it has to be transferred to the company’s servers to decrypt archives and create a downloadable Zip archive of your restored files. It’s not stored permanently, but it’s a point of weakness for someone who could gain privileged access, and one not found in SpiderOak or CrashPlan.
Google and Amazon’s cloud-based server systems also allow encryption, but they encrypt and decrypt on the server side with a user-provided key, so the key ends up out of your control even though the process is designed to be secure.
Arq’s only encryption problem is that its passphrase-entry approach isn’t integrated with anything else, so you must retain a copy in some secure fashion, such as with a password manager like 1Password or LastPass. Haystack Software should consider adding integrations.
Why Not Other Backup or Sync Apps?
You may wonder why I don’t discuss two other popular file transfer apps that support B2 and other cloud services, SFTP, and other connection methods.
- ChronoSync by Econ Technologies ($50 perpetual license, 15-day trial) is a terrific clone, mirror, sync, and archive app that keeps getting better. Unfortunately, it doesn’t offer any client-side encryption options. ChronoSync can use Google and Amazon’s server-side encryption (see above). If Econ Technologies added client-side encryption, it would be a strong competitor to Arq.
- Panic’s Transmit 5 also supports cloud-storage systems like B2 and synchronization, but it lacks scheduling, restoring, and archiving features necessary for a backup solution. It doesn’t offer client-side or server-side encryption.
The Future of Rolling Your Own Cloud Backups
I still wish Apple would provide an iCloud-based backup service, not to put other companies out of business, but to provide a minimum level of archiving that would be easily and affordably available. That would teach everyday users that cloud-backup solutions exist, which could grow the market for independent backup services with more to offer.
Arq and B2 aren’t the perfect combination, but they’re the best option that I’ve seen to date for a combination of control, archiving features, and price. I expect we’ll see more, between CrashPlan’s exit from the market, the growing interest in controlling one’s own encryption, and the drop in cloud-based storage pricing.
Notable Replies
Great article. I’ve been using Arq for years, and storing to B2 with it since September, with three different Macs. It’s reliable and it’s inexpensive.
Since we’re talking about “rolling your own”, what about really doing it by hand? What about a situation where you have off-site ssh access to some box with lots of storage and halfway decent bandwidth? Is there anything special you’d need to consider if you decided to just use rsync to do your own “networked version” of Time Machine?
If we assume this box has a fixed IP it’s easy. Often that might not be the case though. Then let’s say this is a Mac. Any easy way to exploit FindMyMac so that you could use some kind of generic hostname that would automatically get forwarded to the current IP of your remote Mac? Something like the_name_I_gave_my_Mac.some_generic_Apple_name.apple.com
And assuming that’s not possible, what about exploiting FindMyMac to at least get the current IP of that remote Mac? There used to be the free OpenDNS with its DNS forwarding daemon, but of course that went all commercial so there’s no more free option there. I would assume FindMyMac must allow doing this some way or another…
No offense to any TidBITS reader, but that’s really beyond the scope of the publication. We have some number of readers for whom it wouldn’t be a big deal, but it’s the kind of thing that could spiral out of control to provide the documentation for, and it’s really more of a Unix-style solution.
Back To My Mac (not Find My Mac, which is opaque to users) has a lot of tunneling and reliability issues. I believe those issues led people who were trying to build remote-connection and other services that determined this address to halt development on those products! I remember some AppleTalk bridges, for instance!
I also think there’s an issue of having a graphical front end and being able to use reliable third-party software that’s automated. I just don’t want to recommend generally to people to work at the bare-metal level. And there’s a fair amount of advice on this all over the net, if you’re looking for it.
I’ve been using Arq for about a year (local and cloud backups) since CrashPlan started becoming problematic, which was a few months before they abandoned non-business users.
Initially I used B2 but found it somewhat unreliable and would suffer timeout issues.
I switched to using Arq/Wasabi instead and all those issues went away. Cheap too.
Another example of where Apple is missing in action, particularly as stated that Apple has all the components in place. But then Apple has never really got the Cloud, as evidenced by its past web storage and services mishaps and mistakes.
Apple, once a leader in innovation and exploration, now sadly trails the Amazons and Googles.
Glenn, although you mention Amazon Drive in the context of unused capacity, I wonder why your piece elevates B2 above it (it is in the title, after all). Amazon Drive seems like a relatively more straightforward thing for many in the TidBITS readership to set up, and is, as I understand it, the same cost whether or not there is unused storage capacity in the $60 tier (ACD is $60/TB/yr = $5/TB/month vs. B2 at $0.005/GB per month = $5.12/TB/month).
Thus, is there another benefit of B2 that causes you to put it first? Durability, transfer speeds, etc…?
A bunch of reasons, some mentioned, some implied.
ARQ has a listing of various destinations and prices at the bottom of a suggestion of which destination is best for you here: https://www.arqbackup.com/documentation/pages/strategy.html
Two things about the question of Amazon Drive vs. B2 (or Wasabi, or AWS, or Google Cloud Storage). Glenn is right; with destinations like B2, you pay for what you use rather than a full TB, like with Amazon Drive. I do not use a full TB, so I pay about $3/month for B2 storage. Also, you’ll see in that table that B2 is listed as “Best” speed, Amazon Drive as “Better” (though I haven’t really measured the speed in any sort of test). I should also mention that I have Office365, which comes with 1 TB of storage for the same $70/year fee. I don’t use much of that file storage, so I do use it as another ARQ destination so that I have another online location (not of exactly the same folders, but really my most critical files. For example, I don’t back up my iTunes library there because even if B2 fails, I can use iTunes Match as an emergency restoration for that if it comes to that.)
Oh, and there are transmission fees with B2, AWS, and Google Cloud Storage, which you don’t pay with Amazon Drive, OneDrive, Dropbox, etc., but they are low. (B2 charges fees to download at $0.01/GB, so you’d pay when you needed to restore something, plus small transaction fees for writing, deleting, etc., files.) So, the storage of a full 1 TB would be slightly higher than $0.005/GB/month, as you’d have to pay for actually writing the files onto storage. B2 seems to compare well with AWS’s inexpensive Glacier storage without the download rate limiting that Glacier does when you need to restore. Also, one more advantage of B2 is that, like CrashPlan used to do, you can have them put a snapshot of your data on a hard drive and mail it to you for a fee, which I don’t think Amazon would do with Amazon Drive, if you truly had a disaster and wanted to get a local copy of the data (I believe it is sent FedEx next day.)
Lastly, with the online drive sync services like Amazon Drive, you have to remember to customize options to prevent downloading/syncing all of the folders used for backup data to all of your synced devices. It’s not a huge deal, but you have to remember to do it.
This is harder than you might think, and there’s a lot of things to consider that don’t seem obvious at all when you start.
The first is that there are many files on a UNIX system that you cannot simply copy and expect that copy to work. Mostly database files that are open. So if you sync your /var/db/ you have what you think is a backup, but is probably not.
Second is versioning. This is doable in rsync, but not easy.
Third is making sure that the local drives are always mounted (and yes, this is a problem on Mac OS more often than you’d think).
Those are the main issues, but there are others (privacy? Are you backing up users mail? How do you secure that?)
I do this myself, backing up my servers via rsnapshot and running some scripts on the server to dump backups of the databases in a format that can be backed-up and scripts to do many other things, but it is not something I would recommend someone try to do as it took me many months to figure out how to get it all to work well, and years later I am still tuning it.
And, of course, this is all shell script unix stuff where the Mac is nothing more than another bash session, there’s no GUI. No Mac-goodness. It’s just Unix all the way down.
(My servers get backed up to my home computer, which gets backed up to Backblaze with an encryption key so the files on Backblaze are secured.)
But for remotely backing up a Mac something like what Time Machine does? There just isn’t a roll-your-own solution that works. Hell, having a Synology or Drobo on site mostly doesn’t work because of the “Must discard time machine backup and create a new one” issues, which are far too frequent. This even happens with a local dedicated drive, but far less often.
I’m not sure how frequently they keep that updated, but it’s a good guide. I picked B2 as it’s the best choice for most readers, as it’s intended for API-based cloud storage, it has no tiny transaction fees (only storage and download fees), and doesn’t require any real sorting out. Its Web-based front-end is also simple. Amazon S3 is baroque and Google Cloud is complicated but better. (Also, it’s Arq, not ARQ—not an initialism or an uppercase name.)
The tiny fees for GET, PUT, and other operations add up to almost nothing for backup operations (as opposed to continuous interactions with stored data for other purposes, where they can be meaningful).
I can’t find that option for B2. Can you point to an article? They do have an (expensive) upload option for up to 70 TB via a rentable storage unit.
I realized yesterday it’s actually a lot easier than I first thought.
Basically, it only required I set up port forwarding for AFP/SMB. Then the ssh session opens up a tunnel for the remote mount of a disk hooked up to the remote system via afp://localhost:forwarded_port. Then TM to that. Done.
The only thing I haven’t got figured out yet is what to do when the remote IP changes. I have other means of checking on that, but in general I’d think that there must be some way to use Back To My Mac (yeah, that’s what I should have written) as a DNS forwarding service.
It’s on their pricing page: https://www.backblaze.com/b2/cloud-storage-pricing.html (under “data by mail”). Also, just saw this: https://help.backblaze.com/hc/en-us/articles/360001925414 where they say that returning the drive makes the process free (except for shipping) - though it also says that it’s a trial program.
I also just found it here:
That is extremely neat.
Weird, I’d searched their help files. Thanks!
So, B2 promises 99.999999% durability and is in a single data center in Sacramento, CA and Google Cloud Drive promises 99.999999999% and is spread across multiple data centers.
1000x more reliable sounds good in theory, although perhaps the solution for reliability isn’t a more reliable single cloud solution, but rather, a second, entirely redundant (different software, etc…) cloud solution.
Since this is the reliability of a backup, not of the only copy of data, I can’t see it mattering much. The odds of something happening to a backup that has 99.999999% durability at the same time you need to restore data would seem infinitesimally low. And of course, the Internet backup shouldn’t be your only backup, so something would have to happen to both the original and your local backups, plus the Internet backup.
None of my data is so valuable that I’d go beyond three backups (bootable duplicate, local archive, remote archive). Others may have different opinions.
Exactly! Also, I presume based on Backblaze’s history (and that of Amazon S3 and other cloud providers) that their backup of my data is probably 1,000x more reliable than any local backup I make! They do redundancy in such a way that they’re doing backups of my backups.
If the remote backup is remote enough, that’s true. But if I lived in Northern CA, I’d have a couple of qualms about the ‘remote’ backup being with BB in Sacramento. Sacramento has a lower earthquake risk than the rest of CA, but it’s not immune. There’s also risk of volcanoes; Shasta is about due to pop off, as are a few others. Ash is amazingly nasty and if the winds are going the wrong way it could easily mess up all of N. CA.
At least BB says where their servers are. Amazon Drive and the other consumer ones I looked at don’t. Probably back east somewhere, but after the disaster is the wrong time to find out for sure.
Is anyone else disturbed that Backblaze’s only option for two-factor auth is insecure SMS? I created a free test account a while back, but couldn’t even play with it because it demands a phone number and SMS before letting you have access. I might be able to tolerate that once, but as a second factor, it’s not only insecure (NIST says flat out not to use it), but since I’m anonymously prepaid, if I lose the phone, I’ve maybe permanently lost that phone number and the second factor.
Does Wasabi do something more sensible? I can’t find anything in their help.
I share this concern about SMS-based second-factor, though it’s not a showstopper for me. SMS isn’t routinely interceptable, though it can be, and I think you’re in a small minority of people who would be unable to recover their phone number. Nonetheless, SMS is weak relative to other means!
B2 lets you disable SMS as a backup and switch entirely to TOTP.
I can’t say that I am. At the worst case, if somebody was able to somehow get the username and password I use for the account, all that they would get from it are the names of the computers I use to back up to B2 and be able to download some blobs of pseudo-random data.
They could delete it all. For those of us with slow connections (all too common in the US), that could be a year or three of uploading time.
Thanks. I did look for something like that in their help, but only found SMS. BB is now back in the running.
I wish that Arq had filters, or that ChronoSync did their own encryption. I might have to daisy chain them to get the most important stuff up first.
That’s fine with me. As I said, I have backups in at least two other locations for everything (locally on an always-on Mac mini, and on OneDrive, and iTunes Match for my iTunes music and iCloud Photo Library for photos.) If B2 was ever that unreliable, I wouldn’t want to stay on the platform anyway, and I’d just switch to Google Cloud Platform or AWS or Wasabi and keep on going. And it took about a week to upload a full backup for me. I’ve already switched from AWS to Google Cloud Platform to B2 over the last few years.
