
Photo by Aleksandar Pasaric from Pexels


Backing Up VM Image Files to Internet Backup Services

Code42, maker of the CrashPlan backup service for small businesses (but not individuals; see “CrashPlan Discontinues Consumer Backups,” 22 August 2017), has announced that, starting in May 2019, the service will no longer allow users to back up applications, virtual machine image files from apps like Parallels Desktop and VMware Fusion, and some backup files.

Email announcement from Code42

Code42 explains the change by saying that excluding these files will likely result in smaller backup archives and faster restores, syncs, and backups. That’s obvious, no? If you eliminate large files from a backup, everything will run faster (and Code42 won’t need to add storage as quickly). More troubling is the comment earlier on in the note, which says:

We have always recommended you not include applications or large files in your selection as they may not backup correctly.

Seriously? Code42 is actually admitting that CrashPlan may not back up large files correctly? Isn’t that Job #1 for any backup app?

Needless to say, Code42’s announcement perturbed some users, who notified us of the change. TidBITS member Peter Erbland said, “It seems like it is defeating the purpose of an offsite cloud backup in case of a catastrophic loss of data.”

I was curious about Code42’s performance claims, however, so I checked with CrashPlan competitor Backblaze (which has sponsored TidBITS in the recent past). Yev Pusin, Backblaze’s Director of Marketing, explained that Code42’s statement was more than a truism about smaller archives improving performance, for a few reasons:

  • On initial backup or when a lot of data changes, large files take a long time to upload. That’s to be expected, of course, but what people may not realize is that smaller files can be blocked from uploading during that time, leaving them unprotected.
  • After the initial upload, apps like Backblaze and CrashPlan do block-level data deduplication, which means that they analyze small blocks of each file, compare them to what’s already backed up, and copy only those blocks that are new or changed. It might seem as though large files wouldn’t present a problem after initial backup as long as they didn’t change all that much. However, as Yev pointed out, the resources necessary to analyze all the blocks in a multi-gigabyte file are significant—you need enough drive space to store a copy of the file, and then the backup app has to spend a lot more time and CPU power analyzing all those blocks.
  • On the restore side of the equation, if you’ve been backing up a large VM image for months, with changes happening regularly, and then you need to restore it, you’ll hit another performance problem. That’s because the backup app needs to reassemble all those individually backed-up blocks into the current representation of the file, and the more of those there are, the longer it will take the backup servers to provide your file.
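The block-level deduplication and restore steps described above can be illustrated with a minimal Python sketch. This is not how Backblaze or CrashPlan actually implement it; the `BlockStore` class, the 4-byte `BLOCK_SIZE`, and the use of SHA-256 are all assumptions chosen to keep the example small. Real services use far larger blocks and their own formats.

```python
import hashlib

# Hypothetical block size, tiny so the demo is easy to follow.
BLOCK_SIZE = 4


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """In-memory stand-in for a backup server's block storage (illustrative only)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Back up data; return (manifest of block hashes, count of new blocks sent)."""
        manifest: list[str] = []
        uploaded = 0
        # Every block must be read and hashed, even if almost nothing changed.
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # only new or changed blocks are stored
                uploaded += 1
            manifest.append(digest)
        return manifest, uploaded

    def restore(self, manifest: list[str]) -> bytes:
        """Reassemble a file version from its individually stored blocks."""
        return b"".join(self.blocks[digest] for digest in manifest)


store = BlockStore()
v1 = b"AAAABBBBCCCCDDDD"   # initial version of the "VM image"
v2 = b"AAAAXXXXCCCCDDDD"   # same file after one block changed

m1, sent_first = store.backup(v1)
m2, sent_second = store.backup(v2)
print(sent_first, sent_second)    # 4 blocks sent at first, then 1
print(store.restore(m2) == v2)    # True
```

Only one block is transferred on the second run, but the loop still has to read and hash the entire file, which is the analysis cost Yev describes for multi-gigabyte images. Likewise, `restore` has to look up every block in the manifest, so a file with a long history of changed blocks means more pieces to gather.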

For these reasons, Backblaze also excludes VM image files and other large file types (it also doesn’t back up system files or applications), as you can see in the app’s Exclusions screen.

Backblaze Exclusions screen

So what should you do if you have a large VM image file that you want backed up? In fact, I’m in precisely this situation, since I have Parallels Desktop set up to run a Windows app called HyTek Meet Manager for managing Finger Lakes Runners Club track meets. I use Meet Manager infrequently—just a day or two six times per year—but it’s vital that its data be backed up when I am using it so I wouldn’t have to recreate a meet file if something bad happened.

With Backblaze at least, there are three solutions:

  • Share a Mac folder with Windows in the virtual machine, and store all the essential data in that folder. This solution, which is what I do, is by far the easiest and best since it ensures that all the Windows app data is backed up just like Mac app data.
  • Back up the VM image file as it exists on the drive. Although Backblaze excludes VM image file types by default, nothing prevents you from editing that exclusion list and removing the file type corresponding to your image file. However, doing so will impact performance in the ways discussed above.
  • Install the Windows version of Backblaze within the virtual machine and set it to back up the important Windows app data. Unfortunately, this approach requires buying another license for Backblaze since it thinks it’s running on another computer. It’s not a very satisfying solution unless you work in the virtual machine all the time and have a lot of data that can’t easily be stored in a shared Mac folder.

As much as it’s easy to think of Internet backup services as being complete backups, they just aren’t. There’s no point in backing up system files in particular because no Internet backup service I’m aware of can perform a “bare-metal” restore after reformatting a Mac’s drive.

(Interestingly, the company that’s in the best position to provide a complete backup service is Apple, since Macs already have Internet Recovery for reinstalling macOS, and iOS devices can back up to and restore from iCloud without needing a computer with iTunes involved. If Apple ever decides it needs more Services revenue, all it needs to do is let Macs back up to iCloud as well and sell a lot more iCloud storage. That would be a significant hit to existing Internet backup services.)

While applications can usually be backed up and restored, that’s an awful lot of data to analyze, transfer, and store in a situation where you can generally redownload them with just a few clicks. The Microsoft Office suite is 8.6 GB on its own, and the big four of Adobe’s Acrobat, Illustrator, InDesign, and Photoshop weigh in at 5.6 GB. Apple’s iWork suite of Pages, Numbers, and Keynote is svelte in comparison at 1.8 GB.

But remember, an Internet backup service is only one part of a solid backup strategy, and it should always be seen as the backup of last resort, the one you turn to if everything else is lost. Spending a little time reinstalling macOS and downloading apps isn’t the end of the world—losing your actual data is.

So whether you’re using Backblaze or CrashPlan or some other Internet backup service, always make sure that you also have a local versioned backup made by something like Time Machine and a bootable duplicate created by something like SuperDuper, Carbon Copy Cloner, or ChronoSync. The versioned backup lets you recover an older version of a file that became corrupted or was damaged by human error and then saved, and the bootable duplicate enables you to get up and running as quickly as possible if your entire drive dies.


Comments About Backing Up VM Image Files to Internet Backup Services

Notable Replies

  1. I walked away from CrashPlan about two years ago when they really started messing with things: backups became flaky and they couldn’t fix anything. I went to Backblaze and Arq. This year I won’t be renewing my BB subscription. The service is good enough, but I never used it to restore anything, and having to use a web browser instead of a native app to do it seemed clunky.

    My best go-to nowadays is Arq Cloud Backup, which is really plug and play. With one fee you can back up as many devices as you want: you are paying a flat $5 per TB of storage.

    I still run TM, but it’s my last port of call to recover anything.

  2. Retrospect can perform a bare metal restore from a cloud backup, at least in theory, but my internet connection is slow enough that I haven’t tested a full restore yet.

  3. The CrashPlan people told me a year ago that I shouldn’t back up applications. That was when their software just wouldn’t do any backup anymore. I had too many files, I was told. Head on desk.

    Backblaze also doesn’t want to back up applications. Arq is just too unreliable.

  4. Seriously, does no one read The Tao of Backup (Copyright 1997!) any more?

    It is right there in step one, Coverage:

    The novice asked the backup master which files he should backup.

    The master said: “Even as a shepherd watches over all the sheep in his flock, and the lioness watches over all her cubs, so must you backup every file in your care, no matter how lowly. For even the smallest file can take days to recreate.”

    The novice said: “I will save my working files, but not my system and application files, as they
    can be always be reinstalled from their distribution disks.”

    The master made no reply.

    The next day, the novice’s disk crashed. Three days later, the novice was still reinstalling software.

    Backup everything.

  5. What CrashPlan doesn’t tell you in those “product updates” is that if you were backing up files that they started excluding, or even genuine data in one of their newly excluded folders, they will just silently stop backing it up.

    And then they’ll prevent you restoring anything from their backups either…

    And there’s no way to change the behaviour or get back anything you’ve been backing up for years…

    I’d not recommend Crashplan any more to anyone, they silently stop backing up files with no warning and no recourse.

  6. Hi

    Can you clarify what you are finding as too unreliable with Arq ?

    Thanks

  7. Arq stopped working a couple of times. When my computer didn’t boot I found out that the last Arq backup was 10 days old. Once a month Arq verifies all files. This takes 2 days or so and during that time no backup is done. I now get an email for each completed backup and can see immediately when the backup stops.

  8. Exactly. Which is why I find these cloud “backups” with all their petty restrictions to be a complete and utter waste of time. Not to mention money.

    You can always move a disk off site. Heck, you can fedex it to another continent and rotate if you want to be really paranoid. And then there’s iCloud, Dropbox, et al. for individual files.

  9. Of course, but there was no such thing as Internet backup in 1997, so The Tao of Backup couldn’t imagine a situation where you’d have such a thing as a tertiary offsite backup made over a bandwidth-constrained pipe. And it was written in a time when you couldn’t download a new version of the operating system and every app on your drive at will.

    (OK, maybe Internet backup was a thing in 1997 since it certainly was in 1998.)

    (And I can’t resist pointing out that I suggested the entire idea of Internet backup in 1992, in an April Fools article.)

    Anyway, as I said in the article, an Internet backup should be in addition to a full versioned backup and a full bootable duplicate. Both of those provide the “everything” backup.

    Anyone who relies solely on an Internet backup service is simply foolish. But there’s nothing wrong with ensuring that you have a highly offsite backup of your data that’s always up to date.

    Sure, you could put a lot of time and effort into moving a drive offsite on a regular schedule, but in the event of a local catastrophe that destroys your versioned backup and bootable duplicate, the extra few hours to install macOS and key applications is meaningless (not to mention the fact that you’re probably not storing another Mac offsite, so you’ll have to spend time acquiring the necessary new hardware in the event of theft or fire or flood). And it’s pretty unlikely that your offsite transfer schedule happens every 15 minutes, so you could lose a lot of work and irretrievable data in the space between your last offsite transfer and the catastrophe.

    I’ll stick with Internet backup, thank you very much.

  10. I totally agree. We found the services to be more trouble and expense than they were worth. Time Machine, iCloud, and via work, Dropbox, have served us better.

  11. Just remember with iCloud that you won’t be able to restore an older version of a file unless you back up locally. (Dropbox by default saves 30 days of incremental versions in the cloud, I believe.)

    This happened to me just yesterday - an encrypted file in iCloud would not decrypt properly. I backed up the local version of iCloud storage to Arq so I was able to get the last working version back (plus I have a cloned backup from about a week ago; that version would have been fine, too, but I just thought of that right now.)

  12. That’s why we use Time Machine with Time Capsule (and I am still unhappy Apple put the kibosh on this) and some other removable hard drives. We don’t use iCloud to archive everything, we store critical documents, and stuff we’re currently working on or will need to work on, and some photos.

  13. Personally I only use iCloud for stuff like contacts, calendars, and bookmarks. Actual files go onto Dropbox, onto my own home brew backup running on a server I co-own in Europe (rsync over ssh), or onto disk through TM and SD. I do TM and SD both at home and at work and those disks get rotated. That way if my lab or my house burns down I should still be ok. If all of Berkeley is wiped out by The Big One (and I manage to survive [despite 4M dead expected within the first 24 hrs—yeah take that insurance! :wink: ]), I can still get all of my stuff from HDDs I’ve sent to storage overseas and the more recent stuff from my European server.

    Now if both California and central Europe are wiped out and for some reason even the Lab in Japan where I store a HDD is lost, well I assume I won’t survive an event like that anyway. :wink: And if I do, I assume my data will be the last of my worries. :smiley:

  14. “few extra hours”? I don’t think so, grasshopper. More like “few extra days”, or weeks, if you can remember all the apps, settings, add-ons, support apps, etc. on your system to begin with.

    I absolutely agree with the previous, “Backup everything”. Else recovery means you restore forever. Namaste.

  15. No, I’ve done this, more or less, on the occasions when I decide that I need a clean start. I install the latest version of macOS, restore my documents and settings and all that from backup (which could all be retrieved from an Internet backup, if slowly), and then reinstall applications one at a time as I need them. It’s only a few hours to be up and running at a basic level, and I download extra apps slowly as I need them. It’s not a big deal and doesn’t take that long.

    But as I said, remember, we’re talking catastrophe here: it’s very unlikely that you’ll be up and running that quickly anyway because your Mac was probably stolen, burnt, or water-damaged along with your local backups. So speed of recovery is already not an issue because most people don’t have a secondary offsite Mac to which they can restore. And you’re probably more interested in dealing with the catastrophe than getting your Mac up and running again.

    Of course, if we’re talking about a business scenario where downtime is seriously problematic, it would be smart to have a full disaster recovery plan in place, which would include alternative working space, plans for acquiring backup computers, and so on. But that’s a whole 'nother level.

    Local backups, particularly a bootable duplicate, provide quick recovery times; Internet backups of documents and data protect you from most dire situations.

  16. I’m on the verge of dropping Backblaze.

    I’ve a versioned backup and a bootable clone backup, one in the same building the other in a different building.

    All critical files are on Dropbox or iCloud, 3 TB between them.

    My photography is the biggest concern I have. I’ve got all the JPEGs backed up on Flickr at least, 150k of them, but my RAW and TIFF files are my key files and those are backed up on a local RAID.

    They were also all up on Backblaze. I’ve got 7 TB up on Backblaze all in all. My new iMac arrived, a thing of joy compared to my 6 year old MBPro, but after weeks of trying I cannot get it to inherit my backup status. The Backblaze option is now… start again. Rural broadband meant that the original upload took six months or so and there’s been a whole bunch up after that. I don’t know if I can face it.

    I’ve had to load up my MBPro, reattach all my drives and keep the backup and drives active.

    It doesn’t inspire too much confidence, the support person said my backup was either too large or corrupt.

  17. I have way too many backup systems and the Arq verification is annoying, but important. Not noticed it run for more than a couple of hours though myself and you can always cancel it. I use Arq to backup to both a local drive and Wasabi. Then I also use Arq CloudBackup for myself and family. Crashplan was great until I had too many files (or they decided I had too many) and it spent 99% of its time verifying backups and not backing anything up for days.

    Arq CloudBackup is my goto when people ask for advice, otherwise BackBlaze but I’m less happy with them than I was.

    Oh yeah, I back up everything even though I have a clone drive of my system. Storage and bandwidth are cheap and I’ve restored entire apps in the past when an upgrade didn’t go the way I wanted (or Apple started removing features from Preview and I wanted the old version back).

  18. Yev from Backblaze here -> Tommy, sorry to hear that. The inherit backup feature is definitely intended to make it easier to get back up and running should you get a new machine, but unfortunately at such a large data set we’ve seen that it can fail in some cases. I’ll make sure the engineering team hears about this issue and we take a look at it. The rural broadband is definitely a bummer. We say that your broadband should be able to push your backup to us in 30 days or so for best results, so if the data set exceeds what the bandwidth can push, it definitely can put you in a jacked state. It sounds like you have a decent system in place already. I’ll just add that while it may take a while to get your 7 TB backed up to Backblaze again, it is nice to have it offsite (not just on the local RAID), but I definitely understand the reluctance to go through that again.

  19. Thank you Yev, for reaching out, much appreciated.

    I’ll give it another go, and see how I get on. If your engineering team have a thought on what I might try, I’d appreciate it.

    Cheers

  20. For the past year I have been backing up a sparsebundle image of my boot drive (made nightly by Carbon Copy Cloner) to CrashPlan. I see that sparsebundles are not on the new excluded list, but sparseimages are.

    I tested the ability to download the sparsebundle and restore it to a drive with CCC, and it all worked beautifully.

    I told Code42 about this, suggesting it would be a popular use of CrashPlan if they included it in their help documentation. I understand now why they seemed very unenthusiastic about the suggestion!

    I will watch to see if my sparsebundle stops being backed up.

    Of course sparsebundles work by breaking up the image into small bands, which means the amount of data (changed bands) backed up each day is small. Maybe this is why they have not (apparently) excluded sparsebundles.

  21. I have Parallels Desktop set up to run a Windows app called HyTek Meet Manager for managing Finger Lakes Runners Club track meets. I use Meet Manager infrequently—just a day or two six times per year

    Have you considered using WineBottler to run HyTek Meet Manager? This can be smoother and quicker than loading a full VM environment, and it might simplify the cloud backup issues. It’s been some years since I’ve needed to use it, so I can’t remember if it uses disk images in the setup, but my memory is that all data files are ‘normal’ Mac folders and files, so might be a cleaner solution for cloud backup.

    https://winebottler.kronenberg.org

  22. @ace, I also meant to ask where you read the original CrashPlan announcement about this? I’ve been a customer for years, and this is the first I’ve seen about it (though now see others have been discussing it), which is a bit disturbing. :grimacing:

  23. No, I wasn’t aware of WineBottler (though I presume it’s associated with Wine). In this case, reliability is paramount, since we can’t have Meet Manager flaking out on us (for any additional reasons) during a track meet. So I’d be leery of something that might not be seen as “real” Windows.

    Several CrashPlan users sent me the email they received from Code42. I don’t know if there was another public post.

  24. Several CrashPlan users sent me the email they received from Code42.

    I only got mine this week, well after the article, so I assume they are being sent out to users gradually.

    More disturbing, this morning I got a backup status email that says my last full backup was 9.4 days ago and my “selected for backup” is only 11GB… when before it was 900GB. I know I don’t have 890GBs of VM images, so I’m not sure what’s going on and haven’t had a chance to investigate yet.

    But between general CrashPlan flakiness (it stops running with no error message or warning) and their increase in price (I recently went from the $2.50/month initial offer to $9.99), I’m pretty sure I’m going to find another cloud backup solution soon.

    Funny timing how their price increase comes right when the service has gone downhill…

  25. nls

    I also got an email from CrashPlan about not backing up applications, but it was a surprise to me to learn that they had been accepting settings from users that backed up apps, because from the very first time I signed up for CrashPlan years and years ago they always emphasized in writing that users were not supposed to use the service to back up apps. I used SuperDuper for a full weekly clone and Time Machine for continuous incremental backups. It upsets me to learn CP has not been enforcing their own rules, and now we who followed the rules may be punished, because perhaps that’s why CP has been slowing everything down on my MBP lately.

  26. nls

    QUOTE: Don’t back up operating system and application files

    The Code42 app isn’t designed to back up system and application files and we don’t recommend adding these files to your backup selection. Doing so could cause issues with the priority and status of other files you want backed up. Additionally, since the Code42 app isn’t designed to download your operating system or applications, there is no advantage to backing up these types of files. UNQUOTE

  27. I’m perturbed that Backblaze excludes .dmg and .sparseimage. Yes, they often contain app installers and system images, but at least for me and all of my users, they can also contain valuable data. Before reliable full disk encryption, they were the only easy to use encrypted containers for sensitive files such as bank statements and contracts, and most of my users still prefer using them in addition to full disk encryption.

  28. @nls . I have had this conversation with Code42 and I completely agree that backing up the live operating system with constantly changing apps, logs, prefs, etc. is futile. However, backing up a daily static sparsebundle image is not futile and is a technically valid way of having an offsite restorable clone of your boot drive. Unlike a VM, a sparsebundle does not have to be backed up in its entirety for even a small change. Only the changed bands (which are file level) are backed up. I have tested that this works by downloading the sparsebundle and restoring it and booting from it. Code42 support agreed with all this.
    I suggested to Code42 that they could put this in their Help as a great way to use CrashPlan, and they gave me a polite thank you for the suggestion!
    I have not previously been backing up my Parallels VMs to CrashPlan, but when I got the email, I added a small unused one to see if it was backed up, and if so, when it will get deleted. So far both the VM and the Sparsebundle are still being backed up.
    BTW the quickest way of seeing what is actually in your CrashPlan cloud is with the mobile phone app (which is free).

  29. nls

    Thank you, Mike - excellent point! Aside from your test small unused one, have you attempted backing up an actual VM and Sparsebundle to CP? If so, how did that go and how long did it take, etc.?

  30. The Parallels VM is 8GB and my CCC boot drive sparsebundle image is 165GB. They are both backed up to CP. The Parallels VM is not used so doesn’t change from day to day. The boot drive sparsebundle changes by 3-4GB per day normally but the largest daily change currently showing in CCC task history is 26GB. No idea how long they took. I have 21MB/s upload and CP seems to more than keep up, between six hourly backups.

  31. I had to use Backblaze to recover a large number of videos and image files. I use Backblaze to supplement other backup drives, but bad data corrupted those drives. I found the Backblaze restore worked reasonably well, with everything I needed being recovered. However, the process was very slow despite having a high-speed internet service. As a non-American I am concerned with the Backblaze servers being located in the States and therefore I am subject to US (lack of privacy and consumer) laws.

    I thought Backblaze did back up .dmg and .sparseimage containers, but a post here said otherwise. If so then this is probably a dealbreaker for me and I will look to move off Backblaze and go elsewhere. Any suggestions will be appreciated.
