How You Can Lose a File Despite Three Layers of Backup (and How To Avoid It)
I have preached the gospel of file backups for decades, from floppies through digital tape systems to today’s local and cloud-based systems for continuous archiving of even the tiniest changes to files. Color me a rainbow for how surprised I was a few days ago when I found I had permanently lost the original form of a file that a colleague had shared with me 45 days earlier. And it was my own fault.
How to Lose Data in 45 Days
Reader, I have not one, not two, but three continuous archiving systems deployed:
- Dropbox: I store nearly all my active documents in Dropbox for up-to-the-second uploads of the slightest change.
- Time Machine: Although I came late to the party after years of dubious feelings about its reliability, I eventually added Time Machine backups to my home network. I was glad I did when Time Machine helped recover from two disasters in the last year.
- Backblaze: A stalwart secure cloud-hosted backup provider, Backblaze has rescued several terabytes of data for me over the years. I count on it to keep a deep archive of both current files and those I’ve changed and deleted.
So how did I manage to lose data despite these three backup systems? My colleague shared the file with me by adding it to a shared Dropbox folder. It synced properly, and Dropbox copied it to my computer. But I started working on it right away, so both Backblaze and Time Machine backed up only my heavily edited version, not the original.
I have Backblaze set to archive continuously, but it does take some amount of time to recognize new or changed files. As its support document notes:
We designed Backblaze to be lightweight, so it might take 2 hours to reflect new numbers and find your new files. The reason it takes 2 hours is that Backblaze runs VERY SLOWLY on purpose to try to keep the load off your CPU and disk. The result is that it can take up to 2 hours to detect any new files, or new hard drives, or if a file has changed, or a configuration has changed.
Apple schedules Time Machine to run every hour, and Time Machine also tries to keep your Mac from being overloaded. Either I’d modified the file before Time Machine copied it to a drive, or Time Machine may have backed up the original but deleted it later to recover space on the backup drive.
You see, a salient fact here is that when I needed the original version of the file, 45 days had elapsed from when my colleague first uploaded the original version. I pay Backblaze for its Extended Version History option on my office Mac, which gives me a year of depth, but given the original version wasn’t captured, that didn’t help.
Time Machine makes hourly snapshots but then prunes them as time passes. As Apple explains it:
After you connect the storage device and select it as your backup disk, Time Machine automatically makes hourly backups for the past 24 hours, daily backups for the past month, and weekly backups for all previous months.
That explanation is a little backward: Time Machine drops hourly snapshots over time on a rolling basis. It retains no more than the 24 most recent hourly backups, one per day for the previous month, and then one per week before that unless it has to start deleting the oldest versions for space reasons. There’s no easy way to know what versions it might have pruned. Regardless, the file I wanted wasn’t available in Time Machine’s backup.
But let’s circle back to Dropbox. The file was still in Dropbox; couldn’t I just pull up an older version? Dropbox maintains older versions of files, but only for 30 days. However, I was paying for Dropbox’s Extended Version History Add-On!
Or at least I thought I was. For many years, I had paid for that option—known as Packrat before the marketing folks got to it. But during the pandemic, I opted to upgrade from Dropbox Plus to Dropbox Family when my spouse needed more space than the free tier Dropbox provides plus some extra storage she’d earned through referrals years ago. The Plus and Family plans both offer 2 TB of storage, and I was consuming only a fraction of my Plus plan’s storage. Plus costs $11.99 per month ($119.98 annually), and Family is $19.99 per month ($239.88). Up to six users can join a Family plan, making the upgrade a good deal for us.
What I either hadn’t noticed or had brushed off when I upgraded is that the Family plan has no option for Extended Version History. Drat.
A Safer Approach
What should I have done? The same technique we’ve employed for decades at TidBITS and Take Control: use folders and file-naming conventions to create a clear version path that doesn’t depend on local or cloud-based archiving systems.
Instead of opening a file and modifying it directly, first make a copy with an incremented number in its name, and work on that version. Put the previous version in a folder labeled “Old” (nested in a folder with a name that’s more descriptive), and you’re golden. I rely on this technique to manage nearly all my other files. In collaborative situations, we also employ a “Checked Out” folder and add our initials after the incremented version number to clarify who has modified the file.
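For anyone who would rather automate the copy-then-file step than do it by hand, here is a minimal Python sketch. The naming convention it assumes (“base-version initials.ext”, for example “Chapter 3-4gf.docx”) and the function names are hypothetical illustrations, not the actual TidBITS scheme; only the “Old” folder and the idea of appending initials come from the description above.

```python
import re
import shutil
from pathlib import Path

# Hypothetical convention: "<base>-<version><optional initials><ext>",
# e.g. "Chapter 3-4.docx" or "Chapter 3-4gf.docx".
VERSION_RE = re.compile(r"^(?P<base>.+?)-(?P<num>\d+)(?P<initials>[A-Za-z]*)$")


def next_version_name(path: Path, initials: str = "") -> str:
    """Return the file name for the next version of *path*."""
    match = VERSION_RE.match(path.stem)
    if match:
        base, num = match["base"], int(match["num"])
    else:
        # No version number yet: treat the file as version 1.
        base, num = path.stem, 1
    return f"{base}-{num + 1}{initials}{path.suffix}"


def check_out_new_version(path: Path, initials: str = "") -> Path:
    """Copy *path* to its next version, then file the original under "Old"."""
    new_path = path.with_name(next_version_name(path, initials))
    old_dir = path.parent / "Old"
    archived = old_dir / path.name
    for target in (new_path, archived):
        if target.exists():
            raise FileExistsError(target)  # never overwrite an existing version
    shutil.copy2(path, new_path)  # copies contents and timestamps
    old_dir.mkdir(exist_ok=True)
    shutil.move(str(path), str(archived))
    return new_path
```

Calling check_out_new_version(Path("Chapter 3-4.docx"), "gf") would create “Chapter 3-5gf.docx” next to the original and move “Chapter 3-4.docx” into an “Old” folder beside it. The sketch deliberately refuses to overwrite anything, on the theory that a versioning helper should never be the thing that loses a version.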
(One extra tip: despite my long history with the Mac, I had missed the fact that the Finder sorts alphabetically and numerically when punctuation is involved, meaning you don’t need to add leading zeroes. In other words, I learned only recently that a file whose name contains “1.1-9-gf” will sort correctly after a similarly named file that contains “1.1-8-gf” and before one that contains “1.1-10-gf”. Neat, eh? TidBITS wrote about this problem, and the solution that eventually made its way into the Finder, 25 years ago in “The Natural Order of Things,” 3 February 1997.)
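The ordering described above is usually called natural sorting: runs of digits are compared as numbers rather than character by character. Here is a rough Python approximation for comparison. It is not the Finder’s actual algorithm, which also takes locale and other details into account; it only illustrates why “10” can land after “9” without leading zeroes.

```python
import re


def natural_key(name: str) -> list:
    """Sort key that compares digit runs as numbers and text case-insensitively."""
    # re.split with a capturing group alternates text and digit runs:
    # "Intro 1.1-10-gf" -> ["Intro ", "1", ".", "1", "-", "10", "-gf"]
    parts = re.split(r"(\d+)", name)
    # Odd positions are always digit runs, so convert only those to int.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = ["Intro 1.1-10-gf.docx", "Intro 1.1-9-gf.docx", "Intro 1.1-8-gf.docx"]
print(sorted(names))                   # plain character-by-character order
print(sorted(names, key=natural_key))  # natural order
```

Python’s plain sorted(names) puts the “-10-” file first, because it compares the characters “1” and “8” one at a time; passing key=natural_key yields the 8, 9, 10 order. That difference is why tools that don’t sort naturally can still scramble files named without leading zeroes.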
This system gives you an extra set of supports on top of the belt, suspenders, and duct tape you already have holding up your trousers. If something goes terribly wrong, you typically have a deep well of backups and likely versions of current files. It’s hard to lose work entirely this way and nearly impossible to lose much work.
One slip-up in a life of blameless file management, and that’s where I found myself.
Fortunately, my colleague managed versioning better and was able to send me the unmodified original a few hours after I had come up empty while poring through my archiving systems. And I’ve learned a lesson: archive-based file versioning may be fine, but there’s no shame in relying on old-fashioned manual versioning to back up your backup.
I have used a similar manual versioning process over the years, though less frequently since I retired.
Glenn, do you, or does TidBITS or Take Control have a nice shareable script that facilitates the process?
DEC’s VMS operating system did versioning for all files. It has been so long, I do not remember the file name structure now. Interesting to hear of a manual alternative.
If you want to manually retain versions of files, you might also consider a version control package like Git. Although designed for source code, it works well for all kinds of files (but may not be very efficient for compressed/binary file formats).
I use it to maintain an archive of the files on my local web server (which are mostly HTML text). But you have to remember to do a git add/git commit whenever you want to preserve the current version. Nothing is automatic here. But it may be more convenient than keeping every version stored with a different file-name.

Thanks for sharing this dreadful tale.
I have had a related problem a couple times recently. Somehow I lost or overwrote my newest work on a file. I don’t know how it happened though. But it seems to be related to Dropbox delayed syncing and moving between various devices to edit the same file.
I hate losing work and I did. I’d like to figure out how to reproduce it. But somehow I think Dropbox is destroying my data by getting confused about where the latest copy is.
I’m a big Dropbox fan. But this is a problem. And I’m still pissed that they quietly removed syncing Mac aliases last year. That was not cool and I still don’t have a solution to those handy shortcuts to key folders.
All by hand. Intentional, in fact. We want to have the motor memory of having to do it and having done it.
I kind of treat Backblaze, Dropbox, and Time Machine as my versioning system. Dropbox in particular is extremely good at this: if I hadn’t failed to notice that I’d lost the Extended Version History option, this story would never have been written! But it’s less explicit. I like having both an automatic versioning system (continuous version backup) and an explicit manual one (using folders and naming).
And loving it, right?
Long ago, we used Subversion for TidBITS articles. Even with the integration with BBEdit and automation with Keyboard Maestro, having to update before working and commit after making changes was a royal pain in the butt. Even worse was when we had an actual conflict and had to reconcile it.
It was a huge relief to switch to Google Docs, which saves so constantly that even when something goes wrong (with Google Docs itself, the browser, or the Mac), I usually lose no more than the last few characters I typed. And because it versions everything automatically, there’s never any problem with going all the way back to the beginning.
Obviously, Glenn wasn’t using Google Docs for this work, but it’s one of the reasons we rely so heavily on it for all TidBITS writing nowadays.
Glenn,
Yes, the Finder does sort that way, but other tools (such as AppleScripts or Acrobat when working with multiple files) might not. So be careful.
I still insert leading zeroes, from habit as much as keeping everything neat.
Anyone else old enough to remember file versions on VAX/VMS??? That was a really nice feature.
Time Machine deserves a little blame. Back up the original file when it’s opened if you haven’t seen it before! Thanks, Time Machine.
Wasn’t default versioning also the hope for ZFS on the Mac? It’s still not clear why we ended up with APFS instead.
Definitely an important article for everyone to read.
As a tech consultant, I’ve been advising clients to keep multiple iterations of documents (preferably with useful suffixes like your SOP) for decades.
It’s the only way to truly cover your ass.
Of course, eons ago this sometimes caused storage space issues, but that became a non-issue once the age of bigger drives came along.
And I’d like to point out an important detail that I would suggest adding to this article: cloud storage services like iCloud, Dropbox, and OneDrive offer disk-saving modes in which some files aren’t stored locally. That means many folders and files may exist only in the cloud and not actually on your hard drive. Though this may not apply to a document you modified in the last 45 days, it has the potential to bite you in the ass. Why and how? The only real copy of these documents exists in the cloud, not on your Mac, so neither Time Machine nor Backblaze (nor any other backup service or utility) can back up these folders or files. So there’s a very real potential for data loss here.
The best solution to this is, if you have the HD capacity, turn off this feature, so that all the files actually exist in their entirety locally.
I do sometimes forget how little storage we had in the past!
This is a very smart point to make. I wrote a column (and parts of many others) about the danger of this regarding iCloud Photos over at Macworld. If you choose Optimized Storage on all your devices, the only “truth” is the cloud. So if Apple ever has a double-triple disaster, all your high-res media is on their servers and could be lost. I’ve written about strategies to make sure you download your library locally so you have a local copy of the images!
It seems to me it would have been simpler if you had forced an immediate TM backup (Back Up Now from the Time Machine menu) of the closed file, so that it would be instantiated in the TM structure as the base file. All subsequent file close operations would then be treated as versions.
BTW - Are you using DropBox’s Sync Online option?
Yes. You ended up with quite a lot of files (one each time you saved), but it was so much easier to manage your files from the command line. I can’t tell you how many times I went back through versions to retrieve something I needed or to easily back out of edits. When you work like this, you really depend on it. Drives weren’t as reliable or long-lived, so I made printouts as well.
Unix worked well too for not losing stuff because the user always had mostly full control. When you do suddenly lose an hour or two of your own work (or more), you can usually recreate it in perhaps a quarter of the time it took the first time. I think this is true on any system.
It’s gotten more important to know how each application works and how the (GUI) OS works. I do not think things have evolved for the better although I realize many do.
If I’m being smart, I generally try to keep the original version of a file, if I didn’t create it.
My fantasy is that every application would incorporate a “diff” function where one could get a listing of changes from two versions of a file, with as much detail as possible. I must have used Unix diff 100,000 times in my life for all types of situations. Sometimes just to get a handle on what was lost. It’s not going to tell you about a destroyed file, but it can be the next best thing.
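For plain-text files, Python’s standard library already ships a diff engine, difflib, which can produce the same unified format as Unix diff -u. A minimal sketch for comparing two saved versions (the function name is my own; binary or packaged formats such as .docx or Pages files would need their text extracted first):

```python
import difflib


def describe_changes(old_text: str, new_text: str,
                     old_name: str = "original", new_name: str = "edited") -> str:
    """Return a unified diff (like `diff -u`) between two versions of a text."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),  # keep newlines so the output joins cleanly
        new_text.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    )
    return "".join(diff)  # empty string when the versions are identical


before = "Dropbox\nTime Machine\nBackblaze\n"
after = "Dropbox\nTime Machine\nBackblaze\nManual versions\n"
print(describe_changes(before, after))
```

Lines prefixed with “-” exist only in the older version and lines prefixed with “+” only in the newer one, which is often enough to see what an edit removed even when the original file itself is gone from one location.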
I’m using Git (https://git-scm.com) for local staging and versioning, with the GitFinder GUI (https://gitfinder.com), which is fully integrated into the Finder the way, say, Dropbox or iCloud is. I can’t believe that people still add numbers to their files to create versions.
cheers
–e.
Sorry Enrico, but for collaborative work I use a manual process similar to Glenn’s. Generally I include a date in the filename (e.g. xxx_29sep2021.docx; this format avoids the crazy mix of date formats around the world, but I appreciate it does not help sort files in the Finder), and those who send an edited copy back to me add their initials. Redundant versions and feedback go into an “archive” folder, to be trashed eventually.
It is cumbersome, I know, but years of bad experiences with so-called collaborative systems (starting with Lotus!) taught me it is worth the extra effort.
But keep in mind that Time Machine can take quite a while to do even a small backup. So you would be sitting there waiting for Time Machine to complete 100%, just for the sake of being able to work with one document. Doesn’t seem like an efficient use of time.
Making a copy of the original, before making changes, is the safer approach, as you know definitively that you have a copy of the document before modification.
If you use xxx_2021/09/29.docx, the date should be obvious to anyone no matter their local date format, and Finder will sort all xxx’s together then the individual ones by date.
ISO format, which I learned about while working in Sweden, is what I’ve used ever since.
Name_20210929-1833_index.extension
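A sketch of building names in this style with Python’s strftime. The layout mirrors the example above; treating “index” as a per-timestamp sequence number is an assumption. Because the year comes first and every field is zero-padded, a plain alphabetical sort is also a chronological one. (Strict ISO 8601 basic format would separate the date and time with a “T” rather than a hyphen.)

```python
from datetime import datetime


def timestamped_name(stem: str, ext: str, when: datetime, index: int = 1) -> str:
    """Build a name such as "Name_20210929-1833_1.docx".

    Year, month, day, hour, and minute are zero-padded and ordered from the
    largest unit to the smallest, so sorting the names as plain strings also
    sorts them by date.
    """
    return f"{stem}_{when:%Y%m%d-%H%M}_{index}{ext}"


print(timestamped_name("Name", ".docx", datetime(2021, 9, 29, 18, 33)))
```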
Is it dangerous to use slashes in names? I assume it still is in some places since the slash separates folders/directories.
I use “2021.01.07-12.52.19” for photos or just “2021.01.07 etc” for some files. For readability I prefer the 29 Sept 2021 style and wish it sorted. I use the “dot date” so often that I have a KM shortcut for it.
For years I named files “Some name.z” or worked my way back up the alphabet. Helped having the latest at the top of an alphabetic sort. I’m retired now so don’t often need that kind of versioning. Apple seems to have made “Save as” harder to use over the years, so they helped make me quit using that method. Auto-saving is much more robust than it used to be. But I haven’t read the article yet, so all of this may not apply.
I wouldn’t say “dangerous”, but yes, there are some limitations. Adobe products don’t work with them at all (at least the older versions I’m using). They’re stored on the actual file system as colons, so if you work with files in Terminal as well, they’ll show up as colons and you’ll have to use colons if you want to create files (or folders) that show up with slashes in them in the Finder (and Open/Save dialogs, etc.) And if you’re going to work cross platform, you’ll need to worry about limitations or restrictions on other OS’s. If those apply, yes, something like a period as a separator would be better. No separator would work as well, like in ISO formats, but I find those a bit harder to read. The key though, is using yyyy mm dd format to get sorting by date. Beyond that, the separator between the components is what works for you (and your team if working with others.)
I used Dropbox for many years, relying on its feature for retrieving older versions of a document. Even though this was limited to 30 days, I could recover documents I had deleted by mistake.
I had to stop using Dropbox when my wife and I started sharing the iMac and created an account for each of us. I imagined it would work the way Apple’s iCloud Drive shared folders do, but it did not. Once the Dropbox driver was installed for one user, it would produce an error message when the other user logged in. Dropbox support offered a solution that involved using the macOS Unix command line, but I am not Unix savvy, and I did not want to do things that might have had negative consequences for other parts of the system.
I’m horrified but, yeah, YMMV!
–e.
I solved that one by upgrading to a 2 TB DropBox account and we both use it. After 45 years we have few secrets from each other and we just consider it the family DropBox…with his and hers folders inside it. We can see each other’s stuff but just ignore it for the most part. Anything I really wanted to keep secret would just get hidden in 1Password somewhere…she knows that password but would likely not find buried stuff…or in a secondary 1PW vault or an encrypted .dmg file…but I haven’t seen the need for that.
We have 50 GB of data combined, so 15€ per month is a bit much. We survived somehow until it was possible to use Apple’s iCloud family sharing, which comes at 3€ per month for 200 GB while still surpassing our needs. Dropbox’s price may be lower per GB, but as for now we do not need 2000 GB. Since we are both 70, it does not look like we will ever need so much volume space.
Yes, indeed. I was a VAX/VMS Internals guru for years. VMS added ‘;nnnnn’ to a maximum of 65,536 (or 2^16) every time you saved a file. You could run out of numbers if you wrote a recursive CLD script with a bug in it. VMS is the most stable long-running enterprise operating system ever built. Microsoft lost a lot of money when they stole code from it. Windows on the ALPHA chip was the penalty, which only helped kill off VMS faster.
VMS is sort-of still available on Intel, but the interrupt stack is too short to really work well. I even gave up supporting the code I published for VMS.
PDP-11 was pretty good, too. It holds the world record for the longest-running computer without a reboot. It does one job that wasn’t worth the effort of replacing. By now I hope it has been replaced because of the heat budget alone, but I think the thing is still working.
One model of the PDP-11, the PDP-11/35, had core memory, which retained everything even after cycling power. Our system would resume after a power outage doing exactly what it was doing when the power went out!
Slashes are fine if you’re not using cloud storage. But if you are, you might want to avoid them. Some cloud services tolerate slashes, and Dropbox is one of them. But OneDrive, for example, will refuse to sync when it encounters slashes (and quite a list of other characters, too).
Makes sense to me. OneDrive (and its big brother, SharePoint) are designed so you can mount the remote volumes on a Windows desktop. As such, they prohibit all of the filename characters that a Windows desktop system prohibits.
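For checking names before they reach OneDrive or a Windows share, here is a rough Python check against the characters Windows reserves in file names (&lt; &gt; : " / \ | ? *), ASCII control characters, names ending in a space or period, and reserved device names such as CON and LPT1. OneDrive and SharePoint may enforce additional rules of their own, so treat this as a baseline sketch rather than a complete list; the function name is hypothetical.

```python
import re
from typing import List

# Characters Windows does not allow in file names, plus ASCII control characters.
# Cloud services may add rules of their own; this is only a baseline.
WINDOWS_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')
RESERVED_NAMES = {"CON", "PRN", "AUX", "NUL",
                  *(f"COM{i}" for i in range(1, 10)),
                  *(f"LPT{i}" for i in range(1, 10))}


def windows_name_problems(name: str) -> List[str]:
    """Return human-readable reasons *name* may fail on Windows (empty if none)."""
    problems = []
    bad = sorted(set(WINDOWS_FORBIDDEN.findall(name)))
    if bad:
        problems.append("forbidden characters: " + " ".join(repr(c) for c in bad))
    if name.rstrip(" .") != name:
        problems.append("ends with a space or period")
    if name.split(".")[0].upper() in RESERVED_NAMES:
        problems.append("reserved device name")
    return problems


print(windows_name_problems("Budget 2021/09/29.xlsx"))
```

Because the Mac stores a Finder slash as a colon on disk, a name that looks harmless in the Finder can still trip this check, which is one reason a period or hyphen makes a safer date separator for shared files.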
Not mentioned in this thread is the versioning feature used by many Apple products to revert to previous versions, e.g. in TextEdit see File > Revert To. This gives a Time Machine-like interface to scroll through and revert to past versions. Not sure if this feature is incorporated into any non-Apple product.
Even more potential for file sorting disasters: Windows and Excel etc.
I have shared many sorted files with Windows users, whose systems apparently sort alphabetically, which messed up the file order I used, especially when we processed file lists in Excel or other programs that may sort differently again.
And most of the Office apps will give you grief with slashes as well. I am fighting a losing battle with people in the office about that: I send out the email about what not to put in a filename and still find recently created files with all those reserved (aka prohibited) characters in them.
Thanks very much for the tip about file numbering followed by punctuation. All these years I’ve been putting in leading zeros to order files in a list when I didn’t have to.
Regarding Dropbox: the Dropbox subreddit on Reddit has lots and lots of stories about accounts being abruptly disabled without notice, years’ worth of files lost with no way to recover them, no way to appeal, no communication from DB, etc.
Will start looking at alternatives soon.
I’ve always started everything by making a copy of the base-start file, unless, as in imaging, the base file remains unaltered. So… that’s good practice.
What I might suggest, though, and which you perhaps haven’t thought of, is that apps such as Pages make a ‘version’ every time you hit Save. Pages keeps these versions organised for you in exactly the same way that Time Machine does, and, as you might discover, the UI is essentially Time Machine too, allowing you to easily move back through time.
There is an app called ForeverSave that lets you save documents automatically at a specified interval. Unfortunately, it has not been updated since December of 2016 although it still “seems to” work in Monterey public beta.
Document versions are standard for any document-based Mac application. A new version is created each time you manually save a document.
That would be great, but, for example, I don’t see Microsoft Word doing this. Or is it hidden somewhere? I was wondering if any 3rd party vendor had implemented the feature.
This is a very good article, which got me thinking …
I have 3 backups; Daily, Weekly and TimeMachine. So why not also have an Hourly backup?
When I’m working on documents and files, I put them into a folder called “Temporary” and move them into permanent locations as I finish working with them. I set up a new task in Carbon Copy Cloner to back up this folder (and nothing else) every hour. Although it may execute every hour, it’s very quick and just takes a few seconds.
Of course, if I happen to create a new file or document and delete it, all within the same time frame, I will not have a backup, but I figured nothing is perfect.
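The same idea can be approximated without Carbon Copy Cloner. In this minimal Python sketch, each run copies a working folder into a new timestamped snapshot and then prunes all but the most recent ones. The folder names and the keep count are assumptions, scheduling it hourly (with launchd or cron, for example) is left to you, and this is not how CCC itself works.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def snapshot_folder(source: Path, backup_root: Path,
                    when: Optional[datetime] = None) -> Path:
    """Copy *source* into a new timestamped folder inside *backup_root*."""
    if when is None:
        when = datetime.now()
    # e.g. "Temporary-20210929-180000"; the year-first stamp sorts chronologically
    dest = backup_root / f"{source.name}-{when:%Y%m%d-%H%M%S}"
    shutil.copytree(source, dest)  # raises FileExistsError if dest already exists
    return dest


def prune_snapshots(backup_root: Path, prefix: str, keep: int = 24) -> None:
    """Delete all but the *keep* newest snapshots whose names start with *prefix*."""
    snapshots = sorted(
        p for p in backup_root.iterdir()
        if p.is_dir() and p.name.startswith(prefix + "-")
    )
    for old in snapshots[:max(len(snapshots) - keep, 0)]:
        shutil.rmtree(old)
```

Run hourly with keep=24, this gives roughly the rolling hourly window Time Machine offers, but only for the folder you choose. Like the CCC setup above, it still can’t catch a file that is created and deleted between two runs.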
You can check for (and optionally remove) old versions using VersionsManager. The only document-based applications I’ve seen using versions are Preview, BBEdit, and VMWare (I don’t use TextEdit but it presumably does). Microsoft products don’t seem to, nor do Adobe products, although for both I’m using older versions so maybe the latest versions do.
I haven’t used Microsoft Word for many years, but my memory is that it was frustrating and not very Mac-like. Nisus Writer supports Versions, as do most of the other (document-based) Mac applications that I use and/or develop.
For those interested, you could always consider a real time backup like this one. It monitors a directory and makes immediate copies once the directory is modified.
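The general technique behind such tools can be approximated in Python. This sketch simply polls a folder every few seconds and keeps a copy of any file whose modification time has changed, naming each copy after that time so earlier versions are kept rather than overwritten. Polling is an assumption made for portability (a native macOS tool would more likely use FSEvents), and the function names are hypothetical.

```python
import shutil
import time
from pathlib import Path


def copy_changed_files(source: Path, dest: Path, seen: dict) -> list:
    """One polling pass: keep a dated copy of every file whose mtime changed.

    *seen* maps each file's path (relative to *source*) to the modification
    time recorded on the previous pass; it is updated in place. *dest* must
    live outside *source*, or the copies would be picked up as new files.
    """
    copied = []
    for path in source.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(source)
        mtime_ns = path.stat().st_mtime_ns
        if seen.get(rel) == mtime_ns:
            continue  # unchanged since the last pass
        # Name each copy after the file's modification time so that earlier
        # versions survive later edits.
        target = dest / rel.parent / f"{rel.stem}_{mtime_ns}{rel.suffix}"
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        seen[rel] = mtime_ns
        copied.append(rel)
    return copied


def watch(source: Path, dest: Path, interval_seconds: float = 5.0) -> None:
    """Poll *source* forever; stop with Control-C."""
    seen: dict = {}
    while True:
        copy_changed_files(source, dest, seen)
        time.sleep(interval_seconds)
```

Because the first pass copies every existing file, a document dropped into the watched folder gets an untouched copy within seconds, before anyone has a chance to edit it.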
There was one several years ago called SynkPro, a beautifully written real-time app with all sorts of options for doing live syncs. Sadly it was discontinued around the time of Sierra (I think), as Apple changed some of the foundations that allowed it to work. It can still be downloaded here, but as the company has ceased operation there will be no support. It’s such a shame: great software that we still use in production on some older machines.
Actually, there’s an ISO standard for dates: ISO 8601 - Wikipedia