Why Do Simple Updates Require Big Downloads?
[Editor’s note: I was foolhardy enough to ask Michael Ash, a software engineer at Rogue Amoeba, why the company kept pushing me 10 MB updates of Airfoil for every micro-release. After reading my comment on the company’s blog entry about an update to their products, Mike sent me an expanded explanation that slightly boggled me. He agreed to adapt it into this article, which explains why we download so much. Consider this a look inside the sausage factory of software development. -Glenn]
It would be nice to push out tiny update packages to our customers when we make minor changes to our software, but it’s not practical. By building the Sparkle software updater into our applications, we’ve accepted extra download overhead in exchange for a great deal of ease of use, for our users and for ourselves. It works out in the end. There are three approaches we could have taken for a built-in software updater, and we chose the last of them.
At the most fine-grained end of the spectrum, you have true binary “diff” updates, where only the changed portions of the changed files are included. One way to do this is to precompute the differences between the new version and the previous version, but this approach is unreliable and tough to pull off well. On Leopard, adding an unsigned program to the firewall list alters the binary, causing problems for any such delta- or difference-based updater. The same is true for any other modification to, or inadvertent corruption of, the program, because the updater is then trying to apply changes to a file that doesn’t match the one the changes were computed against. An alternative is to add more server-side smarts so that the updater computes the differences on the fly using checksums (compact fingerprints that, for practical purposes, uniquely identify a stream of data), as the Unix utility rsync does. This ensures that you always end up with exactly what we shipped. But more server smarts means more server resources and maintenance. At that point, plain HTTP no longer suffices and you have to use fancier protocols, which means more points of failure and more cases in which users need help.
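The rsync approach mentioned above can be sketched as follows. This is a minimal, illustrative Python version of rsync-style block matching, not rsync’s actual code: it assumes a toy 4-byte block size, uses SHA-256 as the strong hash, and invents a simple list-of-operations delta format; signature, delta, and apply_delta are hypothetical names. The client checksums the blocks it really has on disk, the sender slides a rolling weak checksum over the new file to find those blocks, and only the unmatched bytes need to travel.

```python
import hashlib
from typing import Dict, List, Optional, Tuple, Union

BLOCK_SIZE = 4   # toy value so the demo is readable; real tools use much larger blocks
MOD = 1 << 16    # modulus for the two running sums of the weak checksum

Signature = Dict[int, List[Tuple[int, bytes]]]  # weak checksum -> [(block index, strong hash)]
Op = Tuple[str, Union[int, bytes]]               # ("copy", block index) or ("data", literal bytes)


def weak_checksum(block: bytes) -> Tuple[int, int]:
    """rsync-style weak checksum, returned as its two running sums (a, b)."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def strong_hash(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def signature(old: bytes, block_size: int = BLOCK_SIZE) -> Signature:
    """Client side: checksum every full block of the file it actually has on disk."""
    sig: Signature = {}
    for index in range(len(old) // block_size):
        block = old[index * block_size:(index + 1) * block_size]
        a, b = weak_checksum(block)
        sig.setdefault(a + (b << 16), []).append((index, strong_hash(block)))
    return sig


def delta(new: bytes, sig: Signature, block_size: int = BLOCK_SIZE) -> List[Op]:
    """Sender side: slide a window over the new file, reusing any block the client has."""
    ops: List[Op] = []
    literal = bytearray()
    pos = 0
    a: Optional[int] = None
    b: Optional[int] = None
    while pos + block_size <= len(new):
        window = new[pos:pos + block_size]
        if a is None or b is None:
            a, b = weak_checksum(window)
        match = None
        candidates = sig.get(a + (b << 16), [])
        if candidates:
            digest = strong_hash(window)  # confirm weak-checksum hits with the strong hash
            match = next((i for i, h in candidates if h == digest), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += block_size
            a = b = None  # next window starts fresh, so recompute rather than roll
        else:
            out_byte = new[pos]
            literal.append(out_byte)
            pos += 1
            if pos + block_size <= len(new):
                # Roll the checksum one byte forward in O(1) instead of recomputing it.
                in_byte = new[pos + block_size - 1]
                a = (a - out_byte + in_byte) % MOD
                b = (b - block_size * out_byte + a) % MOD
    literal.extend(new[pos:])  # trailing bytes shorter than one block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: List[Op], block_size: int = BLOCK_SIZE) -> bytes:
    """Client side: rebuild the new file from local blocks plus literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

For example, with old = b"HEADER--code-v1--RESOURCES-RESOURCES" and a new version that differs only in the version digit, apply_delta(old, delta(new, signature(old))) returns the new bytes while sending just four literal bytes. Because the delta is computed against the client’s own signature, a locally altered file still reconstructs correctly; its changed blocks simply arrive as literal data. The price is that the sender has to do this work for every client, which is the extra server load described above.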
At the intermediate level, you have file-granular updates, where the updater downloads only the files that have changed. I’ve personally written two such systems at other companies, and they work decently well. The server provides a list of files and their checksums and makes each file available for individual download, something a regular Web server can do. The app compares those checksums against its local files, downloads anything that has changed, and you end up with an updated program. The problem with this approach is that the largest files in an application are also the ones virtually guaranteed to change with every new build: the program binaries themselves. This intermediate approach saves you from re-downloading any resources that haven’t changed from one release to the next, but the savings aren’t as big as you might hope.
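The client-side check in a file-granular updater can be sketched in a few lines of Python. This assumes a hypothetical manifest, fetched from the server, that maps each file’s path inside the application bundle to a SHA-256 checksum; the names files_to_download and file_checksum are illustrative and not taken from any real updater.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical manifest format: relative path -> SHA-256 hex digest of the new version.
Manifest = Dict[str, str]


def file_checksum(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large binaries needn't fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_download(app_root: Path, manifest: Manifest) -> List[str]:
    """Relative paths that are missing locally or whose checksum differs from the manifest."""
    stale = []
    for rel_path, expected in sorted(manifest.items()):
        local = app_root / rel_path
        if not local.is_file() or file_checksum(local) != expected:
            stale.append(rel_path)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Contents/MacOS").mkdir(parents=True)
        (root / "Contents/Resources").mkdir(parents=True)
        (root / "Contents/MacOS/App").write_bytes(b"old build")
        (root / "Contents/Resources/Icon.icns").write_bytes(b"unchanged icon")
        manifest = {
            "Contents/MacOS/App": hashlib.sha256(b"new build").hexdigest(),
            "Contents/Resources/Icon.icns": hashlib.sha256(b"unchanged icon").hexdigest(),
        }
        print(files_to_download(root, manifest))  # ['Contents/MacOS/App']
```

The demo reflects the problem described above: the unchanged resource is skipped, but the rebuilt executable, usually the largest file in the bundle, still has to be downloaded in full.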
And finally you have whole-app updaters such as Sparkle, which is what we use (for more on Sparkle, see “Sparkle Improves Application Update Experience,” 2007-08-20). The Mac developer community seems to have more or less standardized on Sparkle these days. I’m amazed at how often I open an application and find that it’s using Sparkle to keep itself up to date. Aside from the programs where I implemented one myself, I can’t recall the last time I saw an application using a more granular updater. Even Apple seems to publish monster updates for its applications. Apple does use more granular packages for sequential updates to the operating system itself, but in some situations these incremental updates seem to cause problems that are fixed by reinstalling with the latest Mac OS X combo updater.