The Unbearable Lightness of URLs
Last week’s issue of TidBITS had the second installment of our sporadic Tools We Use column; the first installment covered NewerRAM’s useful little GURU utility. Several readers followed the link in last week’s article to the GURU article and then discovered to their horror that the URL we gave for GURU back in November of 1998 no longer worked.
<https://tidbits.com/getbits.acgi?tbart=05191>
<http://www.newerram.com/New_Folder/guru.html>
We often receive notes like this, since many companies don’t think ahead when redesigning Web sites and break existing URLs along the way, a phenomenon colloquially called "linkrot." Most people are quite nice about the fact that old issues of TidBITS point at broken URLs, but there’s often just a hint of irritation: why haven’t we dealt with this linkrot already?
Historical Accuracy — One of our most strongly held and frequently assaulted beliefs is that our content is almost immutable – the only thing we ever change after distributing an issue is a typographical error. Our reason for this policy is that we’re adherents to the concept of historical accuracy. We feel that if we wrote something in the 30-Nov-98 issue of TidBITS, those words should remain fixed forever. Otherwise, how could anyone viewing that issue know they were reading the same text we published on that date? And though we flatter ourselves with this thought, what about historians in the future, attempting to divine what it was about the Macintosh community that set it apart from other groups of computer users? We want to present the future with an accurate view of the past.
This policy is often tested because it’s tempting, even for us, to go back in to fix mistakes. We don’t want to look bad, and if we could just make one tiny little change in an article… No. If we make a mistake, that mistake is set in stone, and we can correct it the next week.
These attitudes harken back to a publishing world that existed entirely on paper, and although I’m no fan of publishing on paper purely for the sake of a physical object, paper lends itself both to information permanence and to archiving. You can’t change the words on the page in a magazine, and it’s easy to pile up all the issues of a magazine, in order, and sort through them for some piece of information. Though TidBITS is electronic, we strive to achieve a similar level of information permanence and archiving.
In short, then, you can rely on what you read in an issue of TidBITS to remain the same forever. We will never go into our archive to change content other than to fix a typo. Soon, we hope to implement forward linking in our database so any corrections would be accessible from the original article.
In addition, we take pains to ensure that all of our URLs are permanent. Our issue naming scheme is simple and consistent, and our custom GetBITS CGI ensures that we have short, permanent URLs to individual articles in our database, not to mention threads in TidBITS Talk.
Dealing with Broken URLs — Luckily, finding a Web page again after a URL breaks isn’t difficult, assuming the page still exists. The trick is to delete pages and directories from the end of the URL until you get to a page from which you can start browsing for the desired page again. Take this URL.
<http://www.tidbits.com/about/about-tidbits.html>
If you received an error message accessing that URL, the first thing to do would be to delete "about-tidbits.html" and send the URL to the Web browser again. If that shorter URL also generated an error page, you’d delete "about/" and try once more. That takes you to the top level of the site and should provide a useful starting point for additional searching.
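For readers who would rather script the trimming than retype URLs by hand, here is a minimal sketch of the same idea in Python. The function name trimmed_urls is our own invention for illustration, and the code only builds the list of shorter URLs to try; it makes no attempt to fetch them or to judge whether a page is an error page.

```python
from urllib.parse import urlsplit, urlunsplit


def trimmed_urls(url):
    """Return the URL followed by progressively shorter versions of it.

    Hypothetical helper: each step deletes the last page or directory
    from the path, ending at the top level of the site.
    """
    parts = urlsplit(url)
    # Non-empty path pieces, e.g. ["about", "about-tidbits.html"].
    segments = [s for s in parts.path.split("/") if s]
    candidates = [url]
    while segments:
        segments.pop()  # delete the last page or directory
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # keep the trailing slash on directories
        # Any query string or fragment is dropped from the shortened URLs.
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


if __name__ == "__main__":
    for candidate in trimmed_urls("http://www.tidbits.com/about/about-tidbits.html"):
        print(candidate)
```

Run as a script, this prints the original URL, then the URL ending in "about/", then the top level of the site, which is the order in which you would try them in a Web browser.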
Binary URLs — When I posted the updated GURU URL to TidBITS Talk, a related issue came up. When posting different versions of programs, how do you deal with the fact that including the version number in the name automatically ensures broken URLs after an upgrade? If you’re distributing software, check out the thread for a variety of ways to prevent binary linkrot.
<https://tidbits.com/getbits.acgi?tlkthrd=588>
Additional Reading — Finally, Jakob Nielsen’s excellent Alertbox column has touched on this topic several times, first in relation to linkrot (apparently more than six percent of the links on the Web are broken), and the second about "content gardening," the act of going back in to keep pages fresh. Jakob linked to my comments about historical accuracy and the need to avoid historical revisionism – it’s worth a read if you’re interested in the topic.
<http://www.useit.com/alertbox/980614.html>
<http://www.useit.com/alertbox/981129.html>