Divided We Fall: Internet Redundancy
Bob Jacobsen <[email protected]> made an interesting comment in reference to our pointers in TidBITS-262 to earthquake information servers that combine information from several different sources. Bob wrote, "The combination of different services also points up a weakness of the net – all of these servers rely on the Xerox PARC Map Server. This is particularly interesting with respect to earthquakes, since PARC is very near one of the remaining "dark spots" in the San Andreas fault, in an area with a high probability of a major quake in the next 30 years." Bob went on to wonder how many other "high utility" services live in only a single Internet location, much like the Connection Machine WAIS server that Thinking Machines took down at the end of last year. WAIS, Inc. is working to bring back most of the sources that lived on the Connection Machine; TidBITS is back, but searches currently return an entire issue rather than a specific article.
Bob suggested that perhaps this was an area in which "service oriented" people could work to replicate some of the less glamorous parts of the Internet information infrastructure, and in fact that’s exactly how the Info-Mac and Umich FTP mirror networks have sprung up. The mother sites at <sumex-aim.stanford.edu> and <mac.archive.umich.edu> are too busy these days for many individuals to get through, but they serve numerous mirror sites that spread the load, and in some respects, the risk. If the machine at <sumex-aim.stanford.edu> went down, say because of an earthquake or a malevolent hacker (see TidBITS-216), service to that machine would be interrupted but the mirror sites would remain active. In a pinch, one of them might even volunteer to become the host site that the others would connect to each day. Aside from moving administrative tools over, the process of switching to a different host site wouldn’t be too bad – certainly easier than setting up a new machine from scratch in a different location.
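To make the mirror idea concrete, here is a minimal sketch of how a client might fall back from a mother site to its mirrors. It is an illustration only, not how any Info-Mac or Umich client actually works: the mirror host names are hypothetical placeholders, and the function names `first_available` and `tcp_probe` are invented for this example. The availability check is passed in as a parameter so the selection logic can be tried without touching the network.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_probe(host: str, port: int = 21, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (21 is the FTP control port)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_available(hosts: Iterable[str],
                    is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first host that is_up() reports as reachable, or None if none are."""
    for host in hosts:
        if is_up(host):
            return host
    return None


# Mother site first, then mirrors. The two mirror names are made up for illustration.
SITES = [
    "sumex-aim.stanford.edu",
    "mirror-one.example.org",
    "mirror-two.example.org",
]


def simulated_outage(host: str) -> bool:
    """Pretend the mother site is down and every mirror is up."""
    return host != "sumex-aim.stanford.edu"


if __name__ == "__main__":
    # Swap in tcp_probe instead of simulated_outage to check real hosts.
    print(first_available(SITES, simulated_outage))
```

With the simulated outage, the first mirror in the list is chosen. Promoting a mirror to be the new host site, as described above, would amount to moving it to the front of the list that the other mirrors consult.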
Unfortunately, not too many other Internet resources follow this philosophy of mirroring. Some useful Web pages, such as the CUI page of Web Search Engines, expressly encourage others to copy the page and support it at their sites. However, although this page is a useful service in its own right, it primarily points at other unique search engines around the Web, and thus is as vulnerable as the earthquake information systems that rely on the Xerox PARC Map Server.
In some cases, I’m sure that the specialized servers require a certain operating system or even certain hardware, which makes creating redundant sites more difficult. This is perhaps the case with Yahoo, the popular and well-organized Web subject catalog, since it requires a custom Unix database that isn’t available to the public. In other situations, the question may be a matter of sufficient volunteer labor and an organization willing to host a popular server. Yahoo serves hundreds of thousands (if not millions) of files each day, to judge from their statistics, and there aren’t many sites that wish to handle that network and hardware load.
But the issue may not always be that simple. For instance, many people have talked about the next wave of Internet service being commercial services that collect and organize resources, charging a tiny fee for each search. That’s the theory behind the commercial InfoSeek; more on them in a bit. The reason I mention commercial searching is that popular, well-organized, well-run sites might conceivably "go commercial." After all, David Filo and Jerry Yang, the guys who maintain Yahoo, might at some point be lured into the high-stakes and stale-lunch world of big business. If they’re even considering such a move in the future, they might not want to let other sites mirror Yahoo so they can retain control.
Commercial sites like InfoSeek and HotWired float in a slightly different boat. They obviously aren’t going to let just anyone mirror their sites, especially InfoSeek, which uses authentication heavily and charges for searches in commercial databases. But at the same time, since the companies running these sites have a vested interest in making sure users aren’t turned away or left bobbing on the waves, there’s less to worry about. If InfoSeek wishes to stay in business, they have to ensure that their customers will be able to get through, perhaps even in the event of a natural disaster. That’s the cost of doing business.
O’Reilly’s Global Network Navigator site, although very much linked to O’Reilly’s books and decidedly commercially oriented, takes a different approach and has signed up 12 mirror sites around the world. Currently, according to John Labovitz, Technical Services Manager at GNN, the mirror sites have all volunteered to carry GNN, mostly to provide content to more local users without requiring users to go out to the Internet. However, John says, "Due to the growth of GNN (not only in popularity, but in content and technology), we’re working on more clearly defining our mirroring requirements, and will probably incorporate more specific terms and conditions than we do now." Even so, it sounds like GNN has little to worry about in terms of redundancy.
To be honest, we’ve thought about this issue with respect to TidBITS as well, and it’s one of the reasons we’ve agreed to most any nonprofit, non-commercial redistribution that people have proposed (but please ask anyway so we get a sense of where the issues go). By ensuring that TidBITS is mirrored and stored throughout the world, there’s little fear that a catastrophe here could wipe out the archives of TidBITS (and believe me, the early issues make for some pretty humorous reading!). Although both we and Geoff live near Seattle, should a natural disaster destroy our machines or connections (or give us more serious problems to worry about), Mark Anbinder, our indefatigable News Editor, might even be able to take over the task of publishing the issues for a few weeks, as he did when we moved out west and were without a net connection for a while. [Hey, Mark: clear off some disk space! Mt. Rainier is looking more ominous all the time. -Geoff]
Perhaps the thought to consider in the end is that although the Internet breaks up geographic barriers and scatters them to the winds, Internet resources are no less vulnerable than any other physical object. Machines can be stolen (a temporary Web page we saw recently bemoaned the theft of the main server), damaged, or otherwise put out of commission. However, even though specific machines are as insecure as anything else, the Internet itself, through mirroring and similar techniques, can serve to raise the data above the vulnerability of a single location. And of course, the underlying sentiment throughout this article is that perhaps you, whoever you are, can help to protect a unique Internet resource from the vagaries of fate. Consider the possibility.