UberVista is Watching You!

I was recently attracted by yet another spider crawling around the Web, called AltaVista. Since a big problem on the Internet is finding what one is looking for, it is always a plus to find a big, fast search engine. AltaVista is such a search engine, another of the many like InfoSeek, Lycos, Excite and all the rest. The difference with AltaVista is its power, and the claim that it covers more than thirty million Web pages. I tried it out.


At the time, I had only been prowling around on the net for about two months. I had my own home page, and some other work present at different sites. I wanted to see how much of a trace I had left, and was curious if there were other people with my name out there. What I found surprised me. My family is small, and my last name, McElhearn, is uncommon. So I started by entering my name into AltaVista's search form, and prepared myself for about 30 seconds of waiting.

First surprise: It did not take 30 seconds, more like 5.

Second surprise: There were 32 occurrences of my name. If you have been leaving tracks on the Web for years, you may think 32 occurrences isn't that many. After I posted a message about this to the Future Culture mailing list, some other people on the list tried it, and came up with numbers far higher than mine: hundreds, even thousands. But, as a net novice, I was surprised by my 32 occurrences.


Out of the 32 occurrences, 30 were about me, and two were about other people with the same last name (maybe cousins?), one of whom won a high school shot put championship. The rest pointed to me, all right, from a letter of mine published in the second issue of Wired, to posts to the Info-Mac digest, to my home page, to my essay for the 24 Hours of Democracy project, and more.

Well, at first I thought that was pretty cool. After all, don't our genes want to make sure we leave our marks on the planet? But the more I think about it, the more I think that a Pandora's box is being opened. The best way to describe AltaVista is by using the words of the company behind it:

"AltaVista is the result of a research project started in the summer of 1995 at Digital's Research Laboratories in Palo Alto, California. By combining a fast Web crawler with scalable indexing software, the team was able to build a large index of the Web in the Fall of 1995.

"After two months of internal testing, we produced an even larger index consisting of the full text of over 16,000,000 pages. We made the site public on the 15th of December 1995. Within three weeks of launch, we were handling over two million HTTP requests per day."

The Web Indexer, the most powerful part of the setup, is an AlphaServer 8400 5/300, with 6 GB of memory, and 10 processors. Digital claims that the server handles most requests in less than a second.

This is only a part of the picture. Another server handles the hits and requests, and a news server maintains a current news spool for the News Indexer, which dynamically updates the database of newsgroup articles. So, AltaVista is trying to be a repository of, more or less, everything that goes through the Web and Usenet, which means a lot of email is there because their robots index archived mailing lists.

The whole thing has awesome power. Given the growth of the Internet and available processing power, AltaVista should be able to keep up with the traffic and provide this service for a long time.

I say, "mind your own business."

I mentioned my experiment to fellow members of a Mac users' group. One of them, a techie with pocket protectors, expressed awe at the power. Another was amazed at the nosiness of the machine, the fact that it didn't respect privacy. What about privacy?

When I subscribe to a mailing list, no one asks whether I am giving up my rights so that my posts can be used for any purpose. Although my words are public (but only in a limited sense; that is, to those who are also subscribed), I might not want them to be at the disposal of any robot that happens by. After all, Digital never asked if it could use my material to show off its computers (because the goal of the operation strikes me as just that: advertising for the powerful computers Digital makes). And what about my rights? Here in France, everyone has a legal right to verify and modify any information concerning them that is kept in any database. I wonder how Digital would react if I asked it to remove some of my posts from its database, or if I wanted to exercise my copyright in those words.

Many people contrast electronic information and communication with books, saying that books are permanent, but electronic information is not. I think AltaVista exemplifies just how permanent such information can be. Not only does it float around in the Internet's ether, but it is also indexed in a database where someone can easily fish for it.

The danger of this is obvious. Let us say that I have been posting to the newsgroup, talking about how I like to do it with pumice. In ten years, if my wife wants a divorce, she can hire a bot to snoop around and find that post, along with others, and get child support, keep me away from the kids - the whole nine yards.

Or what about a young hacker, who later grows up and runs for political office? Another party may find it useful to learn he was spouting anarchist ideas in his youth. He will not be able to say he did not inhale.

Many of us have ideas that we later renounce, but when the words are there in black and bits, it is hard to place the necessary distance between the us-then and the us-now.

What to do about it?

It seems difficult to control this kind of snooping. Companies will make money from our words just as they always have. And the search engine is useful to those seeking information. But the danger is real, and it is right around the corner. I am not a Luddite clamoring for a return to the dark ages; I think the Internet will change the way our future happens. But we must be aware of the dangers, and react accordingly.

The first thing is to demand that we be able to strike from the record anything that we no longer want available. We should have the right to filter what is made available in this manner. No one has the right to exploit our words without our permission. While AltaVista is not financially exploiting them, it is using them to advertise, which I see as being much the same thing.

[Over a month ago, I sent email to AltaVista inquiring about its policy for removing people from its database, but I have not received a response. Kirk has also sent a draft of this article to Digital, and he has received no response. AltaVista has an extensive disclaimer which states in part: "In general, Digital believes that persons who make information available on the World Wide Web or in newsgroups do so with the expectation that such information will be publicly and widely available. Digital further believes that its making newsgroup postings and links to publicly accessible Web pages available at this site is legally permissible and consistent with the common, customary expectations of those who make use of the Web and Usenet communications media." -Tonya]

The second thing is to be aware that someone is listening, and that whatever we say publicly on the Net will be stored. Even in private email, encryption is perhaps the only way to keep our communications safe from wandering eyes. Of course, this is not possible in every part of the world. Countries such as Iran and France can put you in jail for using encryption.

Don't forget, the walls have ears.

P.S.: When I wrote this text, in March, I found 32 occurrences of my name. The last time I checked, there were 78.

