Internet Services

I must tread a fine line when talking about Internet services, because the level of connection (and thus the level of service) varies widely. People who can send Internet email, for instance, may not be able to use Gopher or the World Wide Web. The services I talk about in this chapter (except for FTP and Archie via email) all require a full TCP/IP connection to the Internet.



URLs

Before I get into the various TCP/IP-based Internet services, I want to explain URLs, or Uniform Resource Locators. These constitute the most common and efficient method of telling people about resources available via FTP, the World Wide Web, and other Internet services.

NOTE: URL generally stands for Uniform Resource Locator, although some people swap "uniform" for "universal." Despite what I've heard from one source, I have never heard anyone pronounce URL as "earl"; instead, everyone I've talked to, including one person from CERN who helped develop the World Wide Web, spells out the letters.

A URL uniquely specifies the location of something on the Internet, using three main bits of information that you need in order to access any given object. First is the URL scheme, or the type of server making the object available, be it an FTP, Gopher, or World Wide Web server. Second comes the address of the resource. Third and finally, there's the full pathname or identifier for the object.

As a quick example, URLs (at least those for the Web) generally look something like this one (which points at the Microsoft Web server):

NOTE: Uniform Resource Locators have become so popular that the Library of Congress has added a subfield for them when it catalogs electronic resources.

This description is a slight oversimplification, but the point I want to make is that URLs are an attempt to provide a consistent way to reference objects on the Internet. I say "objects" because you can specify URLs not only for files and Web pages, but also for stranger things, such as email addresses, Telnet sessions, and Usenet news postings.
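If it helps to see those three pieces pulled apart programmatically, here's a sketch using Python's standard URL parser. The URL is a made-up example, not one from this chapter:

```python
from urllib.parse import urlparse

# A made-up URL, used only to show the three pieces of information.
parts = urlparse("http://www.example.com/pages/misc/whatsnew.htm")

print(parts.scheme)  # the URL scheme: http
print(parts.netloc)  # the address of the resource: www.example.com
print(parts.path)    # the pathname of the object: /pages/misc/whatsnew.htm
```

The parser doesn't care which scheme it sees; ftp, gopher, and mailto URLs break apart the same way.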

Table 8.1 shows the main URL schemes that you're likely to see.

Table 8.1: Common URL Schemes

     SCHEME      PROTOCOL                             SAMPLE CLIENT
     ------      -----------------                    -------------

     ftp         File Transfer Protocol               WSFTP
     gopher      Gopher protocol                      WSGopher
     http        HyperText Transfer Protocol          Netscape
     mailto      Email                                Eudora
     news        Net News Transport Protocol          WinVN
     wais        Wide Area Information Servers        EINet winWAIS

If you see a URL that starts with ftp, you know that the file specified in the rest of the URL is available via FTP, which means that you could use FTP under Unix, FTP via email, or a Windows-based FTP client such as WSFTP to retrieve it. If the URL starts with gopher, use WSGopher or another Gopher client. If it starts with http, use NCSA Mosaic or Netscape or some other Web browser. And, finally, if a URL starts with wais, you can use EINet winWAIS or another WAIS client.

NOTE: You can use a Web browser to access most of the URL schemes in table 8.1, although Web browsers are not necessarily ideal for anything but information on the World Wide Web itself. Web browsers work pretty well for accessing files on Gopher servers and via gateways to WAIS databases, but FTP via a Web browser is clumsy (and may fail entirely with certain types of files, such as self-extracting archives).

After the URL scheme comes a colon (:), which delimits the server type from what comes next. If two slashes (//) come next, they denote that the name of a machine (or its IP address) follows. However, if the URL points at an address in some other format, such as an email address, the slashes aren't appropriate and don't appear. Basically, all this means is that if there are two slashes after the colon, the URL points at a file available via FTP, Gopher, the Web, or perhaps some other protocol.

NOTE: In some rare circumstances, you may need to use a username and password in an FTP URL as well. A URL with a username and password might look like this:

The last part of the URL is the specific information you're looking for, be it an email address or, more commonly, the path to the file you desire. Directory names are separated from the machine name by a slash (/). You may not have to specify the path with some URLs if you're only connecting to the top level of the site.

So, for instance, let's dissect a URL that points at the What's New page on Microsoft's Web server.

First off, the http part tells us that we should use a Web browser to access this URL. Then comes the name of the host machine that's running the Web server. The next part, /pages/misc/whatsnew.htm, is the full path to the file the Web browser shows us, and the path works much like paths do in DOS (though with forward slashes instead of backslashes). /pages is a directory, /misc is a directory inside /pages, and whatsnew.htm is the actual file inside the /misc directory.

NOTE: If an FTP or Gopher URL ends with a slash, it points at a directory and not a file. If it doesn't end with a slash, it may or may not point at a directory; unless it's obvious from the last part of the path, there's no good way of telling until you go there. Since most Web servers can serve some sort of default file in the absence of a specific file in the URL, it usually matters less whether a Web user is specifying a file or a directory.

All of these details aside, how do you use URLs? Your mileage may vary, but I use them in three basic ways. First, if I see them in email or in a Usenet posting, I often copy and paste the host part into WSFTP (if they're FTP URLs) or the whole thing into WebSurfer or Mosaic or Netscape (if they use any other scheme). That's the easiest way to retrieve a file or connect to a site if you have a WinSock-based Internet connection.

Second, if for some reason I don't want to use Mosaic or Netscape (I prefer WSFTP for FTP, for instance), sometimes I manually dissect the URL, as we did with the What's New page on the Microsoft Web server above, to figure out which program to use and where to go. This method takes more work, but sometimes pays off in the end. You can put a screw in the wall with a hammer, but it's not the best tool for the job.

Third and finally, whenever I want to point people at a specific Internet resource or file available for anonymous FTP, I give them a URL. URLs are unambiguous and, although a bit ugly in running text, easier to use than attempting to spell out what they mean. Consider the example below:

To verbally explain the same information contained in that URL, I would have to say something like: "Using an FTP client program, connect to the anonymous FTP site. Change directories into the /pub/tidbits/issues/1995/ directory, and once you're there, retrieve the file TidBITS#261/30-Jan-95.etx." A single URL enables me to avoid such convoluted (and boring) language, and frankly, URLs are in such common use on the Internet that you might as well get used to seeing them right now.

NOTE: Frankly, I'm a little worried that some of the longer URLs in this book may be messed up in production, so if you see a hyphenated URL that doesn't look or work right (the hyphens should only appear between words, never in the middle of a word as you would normally see at the end of a line in running text), assume that the hyphen is an artifact of the production process and don't use it when trying to access that URL.

So, from now on, whenever I mention a file or a Web site, I'll use a URL. If you try to retrieve a file or connect to a Web site and are unsuccessful, chances are either you've typed the URL slightly wrong, or the file or server no longer exists. It's extremely likely that many of the files I give URLs for will have been updated by the time you read this, so the file name at the end of the URL may have changed. So if a URL doesn't work, try removing the file name from the end of the URL and looking in the directory where the original file lived for the updated file. If all else fails, you can remove everything after the machine name and work your way back down to the file you are after.
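That walk back up the directory tree is mechanical enough to write down. Here's a hedged Python sketch (the URL below is invented) that generates the URLs you'd try in order, from "drop the file name" down to the bare machine name:

```python
from urllib.parse import urlparse, urlunparse
import posixpath

def fallback_urls(url):
    """Generate the URLs to try, in order, when the original fails:
    first the file's directory, then each parent, then the site root."""
    parts = urlparse(url)
    path = parts.path
    results = []
    while path not in ("", "/"):
        path = posixpath.dirname(path.rstrip("/"))
        candidate = path if path.endswith("/") else path + "/"
        results.append(urlunparse((parts.scheme, parts.netloc, candidate, "", "", "")))
    return results

# A hypothetical stale URL:
print(fallback_urls("http://www.example.com/pub/tidbits/issues/1995/old-file.etx"))
```

Each URL in the list points at a directory (note the trailing slashes), so a Web browser or FTP client will show you a listing you can hunt through.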

If, after all this, you'd like to learn more about the technical details behind the URL specifications, check out:

NOTE: I find that URLs don't always work well for files stored on Gopher servers, since Gopher allows spaces and other characters that URLs don't accept. Thus, spaces are encoded in Gopher URLs with %20 to indicate that there's a space there.
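This %20 business is general URL escaping, not something peculiar to Gopher. Python's standard library shows the round trip; the selector string here is made up:

```python
from urllib.parse import quote, unquote

# A hypothetical Gopher selector containing spaces.
selector = "About This Server"

encoded = quote(selector)   # spaces become %20
print(encoded)              # About%20This%20Server
print(unquote(encoded))     # About This Server
```

The same escaping handles any other character a URL can't contain directly, each becoming a percent sign followed by the character's code in hexadecimal.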


FTP

Despite the occasionally confusing way people use the term both as a noun and a verb, most people don't have much trouble using FTP. FTP stands for File Transfer Protocol, and not surprisingly, it's only good for transferring files between machines. In the past, you could only use an FTP client to access files stored on FTP servers. Today, however, enough other services such as Gopher and the World Wide Web have implemented the FTP protocols that you can often FTP files no matter what service you happen to be using. Heck, you can even FTP files via email. I'll get to the specifics of the different clients in later chapters; for now, here are a few salient facts to keep in mind regarding FTP.

FTP Manners

The Internet does a wonderful job of hiding geographical boundaries. You may never realize a person with whom you correspond lives on the other side of the globe. When using FTP, however, try to keep the physical location of the remote machine in mind.

First, as with everything on the Internet, someone pays for all this traffic. It's probably not you directly, so try to behave like a good citizen who's being given free access to an amazing resource. Show some consideration by not, for example, using machines in Australia when one in the same area of your country works equally well. Because transoceanic traffic is expensive, many machines mirror others; that is, they make sure to transfer the entire contents of one machine to the other, updating the file collection on a regular, often daily basis.

Here's an example. Because the Windows archive site at cica is popular and kept up-to-date, other sites carrying Windows software don't want to duplicate the effort. It's much easier to set up a mirror of cica so that machines in Australia and Scandinavia can have exactly the same contents as cica. Mirroring not only saves work, it enables users in those countries to access a cheaper, local site for files. Everyone wins, but only if everyone uses local sites whenever possible. You can usually tell where a site is located by looking at the two-letter country domain at the end of the address.

Sometimes, of course, the file you need exists only on a remote site in Finland, for example, so that's where you must go to get it. Another point of etiquette to keep in mind, wherever the file may be, is sensitivity to the time of day at the site from which you retrieve it. Like most things in life other than universities during exams, more people use the Internet during their daytime hours than at night. Thus, it's generally polite to retrieve files during off hours; otherwise, you're preventing people from doing their work. That's not polite, especially if the file you're retrieving is a massive MPEG movie or something equally frivolous.

Notice that I said "their daytime hours." Because the Internet spans the globe, it may be 4:00 A.M. where you are, but it's the middle of the business day somewhere else.

One final piece of FTP etiquette: Don't use someone else's FTP site as a temporary dumping ground for junk that you either can't store on your account or don't want to download directly.

FTP Clients

FTP is inherently simple to use, but there's plenty of room for FTP client software to make your life miserable. The following sections, therefore, describe several benefits and features to look for in an FTP client.


Most of the time, people use an FTP client program to log on to a remote FTP site, find a file or two, download them, and then log off. As such, a disproportionate amount of your time is spent connecting and navigating to the files you want.

A good FTP client enables you to define shortcuts for frequently used FTP sites, along with the userid and password necessary for connecting to them. This benefit is minor but makes a big difference when repeated numerous times. I can't tell you how much I hate typing a long site name on a Unix command line every time I want to connect to that site with FTP.


Once you're on, the FTP client program should make it very easy to move between directories. Most programs do this by roughly emulating the standard Open/Save dialog box so prevalent in Windows applications. Although the look may change from client to client, the basic operation of using the drop-down listboxes to change drives and directories and then clicking on files to select them is the same. It's helpful when the client program remembers the contents of directories. That way, if you go back to one you've already visited, you don't have to wait for it to refresh the file list.

NOTE: In the first edition of this book, I commented that it would be interesting to see an FTP client that perfectly emulates the Windows File Manager, rather than just the general interface. Since that time, Spry and PC Interface have developed implementations that do just that. One nice feature from the best of these is the ability to drag-and-drop files both to and from the remote system. I like the fact that, for the most part, these FTP programs require no additional learning to use. Just do what you would do normally, and it should work.

A useful variant of shortcuts (also known as bookmarks) to FTP site names is the addition of directory information to the site name. Say, for instance, you want to retrieve Windows files from a particular site. Not only do you have to enter the host name, userid, and password, but you must also go to the proper directory, which in this case is /pub/tiskwin. A good shortcut not only gets you to the site, but takes you to a specific directory as well.
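A bookmark that carries directory information is easy to sketch with Python's ftplib. The "host/path" bookmark format and the site name below are my own inventions for illustration, and the network calls are shown commented out since the site doesn't exist:

```python
from ftplib import FTP

def parse_bookmark(bookmark):
    """Split a "host/dir/subdir" bookmark (an invented format)
    into the host name and an absolute directory path."""
    host, _, path = bookmark.partition("/")
    return host, ("/" + path if path else "/")

def open_bookmark(bookmark):
    """Connect anonymously and change to the bookmarked directory."""
    host, path = parse_bookmark(bookmark)
    ftp = FTP(host)
    ftp.login()   # anonymous login
    ftp.cwd(path)
    return ftp

# Usage with a hypothetical site (not run here):
# ftp = open_bookmark("ftp.example.com/pub/tiskwin")
# print(ftp.nlst())
# ftp.quit()

print(parse_bookmark("ftp.example.com/pub/tiskwin"))
```

The point is simply that one stored string can carry you past the connect, log in, and change-directory steps in a single action.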

Listing Style

In Unix, you can choose among several different methods of viewing files. Some show you more information, such as file size and creation date, and others show you less, in order to fit more entries on the screen. Although the PC doesn't have the problem of trying to fit multiple columns in a list (only one Windows program uses multiple column lists), not all the FTP clients are good about showing you the entire filename, size, or date. I think this failure is inexcusable, because you need to know how large a file is before you spend an hour retrieving it -- especially if you're connecting at a slow speed. Make sure the program you use provides this information.

Recognizing File Type and Decoding

Much of the time, an FTP client can figure out what sort of file you're retrieving by looking at the extension to the filename. This being the case, the client can make sure it is transferring the file in the proper format. If you're lucky, it even decodes some of the common formats you see on the Internet.
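Under the hood, that guess is just a table lookup on the extension. A minimal sketch, with an illustrative (and deliberately not exhaustive) set of text extensions:

```python
import posixpath

# Extensions this sketch treats as plain text; everything else is
# transferred in binary mode. The set is illustrative, not exhaustive.
TEXT_EXTENSIONS = {".txt", ".etx", ".hqx", ".uu", ".htm", ".html"}

def transfer_mode(filename):
    """Guess the FTP transfer mode from the filename extension."""
    _, ext = posixpath.splitext(filename.lower())
    return "ascii" if ext in TEXT_EXTENSIONS else "binary"

print(transfer_mode("whatsnew.htm"))  # ascii
print(transfer_mode("wsftp.zip"))     # binary
```

Guessing wrong in the binary direction is harmless for text; guessing ascii for a binary file corrupts it, which is why clients default to binary when in doubt.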

"Wait a minute," you say. "He didn't mention strange file formats before." Sorry about that. I'll get to file formats later on in this chapter, after I've discussed the various ways files might appear on your machine. But first, let's look at how you can retrieve files from FTP sites armed only with an email program.

FTP by Email

One of the problems with FTP is that it requires your attention -- in fact, pretty much your full attention. In addition, you must have a TCP/IP connection to the Internet. If you're connecting via UUCP or some weird BBS, you simply cannot use FTP normally.

There is a solution, although not a terribly good one. You can retrieve files remotely, using only your email program, in two different ways. The most generic way is by using one of the FTPmail or BITFTP servers. The other way is to use a specific mailserver program that only knows how to serve a specific site, sometimes as part of a mailing list manager such as LISTSERV, ListProcessor, or Majordomo. Let's look at the generic way first.

FTPmail and BITFTP

Using FTPmail or BITFTP isn't particularly difficult, but can be extremely frustrating. The problem is twofold. First, the main FTPmail server is seriously overloaded. Because it's a free service that someone at DEC runs in a machine's spare time, FTPmail is a low priority. It can take a week for your file to come back. I've even had requests seemingly disappear into the ether. Second, talking to an FTPmail server is like playing 20 Questions -- unless you know precisely what you're looking for, where it is, and how to enter the commands perfectly, you'll get an error message back. And, if that message comes back a week later, you may not even have the original information with which to correct your mistake.

NOTE: Often when you use email to retrieve files stored on FTP sites, the files are split into chunks. Usually you can control the size of the chunks, but manually joining them in a word processor can be difficult. Some email programs, such as Eudora, as well as various utilities, make joining file chunks easier.
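The reassembly itself is just concatenation in the right order. A minimal sketch, assuming the mail headers and signatures have already been trimmed from each chunk (which is the tedious part in practice):

```python
def join_chunks(chunks):
    """Concatenate message chunks in order (part 1, part 2, ...),
    trimming any stray blank lines at each chunk boundary."""
    return "\n".join(chunk.strip("\n") for chunk in chunks)

# Hypothetical fragments of a uuencoded file split across two messages:
parts = ["begin 644 example.zip\nM1234", "M5678\nend"]
print(join_chunks(parts))
```

The rejoined text would then go through a uudecoder or similar to recover the original binary file.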

Talking to an FTPmail or BITFTP server feels much like using a standard Unix client to log in to an FTP site, change directories, and finally retrieve the file. The only problem is, you must type in the commands all at once. So, to get a file from the main FTPmail server, you would send email to the server's address, put something in the Subject line (it doesn't care what, but it dislikes empty Subject lines), and then, in the body of the message, put a set of commands like this:





chdir /pub/tiskwin



So, in English, what you're doing is first getting the help file from FTPmail, then connecting to the anonymous FTP site, then changing into the /pub/tiskwin/ directory, then retrieving the file, and finally quitting (see figure 8.1). If you wanted, you could retrieve more files. And, if you included an ls command, FTPmail would return the directory listing to you, enabling you to see what's there before requesting specific files. The binary and uuencode commands make sure the file gets to us in a form that we can read -- see the file formats section for a discussion of file formats.

Figure 8.1: FTPmail sample in Eudora.
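The fixed shape of such a request makes it easy to generate mechanically. This sketch builds a message body like the one just described; the site, directory, and file names are placeholders, not real ones:

```python
def ftpmail_request(site, directory, filename):
    """Build the body of an FTPmail message: fetch the help file,
    connect, ask for binary files in uuencode, change directory,
    get the file, and quit."""
    return "\n".join([
        "help",
        "connect " + site,
        "binary",
        "uuencode",
        "chdir " + directory,
        "get " + filename,
        "quit",
    ])

# A hypothetical request (the site and file are invented):
print(ftpmail_request("ftp.example.com", "/pub/tiskwin", "example.zip"))
```

Because the server sees the whole script at once, a typo anywhere means starting over, which is exactly why getting the commands right the first time matters so much.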

Needless to say, there are a number of other commands that FTPmail accepts, and you will probably want to use some of them (see table 8.2).

Table 8.2: Basic FTPmail Commands

COMMAND             FUNCTION
-------             --------

ascii               Tells FTPmail that the files you're getting are
                    straight ASCII files, which is true of most INDEX
                    and uuencoded files (see file format section
                    later in this chapter).

binary              Tells FTPmail that the files you're getting are
                    binary files and should be encoded in ASCII
                    before sending. The default format for encoding
                    is btoa, but you can set it to uuencode as well
                    (see next item). You must specify either btoa or
                    uuencode or the request will fail.

btoa                Tells FTPmail to mail binary files in btoa
                    format.

chdir directory     Changes into the specified directory.

chunksize size      Splits files into chunks defined by the number of
                    bytes in size. The default is 64,000
                    bytes (64K).

compress            Tells FTPmail to compress the files with Unix 
                    Compress before sending (discussed later).

connect host        Tells FTPmail to connect to the specified host.

dir directory       Returns a long directory listing.

get filename        Gets a file and mails it to you.

help                Sends back the help file.

ls directory        Returns a short directory listing.

quit                Quits FTPmail and ignores the rest of the mail
                    message.

reply your-address  Gives FTPmail your address, since it may not be
                    able to determine it from the header.

uuencode            Tells FTPmail to mail binary files in uuencode
                    format.

I only know of one other FTPmail server. It's in Ireland and uses a somewhat different command set, so I don't recommend using it unless you're in Europe. If you want to find out more about it, send email to the server and put the single command help in the body of the message.

I know of three BITFTP servers. One is in the U.S., another in Germany, and the third in Poland (see table 8.3). Don't use the ones in Europe unless you too are in Europe -- it's a waste of net bandwidth, and probably won't result in particularly good service anyway.

Table 8.3: BITFTP Servers

     LOCATION
     --------
     U.S.
     Germany
     Poland

NOTE: BITFTP stands for BITNET FTP, or something like that. Machines that are only on BITNET cannot use FTP normally, so some enterprising programmers created the BITFTP program to enable BITNET users to retrieve files stored on FTP sites. I had thought that these servers were restricted to BITNET users, but couldn't find any mention of that restriction in the help file.

Retrieving a file from a BITFTP server works similarly to retrieving a file from FTPmail, but the commands are somewhat different. Here's how you retrieve the same file (along with the help file again) we snagged before. Send email to one of the BITFTP servers and put these commands in the body of the letter:



user anonymous

cd /pub/tiskwin



Enough about BITFTP. You can probably figure out the rest on your own, with the aid of the help file. I wouldn't want to spoil all the fun of figuring some of this stuff out for yourself!


Mailservers

More common than FTPmail or BITFTP programs that serve everyone are mailserver programs that provide email access to FTP archives on a specific site. There are many of these mailservers around, although finding them can be a bit difficult, and I can't tell you which FTP sites that might interest you also have mailservers.

Mailing list manager programs such as LISTSERV, ListProcessor, and Majordomo often provide access to files, although these files aren't always available via FTP. Most often the files in question are logs of mailing list discussions, but in a few instances, they're more interesting.

FTP by email is much like playing Pin the Tail on the Donkey with a donkey the size of... Nah, I'll avoid the easy shot at some sleazy politician. Let's talk next about how you find files via FTP. The answer is Archie.


Archie

Archie is a good example of what happens when you apply simple technology to a difficult problem in an elegant way. Here is the problem: How do you find any given file on the nets if you don't already know where it's located? After all, in comparison with finding a single file on several million machines, the proverbial haystack looks tiny, and its cousin, the proverbial needle, sticks out like the sore thumb you get when you find it. In a nutshell, Archie uses normal FTP commands to get directory listings of all the files on hundreds of anonymous FTP sites around the world. It then puts these file listings into a database and provides a simple interface for searching it. That's really all there is to Archie. It's amazing that no one thought of it before.
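Archie's core idea, gather the listings, index the file names, then search the index, fits in a few lines. This sketch uses invented site names and files; a real Archie database is of course vastly larger:

```python
# Hypothetical directory listings from two made-up FTP sites.
LISTINGS = {
    "ftp.example.edu": ["/pub/net/nslookup.zip", "/pub/docs/index.txt"],
    "ftp.example.org": ["/mirrors/win/ns-lookup.zip"],
}

def build_index(listings):
    """Map each file name to the (site, full path) pairs where it lives."""
    index = {}
    for site, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]
            index.setdefault(name, []).append((site, path))
    return index

def search(index, substring):
    """Case-insensitive substring match, roughly Archie's default search."""
    substring = substring.lower()
    return sorted(name for name in index if substring in name.lower())

index = build_index(LISTINGS)
print(search(index, "lookup"))    # matches both nslookup.zip and ns-lookup.zip
print(search(index, "nslookup"))  # matches only nslookup.zip
```

The hard part of the real system isn't the searching at all; it's politely re-fetching listings from hundreds of sites and keeping the database fresh.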

NOTE: Archie was developed in early 1991 by Alan Emtage, Peter Deutsch, and Bill Heelan from the McGill University Computing Center in Montréal, Canada. Development now takes place at a company founded by Deutsch and Emtage, Bunyip Information Systems. Although the basic Archie client software is distributed freely, Bunyip sells and supports the Archie server software. If you have questions about Archie, you can write to the Archie Group.

You can access Archie via Telnet, email, Gopher, the World Wide Web, and special Windows client programs. Some Unix machines may also have Unix Archie clients installed. It seems to me there are two basic goals an Archie client should meet. First, it should be easy to search for files, but when you want to define a more complex search, that should be possible as well. Second, since the entire point of finding files is so that you can retrieve them, an Archie client ideally should make it very easy to retrieve anything that it finds. This second feature appears to be less common than you would expect.

NOTE: Archie isn't an acronym for anything, although it took me half an hour searching through files about Archie on the Internet to determine that once and for all.

Accessing Archie via email is extremely easy, although the Archie server offers enough options (I'll let you discover them for yourself) to significantly increase the complexity. For a basic search, merely send email to an Archie server and put lines like the following in the body of the message:


help

find nslookup

find ns-lookup

In a short while (or perhaps a long while, depending on the load on the Archie server), the results should come back -- the help file that you asked for and the results of your search for "ns-lookup" and "nslookup." However, if the Archie server you chose is down, or merely being flaky (as is their wont) you might want to try another one. There are plenty. Simply send email to the userid archie at any one of the Archie servers from the list in table 8.4. As usual, it's polite to choose a local server.

Table 8.4: Current Archie Servers

     LOCATION
     --------
     Australia
     Austria
     Austria
     Canada
     Finland
     Germany
     Israel
     Italy
     Japan
     Korea
     Korea
     Spain
     Sweden
     Switzerland
     Taiwan
     United Kingdom
     USA (NE)
     USA (NJ)
     USA (NJ)
     USA (NY)
     USA (MD)


Telnet

Because Telnet is similar to FTP in the sense that you're logging in to a remote machine, the same rules of etiquette apply. As long as you try to avoid bogging down the network when people want to use it for their local work, you shouldn't have to worry about it too much. When you telnet to another machine, you generally telnet into a specific program that provides information you want. The folks making that information available may have specific restrictions on the way you can use their site. Pay attention to these restrictions. The few people who abuse a network service ruin it for everyone else.

What might you want to look for in a Telnet program? That's a good question, I suppose, but not one that I'm all that qualified to answer. For the most part, I avoid Telnet-based command-line interfaces. Thus, in my opinion, you should look for features in a Telnet program that will make it, and any random program that you might happen to run on the remote machine, easier to use.

It's useful to be able to save connection documents that save you the work of logging in to specific machines (but beware of security issues if they also store your password). Also, any sort of macro capability will come in handy for automating repetitive keystrokes. Depending on what you're doing, you may also want some feature for capturing the text that flows by for future reference. And, you should of course be able to copy and paste out of the Telnet program.


IRC

IRC, which stands for Internet Relay Chat, is a method of communicating with others on the Internet in real time. It was written by Jarkko Oikarinen of Finland in 1988 and has spread to 20 countries. IRC is perhaps better defined as a multiuser chat system, in which people gather in groups that are called channels, usually devoted to some specific subject. Private conversations are also possible.

NOTE: IRC gained a certain level of fame during the Gulf War, when updates about the fighting flowed into a single channel where a huge number of people had gathered to stay up-to-date on the situation.

I personally have never messed with IRC much, having had some boring experiences with RELAY, a similar service on BITNET, back in college. I'm not all that fond of IRC, in large part because I find the amount of useful information there almost nonexistent, and I'm uninterested in making small talk with people from around the world. Nevertheless, IRC is one of the most popular Internet services. Thousands of people connect to IRC servers throughout any given day. If you're interested in IRC, refer to the section on it back in chapter 5. That should give you a sense of what IRC is like. You can find more information in the IRC tutorials posted for anonymous FTP in:

Client programs for many different platforms exist, including two for Windows called IRCIIWIN and WSIRC. Much as with Telnet, you're looking for features that make the tedious parts of IRC simpler.


MUDs

MUD, which stands for Multi-User Dungeon or often Multi-User Dimension, may be one of the most dangerously addictive services available on the Internet. The basic idea is somewhat like the text adventures of old, where you type in commands like "Go south," "Get knife," and so on. The difference with MUDs is that they can take place in a wide variety of different realities, basically anything someone could dream up. More importantly, the characters in the MUD are actually other people interacting with you in real time. Finally, after you reach a certain level of proficiency, you are often allowed to modify the environment of the MUD.

The allure of the MUDs should be obvious. Suddenly, you can become your favorite alter ego, describing yourself in any way you want. Your alternate-reality prowess is based on your intellect, and if you rise high enough, you can literally change your world. Particularly for those who may feel powerless or put upon in the real world, the world of the MUD is an attractive escape, despite its text-environment limitations.

After the publication of an article about MUDs, the magazine Wired printed a letter from someone who had watched his brother fail out of an engineering degree and was watching his fiancée, a fourth-year astrophysics student, suffer similar academic problems, both due to their addictions to MUDs. But don't take my word for it; read the letter for yourself on Wired's Web server:

NOTE: Unfortunately, due to the way Wired has set up their HotWired server, it now requires that you create a username and password before you can read their stuff. To do that, use your Web browser to go to:

Then click on the Register Now link. You can then fill in the onscreen form to register. Don't worry, it's all free.

I've seen people close to me fall prey to the addictive lure of MUDs. As an experiment in interactive communications and human online interactions, MUDs are extremely interesting, but be aware of the time they can consume from your real life.

I don't want to imply that MUDs are evil. Like almost anything else, they can be abused. But in other situations, they have been used in fascinating ways, such as to create an online classroom for geographically separated students. There's also a very real question of what constitutes addiction and what constitutes real life. I'd say that someone who is failing out of college or failing to perform acceptably at work because of a MUD has a problem, but if that person is replacing several hours per day of television with MUDing, it's a tougher call. Similarly, is playing hours and hours of golf each week any better than stretching your mind in the imaginative world of a MUD? You decide, but remember: there are certain parts of real life that we cannot and should not blow off in favor of a virtual environment.

Although MUDs are currently text-only, rudimentary graphics will almost certainly appear at some point, followed by more realistic graphics, sound, and video, and perhaps some day even links to the virtual reality systems of tomorrow. I don't even want to speculate on what those changes might mean to society, but you may want to think about what might happen, both positive and negative.


Unlike almost every other resource mentioned in this book, the WAIS, or Wide Area Information Servers, project had its conception in big business and was designed for big business. The project started in response to a basic problem. Professionals from all walks of life, and corporate executives in particular, need tremendous amounts of information that are usually stored online in vast databases. However, corporate executives are almost always incredibly busy people without the time, inclination, or skills to learn a complex database query language. Of course, corporate executives are not alone in this situation; many people have the same needs and limitations.

In 1991, four large companies -- Apple Computer, Dow Jones & Co., Thinking Machines Corporation, and KPMG Peat Marwick -- joined together to create a prototype system to address this pressing problem. Apple brought its user interface design expertise, Dow Jones was involved because of its massive databases of information, Thinking Machines provided the programming and expertise in high-end information retrieval engines, and KPMG Peat Marwick provided the information-hungry guinea pigs.

One of the initial concepts was the formation of an organizational memory -- the combined set of memos, reports, guidelines, email, and whatnot -- that make up the textual history of an organization. Because all of these items are primarily text and completely without structure, stuffing them into a standard relational database is like trying to fill a room with balloons. They don't fit well, they're always escaping, and you can never find anything. WAIS was designed to help with this problem.

So far I haven't said anything about how WAIS became one of the Internet's primary sources for free information. With such corporate parentage, it's in some ways surprising that it did. The important thing about the design of WAIS is that it doesn't discriminate. WAIS can incorporate data from many different sources, distribute it over various types of networks, and record whether the data is free or carries a fee. WAIS is also scalable, so that it can accept an increasing number and complexity of information sources. This is an important feature in today's world of exponentially increasing amounts of information. The end result of these design features is that WAIS works perfectly well for serving financial reports to harried executives, but equally well for providing science fiction book reviews to curious undergraduates.

In addition, the WAIS protocol is an Internet standard and is freely available, as are some clients and servers. Anyone can set up his or her own WAIS server for anyone with a WAIS client to access. Eventually, we may see Microsoft, Lotus, and Novell duking it out over who has the best client for accessing WAIS. With the turn the Internet has taken in the past year, however, it's far more likely that we'll see Microsoft, Lotus, and Novell competing with World Wide Web clients.

At the beginning of this section, I mentioned the problem of most people not knowing how to communicate in complex database query languages. WAIS solves that problem by implementing a sophisticated natural language input system, which is a fancy way of saying that you can talk to it in everyday English. If you want to find more information about deforestation in the Amazon rainforest, you simply formulate your query as: "Tell me about deforestation in the Amazon rainforest." Pretty rough, eh? In its current state, WAIS does not actually examine your question for semantic content; that is, it searches based on the useful words it finds in your question (and ignores, for instance, "in" and "the"). However, nothing prevents advances in language processing from augmenting WAIS so that it has a better idea of what you mean.
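To make the idea concrete, here's a sketch (in Python, purely for illustration; it is not actual WAIS code) of how a search system might reduce a natural language question to its useful words by discarding common stop words:

```python
# Illustrative sketch: reducing a natural language question to the
# "useful" words a WAIS-style search actually uses. The stop word
# list here is an invented, abbreviated one.

STOP_WORDS = {"tell", "me", "about", "in", "the", "a", "an", "of", "and"}

def useful_words(question):
    """Strip punctuation, lowercase, and drop common stop words."""
    words = question.lower().replace("?", "").replace(".", "").split()
    return [w for w in words if w not in STOP_WORDS]

print(useful_words("Tell me about deforestation in the Amazon rainforest."))
# -> ['deforestation', 'amazon', 'rainforest']
```

The query boils down to its content words, which is exactly why WAIS can ignore the "in" and "the" in your question.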

NOTE: The WAIS folks discourage the use of the term keywords because keywords imply that the databases are indexed, and unless you type in a keyword that matches an index term, you cannot find anything. In fact, keywords and Boolean queries (where you say, for instance, "Find Apple AND Computer") were both methods of getting around the fact that, until recently, we didn't have the computer power to search the full text of the stored documents. Nor did we have the computer power to attempt natural language queries and relevance feedback. Now we do, and it's a good thing.

In any database, you find only the items that match your search. In a very large database, though, you often find far too many items; so many, in fact, that you are equally at a loss as to what might be useful. WAIS attempts to solve this problem with ranking and relevance feedback. Ranking is just what it says. WAIS looks at each item that answers the user's question and ranks it based on the proximity of words and other variables. The better the match, the higher up the document appears in your list of found items. Although by no means perfect, this basic method works well in practice.
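Ranking of this sort can be sketched in a few lines (again, a toy illustration in Python, not the real WAIS algorithm, which also weighs word proximity and other variables):

```python
# Toy ranking: score each document by how often the query words
# appear, then sort so the best matches come first.

def score(doc, query_words):
    words = doc.lower().split()
    return sum(words.count(q) for q in query_words)

def rank(docs, query_words):
    return sorted(docs, key=lambda d: score(d, query_words), reverse=True)

docs = [
    "The Amazon rainforest is shrinking",
    "Rainforest deforestation in the Amazon accelerates; deforestation worries scientists",
    "Stock prices rose today",
]
for doc in rank(docs, ["deforestation", "amazon", "rainforest"]):
    print(doc)
```

The document mentioning the query words most often floats to the top of the list, just as the best-matching items appear first in a WAIS result list.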

Relevance feedback, although a fuzzier concept, also helps you refine a search. If you ask a question and WAIS returns 30 documents that match, you may find one or two that are almost exactly what you're looking for. You can then refine the search by telling WAIS, in effect, that those one or two documents are "relevant" and that it should go look for other documents that are "similar" to the relevant ones. Relevance feedback is basically a computer method of pointing at something and saying, "Get me more like that."
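One rough way to picture relevance feedback (this is only a sketch of the idea, not how WAIS implements it) is that the words from the document you marked as relevant get folded into your query before the next search:

```python
# Sketch of relevance feedback: add the distinctive words from a
# document the user marked "relevant" to the query, then search again.

def expand_query(query_words, relevant_doc, stop_words):
    extra = [w for w in relevant_doc.lower().split()
             if w not in stop_words and w not in query_words]
    return list(query_words) + extra

query = ["cork", "trees"]
relevant = "cork trees grow in portugal and spain"
print(expand_query(query, relevant, {"in", "and"}))
# The expanded query now also looks for "grow", "portugal", "spain".
```

The refined search then turns up documents that share vocabulary with the one you pointed at: "Get me more like that."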

The rise of services such as WAIS and Gopher on the Internet will by no means put librarians out of business. Instead, the opposite is true. Librarians are trained in ways of searching and refining searches. We need their experience, both in making sense of the frantic increase in information resources and in setting up the information services of tomorrow. More than ever, we need to eliminate the stereotype of the little old lady among dusty books and replace it with an image of a person who can help us navigate through data in ways we never could ourselves. There will always be a need for human experts.

When you put all this information together, you end up with a true electronic publishing system. This definition, pulled from a paper written by Brewster Kahle, then of Thinking Machines and now president of WAIS, Inc., is important for Internet users to keep in mind as the future becomes the present: "Electronic publishing is the distribution of textual information over electronic networks." (Kahle later mentions that the WAIS protocol does not prohibit the transmission of audio or video.) I emphasize that definition because I've been fighting to spread it for some years now because of my role with TidBITS.

Electronic publishing has little to do with using computer tools to create paper publications. For those of you who know about Adobe Acrobat, Common Ground from No Hands Software, and Replica from Farallon, those three programs aren't directly related to electronic publishing because they all work on the metaphor of a printed page. With them, you create a page and then print to a file format that other platforms can read (using special readers) but never edit or reuse in any significant way. We're talking about electronic fax machines. We should enjoy greater flexibility with electronic data.

So, how can you use WAIS? I see two basic uses. Most of the queries WAIS gets are probably one-time shots in which the user has a question and wants to see whether WAIS stores any information that can provide the answer. This use has much in common with the way reference librarians work -- someone comes in, asks a question, gets an answer, and leaves.

More interesting for the future of electronic publishing is a second use, that of periodic information requests. As I said earlier in this book, most people read specific sections of the newspaper and, even within those sections, are choosy about what they do and don't read. I, for instance, always read the sports section but am interested only in baseball, basketball, football to a lesser extent, and hockey only if the Pittsburgh Penguins are mentioned. Even within the sports I follow closely, baseball and basketball, I'm more interested in certain teams and players than others.

Rather than skim through the paper each Sunday to see whether anything interesting happened to the teams or players I follow, I can instead ask a question of a WAIS-based newspaper system (which is conceivable right now, using the UPI news feed that ClariNet sells via Usenet). In fact, I might not ask only one question, but gradually come up with a set of questions, some specific, others abstract. Along with "What's happening with Cal Ripken and the Baltimore Orioles?" could be "Tell me about the U.S. economy."

In either case, WAIS would run my requests periodically, every day or two, and indicate which items are new in the list. Ideally, the actual searching would take place at night to minimize the load on the network and to make the search seem faster than the technology permits. Once again, this capability is entirely possible today; all that's lacking for common usage is the vast quantities of information necessary to address everyone's varied interests. Although the amount of data available in WAIS is still limited (if you call 500-plus sources limited), serious and important uses are already occurring.

NOTE: A friend at Thinking Machines related a story about a friend who used WAIS to research his son's unusual medical condition and ended up knowing more than the doctor. Sounds like it's time to look for another doctor, but you get the point.

In large part due to its corporate parentage, the WAIS project has been careful to allow for information to be sold and for owners of the information to control who can access the data and when. Although it's not foolproof, the fact that WAIS addresses these issues makes it easier to deal with copyright laws and information theft.

Because of the controls WAIS allows, information providers are likely to start making sources of information more widely available. With the proliferation of these information sources, it will become harder for the user to keep track of what's available. To handle that problem, WAIS incorporates a Directory of Servers, which tracks all the available information servers. Posing a question to the Directory of Servers source ("sources" are what WAIS calls its sets of information, or servers) returns a list of servers that might have information pertaining to your question. You can then easily ask the same question of those servers to reach the actual data.

Most of the data available on WAIS is public and free at the moment, and I don't expect that arrangement to change. I do expect more commercial data to appear in the future, however.

In regard to that issue I want to propose two ideas. First, charges should be very low to allow and encourage access, which means that profit is made on high volume rather than high price. Given the size of the Internet, I think this approach is the way to go, rather than charging exorbitant amounts for a simple search that may not even turn up the answer to your question.

Second, I'd like to see the appearance of more "information handlers," who foot the cost of putting a machine on the Internet and buying WAIS server software and then, for a percentage, allow others to create information sources on their server. WAIS, Inc. already provides this service, but I haven't heard of much competition yet. That service enables a small publisher to make, say, a financial newsletter available to the Internet public for a small fee, but the publisher doesn't have to go to the expense of setting up and maintaining a WAIS server. This arrangement will become more commonplace; the question is when? Of course, as the prices of server machines, server software, and network connections drop, the number of such providers will increase.

WAIS has numerous client interfaces for numerous platforms, but you can probably use either a simple VT100 interface via Telnet or, if you have a WinSock link to the Internet, one of several slick WAIS clients. When evaluating WAIS client programs, keep in mind my comments about the two types of questions and the relevance feedback. A WAIS client should make it easy to ask a quick question without screwing around with a weird interface, and it should also enable you to save questions for repeated use (as in the electronic newspaper example). Similarly, with relevance feedback, the act of pointing and saying, "Find me more like this one that I'm pointing at" should be as simple as possible without making you jump through hoops.

Finally, something that none of the WAIS clients I've seen do well is provide a simple method of keeping track of new sources as they appear, not to mention keeping track of which sources have gone away for good.


In direct contrast to WAIS, Gopher originated in academia at the University of Minnesota, where it was intended to help distribute campus information to staff and students. The name is actually a two-way pun (there's probably a word for that) because Gopher was designed to enable you to "go fer" some information. Many people probably picked up on that pun, but the less well-known one is that the University of Minnesota is colloquially known as the home of the Golden Gophers, the school mascot. In addition, one of the Gopher Team said that they have real gophers living outside their office.

NOTE: Calling yourself the Golden Gophers makes more sense than calling yourself the Trojans, not only considering that the Trojans were one of the most well-known groups in history that lost, but also considering that they lost the Trojan War because they fell for a really dumb trick. "Hey, there's a gigantic wooden horse outside, and all the Greeks have left. Let's bring it inside!" Not a formula for long-term survival. Now, if they had formed a task force to study the Trojan Horse and report back to a committee, everyone wouldn't have been massacred. Who says middle management is useless? Anyway, I digress.

The point of Gopher is to make information available over the network, much in the same way that FTP does. In some respects, Gopher and FTP are competing standards for information retrieval, although I'm sure there are more FTP sites than Gopher sites.

NOTE: FTP probably will never go away, because it's such a low-level standard on the Internet. Also, Gopher only works for retrieving data; you cannot use it to send data. Finally, there's no easy way to give Gopher users usernames and passwords so only they can access a Gopher site.

Gopher has several major advantages over FTP. First, it provides a much friendlier interface than the standard command-line FTP client.

Second, Gopher provides access to far more types of information resources than FTP: online phone books, online library catalogs, the text of the actual files, databases of information stored in WAIS, various email directories, Usenet news, and Archie.

Third, Gopher pulls all this information together under one interface and makes it all available from a basic menu system.

If you retrieve a file via FTP and the file gives you a reference to another FTP server, you as the user must connect to that site separately to retrieve any more files from there. In contrast, you connect to a single home Gopher server, and from there, wend your way out into the wide world of Gopherspace without ever having to consciously disconnect from one site and connect to another (although that is what happens under the hood). Gopher servers almost always point at each other, so after browsing through one Gopher server in Europe, you may pick a menu item that brings you back to a directory on your home server. Physical location matters little, if at all, in Gopherspace.

Gopher has also become popular because it uses fewer server resources than standard FTP. When you connect to a Gopher server, the Gopher client software actually connects only long enough to retrieve the menu, and then it disconnects. When you select something from the menu, the client connects again so quickly that you barely notice you weren't actually tying up a port connection on the host machine during that time. Administrators like using Gopher for this reason: they don't have to devote as much computing power to providing files to Internet users.
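The protocol behind all this is simple enough to sketch. A Gopher client sends one selector string followed by a carriage return and line feed, reads the reply, and disconnects; a menu reply packs four tab-separated fields into each line: a one-character item type plus a display string, a selector, a host, and a port. Here's an illustrative Python parser for such a line (the host name below is made up; the line format follows the published Gopher protocol, RFC 1436):

```python
# Parse one line of a Gopher menu reply. Each line looks like:
#   <type char><display string> TAB <selector> TAB <host> TAB <port>

def parse_menu_line(line):
    """Split a Gopher menu line into its four fields."""
    display, selector, host, port = line.split("\t")
    item_type, title = display[0], display[1:]   # "1" = directory, "0" = file
    return {"type": item_type, "title": title,
            "selector": selector, "host": host, "port": int(port)}

line = "1All the Gopher Servers in the World\t/servers\tgopher.example.edu\t70"
print(parse_menu_line(line))
```

Because every menu item carries its own host and port, any item can point anywhere on the Internet, which is what makes Gopherspace feel like a single seamless place.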

Several Gopher clients exist for Windows, and you can use a Web browser to access Gopher servers, but you can also access Gopher via Telnet and a VT100 interface. It's nowhere near as nice (it's slower, you can only do one thing at a time, and you cannot view pictures and the like online), but it works if you don't have WinSock-based access to the Internet.


The most important adjunct to Gopher is a service called Veronica, developed by Steve Foster and Fred Barrie at the University of Nevada. Basically, Veronica is to Gopher what Archie is to FTP -- a searching agent; hence, the name.

NOTE: Veronica stands for Very Easy Rodent-Oriented Net-wide Index to Computerized Archives, but apparently the acronym followed the name.

Veronica servers work much like Archie servers. They tunnel through Gopherspace recording the names of available items and adding them to a massive database several gigabytes in size.

You usually find a Veronica menu within an item called Other Gopher and Information Servers, or occasionally simply World. When you perform a Veronica search, you either look for Gopher directories, which contain files, or you look for everything available via Gopher, which includes the files and things like WAIS sources as well. There are only a few public Veronica servers in the world (between four and six, depending on which machines are up), so you may find that the servers are heavily overloaded at times, at which point they'll tell you that there are too many connections and that you should try again later. Although it's not as polite as I'd like, I find that using the European Veronica servers during their night is the least frustrating (see table 8.5).

Table 8.5: Current Veronica Servers

          SERVER NAME                      LOCATION
          -----------                      --------

          NYSERNet                         U.S.
          University of Texas, Dallas      U.S.
          SCS Nevada                       U.S.
          University of Koeln              Germany
          University of Pisa               Italy
          UNINETT/University of Bergen     Norway

It's definitely worth reading the "Frequently Asked Questions about Veronica" document that lives with the actual Veronica servers. It provides all sorts of useful information about how Veronica works, including the options for limiting your search to only directories or only searchable items. You can use Boolean searches within Veronica, and there are ways of searching for word stems -- that is, the beginning of words. So, if you wanted to learn about yachting, you could search for "yacht*." The possibilities aren't endless, but Veronica is utterly indispensable for navigating Gopherspace and for searching on the Internet in general.
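To see what those two search features amount to, here's a small Python illustration of stem matching ("yacht*" matches any word beginning with "yacht") combined with a Boolean AND; it is only a sketch of the idea, not Veronica's actual code:

```python
# Toy Veronica-style matching: stem searches and Boolean AND.

def matches(title, term):
    words = title.lower().split()
    if term.endswith("*"):                       # stem search, e.g. "yacht*"
        return any(w.startswith(term[:-1]) for w in words)
    return term in words                         # exact word match

def search(titles, terms):
    """Boolean AND: every term must match the title."""
    return [t for t in titles if all(matches(t, x) for x in terms)]

titles = ["Yachting in the San Juans", "Yacht racing results", "Gopher FAQ"]
print(search(titles, ["yacht*"]))
# -> ['Yachting in the San Juans', 'Yacht racing results']
```

A stem search casts a wider net than an exact word, which is exactly why "yacht*" turns up both "yacht" and "yachting."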


Getting sick of the Archie Comics puns yet? They just keep coming and, like Veronica, I somehow doubt that this acronym came before the name. Jughead stands for Jonzy's Universal Gopher Hierarchy Excavation And Display. Jughead does approximately the same thing as Veronica, but if you've ever done a Veronica search on some generic word, you know that Veronica can provide just a few too many responses (insert sarcasm here). Jughead is generally used to limit the range of a search to a certain machine, and to limit it to directory titles. This makes Jughead much more useful than Veronica if you know where you want to search, or if you're only searching on a Gopher server that runs Jughead.

I don't use Jughead much, because what I like about the massive number of Veronica results is that they often give me a sense of what information might exist on any given topic. I suppose that if I regularly performed fairly specific searches on the same set of Gopher servers, I'd use Jughead more.

NOTE: The best way to find a Jughead server that's generally accessible is to do a Veronica search on "jughead -t7." That returns a list of all searchable Jughead servers, rather than all the documents and directories in Gopherspace that contain the word "jughead."

World Wide Web

The World Wide Web is the most recent and ambitious of the major Internet services. The Web was started at CERN, a high-energy physics research center in Switzerland, as an academic project. It attempts to provide access to the widest range of information by linking not only documents made available via its native HTTP (HyperText Transfer Protocol), but also additional sources of information via FTP, WAIS, and Gopher. Gateways also exist to Oracle databases and to DEC's VMS/Help systems, among many others. The Web tries to suck in all sorts of data from all sorts of sources, avoiding the problems of incompatibility by allowing a smart server and a smart client program to negotiate the format of the data.

NOTE: CERN doesn't stand for anything now, but it was once an acronym for the lab's French name, Conseil Européen pour la Recherche Nucléaire.

In theory, this capability to negotiate formats enables the Web to accept any type of data, including multimedia formats, once the proper translation code is added to the servers and the clients. And, when clients don't understand the type of data that's appearing, such as an MPEG movie, for instance, they generally just treat the data as a generic file, and ask another program to handle it after downloading.

The theory behind the Web makes possible many things, such as linking into massive databases without the modification of the format in which they're stored, thereby reducing the amount of redundant or outdated information stored on the nets. It also enables the use of intelligent agents for traversing the Web. But what the Web really does for the Internet is take us one step further toward total ease of use. Let's think about this evolution for a minute.

FTP simply transfers a file from one place to another -- it's essentially the same thing as copying a file from one disk to another on the PC. WAIS took the concept of moving information from one place to another, and made it possible for client and server to agree on exactly what information is transferred. When that information is searched or transferred, you get the full text without having to use additional tools to handle the information. Gopher merged both of those concepts, adding in a simple menu-based interface that greatly eased the task of browsing through information. Gopher also pioneered the concept of a virtual space, if you will, where any menu item on a Gopher server can refer to an actual file anywhere on the Internet. Finally, the World Wide Web subsumes all of the previous services and concepts, so it can copy files from one place to another; it can search through and transfer the text present in those files; and it can present the user with a simple interface for browsing through linked information.

But aside from doing everything that was already possible, the World Wide Web introduced four new concepts. The first one I've mentioned already; it's the capability to accept and distribute data from any source, given an appropriately written Web server.

Second, the Web introduced the concept of rich text and multimedia elements in Internet documents. Gopher and WAIS can display the text in a document, but they can't display it with fonts and styles and sizes and sophisticated formatting. You're limited to straight, boring text (not that it was boring when it first appeared, I assure you). With the Web, you can create HTML (short for HyperText Markup Language) documents that contain special codes that tell a Web browser program to display the text in different fonts, styles, and sizes. Web pages (that's what documents on the Web are generally called) can also contain inline graphics -- that is, graphics that are mixed right in with the text, much as you're used to seeing in books and magazines. And finally, for something you're not used to seeing in books and magazines, a Web page can contain sounds and movies, although sound and movie files are so large that you must follow a link to play each one.

Link? What's a link? Ah, that's the third concept that the Web brought to the Internet. Just as an item in a Gopher menu can point to a file on another Internet machine in a different country, so can Web links. The difference is that any Web page can have a large number of links, all pointing to different files on different machines, and those links can be embedded in the text. For instance, if I were to say in a Web page that I have a really great collection of penguin pictures stored on another Web page (and if you were reading this on the Web and not in a book), you could simply click on underlined words to immediately jump to that link. Hypertext arrives on the Internet.
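To show what those special codes and links look like, here's a tiny HTML page of the sort described above (the penguin URL is made up for illustration):

```html
<!-- A minimal HTML page. <B> marks bold text, and <A HREF="..."> creates
     the underlined, clickable link described above. -->
<HTML>
<HEAD><TITLE>Penguin Pictures</TITLE></HEAD>
<BODY>
<H1>My Penguin Page</H1>
<P>I have a <B>really great</B> collection of
<A HREF="http://www.example.com/penguins.html">penguin pictures</A>.</P>
</BODY>
</HTML>
```

A Web browser reading this file never shows you the angle-bracketed codes; it shows bold text, a heading, and an underlined link you can click to jump elsewhere on the Internet.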

Hmm, I should probably explain hypertext. A term coined by Ted Nelson years ago, hypertext refers to nonlinear text. Whereas you normally read left to right, top to bottom, and beginning to end, in hypertext you follow links that take you to various different places in the document, or even to other related documents, without having to scan through the entire text. Assume, for instance, that you're reading about wine. There's a link to information on the cork trees that produce the corks for wine bottles, so you take that link, only to see another link to the children's story about Ferdinand the Bull, who liked lying under a cork tree and smelling the flowers. That section is in turn linked to a newspaper article about the running of the bulls in Pamplona, Spain. A hypertext jump from there takes you to a biography of Ernest Hemingway, who was a great fan of bull fighting (and of wine, to bring us full circle). This example is somewhat facetious, but I hope it gives you an idea of the flexibility a hypertext system with sufficient information, such as the World Wide Web, can provide.

Fourth, the final new concept the Web introduced to the Internet is forms. Forms are just what you would think, online forms that you can fill in, but on the Internet forms become tremendously powerful since they make possible all sorts of applications, ranging from surveys to online ordering to reservations to searching agents to who knows what.

For some time, the Web lacked a searching agent such as Archie or Veronica, a major limitation because the Web is so huge. However, a number of searching agents have appeared. You can find a list of the Web searching agents (and a ton of other useful pointers) at:

You can also find a single page with links to most of the well-known searching agents at:

And there's a very nice catalog of resources called Yahoo, running on some machines at Stanford University:

You can access the Web via a terminal and a VT100 interface using Lynx, or even via email (which would be agonizingly slow and ugly), but for proper usage, you must have a special browser.

NOTE: To try the Web via email, send email to with the command www in the body of the message.

When you're evaluating Web browsers, there are a number of features to seek. The most important is one that seems obvious: an easy way to traverse links. Also, since much of the point of a Web browser is to display text in various fonts and styles, a Web browser should give you the ability to change the fonts used to ones on your PC that you find easy to read. HTML documents don't actually include references to Times New Roman and Arial; they encode text in certain styles, much like a word processor or page layout program does. Then, when your Web browser reads the text of a Web page, it decodes the HTML styles and displays them according to the fonts that are available. Sometimes the defaults are ugly, so I recommend playing with them a bit.

Many, if not most, Web pages also contain graphics, which is all fine and nice unless you're the impatient sort who dislikes waiting for the graphics to travel over a slow modem. Web browsers should have an option to turn off autoloading of images. You should also be able to do anything you can do in a normal Windows application, such as copy and paste. You should be able to save a hotlist of Web sites that you'd like to visit again. Finally, you should be able to easily go back to previously visited pages without having to reload them over the Internet.

As I said previously, there are a number of ways to access the Web; but frankly, if you use a PC but don't have access to a WinSock-based connection, you'll miss out on the best parts, even if you can see the textual data in a VT100 browser such as Lynx.

Well, that's enough about all the Internet services. But, before we go on and talk about ways you can get Internet access, I should explain the different file formats you run into on the Internet. They're a source of confusion for many new users.

File Formats

Under Windows, we're all used to the simple concept of double-clicking on a document in the Windows File Manager to open it in the proper application. Windows keeps track of which documents go with which applications by means of file extension associations. Thus, we tend not to think about file formats as much. In DOS, file extensions are limited to 3 characters, but that's usually enough.

When you start exploring on the Internet, you quickly discover that most files also have file extensions. Extensions are extremely useful on the Internet because they identify what sort of file you're looking at. However, because the machines on the Internet are not necessarily PCs, they may have more than three characters for an extension (e.g., .html), or they may have multiple extensions (e.g., .tar.Z).

On the Internet, a limited set of extensions is in common use for files that a PC user may care about. These extensions fall into three basic categories: those used to indicate ASCII encoding, those used to indicate compression formats, and several others used to mark certain types of text and graphics files.

ASCII Encoding

Programs and other binary data files (files with more than just straight text in them) contain binary codes that most email programs don't understand, because email programs are designed to display only text. Binary data files even include data files such as word processor files, which contain formatting information or other nonprinting characters. Most programs enable you to save your files in a variety of formats, including text. If you don't explicitly save a file in some kind of text format, then it's probably a binary data file, although there are exceptions.

NOTE: The main exceptions to this are the Windows Notepad, notepad.exe, and the System Configuration Editor, sysedit.exe. These programs can only save text files.

Computers of different types generally agree on only the first 128 characters in the ASCII character set. (ASCII stands for American Standard Code for Information Interchange.) The important fact to remember is that after those first 128 characters, which include the letters of the alphabet and numbers and punctuation, a Windows accented character may be a Macintosh dingbat.

Still, people want to transfer files via email and other programs that cannot handle all the possible binary codes in a data file or application. Programmers therefore came up with several different ways of representing 8 bits of binary data in 7 bits of straight text. In other words, these conversion programs can take in a binary file such as the Windows Clock (clock.exe), for instance, and convert it into a long string of numbers, letters, and punctuation marks. Another program can take that string of text and turn it back into a functioning copy of the Windows Clock program. I'll leave it to the philosophers to decide whether it is the same program.

Once encoded, that file can travel through almost any email gateway and be displayed in any email program, although it's worthless until you download it to the PC and decode it. The main drawback to this sort of encoding is that you must always decode the file before you can work with it. In addition, because you move from an 8-bit file to a 7-bit file during the encoding process, the encoded file must be larger than the original, sometimes by as much as 35 percent.
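To make the 8-bits-into-7-bits idea concrete, here is a modern sketch using Python's standard binascii module, which implements the same uuencode transformation; the 24-byte sample is a made-up stand-in for a real binary file such as clock.exe.

```python
import binascii

# Hypothetical 24-byte chunk of binary data standing in for a real program file.
binary = bytes(range(24))

# b2a_uu encodes up to 45 bytes per line: every 3 input bytes become
# 4 printable ASCII characters, with a length character at the front
# of the line and a newline at the end.
line = binascii.b2a_uu(binary)

print(len(binary))   # 24 bytes in
print(len(line))     # 34 bytes out: 1 length char + 32 data chars + newline

# The transformation is reversible, which is the whole point.
assert binascii.a2b_uu(line) == binary
```

The 24-to-33 growth (ignoring the newline) is the roughly one-third size penalty the chapter mentions.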

Now that you understand why we go through such bother, let's look at the three main ASCII encoding formats used on the Internet (see table 8.6): uuencode, btoa (read as "b to a"), and BinHex. Although BinHex is mostly a Macintosh standard, useful documents are occasionally encoded this way, and we'll talk about how to get them decoded on your PC.

Table 8.6: ASCII Encoding Formats

     Format        Advantage                Disadvantage
     ------        ----------               -------------

     uuencode      Most common in Unix      Less common on Mac
     btoa          Most efficient           Least common
     BinHex        Macintosh standard       Least efficient


In the Unix world, uucode (also called uuencode) is the most common format. You can identify a uuencoded file by its .uu or .uue extension. Although not in common usage in the IBM PC world, uucode is seen frequently enough that a number of PC programs have sprung up to encode and decode this format. These include Wincode and uudecode among others. You can find Wincode at the following locations:

You're unlikely to run across uuencoded PC files all that frequently because PC files are generally posted as straight binary files, but binary postings to Usenet news will usually use uucode. If your newsreader doesn't automatically decode uuencoded files, you'll have to do it yourself. By default, most LAN-based email programs that have Internet gateways also encode binary files sent across the Internet in uucode format, since it's the least common denominator between systems. However, most LAN-based email programs also decode uucode files, so you generally can just let the email program do the work for you.

Most uuencoded files start with begin 644, followed by the filename. From that point on, there isn't much that is recognizable: rows upon rows of ASCII gibberish with each line being the same length.

NOTE: Because the number 644 is related to Unix file permissions (don't ask), other numbers are possible in uuencoded files, although I see them less frequently.

All uuencoded files end with a linefeed, a space, the word end, and another linefeed (see figure 8.2).
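That begin/end framing is easy to reproduce. The sketch below is a minimal, hypothetical uuencoder built on Python's binascii module (it is not one of the DOS or Windows tools mentioned in this chapter); it emits the begin 644 header, the fixed-length data lines, the space line, and the closing end line.

```python
import binascii

def uuencode(data: bytes, name: str, mode: str = "644") -> str:
    """Minimal uuencoder: 'begin' header, 45-byte chunks, terminator, 'end'."""
    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):
        lines.append(binascii.b2a_uu(data[i:i + 45]).decode("ascii").rstrip("\n"))
    # A zero-length chunk encodes as a lone space, giving the
    # space-then-end layout described above.
    lines.append(binascii.b2a_uu(b"").decode("ascii").rstrip("\n"))
    lines.append("end")
    return "\n".join(lines) + "\n"

text = uuencode(b"Hello, Internet!", "hello.txt")
print(text.splitlines()[0])   # begin 644 hello.txt
print(text.splitlines()[-1])  # end
```

Feeding the middle line back through binascii.a2b_uu recovers the original bytes.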

Figure 8.2: Example of uuencode.


Frankly, I don't know a lot about btoa, which stands for binary to ASCII. This format (see figure 8.3) is supported by a complementary atob converter, which translates ASCII files back into binary. It is the most efficient of the three, so btoa files are slightly smaller than the equivalent uuencode or BinHex file. Despite this seemingly major advantage, btoa doesn't appear nearly as frequently in the Unix world as uuencode, and appears rarely in the Windows world. I don't even know of a Windows, or for that matter DOS, program that can decode a btoa file.

Figure 8.3: Example of btoa.


BinHex is by far the most common format you see in the Macintosh world, which matters to PC users primarily when transferring files between PCs and Macs. In fact, BinHex is used essentially only on Macintosh computers. You can identify most BinHex files by the .hqx extension they carry. I haven't the foggiest idea why it is .hqx instead of .bhx or something slightly more reasonable. Keep in mind that BinHex is another one of these computer words that works as a verb, too, so people say that they binhexed a file before sending it to you.

Every BinHex file starts with the phrase (This file must be converted with BinHex 4.0) even if another program actually did the creating. Then comes a new line with a colon at the start, followed by many lines of ASCII gibberish. Only the last line can be a different length than the others (each line has a hard return after it), and the last character must be a colon as well (see figure 8.4).
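Those layout rules are simple enough to check mechanically. Here's a Python sketch (the function name and the sample "gibberish" are made up for the example) that sniffs for the BinHex layout just described:

```python
def looks_like_binhex(text: str) -> bool:
    """Heuristic check for the BinHex 4.0 layout: the telltale first line,
    then a body that starts and ends with a colon."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or "This file must be converted with BinHex 4.0" not in lines[0]:
        return False
    body = "".join(lines[1:])
    return body.startswith(":") and body.endswith(":")

sample = (
    "(This file must be converted with BinHex 4.0)\n"
    ":$f*:\n"   # fabricated stand-in for real BinHex data
)
print(looks_like_binhex(sample))   # True
print(looks_like_binhex("hello"))  # False
```

A real debinhexer would, of course, go on to decode the data; this only recognizes the wrapper.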

Figure 8.4: Example of BinHex.

BinHex suffers from only two real problems, other than that of having a vaguely confusing name. It is perhaps the least efficient of the three encoding formats. Its other real problem is that even though tools exist for debinhexing files on PCs and other platforms, they aren't common. For the most part, use uuencode if you plan to send binary files to a user on another platform.

NOTE: Under Unix, you must use a program called mcvert to debinhex files. If you wish to encode or decode BinHex files on a PC, you can find a PC version of BinHex at:

Compression Formats

Along with the various ASCII-encoded formats, you will frequently see files with a number of file extensions that indicate that the files have been compressed in some way. Almost every file available on the Internet is compressed because disk space is at a premium everywhere.

The folks who run Internet file sites like two things to be true about a compression format. They want it to be as tight as possible, so as to save the most space, and they want to be sure that the files stored in that format will be accessible essentially forever, which requires the format of the compressed files to be made public -- so that in theory any competent programmer can write a program to expand those files should the company go out of business or otherwise disappear.

This second desire has caused some trouble over the years because the compression market is hotly contested, and companies seldom want to put their proprietary compression algorithms (the rules by which a file is compressed) into the public domain, where their competitors can copy them. As it is, the compression format most widely used in the PC world is PKWARE's PKZIP.


PKZIP

The most common compression format that you'll use as a DOS or Windows user is the .zip format, developed and distributed by PKWARE. The two utilities, PKZIP and PKUNZIP, compress and uncompress files, as you'd expect.

Originally intended for squeezing files onto floppies for both archival and distribution, PKZIP has many more options than you'd have use for as an Internet user. For instance, it can verify files on write and set the archive bits for files. Later versions can compress entire directory structures into a single archive.
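As a modern illustration of the same .zip format, Python's standard zipfile module can create and read PKZIP-style archives, including ones that hold a directory structure; the file names here are invented for the example.

```python
import io
import zipfile

# Build a .zip archive in memory that preserves a directory structure,
# the way later versions of PKZIP can.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("docs/readme.txt", "Hello from the archive.\n")
    zf.writestr("docs/chapter8/notes.txt", "Extensions matter.\n")

# "Unzipping" is just reading the members back out.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    readme = zf.read("docs/readme.txt").decode()

print(names)
print(readme)
```

The in-memory buffer stands in for a .zip file on disk; swap io.BytesIO for a real filename and the code is the same.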

For the most part, you'll use PKUNZIP for simply uncompressing files that you obtain from the Internet. You can find a self-extracting version of all the PKWARE utilities at:

Self-Extracting Archives

What if you want to send a compressed file or files to a friend who you know has no compression utilities at all? Then you use a self-extracting archive, which is hard to describe further than its name. Compression programs can create self-extracting archives by compressing the file and then attaching a stub, or a small expansion program, to the compressed file. To the user, the self-extracting archive looks like a plain executable with a .exe extension, and if you run it, it then expands the file contained within it. Internet file sites prefer not to have many files, particularly small ones, compressed in self-extracting archives because the stubs are a waste of space for most people on the nets, who already have utilities to expand compressed files.

Unfortunately, since the extension of a self-extracting archive is identical to that of a regular application, the two are rather difficult to tell apart. The most obvious way to find out, of course, is to run the file. But if it is a self-extracting archive, it will start unpacking and creating multiple files in your current directory, mixing them in with any files already there. Worse, if the self-extracting archive contains a directory structure, it will create directories in addition to files (though those are at least easy to delete from the Windows File Manager).
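A safer alternative to running a mystery .exe is to look for zip data inside it. Because ZIP archives keep their directory at the end of the file, tools that honor the format can see straight past an expansion stub. This Python sketch fakes a self-extracting archive (the stub bytes are invented) to show the idea:

```python
import io
import zipfile

# Fabricate a stand-in for a self-extracting archive: a fake "stub"
# program followed by ordinary .zip data.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("readme.txt", "unpacked!")

sfx = b"MZ fake-stub-program " + archive.getvalue()

# zipfile searches for the zip directory at the *end* of the file,
# so it can see through the stub without executing anything.
ok = zipfile.is_zipfile(io.BytesIO(sfx))
with zipfile.ZipFile(io.BytesIO(sfx)) as zf:
    content = zf.read("readme.txt").decode()

print(ok)       # True
print(content)  # unpacked!
```

This only works for zip-based self-extractors, of course; LHA and other formats use different signatures.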

A popular program for creating self-extracting archives is LHA, created by the venerable Haruyasu Yoshizaki, frequently known as just Yoshi. Yoshi's work can be found in many products meant for commercial distribution. Alas, LHA runs at the DOS level only, so you'll have to drop to a DOS shell to use it. If you'd like a copy of LHA, you can find it at:

Unix Compression

Unix has a built-in compression program called, uncharacteristically for Unix, compress. Compress creates files with the .Z extension (note the capital Z -- it makes a difference), and although you don't see files with that extension too often in Windows file sites, plenty of them exist on the rest of the net (for Unix users, of course).

As far as I know, compress works only on a single file, but you often want to put more than one file in an archive. In many cases, .Z files are meant for Unix users, so these files wouldn't be relevant to Windows users, but if you're itching to uncompress them, DOS-level utilities exist that are suitable for the task. Should the need arise, here's where you can find one of them:

Don't even attempt to grab and uncompress files that have the extension .tar.Z. They represent compressed tape archives, which store Unix directory structures. They will be meaningless to you as a Windows user. And since they are usually huge, don't waste bandwidth by trying to copy them to your machine.

Recently, a new format, called gzip, has started to appear in the Unix world. It's marked by the .z or .gz extension. This is quite a good compression format created for the GNU project, and there are DOS versions of the compression (gzip.exe) and uncompression (gunzip.exe) programs. The file resulting from gzip is substantially smaller than that produced by compress, and gzip is more and more widely used for this reason. It's also popular because the code and algorithm have been made available under the GNU CopyLeft agreement.
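To see what gzip does, here's a small sketch with Python's standard gzip module, which implements the same deflate-based format; the repetitive sample text is arbitrary and chosen to compress well.

```python
import gzip

# A repetitive "file" compresses well; gzip's deflate algorithm
# typically beats the old LZW-based compress on the same input.
original = b"The quick brown fox jumps over the lazy dog. " * 200

squeezed = gzip.compress(original)
restored = gzip.decompress(squeezed)

print(len(original), "->", len(squeezed))  # a small fraction of the original
assert restored == original
```

The exact ratio depends on the data, but the round trip is always lossless, unlike JPEG compression discussed later in this chapter.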

NOTE: GNU's CopyLeft is like a copyright, but is really for the purpose of ensuring that the copylefted item remains freely available. The agreement is several pages long, but essentially just ensures that nobody messes with the "free" nature of the item. Usually, the only way to get and read this document is to get the source code for a copylefted program and extract it from there, but a quick Archie search shows a copy of it at:


StuffIt

The most common compression format on the Macintosh is the StuffIt format, popularized by a family of free, shareware, and commercial products from Aladdin Systems. The main reason you would care about a StuffIt file is if a friend who uses the Mac sends you a compressed file, say a cross-platform Word or Excel file. You need a program called UNSTUFF.EXE to expand that file on your PC. You can get it at:

Other File Types

You may want to keep in mind a number of other issues with file types, both with formatting text files for different systems and with graphics files that you find on the Internet.

Text Files

Text files are universally indicated by the .txt extension, and after that, the main thing you have to watch for is the end-of-line character.

Unix expects the end-of-line character to be a linefeed (LF). Unless you're using a word processor or editor that can translate this linefeed into the standard DOS end-of-line characters, carriage return plus linefeed (CR/LF, also known as a hard return), you're in for a big surprise. Without carriage returns, the text looks extremely long-winded, running together in enormously long lines.

Because the Internet is nondenominational when it comes to computer religion (the Internet on the whole -- almost every individual is rabid about his or her choice of computer platform), most communication programs are good about making sure to put any outgoing text into a format that other platforms can read. Most programs also attempt to read in text and display it correctly no matter what machine formatted it to start with. Unfortunately, as hard as these programs may try, they often fail, so you have to pay attention to what sort of text you send out and retrieve, either via email or FTP.

When you're sending files from Windows, the main thing to remember is to break the lines before 80 characters. "Eighty characters," you say, "how the heck am I supposed to figure out how many characters are on a line without counting them all? After all, Windows editors and word processors have superior proportionally spaced TrueType fonts. Humph!"

Yeah, well, forget about those fonts when you're dealing with the Internet. You can't guarantee that anyone reading what you write even has those fonts, so stick to a monospaced font like Courier New. You can probably find an option in your word processor (Word for Windows has it) that enables you to Save As Text and that inserts returns at the end of each line in the process.
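Inserting those hard returns is exactly what a line-wrapping routine does. Here is a modern Python sketch of breaking a paragraph before 80 characters (the sample text is mine, and textwrap is a stand-in for your word processor's Save As Text option):

```python
import textwrap

paragraph = (
    "Forget about proportionally spaced TrueType fonts when you send "
    "text to the Internet; break your lines before 80 characters so "
    "any terminal, mailer, or newsreader can display them."
)

# Insert hard returns so no line reaches 80 characters.
wrapped = textwrap.fill(paragraph, width=72)
print(wrapped)
```

Every line of the result stays under the 80-character limit, and rejoining the lines with spaces gives back the original paragraph.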

After your lines have hard returns at the end, you usually can send a file properly because most communications programs can handle removing the carriage returns (leaving only the linefeeds for Unix). If you don't add these hard returns and someone tries to read your text file under DOS or Unix, the file may or may not display correctly. There's no telling, depending on that person's individual circumstances, but you usually hear about it if you screw up. Test with something short if you're unsure whether you can send and receive text files properly.

Often, the FTP client or email program automatically replaces linefeeds with the combination of carriage returns and linefeeds on files coming in from the Internet, but if that doesn't happen, you can run a Find and Replace in your word processor to do it manually.
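The conversion itself is just a text substitution. Here's a minimal Python sketch of going between Unix linefeeds and DOS-style carriage return/linefeed pairs (the sample lines are made up):

```python
# Unix text ends lines with a bare linefeed (LF); DOS and Windows expect
# carriage return + linefeed (CR/LF).
unix_text = "first line\nsecond line\n"

# LF -> CR/LF: normalize first so already-converted text isn't doubled.
dos_text = unix_text.replace("\r\n", "\n").replace("\n", "\r\n")
print(repr(dos_text))   # 'first line\r\nsecond line\r\n'

# CR/LF -> LF for the trip back to Unix.
back = dos_text.replace("\r\n", "\n")
assert back == unix_text
```

The normalize-first step is what a good communications program does for you automatically; the Find and Replace approach in a word processor is the manual equivalent.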

The other reason to view files from the Internet in a monospaced font with lines delimited by hard returns is that people on the Internet can be incredibly creative with ASCII tables and charts. Using only the standard characters you see on the keycaps on your keyboard, these people manage to create some extremely complex tables and graphics. I can't say they are works of art, but I'm always impressed. If you wrap the lines and view in a proportionally spaced font, those ASCII tables and graphics look like textual garbage. It's the price you pay for being clever.

Graphics Files

For a long time graphics files weren't commonly posted on the Internet except for use by users of a specific machine, because PCs were not able to read Mac graphic file formats and vice-versa. Now, however, you can view some common formats on multiple platforms.

First among these formats is GIF, which stands for Graphics Interchange Format. It is a graphic file format originally created by CompuServe. GIF files almost always have the extension .gif and are popular on the Internet because the file format is internally compressed. When you open a GIF file in a program such as WinGIF or WinCIM, the program expands the GIF file before displaying it. Web browsers can display GIF files internally, so the GIF format is frequently used on the World Wide Web.
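The GIF format's fixed header makes files easy to identify without relying on the .gif extension. A small Python sketch (the 320-by-200 header is fabricated for the example) that reads the signature and logical screen size:

```python
import struct

def gif_info(data: bytes):
    """Read the GIF signature and logical screen size from a file header.
    Width and height are little-endian 16-bit values at offsets 6 and 8."""
    sig = data[:6]
    if sig not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    width, height = struct.unpack("<HH", data[6:10])
    return sig.decode("ascii"), width, height

# A fabricated 10-byte header for a 320 x 200 image.
header = b"GIF89a" + struct.pack("<HH", 320, 200)
print(gif_info(header))  # ('GIF89a', 320, 200)
```

A full viewer such as WinGIF goes on to expand the internally compressed image data; this reads only the header.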

NOTE: It seems that almost no one can agree on how to pronounce GIF, either with a hard G sound or with a J sound. Take your pick; I won't argue with you either way.

The second type of file format you may see is JPEG, which stands for Joint Photographic Experts Group. JPEG files, which are generally marked by the .jpg extension, use a different form of compression than GIF. JPEG file compression reduces the image size to as much as one-twentieth of the original size, but also reduces the quality slightly. Windows viewers abound. On the Internet, you'll find Lview, WinECJ, and WinJPEG, among versions for virtually every graphics platform.
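JPEG files are just as easy to recognize by their opening marker bytes, again without trusting the extension. A quick sketch (the sample bytes are fabricated):

```python
def looks_like_jpeg(data: bytes) -> bool:
    """JPEG files open with the Start-of-Image marker FF D8, followed by
    another FF marker byte; that's enough for a quick sanity check."""
    return data[:3] == b"\xff\xd8\xff"

print(looks_like_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 16))  # True
print(looks_like_jpeg(b"GIF89a"))                            # False
```

Viewers like Lview or WinJPEG do this kind of check before attempting to decompress the image.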

Although some Web browsers, most notably Netscape, can display JPEG images inline, most cannot; instead, they download the image from a Web page and hand it to another program to display for the user.

Sound and Video

With the advent of the World Wide Web, sound and video files have become far more common on the Internet, although they're so large that most people using a modem won't want to spend the hours required to download a short sound or video clip. But, if you have either a fast connection or patience, there are several file formats that you should watch out for.

Sound appears to come primarily in the Ulaw format originated by Sun Microsystems for their Unix workstations. Files in this format have the .au extension. I know little about the Ulaw format except that if you have a properly configured sound card in your Windows machine, you can use one of two free utilities, Wham or Wplany, to play audio clips in the .au format.

There are three video formats that you should know about: MPEG, QuickTime, and AVI. MPEG stands for Moving Picture Experts Group. It's actually a compression format, much like JPEG, although one optimized for compressing video rather than still images. MPEG files generally have the extension .mpeg, as you might expect. The only Windows program I know of that can play MPEG files is MPEGplay. You can find MPEGplay on the Internet at:

The QuickTime format is important to know about because as of this writing over half of all multimedia titles that are produced on CD-ROM for the Windows platform are produced on Macintoshes, where QuickTime is the native audio and video format. Apple created QuickTime for Windows to allow cross-platform creation and playback of video material. There aren't many QuickTime movie players under Windows. However, if you manage to purchase a CD-ROM multimedia title, you have a 50 percent chance of getting a QuickTime player. Luckily, QuickTime isn't all that common on the Web, but if you do run into a QuickTime movie, the files available at the URL below, in conjunction with the Windows Media Player, enable you to play QuickTime movies.

AVI stands for Audio Video Interleave and is the native video format for Windows. It is part of the Multimedia PC (MPC) specification. Although slow to catch on, it will probably, if it hasn't already, surpass QuickTime as the dominant Windows video format. The common extension for AVI files is .avi.

Wrapping Up

That should do it for the background material about the various TCP/IP Internet services, such as FTP, Telnet, Gopher, WAIS, the World Wide Web, and a few other minor ones like IRC and MUDs. Feel free to flip back here and browse if you're confused about basic usage or what might be important to look for in a client program. Finally, we looked at the many file formats you may run into and the programs that you must use to decode or display them. Now let's move on and look at the common methods of connecting to the Internet.