URLs ’R Us
We’ve been using URLs in TidBITS for over a year now, and I don’t think an issue goes by without us pointing at some resource or another with a URL. I wrote a little about URLs back when we first started, but with our readership growing so quickly I think it’s worthwhile to talk about URLs some more. I’ve taken most of this article from information I originally wrote for my Internet Starter Kit for Macintosh, Second Edition.
URL generally stands for Uniform Resource Locator, although some people replace "uniform" with "universal." Despite what I’ve heard from one source, I have never heard anyone pronounce URL as "earl"; instead, everyone I’ve talked to spells out the letters.
URLs constitute the most common and efficient method of telling people where to find objects available via FTP, the World-Wide Web, and other Internet services. I say "objects" because you can specify URLs not only for files and Web pages, but also for stranger things, such as email addresses, Telnet sessions, and Usenet news postings. For example, this issue’s ShrinkWrap article ended with an FTP URL you can use to download ShrinkWrap.
A URL uniquely specifies the location of an object on the Internet, using the three main pieces of information needed to access any given object. First is the type of server making the object available, be it an FTP, Gopher, or World-Wide Web server. Second comes the machine on which the resource lives. Third and finally, there’s the full pathname to the object. This description is a slight oversimplification, but the point I want to make is that URLs are an attempt to provide a consistent way to reference objects on the Internet.
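As a rough illustration of those three parts, here is a minimal Python sketch that uses the standard library's urllib.parse module to pull the server type, machine, and path out of a URL. The function name split_url is hypothetical, and the URL is the TidBITS FTP directory that appears in the example later in this article.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Return the (server type, machine, path) parts of a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path

server_type, machine, path = split_url("ftp://ftp.tidbits.com/pub/tidbits/issues/1995/")
print(server_type)  # ftp
print(machine)      # ftp.tidbits.com
print(path)         # /pub/tidbits/issues/1995/
```

Like the description above, this is a simplification: it ignores usernames, passwords, and anything else a particular URL type may carry.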
Client/Server — If you see a URL that starts with "ftp" you know the file specified in the rest of the URL is available via an FTP server, which means you could use an FTP client, such as Anarchie or Fetch, to retrieve it. If the URL starts with "gopher", a Gopher client like TurboGopher could access the file on the Gopher server in question. If the URL starts with "http", it’s on a Web server, so you might use a Web browser like MacWeb or Netscape. Other server types used in URLs include "news", "mailto", "telnet", and "wais", although they’re less common than FTP and Web URLs.
You can use a Web browser to access most of the URL types above, although Web browsers are not necessarily ideal for anything but information on the World-Wide Web itself. Web browsers work pretty well for accessing files on Gopher servers, and via gateways to WAIS databases, but FTP via a Web browser is clumsy (and may fail entirely with certain types of files, such as self-extracting archives).
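The pairing of URL types with dedicated client programs can be sketched as a simple lookup. This is only an illustration in Python: the table holds the programs named in this article, the function name clients_for is hypothetical, and the Gopher host in the usage lines is a placeholder.

```python
from urllib.parse import urlsplit

# Client programs named in this article, keyed by URL type.
CLIENTS = {
    "ftp": ["Anarchie", "Fetch"],
    "gopher": ["TurboGopher"],
    "http": ["MacWeb", "Netscape"],
}

def clients_for(url):
    """Return the dedicated clients for a URL, or an empty list if none is listed."""
    scheme = urlsplit(url).scheme
    return CLIENTS.get(scheme, [])

print(clients_for("ftp://ftp.tidbits.com/pub/"))      # ['Anarchie', 'Fetch']
print(clients_for("gopher://gopher.example.com/"))    # ['TurboGopher']
```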
Machine — After the URL type comes a colon (:) and two slashes (//). These characters separate the server type from the second part of common URLs. This second part is the name of the Internet machine that contains the object you’re seeking. In some rare circumstances, you may need to use a username and password in the URL as well. A URL with a username and password takes this general form, where "host" stands for the machine name: ftp://username:password@host/pub/
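To show where each piece of the machine part sits, here is a small Python sketch using urllib.parse. The URL is hypothetical: the username, password, and ftp.example.com are placeholders, not a real account or server.

```python
from urllib.parse import urlsplit

# Hypothetical URL; the host, username, and password are placeholders.
parts = urlsplit("ftp://username:password@ftp.example.com/pub/")

print(parts.scheme)    # ftp
print(parts.username)  # username
print(parts.password)  # password
print(parts.hostname)  # ftp.example.com
print(parts.path)      # /pub/
```

The username and password sit between the two slashes and the @ sign, separated from each other by a colon; everything from the @ to the next slash is the machine name.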
Path — The last part of the URL gives the path to the directory of the object you’re looking for, and it may also give the name of a specific file. This is separated from the machine name by a slash (/). When used with WAIS or various other protocols that don’t simply point at files, the path may specify other types of information. You don’t have to specify the path with some URLs, such as FTP or Gopher URLs, if you’re only connecting to the top level of the site.
If an FTP or Gopher URL ends with a slash, that means it points at a directory and not a file. If it doesn’t end with a slash, it may or may not point at a directory. If it’s not obvious from the last part of the path, there’s no good way of telling until you go there. Since most Web servers enable the creation of some sort of default.html or index.html file to be served in the absence of a specific file in the URL, it’s a bit less important for Web users to know whether they’re specifying a file or a directory.
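The trailing-slash rule can be written down as a short check. In this Python sketch the function name points_at_directory is hypothetical, and it returns None rather than False when there is no trailing slash, since in that case the URL alone doesn't settle the question. Treating a URL with no path at all as a directory is an assumption based on the top-level case described above.

```python
from urllib.parse import urlsplit

def points_at_directory(url):
    """Return True if the URL clearly names a directory, or None if we can't tell."""
    path = urlsplit(url).path
    # No path means the top level of the site; a trailing slash means a directory.
    if path == "" or path.endswith("/"):
        return True
    # No trailing slash: it could be a file or a directory.
    return None

print(points_at_directory("ftp://ftp.tidbits.com/pub/tidbits/"))  # True
print(points_at_directory("ftp://ftp.tidbits.com/pub/tidbits"))   # None
```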
Using URLs — All of these details aside, how do you use URLs? Your mileage may vary, but I use them in three basic ways. First, if I see them in email or in a Usenet posting, I often copy and paste them into Anarchie (if they’re FTP URLs) or Netscape or MacWeb (if they are other types). I do this because copying the URL into the appropriate client is the easiest way to retrieve a file or connect to a site with a MacTCP-based Internet connection. In NewsWatcher 2.0b24 (and InterNews for FTP), you can simplify the process by command-clicking URLs to have them resolved by the appropriate FTP (Anarchie or Fetch), Gopher (TurboGopher 2.0b7), or Web (MacWeb 1.00A3 or Netscape 1.0N) client program. MacWeb 1.00A3 can also use other programs to resolve URLs more appropriately, and finally, the next version of Eudora (perhaps only the commercial version) will sport this feature as well.
Second, I sometimes manually decode the URL to figure out which program to use and where to go. This method takes more work, but it sometimes pays off in the end because I wind up using the right program. You can put a screw in the wall with a hammer, but it’s not the best tool for the job.
Third and finally (and this is where you come in), when I want to point someone at a specific Internet resource or file, I provide a URL. URLs are unambiguous, and although a bit ugly in running text, easier to use than attempting to spell out what they mean.
Consider the example below:
ftp://ftp.tidbits.com/pub/tidbits/issues/1995/TidBITS#260_23-Jan-95.etx
To verbally explain the information in that URL, I would have to say something like: "Using an FTP client program, connect to the anonymous FTP site <ftp.tidbits.com>. Change directories into the /pub/tidbits/issues/1995/ directory, and once you’re there, retrieve the file TidBITS#260_23-Jan-95.etx."
Note that our long-time naming scheme with TidBITS isn’t all that Web friendly, since the # character has a specific meaning in Web URLs. Stripping off the filename and hitting the directory manually would always work, though, as would simply pasting that URL into Anarchie or Fetch.
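To see why the # character causes trouble, consider how a URL parser that follows Web conventions treats it. In this Python sketch, urllib.parse reads everything after the # as a separate fragment, so the file name is cut short. Escaping the # as %23 keeps the name in one piece; whether any particular client or server accepts the escaped form is an assumption, not something tested here.

```python
from urllib.parse import quote, urlsplit

directory = "ftp://ftp.tidbits.com/pub/tidbits/issues/1995/"
filename = "TidBITS#260_23-Jan-95.etx"

# Parsed as written, the path stops at the # character.
parts = urlsplit(directory + filename)
print(parts.path)      # /pub/tidbits/issues/1995/TidBITS
print(parts.fragment)  # 260_23-Jan-95.etx

# Percent-encoding the file name turns # into %23.
escaped = quote(filename)
print(escaped)         # TidBITS%23260_23-Jan-95.etx

# With the # escaped, the whole file name stays in the path.
print(urlsplit(directory + escaped).path)
# /pub/tidbits/issues/1995/TidBITS%23260_23-Jan-95.etx
```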
The URL enables me to avoid the convoluted (and boring) language above; frankly, URLs are in such common use on the Internet you might as well get used to seeing them now. And for those of you who recommend files to get via FTP or sites to browse with a Web browser, please use URLs since they make life easier for everyone.
If you try to retrieve a file or connect to a Web site and are unsuccessful, chances are that you’ve typed the URL slightly wrong, the server is temporarily down, or the file no longer exists. If an FTP URL doesn’t work, try removing the file name from the last part of the URL and look in the directory where the original file lived for an updated file.
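Removing the file name amounts to cutting the URL off after its last slash. A minimal Python sketch follows; the function name containing_directory is hypothetical. It works on the plain string rather than a URL parser so that characters such as # in a file name don't get in the way.

```python
def containing_directory(url):
    """Drop the file name from a URL, keeping everything up to the last slash."""
    scheme, sep, rest = url.partition("://")
    if not sep or "/" not in rest:
        return url  # no path to trim, so leave the URL alone
    return scheme + sep + rest[: rest.rfind("/") + 1]

print(containing_directory(
    "ftp://ftp.tidbits.com/pub/tidbits/issues/1995/TidBITS#260_23-Jan-95.etx"
))
# ftp://ftp.tidbits.com/pub/tidbits/issues/1995/
```

A URL that already ends with a slash comes back unchanged, since it already names a directory.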
If, after all this, you’d like to learn more about the technical details behind the URL specifications, check out:
http://info.cern.ch/hypertext/WWW/Addressing/URL/Overview.html
I find that URLs don’t always work well for files stored on Gopher servers, since Gopher allows spaces and other characters that URLs don’t accept. Thus, each space in a Gopher URL must be encoded as %20. Similarly, WAIS sources are usually easier to refer to by name; using a WAIS client such as MacWAIS makes it easy to use sources without worrying about all the additional information in a URL.
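For the curious, the %20 encoding is easy to reproduce. This Python sketch uses urllib.parse to encode and decode a name containing spaces; the name "About This Server" is a made-up example, not a real Gopher item.

```python
from urllib.parse import quote, unquote

name = "About This Server"  # hypothetical Gopher item name

encoded = quote(name)
print(encoded)           # About%20This%20Server
print(unquote(encoded))  # About This Server
```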