Last week, in the first part of this article in NetBITS-001, I explained how one machine finds another on a local area network (LAN) using Ethernet. But the Internet doesn't run on Ethernet - it can't, in fact - so how do two machines find each other on the Internet?
Matching Names to Faces -- The Internet relies on TCP/IP, a pair of protocols in which TCP packets run over Internet Protocol, or IP. TCP is just a method of allowing applications to exchange data reliably with other applications. (There's a simpler, connectionless protocol called UDP that also runs over IP; it's used in cases where it doesn't matter if some data is lost, like sending audio or video.)
Every TCP/IP packet, just like an Ethernet packet, has a "from" and a "to" address in its header. TCP/IP travels over a LAN in encapsulated form. That is, an Ethernet packet contains addresses, other administrative information, and your data; the data part of the packet can contain anything - including a packet from a different networking protocol.
Ethernet delivers the packet to the correct destination, whether that's a computer or a router or a printer. The receiving device peels the Ethernet wrapper off and uses the material inside. It's like a nut: you remove the husk, then crack the shell, finally finding the meat deep inside.
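The nesting described above can be sketched in a few lines of Python. This is a toy illustration, not a real protocol implementation: the headers are drastically simplified (real Ethernet, IP, and TCP headers carry many more fields), and the function names and sample addresses are made up for this example.

```python
import socket
import struct

def wrap_tcp(src_port, dst_port, payload):
    # Toy TCP header: only the two 16-bit port numbers.
    return struct.pack("!HH", src_port, dst_port) + payload

def wrap_ip(src_ip, dst_ip, segment):
    # Toy IP header: only the two 4-byte addresses.
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + segment

def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))

def bytes_to_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)

def wrap_ethernet(src_mac, dst_mac, packet):
    # Toy Ethernet header: destination MAC, then source MAC (6 bytes each).
    return mac_to_bytes(dst_mac) + mac_to_bytes(src_mac) + packet

def unwrap(frame):
    # Husk: the Ethernet header.
    dst_mac, src_mac, packet = frame[:6], frame[6:12], frame[12:]
    # Shell: the (toy) IP header.
    src_ip, dst_ip, segment = packet[:4], packet[4:8], packet[8:]
    # Meat: the (toy) TCP header and the application's data.
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return {
        "dst_mac": bytes_to_mac(dst_mac),
        "src_mac": bytes_to_mac(src_mac),
        "src_ip": socket.inet_ntoa(src_ip),
        "dst_ip": socket.inet_ntoa(dst_ip),
        "src_port": src_port,
        "dst_port": dst_port,
        "payload": segment[4:],
    }

# Sample values only: documentation-range IP and MAC addresses.
segment = wrap_tcp(40000, 25, b"hello")
packet = wrap_ip("192.0.2.10", "192.0.2.20", segment)
frame = wrap_ethernet("00:00:5e:00:53:01", "00:00:5e:00:53:02", packet)
info = unwrap(frame)
```

Each wrapper only prepends its own header and treats everything after it as opaque data, which is why Ethernet can carry TCP/IP without knowing anything about it.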
The IP part of TCP/IP comes in where packets leave the world of Ethernet and travel from machine to machine over the Internet using IP addresses.
I have to introduce one more protocol, and then it all falls into place. The final bit is ARP: Address Resolution Protocol. ARP is software that creates a list of associations between Ethernet MAC addresses and IP addresses. This list is called a table, but you can visualize it like a spreadsheet with two columns. (You'll recall from Part 1 that a MAC address is the unique identifier for each Ethernet device in the world; each Ethernet card has the number built in at the time of manufacture and the numbers never overlap.)
The ARP table is part of the TCP/IP system, and it's stored separately on each machine that can talk Ethernet and TCP/IP, whether it's a Unix box or a Macintosh. When one machine connects with another, it first checks its ARP table. If the association between the IP address and the MAC address is already there, the connecting machine retrieves and uses it; if not, the machine finds the association and enters it in the table.
Whew! Wasn't that easier than you expected?
Getting from A to B and Back Again -- A router is a special piece of hardware that understands different protocols and converts one type of packet into another. In the simplest case, a company with high-speed Internet access, such as a T1 line (1.544 Mbps), connects its internal Ethernet network to the rest of the Internet using a router.
This archetypal router would have an Ethernet interface that plugs into the Ethernet network; it would also have a serial interface that plugs into a device which, in turn, connects to a higher-level network that eventually leads to the Internet.
When a packet destined for a machine on a company's Ethernet network is sent over the internal LAN, the router ignores it. It possesses a list of internal network addresses; by default, everything else is external. The internal-to-internal packets find their destinations without help from the router.
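The router's "is this one of ours?" decision can be sketched with Python's standard ipaddress module. The network number below is a made-up example (a range reserved for documentation), and the function names are hypothetical; a real router makes this comparison in its own firmware.

```python
import ipaddress

# Hypothetical list of internal network addresses for this example.
INTERNAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def is_internal(address):
    # True if the address falls inside any network the router
    # has been told is internal.
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in INTERNAL_NETWORKS)

def router_action(destination):
    # Internal traffic finds its own way; by default, everything
    # else is external.
    return "ignore" if is_internal(destination) else "forward upstream"
```

With this list, router_action("192.0.2.17") comes back as "ignore", while an address on any other network is treated as external.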
The device trying to send a packet broadcasts a message saying, "Does anybody on the Ethernet network know where I can find IP address a.b.c.d?" The correct device responds, "That's me!" and the sending machine makes a note in its ARP table for future reference. (The table is flushed occasionally to eliminate stale information.)
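Here is a minimal sketch of that ask-once-then-remember behavior. It is a simulation only: the "lan" dictionary stands in for the devices that would answer a real broadcast, and the class name, the 60-second lifetime, and the sample addresses are all assumptions made for illustration.

```python
import time

class ArpCache:
    def __init__(self, lan, max_age=60.0, clock=time.monotonic):
        self.lan = lan          # stand-in for the wire: {ip: mac} of devices present
        self.max_age = max_age  # assumed lifetime of an entry, in seconds
        self.clock = clock
        self.table = {}         # ip -> (mac, time the entry was learned)
        self.broadcasts = 0

    def _broadcast(self, ip):
        # "Does anybody know where I can find IP address a.b.c.d?"
        self.broadcasts += 1
        return self.lan.get(ip)  # the owner answers, or nobody does

    def resolve(self, ip):
        now = self.clock()
        entry = self.table.get(ip)
        if entry is not None and now - entry[1] < self.max_age:
            return entry[0]      # fresh entry: no broadcast needed
        mac = self._broadcast(ip)
        if mac is None:
            self.table.pop(ip, None)  # drop any stale entry
            return None
        self.table[ip] = (mac, now)   # make a note for future reference
        return mac
```

Passing the clock in as a parameter is a design choice for this sketch: it lets you simulate the passage of time and watch stale entries get flushed without actually waiting.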
If a machine broadcasts a message saying, "I'm looking for a.b.c.d" and the router notices that a.b.c.d isn't an internal network number, it grabs the packet, and passes it up the ladder. The router connects, usually through leased phone company lines, to another router at a network service provider (NSP). The NSP router it connects to is also on an Ethernet or similar high-speed network, usually with a bunch of other routers, all of which transfer packets rapidly back and forth. Some of these devices handle billions of packets an hour.
(This is called store-and-forward communication, by the way, even though a router might "store" the packets for only milliseconds. In contrast, within an Ethernet network, packets are received at the speed of electricity on all connected devices. A better example of a store-and-forward network was the old dial-up systems that exchanged Usenet newsgroup posts: they would actually retrieve posts from one site and store them on hard disks where they might sit for hours or days before being transmitted to the next site.)
The NSP's router knows much more than the company's router. It knows where all IP networks in the world live and how to reach them through other routers. I won't go into much detail about that; another protocol, known as BGP (Border Gateway Protocol), allows routers to exchange information about the paths to other networks.
So, the NSP router sends the packet through more routers - sometimes only a couple, but 10 to 15 is typical - where each in turn compares the destination to its table to see where to send the packet next. (Tools such as traceroute give up after 30 hops by default, though the IP header itself allows a packet up to 255.)
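The hop-by-hop lookup can be sketched as follows. This is a simplified model under stated assumptions: the three routers, their tables, and the network numbers are invented for the example, and each table is a plain dictionary rather than anything a real router would use. The lookup rule shown (the most specific matching network wins) is the usual longest-prefix match.

```python
import ipaddress

def make_table(entries):
    # Turn {"network/prefix": next_hop} into a table keyed by network objects.
    return {ipaddress.ip_network(net): hop for net, hop in entries.items()}

def next_hop(table, destination):
    # Longest-prefix match: of all networks containing the destination,
    # pick the most specific one.
    ip = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if ip in net]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]

def trace(routers, start, destination, ttl=30):
    # Follow the packet from router to router until it is delivered,
    # no router knows the way, or the hop budget (ttl) runs out.
    path = [start]
    current = start
    while ttl > 0:
        hop = next_hop(routers[current], destination)
        if hop is None:
            return path, "unreachable"
        if hop == "deliver":
            return path, "delivered"
        current = hop
        path.append(current)
        ttl -= 1
    return path, "ttl expired"

# Hypothetical three-router world; "deliver" means the destination
# is on that router's own Ethernet.
ROUTERS = {
    "company": make_table({"192.0.2.0/24": "deliver", "0.0.0.0/0": "nsp"}),
    "nsp": make_table({"192.0.2.0/24": "company", "198.51.100.0/24": "remote"}),
    "remote": make_table({"198.51.100.0/24": "deliver", "0.0.0.0/0": "nsp"}),
}
```

Calling trace(ROUTERS, "company", "198.51.100.9") walks the packet from the company router to the NSP and on to the remote router, which delivers it locally. The hop budget exists so that a packet caught between two misconfigured routers eventually gets dropped instead of circulating forever.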
The packet finally lands in the router that's electrically connected via Ethernet to the machine for which the packet is destined. That router also has an ARP table that matches IP numbers to local Ethernet MAC addresses. It takes the incoming TCP/IP packet, wraps it in an Ethernet envelope with the destination machine's MAC address on it, and sends it on to its final destination.
Take a deep breath and read the next sentence: This happens billions of times per minute around the world. That's right, billions. Someday, and not too distantly, it will be trillions.
Real Situations -- Since you now know exactly how data moves from machine to machine over the Internet, how does this apply when you dial in to an Internet service provider (ISP)? The modem-to-modem connection you make using PPP (Point-to-Point Protocol) or SLIP (Serial Line Internet Protocol) enables your machine to communicate with a machine on the other end of the connection. That machine receives your packets just like a router and passes them out onto the Ethernet network to which it's connected. In turn, your packets are passed up the chain and out (and back again) just as though you were on a full-blown network.
What happens if a router receives incorrect information about all the other networks on the Internet? The fact is, this happens occasionally. You might recall on 17-Jul-97 the Internet was sluggish due to routing problems that were described vaguely in the media. What happened relates to the fact that routers share information about networks with each other using the BGP protocol, and routers don't always check the veracity of the information they share. In this case, one small network provider erred in creating its routing tables; these tables were distributed in a cascade, which caused failures in connections, which in turn prevented the routers from getting updates with new, better information. A lot of power cycling and flushing went on before everything settled down. The good news is that router software manufacturers are working to make this harder to muck up, so the devices do more verification and can dump bad information faster and better.
Why doesn't the Internet work more like a phone conversation, rather than via all these little packets flitting around? After all, when you download a file from an FTP site, it looks like information is streaming to you through that connection. The answer is that even though the data transmits in individual chunks, a continuous "channel" (called a socket) opens on both the sending and receiving machine. When you send a request, the server knows where the data is coming from and replies to the sender until it has sent everything requested. It then shuts down the socket, as does the requesting machine. This is one way hackers cause problems. They can send a vast number of requests to open sockets and then fail to acknowledge them - this acknowledgment is part of the normal handshaking that occurs for every socket connection. Earlier operating systems didn't know what to do in these cases, so they would leave more and more sockets open until they crashed, ran out of memory, or stopped responding. Most servers have been patched to prevent this from causing confusion. They just terminate connections after a reasonable period of inactivity.
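The open-request flood and the timeout fix can be modeled in a few lines. To be clear, this is a simulation of the bookkeeping, not real TCP: the class, its method names, the table size, and the 30-second timeout are all assumptions chosen for illustration.

```python
class Listener:
    def __init__(self, max_pending=4, timeout=30.0):
        self.max_pending = max_pending  # how many half-open sockets we can hold
        self.timeout = timeout          # seconds to wait for an acknowledgment
        self.pending = {}               # client -> time its open request arrived
        self.established = set()

    def expire(self, now):
        # The fix: terminate half-open connections that have been
        # inactive for too long.
        stale = [c for c, t in self.pending.items() if now - t >= self.timeout]
        for client in stale:
            del self.pending[client]

    def open_request(self, client, now):
        self.expire(now)
        if len(self.pending) >= self.max_pending:
            return False                # table full: request refused
        self.pending[client] = now
        return True

    def acknowledge(self, client, now):
        # The final step of the handshake, which an attacker never sends.
        self.expire(now)
        if client not in self.pending:
            return False
        del self.pending[client]
        self.established.add(client)
        return True
```

Without the expire step, a handful of unacknowledged requests would fill the pending table permanently and every honest client afterward would be refused; with it, the slots free themselves once the timeout passes.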
Why doesn't the Internet use Ethernet, if it's so fast? The main problem with Ethernet is that it's bounded by distance. The farthest points on an Ethernet network can be no more than half as far apart as a signal travels in the time it takes to transmit the smallest allowed packet. This seems odd, until you realize that Ethernet networks are limited by the speed of electricity through the physical wire. The speed of electricity is a fraction of the speed of light: it's constrained by "friction" or resistance in the medium itself. Bits over an Ethernet network can only be sent from one place to another at this speed. On 10 Mbps Ethernet, assuming signals move at about two-thirds the speed of light, the second bit goes out when the first one is roughly 20 meters (about 65 feet) down the wire.
If the transmitting machine finished sending a packet before it could "hear" an interfering packet sent by the most distant machine, it wouldn't know a collision had taken place. The packet would be lost; the Ethernet devices wouldn't know to retry; and the network wouldn't work.
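The distance limit is simple arithmetic, worked here in Python. The numbers are assumptions for the sake of the example: classic 10 Mbps Ethernet, a 64-byte (512-bit) minimum packet, and a signal speed of about two-thirds the speed of light. The result is an idealized ceiling; real networks were specified with considerably shorter limits, since repeaters and interface electronics add their own delays.

```python
SIGNAL_SPEED = 2.0e8    # meters per second; assumed, about 2/3 the speed of light
BIT_RATE = 10_000_000   # bits per second (10 Mbps Ethernet)
MIN_FRAME_BITS = 512    # 64-byte minimum packet

def bit_length_m(bit_rate=BIT_RATE, speed=SIGNAL_SPEED):
    # How far the first bit has traveled when the next one is sent.
    return speed / bit_rate

def max_span_m(frame_bits=MIN_FRAME_BITS, bit_rate=BIT_RATE, speed=SIGNAL_SPEED):
    # The sender must still be transmitting when news of a collision at
    # the far end gets back to it, so the round trip has to fit inside
    # the time it takes to send the smallest packet.
    send_time = frame_bits / bit_rate       # seconds spent transmitting
    round_trip_distance = send_time * speed
    return round_trip_distance / 2          # one-way distance
```

Under these assumptions, the ceiling works out to about 5 kilometers (roughly 3 miles) between the farthest points. Raise the bit rate tenfold without changing the minimum packet size and the ceiling shrinks tenfold, which is one reason faster shared Ethernets cover less ground.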
The Internet uses serial transmission for long distances, where two and only two devices are connected by a single connection. This could be copper wire, fiber optic cable, satellite transmission - whatever. The point is that with only two devices on the link, there are no collisions to detect, so the distance limit that constrains Ethernet doesn't apply.
Summing Up -- I recommend that you visualize this whole array of transmissions and receptions as a hierarchy. Application software, like an email program, sits on top of the heap in its own little palanquin. It knows how to talk with the next level down - a TCP stack, which is an implementation of the TCP protocol on a particular machine.
The TCP stack in turn knows how to create and decipher packets and deliver them between an interface (such as provided by PPP or Ethernet) and the application software (like an email program) that's talking and responding.
Finally, the devices that make up the public IP network themselves know how to interact and pass data between all of the hundreds of thousands of networks that make up this thing that we can name with a single word: the Internet.