TidBITS Troubleshooting Primer, Part 1
There’s no point in pretending that problems never happen. Although this may be a typically male viewpoint, life – computer life and life in general – can be seen as nothing but problems ("challenges," "opportunities") and solutions. What has always amazed us is the level to which people without much technical experience assume that they can’t possibly solve computer problems. Although specialized knowledge certainly helps, troubleshooting is a universal skill. If you can figure out why your brakes are squeaking or why the sewing machine is jamming, you can figure out computer-related problems. Despite what many non-computer people think, there’s no real difference.
For those of you who find tracking down and eliminating a problem intimidating, here’s a guide that walks you through how I troubleshoot problems of all types. (This article is adapted from the troubleshooting chapter I wrote for the book my friend Glenn Fleishman and I are co-authoring right now, tentatively titled The Wireless Networking Starter Kit.)
The most important piece of advice I can give up front is: Be methodical. If you start trying solutions without thinking about what caused the problem and what the effect of any given solution may be, you just end up complicating the entire situation. The best way to encourage a methodical approach is to take notes about what you see (especially any error messages), what you do, and the effects of what you do.
Describe the Problem — The first step in troubleshooting is to identify the problem and gather information about it. That sounds simple, and it usually is, because most problems aren’t particularly subtle. Perhaps you can’t send email, or your one wired computer isn’t visible to the computers on your wireless network.
It’s important to determine if the problem is reproducible or intermittent. Although an intermittent problem may be less irksome than a reproducible problem if you can keep working through it, intermittent problems are much harder to track down, because at least one of the variables involved depends on timing or on the state of the system at the moment of failure. Reproducible problems almost beg to be solved, because you can’t keep working until you’ve solved them.
Pay attention to any visible indicators that might give more information about the problem. For instance, many devices have status LEDs that indicate whether a device is turned on and if it’s performing some sort of activity. If those LEDs aren’t working the way you expect, record that information.
Break the System Apart — Once you have a firm grasp on the problem, you need to start breaking the system related to the problem into discrete steps or pieces. Then you can start analyzing different parts of the whole. The hard part here is that you may not realize what the different parts of the system are, making it difficult to understand how one could fail. But if you think about what’s involved in using the system, you should be able to determine most of the parts.
For instance, take the example of a wireless network that also has one computer connected via an Ethernet cable. In this sample network, the one wired computer is used as an informal file server. You’re using one of the wireless computers, and you suddenly can’t connect to a shared folder that’s worked fine before. What are the pieces of this system? Let’s determine what must be true for the situation to work properly, after which we can analyze each of the components.
On your computer, you need properly installed file sharing client software.
Your computer must have a working connection to the wireless access point.
The access point must allow you to see a computer connected via wired Ethernet.
The wired Ethernet computer must have a working connection to the access point.
File sharing server software must be running on the wired Ethernet computer.
A folder must explicitly be shared on the wired Ethernet computer.
You could certainly break these pieces into even smaller pieces, but this should be sufficient to get started.
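If you'd rather keep your notes in a file than in a paper notebook, the same breakdown can be written as a short Python checklist. This is only a sketch: the `COMPONENTS` list restates the six requirements above, while the `remaining` function and the `verified` set are names I've made up for illustration.

```python
# Everything that must be true for the shared folder to be reachable,
# in order from your computer to the folder itself.
COMPONENTS = [
    "file sharing client software installed on your computer",
    "your computer connected to the wireless access point",
    "access point lets wireless computers see the wired computer",
    "wired computer connected to the access point",
    "file sharing server software running on the wired computer",
    "folder explicitly shared on the wired computer",
]


def remaining(components, verified):
    """Return the components that have not yet been confirmed to work."""
    return [name for name in components if name not in verified]


# Hypothetical state: the first two pieces have been checked and are fine.
verified = {COMPONENTS[0], COMPONENTS[1]}
for name in remaining(COMPONENTS, verified):
    print("still to check:", name)
```

Crossing pieces off as you confirm them leaves a shrinking list of places where the failure can hide.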
Keep in mind that what I’ve just described is only one system. That’s important, because if there are other, working systems (other wireless computers that can see the file server, for instance), comparing them with the one that fails can help you zoom in on the problem quickly.
Note all of the pieces of the system briefly in your notebook, and if you’re a picture person, consider drawing yourself a diagram of how it all fits together; this can come in especially handy if you actually need to break the system apart by disconnecting cables or rearranging equipment.
Ask Yourself Questions — Now that you’ve identified all the parts of the system, it’s time to look carefully at each part, making up a possible reason why a failure at that point could be responsible for the whole problem. In our example, let’s take each part and analyze it, asking questions that lead to tests.
File sharing client software is of course necessary, but since you were able to connect previously, it’s a good assumption that it’s installed. Is it turned on? Has anything changed since you last connected successfully that might provide a clue? Have you restarted (it’s always worth trying)? What about other computers, both wired and wireless? Can their file sharing client software see the wired computer?
Is the wireless connection to the access point working from your computer? Is it working for other network-related tasks at the same time you can’t connect to the wired computer? Can other wireless computers connect to the access point?
Is the access point configured correctly so wireless computers can see the wired computer? Since it worked properly before, this likely isn’t the source of the problem. Has anything changed on the access point since you last connected that could be related?
Can the wired computer connect to the access point via its Ethernet cable? (Never underestimate the trouble a broken or flaky cable can cause.)
Is file sharing turned on and configured properly? Has anything changed on that computer that might have resulted in it being turned off or reconfigured? Have you restarted the wired computer recently?
Is the shared folder still shared? Could someone have changed which folders were shared? Has the folder been moved or renamed or otherwise modified in some way that might have changed its state?
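Some of these questions can be turned into a quick test you run from the wireless computer. The sketch below, which assumes you have Python available, simply tries to open a TCP connection to the wired computer. The address `192.168.1.10` in the comment is a made-up example, and the port is an assumption you'll need to match to your own setup: 548 is the TCP port used by AFP (Apple file sharing) and 445 is the one used by SMB (Windows file sharing).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (hypothetical address, AFP port):
# can_connect("192.168.1.10", 548)
```

A `True` result tells you only that something is listening on that port, which means the path through the access point and the file sharing server are probably fine; it says nothing about whether the folder itself is still shared. A `False` result doesn't tell you which earlier piece failed, so you still need the other questions.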
I mentioned the difference between reproducible and intermittent problems above; if you have an intermittent problem connecting to the wired Ethernet computer, that generates additional questions.
Does the problem happen at all times of day? Does it happen right after you’ve done something else? Is it related to the presence or absence of any other machines?
Jot these questions down in your notebook, numbering them so you can easily refer back to them when your tests start providing answers.
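For an intermittent problem, the time-of-day question is easier to answer if you record when each failure happens and then count the failures by hour. Here's a minimal sketch of that idea; the timestamps are invented sample data, and `failures_by_hour` is a name I've chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Count failures per hour of day from ISO-format timestamp strings."""
    return Counter(datetime.fromisoformat(ts).hour for ts in timestamps)


# Hypothetical notebook entries: the times at which the connection failed.
observed = [
    "2002-06-03T09:15",
    "2002-06-04T09:40",
    "2002-06-05T14:05",
    "2002-06-06T09:02",
]

counts = failures_by_hour(observed)
print(counts.most_common(1))  # prints [(9, 3)]
```

With this sample data, three of the four failures fall in the 9 o'clock hour, which would suggest looking for something else that happens around then.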
Answer Questions — Once you have your list of questions, revisit it and think about which test you must perform to come up with an answer to each question. Separate your questions roughly into easy, moderate, and hard categories (you might write an E, M, or H next to each question’s number in the margin).
Also give your intuition a chance to work. If you have a nagging feeling that your spouse might have let your 4-year-old nephew play a game on the wired Ethernet computer, start with that machine. Or, if you just had to reset the access point to factory default settings for another reason, start there.
Wherever you choose to start, begin with tests that eliminate the easiest questions first. For instance, it’s trivial to check if your nephew kicked the Ethernet cable out of the jack; there’s no reason to consider reinstalling the entire operating system on that machine until you’ve exhausted every easier option.
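The easy-first ordering can be sketched in a few lines of Python, assuming you've tagged each question with E, M, or H as suggested above. The questions listed here are illustrative, and `easiest_first` is a name I've made up.

```python
# Lower rank means the question is cheaper to test.
RANK = {"E": 0, "M": 1, "H": 2}

# (question number, difficulty, question) as jotted in the notebook.
questions = [
    (1, "H", "Does the operating system on the wired computer need reinstalling?"),
    (2, "E", "Is the Ethernet cable seated in the jack?"),
    (3, "M", "Has the access point configuration changed?"),
    (4, "E", "Have you restarted your computer?"),
]


def easiest_first(items):
    """Sort by difficulty, keeping notebook order within each category."""
    return sorted(items, key=lambda q: RANK[q[1]])


for number, difficulty, text in easiest_first(questions):
    print(f"{number} [{difficulty}] {text}")
```

Because Python's sort is stable, questions with the same difficulty stay in the order you wrote them down, so you can still put your hunches first within each category.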
Working methodically is essential at this point. If a test requires changing something that significantly alters the overall system, it’s best (if possible) to put it back afterward so the situation stays the same as when you analyzed the problem. For instance, if you had been thinking about installing a new access point that you’d just bought, don’t do it in the middle of the troubleshooting process or you risk confusing everything.
Make sure to check off each question you answer in your notebook, and note any interesting things that happen when you perform the test. I don’t suggest you do this because you’re going to forget what you’ve done while you’re troubleshooting, but because you may have forgotten by the next time the problem happens. Plus, if you end up wanting to ask someone else for help, you can say authoritatively that you had indeed tried some test with negative results.
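If your notebook is a text file rather than paper, one way to keep these records consistent is a small structure with a field for each thing worth writing down. This is just a sketch of one possible layout; `NotebookEntry`, its fields, and `summary` are my own inventions, and the sample entries are made up.

```python
from dataclasses import dataclass


@dataclass
class NotebookEntry:
    question: int  # number of the question in your list
    action: str    # what you did to test it
    result: str    # what you saw, including any error message


def summary(entries):
    """One line per test, ready to paste into a request for help."""
    return [f"Q{e.question}: {e.action} -> {e.result}" for e in entries]


# Hypothetical entries from a troubleshooting session.
entries = [
    NotebookEntry(2, "Reseated the Ethernet cable", "no change"),
    NotebookEntry(4, "Restarted the wireless computer", "no change"),
]

for line in summary(entries):
    print(line)
```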
In most situations, the solution to your problem will make itself clear during this process of answering questions. Perhaps it’s summer, and the reinstallation of your screen door is blocking the Wi-Fi signal, or perhaps your spouse configured the computer in an unusual way for your nephew’s game. Maybe your access point lost track of the wireless-to-wired Ethernet bridge settings, or maybe your computer or the access point just needed to be restarted.
Get Expert Help — With truly tricky problems, your tests won’t reveal any conclusive answers. Don’t feel too bad, because if you’ve followed the procedure so far carefully, your failing is most likely that you don’t understand all the parts of the system well enough. What to do next? Ask for help, of course, and that’s where I’ll look in the next part of this article.