My apologies to anyone who has run into problems accessing the TidBITS and Take Control Web sites over the last few days. We’ve been experiencing some very weird errors, some of which have caused downtime, and in an attempt to isolate the cause, I’ve had to reboot the server numerous times and work with the extremely clued-in technicians at digital.forest to run hardware tests. Most notably, the machine was busy running memtest for most of Sunday night. The good news is that the RAM showed no problems across five passes (close to 12 hours of testing), so it can be eliminated as a possible cause; the bad news is that we’re still struggling to figure out what might be causing the errors.
We’ve tried to minimize the effect of this testing through four tricks:
- Since most TidBITS-related article traffic is actually served by Glenn’s Web server, which identifies itself as tidbits.com, we changed the DNS settings at easyDNS (which we recommend highly for being easy to use and reliable) so that www.tidbits.com pointed to tidbits.com. That way, anyone visiting our home page wouldn’t see anything out of the ordinary. However, that worked only for the www.tidbits.com home page; pages further down in the hierarchy just returned an error.
- For TidBITS Talk, the Check for Updates links in our ebooks, and other things that use emperor.tidbits.com, we again repointed DNS at Glenn’s machine, and he set it so any emperor.tidbits.com request was served a page explaining the situation. Not ideal, but better than a generic error.
- For Take Control, where we didn’t want to lock out potential ebook customers, we repointed DNS at Glenn’s machine and set it up so any www.takecontrolbooks.com request loaded a custom page that looked much like the normal Take Control home page and made it easy for people to buy our Leopard titles via eSellerate. For other books, I linked to the version of our catalog on the eSellerate site. To judge from the number of orders that came in during the downtime, this approach worked fine.
- Since all of our incoming mail goes through Postini (now owned by Google, though we haven’t seen any changes for better or worse since the acquisition) before arriving at our server, I tweaked Postini’s settings so it sends email to the IP address of our server, rather than to its domain name. That way, when I took our server down, Postini just held onto the mail (since it couldn’t contact our server at its IP address) until I brought the server back online. Had I not made that change, Postini would have tried to deliver to Glenn’s machine (while it was responding to emperor.tidbits.com). Since Glenn’s machine doesn’t accept mail for tidbits.com addresses, it would have rejected the messages.
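To make the reasoning behind the mail trick concrete, here is a minimal Python sketch of why a delivery target given as an IP address is unaffected by repointing DNS, while one given as a host name follows the change. It is only an illustration, not Postini's actual configuration: the `delivery_target` function, the dictionary standing in for DNS, and both IP addresses (taken from the ranges reserved for documentation) are assumptions made up for this example.

```python
import ipaddress

def delivery_target(configured: str, dns: dict[str, str]) -> str:
    """Return the IP address a mail relay would try to connect to.

    `configured` is either a literal IP address or a host name.
    `dns` is a hypothetical stand-in for a DNS lookup table.
    """
    try:
        # A literal IP address is used as-is; DNS is never consulted.
        return str(ipaddress.ip_address(configured))
    except ValueError:
        # Anything else is treated as a host name and resolved via DNS.
        return dns[configured]

# Hypothetical addresses, chosen from the documentation-only ranges.
ORIGINAL_SERVER = "192.0.2.10"      # the real mail server (offline for testing)
FALLBACK_SERVER = "198.51.100.20"   # the machine serving the explanatory page

# During the outage, the host name has been repointed at the fallback machine.
dns_during_outage = {"emperor.tidbits.com": FALLBACK_SERVER}

by_name = delivery_target("emperor.tidbits.com", dns_during_outage)
by_ip = delivery_target(ORIGINAL_SERVER, dns_during_outage)

print("configured by name ->", by_name)  # follows the repointed DNS entry
print("configured by IP   ->", by_ip)    # still aims at the original server
```

With the host name, the relay would reach the fallback machine, which rejects the mail. With the IP address, it keeps trying the original server and simply queues messages until that server answers again.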
Alas, we’re still unsure as to the cause of the problems, but we have more things to try, all while attempting to minimize downtime. The server has great connectivity at digital.forest, and the technicians are helpful, but it’s still difficult to troubleshoot a remote production server that’s constantly modifying a 2.5 GB database file. I certainly hope we can fix things without anyone noticing, but if you do have trouble connecting, now you know why. For more detail, check my Twitter page, since I tend to post updates about what’s going on even while the machine (and thus my email) is down.