This article originally appeared in TidBITS on 2007-11-05 at 6:28 a.m.

Explaining Our Recent Server Woes

by Adam C. Engst

My apologies to anyone who has run into problems accessing the TidBITS and Take Control Web sites over the last few days. We've been experiencing some very weird errors, some of which have caused downtime, and in an attempt to isolate the cause, I've had to reboot the server numerous times and work with the extremely clued-in technicians at digital.forest [1] to run hardware tests. Most notably, the machine was busy running memtest [2] for most of Sunday night. The good news is that the RAM showed no problems across five passes (close to 12 hours of testing), so it can be eliminated as a possible cause; the bad news is that we're still struggling to figure out what might be happening.

We've tried to minimize the effect of this testing through four tricks:

Alas, we're still unsure as to the cause of the problems, but we have more things to try, all while attempting to minimize downtime. The server has great connectivity at digital.forest, and the technicians are helpful, but it's still difficult to troubleshoot a remote production server that's constantly modifying a 2.5 GB database file. I certainly hope we can fix things without anyone noticing, but if you do have trouble connecting, now you know why. For more detail, check my Twitter page [4], since I tend to post updates about what's going on even while the machine (and thus my email) is down.