TidBITS experienced the heartbreak of server-crushing attacks several times recently, and after much consternation, Adam Engst and I figured out the cause.
The mystery starts with Google News. On several occasions, TidBITS articles have bubbled up to the top of the Sci/Tech section of Google News. These articles tend to be about an event that occurred a little earlier, but include our usual more deliberate analysis and context instead of just the bare facts of the situation. (We’re not the breaking news sorts, nor do you expect that from us.)
Immediately after an article appeared on Google News, our site was crippled by page requests. The graphs from our virtual host, Linode, showed massive inbound traffic. At one point, in fact, a support rep suggested it was a distributed denial of service (DDoS) attack, and noted that if it affected other customers, they might have to block our IP address to keep everything else running. But the traffic wasn’t being recorded as page views in Google Analytics, and the server would die again less than a minute after being restarted, neither of which made sense for a crowd of real readers.
We were baffled. We thought initially that the load was from people clicking on the Google News headline, since we had once seen intense traffic after becoming the top result on Google’s main search when they had a special snow-falling effect in December 2011. If you searched for “Let It Snow”, our article was the top result in news, and Google put the news results above even Web page results. The “Let It Snow” search resulted in 1,200 simultaneous visitors to our site for a prolonged period, and did knock the server down several times. I tweaked settings to prevent that in the future.
I kept thinking that the problem was due to a server misconfiguration and put non-trivial amounts of time into tweaking settings, but it was all for naught. Each time the problem happened, Adam and I would rack our brains trying to figure out how to keep the server up.
After the fourth or fifth time we were hammered after an article appeared in Google News, I finally discovered a pattern I should have seen earlier. Our access logs were full of requests from many different IP addresses asking for the same page repeatedly within a few seconds. That in itself wasn’t unusual for traffic generated by Google News, but more peculiar was the user-agent identifier, the bit of text a browser sends that tells a server what its maker and version are. We were seeing “Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)”.
That string claims to be Firefox 3.0.10 (the leading “Mozilla/5.0” is a legacy prefix that nearly every browser sends), and, confusingly, Windows XP identifies itself as Windows NT 5.1. (Windows XP was built on the Windows NT branch of operating systems, a break from the earlier Windows 95, 98, and Me platform.) It’s ridiculous to think that a Mac publication would receive massive amounts of traffic from what was nominally an old version of Firefox running on Windows XP, even though XP remains in wide use.
That was the key to figuring out the problem. Some piece of Web-aware software, likely malicious and certainly poorly programmed, was being triggered on thousands (perhaps tens of thousands) of computers whenever an article link appeared on Google News. We still aren’t sure why, but it definitely wasn’t for humanitarian reasons. In security lingo, computers that have been subverted with malware to do the bidding of distant control systems are called “zombies.”
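For anyone who wants to look for the same pattern in their own logs, here is a minimal sketch in Python. It is an illustration rather than the tool I used: it assumes the server writes the common Apache/nginx “combined” log format, and the names `LOG_LINE` and `summarize_agents` are made up for this example. It counts requests per user agent and how many distinct IP addresses sent each one, so a single agent string arriving from a large number of addresses stands out.

```python
import re
from collections import Counter, defaultdict

# Assumption: the access log uses the "combined" format, e.g.
# 192.0.2.1 - - [date] "GET /path HTTP/1.1" 200 1234 "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_agents(lines):
    """Return (agent, request_count, distinct_ip_count) tuples, busiest first."""
    hits = Counter()
    ips = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        agent = match["agent"]
        hits[agent] += 1
        ips[agent].add(match["ip"])
    return [(agent, count, len(ips[agent])) for agent, count in hits.most_common()]
```

Feeding it an open log file, as in `summarize_agents(open(path))` with whatever path your server uses, returns the busiest agents first; an old browser at the top of the list with thousands of distinct addresses is worth a second look.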
Once we had identified the rogue user agent, we were able to learn more about the problem, notably from an active and interesting thread at Stack Overflow from other system administrators.
The remediation was simple: we now block any incoming traffic marked with that user agent. The block pattern I used initially was too broad, and blocked some TidBITS readers who either use Windows XP or whose companies use proxy gateways that report that user agent. Of course, serious malware would send randomized user-agent strings, which would require more sophisticated pattern recognition and blocking on our part.
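To show why an exact match is safer than a broad pattern, here is a rough sketch written as Python WSGI middleware. It is not the configuration running on our server, and the name `block_user_agents` and the sample agent strings in the usage note are placeholders invented for this example. The idea is simply to refuse a request only when the entire user-agent string equals a known bad value, so that a reader whose browser merely shares a fragment such as “Windows NT 5.1” still gets through.

```python
def block_user_agents(app, blocked_agents):
    """Wrap a WSGI app and return 403 for exact user-agent matches only."""
    blocked = frozenset(blocked_agents)

    def middleware(environ, start_response):
        # WSGI exposes the User-Agent request header under this key.
        agent = environ.get("HTTP_USER_AGENT", "")
        if agent in blocked:  # whole-string comparison, not a substring test
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"),
                 ("Content-Length", str(len(body)))],
            )
            return [body]
        return app(environ, start_response)

    return middleware
```

Wrapping an application with `block_user_agents(app, ["ExampleZombie/1.0 (placeholder)"])` turns away that one string and nothing else. In practice the same rule is often written in the Web server’s own configuration instead, which rejects the request before it reaches any application code.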
Before we figured out that these massive traffic spikes were due to zombies, I also took the opportunity to deploy a caching system in the hopes that eliminating database lookups from our site would solve the problem. It didn’t fix that problem, but it did improve performance overall quite significantly. Now, when you retrieve an article page without being logged in to a TidBITS account, you get a stored snapshot of that page, which is updated every five minutes, or when a comment is posted or the article modified. Logged-in users always receive a fresh page customized by their login. That should let us handle “Let It Snow” cases again with little effort, too.
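The caching rules are easy to express in code. What follows is a simplified Python sketch of the behavior described above, not our actual caching system: the `PageCache` class, its `render` callback, and the injectable `clock` are all assumptions made for illustration. Anonymous requests get a stored snapshot that expires after five minutes, `invalidate` stands in for the comment-posted or article-modified hooks, and logged-in requests skip the cache entirely.

```python
import time


class PageCache:
    """Serve stored snapshots to anonymous readers; render fresh for logged-in users."""

    def __init__(self, render, ttl_seconds=300, clock=time.monotonic):
        self._render = render      # hypothetical callback: render(path, user) -> str
        self._ttl = ttl_seconds    # five minutes by default
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # path -> (expires_at, html)

    def get(self, path, user=None):
        if user is not None:
            # Logged-in readers always receive a freshly rendered page.
            return self._render(path, user)
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]        # snapshot is still fresh
        html = self._render(path, None)
        self._store[path] = (now + self._ttl, html)
        return html

    def invalidate(self, path):
        """Drop the snapshot, e.g. after a new comment or an article edit."""
        self._store.pop(path, None)
```

With a cache like this, a burst of anonymous requests for one article costs a single render per five-minute window instead of one set of database lookups per request.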
The Internet is full of strange and perverse beasts, and just when you think you’ve found one’s tail, it turns out to be the mouth, breathing fire at you. We’ve muzzled this particular problem, but will keep alert for future demons.