Around noon Eastern time on 4 October 2021, Facebook, Instagram, and WhatsApp suddenly vanished from the Internet and remained offline for much of the day. While no one outside Facebook yet knows why it happened, Internet infrastructure company Cloudflare has an explanation of what went wrong. It all traces back to an obscure but vital networking protocol called the Border Gateway Protocol (BGP).
BGP enables big networks to tell each other how to route packets. Shortly before noon Eastern on 4 October 2021, BGP messages were sent that withdrew routes to Facebook’s DNS servers, effectively taking them offline. DNS is short for Domain Name System, and it’s the Internet protocol that lets you type tidbits.com to visit our site rather than having to know and enter our IP address of 18.104.22.168. Once DNS servers around the world could no longer match facebook.com to an appropriate IP address, there was no way to connect to any Facebook server.
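To make that chain of events concrete, here is a minimal toy model in Python of how withdrawing a route can make a name unresolvable. It is only a sketch under stated assumptions, not how real BGP routers or DNS resolvers are implemented: the ToyRoutingTable class, the resolve function, the example.com name, the documentation-only IP ranges, and the AS64500 label are all hypothetical stand-ins chosen for illustration.

```python
import ipaddress


class ToyRoutingTable:
    """Hypothetical stand-in for a router's view of BGP-announced prefixes."""

    def __init__(self):
        self.routes = {}  # network prefix -> label of the network announcing it

    def announce(self, prefix, origin):
        # A BGP announcement says "send traffic for this prefix my way".
        self.routes[ipaddress.ip_network(prefix)] = origin

    def withdraw(self, prefix):
        # A BGP withdrawal removes that path from the table.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        # Longest-prefix match: the most specific covering prefix wins.
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


def resolve(name, nameservers, table, zone):
    """Return the address for name, or None if no name server is reachable."""
    for ns in nameservers:
        if table.lookup(ns) is not None:  # can packets reach this name server?
            return zone.get(name)
    return None


# Illustrative data only: documentation address ranges and a made-up zone.
table = ToyRoutingTable()
table.announce("192.0.2.0/24", "AS64500")
zone = {"example.com": "203.0.113.10"}
nameservers = ["192.0.2.53"]

before = resolve("example.com", nameservers, table, zone)
table.withdraw("192.0.2.0/24")
after = resolve("example.com", nameservers, table, zone)

print(before)  # the name resolves while the route is announced
print(after)   # None once the route to the name server is withdrawn
```

Note that the web server’s own address never changes in this sketch. The site disappears only because nothing can reach the name server that knows the address.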
Throughout the day, intermittent outages were also reported for many other services, including Apple Pay, Gmail, and Twitter, along with cellular carriers, as you can see in the Downdetector screenshot below. That likely happened because both apps and users kept trying to connect to Facebook, which strained DNS servers and led to collateral issues all over the Internet. Facebook resolved the routing and lookup problems for its servers after about 5 hours, and Instagram came back an hour or so later.
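A rough back-of-the-envelope sketch shows why repeated retries can strain DNS servers. The cache lifetime, retry interval, and time window below are made-up numbers chosen purely for illustration, not measurements from the outage, and the queries_in_window function is a hypothetical helper rather than a model of any real app or resolver.

```python
def queries_in_window(window_s, cache_ttl_s, retry_interval_s, resolving):
    """Count the DNS queries one client sends during window_s seconds.

    When lookups succeed, the answer is cached and re-queried only after
    cache_ttl_s seconds. When they fail, the client is assumed to retry
    every retry_interval_s seconds with nothing cached.
    """
    interval = cache_ttl_s if resolving else retry_interval_s
    # One query at time zero, then one more each time the interval elapses.
    return 1 + (window_s - 1) // interval


# Hypothetical values: one hour, a 5-minute cache, retries every 5 seconds.
healthy = queries_in_window(3600, 300, 5, resolving=True)
failing = queries_in_window(3600, 300, 5, resolving=False)

print(healthy, failing, failing // healthy)
```

Under these assumed numbers, a single client that cannot get an answer sends many times more queries than one that can, and that multiplier applies across every app and device retrying at once.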
Cloudflare’s blog post neatly summarizes the lesson from the day:
Today’s events are a gentle reminder that the Internet is a very complex and interdependent system of millions of systems and protocols working together. That trust, standardization, and cooperation between entities are at the center of making it work for almost five billion active users worldwide.