Facebook, Instagram and WhatsApp were all down for nearly six hours on Monday after they were hit by a major outage.
But what went wrong and could it happen again?
Why did Facebook go down?
Facebook, which also owns Instagram and WhatsApp, has apologised for the disruption, which it blamed on a “faulty configuration change”.
In a lengthy statement it said: “Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centres caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centres communicate, bringing our services to a halt.”
The New York Times reported the issue probably stemmed from a misconfiguration of Facebook’s servers, which did not let users connect to its sites.
The problem was compounded when apps – and users – got error messages and kept trying to reconnect, sparking a “tsunami” of additional traffic, according to experts at Cloudflare.
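One common way for apps to avoid feeding this kind of retry surge is to wait progressively longer, with some randomness, between attempts. The sketch below is illustrative only and is not how Facebook's apps or Cloudflare's systems are known to work; the function names, the one-second base delay and the 60-second cap are all assumptions.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Pick a random wait (in seconds) for the given retry attempt.

    The upper bound doubles with each attempt (1s, 2s, 4s, ...) up to `cap`,
    and the actual wait is drawn uniformly below that bound so that many
    clients do not all retry at the same instant.
    """
    ceiling = min(cap, base * 2 ** attempt)
    return rng.uniform(0, ceiling)


def fetch_with_backoff(request, max_attempts=5, sleep=time.sleep):
    """Call `request()` and retry on ConnectionError, waiting between tries.

    `request` is a hypothetical zero-argument callable standing in for
    whatever network call an app makes.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            sleep(backoff_delay(attempt))
```

An app that retries immediately and indefinitely adds load exactly when a service is least able to cope; spacing the attempts out spreads that load over time.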
The outage also left some Facebook staff unable to enter buildings or use internal communications. “Facebook basically locked its keys in its car,” tweeted Jonathan Zittrain, director of Harvard’s Berkman Klein Centre for Internet and Society.
Could it happen again?
In short, yes. This is not the first time Facebook has suffered a major outage. In April 2019 its apps went down for about two hours before they were gradually brought back online, and it was roughly 24 hours before they were fully functional.
On that occasion Facebook also blamed a “server configuration change”, which suggests the latest outage was similar in nature.
But while the server issues are the most visible symptom, they are caused by underlying technological issues such as a bug or human error. That means a similar outage could happen again.
What alternatives did people turn to?
Unsurprisingly, the collapse of Facebook, WhatsApp and Instagram sparked a flood of internet traffic to rival social media apps.
Data from Cloudflare shows search queries for Twitter, Signal, Telegram and TikTok all surged as the outage dragged on.
Signal, the privacy-focused private messaging app used by Edward Snowden, said it had millions of new sign-ups on Monday. Meanwhile Telegram users complained of the app slowing down as people migrated from WhatsApp.
Twitter stayed online, with boss Jack Dorsey poking fun at his rival and endorsing Signal.
Twitter Support tweeted: “Sometimes more people than usual use Twitter. We prepare for these moments, but today things didn’t go exactly as planned. Some of you may have had an issue seeing replies and DMs as a result. This has been fixed. Sorry about that!”
It had earlier joked: “Hello literally everyone.”
Was this the worst outage ever?
Monday’s outage left users unable to access Facebook, WhatsApp or Instagram for almost six hours.
The shutdown was also significant in that it appeared to be a blanket issue, with access blocked for all users.
During an outage in April 2019, Facebook managed to restore partial access for some users within a few hours, but others were left unable to use the apps for a full 24 hours.
Once again, Facebook was forced to tweet updates about the problems.
But its worst outage came in 2008, when a bug knocked the site offline for all users for about 24 hours. However, back then the platform only had about 80m users, while the total is now more than 3bn.
Will there be regulatory implications?
The most immediate impact for Facebook was a financial one, as the outage wiped nearly $50bn (£36bn) off its stock market value.
Shares in the New York-listed company dropped 5pc as the problems persisted, reducing the paper wealth of Mark Zuckerberg, Facebook’s founder and chief executive, by $7bn.
But the technical hiccups could pose a bigger problem for Facebook, drawing attention to its significant market power at a time of heightened regulatory scrutiny.
The simultaneous collapse of three of the world’s most important internet services due to a single server error is likely to raise questions over whether the company has become too big.
Critics may also point out that the problem was compounded by Facebook’s reliance on its own internal systems – a factor that meant its staff were initially unable to resolve the issue.
This could raise questions about whether the company should face regulation over the way its infrastructure is designed and managed.
Adam Leon Smith, a software testing expert from BCS, the Chartered Institute for IT, said: “The outage is caused by changes made to the Facebook network infrastructure. Many of the recent high-profile outages have been caused by similar network level events.
“It is reported by unidentified Facebook sources on Reddit that the network changes have also prevented engineers from remotely connecting to resolve the issues, delaying resolution.
“Notably, many organisations now define their physical infrastructure as code, but most do not apply the same level of testing rigour when they change that code, as they would when changing their core business logic.”
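To illustrate what testing infrastructure "as code" might look like in practice, the sketch below checks a proposed network change before it is applied. It is a simplified, hypothetical example rather than a description of Facebook's tooling: the configuration layout, the field names `routers`, `advertise_routes` and `out_of_band_access`, and the two rules checked are all assumptions.

```python
def validate_change(config):
    """Return a list of problems found in a proposed network configuration.

    `config` is a hypothetical dictionary describing the network after the
    change. An empty list means no problems were found by these checks.
    """
    problems = []

    # Rule 1 (assumed): at least one router must keep announcing routes,
    # otherwise the network becomes unreachable from outside.
    routers = config.get("routers", [])
    if not any(r.get("advertise_routes") for r in routers):
        problems.append("no router would advertise routes")

    # Rule 2 (assumed): engineers must keep a separate way in to undo
    # a bad change.
    if not config.get("out_of_band_access", False):
        problems.append("out-of-band management access would be disabled")

    return problems


# Example: a change that switches everything off is rejected before rollout.
proposed = {
    "routers": [{"name": "r1", "advertise_routes": False}],
    "out_of_band_access": False,
}
for problem in validate_change(proposed):
    print("REJECTED:", problem)
```

Running checks like these automatically on every change is the kind of testing rigour that organisations typically already apply to their application code.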