TAG: postmortem

2016/09/20 16:12 Amazon * Amazon EC2 and Amazon RDS Service Disruption, April 21 2011 * AWS Service Event in the US-East Region, October 22 2012 * A memory leak bug wa…
2016/09/20 16:17 Company Data Centers * Data corruption leading to major downtime of CSC in February 2016 * ING Bank in Romania lost its data center because of a loud noise. pos…
2016/09/20 16:13 Desktop Applications * Eve Online accidentally deleted boot.ini, which rendered the Windows machines unbootable * A post-mortem analysis and description…
2016/09/20 16:15 Embedded Systems * DDoS Light Bulb postmortem
2016/09/20 16:14 Facebook * Outage in September 2010: The propagation of erroneous configurations led to the system DDoS'ing itself. * Outages in September 2015: “Unexpect…
2016/09/20 16:14 Google * Google App Engine * GMail outage in 2009 * Another GMail issue in 2009 * Outages of several services in January 2014 postmortem
2016/09/20 16:14 Heroku * Widespread application outage in April 2011 * Widespread application outage in June 2012 * Dyno outage in February 2012 postmortem
2016/09/20 16:15 KeenIO Keen IO had a case in 2014 with their distributed messaging system Kafka, where the hypothesized fix for a problem actually made it worse. They call i…
2016/09/20 16:11 Microsoft Azure * Service disruption due to leap day bug * Service disruption due to delayed SSL certificate update postmortem
2016/09/20 16:16 Military The F-35 Joint Strike Fighter is a development program by the DoD that faced several issues with software. * IT Hiccups article discussing the case. *…
2016/09/20 16:15 NASA In 1999, NASA lost its Mars Climate Orbiter due to a metric mishap in the software. * Mars Climate Orbiter Homepage * CNN summary article * Offic…
2016/09/20 16:15 OpenStack * Load balancer error caused by network latency resulting from a misconfiguration postmortem
2016/09/20 16:15 Wikipedia * Global Outage (cooling failure and DNS) postmortem
2016/09/20 16:11 911 On April 9th, 2014, parts of the American emergency call system (911) went down due to a software bug. * There is a summary article on IEEE Spectrum. *…