Imaginary monitors all client websites with a sophisticated system that checks a variety of services to ensure each website is up and performing well. When any of these services falls outside of tolerance, the on-call technology staff is paged, as is Lisa King, Imaginary's President and Chief Quality Officer.
"It was 9:30pm and I was just about to call it a night when my phone started buzzing," said Lisa. "It was clear from the number and characteristics of the alerts that this was a significant incident."
Following Imaginary's emergency response protocol, Lisa and Joe Jasinski, Imaginary's Technology Manager, connected in a special chat channel to assess the scope. Right away, they saw that the alerts were coming from websites within a specific datacenter at Rackspace. A check of the Rackspace status page confirmed:
"On 26 May 2016, at 21:26 CDT, engineers were alerted to a switching loop occurring in the DFW1 data center. Engineers are engaged and working to resolve the issue. During this time, Customers may be unable to access their Cloud instances hosted within the DFW1 data center."
By 10:45pm, the list of affected client websites was down to one, and that client was informed of the outage and given a status update. In addition, Imaginary staff reached out directly to Rackspace to share information and expedite corrective action.
By 12:50am, the last site was back up. A quick click-through showed that all services were functioning properly again, and this was communicated to the client.
In the morning, the client contact awoke to two notifications: one for the initial incident and one for its resolution.
Incidents like this are rare, but they do happen from time to time. When they do, it is an important part of Imaginary's service level to be notified immediately, facilitate correction, and communicate with clients, no matter the time of day or night.