Disaster Recovery in the Cloud: Are You Prepared?
While the importance of choosing the right disaster recovery solution and cloud provider cannot be overstated, having a disaster recovery (DR) runbook is equally important, if not more so. I have been involved in multiple conversations where the customer’s primary focus was implementing the best-suited disaster recovery technology, but the conversation about the DR runbook was either missing completely or lacked key pieces of information. Today, my focus is to lay out a framework for what your DR runbook should look like.
“Eighty percent of businesses affected by a major incident either never re-open or close within 18 months.” (Source: Axa Report)
What is a disaster recovery runbook?
A disaster recovery runbook is a working document that outlines a recovery plan with all the necessary information required for execution of this plan. This document is unique to every organization and can include processes, technical details, personnel information, and other key pieces of information that may not be readily available during a disaster situation.
What should I include in this document?
As previously stated, a runbook is unique to every organization depending on the industry and internal processes, but there is standard information that applies to all organizations and should be included in every runbook. Below is a list of the most important information:
- Version control and change history of the document.
- Contacts with titles, phone numbers, email addresses, and job responsibilities.
- Service provider and vendor list with point of contact, phone numbers, and email addresses.
- Access Control List: application/system access and physical access to offices/data centers.
- Updated organization chart.
- Use case scenarios based on DR testing, i.e., what to do in the event of X, and the chain of events that must take place for recovery.
- Alerts and custom notifications/emails that need to be sent for a failure or DR event.
- Escalation procedures.
- Technical details and explanation of the disaster recovery solution (network layouts, traffic flows, systems and application inventory, backup configurations, etc.).
- Application-based personnel roles and responsibilities.
- Failover and failback procedures, including how to revert to the original environment.
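
If you keep part of your runbook in a machine-readable form, a small script can flag gaps before a disaster does. The following is only a minimal sketch in Python covering a subset of the list above. Every name in it (Runbook, Contact, Revision, missing_sections, unknown_escalation_contacts) is hypothetical and chosen for illustration, not taken from any particular DR product or standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Contact:
    # Hypothetical shape for both staff and vendor contacts.
    name: str
    title: str
    phone: str
    email: str
    responsibilities: str


@dataclass
class Revision:
    # One entry in the document's change history.
    version: str
    changed_on: date
    summary: str


@dataclass
class Runbook:
    # Each field mirrors one item from the checklist; all start empty.
    revisions: list[Revision] = field(default_factory=list)
    contacts: list[Contact] = field(default_factory=list)
    vendors: list[Contact] = field(default_factory=list)
    # Contact names in the order they should be called.
    escalation_order: list[str] = field(default_factory=list)
    # Maps an event ("primary site offline") to its ordered recovery steps.
    scenarios: dict[str, list[str]] = field(default_factory=dict)
    failback_steps: list[str] = field(default_factory=list)


def missing_sections(runbook: Runbook) -> list[str]:
    """Return the names of sections that are still empty."""
    return [name for name, value in vars(runbook).items() if not value]


def unknown_escalation_contacts(runbook: Runbook) -> list[str]:
    """Return escalation names that have no matching contact entry."""
    known = {contact.name for contact in runbook.contacts}
    return [name for name in runbook.escalation_order if name not in known]
```

Running missing_sections on a freshly created Runbook() returns every section name, which makes it a convenient starting checklist. The escalation check is there because an escalation path that points to someone without contact details is of little use during an actual event.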