Most organizations I’ve worked with had a small IT and engineering team of fewer than eight people and were budget-constrained. I will describe a disaster recovery plan based on those constraints for an on-premises information system (physical infrastructure, not cloud), and then recommend technologies that suit the plan.
(1) Risk assessment and analysis
_ Identify potential risks and threats to the information system, including natural disasters, cyber-attacks, hardware failures, theft, and fire.
_ Classify these risks by their likelihood and their impact on operations.
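One simple way to do the classification is a likelihood-times-impact score. The sketch below is only an illustration: the 1 to 5 scales, the thresholds, and the example ratings are my own assumptions, not values from any standard.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def classify(risk: Risk) -> str:
    # Thresholds are illustrative assumptions; tune them to your organization.
    if risk.score >= 15:
        return "high"
    if risk.score >= 6:
        return "medium"
    return "low"


# Hypothetical ratings for the risks listed above.
risks = [
    Risk("hardware failure", 4, 4),
    Risk("cyber-attack", 3, 5),
    Risk("fire", 1, 5),
    Risk("theft", 2, 3),
    Risk("natural disaster", 1, 5),
]

# Highest score first, so the team reviews the worst risks first.
ranked = sorted(risks, key=lambda r: r.score, reverse=True)
for r in ranked:
    print(f"{r.name:18} score={r.score:2} -> {classify(r)}")
```

A rare but severe event such as fire scores low on this scale, so a small team may still want to review high-impact risks separately.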
(2) Disaster Recovery Committee
_ Ensure that every committee member understands their role in the event of a disaster.
(3) Notification procedures
_ Establish a clear communication channel that notifies all team members simultaneously (rather than through a call tree), so that everyone is informed promptly when a disaster occurs.
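To show the difference from a call tree, here is a minimal sketch of a simultaneous fan-out. The `send_alert` function is a hypothetical placeholder; a real version would call whatever SMS, email, or chat service the team uses.

```python
from concurrent.futures import ThreadPoolExecutor


def send_alert(contact: str, message: str) -> tuple[str, bool]:
    # Hypothetical placeholder: replace with a real SMS, email, or chat call.
    print(f"alert to {contact}: {message}")
    return contact, True


def notify_all(contacts: list[str], message: str) -> dict[str, bool]:
    # Every alert is submitted at once; nobody waits for a colleague to relay it.
    with ThreadPoolExecutor(max_workers=max(1, len(contacts))) as pool:
        results = pool.map(lambda c: send_alert(c, message), contacts)
        return dict(results)


team = ["alice", "bob", "carol"]  # example names only
delivered = notify_all(team, "Server room power failure, DR plan activated")

# Anyone whose alert failed can be followed up by phone.
unreached = [name for name, ok in delivered.items() if not ok]
```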
(4) Recovery procedures
_ Outline step-by-step procedures for recovering each component, and prioritize the critical components so that they are recovered first.
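One way to turn the priorities into a recovery sequence is to combine them with component dependencies, so that nothing is restored before the systems it relies on. The components, dependencies, and priority numbers below are hypothetical; the ordering uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Each component maps to the components it depends on (hypothetical example).
dependencies = {
    "network": set(),
    "storage": set(),
    "directory service": {"network"},
    "database": {"network", "storage"},
    "file server": {"storage", "directory service"},
    "business application": {"database", "directory service"},
}

# Lower number means more critical (assumed values for illustration).
priority = {
    "network": 1,
    "storage": 1,
    "database": 1,
    "directory service": 2,
    "business application": 1,
    "file server": 3,
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

recovery_order = []
while sorter.is_active():
    # Among the components whose dependencies are already restored,
    # recover the most critical first (name breaks ties).
    ready = sorted(sorter.get_ready(), key=lambda c: (priority[c], c))
    for component in ready:
        recovery_order.append(component)
        sorter.done(component)

print(recovery_order)
```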
(5) Reconstitution phase
_ Plan the restoration of normal operations, either while the risks are being eliminated or once they have been eliminated.
(6) Ongoing maintenance and testing
_ Regularly review and update the disaster recovery plan.
_ Conduct mock drills to identify any weaknesses and make the necessary improvements.
Given the small size and budget constraints, the following technologies are recommended:
(1) Backup solutions
_ Cost-effective options such as Google Drive, Dropbox, or a specialized cloud backup provider can be used. For on-premises backups, secondary RAID storage or external hard disks can be utilized.
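Whichever destination is chosen, a backup is only useful if the copy is intact. As a rough sketch, the script below copies a file to a backup directory (for example, a mounted external disk) and checks the copy against a SHA-256 checksum. The file and directory names are made up, and a temporary directory stands in for the external disk.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    # Read in chunks so large files do not have to fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies content and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"checksum mismatch for {source}")
    return target


# Demonstration with a temporary directory standing in for the external disk.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "payroll.db"  # hypothetical file name
    source.write_bytes(b"example data")
    copy = backup_file(source, Path(tmp) / "external_disk")
    copy_name = copy.name
    copy_matches = copy.read_bytes() == source.read_bytes()
```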
(2) Cloud solutions
_ Cloud services (e.g., EC2, S3, Azure Blob) can serve as secondary solutions for some services, such as servers or storage.
(3) Redundant Systems
_ Implement standby redundant systems that are activated when the primary systems fail, covering the fiber-optic internet link, servers, backups, power supply, and so on.
(4) Automation
_ Tools like Acronis, Veeam, or Windows Backup can run backups at scheduled intervals.
_ Automate re-routing to the cloud server if a physical server fails.
_ Automate the switch to generators or UPS power if the main power is cut.
_ Automate the routing of inbound and outbound internet traffic.
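The re-routing decision can be sketched as a small health-check monitor. This is only an outline under my own assumptions: the three-failure threshold, the "primary" and "cloud" labels, and the automatic fail-back are illustrative choices, and the actual switch (DNS change, load balancer update, and so on) is left out.

```python
import socket
from dataclasses import dataclass


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    # A basic TCP check; returns False on refusal, timeout, or DNS failure.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class FailoverMonitor:
    threshold: int = 3       # consecutive failed checks before switching (assumed)
    failures: int = 0
    active: str = "primary"

    def record(self, primary_healthy: bool) -> str:
        if primary_healthy:
            # Fail back as soon as the physical server answers again.
            self.failures = 0
            self.active = "primary"
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.active = "cloud"
        return self.active


# Simulated check results: healthy, three failures in a row, then recovery.
monitor = FailoverMonitor()
history = [monitor.record(ok) for ok in [True, False, False, False, True]]
print(history)
```

In practice, `record` would be fed by `is_reachable` on a timer. Requiring several consecutive failures avoids switching to the cloud server because of a single dropped check.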