Control table

The table below lists the operational controls covered in this guide, organized by category and by maturity level (level 1 through level 5). Use it to identify the controls most relevant to your organization and to track your progress as you implement them.

| Category | Control name | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
| --- | --- | --- | --- | --- | --- | --- |
| Data backup and recovery | Establish a regular backup schedule for critical data | | | | | |
| Data backup and recovery | Store backups in multiple locations (offsite and/or cloud-based storage) | | | | | |
| Data backup and recovery | Implement a versioning system to track and restore previous versions of data | | | | | |
| Data backup and recovery | Encrypt backups to protect sensitive data | | | | | |
| Data backup and recovery | Test backup and recovery processes periodically to ensure data integrity | | | | | |
| Network redundancy and failover | Implement redundant network connections to prevent single points of failure | | | | | |
| Network redundancy and failover | Use load balancers to distribute traffic evenly across resources | | | | | |
| Network redundancy and failover | Employ network failover solutions (e.g., redundant routers, switches) | | | | | |
| Network redundancy and failover | Monitor network performance and latency to detect potential issues | | | | | |
| Network redundancy and failover | Test network redundancy and failover processes to ensure proper functioning | | | | | |
| Infrastructure monitoring and alerting | Implement a monitoring system to track the health and performance of cloud infrastructure | | | | | |
| Infrastructure monitoring and alerting | Set up alerts for critical events and performance thresholds | | | | | |
| Infrastructure monitoring and alerting | Monitor resource usage to identify potential bottlenecks and capacity issues | | | | | |
| Infrastructure monitoring and alerting | Establish a centralized logging system to collect and analyze logs from various components | | | | | |
| Infrastructure monitoring and alerting | Regularly review monitoring data to identify trends and improve infrastructure resilience | | | | | |
| Incident response planning | Develop a formal incident response plan, including roles and responsibilities | | | | | |
| Incident response planning | Establish a communication plan for internal and external stakeholders during incidents | | | | | |
| Incident response planning | Perform regular incident response drills to test and refine the plan | | | | | |
| Incident response planning | Document lessons learned from incidents and update the incident response plan accordingly | | | | | |
| Incident response planning | Provide training for staff on incident response processes and best practices | | | | | |
| Capacity planning and scaling | Regularly assess infrastructure capacity and plan for growth | | | | | |
| Capacity planning and scaling | Implement auto-scaling strategies to handle fluctuating workloads | | | | | |
| Capacity planning and scaling | Use load testing to identify capacity limits and potential bottlenecks | | | | | |
| Capacity planning and scaling | Monitor resource usage to anticipate and address potential capacity issues | | | | | |
| Capacity planning and scaling | Review and update capacity plans based on changing business requirements and growth | | | | | |
| Security and access controls | Implement strong authentication and authorization mechanisms | | | | | |
| Security and access controls | Regularly review and update user access permissions | | | | | |
| Security and access controls | Enable encryption for data at rest and in transit | | | | | |
| Security and access controls | Apply security patches and updates promptly | | | | | |
| Security and access controls | Conduct regular vulnerability assessments and penetration testing | | | | | |
| Application resiliency and fault tolerance | Design applications to be stateless and horizontally scalable | | | | | |
| Application resiliency and fault tolerance | Implement circuit breakers and retries to handle transient faults | | | | | |
| Application resiliency and fault tolerance | Use health checks and load balancing to distribute traffic among instances | | | | | |
| Application resiliency and fault tolerance | Isolate application components to limit the impact of failures | | | | | |
| Application resiliency and fault tolerance | Monitor application performance and error rates to identify potential issues | | | | | |
| Data center and geographic redundancy | Deploy infrastructure across multiple data centers or availability zones | | | | | |
| Data center and geographic redundancy | Use geo-replication to store data redundantly across different regions | | | | | |
| Data center and geographic redundancy | Implement global load balancing to distribute traffic across data centers | | | | | |
| Data center and geographic redundancy | Test failover processes between data centers to ensure smooth recovery | | | | | |
| Data center and geographic redundancy | Regularly review and update data center redundancy strategies based on evolving needs | | | | | |
| Regular resilience testing and validation | Conduct regular disaster recovery and failover tests | | | | | |
| Regular resilience testing and validation | Use chaos engineering techniques to simulate failures and test system resilience | | | | | |
| Regular resilience testing and validation | Test backup and recovery processes to validate data integrity | | | | | |
| Regular resilience testing and validation | Perform load and stress tests to identify capacity limits and potential bottlenecks | | | | | |
| Regular resilience testing and validation | Use the results of testing to inform updates and improvements to infrastructure resilience | | | | | |
| Documentation and knowledge sharing | Document architecture, processes, and best practices for cloud resilience | | | | | |
| Documentation and knowledge sharing | Maintain a centralized knowledge base for easy access to documentation | | | | | |
| Documentation and knowledge sharing | Regularly review and update documentation to reflect changes and improvements | | | | | |
| Documentation and knowledge sharing | Encourage knowledge sharing and collaboration among team members | | | | | |
| Documentation and knowledge sharing | Provide training and resources to help staff stay informed about resilience | | | | | |
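
One way to track progress against the table is to keep the controls in a small data structure and summarize them per category. The following is a minimal sketch, not part of the guide itself. The `Control` class, the `set_level` and `category_summary` helpers, and the convention that level 0 means "not started" are all assumptions made for illustration; the guide does not define what each maturity level requires.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Control:
    """One row of the control table (hypothetical representation)."""
    category: str
    name: str
    level: int = 0  # Assumption: 0 = not started, 1-5 = maturity level reached.


def set_level(controls, name, level):
    """Record the maturity level reached for the control with the given name."""
    if not 0 <= level <= 5:
        raise ValueError("level must be between 0 and 5")
    for control in controls:
        if control.name == name:
            control.level = level
            return
    raise KeyError(name)


def category_summary(controls):
    """Return {category: (lowest level, average level)} over its controls."""
    by_category = defaultdict(list)
    for control in controls:
        by_category[control.category].append(control.level)
    return {
        category: (min(levels), sum(levels) / len(levels))
        for category, levels in by_category.items()
    }


# A few rows from the table; a real tracker would load all of them.
controls = [
    Control("Network redundancy and failover",
            "Use load balancers to distribute traffic evenly across resources"),
    Control("Network redundancy and failover",
            "Monitor network performance and latency to detect potential issues"),
    Control("Incident response planning",
            "Perform regular incident response drills to test and refine the plan"),
]

set_level(controls, "Use load balancers to distribute traffic evenly across resources", 3)
set_level(controls, "Monitor network performance and latency to detect potential issues", 1)

for category, (lowest, average) in category_summary(controls).items():
    print(f"{category}: lowest level {lowest}, average {average:.1f}")
```

Reporting the lowest level alongside the average is a design choice: an average can hide a single neglected control, while the minimum shows the weakest point in each category.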