Chaos engineering for fault tolerance in cloud services: A resilience perspective

Saravanakumar Baskaran *

Independent Researcher, Seattle, USA.
 
 
Review
World Journal of Advanced Engineering Technology and Sciences, 2021, 02(01), 140–144.
Article DOI: 10.30574/wjaets.2021.2.1.0029
Publication history: 
Received on 06 February 2021; revised on 18 March 2021; accepted on 21 March 2021
 
Abstract: 
As cloud services play an essential role in supporting global digital infrastructure, ensuring system resilience is paramount. Cloud environments are prone to failure because of their complex, distributed nature. Chaos Engineering has emerged as a proactive method for testing fault tolerance: it introduces deliberate disruptions into cloud systems to assess how well they withstand and recover from unforeseen issues. By simulating failures in a controlled environment, Chaos Engineering helps organizations build more robust cloud systems that mitigate risks and maintain service continuity. This paper explores the principles behind Chaos Engineering, its application in cloud services, and its contribution to resilience and fault tolerance. Through case studies and examples, we illustrate how Chaos Engineering enables cloud service providers to anticipate system failures and ensure greater reliability.
 
Keywords: 
Chaos Engineering; Fault Tolerance; Cloud Services; Resilience; Failure Simulation; Distributed Systems; High Availability; Fault Injection; Cloud Computing
 