International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 72 | Issue 6 | Year 2024 | Article Id. IJCTT-V72I6P104 | DOI : https://doi.org/10.14445/22312803/IJCTT-V72I6P104

Harnessing Chaos: The Role of Chaos Engineering in Cloud Applications and Impacts on Site Reliability Engineering


Rahul Yadav

Received: 19 Apr 2024 | Revised: 23 May 2024 | Accepted: 04 Jun 2024 | Published: 15 Jun 2024

Citation

Rahul Yadav, "Harnessing Chaos: The Role of Chaos Engineering in Cloud Applications and Impacts on Site Reliability Engineering," International Journal of Computer Trends and Technology (IJCTT), vol. 72, no. 6, pp. 25-30, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I6P104

Abstract

In cloud computing, where reliability and resilience are paramount, the idea of deliberately introducing chaos might seem counterintuitive. In practice, chaos is actively harnessed to ensure that systems are robust and can withstand unexpected failures. The methodology behind this approach is Chaos Engineering: a disciplined approach to experimenting on distributed systems in order to build confidence in their resilience. It involves intentionally introducing controlled disruptions or failures into a system and observing how the system responds under adverse conditions. This paper examines how Chaos Engineering techniques can be integrated into cloud-based systems and how that integration affects Site Reliability Engineering (SRE) practices. By simulating real-world failures in a controlled environment, organizations can identify weaknesses, uncover hidden dependencies, and improve the overall reliability of their systems.
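
The experiment loop the abstract describes (establish a steady state, inject a controlled failure, observe the response, restore the system) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `Replica`, `Service`, `success_rate`, `run_experiment`, and the 0.99 threshold are hypothetical names and values, the "system" is an in-memory toy, and none of this reflects the API of Azure Chaos Studio or any other chaos tool.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one instance of a cloud service."""
    name: str
    healthy: bool = True

    def handle(self, request_id: int) -> str:
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served request {request_id}"


class Service:
    """Toy service that fails over to the next replica when one is down."""

    def __init__(self, replicas):
        self.replicas = replicas

    def call(self, request_id: int) -> str:
        for replica in self.replicas:
            try:
                return replica.handle(request_id)
            except ConnectionError:
                continue  # try the next replica
        raise RuntimeError("all replicas failed")


def success_rate(service, requests=100):
    """Fraction of requests the service answers successfully."""
    ok = 0
    for request_id in range(requests):
        try:
            service.call(request_id)
            ok += 1
        except RuntimeError:
            pass
    return ok / requests


def run_experiment(service, victim, threshold=0.99):
    """Measure steady state, inject a fault, observe, then always roll back."""
    baseline = success_rate(service)
    victim.healthy = False  # the controlled disruption
    try:
        under_fault = success_rate(service)
    finally:
        victim.healthy = True  # restore the system even if observation fails
    return {
        "baseline": baseline,
        "under_fault": under_fault,
        "hypothesis_holds": under_fault >= threshold,
    }


if __name__ == "__main__":
    replicas = [Replica("replica-a"), Replica("replica-b")]
    print(run_experiment(Service(replicas), victim=replicas[0]))
```

With two replicas the hypothesis holds because the service fails over; running the same experiment against a single-replica service exposes the missing redundancy, which is the kind of weakness such experiments are meant to surface.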

Keywords

Azure Chaos Studio, Cloud technologies, Chaos Engineering, Enterprise architecture, Site Reliability Engineering.
