Enhancing Microservice Resiliency and Reliability on Kubernetes with Istio: A Site Reliability Engineering Perspective

© 2024 by IJCTT Journal
Volume 72, Issue 11
Year of Publication: 2024
Authors: Mourya Chigurupati, Ashwini Jagtap
DOI: 10.14445/22312803/IJCTT-V72I11P103

How to Cite?

Mourya Chigurupati, Ashwini Jagtap, "Enhancing Microservice Resiliency and Reliability on Kubernetes with Istio: A Site Reliability Engineering Perspective," International Journal of Computer Trends and Technology, vol. 72, no. 11, pp. 17-22, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I11P103

Abstract
The adoption of microservice architectures has increased the complexity of ensuring service resiliency and reliability at scale. Kubernetes has become the platform of choice for hosting microservices, and service meshes like Istio offer a powerful solution for managing inter-service communication. While Istio's traffic management and security features are widely recognized, this paper explores its lesser-known capabilities, such as distributed tracing, fault injection, and circuit breakers, which are critical for Site Reliability Engineering (SRE). These features enable SRE teams to enhance system observability, proactively test service failures, and prevent cascading issues, ultimately improving the reliability and resiliency of microservices in production environments. In particular, Istio’s distributed tracing facilitates precise monitoring of service latencies, while fault injection and circuit breakers provide controlled experimentation to test system limits under stress. Integrating Istio into SRE practices allows for building more robust, fault-tolerant, and resilient Kubernetes-based systems, ensuring improved performance and reduced downtime in dynamic microservice environments.

Keywords
Microservices, Istio, Kubernetes, Service Mesh, Site Reliability Engineering (SRE).
