Overcoming Challenges in Deploying Large Language Models for Generative AI Use Cases: The Role of Containers and Orchestration

© 2024 by IJCTT Journal
Volume-72 Issue-2
Year of Publication: 2024
Authors: Sriramaraju Sagi
DOI: 10.14445/22312803/IJCTT-V72I2P114

How to Cite?

Sriramaraju Sagi, "Overcoming Challenges in Deploying Large Language Models for Generative AI Use Cases: The Role of Containers and Orchestration," International Journal of Computer Trends and Technology, vol. 72, no. 2, pp. 75-81, 2024. Crossref, https://doi.org/10.14445/22312803/IJCTT-V72I2P114

Abstract
This research examines deploying Large Language Models (LLMs) on converged infrastructure, focusing on container orchestration platforms such as Kubernetes and OpenShift. It discusses the challenges of implementing LLMs, including scalability, performance, and security considerations, and argues that containers can effectively address them. It then explores the benefits of deploying LLMs in containers: elastic scalability, optimized resource utilization, enhanced flexibility, increased portability, and strengthened security. It also examines the role SUSE Rancher plays in managing containerized applications to ensure both security and scalability. The validation and analysis section assesses a study that uses the FlexPod converged-infrastructure platform to evaluate LLM models across container orchestration platforms, demonstrating the practicality and advantages of integrating FlexPod Datacenter.
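To make the containerized-deployment idea concrete, the sketch below builds a minimal Kubernetes Deployment manifest for an LLM inference service as a plain Python dictionary. This is an illustrative example only, not taken from the paper: the image name, port, and service name are hypothetical, and `nvidia.com/gpu` assumes the standard NVIDIA device plugin is installed on the cluster. The `replicas` field is the knob an orchestrator (Kubernetes, OpenShift, or Rancher-managed clusters) turns to scale the model horizontally.

```python
import json

def llm_deployment_manifest(name="llm-inference",
                            image="ghcr.io/example/llm-server:latest",
                            replicas=2, gpus_per_pod=1):
    """Build a Kubernetes Deployment manifest (as a dict) for an LLM
    inference service. Image, name, and port are illustrative placeholders."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            # Horizontal scaling: the orchestrator keeps this many pods running.
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": 8080}],
                        # GPU request via the NVIDIA device-plugin resource name;
                        # assumes the plugin is deployed on the cluster.
                        "resources": {"limits": {"nvidia.com/gpu": gpus_per_pod}},
                    }]
                },
            },
        },
    }

if __name__ == "__main__":
    # Serialize to JSON; `kubectl apply -f` accepts JSON as well as YAML.
    print(json.dumps(llm_deployment_manifest(), indent=2))
```

In practice the same manifest could be applied with `kubectl apply -f`, and an autoscaler could adjust `replicas` in response to inference load, which is the scalability benefit the abstract highlights.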

Keywords
Large Language Models (LLM), Containerization, Scalability, Datacenter, Kubernetes.
