Talk
Intermediate
First Talk

Infrastructure Insights for GenAI: Balancing Scalability and Cost

Rejected

Session Description

As the adoption of Generative AI increases, open-source models such as Llama 3, StableLM, and Falcon provide flexibility and mitigate privacy concerns. However, deploying these models at scale poses significant challenges, including optimizing infrastructure costs, managing GPU resources, and ensuring scalability. This session covers key strategies for building and scaling infrastructure tailored to open-source GenAI systems. We will explore the role of Kubernetes in managing and scaling LLM workloads; GPU optimization techniques such as MPS, time-slicing, and DRA; and real-world deployment examples from OpenAI and xAI. The session will also outline critical aspects of observability, AI governance, and security needed for robust, cost-effective deployments. Attendees will leave with concrete steps for running scalable, resilient GenAI workloads while maximising open-source tooling and infrastructure efficiency.
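As one concrete illustration of the GPU time-slicing technique named above: the NVIDIA Kubernetes device plugin can be configured to advertise a single physical GPU as several schedulable replicas, so multiple LLM inference pods share one card. The sketch below builds such a config as a Python dict following the device plugin's time-slicing format; the replica count and pod resource request are illustrative assumptions, not a recommended production setting.

```python
import json


def time_slicing_config(replicas: int) -> dict:
    """Sketch of an NVIDIA device-plugin time-slicing config.

    One physical GPU is advertised as `replicas` schedulable
    `nvidia.com/gpu` units, letting several pods share it.
    The replica count here is an illustrative assumption.
    """
    return {
        "version": "v1",
        "sharing": {
            "timeSlicing": {
                "resources": [
                    {"name": "nvidia.com/gpu", "replicas": replicas}
                ]
            }
        },
    }


# A pod then requests one slice of the shared GPU exactly as it
# would request a dedicated one:
pod_resources = {"limits": {"nvidia.com/gpu": 1}}

print(json.dumps(time_slicing_config(4), indent=2))
```

In practice this dict would be serialized into the ConfigMap consumed by the device plugin; the trade-off is that time-sliced pods get no memory isolation, which is why the talk pairs it with MPS and DRA as alternatives.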

Key Takeaways

None

References

Session Categories

FOSS

Reviews

0 %
Approvability
0
Approvals
1
Rejections
0
Not Sure
Buzzwords
Reviewer #1
Rejected