AI/ML

Autoscaling Generative AI Workloads

at KCD Praha 24

Short lightning talk about using KEDA as an autoscaler for AI/ML workloads. A Stable Diffusion model that generates images from text prompts was used as the example workload. The demo application scaled the worker pods based on the length of a message queue. I also briefly talk about the pitfalls of GPU-intensive workloads on K8s.
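For context, the queue-based scaling shown in the demo can be expressed as a KEDA ScaledObject roughly like the sketch below. This is a minimal illustration, not the exact demo config: the RabbitMQ trigger, queue name, deployment name, and replica limits are assumptions for the example.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sd-worker-scaler            # hypothetical name for illustration
spec:
  scaleTargetRef:
    name: sd-worker                 # Deployment running the Stable Diffusion workers (assumed name)
  minReplicaCount: 0                # scale to zero when the queue is empty
  maxReplicaCount: 4                # cap on GPU worker pods
  triggers:
    - type: rabbitmq                # RabbitMQ chosen as an example broker
      metadata:
        queueName: image-requests   # queue holding the text prompts
        mode: QueueLength
        value: "5"                  # target ~5 pending messages per worker pod
      authenticationRef:
        name: rabbitmq-auth         # TriggerAuthentication with broker credentials
```

With a trigger like this, KEDA adds worker pods as the prompt queue grows and removes them (down to zero) once the backlog is drained.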

( recording )

KEDA AI/ML KCD kubernetes 2024