When young engineers raised on the raw power and complexity of Kubernetes, Argo Workflows, and MLflow step into the world of managed ML platforms like Vertex AI and SageMaker, the experience is as enlightening as it is humbling. In this talk, we will walk you through our journey of exploring these managed platforms after months of working solely with OSS-based, cloud-native MLOps stacks.
We share how these platforms surprised us, helped us, and—at times—confused us. With a strong grounding in the nuts and bolts of infrastructure, orchestration, and monitoring, we came into Vertex AI with technical curiosity, not just to use it, but to understand what lies beneath its glossy surface. The result? A deeply reflective comparison of control versus convenience, abstraction versus visibility, and learning versus leveraging.
This is not a talk that evangelizes managed services. Rather, it seeks to demystify them by contrasting them with their open-source counterparts. While there is no way to look directly under the hood of Vertex AI, we drew on our prior experience with Kubernetes and OSS MLOps tools to infer what these managed abstractions might be doing behind the scenes. For every ease-of-use feature, we critically examined the trade-offs in control, flexibility, and observability that come with it.
Example scenarios:
How do you trace the underlying K8s orchestration in Vertex AI Pipelines, compared to directly managing Argo Workflows? (See the first sketch after this list.)
What happens when implicit resource management and autoscaling in Vertex AI abstract away the explicit YAML and Terraform configs you’re used to?
How does MLflow’s experiment tracking compare to the built-in tracking of a managed platform? (See the second sketch after this list.)
When debugging pipeline failures, which observability features are lost, and which are gained?
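To make the first scenario concrete, here is a minimal sketch, assuming the KFP v2 SDK (all names and values are hypothetical). The same pipeline DSL runs on an OSS Kubeflow Pipelines install, where the generated Argo Workflow and its pods are yours to inspect with kubectl, and on Vertex AI Pipelines, where they are not:

```python
# Minimal KFP v2 pipeline sketch; names and values are hypothetical.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def train(learning_rate: float) -> float:
    # Stand-in training step; returns a dummy metric.
    return 1.0 - learning_rate

@dsl.pipeline(name="demo-pipeline")
def demo_pipeline(learning_rate: float = 0.1):
    train_task = train(learning_rate=learning_rate)
    # On your own cluster these hints become pod specs you can read back;
    # on Vertex AI they feed machine provisioning you never see.
    train_task.set_cpu_limit("2").set_memory_limit("4G")

if __name__ == "__main__":
    # The compiled spec is what Vertex AI Pipelines executes; with OSS
    # Kubeflow you can instead inspect the resulting Argo Workflow CRD.
    compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```

The resource hints are also where the second scenario shows up: what used to be explicit YAML and Terraform becomes a few method calls whose downstream effects are opaque.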
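For the experiment-tracking scenario, here is a side-by-side sketch, assuming the mlflow and google-cloud-aiplatform SDKs (the project ID, tracking URI, and values are hypothetical):

```python
# OSS: MLflow. You run and own the tracking server and its backing store.
import mlflow

mlflow.set_tracking_uri("http://localhost:5000")  # hypothetical server
mlflow.set_experiment("demo-experiment")
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.1)
    mlflow.log_metric("accuracy", 0.92)

# Managed: Vertex AI Experiments. The same concepts, but the tracking
# backend is provisioned, scaled, and hidden by the platform.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                experiment="demo-experiment")
aiplatform.start_run("run-1")
aiplatform.log_params({"learning_rate": 0.1})
aiplatform.log_metrics({"accuracy": 0.92})
aiplatform.end_run()
```

The call shapes are nearly identical; the trade-off lies in who runs, scales, and can debug the tracking backend when something goes wrong.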
Because we were already automating and managing our OSS stack independently for each stage of the ML lifecycle, we needed a way to build confidence that these managed platforms did not just simplify our workflows but also preserved the control and visibility we valued. These scenarios are critical to explore.
In this session, we will share our reflections and findings from using both OSS tools and managed platforms. We will highlight how these experiences helped us better understand the trade-offs and design decisions behind these platforms.
We will also discuss practical aspects like:
Navigating between UI and CLI/API in managed platforms
How built-in observability maps (or doesn’t) to Prometheus-style monitoring (see the sketch after this list)
The balance between rapid prototyping and debugging transparency
Encouraging engineers to use managed tools not as black boxes, but as opportunities to learn and critically assess their underlying infrastructure
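As a reference point for the observability comparison above, here is a minimal sketch of the Prometheus-style path, assuming the prometheus_client library (the metric name and serving loop are hypothetical): you expose the metrics endpoint yourself and own the scrape path end to end.

```python
# Prometheus-style monitoring sketch; the metric name is hypothetical.
import random
import time

from prometheus_client import Gauge, start_http_server

PREDICTION_LATENCY = Gauge(
    "model_prediction_latency_seconds",
    "Latency of the most recent model prediction",
)

if __name__ == "__main__":
    # Prometheus scrapes http://localhost:8000/metrics on its own schedule.
    start_http_server(8000)
    while True:
        # Simulated serving loop; a real service would time real predictions.
        PREDICTION_LATENCY.set(random.uniform(0.01, 0.2))
        time.sleep(5)
```

On Vertex AI, comparable serving metrics surface in Cloud Monitoring without any of this wiring: convenient, but the exporter and scrape path are no longer yours to inspect.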
What makes this talk unique is that we:
Take comparative deep dives into how managed platforms like Vertex AI and SageMaker abstract open-source stacks
Trace what happens “under the hood” in these platforms and how that compares to directly using Kubernetes, Argo Workflows, MLflow, and Prometheus
Discuss trade-offs openly: control vs convenience, speed vs transparency
Highlight how our foundational OSS experience enhanced our ability to critically assess managed platforms
Inspire early-career engineers not to skip the OSS journey and to appreciate the design of these managed tools with a critical and curious eye
By the end of this session, attendees will:
Understand how Vertex AI and SageMaker simplify the ML lifecycle, and where they hide the complexity
Identify key abstractions in managed ML platforms and how they map to the underlying Kubernetes, container, and monitoring infrastructure
Gain insights into trade-offs: rapid deployment vs deep control, ease of use vs observability
Learn why foundational OSS experience is still crucial, even in a managed world
Be inspired to approach managed platforms as opportunities for curiosity, reflection, and deeper technical understanding