Platform Engineering for AI Applications 

Course & Training

Platform engineering for AI applications: Deployment, operation, and optimization of large language models and AI workloads with inference engines, LLM gateways, monitoring, and MLOps.

This hands-on course provides platform engineers, DevOps engineers, and SREs with the knowledge and skills needed to operate AI applications and large language models professionally. Participants learn how to build LLM infrastructure, configure inference engines, optimize performance, and run AI workloads in production. The course covers the full spectrum from self-hosted model inference through optimized inference engines (vLLM, TensorRT-LLM) to LLM gateways, monitoring, and fine-tuning. Participants work on real AI infrastructure projects covering model deployment, GPU resource management, batching strategies, caching optimization, and observability, and the course addresses inference server configuration, performance tuning, cost optimization, monitoring setup, and MLOps practices.

In-House Course:

We are happy to conduct tailored courses for your team - on-site, remotely or in our course rooms.

Request In-House Course

Content:


The course will consist of the following topics and may be extended or adapted based on the audience.
The examples in the course will focus on widely used AI/ML frameworks and platform engineering practices.
For in-house courses, the selection of technologies and deployment scenarios can be tailored to better fit the audience.

– Basics and Theory of AI and Large Language Models:
... - History of Artificial Intelligence and Neural Networks
... - LLM architectures (Transformer, Attention Mechanism)
... - Model types (Instruct, Chat, Code, Embedding)
... - Terminology: Tokens, Quantization, Weights, Layers, Context Window
... - Open Source vs. Open Weight vs. Commercial Models
... - Multimodality and other Neural Networks
– AI/ML Frameworks and Standards:
... - PyTorch and TensorFlow basics
... - ONNX and model interchange formats
... - Hugging Face Transformers and Model Hub
... - Standards: MCP (Model Context Protocol), OpenAI API compatibility
– Self-Hosted Model Inference and Deployment:
... - Ollama for quick model deployment
... - On-premises model weight storage
... - Model download and management
... - First inference tests and prompting
– Inference Engines and Performance Optimization:
... - Inference engines: vLLM, TensorRT-LLM, Triton Inference Server (see the vLLM sketch after the topic list)
... - Batching strategies (Static vs. Dynamic Batching)
... - Caching strategies (KV-Cache, Prompt Caching)
... - Parallelization (Tensor Parallelism, Pipeline Parallelism)
... - Quantization (INT8, INT4, GPTQ, AWQ, GGUF)
... - Performance metrics (Latency, Throughput, Time to First Token (TTFT), Time per Output Token (TPOT))
– GPU Infrastructure and Resource Management:
... - GPU selection and sizing for LLM workloads
... - CUDA, ROCm and GPU driver management
... - GPU sharing and multi-tenancy
... - Kubernetes GPU scheduling and resource quotas
... - Cost optimization for GPU workloads
– LLM Gateways and API Management:
... - LiteLLM: Installation, configuration and backends (see the gateway sketch after the topic list)
... - Model load balancing and fallback strategies
... - Rate limiting and cost tracking
... - API key management and authentication
... - Multi-model routing and A/B testing
– Guardrails and Content Filtering:
... - Concepts and necessity of guardrails
... - Content filter implementation
... - Input/output validation
... - Safety and compliance
– Container Orchestration for AI Workloads:
... - Docker images for LLM deployment
... - Kubernetes deployments for inference servers
... - Helm charts for AI applications
... - Autoscaling strategies for AI workloads
... - Service mesh for model serving
– Model Serving Patterns and Architectures:
... - REST API and gRPC for model serving
... - Synchronous vs. asynchronous inference
... - Streaming responses and server-sent events
... - Model versioning and canary deployments
... - Blue-green deployments for models
– Monitoring and Observability for AI Workloads:
... - Metrics: Inter-token latency, throughput, tokens/s
... - Prometheus integration for LLM metrics
... - Grafana dashboards for AI monitoring
... - OpenTelemetry for distributed tracing
... - Log aggregation for inference logs
... - Cost tracking and resource utilization
– Retrieval Augmented Generation (RAG) Infrastructure:
... - Vector databases (Pinecone, Weaviate, Qdrant, Milvus)
... - Embedding models and deployment
... - Chunking strategies and indexing
... - Hybrid search and reranking
... - RAG pipeline orchestration
– Fine-Tuning and Model Adaptation:
... - Fine-tuning concepts and use cases
... - LoRA (Low-Rank Adaptation) and QLoRA
... - Full fine-tuning vs. parameter-efficient fine-tuning
... - Training infrastructure and GPU requirements
... - Model evaluation and validation
– MLOps for LLMs:
... - Model registry and versioning (MLflow, Weights & Biases)
... - Experiment tracking and reproducibility
... - Model lifecycle management
... - CI/CD for model deployment
... - Automated model testing and validation
– Tool-Calling and Function-Calling:
... - Tool-calling concepts and implementation
... - Function-calling APIs
... - Integration of external tools and services
... - Agentic workflows (distinction from agentic coding)
– Security and Compliance for AI Workloads:
... - Model security and supply chain
... - Data privacy and GDPR compliance
... - Secret management for API keys
... - Network security for model serving
... - Audit logging and compliance reporting
– Cloud Providers and Managed Services:
... - AWS SageMaker, Bedrock and EC2 for LLMs
... - Azure OpenAI Service and Azure ML
... - Google Cloud Vertex AI and GKE
... - Hybrid and multi-cloud strategies
– Best Practices and Future Trends:
... - Emerging technologies in AI infrastructure
... - Cost optimization and sustainability
... - Performance tuning and benchmarking
... - Team organization for AI platform teams
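
To make the inference-engine topics concrete, here is a minimal sketch of a first inference test using vLLM's offline Python API, referenced from the topic list above. It assumes a GPU host with vLLM installed; the model name, prompt, and sampling values are illustrative placeholders, and the course exercises may use different models and settings.

    # Minimal offline inference test with vLLM (sketch; model name is a placeholder)
    from vllm import LLM, SamplingParams

    # Load the model weights onto the available GPU(s)
    llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    # Generate a completion for a single prompt and print the text
    outputs = llm.generate(["Explain the KV cache in one sentence."], params)
    for output in outputs:
        print(output.outputs[0].text)

For production serving, vLLM also provides an OpenAI-compatible HTTP server, which is the form typically placed behind an LLM gateway and scraped for Prometheus metrics.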
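
For the LLM gateway topics, the sketch below shows a client call routed through an OpenAI-compatible gateway such as a LiteLLM proxy, also referenced from the topic list above. The base URL, API key, and model alias are assumptions for illustration; in a real setup they come from the gateway configuration.

    # Calling a model through an OpenAI-compatible gateway (e.g. a LiteLLM proxy).
    # Base URL, API key, and model alias are illustrative placeholders.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000", api_key="sk-example-key")

    response = client.chat.completions.create(
        model="local-llama",  # alias that the gateway maps to a concrete backend
        messages=[{"role": "user", "content": "Which backend served this request?"}],
    )
    print(response.choices[0].message.content)

Because the gateway speaks the OpenAI API, the same client code works whether a request is routed to a self-hosted vLLM instance or to a commercial provider, which is what makes load balancing, fallbacks, rate limiting, and cost tracking possible in one place.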

The course combines theoretical foundations with intensive practical exercises. Participants work with real AI infrastructure projects and learn how to professionally operate LLM workloads.


Disclaimer: The actual course content may vary from the above, depending on the trainer, the implementation, the duration, and the composition of the participant group.

Whether we call it training, course, workshop, or seminar, we want to meet participants where they are and equip them with the practical knowledge they need to apply the technology directly after the training and deepen it independently.

Goal:

Upon completing this course, you will be able to professionally operate and optimize AI applications and large language models. You will master techniques for model deployment, inference optimization, GPU resource management, and performance tuning. You'll gain practical experience working with inference engines (vLLM, Ollama), LLM gateways (LiteLLM), monitoring systems, and MLOps practices. Additionally, you'll understand how to scale AI infrastructure, optimize costs, and operate production-ready AI workloads. These skills will enable you as a platform engineer to take responsibility for and optimize your organization's entire AI infrastructure.


Duration:

2 Days (individually adapted for in-house courses)


Form:

The course combines theoretical concepts with intensive practical exercises on real AI infrastructure projects. Participants work with various inference engines, LLM gateways, and monitoring tools to develop a comprehensive understanding of operating AI workloads. The trainer guides the process with expert knowledge and individual support to foster optimal configuration and efficient operation of LLM infrastructure.


Target Audience:

The training is aimed at platform engineers, DevOps engineers, site reliability engineers, and system engineers who want to operate AI applications and large language models in production. The course is particularly suitable for professionals responsible for the infrastructure, deployment, and operation of AI workloads who want to deepen their knowledge in LLM operations, inference optimization, and MLOps.


Requirements:

This course requires solid knowledge of platform engineering, DevOps, or system engineering. Participants should have experience with Linux systems, container technologies (Docker), Kubernetes basics, and infrastructure as code. Familiarity with Python basics, REST APIs, monitoring tools (Prometheus/Grafana), and cloud platforms is beneficial. A basic understanding of machine learning concepts is helpful but not mandatory.


Preparation:

Before the course, each participant receives a detailed questionnaire to assess their experience level and specific interests. We provide an advanced lab environment with GPU access, pre-installed inference engines and model weights, as well as an installation guide for preparing development environments. During the course, the necessary cloud resources and API access are provided for the practical exercises. After the course, participants continue to have access to the Letsboot Labmachine environment for further learning related to AI platform engineering.

Request In-House Course:

Request In-House Course

Waiting list for public course:

Sign up for the waiting list to be notified about additional public course dates. Once we have enough people on the waiting list, we will determine a date that suits as many participants as possible and schedule a new session. If you want to participate together with two colleagues, we can even plan a public course specifically for you.

Waiting List Request

(If you already have 3 or more participants, we will discuss your preferred date directly with you and announce the course.)

Platform engineering for AI applications is an emerging discipline in IT infrastructure. With the rapid rise of large language models and generative AI, new challenges arise for platform engineers: GPU resource management, inference optimization, cost management, and scaling of AI workloads. This course provides the knowledge needed to operate AI infrastructure professionally, from self-hosted model inference through optimized inference engines to production-ready deployments with monitoring, observability, and MLOps practices. Platform engineers learn how to bridge the gap between AI/ML teams and production infrastructure and how to operate AI applications reliably, efficiently, and cost-effectively.