Talk to an Instructor:
Jonas Felix
A two-day intensive course on monitoring and observability of applications with Prometheus and on visualizing metrics with Grafana. Participants learn to install, configure, and use Prometheus effectively to monitor applications, and to build meaningful Grafana dashboards in a Kubernetes environment, with stability and performance as the goal. We cover **service discovery, recording rules, PromQL, alerting (Alertmanager & Grafana Alerting), SLO/SLA tracking, histograms/exemplars**, plus **HA & long-term storage** (Thanos/Cortex/Mimir) and **security/costs/retention**.
We are happy to conduct tailored courses for your team - on-site, remotely or in our course rooms.
Prometheus is a powerful open-source monitoring and alerting system purpose-built for modern, distributed, and containerized applications. In this course, we show how to use Prometheus with Grafana effectively to monitor application health, identify performance issues, and ensure software quality.
**Course topics (hands-on focus):**
- **Intro & Architecture**
  - Data model (labels/series), pull model, TSDB
  - Components: Prometheus, exporters, Alertmanager, Pushgateway (when to use)
  - Deployment options: standalone, Prometheus Operator, kube-prometheus-stack (Helm)
- **Install & Configure**
  - Prometheus on Kubernetes (Helm/Operator) and Docker/Compose
  - **Service discovery** (Kubernetes, EC2, Consul) and **relabeling** patterns
  - Scrape config, jobs/targets, multi-cluster/namespace layouts
- **Instrumentation & Exporters**
  - App instrumentation best practices (counter/gauge/histogram/summary)
  - **Histograms & exemplars** for latency and trace correlation
  - Key exporters: **node_exporter**, **kube-state-metrics**, **cAdvisor**, **blackbox_exporter**, DB exporters
  - OpenTelemetry bridge (OTel Collector → Prometheus)
- **PromQL & Recording Rules**
  - Query basics, label matching, joins
  - **rate/irate**, histogram quantiles, Apdex/latency buckets
  - **Recording rules** & groups for performance and reuse
  - **SLO/SLA** metrics: error budget, availability & latency
- **Grafana Dashboards**
  - Data source config, time ranges, transformations
  - Dashboard design, panels, variables, library panels
  - **Exemplars** in Grafana, drill-downs, annotations
  - **Best practices** for SRE, infra & app monitoring
- **Alerting**
  - Prometheus alert rules, templating, severity design
  - **Alertmanager**: routing, inhibition, silence, receivers
  - **Grafana Alerting**: when to use; harmonizing with Alertmanager
  - **Runbooks** & annotations: from alert to action
- **Operations, Scale & Reliability**
  - Retention & TSDB tuning, WAL/compaction, capacity
  - **High availability**: sharding/HA pairs, **Thanos/Cortex/Mimir** for LTS & global query
  - Federation vs. remote write/read, multi-cluster strategies
  - Self-monitoring; watchdog alerts
- **Security & Compliance**
  - TLS, authN/Z (reverse proxy, OAuth proxy), network scoping
  - Multi-tenancy (Mimir/Cortex), tenant isolation via labels/namespaces
  - PII/Compliance: what not to put into metrics
- **Cost Control & Cardinality**
  - Detecting label explosion, cardinality checks
  - Metric hygiene: naming, labeling, intervals, downsampling/recording
  - Storage cost vs. resolution vs. retention: guardrails
- **Troubleshooting & Patterns**
  - Debugging slow queries, PromQL optimization
  - Exporter/target issues, scrape errors, stale series
  - Incident dashboards (Golden Signals, RED/USE)
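To give a feel for the "histogram quantiles" topic, here is a minimal Python sketch of the idea behind PromQL's `histogram_quantile`: locate the cumulative bucket that contains the target rank, then interpolate linearly inside it. The function name, the `(upper_bound, cumulative_count)` tuple layout, and the sample counts are assumptions made for illustration. The sketch assumes non-negative bucket bounds and is not Prometheus's actual implementation.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Illustrative only: assumes non-negative bounds and a final +Inf bucket,
    mirroring how classic Prometheus histograms are laid out.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must end with an +Inf bucket")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, no meaningful quantile

    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[index]

    if math.isinf(upper):
        # The rank falls into the +Inf bucket: the best available answer
        # is the highest finite upper bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan

    lower = buckets[index - 1][0] if index > 0 else 0.0
    prev_count = buckets[index - 1][1] if index > 0 else 0.0
    in_bucket = count - prev_count
    if in_bucket == 0:
        return lower
    # Linear interpolation inside the bucket.
    return lower + (upper - lower) * (rank - prev_count) / in_bucket


# Hypothetical request-latency buckets in seconds.
example_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, example_buckets)
```

With these sample buckets the 95th percentile lands halfway through the 0.5 to 1.0 second bucket. The estimate is only as precise as the bucket boundaries, which is why bucket layout matters when instrumenting latency.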
**Hands-on labs (examples):**
- Lab 1: Helm deploy (kube-prometheus-stack), access & security
- Lab 2: Service discovery & relabeling — scrape only what matters
- Lab 3: PromQL drills (rates, histograms, joins, quantiles)
- Lab 4: Recording rules for SLOs + SLI dashboards in Grafana
- Lab 5: Alerting setup (rules + Alertmanager routing), runbook linking
- Lab 6: Blackbox checks (HTTP/TCP/ICMP) + incident dashboard
- Lab 7: Retention/cardinality tuning, self-monitoring & watchdog
- Lab 8: Thanos for long-term storage & HA querying
Scenarios and hands-on labs are based on Kubernetes and containerized applications.
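The error-budget arithmetic behind the SLO topics and Lab 4 can be sketched in a few lines of Python. This is a simplified, request-based illustration: the names `ErrorBudget`, `error_budget`, and `burn_rate`, as well as the sample numbers, are assumptions and not part of any Prometheus or Grafana API. In practice the inputs would come from PromQL queries or recording rules.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    allowed_failures: float  # failures the SLO tolerates in the window
    consumed: float          # fraction of the budget already used
    remaining: float         # fraction of the budget still available


def error_budget(slo_target, total_requests, failed_requests):
    """Compute a request-based error budget for one SLO window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1, e.g. 0.999")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    allowed = (1 - slo_target) * total_requests
    consumed = failed_requests / allowed
    return ErrorBudget(allowed, consumed, 1 - consumed)


def burn_rate(error_ratio, slo_target):
    """How fast the budget is being spent relative to the SLO.

    A burn rate of 1 uses up the budget exactly over the SLO window;
    higher values exhaust it proportionally sooner.
    """
    return error_ratio / (1 - slo_target)


# Hypothetical window: 99.9% target, one million requests, 250 failures.
budget = error_budget(0.999, 1_000_000, 250)
```

In this example the service may fail about 1,000 requests in the window and has used roughly a quarter of that allowance.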
Disclaimer: The actual course content may differ from the above, depending on the trainer, delivery format, duration, and the mix of participants.
Whether we call it a training, course, workshop, or seminar, we want to meet participants where they are and give them the practical knowledge they need to apply the technology right after the training and deepen it on their own.
After the course, participants can use Prometheus and Grafana as monitoring and alerting systems in their projects: configure **service discovery & relabeling**, write **PromQL** confidently, define **recording rules & SLOs**, operate **alerts with Alertmanager/Grafana**, and plan **operations/scale** (HA, retention, cost, cardinality).
The course combines short input sessions, guided **live demos**, and practical **hands-on labs** in a Kubernetes cluster (Helm/Operator). We emphasize **realistic scenarios**, clear patterns, and directly applicable best practices.
The course is aimed at software developers, DevOps/Platform engineers, SREs, and system administrators who want to monitor apps and infrastructure efficiently, establish **SLOs**, speed up **incident response**, and operate **Kubernetes**-based monitoring stacks professionally.
Participants need basic Linux/CLI skills and foundational knowledge of containers/Kubernetes and web applications. Helpful: some exposure to metrics/logs and YAML/Helm.
Each participant receives a questionnaire and an installation guide. We provide a lab environment (Kubernetes cluster, **kube-prometheus-stack**, sample services). Optional: bring your own cloud access. Prerequisites are verified in advance.
Sign up for the waiting list for more public course dates. Once we have enough people on the waiting list, we will determine a date that suits everyone as much as possible and schedule a new session. If you want to participate directly with two colleagues, we can even plan a public course specifically for you.
Prometheus started at SoundCloud in 2012 and joined the CNCF in 2016 as the second project after Kubernetes. In combination with Grafana, it has become the de facto standard for metrics-based monitoring and SRE-led observability. The ecosystem (Operator, Thanos/Mimir/Cortex, OpenTelemetry integration) keeps evolving.
Training-Centers:
Basel:
- Aeschenplatz 6, 4052 Basel
Zurich:
- HWZ, Lagerstrasse 5, 8004 Zürich
Company address:
felixideas GmbH
Baslerstrasse 5a
4102 Binningen