RES507 - 90 Operating Kubernetes in Production

From “it runs” to “it survives”

Today

10–15 min recap
Finish Lab 85
Clarify grading expectations
Introduce rollout & controlled change
Think like production engineers

Quick Recap: Controllers

Key idea:

Kubernetes continuously reconciles desired state with actual state.

Interactive Question

If I delete a Pod manually:

What happens?
Who recreates it?
Why?
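
One way to try this question on a scratch cluster is sketched below. It assumes the Pods are managed by a Deployment and carry the label app=quote-app. The label and the helper name delete_one_pod are illustrative assumptions, not part of the lab.

```shell
# Hypothetical helper: delete one Pod and list what replaces it.
# Assumes the Pods carry the label app=quote-app; adjust the selector to your manifest.
delete_one_pod() {
  selector="${1:-app=quote-app}"
  # Pick the first Pod that matches the selector.
  pod="$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}')"
  [ -n "$pod" ] || { echo "no Pod matches $selector" >&2; return 1; }
  kubectl delete pod "$pod"
  # The ReplicaSet behind the Deployment should already be starting a replacement.
  kubectl get pods -l "$selector"
}
```

Calling delete_one_pod and comparing Pod names and ages before and after shows the reconciliation loop at work.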

Readiness vs Liveness

Readiness:

Should this Pod receive traffic?

Liveness:

Should this container be restarted?

Production impact:

Readiness controls traffic
Liveness controls restarts
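
As a minimal sketch, both probes can be added to an existing Deployment with a strategic merge patch. The container name app matches the set image command used in this session. The endpoints /ready and /healthz, port 8080, the timing values, and the helper name add_probes are assumptions for illustration.

```shell
# Hypothetical helper: add both probes to the "app" container of a Deployment.
# Paths /ready and /healthz and port 8080 are assumptions; use your app's endpoints.
add_probes() {
  deploy="${1:-quote-app}"
  kubectl patch deployment "$deploy" --patch '
spec:
  template:
    spec:
      containers:
      - name: app
        readinessProbe:        # while failing, the Pod is taken out of Service endpoints
          httpGet: {path: /ready, port: 8080}
          periodSeconds: 5
        livenessProbe:         # after repeated failures, the kubelet restarts the container
          httpGet: {path: /healthz, port: 8080}
          initialDelaySeconds: 10
          periodSeconds: 10
'
}
```

Keeping the two endpoints separate lets an overloaded Pod stop receiving traffic without being restarted.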

Layered Architecture View

When debugging, always ask:

Which layer am I in?

Failure in Production

In real systems:

Pods crash
Images fail to pull
Nodes run out of memory
Deployments reference missing or misconfigured Secrets

Good engineers expect failure. They design for it.
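
A first look at most of the failures listed above could follow the sketch below. The helper name triage_pod is hypothetical; the kubectl subcommands are standard.

```shell
# Hypothetical helper: first-look triage for one misbehaving Pod.
triage_pod() {
  pod="$1"
  # The Events section at the bottom shows image pull errors, OOMKilled, missing Secrets.
  kubectl describe pod "$pod"
  # Logs of the previous container instance, useful after a crash and restart.
  kubectl logs "$pod" --previous
  # Recent events in the current namespace, oldest first.
  kubectl get events --sort-by=.lastTimestamp
}
```

Usage: triage_pod followed by a Pod name taken from kubectl get pods.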

Controlled Change: Rolling Updates

When you change an image:

kubectl set image deployment/quote-app app=quote-app:v2

Kubernetes does not replace every Pod at once.

It starts new Pods and retires old ones gradually, so the application stays available during the update.
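
How gradual the rollout is depends on the Deployment's rolling update strategy; by default both maxSurge and maxUnavailable are 25%. The sketch below sets a more conservative pace. The chosen values and the helper name set_conservative_rollout are assumptions, not a lab requirement.

```shell
# Hypothetical helper: make a Deployment roll one Pod at a time without losing capacity.
set_conservative_rollout() {
  deploy="${1:-quote-app}"
  kubectl patch deployment "$deploy" --patch '
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most one extra Pod above the desired count
      maxUnavailable: 0    # never drop below the desired count
'
}
```

With these values an old Pod is removed only after its replacement reports ready, which is slower but keeps full capacity throughout.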

Observe a Rollout

kubectl rollout status deployment/quote-app
kubectl rollout history deployment/quote-app

You can:

Monitor progress
Inspect revision history

Rollback

If something breaks:

kubectl rollout undo deployment/quote-app

Production mindset:

Change must be reversible.
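
Reversibility can also be scripted. The sketch below combines the three commands from this session so that a rollout that fails to complete in time is undone automatically. The helper name deploy_or_rollback and the 120 second timeout are illustrative assumptions.

```shell
# Hypothetical helper: roll out a new image and undo it if the rollout does not complete.
deploy_or_rollback() {
  deploy="$1"
  image="$2"
  kubectl set image "deployment/$deploy" "app=$image"
  # rollout status exits non-zero if the rollout fails or the timeout expires.
  if kubectl rollout status "deployment/$deploy" --timeout=120s; then
    echo "rollout of $image succeeded"
  else
    echo "rollout of $image failed, rolling back" >&2
    kubectl rollout undo "deployment/$deploy"
    return 1
  fi
}
```

Usage: deploy_or_rollback quote-app quote-app:v2. The non-zero return code lets a CI job mark the change as failed even though the cluster is back on the previous revision.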

Think Like an Engineer

Ask yourself:

What is my blast radius?
What layer isolates failure?
What is my rollback plan?
What happens if the node dies?
Where is persistence guaranteed?

This is what differentiates operators from YAML writers.

Lab 85 Reminder

Make sure your repository contains:

architecture-notes.md
Embedded architecture diagram
Secret-based configuration
Resource limits + probes
Controlled failure analysis

If it is not in GitHub, it cannot be graded.

Finish required Lab 85 tasks
Perform one controlled failure
Be ready to explain your architecture
Then we move on to the final production challenge next session