Operating Kubernetes in Production

From “it runs” to “it survives”

Today

  • 10–15 min recap
  • Finish Lab 85
  • Clarify grading expectations
  • Introduce rollout & controlled change
  • Think like production engineers

Quick Recap: Controllers

Deployment

ReplicaSet

Pod

Key idea:

Kubernetes continuously reconciles desired state with actual state.

Interactive Question

If I delete a Pod manually:

  • What happens?
  • Who recreates it?
  • Why?
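
One way to check your answers is to try it on a cluster. A minimal sketch, assuming the Pods carry the label `app=quote-app`; the label and the helper name `delete_one_pod` are assumptions, not part of the lab:

```shell
# Delete one Pod behind a label selector, then list the Pods again.
# Assumption: the Deployment's Pods are labelled app=quote-app.
delete_one_pod() {
  local selector="${1:-app=quote-app}"
  local pod
  # Pick the first Pod that matches the selector.
  pod="$(kubectl get pods -l "$selector" -o jsonpath='{.items[0].metadata.name}')"
  if [ -z "$pod" ]; then
    echo "no pod matches $selector" >&2
    return 1
  fi
  kubectl delete pod "$pod"
  # List the Pods again to see how the cluster reacted.
  kubectl get pods -l "$selector"
}
```

Run `delete_one_pod`, then compare Pod names and ages with what you saw before the deletion.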

Readiness vs Liveness

Readiness:

  • Should this Pod receive traffic?

Liveness:

  • Should this container be restarted?

Production impact:

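  • Readiness controls traffic: a Pod that fails it is removed from the Service endpoints
  • Liveness controls restarts: a container that fails it is restarted by the kubelet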
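
The two probes are declared side by side on the container. A hedged sketch of what that could look like: the Deployment name `probe-demo`, the image tag `quote-app:v1`, port 8080, and the paths `/ready` and `/healthz` are all assumptions for illustration, so substitute whatever your app actually exposes.

```shell
# Apply a small Deployment that declares both probes.
# Assumptions: name, image tag, port, and probe paths are hypothetical.
apply_probe_demo() {
  kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-demo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: probe-demo
  template:
    metadata:
      labels:
        app: probe-demo
    spec:
      containers:
        - name: app
          image: quote-app:v1
          ports:
            - containerPort: 8080
          # Failing this removes the Pod from Service endpoints.
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
          # Failing this makes the kubelet restart the container.
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
EOF
}
```

A separate demo name keeps the sketch from overwriting an existing `quote-app` Deployment.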

Layered Architecture View

User

Service

Pod

Container

Node OS

VM

Hypervisor

Hardware

When debugging, always ask:

Which layer am I in?

Failure in Production

In real systems:

  • Pods crash
  • Images fail to pull
  • Nodes run out of memory
  • Deployments reference missing or misconfigured Secrets

Good engineers expect failure. They design for it.
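
When one of these failures hits, a few standard kubectl commands usually narrow it down. A small sketch; the helper name `triage_pod` is hypothetical, and it assumes the Pod lives in the current namespace:

```shell
# First-pass triage for a misbehaving Pod.
triage_pod() {
  local pod="$1"
  if [ -z "$pod" ]; then
    echo "usage: triage_pod <pod-name>" >&2
    return 2
  fi
  # Status, restart counts, and events for this Pod
  # (image pull errors and OOMKilled show up here).
  kubectl describe pod "$pod"
  # Logs from the previous container instance, useful after a crash.
  kubectl logs "$pod" --previous
  # Events in the namespace, sorted so the newest come last.
  kubectl get events --sort-by=.lastTimestamp
}
```

`kubectl logs --previous` reports an error when the container has never restarted; that is expected and the remaining commands still run.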

Controlled Change: Rolling Updates

When you change an image:

kubectl set image deployment/quote-app app=quote-app:v2

Kubernetes does not tear down every Pod at once.

It rolls forward gradually, replacing old Pods with new ones a few at a time.
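
How gradual the rollout is depends on the Deployment's `spec.strategy.rollingUpdate` settings, `maxSurge` and `maxUnavailable` (both default to 25%). A sketch that tightens them so that one new Pod is added at a time and no old Pod is removed before its replacement is ready; the helper name and the chosen values are illustrative, not a lab requirement:

```shell
# Make the rolling update as cautious as possible:
# at most 1 extra Pod, and never fewer available Pods than desired.
set_rollout_pace() {
  local deployment="${1:-quote-app}"
  kubectl patch deployment "$deployment" --type merge -p \
    '{"spec":{"strategy":{"type":"RollingUpdate","rollingUpdate":{"maxSurge":1,"maxUnavailable":0}}}}'
}
```

With `maxUnavailable: 0`, a new Pod that never becomes ready stalls the rollout instead of reducing capacity.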

Observe a Rollout

kubectl rollout status deployment quote-app
kubectl rollout history deployment quote-app

You can:

  • Monitor progress
  • Inspect revision history
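
Revision history is easier to read when each revision says why it exists. A sketch using the `kubernetes.io/change-cause` annotation, which `kubectl rollout history` displays in its CHANGE-CAUSE column; the helper names are hypothetical:

```shell
# Attach a human-readable reason to the current revision.
record_change_cause() {
  local cause="$1"
  local deployment="${2:-quote-app}"
  kubectl annotate "deployment/$deployment" \
    kubernetes.io/change-cause="$cause" --overwrite
}

# Show the Pod template recorded for one specific revision.
show_revision() {
  local revision="$1"
  local deployment="${2:-quote-app}"
  kubectl rollout history "deployment/$deployment" --revision="$revision"
}
```

For example, call `record_change_cause "upgrade to quote-app:v2"` right after changing the image, then `show_revision 2` to inspect that revision.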

Rollback

If something breaks:

kubectl rollout undo deployment quote-app

Production mindset:

Change must be reversible.
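
Reversibility can be scripted so that a failed rollout undoes itself. A minimal sketch, assuming the container is named `app` as in the `kubectl set image` command earlier; the helper name `deploy_or_rollback` and the 120s default timeout are assumptions:

```shell
# Roll out a new image; if it does not become healthy in time, roll back.
deploy_or_rollback() {
  local image="$1"
  local deployment="${2:-quote-app}"
  local timeout="${3:-120s}"
  if [ -z "$image" ]; then
    echo "usage: deploy_or_rollback <image> [deployment] [timeout]" >&2
    return 2
  fi
  kubectl set image "deployment/$deployment" "app=$image" || return 1
  # rollout status exits non-zero if the rollout fails or times out.
  if kubectl rollout status "deployment/$deployment" --timeout="$timeout"; then
    echo "rollout of $image succeeded"
  else
    echo "rollout of $image failed, rolling back" >&2
    kubectl rollout undo "deployment/$deployment"
    return 1
  fi
}
```

Usage: `deploy_or_rollback quote-app:v2`. The function returns non-zero after a rollback, so a pipeline calling it can still flag the change as failed.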

Think Like an Engineer

Ask yourself:

  • What is my blast radius?
  • What layer isolates failure?
  • What is my rollback plan?
  • What happens if the node dies?
  • Where is persistence guaranteed?

This is what differentiates operators from YAML writers.

Lab 85 Reminder

Make sure your repository contains:

  • architecture-notes.md
  • Embedded architecture diagram
  • Secret-based configuration
  • Resource limits + probes
  • Controlled failure analysis

If it is not in GitHub, it cannot be graded.

Next

  • Finish required Lab 85 tasks
  • Perform one controlled failure
  • Be ready to explain your architecture
  • Then we move on to the final production challenge next session