Lab 85: Architecture, Virtualization, and Production Design

Design and analyze a production-ready architecture using containers, Kubernetes, and virtualization concepts.

Introduction

In this lab, you will step back from implementation details and think like a systems designer.

You will:

  • Analyze application architecture
  • Compare containers and virtual machines
  • Design a production-ready deployment
  • Introduce scaling, resilience, and failure thinking
  • Connect Kubernetes to real-world infrastructure

This lab is intentionally deeper and more open-ended than previous ones.

You are expected to reason, discuss, and justify your decisions.


Submission checklist

🚨 Read this section before continuing.
This clarifies how Lab 85 will be graded.

Your repository must contain:

  • A file named architecture-notes.md
  • Written answers to all reasoning questions
  • A production architecture design section
  • An architecture diagram (PNG, Draw.io export, or image file)
  • The diagram embedded in or clearly linked from architecture-notes.md
  • Updated Kubernetes manifests with:
    • Resource requests and limits
    • Readiness and liveness probes
    • Secret-based configuration (no plain-text credentials)
  • Evidence of at least one controlled failure and analysis

Everything must be committed and pushed to GitHub.

If it is not in your repository, it cannot be graded.


Review your current deployment

Before designing something larger, verify your current Kubernetes setup.

Run:

Terminal window
kubectl get pods
kubectl get services
kubectl get deployments

Confirm:

  • Your quote application is running
  • The database is reachable
  • You can port-forward successfully

If something is broken, fix it now.

You will build on this foundation.


Map the current architecture

Create a simple architecture diagram on paper or digitally.

Your diagram must include:

  • User / browser
  • Kubernetes cluster
  • Deployment
  • Pod
  • Service
  • PostgreSQL database

Answer in writing:

  • Where does isolation happen?
  • What restarts automatically?
  • What does Kubernetes not manage?

Be prepared to explain your diagram.


Compare containers and virtual machines

Create a comparison table with at least five differences.

Topics to consider:

  • Kernel sharing
  • Startup time
  • Resource overhead
  • Security isolation
  • Operational complexity

Then answer:

  • When would you prefer a VM over a container?
  • When would you combine both?

Write your answers in a file named architecture-notes.md.

Commit and push your reasoning.


Introduce horizontal scaling

Scale your deployment.

Terminal window
kubectl scale deployment quote-app --replicas=3

Verify:

Terminal window
kubectl get pods

You should see multiple replicas.
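
The scale command above changes the live Deployment only. A later kubectl apply of a manifest that declares a different replica count will override it. A minimal sketch of the declarative equivalent, assuming your Deployment is named quote-app as in the command above (the rest of your manifest is not shown):

```yaml
# Fragment of a Deployment manifest; surrounding fields are assumed to exist already.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote-app
spec:
  replicas: 3   # desired number of Pods for this Deployment
  # selector and template stay as they are in your existing manifest
```

Keeping the count in the manifest means the value in Git matches what runs in the cluster.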

Now test behavior:

  • Port-forward the service
  • Refresh the page multiple times

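Observe whether responses appear consistent. Keep in mind that kubectl port-forward to a Service connects to a single Pod chosen when the command starts, so every refresh through that tunnel reaches the same replica.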

Answer:

  • What changes when you scale?
  • What does not change?

Simulate failure

Delete one running pod.

Terminal window
kubectl delete pod <pod-name>

Immediately observe:

Terminal window
kubectl get pods

Answer:

  • Who recreated the pod?
  • Why?
  • What would happen if the node itself failed?

Write your answers in architecture-notes.md.


Introduce resource limits

Edit your deployment and add resource constraints.

Add under the container spec:

resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "250m"
    memory: "256Mi"

Apply the updated manifest.

Observe:

Terminal window
kubectl describe pod <pod-name>

Answer:

  • What are requests vs limits?
  • Why are they important in multi-tenant systems?

Add readiness and liveness probes

Enhance your deployment with health checks.

Example:

livenessProbe:
  httpGet:
    path: /
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 5
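
If you are unsure where these blocks belong, the sketch below shows one possible placement of the resource constraints and both probes inside a Deployment. The label app: quote-app, the container name quote-app, and the image tag are assumptions; keep whatever your existing manifest already uses.

```yaml
# Sketch only: names, labels, and image tag are assumed, not prescribed.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: quote-app
  template:
    metadata:
      labels:
        app: quote-app
    spec:
      containers:
        - name: quote-app
          image: quote-app:v1
          ports:
            - containerPort: 3000
          resources:          # sibling of image and ports
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "250m"
              memory: "256Mi"
          livenessProbe:      # same level as resources
            httpGet:
              path: /
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          readinessProbe:     # same level as resources
            httpGet:
              path: /
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
```

All three blocks are set per container, so a Pod with several containers needs them on each one.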

Apply and observe behavior.

Break your app temporarily to see how probes react.

Answer:

  • What is the difference between readiness and liveness?
  • Why does this matter in production?

Connect Kubernetes to virtualization

Discuss and write answers:

  • What runs underneath your k3s cluster?
  • Is Kubernetes replacing virtualization?
  • In a cloud provider, what actually hosts your nodes?

Explain how this stack might look in:

  • A cloud data center
  • An embedded automotive system
  • A financial institution

Add this section to architecture-notes.md.


Design a production architecture

Create a new section in architecture-notes.md.

Design a production-ready version of this system.

Include:

  • Multiple nodes
  • Database persistence
  • Backup strategy
  • Monitoring
  • Logging
  • CI/CD pipeline integration
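
For the database persistence item, one common building block is a PersistentVolumeClaim. The sketch below is a starting point, not a full design. The claim name and size are assumptions, and it relies on your cluster having a default StorageClass (k3s typically ships one).

```yaml
# Hypothetical claim for the PostgreSQL data directory.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: quote-db-data        # assumed name
spec:
  accessModes:
    - ReadWriteOnce          # mounted read-write by a single node at a time
  resources:
    requests:
      storage: 1Gi           # illustrative size
# The database Pod template would then reference the claim, for example:
#   volumes:
#     - name: data
#       persistentVolumeClaim:
#         claimName: quote-db-data
#   and mount it at the image's data directory (assumed to be
#   /var/lib/postgresql/data; check the image you actually use).
```

A claim alone is not a backup. Your design still needs to say where copies of the data go and how they are restored.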

Answer clearly:

  • What would run in Kubernetes?
  • What would run in VMs?
  • What would run outside the cluster?

This is a graded reasoning exercise.


Required break and analysis

Introduce a controlled failure in your deployment.

Examples:

  • Set an invalid image name
  • Remove environment variables
  • Misconfigure resource limits
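
As an illustration of the first example, the fragment below points the container at an image tag that does not exist. The tag and the container name are hypothetical. Pods created from such a manifest typically report ErrImagePull and then ImagePullBackOff.

```yaml
# Fragment of the quote-app Deployment with a deliberately invalid image tag.
spec:
  template:
    spec:
      containers:
        - name: quote-app                   # assumed container name
          image: quote-app:does-not-exist   # hypothetical tag that was never built
```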

Apply and observe the failure.

Capture:

Terminal window
kubectl describe pod <pod-name>
kubectl get events

Then fix the issue.

Your submission must show at least one failed state from the Kubernetes events, together with your analysis of it.


Required extension: secret-based configuration

This extension is required for everyone.

Refactor your database credentials so they are not stored in plain text inside your deployment manifest.

Create a Secret

Create a secret for your database credentials:

Terminal window
kubectl create secret generic quote-db-secret \
  --from-literal=POSTGRES_USER=quote \
  --from-literal=POSTGRES_PASSWORD=quote
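
To see what the command produced, run kubectl get secret quote-db-secret -o yaml. The object should look roughly like the sketch below, with generated metadata fields omitted. It is shown for inspection only. Do not commit a manifest like this with real credentials.

```yaml
# Approximate shape of the Secret created by the command above.
apiVersion: v1
kind: Secret
metadata:
  name: quote-db-secret
type: Opaque
data:
  POSTGRES_USER: cXVvdGU=       # base64 of "quote"
  POSTGRES_PASSWORD: cXVvdGU=   # base64 of "quote"
```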

Reference the Secret in your deployment

Modify your Deployment manifest so environment variables are loaded from the Secret instead of hard-coded values.

Example structure:

env:
  - name: POSTGRES_USER
    valueFrom:
      secretKeyRef:
        name: quote-db-secret
        key: POSTGRES_USER
  - name: POSTGRES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: quote-db-secret
        key: POSTGRES_PASSWORD
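
If your application reads the variables under exactly the key names stored in the Secret, a shorter alternative is to import every key at once. This is a sketch of that variant, not a requirement of the lab.

```yaml
# Fragment of the container spec; an alternative to the per-key env entries.
envFrom:
  - secretRef:
      name: quote-db-secret   # each key becomes an environment variable of the same name
```

The per-key form is more explicit and lets you rename variables. The envFrom form is shorter but exposes every key in the Secret to the container.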

Apply your changes and verify the application still works.

Answer in architecture-notes.md:

  • Why is this better than plain-text configuration?
  • Is a Secret encrypted by default? Where?

Commit and push your changes.


🆕 New for Session 3: Controlled rollouts and safe rollback

This section is new. It is designed to keep strong students busy and to prepare everyone for the final session.

What you will practice:

  • Make a controlled change
  • Observe the rollout
  • Intentionally break it once
  • Roll back safely
  • Document what you learned

Perform a controlled rollout

Pick one small, safe change. For example:

  • Change the page title in the template
  • Add a new route that returns a short message
  • Add a tiny log line

Commit the change locally first.

Build and publish a versioned image

You need two image versions to demonstrate a rollout.

If you already have a workflow from the previous course, you can reuse it.

Otherwise, keep it simple and build locally.

  • Build a versioned image tag:
Terminal window
docker build -t quote-app:v1 -f docker/Dockerfile .
  • Make your code change, then build a second version:
Terminal window
docker build -t quote-app:v2 -f docker/Dockerfile .

If you are using k3s with containerd, you may need to import the image.

Example (local, no registry):

Terminal window
docker save quote-app:v2 | sudo k3s ctr images import -

Update the Deployment to the new image

Update the Deployment image to quote-app:v2.

If you used a registry image name in your manifests, keep using that same pattern.
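
A sketch of the fields that change for this step, assuming a locally imported image and a container named quote-app. The change-cause text is only an example. The annotation is optional, but it fills the CHANGE-CAUSE column of kubectl rollout history.

```yaml
# Fragment of the quote-app Deployment; only the changed fields are shown.
metadata:
  name: quote-app
  annotations:
    kubernetes.io/change-cause: "v2: change page title"   # example text
spec:
  template:
    spec:
      containers:
        - name: quote-app                 # assumed container name
          image: quote-app:v2
          imagePullPolicy: IfNotPresent   # use the local image when it is already on the node
```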

Apply your manifest update, then observe the rollout.

Observe rollout progress

Run:

Terminal window
kubectl rollout status deployment quote-app
kubectl rollout history deployment quote-app
kubectl get pods -o wide

Answer in architecture-notes.md:

  • What changed in the cluster during the rollout?
  • What stayed the same?
  • How did Kubernetes decide when to create and delete Pods?

Required: one broken rollout

Introduce a controlled failure so the rollout does not complete.

Examples:

  • Set the image to a tag that does not exist
  • Set the container port to the wrong value
  • Break your readiness probe path

Apply the change and confirm you can see evidence of failure:

Terminal window
kubectl get pods
kubectl describe pod <pod-name>
kubectl get events

In architecture-notes.md, write:

  • What failed first?
  • Which signal showed you the failure fastest?
  • What would you check next if this happened in production?

Roll back safely

Undo the broken rollout:

Terminal window
kubectl rollout undo deployment quote-app

Verify:

  • The application becomes healthy again
  • The Deployment stabilizes
  • Pods return to a ready state

Add a short note in architecture-notes.md:

  • What did rollback change?
  • What did rollback not change?

Required extension choice

Choose ONE of these extensions and implement it.

Option A: Explicit rollout strategy

Update your Deployment to explicitly define a rolling update strategy:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1
    maxUnavailable: 0
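
The strategy block belongs directly under the Deployment spec, not under the Pod template. A placement sketch, assuming three replicas:

```yaml
# Fragment of the quote-app Deployment showing where strategy sits.
spec:
  replicas: 3
  strategy:               # sibling of replicas, selector, and template
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  # selector and template stay unchanged
```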

Explain in architecture-notes.md:

  • What does maxSurge do?
  • What does maxUnavailable do?
  • Why might you choose 0 for maxUnavailable?

Option B: Rollout notes and evidence

Add a short “Rollout Evidence” section to architecture-notes.md that includes:

  • The commands you ran (rollout status, history, undo)
  • One or two short excerpts of output (keep it short)
  • Your interpretation of what happened

Option C: Canary-style rollout (manual)

Simulate a “canary” rollout manually:

  • Temporarily run 1 replica of v2 and 2 replicas of v1
  • Compare behavior
  • Then complete the rollout

Document the approach and limitations in architecture-notes.md.
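
One way to sketch the manual canary is to use two Deployments behind one Service. Every name and label below (quote-app-stable, quote-app-canary, track, Service port 80) is an assumption for illustration. Because a Deployment selector cannot be changed after creation, these would be new Deployments rather than edits to your existing one.

```yaml
# Manual canary sketch: one Service, two Deployments that share the "app" label.
apiVersion: v1
kind: Service
metadata:
  name: quote-app
spec:
  selector:
    app: quote-app          # matches Pods of both tracks
  ports:
    - port: 80
      targetPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote-app-stable
spec:
  replicas: 2
  selector:
    matchLabels:
      app: quote-app
      track: stable
  template:
    metadata:
      labels:
        app: quote-app
        track: stable
    spec:
      containers:
        - name: quote-app
          image: quote-app:v1
          ports:
            - containerPort: 3000
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: quote-app-canary
spec:
  replicas: 1
  selector:
    matchLabels:
      app: quote-app
      track: canary
  template:
    metadata:
      labels:
        app: quote-app
        track: canary
    spec:
      containers:
        - name: quote-app
          image: quote-app:v2
          ports:
            - containerPort: 3000
```

The traffic split here follows the replica ratio only approximately, and you cannot target specific users. Both points are worth covering in your limitations.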

Optional stretch: real end-to-end checks

In real projects, teams often add end-to-end checks (for example with Playwright) to validate an application through the browser.

You do not need to implement this here, but write 3–5 bullets in architecture-notes.md explaining:

  • What an end-to-end test would validate for this app
  • Where it should run (locally, CI, or both)
  • What the biggest cost or risk would be

Optional extension lanes

The following lanes are optional unless specified by your instructor.

Choose one or more depending on your pace.

Multi-node simulation

If your environment supports it:

  • Add an additional worker node
  • Observe pod scheduling behavior
  • Run:
Terminal window
kubectl get nodes
kubectl get pods -o wide

Document:

  • How pods are distributed
  • How Kubernetes chooses nodes
  • What happens when a node becomes unavailable

Database hardening

Improve the security posture of your database deployment:

  • Ensure credentials come from a Kubernetes Secret
  • Avoid plain-text passwords in manifests
  • Verify environment variables are injected correctly

Explain:

  • Why Secrets are preferable to ConfigMaps for credentials
  • Where Secret data is stored in the cluster

Autoscaling exploration

Research Horizontal Pod Autoscaler (HPA).

Document:

  • What metrics are required
  • Why metrics-server is needed
  • How autoscaling differs from manual scaling
  • What would be required in a real production cluster

Implementation is optional. Attempt it only if you complete all required tasks early.
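
If you do try it, a minimal HPA manifest might look like the sketch below. The replica bounds and the 70 percent target are illustrative assumptions. The HPA can only act if metrics-server is running and the containers declare CPU requests.

```yaml
# Sketch of an HPA for the quote-app Deployment; thresholds are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: quote-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: quote-app
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the requested CPU, averaged over Pods
```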

Architecture critique

Write a short critique of your own design.

Include:

  • Single points of failure
  • Operational risks
  • Security concerns
  • Observability gaps

Be precise and realistic.

🐉 Beast Mode: production realism challenge

Attempt this only after completing all required work.

Choose ONE of the following:

  • Add a NetworkPolicy that restricts which pods can talk to the database.
  • Split application and database into separate namespaces and update access rules.
  • Add a basic Ingress resource and explain how it connects to a real load balancer.
  • Simulate a rolling update and observe rollout history.
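
For the first option, a starting point could look like the sketch below. The labels app: postgres and app: quote-app and the policy name are assumptions, so use the labels your Pods actually carry. Whether the policy is enforced depends on the network plugin in your cluster.

```yaml
# Sketch: allow only quote-app Pods to reach the database Pods on the PostgreSQL port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-quote-app     # assumed name
spec:
  podSelector:
    matchLabels:
      app: postgres            # assumed label on the database Pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: quote-app   # assumed label on the application Pods
      ports:
        - protocol: TCP
          port: 5432
```

To test it, start a temporary Pod without the quote-app label and confirm that it can no longer connect to the database.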

For your chosen challenge:

  • Implement it as far as your environment allows.
  • Document what worked and what did not.
  • Explain what would change in a real multi-node production cluster.

Add a dedicated section in architecture-notes.md titled:

Beast Mode Challenge

Wrap-up reflection

Before leaving, ensure:

  • Your manifests are committed
  • architecture-notes.md is pushed
  • You can explain your design verbally

This session is about thinking like an engineer responsible for production systems.

Speed is not the goal.

Clarity and reasoning are.