Kubernetes Mastery

Name: @tank/kubernetes-mastery
Author: Elad Ben Haim

Core Philosophy

Declarative over imperative -- Define desired state in YAML manifests. Let controllers reconcile actual state. Never rely on kubectl run in production; commit manifests to Git.
Least privilege by default -- Every workload gets its own ServiceAccount with minimal RBAC. Run as non-root. Drop all capabilities. Apply Pod Security Standards at namespace level.
Resource-aware scheduling -- Always set CPU/memory requests (scheduler guarantee) and memory limits (OOM protection). Omit CPU limits for latency-sensitive workloads to avoid throttling.
Probes enable self-healing -- Configure startup probes for slow-init apps, readiness probes to gate traffic, and liveness probes to restart deadlocked processes. Overly aggressive liveness probes (short timeouts, low failure thresholds) can cause restart storms.
GitOps is the deployment model -- ArgoCD or Flux syncs cluster state from Git. Manual kubectl apply is for emergencies only. Every change is auditable and reversible.
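
The least-privilege and resource principles above could be combined in a single manifest along these lines. This is a minimal sketch, not a complete production template: the names `web` and `web-sa`, the image reference, and the resource figures are all placeholder assumptions.

```yaml
# Hypothetical workload: "web", "web-sa", and the image are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-sa
automountServiceAccountToken: false   # opt out unless the app calls the API server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      serviceAccountName: web-sa        # dedicated SA instead of "default"
      securityContext:
        runAsNonRoot: true              # the image must define a non-root user
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: web
          image: registry.example.com/web:1.0.0
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          resources:
            requests:                   # what the scheduler reserves
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi             # memory limit only; CPU limit omitted on purpose
```

Committing a file like this to Git, rather than creating the workload with `kubectl run`, keeps the desired state reviewable and reversible.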

Quick-Start: Common Problems

"My Pod is stuck in CrashLoopBackOff"

Check exit code: `kubectl describe pod <name>` -- look at Last State and Exit Code
Read logs: `kubectl logs <name> --previous` (shows output from the previously crashed container)
Exit code 137 = container killed with SIGKILL (128 + 9), most often by the OOM killer -- confirm that Reason is OOMKilled, then increase the memory limit or fix the leak
Exit code 1 = application error -- fix the app
Liveness probe failing? Check that the probe path/port is correct and the timeout is sufficient
-> See `references/observability-and-debugging.md`
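
When the liveness probe is the culprit, a more forgiving probe configuration might look like the container-level fragment below. The paths, port, and timings are illustrative assumptions and need to be matched to the actual application.

```yaml
# Container-level fragment; /healthz, /ready, port 8080, and all timings are assumptions.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 30      # up to 5s x 30 = 150s for the app to start
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5          # gates traffic only; never restarts the container
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3         # the default of 1s is often too tight under load
  failureThreshold: 3       # restart only after 3 consecutive failures
```

Liveness and readiness checks do not begin until the startup probe has succeeded, so slow initialization does not count against the liveness threshold.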

"Which Service type should I use?"

| Scenario | Service Type |
|---|---|
| Pod-to-pod within cluster | ClusterIP (default) |
| External access via cloud LB | LoadBalancer |
| External access without cloud LB | NodePort + Ingress |
| Headless (direct pod DNS) | ClusterIP with `clusterIP: None` |
| External database/API | ExternalName, or a selector-less Service with manually managed Endpoints |

-> See `references/networking-and-services.md`
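
The two less common rows of the table, headless and ExternalName, can be sketched as follows. The service names, the `app: db` label, the port, and the hostname `db.example.com` are assumptions for illustration.

```yaml
# Headless Service: DNS returns the individual pod IPs instead of one virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: db-headless
spec:
  clusterIP: None
  selector:
    app: db
  ports:
    - port: 5432
---
# ExternalName Service: an in-cluster DNS alias (CNAME) for an external host.
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.com
```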

"Helm or Kustomize?"

| Signal | Use |
|---|---|
| Packaging for distribution (charts) | Helm |
| Environment-specific overlays (dev/staging/prod) | Kustomize |
| Need templating with conditionals/loops | Helm |
| Prefer pure YAML, no templating language | Kustomize |
| Both -- Helm for third-party, Kustomize for in-house | Common hybrid |

-> See `references/helm-and-kustomize.md`
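
For the overlay case, a production overlay might look like the sketch below. The directory layout (`base/`, `overlays/prod/`), the Deployment name `web`, and the replica count are assumptions.

```yaml
# overlays/prod/kustomization.yaml (hypothetical layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base            # shared manifests, unchanged per environment
namespace: prod
patches:
  - target:
      kind: Deployment
      name: web
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 4
```

An overlay like this would be rendered with `kubectl kustomize overlays/prod` or applied with `kubectl apply -k overlays/prod`.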

"How do I set up autoscaling?"

Set CPU/memory requests on all containers (HPA computes utilization as a percentage of the request)
Deploy Metrics Server (`kubectl apply -f metrics-server.yaml`)
Create HPA: `kubectl autoscale deployment <name> --min=2 --max=10 --cpu-percent=70`
For custom metrics (queue depth, RPS): use Prometheus Adapter + HPA v2
Add Cluster Autoscaler for node-level scaling
-> See `references/autoscaling-and-resources.md`
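
In keeping with the declarative approach, the imperative HPA command above can also be expressed as a manifest. This is a sketch that assumes a Deployment named `web`; the replica bounds and CPU target mirror the flags in the command.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the containers' CPU requests
```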

"My Deployment rollout is stuck"

Check status: `kubectl rollout status deployment/<name>`
Check events: `kubectl describe deployment/<name>` -- look for FailedCreate
Insufficient resources? Scale down or add nodes
Image pull error? Verify image name, tag, and imagePullSecrets
Rollback: `kubectl rollout undo deployment/<name>`
-> See `references/gitops-and-deployment.md`
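
A stuck rollout is easier to detect when the Deployment declares how long it may go without progress. The fragment below is one possible setting, with values chosen only for illustration.

```yaml
# Deployment spec fragment; the numbers are assumptions, not recommendations.
spec:
  progressDeadlineSeconds: 300   # mark the rollout as failed after 5 minutes without progress (default is 600)
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0          # keep full capacity while new pods come up
      maxSurge: 1                # add one extra pod at a time
```

Once the deadline passes, the Deployment reports a `ProgressDeadlineExceeded` condition and `kubectl rollout status` exits with an error. The Deployment is not rolled back automatically.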

Decision Trees

Workload Controller Selection

| Workload Type | Controller |
|---|---|
| Stateless web app, API | Deployment |
| Database, distributed store | StatefulSet |
| Per-node agent (logging, monitoring) | DaemonSet |
| One-off batch processing | Job |
| Scheduled batch processing | CronJob |
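
As an example of the last row, a scheduled batch workload could be declared as below. The name, schedule, and image are hypothetical.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report               # hypothetical name
spec:
  schedule: "0 2 * * *"              # every day at 02:00
  concurrencyPolicy: Forbid          # skip a run if the previous one is still active
  jobTemplate:
    spec:
      backoffLimit: 2                # retry a failed pod at most twice
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: registry.example.com/report:1.0.0
```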

Security Hardening Priority

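| Priority | Action |
|---|---|
| 1 (Day 1) | Dedicated ServiceAccount per workload; do not use the `default` ServiceAccount |
| 2 (Day 1) | Pod Security Standards: apply the `restricted` profile in `warn` mode first, then switch to `enforce` |
| 3 (Week 1) | Default-deny NetworkPolicies per namespace |
| 4 (Week 1) | RBAC audit -- remove wildcard rules and unnecessary ClusterRoleBindings |
| 5 (Ongoing) | Secrets in an external store (e.g. Vault), synced with External Secrets Operator (ESO); not in plain manifests |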
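
Priorities 2 and 3 can both be expressed per namespace. The sketch below assumes a namespace called `team-a`. The `enforce` label is left commented out to reflect the warn-first rollout.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a                                       # hypothetical namespace
  labels:
    pod-security.kubernetes.io/warn: restricted      # step 1: surface violations only
    # pod-security.kubernetes.io/enforce: restricted # step 2: enable once warnings are clean
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: team-a
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress
    - Egress               # also blocks DNS; add explicit allow rules afterwards
```

NetworkPolicies take effect only when the cluster's CNI plugin implements them.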

Storage Selection

| Need | Solution |
|---|---|
| Shared config files | ConfigMap (mounted as volume) |
| Credentials, API keys | Secret (+ External Secrets Operator) |
| Database storage | PVC with a StorageClass whose `reclaimPolicy` is `Retain` |
| Shared filesystem (multi-pod) | ReadWriteMany PVC (NFS, EFS, CephFS) |
| Ephemeral scratch space | emptyDir |
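
For the database row, a retain-policy StorageClass and a matching claim might look like this. The provisioner shown is the AWS EBS CSI driver, used purely as an assumption. Substitute the driver installed in the cluster. Names and sizes are placeholders.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: db-retain                     # hypothetical name
provisioner: ebs.csi.aws.com          # assumption: replace with your cluster's CSI driver
reclaimPolicy: Retain                 # keep the underlying volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: db-retain
  resources:
    requests:
      storage: 20Gi
```

With `Retain`, released volumes must be cleaned up or reattached manually, which is the trade-off for not losing data on accidental claim deletion.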

Reference Index

| File | Contents |
|---|---|
| `references/workloads-and-controllers.md` | Pods, Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, ReplicaSets, init containers, sidecar pattern, pod lifecycle |
| `references/networking-and-services.md` | Service types (ClusterIP/NodePort/LoadBalancer/ExternalName), Ingress controllers, DNS, service discovery, service mesh overview |
| `references/helm-and-kustomize.md` | Helm chart anatomy, values/templates, chart repositories, hooks, Kustomize bases/overlays, patches, strategic merge, Helm vs Kustomize selection |
| `references/security-and-rbac.md` | RBAC (Roles/ClusterRoles/Bindings), ServiceAccounts, Pod Security Standards/Admission, NetworkPolicies, SecurityContext, OPA/Gatekeeper |
| `references/storage-and-configuration.md` | PersistentVolumes, PersistentVolumeClaims, StorageClasses, volume types, ConfigMaps, Secrets, External Secrets Operator, projected volumes |
| `references/autoscaling-and-resources.md` | Resource requests/limits, QoS classes, LimitRanges, ResourceQuotas, HPA (v1/v2), VPA, Cluster Autoscaler, Karpenter, right-sizing |
| `references/observability-and-debugging.md` | kubectl debug/logs/exec/describe, events, Prometheus, Grafana, log aggregation, troubleshooting CrashLoopBackOff/ImagePull/Pending/OOM |
| `references/gitops-and-deployment.md` | ArgoCD, Flux, rolling updates, blue/green, canary (Argo Rollouts/Flagger), PodDisruptionBudgets, rollback, progressive delivery |
