Kubernetes Security in Production

Pod Security Standards: The Replacement for PSP

Pod Security Policies were removed in Kubernetes 1.25. The replacement, Pod Security Standards, defines three levels (privileged, baseline, restricted). The built-in Pod Security Admission controller enforces them via labels on namespaces:

apiVersion: v1
kind: Namespace
metadata:
  name: prod
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest

Default every workload namespace to restricted. Make exceptions explicit, time-bound, and tracked.
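
For a sense of what the restricted level asks of a workload, here is a minimal sketch of a Pod that should be admitted into the prod namespace above. The workload name, image, and registry are hypothetical placeholders. The securityContext fields are the ones the restricted profile checks; volume-type limits and the baseline host-namespace rules also apply and are satisfied here only because the Pod uses none of them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api                      # hypothetical workload name
  namespace: prod
spec:
  securityContext:
    runAsNonRoot: true           # restricted: containers must not run as root
    seccompProfile:
      type: RuntimeDefault       # restricted: RuntimeDefault or Localhost
  containers:
    - name: api
      image: registry.example.com/api:1.4.2   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false       # restricted: must be false
        capabilities:
          drop: ["ALL"]                       # restricted: drop all capabilities
```

A Pod missing any of these fields is rejected at admission time in a namespace labeled enforce: restricted.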

RBAC: The Mistake Pattern

The most common K8s RBAC mistake is granting cluster-admin to a service account because "the deployment kept failing." The fix path:

Start from the built-in view ClusterRole, bound at namespace scope. Add specific verbs as the workload needs them.
Avoid * in resources, verbs, and apiGroups.
Audit roles regularly — RBAC creep is real. Tools like krane and rbac-lookup help.
Never grant the ability to create or modify RoleBindings / ClusterRoleBindings outside admin contexts — that's a privilege-escalation pivot.
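
The list above might translate into something like the following: a namespace-scoped Role with explicit resources and verbs, bound to a single service account. The names config-reader, api-config-reader, and the api service account are hypothetical, and the assumption is a workload that only needs to read ConfigMaps.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: config-reader            # hypothetical role name
  namespace: prod
rules:
  - apiGroups: [""]              # core API group, named explicitly (no wildcard)
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-config-reader        # hypothetical binding name
  namespace: prod
subjects:
  - kind: ServiceAccount
    name: api                    # hypothetical service account
    namespace: prod
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: config-reader
```

To check the result, kubectl auth can-i list configmaps --as=system:serviceaccount:prod:api -n prod should answer yes, and the same query for secrets should answer no.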

Network Policies: Stop the Default Open

By default, every pod can reach every other pod. That's a flat L3 network: fine in dev, terrible in prod. The pattern that holds up:

Default deny at the namespace level — both ingress and egress.
Allowlist the specific traffic your workload needs.
CNI choice matters: NetworkPolicy objects do nothing unless the CNI plugin enforces them. Calico and Cilium do; Flannel on its own does not. kubeadm installs no CNI at all, so the choice is yours. AWS VPC CNI requires explicit enablement.
Cilium adds L7 policy (HTTP method, path, gRPC service) — useful for east-west microservice traffic.
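
A minimal sketch of the default-deny-then-allowlist pattern, using only the standard networking.k8s.io/v1 API. The app: api and app: frontend labels and port 8080 are assumptions for illustration. The DNS rule assumes cluster DNS runs in kube-system, which is typical but not universal.

```yaml
# 1. Default deny: selects every pod in the namespace, allows nothing.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: prod
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
---
# 2. Allow DNS lookups, which default-deny egress would otherwise break.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: prod
spec:
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - { protocol: UDP, port: 53 }
        - { protocol: TCP, port: 53 }
---
# 3. Allowlist: frontend pods may reach api pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: prod
spec:
  podSelector:
    matchLabels:
      app: api                   # hypothetical label
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend      # hypothetical label
      ports:
        - { protocol: TCP, port: 8080 }
```

Because egress is denied by default too, the frontend pods would also need a matching egress rule toward app: api before this traffic flows; that rule is omitted here for brevity.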

Admission Controllers: Where Real Prevention Happens

Kyverno and OPA Gatekeeper are the two go-to admission policy engines. The high-value policies:

Block images from untrusted registries.
Require image digests (not mutable tags such as :latest).
Require signed images (Cosign / Sigstore verification).
Block privileged containers, host network, host PID/IPC.
Require resource limits (prevents noisy-neighbor and resource-exhaustion attacks).
Validate image vulnerability scan results before allowing deployment.

Kyverno is usually faster to onboard (YAML-only, no Rego). Gatekeeper is more flexible for complex policies. Pick one, not both.
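
As an illustration of the first policy in the list, here is a sketch of a Kyverno ClusterPolicy that rejects Pods whose images come from outside an approved registry. The registry name registry.example.com is a placeholder. The placement of the enforcement setting has shifted between Kyverno releases (policy-level validationFailureAction in older versions, a per-rule field in newer ones), so treat that line as an assumption to check against the version you run.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce   # version-dependent; Audit only reports
  background: true
  rules:
    - name: validate-registries
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            # =(...) means: validate this field only if it is present
            =(ephemeralContainers):
              - image: "registry.example.com/*"
            =(initContainers):
              - image: "registry.example.com/*"
            containers:
              - image: "registry.example.com/*"
```

Rolling this out in Audit mode first and reviewing the policy reports is a reasonable way to find violations before switching to Enforce.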

Runtime Detection: Beyond Static Posture

Static posture (image scans, admission policies) covers what's possible. Runtime detection covers what actually happens:

Falco — eBPF-based, syscall-level rules. Mature, free, well-supported.
Tetragon (Cilium) — eBPF + L7 awareness. Strong for service-mesh contexts.
Tracee — eBPF, more research-oriented.
Commercial CWPP — Wiz, Sysdig, Aqua, Prisma — better integrated dashboards, similar underlying tech.
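
To make "syscall-level rules" concrete, here is a sketch of a custom Falco rule that flags an interactive shell started inside a container. It assumes the file is loaded alongside Falco's default ruleset, which is where the spawned_process and container macros come from. The rule name, shell list, and priority are illustrative choices, not a recommended baseline.

```yaml
- rule: Shell spawned in container
  desc: A shell process was started inside a running container
  # spawned_process and container are macros from Falco's default rules
  condition: spawned_process and container and proc.name in (bash, sh, zsh)
  output: "Shell in container (user=%user.name container=%container.name cmd=%proc.cmdline)"
  priority: WARNING
  tags: [container, shell]
```

Rules like this tend to be noisy at first; expect to add exceptions for legitimate entrypoint scripts and debugging workflows.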

"Static analysis tells you what could happen. Runtime tells you what just did. You need both — and the runtime piece is where most teams under-invest."