Kubernetes TLS after cert-manager: what still breaks

cert-manager solved certificate issuance for many clusters. It did not solve inventory, cross-cluster deployment, or the gap between a Certificate resource that reports Ready and a pod that actually serves the new chain. Teams that install cert-manager and close the ticket still get paged for expired certificates, because the operational model stopped at the CRD.

Ingress secrets are not the whole story

Most guides end when tls.crt lands in the Ingress secret. Sidecar meshes, internal gRPC services, webhook servers, and admission controllers often use separate secrets in separate namespaces. cert-manager issues and renews one secret per Certificate object; it does not verify that every consumer has picked up the update. After renewal, run an openssl s_client check from inside the cluster network against each internal endpoint, not just against the public load balancer.
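
The in-cluster check can also be scripted. What follows is a minimal sketch, not a finished tool: it uses only the Python standard library to read the notAfter date a service actually presents and turn it into days remaining. It assumes the issuing CA is already in the pod's trust store (a private CA would need context.load_verify_locations), and the function names are illustrative.

```python
import socket
import ssl
import time
from typing import Optional


def days_remaining(not_after: str, now: Optional[float] = None) -> float:
    """Days until a notAfter timestamp in the format getpeercert() returns."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400


def served_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the leaf certificate the server presents.

    Assumption: the default trust store can validate the chain. The
    handshake raises ssl.SSLError if it cannot, which is itself a finding.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Run from a pod, days_remaining(served_not_after("billing.payments.svc.cluster.local")) reports what that endpoint is serving right now, which can differ from what the secret contains. The service name here is hypothetical.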

Webhook chicken and egg

Validating webhooks that terminate TLS with a cert-manager-managed certificate create a bootstrap problem. If the webhook certificate expires, the API server's TLS handshake with the webhook fails, and with failurePolicy: Fail every request the webhook matches is rejected. That can include the very Certificate resources you would apply to fix it. Keep a break-glass manual secret rotation procedure and a staging cluster where webhook TLS is tested before production promotion.

Multi-cluster sprawl

Platform teams running ten clusters often have ten independent cert-manager instances, ten DNS solver configs, and ten slightly different ClusterIssuer definitions. Drift is inevitable. Centralize policy: allowed issuers, challenge methods, and minimum key sizes. Discovery tooling should list every TLS secret across namespaces and clusters and flag duplicates for the same SAN. It is common to find three valid certs for api.example.com and no owner for two of them.
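
The duplicate-SAN check is simple once the inventory exists. The sketch below assumes some other step has already listed the TLS secrets and parsed the SANs out of each tls.crt; the TlsSecret record and the cluster, namespace, and secret names are hypothetical. It only shows the grouping logic.

```python
from collections import defaultdict
from typing import NamedTuple


class TlsSecret(NamedTuple):
    cluster: str
    namespace: str
    name: str
    sans: tuple  # DNS names taken from the certificate's subjectAltName


def duplicate_sans(secrets):
    """Map each SAN to the secrets that carry it, keeping SANs seen more than once."""
    by_san = defaultdict(list)
    for secret in secrets:
        # DNS names are case-insensitive, so normalize before grouping.
        for san in {s.lower() for s in secret.sans}:
            by_san[san].append(f"{secret.cluster}/{secret.namespace}/{secret.name}")
    return {san: sorted(owners) for san, owners in by_san.items() if len(owners) > 1}


inventory = [
    TlsSecret("prod-a", "edge", "api-tls", ("api.example.com",)),
    TlsSecret("prod-b", "legacy", "api-old", ("API.example.com", "www.example.com")),
    TlsSecret("prod-a", "web", "shop-tls", ("shop.example.com",)),
]
print(duplicate_sans(inventory))
```

Every key in the result is a name with more than one live certificate behind it. Each entry needs an owner or a deletion.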

Estimate blast radius with the certificate sprawl estimator before you standardize on a single issuance path.

Renewal ≠ reload

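Some ingress controllers hot-reload; others need a rolling restart. cert-manager updates the secret, but your deployment may not watch secret changes. Kubernetes eventually refreshes a mounted secret volume on disk, yet the process still has to re-read the files, and secrets mounted with subPath or injected as environment variables are never refreshed. Add checksum annotations to pod templates or use operators that react to secret rotation events. Document per-workload behavior in the runbook: "cert-manager renewed it" is not an acceptable postmortem root cause.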
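
The checksum-annotation approach can be sketched as follows. The annotation key checksum/tls-secret is an assumed name, not a Kubernetes or cert-manager convention. The sketch also assumes that something (a small controller, a CI job) notices the rotation and applies the patch. Changing a pod template annotation is what makes a Deployment roll.

```python
import hashlib
import json


def secret_checksum(data: dict) -> str:
    """Stable SHA-256 over a secret's data map; keys are sorted so ordering cannot change the digest."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pod_template_patch(data: dict, annotation: str = "checksum/tls-secret") -> dict:
    """Build a patch body that stamps the digest onto the pod template.

    Assumption: the caller applies this to the Deployment, for example
    as a strategic merge patch. The annotation key is illustrative.
    """
    return {
        "spec": {
            "template": {
                "metadata": {"annotations": {annotation: secret_checksum(data)}}
            }
        }
    }
```

Because the digest only changes when the secret's contents change, re-applying the patch after a no-op reconcile does not trigger a restart.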

When to keep cert-manager vs external automation

cert-manager excels at in-cluster issuance tied to Ingress and Gateway API. External automation wins when you deploy the same cert to Key Vault, App Gateway, and RDS from one inventory. Hybrid is common: cert-manager for pod-facing certs, centralized ACME for edge and PaaS targets. The failure mode is assuming one tool covers both without an integration contract.

Audit your edge TLS posture with the TLS cipher audit checklist after every ingress controller upgrade, because cipher defaults can change silently across chart versions.