By Yair Knijn · October 7, 2025

Your microservices do mTLS. Nobody planned how to rotate the internal CA.

A platform director greenlights internal mTLS because security asked for it and the service mesh made it a checkbox. Istio or Linkerd mints a root, every sidecar gets a leaf, and traffic is encrypted and authenticated by the next sprint. The wrong assumption is that the part the mesh automated is the part that will hurt you. It is not. Leaf certs rotate themselves on a tight cycle and you forget them. The root that signs the chain does not rotate itself, and it carries an expiry date you picked once, distractedly, in a Helm value.

That date is a flag day you scheduled yourself. When it lands, every workload that trusts the old root rejects every peer at once, across the whole estate, with no failing leaf to warn you.

An internal CA expiring is not the same animal as a leaf

A leaf expiring is one service failing renewal, and your monitoring catches it. A root or issuing CA expiring invalidates every chain beneath it at once. A common mesh default hands the root a decade of life and the intermediate something shorter, and both leave your mental model the moment mTLS mode STRICT goes green. The standing guidance is to rotate the intermediate while it still has between a half and a third of its lifetime left. Almost nobody does, because nothing is on fire and the trigger is buried in a chart you forgot.
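To make the half-to-a-third guidance concrete, here is a minimal sketch that turns a CA's validity period into a rotation window. The function names, status labels, and the three-year intermediate are illustrative assumptions, not values taken from any mesh.

```python
from datetime import datetime, timezone


def rotation_window(not_before: datetime, not_after: datetime) -> tuple[datetime, datetime]:
    """Return (earliest, latest) moments to start rotating a CA.

    earliest: half of the lifetime remains.
    latest:   one third of the lifetime remains.
    """
    lifetime = not_after - not_before
    earliest = not_after - lifetime / 2
    latest = not_after - lifetime / 3
    return earliest, latest


def rotation_status(not_before: datetime, not_after: datetime, now: datetime) -> str:
    """Classify a CA against its rotation window (labels are hypothetical)."""
    earliest, latest = rotation_window(not_before, not_after)
    if now >= not_after:
        return "expired"
    if now >= latest:
        return "overdue"
    if now >= earliest:
        return "rotate-now"
    return "ok"


# Hypothetical intermediate: valid from 2024-01-01 to 2027-01-01.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = datetime(2027, 1, 1, tzinfo=timezone.utc)
today = datetime(2025, 10, 7, tzinfo=timezone.utc)

print(rotation_status(issued, expires, today))  # rotate-now
```

Run on a schedule against every root and intermediate, a check like this turns the buried Helm value into an alert that fires while there is still time to act.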

Trust bundles everywhere: the simultaneous-update problem

Minting a new CA is the easy part. The hard part: the new root must reach every trust store before the first workload presents a cert signed by it, while the old root stays trusted until the last workload stops presenting one. Those two requirements pull in opposite directions, which is why a naive swap is an outage. In Kubernetes the workable answer is a dual-root overlap. cert-manager's trust-manager defines a bundle once and syncs it to every namespace, and during rotation you point it at a source holding both roots. You move leaves onto the new issuer at your own pace, then drop the old root only when nothing still depends on it.

Get the order wrong and you have a partition: part of the fleet trusts new, part trusts old, and they refuse each other. The overlap window is the step nobody can skip.
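The partition is easy to see in a toy model. The sketch below is not how any mesh validates certificates. It assumes each workload presents a leaf under exactly one root and trusts a set of roots, and the workload and root names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    presents: str      # root that signs the leaf this workload presents
    trusts: frozenset  # roots in this workload's trust bundle


def broken_pairs(fleet):
    """Return name pairs of workloads whose mutual TLS handshake would fail."""
    failures = []
    for a in fleet:
        for b in fleet:
            if a.name < b.name:
                # mTLS needs each side to trust the root behind the other's leaf.
                if a.presents not in b.trusts or b.presents not in a.trusts:
                    failures.append((a.name, b.name))
    return failures


both = frozenset({"old-root", "new-root"})

# Naive swap: "cart" moved to the new root before "pay" ever received it.
naive = [
    Workload("cart", presents="new-root", trusts=frozenset({"new-root"})),
    Workload("pay", presents="old-root", trusts=frozenset({"old-root"})),
]

# Dual-root overlap: every bundle holds both roots while leaves migrate.
overlap = [
    Workload("cart", presents="new-root", trusts=both),
    Workload("pay", presents="old-root", trusts=both),
]

print(broken_pairs(naive))    # [('cart', 'pay')]
print(broken_pairs(overlap))  # []
```

With both roots in every bundle, it no longer matters which workload migrates first, and that is the whole purpose of the overlap window.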

cert-manager, SPIFFE, and short-lived workload identities

This is the strongest argument for not hand-rolling internal PKI. SPIFFE and SPIRE give each workload an identity baked into an SVID, and the SPIRE agent renews each SVID well before it expires, so leaf rotation stops being something you watch. Pair that with cert-manager issuing the signing material and trust-manager distributing the bundle, and the daily churn handles itself. What none of those tools decides is the cadence of the root above them, the overlap window, and who owns the runbook. That part stays human, and it tends to be missing.

Rotating an internal CA without a flag-day outage

A rotation that does not page anyone has a fixed shape:

Generate the new root or intermediate before the old one drops below a third of its life.
Push a bundle holding both roots to every workload, and verify it landed before issuing from the new CA.
Cut new leaf issuance to the new issuer, and let short-lived SVIDs roll the fleet on their own.
Confirm zero workloads still present chains under the old root.
Remove the old root from the bundle last, only after that confirmation.
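The sequence above can be sketched as a guard that refuses to name a later step until the earlier ones are verifiably complete. The `FleetState` fields are hypothetical, and in practice each would be fed from your own inventory or telemetry.

```python
from dataclasses import dataclass


@dataclass
class FleetState:
    """Hypothetical snapshot of a rotation in progress."""
    new_ca_exists: bool
    workloads: int              # total workloads in the estate
    bundles_with_new_root: int  # workloads whose trust bundle already holds the new root
    issuing_from_new: bool      # new leaves are signed by the new issuer
    presenting_old: int         # workloads still presenting a chain under the old root
    old_root_in_bundle: bool


def next_step(s: FleetState) -> str:
    """Return the first step of the sequence that is not yet complete."""
    if not s.new_ca_exists:
        return "generate new CA"
    if s.bundles_with_new_root < s.workloads:
        return "distribute dual-root bundle"
    if not s.issuing_from_new:
        return "switch issuance to new issuer"
    if s.presenting_old > 0:
        return "wait for old chains to drain"
    if s.old_root_in_bundle:
        return "remove old root"
    return "done"


# Bundle has reached 180 of 200 workloads, so issuing from the new CA stays blocked.
state = FleetState(True, 200, 180, False, 200, True)
print(next_step(state))  # distribute dual-root bundle
```

Because the guard reports only the first incomplete step, removing the old root can never be suggested while any workload still presents a chain under it.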

The failure mode is always sequencing, never cryptography. At any moment you need to know which roots are live, what each one signs, and when it expires, across every environment in the estate. That inventory is what Automate Certificates keeps for you, internal roots and intermediates included, with expiry alerting that fires on the CA above your services, not only the leaves below. Map your chains before the date you forgot arrives: see how it works.