The Automate Certificates blog.
Renewal runbooks, ACME challenge tradeoffs, and incident lessons for teams managing TLS at scale.
- Let's Encrypt 90-day renewal: the runbook that survives on-call. ACME v2 cron windows, staging vs production, and deployment hooks: what to document before your first midnight renewal page.
- Kubernetes TLS before cert-manager: what still breaks. Ingress secrets, webhook failures, and the gap between issued certs and pods that actually terminate TLS.
- Azure Key Vault vs ACME automation: when each wins. Managed certificates, import workflows, and the renewal path that auditors ask about on DORA reviews.
- Certificate expiry incidents: the cost nobody budgets. Downtime minutes, war-room hours, and the compliance findings that follow a single missed not_after.
- DNS-01 vs HTTP-01: picking the ACME challenge that scales. Wildcard coverage, internal services, and the propagation buffer your runbook probably skips.
- Code signing certificate lifecycle beyond the build pipeline. HSM storage, timestamp servers, and revocation: the controls that outlive your CI job definition.
- The post-quantum migration starts with an inventory you were supposed to already have. NIST has finalized post-quantum standards and regulators are signaling crypto-agility expectations, but you can't migrate algorithms you can't enumerate. The trap is treating PQC as a future research project while lacking the certificate inventory the migration depends on.
- The edge devices have certificates that can't be renewed remotely. The clock is still ticking. IoT, OT, and embedded fleets often ship with long-lived certs and no automated renewal path. Then shrinking lifetimes and the 47-day world collide with devices you can't easily reach. The trap is provisioning fleets with no renewal mechanism.
- You acquired a company and inherited its certificates. Nobody knows where they all are. Post-acquisition, the parent inherits an unknown certificate estate with its own CAs, expiry dates, and forgotten hosts. The trap is integrating systems without ever discovering and consolidating the acquired company's PKI.
- The day you need to re-issue everything is the day you hit Let's Encrypt's rate limits. A CA distrust, key compromise, or migration can force re-issuing the whole estate at once, exactly when ACME rate limits, CAA records, and validation throughput become the bottleneck. The trap is assuming you can mass-reissue on demand.
- The monitoring that watches your certs expired its own cert. Now you're blind. Certificate monitoring, alerting webhooks, and the load balancers terminating TLS all run on certificates too. The trap is a circular dependency where the thing that would have warned you is the thing that's down.
- PCI DSS 4.0 wants a cryptographic inventory. Your cardholder estate doesn't have one. PCI DSS 4.0 requirements 4.2.1.1 and 12.3.3 expect an inventory of trusted keys and certificates, plus documented cryptographic cipher suites and protocols. The trap is passing prior PCI cycles without ever building that inventory.
- The certificate sat in Key Vault and quietly expired. Nobody had wired the alert. Putting certificates in Azure Key Vault feels like safe storage, but Key Vault stores and serves; it does not guarantee renewal or alerting unless you configure it. The trap is mistaking secure storage for lifecycle management.
- Your microservices do mTLS. Nobody planned how to rotate the internal CA. Service-to-service mTLS with an internal CA is great, until the internal root or issuing CA approaches expiry and every workload needs new trust bundles simultaneously. The trap is building internal PKI with no rotation story.
- DNS-01 was your scalable choice. Then your DNS provider had a bad day during a renewal. DNS-01 challenges depend on programmatic access to your DNS provider's API at renewal time. The trap is not planning for the renewal that lands during a DNS API outage, rate limit, or propagation delay and stalls with no fallback.
- The engineer who left owned the ACME account. Renewals stopped three months later. Certificate automation often hangs off one person's API token, ACME account key, or DNS credentials. The trap is offboarding that engineer without ever discovering renewals were silently tied to their access until the first cert expires.
- Your ISO 27001 cert says you manage cryptography. The auditor wants the evidence. ISO 27001:2022 Annex A controls 8.24 (cryptography) and 5.9 (inventory of assets) expect a documented, evidenced cryptographic key and certificate lifecycle. The trap is a written policy with no operational data behind it.
- Why the deal stalled on a vendor security questionnaire about certificate lifecycle. Enterprise buyers and their auditors now ask how you manage certificate inventory, renewal, and key storage as part of due diligence. The trap is a sales-blocking 'we do it manually' answer that fails the buyer's third-party risk review.
- One wildcard cert, forty hosts, one private key. The director who thought that was simpler. Wildcard certificates feel like an operational shortcut, but a single shared private key spread across dozens of hosts turns one compromise into an estate-wide revocation and re-key event. The trap is optimizing for fewer certs instead of smaller blast radius.
- The leaf cert was fine. The intermediate expired and took the site down anyway. Monitoring not_after on the leaf certificate misses the case where an intermediate or cross-signed CA certificate in the chain expires first. The trap is alerting on the wrong cert and getting blindsided by a chain you didn't track.
- Your code-signing key is from 2014 and has never been rotated. So was NVIDIA's. When LAPSUS$ breached NVIDIA, stolen code-signing certificates, one dating to 2014, were used to sign malware that Windows still trusted. The trap is treating code-signing keys as set-and-forget because rotating them is painful.
- The reliability manager who thought a cert outage was a 30-minute fix. Industry data puts the average expired-certificate outage at roughly 5 hours end-to-end and millions of dollars, with most orgs hitting several per year. The trap is budgeting cert incidents as trivial blips instead of multi-hour, cross-team firefights.
- A browser distrusts your CA and gives you 90 days. Now what's your replacement plan? Chrome, Apple, and Mozilla distrusted Entrust TLS certificates in late 2024, forcing a mass migration to a new CA. The trap is single-CA dependence with no tested path to swap issuers across the whole estate on short notice.
- The shadow certificate no platform team knows about, until it shows up in a CT log search. Every publicly trusted TLS certificate is logged to Certificate Transparency at issuance. The trap is that an attacker, an auditor, or a phishing victim can enumerate your estate from crt.sh before your own inventory ever sees those hosts.
- Your team renews certs by hand four times a year. At 47 days that's a wall. CA/Browser Forum Ballot SC-081 drops maximum TLS lifetimes to 200 days in March 2026, 100 in 2027, and 47 in 2029. The trap is treating it as a 2029 problem when the renewal-frequency math breaks your manual process years earlier.
- DORA says you must keep a certificate register. Most CISOs find out theirs is empty. DORA's RTS on ICT risk management, Article 7, requires financial entities to maintain an up-to-date register of all certificates and certificate-storing devices for ICT assets supporting critical or important functions. The trap is assuming your CMDB or your CA portal already counts as that register.
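
The renewal-frequency math behind the 47-day teaser above can be sketched in a few lines. This is a rough illustration, not a capacity model: the lifetimes (200, 100, 47 days) come from the SC-081 schedule quoted in the list, while the function name `renewals_per_year`, the `renew_at_fraction` parameter, and the two-thirds renewal point are assumptions made for the example.

```python
import math


def renewals_per_year(lifetime_days: int, renew_at_fraction: float = 1.0) -> int:
    """Renewals needed per certificate per year.

    renew_at_fraction is a hypothetical knob: 1.0 means renewing only at
    expiry, 2/3 means renewing once two thirds of the lifetime has elapsed.
    """
    if lifetime_days <= 0 or not (0 < renew_at_fraction <= 1):
        raise ValueError("lifetime must be positive and fraction in (0, 1]")
    interval_days = lifetime_days * renew_at_fraction
    return math.ceil(365 / interval_days)


if __name__ == "__main__":
    # Maximum lifetimes taken from the SC-081 schedule in the list above.
    for lifetime in (200, 100, 47):
        at_expiry = renewals_per_year(lifetime)
        with_buffer = renewals_per_year(lifetime, renew_at_fraction=2 / 3)
        print(f"{lifetime:>3}-day certs: {at_expiry} renewals/year at expiry, "
              f"{with_buffer} with a two-thirds renewal point")
```

Multiplying the per-certificate figure by the size of the estate gives the number of manual touches per year, which is where a hand-run process tends to break well before 2029.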