Operationally, the issue is rooted in simple monitoring and accurate inventory. ...

renewiltord · 2025-12-28T04:41:18 1766896878

Can do with any weighted LB, right? E.g. route53 or Cloudflare LB. But even manually you just need k IPs (perhaps even 2) and have host k1 and host k2 report different (overlappingly valid) certs. Then (1/k) users will see bad cert. your usual will be near zero failures but canary will have 100% failures.

I’ve always used the calendar event before expiry and then manual renew option but I wonder why I didn’t do this. It’s trivial to roll out. With Route53 just make one canary LB and balance 1% traffic to it. Can be entirely automated.

firesteelrain · 2025-12-28T12:39:19 1766925559

That would work. In my case, which I am living right now, I am dealing with multiple environments where we didn’t set up the environment and we get burned by an expiring cert here and there leading to an outage. Users have zero appetite for any outage whatsoever and our inventory is bad.