Happened on the first day of my first on-call rotation - a cert for one of the key services expired. Autorenew failed, because one of the subdomains on the cert no longer resolved.
The main lesson we took from this was: you absolutely need monitoring for cert expiration, with alert when (valid_to - now) becomes less than typical refresh window.
It's easy to forget this, especially when it's not strictly part of your app, but essential nonetheless.
The main lesson we took from this was: you absolutely need monitoring for cert expiration, with alert when (valid_to - now) becomes less than typical refresh window.
It's easy to forget this, especially when it's not strictly part of your app, but essential nonetheless.