This afternoon some reminders were delayed by up to 3 hours. This service interruption was caused by a failure in one of our vendor systems during a routine deployment. To prevent this from happening again, we are changing our deployment scripts, reducing our dependencies during deployment, and improving our monitoring systems. Only reminder delivery was impacted. Scheduling and accessing reminders continued to work.
Like many modern software systems, FollowUpThen uses a package manager to update and manage the libraries we depend on. (You can think of this like the Google or Apple app stores, which let you download apps and track their versions.)
Also like many modern software systems, FollowUpThen runs on Amazon Web Services, a “cloud computing environment”. This comes with many benefits, one of which is automatic scaling. If we experience heavy load, Amazon Web Services automatically adds servers as needed.
When a server is automatically added, it runs through various steps to install and configure FollowUpThen. One of these steps is to run the package manager.
We follow best practices for “locking in” the versions of our packages to maintain consistency. However, today we encountered something unexpected: the package manager we use runs an automatic security checker, and today that checker failed with an SSL error. This caused the automatic deployment process to fail, and the failure repeated even when the system tried to recover automatically by rolling back to the latest working version. The result: our send-reminder sub-system was offline.
Once we discovered the problem and restored service, all due reminders were processed and sent.
Thankfully, any reminders that were scheduled during the downtime were successfully processed. Inbound reminder scheduling was not affected; only outbound reminders were delayed.
The immediate bug with this particular vendor has been fixed. Moving forward, we are changing our deployment procedures to reduce our dependence on external systems. We have also updated our alerting system to notify us more quickly (and more urgently) in the event of a system failure like this.
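For technically inclined readers, here is a minimal sketch of one way a deployment script can keep an optional external check from taking a deployment down with it. This is an illustration only, not our actual deployment code: the `Step` type, the `run_steps` function, and the step names are hypothetical, and we assume each step is simply a command whose exit code signals success or failure.

```python
import subprocess
import sys
from typing import List, NamedTuple


class Step(NamedTuple):
    """One deployment step (hypothetical structure, for illustration)."""
    name: str
    command: List[str]
    critical: bool  # if False, a failure is recorded but does not abort


def run_steps(steps: List[Step], timeout_seconds: float = 60.0) -> List[str]:
    """Run steps in order and return the names of optional steps that failed.

    Raises RuntimeError as soon as a critical step fails.
    """
    failed_optional = []
    for step in steps:
        try:
            result = subprocess.run(step.command, timeout=timeout_seconds)
            ok = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            # A hung or unlaunchable command counts as a failed step.
            ok = False
        if ok:
            continue
        if step.critical:
            raise RuntimeError(f"critical deployment step failed: {step.name}")
        # Optional step: note it for alerting and keep deploying.
        failed_optional.append(step.name)
    return failed_optional


if __name__ == "__main__":
    # Stand-in commands: the first succeeds, the second simulates a failing
    # external security check.
    steps = [
        Step("install-packages", [sys.executable, "-c", "pass"], True),
        Step("security-check", [sys.executable, "-c", "raise SystemExit(1)"], False),
    ]
    print("optional steps that failed:", run_steps(steps))
```

The idea is that a step which depends on an outside service is marked as non-critical, so a failure there is reported to monitoring instead of blocking the server from coming online.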
We know you rely on FollowUpThen and sincerely appreciate the trust you place in our system. If you have any questions about the outage, our preventative changes, or your account, please send me an email at email@example.com.
Co-Founder | FollowUpThen