Conjured Subscriptions Downtime
Incident Report for Conjured
Postmortem

Yesterday, our Conjured Referrals app experienced downtime. The admin area of the app displayed 500 errors, and the customer-facing part of the app was intermittently unavailable. This error did not initially affect Conjured Subscriptions.

The 500 errors in Conjured Referrals were caused by our Redis database, which is hosted with DigitalOcean, needing to be upgraded. Once we upgraded that component, the 500 errors went away and Conjured Referrals resumed normal function.

We upgraded only the Redis databases relating to Conjured Referrals, since Conjured Subscriptions was not showing any errors. However, a one-line edit to the Conjured Referrals config file, made so that app could connect to the new Redis database, caused Conjured Subscriptions to start returning blank pages within the admin dashboard and displaying "There was an error in the third-party app" on the customer-facing pages.

There were two reasons, one procedural and one programmatic, that this bug wasn't caught at the time (thank you to one of our clients for reporting it to us as soon as they noticed it!). The procedural reason was that, because Conjured Subscriptions wasn't affected by the original bug, it was never tested after the fix for Conjured Referrals was made. On the programming side, we have extensive error logging and alerts set up to notify us of errors within the apps; however, this error occurred before any of our code could run, so no errors were logged.

To close these gaps in the future, we're taking a two-pronged approach. In the short term, as part of our development process, we will test all apps after every live code push, regardless of whether they were the intended target of a fix or feature. This will catch major outages such as this one going forward. In addition, we are working to build a "dead man's switch" style alert: if an app doesn't ping our monitoring system periodically, an alert will be sent to our development team to check whether the app is still up. This alert should catch outages related to server issues, regardless of whether we have pushed code recently.
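A dead man's switch inverts the usual alerting model: instead of the app reporting its own errors (which fails when the app never gets far enough to run), the monitor raises an alert when an app goes quiet. The sketch below is a minimal illustration in Python, assuming an in-memory monitor that records the time of each app's last ping. The class and method names, the 300-second timeout, and the app names are hypothetical and are not Conjured's actual monitoring system.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DeadMansSwitch:
    """Hypothetical sketch: tracks each app's last heartbeat and flags silent apps."""
    timeout_seconds: float
    # Injectable clock so the logic can be tested without real waiting.
    clock: Callable[[], float] = time.monotonic
    last_seen: Dict[str, float] = field(default_factory=dict)

    def register(self, app: str) -> None:
        # Start the timer at registration, so an app that never pings still trips the switch.
        self.last_seen.setdefault(app, self.clock())

    def heartbeat(self, app: str) -> None:
        # Called by the app itself (e.g. from a scheduled job) to prove it is alive.
        self.last_seen[app] = self.clock()

    def silent_apps(self) -> List[str]:
        # Any app whose last ping is older than the timeout is presumed down.
        now = self.clock()
        return sorted(
            app for app, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )

    def check(self, alert: Callable[[str], None]) -> List[str]:
        # Run periodically on the monitoring side; fires one alert per silent app.
        silent = self.silent_apps()
        for app in silent:
            alert(app)
        return silent


if __name__ == "__main__":
    fake_now = [0.0]
    switch = DeadMansSwitch(timeout_seconds=300, clock=lambda: fake_now[0])
    for name in ("referrals", "subscriptions"):
        switch.register(name)

    fake_now[0] = 240
    switch.heartbeat("referrals")  # only Referrals is still pinging

    fake_now[0] = 400
    alerts: List[str] = []
    switch.check(alerts.append)
    print(alerts)  # ['subscriptions']
```

The key design choice is that silence, not an error message, is the trigger: an app that fails before any of its own code runs also stops sending heartbeats, so it is still caught. In practice the heartbeat timestamps would need to live somewhere that survives a monitor restart, and the `alert` callback would notify the development team rather than append to a list.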

As always, please don't hesitate to reach out to us (support@conjured.co) if you have any questions, concerns, or find any other issues that need addressing.

Posted Jul 06, 2020 - 09:02 MDT

Resolved
This incident has been resolved. Please read the incident's postmortem for more details.
Posted Jul 06, 2020 - 09:01 MDT
Monitoring
We've issued a fix and are monitoring the situation. We'll be issuing a postmortem on the issue shortly.
Posted Jul 06, 2020 - 08:19 MDT
Identified
We have identified the issue: our Redis database, which is hosted by DigitalOcean and provides session storage and caching, is down. We are actively working to fix this and should have a solution shortly.
Posted Jul 06, 2020 - 07:55 MDT
Investigating
We're currently investigating reports of downtime in the Conjured Subscriptions app.
Posted Jul 06, 2020 - 07:49 MDT
This incident affected: Conjured Subscriptions - Admin Dashboard and Conjured Subscriptions - Customer-Facing.