Core Platform Outage

Incident Report for Roadie

Postmortem

Core Platform Outage Summary – December 16th, 2025

On December 16th, 2025, at approximately 2:00 pm EST, the Roadie Platform experienced a core platform outage. Although we were restricting deployments to critical changes only during the peak period, we needed to make a configuration change intended to improve system resiliency and further support the scaling of our hybrid cloud architecture. The change was not directly related to production request handling, but it affected critical ingress and platform components in an unforeseen way, resulting in a complete service disruption.

Our engineers immediately began investigating, and by 2:39 pm EST they had identified the cause of the issue and begun applying a fix: reverting the earlier configuration change. By 3:11 pm EST, the fix was fully applied and the Platform began functioning normally again. By 3:48 pm EST, all key metrics had returned to normal and we declared the outage resolved.

To reduce the likelihood of similar outages in the future, we have implemented the following measures:

  • Enhanced monitoring for ingress and other critical platform components to improve early detection of anomalous behavior.
  • Strengthened configuration safeguards, including additional validation and checks to prevent high-risk or incompatible configuration changes from being applied.
  • Improved change review practices for infrastructure and resiliency-related updates to better identify potential system-wide impacts before deployment.

We sincerely apologize for any disruption this may have caused. If you have any questions, please feel free to reach out to us at techops@roadie.com.

Posted Dec 18, 2025 - 15:42 EST

Resolved

This incident has been resolved.
Posted Dec 16, 2025 - 15:48 EST

Update

We are continuing to monitor for any further issues.
Posted Dec 16, 2025 - 15:14 EST

Monitoring

A fix has been implemented and our engineers are continuing to monitor the situation.
Posted Dec 16, 2025 - 15:11 EST

Identified

Our engineers are still applying the fix.
Posted Dec 16, 2025 - 15:02 EST

Update

Our engineers have identified the cause of the outage and are applying a fix now.
Posted Dec 16, 2025 - 14:39 EST

Investigating

We are currently experiencing a core platform outage. Our engineers are investigating.
Posted Dec 16, 2025 - 14:34 EST
This incident affected: Core Platform.