ManageBac experienced a severe performance degradation between 5:40am UTC on January 9th and 11:17am UTC on January 10th. During this period, ManageBac responded to requests much more slowly than normal, and a percentage of requests intermittently timed out and failed. As soon as we detected the issue, our operations and product development teams began investigating the cause. We identified that the issue was caused by abnormally high network request times between ManageBac and the third-party web services that ManageBac depends on. We followed up with our data-centre provider and escalated the issue to their priority support. We also implemented workarounds where possible to relieve some of the pressure on the system, with the goal of keeping critical customer processes running.
After an extended analysis, the root problem was determined to be upstream of our data-centre. Specifically, the fault lay with a network provider called Level 3, which sits between our data-centre in Montreal and Amazon Web Services. Our data-centre provider eventually isolated their network from Level 3 and re-routed our connections via alternative network providers. We verified that full functionality was restored immediately, at 11:17am UTC on January 10th.
Why did this occur?
At the time of writing, Level 3’s network problems are continuing and our data-centre provider has not received a root cause explanation. Our data-centre provider has assured us that our network will remain isolated from Level 3, using alternative network providers, until the fault has been reliably and demonstrably rectified.
What are we doing to ensure it doesn’t happen again?
We have set up additional monitoring tools that specifically watch for, and alert us to, any similar network problems that could adversely affect ManageBac performance.
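For technically minded readers, the kind of check described above can be sketched roughly as follows. This is a minimal illustration only, not our production monitoring: the function names, the thresholds and the idea of sampling a single URL are all assumptions made for the example.

```python
import http.client
import time
import urllib.request

# Hypothetical thresholds chosen for illustration; real values would be tuned.
SLOW_THRESHOLD_S = 2.0    # a request slower than this counts as "slow"
MAX_SLOW_FRACTION = 0.2   # alert when more than 20% of samples are slow or failed


def time_request(url, timeout_s=10.0):
    """Return the seconds taken by one GET request, or None if it failed or timed out."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            response.read()
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all subclasses of OSError.
        return None
    return time.monotonic() - start


def should_alert(samples,
                 slow_threshold_s=SLOW_THRESHOLD_S,
                 max_slow_fraction=MAX_SLOW_FRACTION):
    """Decide whether a batch of samples looks degraded.

    `samples` is a list of request durations in seconds, with None for a
    request that failed or timed out. Failures are counted as slow.
    """
    if not samples:
        return False
    slow = sum(1 for s in samples if s is None or s > slow_threshold_s)
    return slow / len(samples) > max_slow_fraction


def check_endpoint(url, attempts=5, pause_s=1.0):
    """Sample a (hypothetical) third-party endpoint several times and report degradation."""
    samples = []
    for _ in range(attempts):
        samples.append(time_request(url))
        time.sleep(pause_s)
    return should_alert(samples)
```

A scheduler could call `check_endpoint` against each third-party service every few minutes and notify the operations team whenever it returns `True`. Counting failed requests as slow means that intermittent time-outs, like those seen in this incident, would trigger an alert even when the successful requests look healthy.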
In addition, we are implementing a second, completely isolated, redundant data-centre with a different data-centre provider. This will give us many more options in the event of a future network outage and will allow us to recover from similar events significantly faster.
We thank everyone for their patience and apologise for any inconvenience this incident may have caused. Providing our schools with reliable access to their ManageBac accounts is a top priority, so we do not take downtime and network performance issues lightly. For more information on our commitment to users, please see our service level agreement.
If you have further questions about this incident, please feel free to reach out to our support team at firstname.lastname@example.org.