PassFort apologises for the disruption to service yesterday evening.
Between 18:20 and 21:40, newly created applications remained in the Automating state for much longer than usual, which prevented them from being Approved. Notifications (such as those sent when an application requires manual approval) were also delayed.
The backlog was cleared and all applications were processing normally by 02:05.
At 18:20, PassFort's processing queues began to grow. This was due to an unexpected 10x load on our servers caused by a customer’s batch operation. The issue was exacerbated when many of the queued items failed because of a data provider issue and were each retried multiple times. Together, the batch operation and the retries increased overall load by more than 100x.
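As a rough illustration of how retries can compound a load spike like this one, the sketch below estimates the overall load multiplier from a base spike, the fraction of items that fail, and the number of retries per failing item. The function name, the parameters, and the example figures are hypothetical and are not measurements from this incident. It also assumes that a failing item fails on every attempt, as it would while a data provider is unavailable.

```python
def load_multiplier(base_multiplier: float,
                    failure_fraction: float,
                    max_retries: int) -> float:
    """Estimate total load relative to normal traffic.

    base_multiplier:  size of the initial spike (e.g. 10 for a 10x spike)
    failure_fraction: share of items that fail (0.0 to 1.0), hypothetical
    max_retries:      retries attempted for each failing item, hypothetical
    """
    if not 0.0 <= failure_fraction <= 1.0:
        raise ValueError("failure_fraction must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    # Every item is attempted once; failing items are attempted
    # max_retries more times (assumed to fail on each attempt).
    attempts_per_item = 1 + failure_fraction * max_retries
    return base_multiplier * attempts_per_item


if __name__ == "__main__":
    # Hypothetical figures: a 10x spike where half the items fail
    # and each failing item is retried 4 times.
    print(load_multiplier(10, 0.5, 4))
```

Under this simple model, capping the number of retries or spacing them out with a backoff is what limits the amplification, since the multiplier grows linearly with both the failure fraction and the retry count.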
PassFort's systems scaled automatically but, due to the magnitude of the increase, reached their automated scaling limits. Applications were still seeing significant delays, so at 21:40 engineers intervened to scale the application further by hand. New applications were prioritised in the queue so that they could be processed normally while engineers worked through the backlog.
Between 21:40 and 02:05, PassFort continued to adjust scaling and monitor the queues to ensure that the backlog was cleared as quickly as possible.
During the incident, the API error rate was 10%, meaning about 90% of API requests completed without error.
PassFort will be taking a number of actions to ensure such an incident cannot occur again.