Delay in processing risk
Incident Report for PassFort
Postmortem

Summary

All times are in UTC.

At about 10:35 on 2021-09-14, customers started reporting increased delays in application processing.

An investigation was started immediately. At 11:30 we received an alert from our monitoring system informing us of delays in risk processing.

PassFort engineers determined that an unexpectedly large number of risk jobs, coupled with an unoptimised query, had led to a rising backlog of jobs in our Risk service.

At about 13:00 a fix was deployed to speed up risk processing. Risk throughput was greatly increased after this point and the backlog was cleared within an hour.

PassFort engineers performed a sanity check on other potentially affected services and noticed an increased backlog in our Search service, caused by a similar issue. This meant new applications could not be found via the portal. At about 16:15 a fix was deployed, and by 16:30 the backlog had been cleared.

Analysis

The rising backlog of risk jobs was due to an unusually large number of applications being created on our system on that day.

Most of our services were able to scale without any issues. The Risk and Search services were using unoptimised queries when accessing the database, which slowed down job processing.
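To illustrate how an unindexed query slows job processing, here is a minimal sketch using Python's built-in sqlite3 module. The table name risk_jobs, its columns, and the index name are hypothetical and chosen for illustration only; this report does not describe the actual database, schema, or queries used by the Risk and Search services.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); only the detail text is kept.
    return [row[3] for row in rows]


# Hypothetical schema: one row per risk job (an assumption, not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE risk_jobs ("
    "id INTEGER PRIMARY KEY, application_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO risk_jobs (application_id, status) VALUES (?, ?)",
    [(i, "pending" if i % 10 == 0 else "done") for i in range(10_000)],
)

# A query that a worker might run frequently to find outstanding jobs.
sql = "SELECT id FROM risk_jobs WHERE status = ?"

# Without an index on status, the planner has to scan every row in the table.
plan_before = query_plan(conn, sql, ("pending",))

# With an index on the filtered column, the planner can look up matching rows.
conn.execute("CREATE INDEX idx_risk_jobs_status ON risk_jobs (status)")
plan_after = query_plan(conn, sql, ("pending",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan text varies between SQLite versions, but the first plan reports a scan of the table and the second names the index. A full scan gets slower as the table grows, which is how a query can go unnoticed until a day with an unusually large number of applications.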

Actions

  1. We have updated our services to ensure frequent database queries use appropriate indexes.
  2. We are updating our dashboards to give better visibility of increasing job backlogs across our services.
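As a rough sketch of what detecting an increasing backlog could look like, the Python below flags a queue whose depth is either too high or growing too quickly. The names BacklogSample, backlog_growth_per_minute and should_alert, and the thresholds in the usage lines, are hypothetical; this report does not specify how the dashboards or alerts are implemented.

```python
from dataclasses import dataclass


@dataclass
class BacklogSample:
    minute: int  # minutes since the start of the monitoring window
    depth: int   # number of jobs waiting in the queue at that time


def backlog_growth_per_minute(samples):
    """Average change in queue depth per minute between first and last sample."""
    if len(samples) < 2:
        return 0.0
    first, last = samples[0], samples[-1]
    elapsed = last.minute - first.minute
    if elapsed <= 0:
        return 0.0
    return (last.depth - first.depth) / elapsed


def should_alert(samples, max_depth, max_growth_per_minute):
    """Alert when the backlog is too deep or is growing too quickly."""
    if not samples:
        return False
    too_deep = samples[-1].depth > max_depth
    growing = backlog_growth_per_minute(samples) > max_growth_per_minute
    return too_deep or growing


# Hypothetical readings: the queue grows from 100 to 600 jobs in 5 minutes.
rising = [BacklogSample(0, 100), BacklogSample(5, 600)]
steady = [BacklogSample(0, 100), BacklogSample(5, 100)]

print(should_alert(rising, max_depth=10_000, max_growth_per_minute=50))
print(should_alert(steady, max_depth=10_000, max_growth_per_minute=50))
```

Alerting on the growth rate as well as the absolute depth means a backlog that is still small but rising steadily can be noticed before customers see delays.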
Posted Oct 01, 2021 - 15:01 BST

Resolved
The backlog has been processed and this incident has been resolved.
Posted Sep 14, 2021 - 15:09 BST
Identified
The issue has been identified and a fix is in place.
Applications will continue to be delayed while the backlog is processed. We expect this to take 25 minutes.
Posted Sep 14, 2021 - 14:26 BST
Investigating
We have identified delays in risk being processed, which we are actively investigating.

Applications will show as “Automating” and will complete once we resolve the issue.

We will post an update in 30 minutes.
Posted Sep 14, 2021 - 14:11 BST
This incident affected: PassFort Environments (🇪🇺 EU - identity.passfort.com).