At around 16:30 an issue was escalated to the engineering team due to profiles stuck automating in a live institution. During the initial investigation it had also been reported that similar issues had been seen in a Demo account which indicated bridge checks were starting and not completing.
The investigation revealed the cause of this incident was a migration script being run against a customer account which was generating profiles, alongside a customer also running an unannounced script to generate profiles.
These scripts all running concurrently placed an unexpectedly high load on the bridge service - approximately 2-3x what it could handle at any time with it’s normal resource allocation.
We have added a task to implement monitoring for this specific issue in the future and will be able to handle the issue proactively. We will also attempt to scale up the number of bridge instances when we know scripts are planned to run and which may exceed our capacity.