Starting at approximately 12:26 pm EDT on May 28, 2020, some Channeltivity customers started experiencing intermittent availability issues due to high-CPU usage within the application service. By 12:55 pm EDT, most customers were being affected with long page load times and timeouts. After diagnosis, the application service was restarted and customers instances were running normally again by 1:36 pm EDT.
The high CPU issue repeated itself from 3:02 pm to 3:39 pm EDT, again causing application availability issues.
The engineering team has now implemented proactive monitoring, mitigation and diagnosis rules to prevent this issue from impacting application availability while we collect additional data to resolve the root cause.
UPDATE June 18, 2020: We've been monitoring CPU usage within the application service and have determined the mitigation efforts have been successful.