At 1:51 pm EDT today, Channeltivity's Azure storage containers in the US North Central data center started experiencing timeouts when attempting to create secure access tokens. This is preventing Channeltivity images and files from loading for affected customers. Library search is affected as well. We are working with Microsoft to investigate the issue and bring it to a quick resolution. We will post further updates below.
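For readers wondering what "secure access tokens" refers to: Channeltivity serves images and files from Azure Blob Storage behind short-lived shared access signatures (SAS). The sketch below is a minimal illustration only, written against the current azure-storage-blob Python SDK with made-up account, container, and blob names; it is not Channeltivity's actual code, and with an account-key SAS the signing step itself is a local computation. It simply shows where a regional Storage outage like this one surfaces as timeouts or errors when a signed URL is used to fetch a file.

```python
from datetime import datetime, timedelta, timezone

from azure.core.exceptions import HttpResponseError, ServiceRequestError
from azure.storage.blob import BlobClient, BlobSasPermissions, generate_blob_sas

# Hypothetical account/container/blob names -- purely illustrative.
ACCOUNT_NAME = "examplestorageaccount"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER = "library-files"
BLOB_NAME = "logos/partner-logo.png"

# Create a short-lived, read-only SAS token for a single blob. With an
# account key this signing step is a local HMAC computation; in the
# incident above, the failures were on the storage service side in the
# US North Central region.
sas_token = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    blob_name=BLOB_NAME,
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(minutes=15),
)

blob_url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER}/{BLOB_NAME}?{sas_token}"
)

# Fetching the blob through the SAS URL is what actually hits the region.
# During a regional Storage outage this is where requests hang or fail.
try:
    client = BlobClient.from_blob_url(blob_url)
    data = client.download_blob(timeout=10).readall()  # server-side timeout in seconds
    print(f"Downloaded {len(data)} bytes")
except (ServiceRequestError, HttpResponseError) as exc:
    # Surface the failure so the application can show a placeholder
    # instead of a broken image while the region recovers.
    print(f"Storage request failed: {exc}")
```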
Update 2:26 pm EDT: Microsoft is reporting a service issue affecting storage accounts in the US North Central region.
Update 2:36 pm EDT: Microsoft posted an Incident advisory:
Starting at 17:52 UTC on 31 Jul 2018, a subset of customers using Storage in North Central US may experience difficulties connecting to resources hosted in this region. Engineers have determined that this is caused by an underlying Storage incident which is currently under investigation. Other services that leverage Storage in this region may also be experiencing impact related to this, and impacted services will be listed on the Azure Status Health Dashboard. Engineers are aware of this issue and are actively investigating. The next update will be provided in 60 minutes, or as events warrant.
Update 2:39 pm EDT: We're seeing restored access to Azure storage containers. We will monitor the situation until Microsoft provides the all-clear.
Update 2:58 pm EDT: Update from Microsoft:
Engineers are seeing signs of recovery and validating potential mitigation steps. The next update will be provided in 60 minutes, or as events warrant.
Update 3:33 pm EDT: Update from Microsoft:
Engineers are investigating a power event as a potential root cause and are seeing signs of recovery.
Update August 1, 10:21 am EDT: Update from Microsoft:
Summary of Impact: Between 17:52 UTC and 18:40 UTC on 31 Jul 2018 a subset of customers using Storage in North Central US may have experienced difficulties connecting to resources hosted in this region. Other services that leverage Storage in this region may also have been experiencing impact related to this.
Preliminary root cause: A power event resulted in a restart of a number of storage nodes in one of the availability zones in North Central region.
Mitigation: A majority of storage resources self-healed. Engineers manually recovered the remaining unhealthy storage nodes. Additional manual mitigating actions were performed by storage engineers.
Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.
Update August 7, 9:58 am EDT: Update from Microsoft:
ROOT CAUSE AND MITIGATION: During a planned power maintenance activity, a breaker tripped, transferring the IT load to its second source. A subset of the devices saw a second breaker trip, resulting in the restart of a subset of Storage nodes in one of the availability zones in North Central region. Maintenance was completed with no further issues or impact to services. A majority of Storage resources self-healed. Engineers manually recovered the remaining unhealthy Storage nodes. Additional manual mitigating actions were performed by Storage engineers to fully mitigate the incident.
NEXT STEPS: We sincerely apologize for the impact to affected customers. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future. In this case, this includes (but is not limited to):
1. Engineers have sent failed components to the OEM manufacturer for further testing and analysis.
Zach Smith