Microsoft “deeply” apologizes for worldwide Azure, Teams outage

Microsoft apologized on Tuesday for a worldwide outage that affected Azure cloud services, including Microsoft Teams, Office 365 and Dynamics 365.

“We understand how incredibly impactful and unacceptable this is and we deeply apologize,” said Microsoft in a post-incident review report on the outage, which was the result of “authentication errors” in several Microsoft cloud services. “We are continually taking steps to improve the Microsoft Azure platform and our processes to help ensure that such incidents do not occur in the future.”

Microsoft referred in the report to changes made after a September 28, 2020 outage, which impacted Microsoft 365 users for five hours.

“In the September incident, we indicated our plans to apply additional protections to the backend SDP (Safe Deployment Process) system of the Azure AD (Active Directory) service to avoid the class of problems identified here.”

Microsoft said that the first phase of the SDP changes has been completed and the second phase is in a “carefully planned deployment” that will end in the middle of the year.

“The initial analysis indicates that, once fully implemented, it will avoid the type of disruption that happened today, as well as the related incident in September 2020,” said Microsoft. “In the meantime, additional safeguards have been added to our key removal process, which will remain until the second phase of the SDP deployment is complete.”

Microsoft said Tuesday morning that “most services” affected by the worldwide Azure and Teams outage were back online, with the exception of Intune and Microsoft Managed Desktop.

The last update on the outage came in a tweet at 6:34 a.m. from the Microsoft 365 status account.

Microsoft’s apology came after a global outage on Monday that affected the Teams collaboration app, as well as “several” other services from Azure, Office 365 and Dynamics 365.

The problems, which Microsoft began reporting on Twitter at 3:40 p.m. ET on Monday, could be affecting any user “worldwide,” the company said at the time.

Even with the outage, some industry executives are urging MSPs to move customers to the cloud more quickly after the March 2 attack on Exchange Server by Chinese state-sponsored hackers.

That attack affected only on-premises versions of Exchange Server, not Exchange Online or the cloud-based Office 365 email service. Approximately 30,000 organizations in the United States and 60,000 organizations worldwide have had emails stolen as a result of the breach because they were still running on-premises versions of Exchange.

Last week, Microsoft alerted customers to DearCry ransomware breaches resulting from the attack on on-premises Exchange servers. On March 12, the company warned that “human-operated ransomware attacks are using Microsoft Exchange vulnerabilities to exploit customers.”

Emmet Tydings, president of AB&T Telecom of Columbia, Md., which provides internet voice and data services and failover stability for MSPs, said it is critical that partners move customers to the cloud to avoid serious security issues like the ones that came with the Chinese attack on on-premises Exchange servers.

“MSPs need to move their customers to the cloud more quickly and also need to stabilize their communications infrastructure with diversity in their circuits and failover,” said Tydings. “Microsoft emphasized that they are better able to provide security in the cloud than with on-premises Exchange.”

Tydings said that partners need to provide robust internet connectivity with SD-WAN, wireless failover with carrier plans through a SIM module, and a backup cable line for a primary fiber line.

In the event of an outage like the one that hit Microsoft Teams, MSPs should fall back on an alternative communications platform, such as Zoom or Cisco Webex, he said.

With the global pandemic leading to more distributed workforces, on-premises Exchange no longer makes sense for customers, according to Tydings.

“The MSPs we work with have been heroes in converting their customers from on-premises to the cloud since the pandemic,” he said.

The rapid migration to the cloud has led companies to invest in delivering software products faster, but they are not investing in making cloud services more resilient, said Ofer Smadari, co-founder and CEO of StackPulse of Portland, Ore., whose reliability platform helps teams detect, respond to and remediate incidents with code-based automation.

“We see the results in the headlines every week, it seems, as major brands have site disruptions,” said Smadari. “Most companies are still using traditional IT tools, such as billing systems, service management tools, or communications applications to share information and collaborate to restore service. Companies need to move from an IT management mindset to an engineering mindset, where they build resilience into their applications and business operations to adopt a more risk-aware approach. Only then can they quickly recover from interruptions and deliver on their promise to customers.”
