
Case Study: Microsoft Developer Division Moves to DevOps

Becoming good at DevOps meant taking the practices that we’d honed for being a good box software provider and changing them to this new world of continuous delivery, continuous feedback, and continuous innovation that DevOps requires. How disruptive was it? Well, we went from a world where the notion was that, at the end of every sprint, you had a, quote, “potentially” shippable increment to one where you deploy live at the end of every sprint. We went from a world where the idea of testing was to maximise the mean time between failures to one where the idea is to minimise the mean time to remediate.

In other words, if there’s a live-site incident, you want it fixed immediately, and then you want it fixed to root cause. So you’re always getting better, and you’re always automating whatever you can automate. An example of that would be alerting. We’ve recently introduced automatic alerting, which is a 40x improvement over the way we used to do escalations. The classic escalation model is that you have a notion of tier 1 support, where people spend their time looking at lots of alerts that come in.

Well, we now have an auto-dialer that looks at the alerts that come in and automatically routes them to the appropriate dev for the right section of Visual Studio Online. It says: it looks like there’s a live-site problem with your code or your portion of the system, based on the code and configuration, and you need to hop on it. And you need to be on it within five minutes if it’s working hours, or 15 minutes if it’s not working hours. And that’s done automatically.

And that works automatically because we’ve been able to eliminate the noise that normally happens in alerting systems. We learned from watching the noise that comes into the systems, and we got rid of it by building a health model and taking alerts directly to root cause. That’s an example of a practice.

And you don’t need such high-reliability hardware, because you have software that can handle failure and understands that hardware does fail and that people do make mistakes. You want to understand things like what to do if a disk drive fails or if a data centre fails. You want to be able to handle an earthquake, right? Earthquakes happen. Tsunamis happen.

And in fact, we can do data centre to data centre failover. So these are things that software now knows how to deal with.
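
The automatic alert routing described in the video can be illustrated with a minimal sketch. It is not Microsoft's implementation: the ownership map, the working-hours window, and the `actionable` flag (standing in for the output of a health model) are all assumptions made for illustration. Only the five-minute and fifteen-minute response targets come from the transcript.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

# Hypothetical ownership map: area of the service -> on-call developer.
OWNERS = {"build": "dev-build-oncall", "work-items": "dev-workitems-oncall"}

# Assumed working hours (Monday to Friday, 09:00 to 17:00).
WORK_START, WORK_END = time(9, 0), time(17, 0)


@dataclass
class Alert:
    area: str
    raised_at: datetime
    actionable: bool  # assumed verdict of a health model that filters out noise


def response_deadline(raised_at: datetime) -> datetime:
    """Five minutes during working hours, fifteen minutes otherwise."""
    in_hours = raised_at.weekday() < 5 and WORK_START <= raised_at.time() < WORK_END
    return raised_at + timedelta(minutes=5 if in_hours else 15)


def route(alert: Alert):
    """Return (owner, deadline) for an actionable alert, or None for noise."""
    if not alert.actionable:
        return None
    owner = OWNERS.get(alert.area, "fallback-oncall")
    return owner, response_deadline(alert.raised_at)
```

For example, `route(Alert("build", datetime(2024, 1, 8, 10, 0), True))` sends the alert to the hypothetical `dev-build-oncall` owner with a deadline five minutes after it was raised. The point of the sketch is that routing only works without a tier 1 triage layer if noisy alerts are filtered out before anyone is paged.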

In the previous step, you discovered how you can be a change agent and advocate for implementing DevOps in your organisation. In this step, we will look at and learn from the Microsoft Developer Division’s DevOps implementation.

Over seven years, the Microsoft Developer Division (DevDiv) embraced Agile practices. The division achieved a 15-times reduction in technical debt through solid engineering practices, drawn heavily from XP. They trained everyone across the division on Scrum, multidisciplinary teams, and product ownership. They focused heavily on the flow of value to customers. By the time they shipped Visual Studio 2010, the product line had achieved an unparalleled level of customer recognition.

After they shipped Microsoft Visual Studio 2010, the team knew that they needed to begin converting Team Foundation Server into a software as a service (SaaS) offering. The SaaS version, now called Visual Studio Online (VSO), would be hosted on Microsoft Azure, and to succeed with that they needed to adopt DevOps practices.

That meant that the division needed to expand its practices from Agile to DevOps. A tacit assumption of Agile was that the Product Owner was omniscient and could groom the backlog correctly. In contrast, when you run a high-reliability service, you can observe how customers are actually using its capabilities in near real-time. You can release frequently, experiment with improvements, measure, and ask customers how they perceive the changes. The data that you collect becomes the basis for the next set of improvements you make.

In this way, a DevOps product backlog is really a set of hypotheses that become experiments in the running software and allow a cycle of continuous feedback. DevOps grew from Agile based on four trends:

Figure: the DevOps lifecycle drawn as a large loop. Development has its own internal sub-loop on one side, and Production has its own internal sub-loop on the other. An arrow labelled “collaboration” connects the two sub-loops. Four captions sit around the outer loop, in order. First, pointing at the Development loop: “the agile methodologies are accelerating the construction process”. Second, beside some servers and cogs: “an automated release pipeline is needed to deliver at the pace of development with full traceability”. Third, at the Production sub-loop: “Availability and performance issues are hard to troubleshoot in this fast-changing world with distributed applications”. Fourth, leading into an image labelled “Backlog” before the loop returns to Development: “Usage should determine the next set of priorities and learn”.

Unlike many ‘born-in-the-cloud’ companies, Microsoft did not start with a SaaS offering. Most of its customers were using the on-premises version of the software (Team Foundation Server). When VSO was started, the division decided to maintain a single code base for both the SaaS and ‘box’ versions of the product, developing cloud-first.

When an engineer pushes code, it triggers a continuous integration pipeline. At the end of every three-week sprint, they release to the cloud, and after four to five sprints, they release a quarterly update for the on-premises product, as illustrated below:

Figure: a release timeline with two arms. The upper arm shows cycles labelled, consecutively, “Update 1”, “Update 2”, and “Update n”. The lower arm, labelled “VSO Service Updates”, shows five overlapping loops that cover the same span of time as the three updates above. Between the two arms is a loop labelled “vNext”.
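
The cadence described above (a cloud release at the end of every three-week sprint, and an on-premises update after every four to five sprints) can be sketched as a simple schedule. This is only an illustration: the function name `release_plan`, the fixed choice of four sprints per update, and the start date in the usage example are assumptions, not details from the case study.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=3)  # from the text: three-week sprints
SPRINTS_PER_UPDATE = 4              # assumption: the text says four to five


def release_plan(first_sprint_start: date, sprints: int):
    """Yield (sprint_number, sprint_end, targets) for each sprint."""
    for n in range(1, sprints + 1):
        sprint_end = first_sprint_start + n * SPRINT_LENGTH
        targets = ["cloud"]  # every sprint ends with a cloud deployment
        if n % SPRINTS_PER_UPDATE == 0:
            # Every fourth sprint also feeds an on-premises update.
            targets.append("on-premises update")
        yield n, sprint_end, targets
```

Calling `list(release_plan(date(2024, 1, 1), 8))` gives eight cloud releases, with sprints 4 and 8 also marked as on-premises updates. Because both products share a single code base, the on-premises update is in effect a roll-up of changes that have already been running in the cloud service.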

To learn more about specific aspects of this journey, from source control to live-site culture, and from branching to open source, to alignment, see Our Journey to Cloud Cadence: Lessons Learned at Microsoft Developer Division, a free e-book by Sam Guckenheimer.

In our next step, we will complete our first CloudSwyft Hands-On Learning Lab for Week 1.

Join the discussion

What most appeals to you about the idea of implementing DevOps? Do you have any real-world challenges that it could solve or improve?
Use the discussion section below to let us know your thoughts. Try to respond to at least one other post. Once you’re happy with your contribution, click the Mark as complete button to check the step off, and then move on to the next step.
This article is from the free online course Microsoft Future Ready: Fundamentals of DevOps and Azure Pipeline.

Created by
FutureLearn - Learning For Life
