Quality through autonomous teams

On 18-04-2018
Category: Blog, Development

At the heart of the agile method are autonomous teams that take responsibility and make their own decisions. Transitioning to self-organizing teams may seem risky, a move that intuitively threatens quality. But Rini van Solingen explains how bol.com and ANWB used autonomous teams to greatly improve the quality of their systems.

Quality without control?

The core of the agile method is to deliver results in rapid sprints, and always deliver the most valuable results first. Autonomous teams are key to the method’s success: teams that have free rein to translate ideas into working results. The term ‘autonomy’, however, has different connotations for different people. It can call to mind images of bumper cars, with gridlock and accidents. Or of a flock of starlings: thousands of birds twisting through the sky in a complex choreography. Both images are correct.

So, it’s only natural that a transition to agile is accompanied by trepidation about the level of quality that the autonomous teams will deliver. If teams are in charge of their own work, won’t they be tempted to cut corners? What are the implications for standards, certifications and regulators? Could the transition even jeopardize the company’s reputation?

These concerns are inherent in letting go of control: what will happen to quality as a consequence?

Incorrect

This negative image of autonomy is incorrect, but only when the teams are genuinely responsible. Don’t let a team build something and hand its work off to another team for testing, and yet another for support. Everything must be done by the same team. If you build it, you test it and support it. Every measure that enhances a team’s autonomy and promotes its insight into the results of the work delivered will have a direct impact on the level of quality (see ‘Seven steps’).

But if all you’re doing is delegating the small decisions, you’re not establishing true autonomy. The biggest step is to make teams so robust that they can do their work independently of other teams. The more interconnected and interdependent teams are, the lower their actual autonomy. In large organizations, with dozens or even hundreds of teams, this can present the greatest challenge to implementing agile ways of working.

Resilience

A key success factor in decreasing dependencies is to increase the resilience of teams and applications. Fully autonomous teams will assume that other teams occasionally fail to deliver – and they will be ready for that. Resilient teams plan for contingencies and do not allow external factors to jeopardize their results. If the iDEAL payment system is down, bol.com will ship your order with an invoice. If the bol.com search feature is down, you can use the menu to navigate to the desired items. And there is no reason to blame other teams for their setbacks if those setbacks no longer impede you. As one of the software engineers put it: “The art of forgiving, because the show must go on!”
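The payment fallback described above can be sketched in a few lines of Python. This is only an illustration of the idea, not bol.com's actual implementation: the `Order` type, the `PaymentUnavailable` exception, the `checkout` function and the two stand-in gateways are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable


class PaymentUnavailable(Exception):
    """Raised when the online payment provider cannot be reached (hypothetical)."""


@dataclass
class Order:
    order_id: str
    amount_cents: int
    payment_method: str = "unpaid"


def checkout(order: Order, pay_online: Callable[[Order], None]) -> Order:
    """Try the online payment first; fall back to an invoice if it is down."""
    try:
        pay_online(order)
        order.payment_method = "online"
    except PaymentUnavailable:
        # Degrade gracefully: the order still ships, payment follows by invoice.
        order.payment_method = "invoice"
    return order


def working_gateway(order: Order) -> None:
    """Stand-in for a healthy payment provider: accepts every payment."""
    return None


def failing_gateway(order: Order) -> None:
    """Stand-in for a payment provider that is currently down."""
    raise PaymentUnavailable("payment provider is down")


if __name__ == "__main__":
    print(checkout(Order("A1", 1999), working_gateway).payment_method)
    print(checkout(Order("A2", 1999), failing_gateway).payment_method)
```

The point of the sketch is that the team owning the checkout decides up front what happens when a dependency fails, so an outage elsewhere changes how the order is paid for, not whether it ships.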

Empirically, autonomous agile teams experience a marked increase in the quality of their work (see ‘bol.com and ANWB Roadside Assistance in numbers’). Real autonomy, in terms of the team and its environment, has a way of accomplishing that. If you ask people to ‘eat their own dog food’, they’ll make caviar canapés.

Seven steps to improve quality

  1. Remove external dependencies

In the process of delivering solutions, teams sometimes have to work together with external parties. But whenever external input is on the critical path, it can delay your agile teams. Remove these dependencies, for example by using products as-is or by running software on-premises.

  2. Collaborate closely with business teams

The effort you put into the quality of your systems and interfaces directly benefits your customers. Close and permanent cooperation between IT teams and business teams increases the sense of ownership for all parties, which in turn raises quality.

  3. Offer autonomous services

Present the architecture in terms of services, and let teams use each other’s components freely. Teams are responsible for designing their own interfaces. Attractive interfaces will inspire confidence in each other’s components.

  4. Make your own services resilient

A key step towards achieving autonomy is for teams to plan for their own services to continue working, even when other teams’ services fail (resilience). This approach has an enormous positive impact on the quality and availability of the system as a whole.

  5. Release and take ownership yourself

Bring all required competencies together in one self-contained agile team and make the team responsible for all process steps, from idea to support. This includes every team releasing its own products to the production environment, which automatically instills in the team a desire to monitor its own components.

  6. Fail fast but safe

Offering an infrastructure in which teams can deploy to pilot users, or perform automatic rollbacks, makes it easier for teams to deliver high-quality work. Failing quickly in a safe environment can alert the team to potential large-scale quality issues.

  7. Automate all quality checks

Automated test tools, for functional and non-functional tests, are tremendous time savers. Making these tools available will motivate teams to use them regularly to check the quality of their work.
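As an illustration of the ‘Fail fast but safe’ step above, here is a minimal Python sketch of a staged rollout to pilot users with an automatic rollback. It assumes a hypothetical probe, `error_rate_at`, that reports the observed error rate once the new version serves a given share of traffic. The function name, the stage sizes and the 1 percent threshold are assumptions made for this example, not figures from bol.com or ANWB.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    error_rate_at: Callable[[float], float],
    max_error_rate: float = 0.01,
) -> float:
    """Return the share of traffic that ends up on the new version.

    stages: increasing traffic shares, e.g. (0.01, 0.10, 1.0).
    error_rate_at: hypothetical probe returning the error rate observed
        while the new version serves the given share of traffic.
    """
    current_share = 0.0
    for share in stages:
        if error_rate_at(share) > max_error_rate:
            # Fail fast: stop at the first unhealthy stage and roll back.
            return 0.0
        current_share = share
    return current_share


if __name__ == "__main__":
    def healthy(share: float) -> float:
        return 0.001

    def breaks_under_load(share: float) -> float:
        # Only misbehaves once it serves 10 percent of traffic or more.
        return 0.2 if share >= 0.10 else 0.0

    print(staged_rollout((0.01, 0.10, 1.0), healthy))
    print(staged_rollout((0.01, 0.10, 1.0), breaks_under_load))
```

Because the faulty version is stopped at the second stage, at most a small group of pilot users ever sees the problem.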

Bol.com and ANWB roadside assistance in numbers

Advanced autonomy at ANWB and bol.com has resulted in measurable quality improvements:

  • Reduction of serious incidents

Bol.com differentiates between critical and emergency incidents. Critical incidents are addressed as soon as they are reported, but only worked on during office hours. In case of emergencies, the team stays until the problem is solved. In five years, the number of emergency incidents has declined proportionally to 50 percent of the baseline level, or just one per week on average. The number of critical incidents has declined proportionally by 75 percent, to about one per day. Per agile team, this works out to an average of just 1 emergency and 5 critical incidents per year.

In 2013, ANWB switched to autonomous agile teams for its roadside emergency assistance (call center and responders). The ANWB roadside assistance staff receive over a million calls per year. Since introducing autonomy, the number of serious (Priority 1) incidents has gradually decreased from 32 in 2013 to 15 in 2017. So far in 2018, there have been no serious incidents.

  • Reduced repair time

Incidents are also resolved more quickly when teams are autonomous. Bol.com has seen a 70% reduction in repair time for critical incidents between 2014 and 2017: from 42 hours to just 13 on average. Two times out of three, the incident is resolved before the next morning.

At ANWB, the average repair time for incidents is under 90 minutes, and all incidents are resolved within one working day.

  • Independent incident resolution

When teams resolve their own incidents, this creates clear lines of responsibility and greater transparency when issues arise. At ANWB and bol.com, this has entirely obviated the need for incident managers. An incident is handled by whichever team delivered the product.

  • Autonomous releases and direct feedback

At ANWB, agile teams release their own code to the production environment. They have an agreement with the call center that, to avoid unnecessary risks, no code will be deployed during rush hour traffic. ANWB currently has one to four releases per day that pertain to their roadside assistance service.

The agile teams at bol.com are entirely autonomous: they can choose whatever time is convenient for them. Sometimes they deploy during office hours, and sometimes early in the morning, just in case. Bol.com releases roughly 900 times per month. That’s an average of about 40 releases per working day. And they do so with great confidence in their ability to correctly assess impacts and risks. On Black Friday, the busiest day of 2017, they did over 50 releases.

  • Advanced automation of quality controls

The ANWB call center runs an automated test set that includes over 7,000 unit tests, 200 functional regression tests and 1 large performance & load test that consists of 2,000 specific scenarios. These tests can be run by every developer, and they are run at least once per day and prior to every release.

At bol.com, every team has its own test environment to conduct automated functional and non-functional tests. They currently have over 12,000 automated tests that they run twice per day. Almost 300 of these are performance & load tests.

Text by Rini van Solingen, in close cooperation with Sjoerd van den Berg, Peter Brouwers, Nick Tinnemeier and Frederieke Ubels at bol.com, and with Sjoerd Hemminga and Gijs Scheepers at ANWB.