High-performing Engineering Teams, and the Holy Grail

holy grail pfffft
dev productivity
cncf-landscape

Jeremy Meiss

Director, DevRel &

CircleCI

So back to the tech industry…

Back to the Tech Industry
You Seek the Holy Grail

Forrester 2021 Total Economic Impact Study

Using best-in-class CI/CD platform can provide:
  • $7.8 million saved from shorter software development cycles.
  • $4.3 million recuperated in lost developer productivity.
  • 50% decrease in annual infrastructure spend.
  • $1.7 million estimated value of improved code quality.
dude this is a Wendy’s
state-of-reports
one size doesn’t fit all
Image: Consumer Choice Center

CI/CD Benchmarks for high-performing teams

Duration

Mean time to resolve

Success rate

Throughput

head banging good time

Duration

the foundation of software engineering velocity, measures the average time in minutes required to move a unit of work through your pipeline

efficency
rejoice rejoice
pipeline optimization

Duration Benchmark

<=10 minute builds

"a good rule of thumb is to keep your builds to no more than ten minutes. Many developers who use CI follow the practice of not moving on to the next task until their most recent check-in integrates successfully. Therefore, builds taking longer than ten minutes can interrupt their flow."

Paul M. Duvall (2007). Continuous Integration: Improving Software Quality and Reducing Risk

Duration: What the data shows

duration
Benchmark: 5-10mins

Improving test coverage

  • Add unit, integration, UI, end-to-end testing across all app layers
  • Add code coverage into pipelines to identify inadequate testing
  • Include static and dynamic security scans to catch vulnerabilities
  • Incorporate TDD practices by writing tests during design phase

Optimizing your pipelines

  • Use test splitting & parallelism for simultaneous multiple tests
  • Cache dependencies & data to avoid rebuilding unchanged code
  • Use Docker images custom made for CI environments
  • Choose the right machine size for your needs
dancing efficiency

Mean time to Recovery

the average time required to go from a failed build signal to a successful pipeline run

pipeline time to recovery

"A key part of doing a continuous build is that if the mainline build fails, it needs to be fixed right away. The whole point of working with CI is that you're always developing on a known stable base."

Martin Fowler (2006). “Continuous Integration.” Web blog post. MartinFowler.com

MTTR Benchmark

<=60min MTTR on default branches

MTTR: What the data shows

mttr-data
Benchmark: 60 mins

Treat your default branch as the lifeblood of your project

long live CI/CD

Getting to faster recovery times

  • Treat default branch as the lifeblood of your project
  • Set up instant alerts for failed builds (Slack, Pagerduty, etc.)
  • Write clear, informative error messages for your tests
  • SSH into the failed build machine to debug remote test env

Success rate

number of passing runs divided by the total number of runs over a period of time

Failed signals are not all bad

failed signals are ok

Success rate benchmark

90%+ success rate on default branches

Success rate: What the data shows

success-rate-data
Benchmark: 90%+ on default

Throughput

average number of workflow runs that an organization completes on a given project per day

high velocity
only a model

Throughput benchmark

It depends.

Throughput: What the data shows

throughput-data
Benchmark: at the speed of your business
Throughput is the most dependent on the other metrics
trailing indicators

High-performing teams in 2023

high-performing teams

The impact of Platform teams

Platform Teams, DevOps, and YOU

No, DevOps is not dead

devops is not dead
cncf landscape

The Rise of Platform Teams

platform teams
platform

Platform Perspective: Duration

  • Identify and eliminate impediments to developer velocity
  • Set guardrails and enforce quality standards across projects
  • Standardize test suites & CI configs (shareable configs / policies)
  • Welcome failed pipelines, i.e. fast failure
  • Actively monitor, streamline, & parallelize pipelines across the org

Platform Perspective: MTTR

  • Ephasise value of deploy-ready, default branches
  • Set up effective monitoring & alerting systems, track recovery time
  • Limit frequency & severity of broken builds w/ role-based policies
  • Config- and Infrastructure-as-Code tools limit misconfig potential
  • Actively monitor, streamline, & parallelize pipelines across the org

Platform Perspective: Success Rate

  • With low success rates, look at MTTR & shorten recovery time first
  • Set baseline success rate, aim for continuous improvement, look for flaky tests or test coverage gaps
  • Be mindful of patterns & influence of external factors, i.e. decline on Fridays, holidays, etc.

Platform Perspective: Throughput

  • Map goals to reality of internal & external business situations, i.e. customer expectations, competitive landscape, codebase complexity, etc.
  • Capture a baseline, monitor for deviations
  • Alleviate as much developer cognitive load from day-to-day work

2023 State of Software Delivery Report

2023 state of software delivery report
go.jmeiss.me/SoSDR2023

Thank You.

timeline.jerdog.me
@IAmJerdog
@jerdog
/in/jeremymeiss
@jerdog@hachyderm.io