CI/CD pipelines that worked fine at 20 engineers start to break down at 200. The failure modes of CI at scale are consistent across organisations.
Build time as a first-class metric
A CI pipeline that takes 45 minutes provides a weak feedback loop. Developers stop waiting for CI before merging and instead merge and hope. The erosion happens gradually: CI grows from 10 minutes to 15 to 20, and nobody treats it as a problem because each increase was incremental. The organisational remedy: make build time a reported metric, set a target (10 minutes or less for the majority of PRs), and treat regressions as incidents.
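To make the reporting step concrete, here is a minimal sketch of a build-time summary, assuming pipeline durations (in minutes) have already been exported from the CI system. The function names and the input format are hypothetical; only the 10-minute target and the "majority of PRs" rule come from the text.

```python
from statistics import median

TARGET_MINUTES = 10.0  # target from the text: 10 minutes or less


def summarize_build_times(durations_min, target=TARGET_MINUTES):
    """Return the median duration and the share of runs within the target."""
    if not durations_min:
        raise ValueError("no build durations supplied")
    within = sum(1 for d in durations_min if d <= target)
    return {
        "median_min": median(durations_min),
        "share_within_target": within / len(durations_min),
    }


def misses_majority_target(summary):
    """True when the runs meeting the target are not a majority."""
    return summary["share_within_target"] <= 0.5
```

A weekly job could call summarize_build_times on the durations of recent PR pipelines and open an incident when misses_majority_target returns True. The median is reported alongside the share so that slow drift is visible before the target is breached.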
Test parallelisation and sharding
The fastest path to reducing CI time is usually parallelisation. If running 200 unit tests sequentially in a single container takes 10 minutes, splitting the same tests evenly across 10 parallel containers takes about 1 minute plus parallelisation overhead (container start-up, checkout, dependency restore). Most CI platforms (GitHub Actions, Azure DevOps, CircleCI) support matrix or parallel jobs that can be used for test sharding. The investment in parallelisation returns immediately in developer cycle time.
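The sharding idea can be sketched as a small selection function that each parallel job runs to decide which tests it owns. This is an illustrative assumption, not any platform's built-in mechanism: select_shard and its parameters are hypothetical, and how a job learns its shard index and the shard count depends on the CI system.

```python
def select_shard(test_ids, shard_index, total_shards):
    """Return the tests that the given shard should run.

    Sorting first makes the split deterministic, so every parallel job
    computes the same partition and no test is run twice or skipped.
    """
    if total_shards < 1:
        raise ValueError("total_shards must be at least 1")
    if not 0 <= shard_index < total_shards:
        raise ValueError("shard_index must be in [0, total_shards)")
    # Take every total_shards-th test, starting at this shard's offset.
    return sorted(test_ids)[shard_index::total_shards]
```

With 200 tests and 10 shards, each job receives 20 tests. Splitting by count only balances wall-clock time when tests take similar time to run; a split weighted by recorded test durations would be a natural refinement.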
Dependency caching
Docker layer caching (for example via BuildKit), npm/pip/NuGet package caching, and compilation caching (ccache, sccache) can reduce CI times by 50-70% for incremental runs. The discipline required: structure Dockerfiles to separate dependency installation (changes infrequently, should be cached) from application code (changes every commit, cache miss is expected). A common CI performance anti-pattern is an unnecessarily broad COPY statement, such as copying the whole source tree before installing dependencies, which invalidates the dependency layer on every commit.
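The same separation applies to package caches: the cache key should depend only on the files that define the dependencies, not on application code. A minimal sketch, assuming the project has one or more lockfiles; dependency_cache_key and the "deps" prefix are hypothetical names, not part of any CI platform's API.

```python
import hashlib
from pathlib import Path


def dependency_cache_key(lockfile_paths, prefix="deps"):
    """Derive a cache key from lockfile contents only.

    The key changes when a lockfile changes and stays stable across
    commits that touch only application code.
    """
    digest = hashlib.sha256()
    for path in sorted(str(p) for p in lockfile_paths):
        # Include the path so that swapping contents between files
        # also produces a different key.
        digest.update(path.encode("utf-8"))
        digest.update(b"\0")
        digest.update(Path(path).read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"
```

Many CI platforms offer a built-in way to hash files into a cache key; the sketch only shows what such a key should and should not depend on.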
Flaky test management
Flaky tests (tests that intermittently fail without code changes) are a systemic CI reliability problem at scale. At 100 tests with 1% individual flakiness, the expected number of spurious failures per CI run is one, and, assuming the failures are independent, roughly 63% of runs (1 - 0.99^100) contain at least one. Without explicit ownership, the flaky test backlog grows faster than it is resolved. Systematic approaches: detect flakiness by running tests multiple times on the same commit, tag flaky tests as known-flaky and quarantine them, and track flakiness rates by test file and assign ownership.
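Both the arithmetic and the rerun-based detection step can be sketched briefly. The probability calculation assumes tests flake independently; classify_tests and its input format (a mapping from test id to pass/fail outcomes collected by rerunning the same commit) are hypothetical names chosen for illustration.

```python
def prob_run_has_flaky_failure(num_tests, flake_rate):
    """Probability that at least one test flakes in a run (independent tests)."""
    return 1.0 - (1.0 - flake_rate) ** num_tests


def classify_tests(results_by_test):
    """Classify each test from repeated runs on the same commit.

    results_by_test maps a test id to a list of booleans (True = pass).
    Mixed outcomes on identical code indicate flakiness.
    """
    report = {}
    for test_id, outcomes in results_by_test.items():
        if not outcomes:
            raise ValueError(f"no outcomes recorded for {test_id}")
        failures = outcomes.count(False)
        if failures == 0:
            status = "stable"
        elif failures == len(outcomes):
            status = "failing"  # consistent failure: a real bug, not flakiness
        else:
            status = "flaky"
        report[test_id] = {
            "status": status,
            "failure_rate": failures / len(outcomes),
        }
    return report
```

For 100 tests at a 1% flake rate, prob_run_has_flaky_failure returns about 0.634. Tests classified as flaky are the candidates for the known-flaky tag and quarantine, and the per-test failure_rate can be aggregated by test file to assign ownership.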