Redundancy

Add independent alternatives so one failure doesn’t stop the outcome; test failover so the backup isn’t imaginary.

Author

Reliability engineering & safety science (von Neumann, Shannon; modern SRE and business continuity practice)



Redundancy means providing more than one way to achieve a required function. In a series system any single failed component stops the whole; redundancy adds parallel paths, so other components, suppliers or people can take over. It differs from buffers (spare time or stock), and it works best when backups are independent and regularly exercised, so they actually work under stress.

How it works


Patterns

  • Active–active (parallel) – multiple units serve at once; one can disappear with no outage.
  • Active–passive (standby) – secondary takes over on failure; classify as hot/warm/cold by readiness.
  • 2N/N+1/quorum – full duplication (2N, e.g. RAID 1 mirroring), one spare unit beyond the N needed (N+1, e.g. RAID 5 parity tolerating one disk loss), or majority voting (quorum, as in consensus protocols).
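
As a minimal sketch of the active–passive pattern above, the Python below tries a primary and then its standbys in priority order. The function name `call_with_failover` and the `primary`/`standby` callables are hypothetical; a production version would also need health checks, timeouts and narrower exception handling.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(backends: Sequence[Callable[[], T]]) -> T:
    """Try each backend in priority order (primary first, then standbys).

    Returns the first successful result; raises only if every backend fails,
    chaining the last error so the root cause is not lost.
    """
    last_error: Optional[BaseException] = None
    for backend in backends:
        try:
            return backend()
        except Exception as exc:  # illustrative: real code should catch narrower errors
            last_error = exc
    raise RuntimeError("all backends failed") from last_error


# Hypothetical backends for illustration.
def primary() -> str:
    raise ConnectionError("primary down")


def standby() -> str:
    return "served by standby"


print(call_with_failover([primary, standby]))  # served by standby
```

Running this regularly against real standbys (not only during incidents) is what keeps the backup from being imaginary.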

Independence & diversity – spread across vendors/regions/power/failure modes; add design diversity to avoid common-mode failure.
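
One way to catch common-mode failure before it happens is to check which failure domains every replica shares. This hypothetical sketch assumes each replica is described by a dict of domain attributes (the keys `region`, `provider` and `power` are illustrative only); any attribute that comes back is a single point that could take all replicas down together.

```python
from typing import Dict, List

# Hypothetical placement record: failure-domain name -> value for one replica.
Placement = Dict[str, str]


def shared_domains(replicas: List[Placement]) -> Dict[str, str]:
    """Return the failure-domain attributes that every replica has in common.

    Each attribute in the result is a potential common-mode failure: if that
    region, provider or power feed fails, all replicas fail together.
    """
    if not replicas:
        return {}
    common = dict(replicas[0])
    for replica in replicas[1:]:
        # Keep only attributes whose value matches on this replica too.
        common = {k: v for k, v in common.items() if replica.get(k) == v}
    return common


replicas = [
    {"region": "eu-west", "provider": "cloud-a", "power": "feed-1"},
    {"region": "us-east", "provider": "cloud-a", "power": "feed-2"},
]
print(shared_domains(replicas))  # {'provider': 'cloud-a'}
```

Here the replicas look diverse by region and power but still share one provider, which is exactly the hidden dependency the diversity advice targets.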

Reliability math (intuition) – series reliability multiplies, so one failure kills the path; a parallel arrangement fails only if every path fails, provided the paths fail independently.
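
The intuition can be made concrete with the standard formulas, assuming component failures are independent: series reliability is the product of the component reliabilities, and parallel reliability is one minus the product of the component failure probabilities. The helper names below are illustrative.

```python
from math import prod
from typing import Iterable


def series_reliability(rs: Iterable[float]) -> float:
    """Every component must work: R = R1 * R2 * ... * Rn."""
    return prod(rs)


def parallel_reliability(rs: Iterable[float]) -> float:
    """At least one path must work: R = 1 - (1 - R1)(1 - R2)...(1 - Rn).

    Assumes independent failures; common-mode failures make this optimistic.
    """
    return 1 - prod(1 - r for r in rs)


print(round(series_reliability([0.99, 0.99, 0.99]), 6))  # 0.970299
print(round(parallel_reliability([0.99, 0.99]), 6))      # 0.9999
```

Three 99%-reliable parts in series give about 97%, while two 99% paths in parallel give 99.99%. The parallel figure is only as good as the independence assumption; a shared region or library erodes it.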

Graceful degradation – shed non-essential features under load or partial failure so the core stays available.
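
A minimal load-shedding sketch, assuming each feature carries a hypothetical priority (0 = core) and that load is reported as a utilisation fraction. The feature names and thresholds are placeholders, not recommendations; the point is that the core is the last thing to go.

```python
from typing import Dict, List

# Hypothetical feature catalogue: lower number = more essential.
FEATURES: Dict[str, int] = {
    "checkout": 0,         # core: must stay up
    "search": 1,
    "recommendations": 2,
    "activity_feed": 3,    # first to be shed
}


def enabled_features(load: float, features: Dict[str, int] = FEATURES) -> List[str]:
    """Return the features to keep serving at a given load (0.0 to 1.0+).

    Illustrative thresholds: above 70% shed priority >= 3,
    above 85% shed priority >= 2, above 95% keep only priority 0.
    """
    if load > 0.95:
        max_priority = 0
    elif load > 0.85:
        max_priority = 1
    elif load > 0.70:
        max_priority = 2
    else:
        max_priority = max(features.values(), default=0)
    return [name for name, p in features.items() if p <= max_priority]


print(enabled_features(0.8))   # ['checkout', 'search', 'recommendations']
print(enabled_features(0.99))  # ['checkout']
```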

People & process – cross-training, runbooks, and documentation raise the bus factor.
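
Under the simplifying assumption that work stalls as soon as any knowledge area has no one left who can handle it, the bus factor is the size of the smallest owner set: losing everyone in that area is the cheapest way to stall. The ownership map and names below are hypothetical.

```python
from typing import Dict, Set


def bus_factor(owners: Dict[str, Set[str]]) -> int:
    """Fewest people whose absence leaves some knowledge area with no one.

    Orphaning an area requires losing all of its owners, so the cheapest
    area to orphan is the one with the fewest owners.
    """
    if not owners:
        return 0
    return min(len(people) for people in owners.values())


# Hypothetical map: knowledge area -> people able to handle it.
owners = {
    "payments": {"ana", "ben"},
    "deploys": {"ana", "ben", "chen"},
    "billing_runbook": {"dev"},
}
print(bus_factor(owners))  # 1 (billing_runbook has a single owner)
```

Cross-training one more person on the weakest area is the quickest way to raise the number.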

Data & backups – separate copies, media and locations; verify with restore tests.
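
A restore test exercises the path you will actually depend on. This sketch assumes file-level backups and uses a plain copy as a stand-in for your real restore procedure; it restores into a scratch directory and compares SHA-256 checksums with the original. `verify_restore` and the file names are illustrative.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_restore(original: Path, backup: Path) -> bool:
    """Restore the backup to a scratch location and compare checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / original.name
        shutil.copy2(backup, restored)  # stand-in for the real restore step
        return sha256_of(restored) == sha256_of(original)


with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "data.db"
    src.write_bytes(b"important records")
    bak = Path(d) / "data.db.bak"
    shutil.copy2(src, bak)           # take the backup
    print(verify_restore(src, bak))  # True
    bak.write_bytes(b"corrupted")    # simulate silent backup decay
    print(verify_restore(src, bak))  # False
```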

Use-cases


SRE/IT – multi-AZ/region, load balancers, database replicas, circuit breakers.
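
Circuit breakers stop a failing dependency from dragging callers down with it and route requests to a fallback instead. Below is a minimal, single-threaded sketch; the class, its parameters and the injected `clock` (added so the behaviour can be tested) are assumptions, not any particular library's API.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Closed -> open after `failure_threshold` consecutive failures;
    after `reset_after` seconds, half-open: one trial call is let through."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()  # open: fail fast without touching the dependency
            # cool-down elapsed: half-open, let this one call through as a trial
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```

A real deployment would add locking for concurrent callers and metrics on state changes, so an open breaker is detected rather than silently masking an outage.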

Supply chain – dual-source critical inputs; safety stock at bottlenecks.

Operations – spare capacity, alternate routes, manual fallbacks.

Finance – liquidity buffers, diversified credit facilities, ring-fenced risk.

Org design – deputy roles, rota coverage, shared ownership of key knowledge.

Pitfalls & Cautions


Common-mode failure – “redundant” paths sharing a region, provider, library or process.

Bit-rot – cold backups and standbys decay silently, and a switchover no one practises fails when it is finally needed.

Split-brain & inconsistency – replicas that lose contact may each accept writes and diverge; define clear leadership and quorum rules so only one side stays authoritative.
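
The core of most quorum rules is a strict-majority check: a node keeps acting as leader only while it can reach more than half of the full cluster, so two disjoint partitions can never both qualify. This is a hypothetical sketch of that one rule, not a consensus protocol; real systems such as Raft also use terms (or, elsewhere, fencing tokens) to reject a stale leader's writes.

```python
from typing import Iterable


def has_quorum(reachable: Iterable[str], cluster: Iterable[str]) -> bool:
    """True if `reachable` (including this node) is a strict majority of `cluster`.

    Two disjoint partitions cannot both hold a strict majority,
    so at most one side keeps accepting writes.
    """
    members = set(cluster)
    votes = len(set(reachable) & members)
    return votes > len(members) // 2


cluster = {"a", "b", "c", "d", "e"}
print(has_quorum({"a", "b", "c"}, cluster))  # True: 3 of 5
print(has_quorum({"d", "e"}, cluster))       # False: this side must step down
```

Odd cluster sizes are common for this reason: with four nodes a 2/2 split leaves neither side with quorum.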

Complexity tax – more parts mean more failure modes; keep designs simple and observable.

False comfort – redundancy without detection, automation, or runbooks.

Security surface – extra endpoints and credentials expand the attack surface; pair redundancy with access controls and monitoring.

Related Mental Models

  • Eisenhower Matrix

    Prioritise by importance, not urgency: Do, Schedule, Delegate, or Eliminate.

  • Hanlon’s Razor

    Don’t attribute to malice what can be explained by error, ignorance or misaligned incentives.

  • Agency (High / Low)

    A practical lens for how people approach problems: low-agency waits for circumstances; high-agency creates options and moves first.
