Redundancy

Add independent alternatives so one failure doesn’t stop the outcome; test failover so the backup isn’t imaginary.

Author

Reliability engineering & safety science (von Neumann, Shannon; modern SRE and business continuity practice)



Redundancy means providing more than one way to achieve a required function. In a series system the weakest link fails the whole; redundancy adds parallel paths so that other components, suppliers or people can take over. It differs from a buffer (extra time or stock), and it works best when backups are independent and regularly exercised so they will work under stress.

How it works


Patterns

  • Active–active (parallel) – multiple units serve at once; one can disappear with no outage.
  • Active–passive (standby) – secondary takes over on failure; classify as hot/warm/cold by readiness.
  • 2N/N+1/quorum – full duplication (2N), one extra unit (N+1), or majority voting (quorum, consensus); RAID applies the same idea to disks through mirroring or parity.
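
The active–passive pattern can be sketched in a few lines of Python. This is a minimal illustration, not a production failover mechanism: call_with_failover, primary and standby are hypothetical names, and a real system would also need health checks, timeouts and narrower error handling.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # assumption: any exception counts as a failure
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


def primary() -> str:
    # Hypothetical primary that is currently unavailable.
    raise ConnectionError("primary is down")


def standby() -> str:
    # Hypothetical secondary that takes over.
    return "served by standby"


result = call_with_failover([primary, standby])
print(result)
```

The order of the list encodes the priority: the standby is only called once the primary has failed, which is also why it needs to be exercised regularly.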

Independence & diversity – spread across vendors/regions/power/failure modes; add design diversity to avoid common-mode failure.

Reliability math (intuition) – series reliability multiplies (one failure kills); parallel succeeds if any path works.
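
The intuition can be made concrete with a short Python sketch. It assumes each component fails independently with a known reliability between 0 and 1, which is exactly the assumption that common-mode failure breaks; the function names and the example figures are illustrative only.

```python
from math import prod
from typing import Iterable


def series_reliability(reliabilities: Iterable[float]) -> float:
    """All components must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel_reliability(reliabilities: Iterable[float]) -> float:
    """At least one path must work: 1 minus the chance that every path fails."""
    return 1 - prod(1 - r for r in reliabilities)


# Three 99%-reliable components in series are weaker than any single one.
chain = series_reliability([0.99, 0.99, 0.99])   # about 0.97

# Two 90%-reliable paths in parallel are stronger than either alone.
pair = parallel_reliability([0.90, 0.90])        # about 0.99

print(f"series: {chain:.4f}, parallel: {pair:.4f}")
```

If the two parallel paths share a dependency, the real figure sits closer to that of a single path than the independent calculation suggests.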

Graceful degradation – shed non-essential features under load or partial failure to keep the core available.

People & process – cross-training, runbooks, and documentation raise the bus factor.

Data & backups – separate copies, media and locations; verify with restore tests.
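
A restore test can be as simple as restoring into a scratch location and comparing checksums against the source. The sketch below assumes file-level backups and uses a plain file copy as a stand-in for the real restore procedure; restore_test and the file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def restore_test(original: Path, backup: Path, scratch_dir: Path) -> bool:
    """Restore the backup into a scratch directory and compare checksums."""
    restored = scratch_dir / "restored.bin"
    shutil.copy2(backup, restored)  # stand-in for the real restore step
    return sha256_of(restored) == sha256_of(original)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "data.bin"
    backup = root / "backup.bin"

    original.write_bytes(b"critical records")
    shutil.copy2(original, backup)
    ok_before = restore_test(original, backup, root)  # healthy backup

    backup.write_bytes(b"corrupted")                  # simulate silent decay
    ok_after = restore_test(original, backup, root)

print(ok_before, ok_after)
```

Running a check like this on a schedule turns "we have backups" into a claim that is tested rather than assumed.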

Use-cases


SRE/IT – multi-AZ/region, load balancers, database replicas, circuit breakers.

Supply chain – dual-source critical inputs; safety stock at bottlenecks.

Operations – spare capacity, alternate routes, manual fallbacks.

Finance – liquidity buffers, diversified facilities, ring-fenced risk.

Org design – deputy roles, rota coverage, shared ownership of key knowledge.

Pitfalls & Cautions


Common-mode failure – “redundant” paths sharing a region, provider, library or process.

Bit-rot – cold backups decay; no one practices the switchover.

Split-brain & inconsistency – unsynchronised replicas; design clear leadership/quorum rules.
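
One common quorum rule is a strict majority: a partition may act as leader only if it can reach more than half of the cluster. The Python sketch below illustrates that rule alone, under the assumption of a fixed, known cluster size; has_quorum is a hypothetical helper, not the API of any particular consensus system.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """True if the reachable nodes form a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


# A 5-node cluster splits 3/2: only the larger side may keep accepting writes.
majority_side = has_quorum(3, 5)
minority_side = has_quorum(2, 5)

# A 4-node cluster splits 2/2: neither side has a majority, so both must stop.
even_split = has_quorum(2, 4)

print(majority_side, minority_side, even_split)
```

Because two disjoint partitions can never both hold a strict majority, this rule prevents two leaders at once, at the cost of availability when the cluster splits evenly. That trade-off is one reason odd cluster sizes are usually preferred.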

Complexity tax – more parts mean more failure modes; keep designs simple and observable.

False comfort – redundancy without detection, automation, or runbooks.

Security surface – extra endpoints and creds expand attack surface; pair with controls.
