Distributed Systems

The hardest module to teach, because the failure modes are unintuitive. Students raised on single-machine programming implicitly assume that the network is reliable. Disabusing them of this is half the battle.

The fallacies — still relevant

L. Peter Deutsch's eight fallacies of distributed computing remain the best framing I have found. (The first seven were written down at Sun by 1994; James Gosling added the eighth a few years later.) I make students recite them.

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology does not change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Every distributed-systems bug comes from code that quietly assumes at least one of these.
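
A minimal sketch of how the first fallacy turns into a bug. Everything here is hypothetical and in-process: `FlakyNetwork` stands in for a network that delivers the request but loses the reply, and `Account` stands in for a remote service. The point is that a timeout does not tell the client whether the work happened, so a naive retry applies the deposit twice, while a retry keyed on a request ID does not.

```python
class FlakyNetwork:
    """Hypothetical stand-in for a network that loses replies, not requests."""

    def __init__(self, drop_replies):
        # Number of initial replies to lose; the request itself still arrives.
        self.drop_replies = drop_replies

    def call(self, handler, *args):
        result = handler(*args)  # the server always does the work
        if self.drop_replies > 0:
            self.drop_replies -= 1
            # From the client's side this looks the same as a lost request.
            raise TimeoutError("reply lost")
        return result


class Account:
    """Hypothetical remote service with a naive and an idempotent deposit."""

    def __init__(self):
        self.balance = 0
        self.seen = {}  # request_id -> result already returned for that request

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def deposit_idempotent(self, request_id, amount):
        # Apply each request ID at most once; replay the cached result otherwise.
        if request_id not in self.seen:
            self.balance += amount
            self.seen[request_id] = self.balance
        return self.seen[request_id]


def call_with_retry(network, handler, *args, attempts=3):
    """Retry on timeout. Safe only if the handler is idempotent."""
    for _ in range(attempts):
        try:
            return network.call(handler, *args)
        except TimeoutError:
            continue
    raise TimeoutError("gave up")


# Naive retry: the first reply is lost, the retry deposits again.
naive = Account()
call_with_retry(FlakyNetwork(drop_replies=1), naive.deposit, 100)

# Idempotent retry: the second attempt is recognised and not re-applied.
safe = Account()
call_with_retry(FlakyNetwork(drop_replies=1), safe.deposit_idempotent, "req-1", 100)

print(naive.balance, safe.balance)
```

The client code is identical in both cases; only the server-side contract changes. That is usually the lesson students miss: the fix for an unreliable network rarely lives in the retry loop alone.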

Consensus revisited

Paxos is the right answer to a question students do not yet have. Raft is the algorithm they will read first, will implement if asked, and will think they understand. They do not.

This year I am trying a new order:

  - Two-phase commit (intuitive, broken under coordinator failure; motivates the rest)
  - Viewstamped Replication (Liskov, often overlooked; Raft borrows heavily)
  - Raft (with explicit comparison to VSR)
  - Paxos (last, framed as the more general primitive)
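
To make the two-phase commit failure concrete, here is a toy sketch, not a real implementation: there is no network, no log, and no recovery. `Participant`, `two_phase_commit`, and the `crash_after_prepare` flag are all names invented for illustration. It shows the one property that motivates everything after it: if the coordinator dies between the phases, every participant that voted yes is stuck in a prepared state and cannot safely decide alone.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"    # voted yes, holding locks, waiting for the decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """Hypothetical participant; will_vote_yes fixes its vote in phase 1."""

    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = State.INIT

    def prepare(self):
        if self.will_vote_yes:
            self.state = State.PREPARED
            return True
        self.state = State.ABORTED
        return False

    def finish(self, commit):
        self.state = State.COMMITTED if commit else State.ABORTED


class CoordinatorCrashed(Exception):
    pass


def two_phase_commit(participants, crash_after_prepare=False):
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]

    # Simulated coordinator failure between the phases (illustrative flag).
    if crash_after_prepare:
        raise CoordinatorCrashed("decision never sent")

    # Phase 2: commit only if every vote was yes, then broadcast the decision.
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision


# Happy path: everyone votes yes, everyone commits.
ok = [Participant("a"), Participant("b")]
committed = two_phase_commit(ok)

# One no vote aborts the whole transaction.
mixed = [Participant("a"), Participant("b", will_vote_yes=False)]
aborted = two_phase_commit(mixed)

# Coordinator crash: both participants are left PREPARED with no decision.
stuck = [Participant("a"), Participant("b")]
try:
    two_phase_commit(stuck, crash_after_prepare=True)
except CoordinatorCrashed:
    pass

print([p.state.value for p in stuck])
```

A prepared participant cannot abort on its own, because the coordinator may already have told someone else to commit, and it cannot commit on its own, because another participant may have voted no. It can only wait, and that blocking is exactly what the replicated protocols later in the list are designed to avoid.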

Whether this works pedagogically I will know in February.

Reading

  - Vogels: Eventually Consistent. Still the best non-technical framing.
  - Brewer: Towards Robust Distributed Systems. The CAP keynote.
  - Kleppmann: Designing Data-Intensive Applications. The best modern textbook on the practical side.