Distributed Systems
The hardest module to teach because the failure modes are unintuitive. Students raised on single-machine programming have an implicit assumption that the network is reliable. Disabusing them of this is half the battle.
The fallacies: still relevant
The eight fallacies of distributed computing remain the best framing I have found. L Peter Deutsch is usually credited with the first seven (1994); James Gosling added the eighth a few years later. I make students recite them.
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology does not change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
Nearly every distributed systems bug comes down to code that silently assumes at least one of these.
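A minimal sketch of what the first fallacy costs in practice. Everything here is hypothetical and in-process: `FlakyNetwork`, `Counter`, and `call_with_retry` are illustrative names, and the "network" is just a seeded random number generator that drops either the request or the reply. The point is that the caller cannot tell the two cases apart, so a naive retry can apply the same operation twice unless the server deduplicates on a request id.

```python
import random


class FlakyNetwork:
    """Hypothetical stand-in for a network that silently drops messages."""

    def __init__(self, drop_rate, seed=0):
        self.drop_rate = drop_rate
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def send(self, handler, message):
        # The request may be lost before the server ever sees it...
        if self.rng.random() < self.drop_rate:
            raise TimeoutError("request lost")
        reply = handler(message)
        # ...or the reply may be lost after the server has already acted.
        if self.rng.random() < self.drop_rate:
            raise TimeoutError("reply lost")
        return reply


class Counter:
    """Toy server with a naive handler and an idempotent one."""

    def __init__(self):
        self.value = 0
        self.seen = set()

    def increment(self, message):
        # Naive: every delivered request is applied, duplicates included.
        self.value += 1
        return self.value

    def increment_once(self, message):
        # Idempotent: a request id is applied at most once.
        if message["id"] not in self.seen:
            self.seen.add(message["id"])
            self.value += 1
        return self.value


def call_with_retry(net, handler, message, attempts=1000):
    """Retry on timeout; the caller cannot know whether the server acted."""
    for _ in range(attempts):
        try:
            return net.send(handler, message)
        except TimeoutError:
            continue
    raise TimeoutError("gave up")


if __name__ == "__main__":
    net = FlakyNetwork(drop_rate=0.5, seed=1)
    naive, safe = Counter(), Counter()
    for i in range(20):
        call_with_retry(net, naive.increment, {"id": i})
        call_with_retry(net, safe.increment_once, {"id": i})
    print("naive:", naive.value, "idempotent:", safe.value)
```

After twenty logical increments the idempotent counter is exactly 20, while the naive one is at least 20 and will typically overshoot, because a lost reply looks identical to a lost request from the client's side.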
Consensus revisited
Paxos is the right answer to a question students do not yet have. Raft is the algorithm they will read first, will implement if asked, and will think they understand. They do not.
This year I am trying a new order:
- Two-phase commit (intuitive, but blocks if the coordinator fails, which motivates the rest)
- Viewstamped Replication (Oki and Liskov, often overlooked; Raft borrows heavily from it)
- Raft (with explicit comparison to VSR)
- Paxos (last, framed as the more general primitive)
Whether this works pedagogically I will know in February.
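To make the two-phase commit starting point concrete, here is a toy sketch of why coordinator failure is the problem. It is a single-process simulation, not a real protocol implementation: `Participant`, `State`, `two_phase_commit`, and the `crash_after_prepare` flag are all hypothetical names chosen for illustration, and there is no real messaging, logging, or recovery.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"    # voted yes, holding locks, awaiting the decision
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    def __init__(self, name, will_vote_yes=True):
        self.name = name
        self.will_vote_yes = will_vote_yes
        self.state = State.INIT

    def prepare(self):
        # Phase 1: vote. A yes vote is a promise to commit if told to.
        if self.will_vote_yes:
            self.state = State.PREPARED
            return True
        self.state = State.ABORTED
        return False

    def finish(self, commit):
        # Phase 2: apply the coordinator's decision.
        self.state = State.COMMITTED if commit else State.ABORTED

    def is_blocked(self):
        # A prepared participant can neither commit nor abort on its own.
        return self.state is State.PREPARED


def two_phase_commit(participants, crash_after_prepare=False):
    """Run one round; return the decision, or None if the coordinator died."""
    votes = [p.prepare() for p in participants]
    if crash_after_prepare:
        # Simulated coordinator crash: the decision is never sent.
        return None
    decision = all(votes)
    for p in participants:
        p.finish(decision)
    return decision


if __name__ == "__main__":
    nodes = [Participant("a"), Participant("b")]
    print(two_phase_commit(nodes, crash_after_prepare=True))
    print([(p.name, p.state.value, p.is_blocked()) for p in nodes])
```

In the crash case every participant is left in `PREPARED`: each has promised to commit, none knows the outcome, and none can safely decide alone. That stuck state is the gap the replicated protocols later in the sequence are designed to close.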
Reading
- Vogels, "Eventually Consistent". Still the best non-technical framing.
- Brewer, "Towards Robust Distributed Systems". The CAP keynote.
- Kleppmann, Designing Data-Intensive Applications. The best modern textbook on the practical side.