Skip to content

Data Structures

Module sits between Algorithms (theoretical) and Programming Project (applied). The challenge is making the asymptotic analysis feel grounded — students will accept that a hash table is O(1) on average but cannot tell you when it stops being true.

Hash tables — the part nobody teaches well

Coalesced hashing, Robin Hood hashing, cuckoo hashing — none of these are in the standard syllabus and probably should not be at undergraduate level. But the open addressing vs separate chaining tradeoff should be more than two slides. Specifically:

  • Cache behaviour: open addressing with linear probing wins under modern memory hierarchy. Chaining is what every textbook draws but separate allocations kill performance.
  • Load factor matters more than the hash function for collision rate in practice.
  • Rehashing costs amortise but the worst-case insert is O(n) — relevant for real-time systems.

Trees that matter

Red-black: in every standard library, none of my students can implement one. That is fine, they should not need to. B-trees: motivated by the storage hierarchy, leads naturally into the databases module. Tries: criminally underused in industry given how good they are for autocomplete and IP routing.

Skip lists are elegant but I have stopped teaching them — the probabilistic analysis loses people.

Coursework idea (saved for next year)

Implement an LSM tree from scratch. Constrained problem, real-world relevance, exposes them to amortised analysis, and the result is interesting enough to be a CV item.