5: Hash Tables
Hash tables implement dictionaries with average insert, lookup, and delete. The tradeoff: no ordering, and worst-case is .
Core Idea
Convert keys to array indices via a hash function:
index = hash(key) mod capacityCollisions occur when two keys hash to the same index.
Data Structure
DictNode:
- key: String
- val: Object
- next: Optional[DictNode] # for chaining
Dict:
- table: List[Optional[DictNode]]
- size: Integer
- capacity: Integer
- hash_func: Callable[[String], Integer]Collision Resolution: Chaining
Each bucket is a linked list of all elements that hash to that index.
insert(key, val):
index = hash_func(key) mod capacity
# Search bucket for existing key
node = table[index]
while node is not None:
if node.key == key:
node.val = val
return node
node = node.next
# Prepend new node
new_node = DictNode(key, val)
new_node.next = table[index]
table[index] = new_node
size += 1
get(key):
index = hash_func(key) mod capacity
node = table[index]
while node is not None:
if node.key == key:
return node
node = node.next
return NoneLoad Factor
- Higher → more collisions → slower
- Resize (double capacity, rehash all) when
Resizing is but amortized per insertion.
Runtime
| Operation | Average | Worst |
|---|---|---|
| insert | ||
| get | ||
| delete |
Why average O(1): With good hashing, elements distribute uniformly. Expected bucket length is . Bounded means .
Why worst O(n): All keys could hash to one bucket.
Universal Hashing
A universal hash family guarantees: for any two distinct keys, when hash function is chosen randomly.
Provides expected regardless of input distribution.
Hash Table vs BST
| Hash Table | Balanced BST | |
|---|---|---|
| Insert/Lookup/Delete | avg | |
| Ordered iteration | ||
| Range queries | ||
| Worst case |
Use hash tables when you only need key-based operations and don’t need ordering.