5: Hash Tables

Hash tables implement dictionaries with O(1)O(1) average insert, lookup, and delete. The tradeoff: no ordering, and worst-case is O(n)O(n).


Core Idea

Convert keys to array indices via a hash function:

index = hash(key) mod capacity

Collisions occur when two keys hash to the same index.


Data Structure

DictNode:
  - key: String
  - val: Object
  - next: Optional[DictNode]  # for chaining

Dict:
  - table: List[Optional[DictNode]]
  - size: Integer
  - capacity: Integer
  - hash_func: Callable[[String], Integer]

Collision Resolution: Chaining

Each bucket is a linked list of all elements that hash to that index.

insert(key, val):
    index = hash_func(key) mod capacity
    # Search bucket for existing key
    node = table[index]
    while node is not None:
        if node.key == key:
            node.val = val
            return node
        node = node.next
    # Prepend new node
    new_node = DictNode(key, val)
    new_node.next = table[index]
    table[index] = new_node
    size += 1

get(key):
    index = hash_func(key) mod capacity
    node = table[index]
    while node is not None:
        if node.key == key:
            return node
        node = node.next
    return None

Load Factor

α=nm=elementscapacity\alpha = \frac{n}{m} = \frac{\text{elements}}{\text{capacity}}

  • Higher α\alpha → more collisions → slower
  • Resize (double capacity, rehash all) when α>0.75\alpha > 0.75

Resizing is O(n)O(n) but amortized O(1)O(1) per insertion.


Runtime

OperationAverageWorst
insertO(1)O(1)O(n)O(n)
getO(1)O(1)O(n)O(n)
deleteO(1)O(1)O(n)O(n)

Why average O(1): With good hashing, elements distribute uniformly. Expected bucket length is α\alpha. Bounded α\alpha means O(1)O(1).

Why worst O(n): All keys could hash to one bucket.


Universal Hashing

A universal hash family guarantees: for any two distinct keys, P(collision)1/mP(\text{collision}) \leq 1/m when hash function is chosen randomly.

ha,b(k)=((ak+b)modp)modmh_{a,b}(k) = ((ak + b) \mod p) \mod m

Provides expected O(1)O(1) regardless of input distribution.


Hash Table vs BST

Hash TableBalanced BST
Insert/Lookup/DeleteO(1)O(1) avgO(logn)O(\log n)
Ordered iterationO(nlogn)O(n \log n)O(n)O(n)
Range queriesO(n)O(n)O(logn+k)O(\log n + k)
Worst caseO(n)O(n)O(logn)O(\log n)

Use hash tables when you only need key-based operations and don’t need ordering.