1 Introduction


  • Problem: a binary relation from problem inputs to correct outputs

  • Algorithm: procedure mapping each input to a single output

    • An algorithm solves a problem if it returns a correct output for each and every problem input
  • Correctness:

    • For small inputs: can use case analysis
    • For arbitrarily large inputs: the algorithm must be recursive or loop in some way. Use induction.
  • Efficiency: how fast does an algorithm produce a correct output?

    • Count the number of fixed-time operations the algorithm takes to return
    • Asymptotic Notation: ignore constant factors and low order terms
| input | constant | logarithmic | linear | log-linear | quadratic | polynomial | exponential |
|---|---|---|---|---|---|---|---|
| \(n\) | \(Θ(1)\) | \(Θ(\log n)\) | \(Θ(n)\) | \(Θ(n \log n)\) | \(Θ(n^2)\) | \(Θ(n^c)\) | \(2^{Θ(n^c)}\) |
| \(1000\) | \(1\) | \(≈ 10\) | \(1000\) | \(≈ 10{,}000\) | \(1{,}000{,}000\) | \(1000^c\) | \(2^{1000} ≈ 10^{301}\) |
| time | \(1\) ns | \(10\) ns | \(1\) µs | \(10\) µs | \(1\) ms | \(10^{3c-9}\) s | \(10^{281}\) millennia |
  • Model of Computation: what operations on the machine can be performed in \(O(1)\) time.

    • Machine word: block of w bits (w is word size of a w-bit Word-RAM)
    • Memory: Addressable sequence of machine words
    • Processor supports many constant time operations on a \(O(1)\) number of words (integers):
      • integer arithmetic: (+, -, *, //, %)
      • logical operators: (&&, ||, !, ==, <, >, <=, >=)
      • bitwise arithmetic: (&, |, <<, >>, …)
      • Given word a, can read word at address a, write word to address a
  • Data Structure: a way to store non-constant data, that supports a set of operations
    • A collection of operations is called an interface
    • Example:
      • Sequence: Extrinsic order to items (first, last, nth)
      • Set: Intrinsic order to items (queries based on item keys)
    • Data structures may implement the same interface with different performance
  • Example: Static Array - fixed width slots, fixed length, static sequence interface
    • StaticArray(n): allocate static array of size n initialized to 0 in \(Θ(n)\) time
    • StaticArray.get_at(i): return word stored at array index i in \(Θ(1)\) time
    • StaticArray.set_at(i, x): write word x to array index i in \(Θ(1)\) time
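
A minimal Python sketch of this interface (an illustrative, list-backed class; the method names follow the bullets above):

class StaticArray:
    """Fixed-length array of n word-sized slots (sketch; backed by a Python list)."""
    def __init__(self, n):
        self.data = [0] * n          # Theta(n): allocate and zero-initialize

    def get_at(self, i):             # Theta(1)
        assert 0 <= i < len(self.data)
        return self.data[i]

    def set_at(self, i, x):          # Theta(1)
        assert 0 <= i < len(self.data)
        self.data[i] = x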

More on Asymptotic Notation

  • \(O\) Notation:
    • Non-negative function \(g(n)\) is in \(O(f(n))\) if and only if there exists a positive real number \(c\) and positive integer \(n_0\) such that \(g(n) ≤ c · f(n)\) for all \(n ≥ n_0\).
  • \(Ω\) Notation:
    • Non-negative function \(g(n)\) is in \(Ω(f(n))\) if and only if there exists a positive real number c and positive integer \(n_0\) such that \(c · f(n) ≤ g(n)\) for all \(n ≥ n_0\).
  • \(Θ Notation\):
    • Non-negative \(g(n)\) is in \(Θ(f(n))\) if and only if \(g(n) ∈ O(f(n))∩Ω(f(n))\)
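
A quick worked example of these definitions: to verify \(3n^2 + 7n \in \Theta(n^2)\), exhibit constants for both bounds:

\[3n^2 + 7n \leq 10\,n^2 \ \text{ for all } n \geq 1 \ (c = 10,\ n_0 = 1), \qquad 3n^2 + 7n \geq 3\,n^2 \ \text{ for all } n \geq 1 \ (c = 3,\ n_0 = 1),\]

so \(3n^2 + 7n \in O(n^2) \cap \Omega(n^2) = \Theta(n^2)\).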

2 Data Structures


Data Structure Interfaces

  • A data structure is a way to store data, with algorithms that support operations on the data
  • Collection of supported operations is called an interface (also API or ADT)
  • Interface is a specification: what operations are supported (the problem!)
  • Data structure is a representation: how operations are supported (the solution!)

Sequence Interface (L02, L07)

  • Maintain a sequence of items (order is extrinsic)
  • Ex: (\(x_0\), \(x_1\), \(x_2\), . . . , \(x_{n-1}\)) (zero indexing)
  • use n to denote the number of items stored in the data structure
  • Supports sequence operations:
| Type | Interface | Specification |
|---|---|---|
| Container | build(X) | given an iterable X, build sequence from items in X |
| | len() | return the number of stored items |
| Static | iter_seq() | return the stored items one-by-one in sequence order |
| | get_at(i) | return the \(i^{th}\) item |
| | set_at(i, x) | replace the \(i^{th}\) item with x |
| Dynamic | insert_at(i, x) | add \(x\) as the \(i^{th}\) item |
| | delete_at(i) | remove and return the \(i^{th}\) item |
| | insert_first(x) | add \(x\) as the first item |
| | delete_first() | remove and return the first item |
| | insert_last(x) | add \(x\) as the last item |
| | delete_last() | remove and return the last item |
  • Special case interfaces:
    • stack: insert_last(x) and delete_last()
    • queue: insert_last(x) and delete_first()

Set Interface (L03-L08)

  • Sequence is about extrinsic order, set is about intrinsic order
  • Maintain a set of items having unique keys (e.g., item x has key x.key)
  • Set or multi-set? We restrict to unique keys for now.
  • Often we let key of an item be the item itself, but may want to store more info than just key
  • Supports set operations:

| Type | Interface | Specification |
|---|---|---|
| Container | build(X) | given an iterable X, build set from items in X |
| | len() | return the number of stored items |
| Static | find(k) | return the stored item with key k |
| Dynamic | insert(x) | add x to set (replace item with key x.key if one already exists) |
| | delete(k) | remove and return the stored item with key k |
| Order | iter_ord() | return the stored items one-by-one in key order |
| | find_min() | return the stored item with smallest key |
| | find_max() | return the stored item with largest key |
| | find_next(k) | return the stored item with smallest key larger than k |
| | find_prev(k) | return the stored item with largest key smaller than k |
  • Special case interfaces:
    • dictionary: set without the Order operations

Array Sequence

  • Array is great for static operations! get_at(i) and set_at(i, x) in Θ(1) time!
  • But not so great at dynamic operations…
  • For consistency, we maintain the invariant that array is full.
  • Then inserting and removing items requires:
    • reallocating the array
    • shifting all items after the modified item
Sequence data structure operations, worst-case \(O(\cdot)\):

| Sequence Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Array | \(n\) | \(1\) | \(n\) | \(n\) | \(n\) |
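
A sketch of why a dynamic operation on a full static array costs \(Θ(n)\): it must reallocate and shift, as in the bullets above (the helper name is illustrative):

def array_insert_at(A, i, x):
    """Insert x at index i of a full static array A by reallocating (Theta(n) sketch)."""
    B = [None] * (len(A) + 1)       # reallocate a new array one slot larger
    B[:i] = A[:i]                   # copy items before index i
    B[i] = x
    B[i + 1:] = A[i:]               # shift items at/after index i one slot right
    return B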

Linked List Sequence

  • Pointer data structure (this is not related to a Python “list”)
  • Each item stored in a node which contains a pointer to the next node in sequence
  • Each node has two fields: node.item and node.next
  • Can manipulate nodes simply by relinking pointers!
  • Maintain pointers to the first node in sequence (called the head)
  • Can now insert and delete from the front in \(Θ(1)\) time! Yay!
  • (Inserting/deleting efficiently from back is also possible; you will do this in PS1)
  • But now get_at(i) and set_at(i, x) each take \(O(n)\) time… :(
  • Can we get the best of both worlds? Yes! (Kind of…)
Sequence data structure operations, worst-case \(O(\cdot)\):

| Sequence Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Linked List | \(n\) | \(n\) | \(1\) | \(n\) (\(1\) if we keep track of the tail) | \(n\) |
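
A minimal sketch of the pointer structure described above (node fields follow the bullets; class names are illustrative):

class Node:
    def __init__(self, item, next=None):
        self.item = item            # node.item
        self.next = next            # node.next: pointer to next node in sequence

class LinkedListSeq:
    def __init__(self):
        self.head = None            # pointer to first node in sequence
        self.size = 0

    def insert_first(self, x):      # Theta(1): relink the head pointer
        self.head = Node(x, self.head)
        self.size += 1

    def delete_first(self):         # Theta(1)
        x = self.head.item
        self.head = self.head.next
        self.size -= 1
        return x

    def get_at(self, i):            # O(n): walk i pointers from the head
        node = self.head
        for _ in range(i):
            node = node.next
        return node.item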

Dynamic Array Sequence

  • Make an array efficient for last dynamic operations
  • Python “list” is a dynamic array
  • Idea! Allocate extra space so reallocation does not occur with every dynamic operation
  • Fill ratio: \(0 ≤ r ≤ 1\) the ratio of items to space
  • Whenever array is full ($r = 1$), allocate \(Θ(n)\) extra space at end to fill ratio \(r_i\) (e.g., 1/2)
  • Will have to insert \(Θ(n)\) items before the next reallocation
  • A single operation can take \(Θ(n)\) time for reallocation
  • However, any sequence of \(Θ(n)\) operations takes \(Θ(n)\) time
  • So each operation takes \(Θ(1)\) time “on average”
Sequence data structure operations, worst-case \(O(\cdot)\) (\(_{(a)}\) denotes amortized):

| Sequence Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Dynamic Array | \(n\) | \(1\) | \(n\) | \(1_{(a)}\) | \(n\) |

Amortized Analysis

  • Data structure analysis technique to distribute cost over many operations
  • Operation has amortized cost \(T(n)\) if any \(k\) operations cost at most \(kT(n)\)
  • “\(T(n)\) amortized” roughly means \(T(n)\) “on average” over many operations
  • Inserting into a dynamic array takes \(Θ(1)\) amortized time

Dynamic Array Deletion

  • Delete from back? \(Θ(1)\) time without effort, yay!
  • However, can be very wasteful in space. Want size of data structure to stay \(Θ(n)\)
  • Attempt: if very empty, resize to r = 1. Alternating insertion and deletion could be bad…
  • Idea! When \(r < r_d\), resize array to ratio \(r_i\) where \(r_d < r_i\) (e.g., \(r_d = 1/4, r_i = 1/2\))
  • Then \(Θ(n)\) cheap operations must be made before next expensive resize
  • Can limit extra space usage to \((1 + ε)n\) for any \(ε > 0\) (set \(r_d = \frac{1}{1+\epsilon}, r_i = \frac{r_d + 1}{2}\))
  • Dynamic arrays only support dynamic last operations in \(Θ(1)\) time
  • Python List append and pop are amortized \(O(1)\) time, other operations can be \(O(n)\)!
  • (Inserting/deleting efficiently from front is also possible; you will do this in PS1)
Sequence data structure operations, worst-case \(O(\cdot)\) (\(_{(a)}\) denotes amortized):

| Sequence Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Static Array | \(n\) | \(1\) | \(n\) | \(n\) | \(n\) |
| Linked List | \(n\) | \(n\) | \(1\) | \(n\) (\(1\) if we keep track of the tail) | \(n\) |
| Dynamic Array | \(n\) | \(1\) | \(n\) | \(1_{(a)}\) | \(n\) |
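
A sketch of this resize policy (an illustrative class; the concrete ratios \(r_d = 1/4\) and \(r_i = 1/2\) are the example values from the bullets above):

class DynamicArraySeq:
    def __init__(self):
        self.n = 0                  # number of stored items
        self.A = [None] * 1         # allocated slots (always at least 1)

    def _resize(self, ratio=0.5):
        """Reallocate so the fill ratio becomes roughly `ratio` (r_i)."""
        new_size = max(1, int(self.n / ratio))
        B = [None] * new_size
        B[:self.n] = self.A[:self.n]
        self.A = B

    def insert_last(self, x):       # amortized Theta(1)
        if self.n == len(self.A):   # full (r = 1): grow so that r_i = 1/2
            self._resize(0.5)
        self.A[self.n] = x
        self.n += 1

    def delete_last(self):          # amortized Theta(1)
        x = self.A[self.n - 1]
        self.A[self.n - 1] = None
        self.n -= 1
        if self.n < len(self.A) // 4:   # too empty (r < r_d = 1/4): shrink so that r_i = 1/2
            self._resize(0.5)
        return x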

3 Sorting


Set Interface

  • Storing items in an array in arbitrary order can implement a (not so efficient) set
  • Stored items sorted increasing by key allows:
    • faster find min/max (at first and last index of array)
    • faster finds via binary search: \(O(log n)\)
Set data structure operations, worst-case \(O(\cdot)\):

| Set Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | \(n\) | \(n\) | \(n\) | \(n\) | \(n\) |
| Sorted Array | \(n \log n\) | \(\log n\) | \(n\) | \(1\) | \(\log n\) |
  • But how to construct a sorted array efficiently?


  • Given a sorted array, we can leverage binary search to make an efficient set data structure.
  • Input: (static) array A of n numbers
  • Output: (static) array B which is a sorted permutation of A
    • Permutation: array with same elements in a different order
    • Sorted: B[i - 1] ≤ B[i] for all \(i ∈ \{1, . . . , n - 1\}\)
  • Example: \([8, 2, 4, 9, 3] → [2, 3, 4, 8, 9]\)
  • A sort is destructive if it overwrites \(A\) (instead of making a new array \(B\) that is a sorted version of \(A\))
  • A sort is in place if it uses \(O(1)\) extra space (implies destructive: in place ⊆ destructive)

Permutation Sort

  • There are \(n!\) permutations of A, at least one of which is sorted (more than one if A contains duplicate keys)
  • For each permutation, check whether sorted in \(Θ(n)\)
  • Example: \([2, 3, 1] → {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}\)
def permutation_sort(A):
    """Sort A (assumes helpers permutations(A) and is_sorted(B))."""
    for B in permutations(A):   # O(n!) loop over all permutations of A
        if is_sorted(B):        # O(n) check whether this permutation is sorted
            return B            # return the first sorted permutation found
  • permutation sort analysis:
    • Correct by case analysis: try all possibilities (Brute Force)
    • Running time: \(Ω(n! · n)\) which is exponential :(

Solving Recurrences

  • Substitution: Guess a solution, replace with representative function, recurrence holds true
  • Recurrence Tree: Draw a tree representing the recursive calls and sum computation at nodes
  • Master Theorem: A formula to solve many recurrences (R03)

Selection Sort

  • Find a largest number in prefix A[:i + 1] and swap it to A[i]
  • Recursively sort prefix A[:i]
  • Example: \([8, 2, 4, 9, 3], [8, 2, 4, 3, 9], [3, 2, 4, 8, 9], [3, 2, 4, 8, 9], [2, 3, 4, 8, 9]\)
def selection_sort(A, i=None):
    """Sort A[:i+1]"""
    if i is None: i = len(A) - 1
    if i > 0:
        j = prefix_max(A, i)
        A[i], A[j] = A[j], A[i]
        selection_sort(A, i - 1)
def prefix_max(A, i):
    """Return index of maximum in A[:i+1]"""
    if i > 0:
        j = prefix_max(A, i - 1)
        if A[i] < A[j]:
            return j
    return i
  • prefix_max analysis:
    • Base case: for i = 0, array has one element, so index of max is \(i\)
    • Induction: assume correct for \(i - 1\); the maximum of A[:i+1] is either the maximum of A[:i] or A[i], and prefix_max returns the correct index in either case.
    • \[S(1) = Θ(1); S(n) = S(n - 1) + Θ(1)\]
      • Substitution: \(S(n) = Θ(n)\), \(cn = Θ(1) + c(n - 1) =⇒ 1 = Θ(1)\)
      • Recurrence tree: chain of \(n\) nodes with \(Θ(1)\) work per node, \(\Sigma_{i=0}^{n-1}1=Θ(n)\)

Insertion Sort

  • Recursively sort prefix A[:i]
  • Sort prefix A[:i + 1] assuming that prefix A[:i] is sorted by repeated swaps
  • Example: \([8, 2, 4, 9, 3], [2, 8, 4, 9, 3], [2, 4, 8, 9, 3], [2, 4, 8, 9, 3], [2, 3, 4, 8, 9]\)
def insertion_sort(A, i=None):
    """Sort A[:i+1]"""
    if i is None: i = len(A) - 1
    if i > 0:
        insertion_sort(A, i-1)
        insert_last(A, i)
def insert_last(A, i):
    """Sort A[:i+1] assuming sorted A[:i]"""
    if i > 0 and A[i] < A[i-1]:
        A[i], A[i-1] = A[i-1], A[i]
        insert_last(A, i-1)
  • insert_last analysis:
    • Base case: for \(i = 0\), array has one element so is sorted
    • Induction: assume correct for \(i\), if \(A[i] >= A[i - 1]\), array is sorted; otherwise, swapping last two elements allows us to sort A[:i] by induction. □
    • \[S(1) = Θ(1); S(n) = S(n - 1) + Θ(1) =⇒ S(n) = Θ(n)\]
  • insertion_sort analysis:
    • Base case: for \(i = 0\), array has one element so is sorted
    • Induction: assume correct for $i$; the algorithm sorts A[:i] by induction, and then insert_last correctly sorts the rest as proved above. □
    • \[T(1) = Θ(1); T(n) = T(n - 1) + Θ(n) =⇒ T(n) = Θ(n^2)\]

Merge Sort

  • Recursively sort first half and second half (may assume power of two)
  • Merge sorted halves into one sorted list (two finger algorithm)
  • Example: \([7, 1, 5, 6, 2, 4, 9, 3], [1, 7, 5, 6, 2, 4, 3, 9], [1, 5, 6, 7, 2, 3, 4, 9], [1, 2, 3, 4, 5, 6, 7, 9]\)
def merge_sort(A, lo=0, hi=None):
    """Sort A[lo:hi]"""
    if hi is None: hi = len(A)
    if hi - lo > 1:
        mid = (lo + hi + 1) // 2
        merge_sort(A, lo, mid)
        merge_sort(A, mid, hi)
        left, right = A[lo:mid], A[mid:hi]
        merge(left, right, A, len(left), len(right), lo, hi)
def merge(left, right, A, i, j, lo, hi):
    """Merge sorted left[:i] and right[:j] into A[lo:hi]"""
    if lo < hi:
        if (j <= 0) or (i > 0 and left[i-1] > right[j-1]):
            A[hi-1] = left[i-1]    # take the larger last item from left
            i -= 1
        else:
            A[hi-1] = right[j-1]   # take the larger last item from right
            j -= 1
        merge(left, right, A, i, j, lo, hi - 1)
  • merge analysis:
    • Base case: for \(n = 0\), arrays are empty, so vacuously correct
    • Induction: assume correct for \(n\); the item placed in A[hi-1] must be a largest number from the remaining prefixes of left and right, and since they are sorted, taking the larger of their last items suffices; the remainder is merged by induction. □
    • \[S(0) = Θ(1); S(n) = S(n - 1) + Θ(1) =⇒ S(n) = Θ(n)\]
  • merge_sort analysis:
    • Base case: for \(n = 1\), array has one element so is sorted
    • Induction: assume correct for \(k < n\), algorithm sorts smaller halves by induction, and then merge merges into a sorted array as proved above. □
    • \[T(1) = Θ(1); T(n) = 2T(n/2) + Θ(n)\]
      • Substitution: Guess \(T(n) = Θ(n log n)\) \(cn log n = Θ(n) + 2c(n/2)log(n/2) =⇒ cn log(2) = Θ(n)\)
      • Recurrence Tree: complete binary tree with depth \(\log_2 n\) and \(n\) leaves, level \(i\) has \(2^i\) nodes with \(O(n/2^i)\) work each, total: \(\Sigma_{i=0}^{\log_2 n}(2^i)(n/2^i) = \Sigma_{i=0}^{\log_2 n} n = Θ(n \log n)\)

Master Theorem

  • The Master Theorem provides a way to solve recurrence relations in which recursive calls decrease problem size by a constant factor.
  • Given a recurrence relation of the form \(T(n) = aT(n/b)+ f(n)\) and \(T(1) = Θ(1)\), with branching factor \(a ≥ 1\), problem size reduction factor \(b > 1\), and asymptotically non-negative function \(f(n)\), the Master Theorem gives the solution to the recurrence by comparing \(f(n)\) to \(a^{\log_b n}=n^{\log_b a}\), the number of leaves at the bottom of the recursion tree.
  • When \(f(n)\) grows asymptotically faster than \(n^{\log_b a}\), the work done at each level decreases geometrically so the work at the root dominates;
  • alternatively, when \(f(n)\) grows slower, the work done at each level increases geometrically and the work at the leaves dominates.
  • When their growth rates are comparable, the work is evenly spread over the tree’s \(O(log n)\) levels.


| case | solution | conditions |
|---|---|---|
| 1 | \(T(n) = \Theta(n^{\log_b a})\) | \(f(n) = O(n^{\log_b a - \epsilon})\) for some constant \(ε > 0\) |
| 2 | \(T(n) = \Theta(n^{\log_b a}\log^{k+1}n)\) | \(f(n) = \Theta(n^{\log_b a}\log^{k}n)\) for some constant \(k ≥ 0\) |
| 3 | \(T(n) = \Theta(f(n))\) | \(f(n) = \Omega(n^{\log_b a + \epsilon})\) for some constant \(ε > 0\), and \(af(n/b) \leq cf(n)\) for some constant \(0 < c < 1\) |
  • The Master Theorem takes on a simpler form when f(n) is a polynomial, such that the recurrence has the form \(T(n) = aT(n/b) + Θ(n^c)\) for some constant \(c ≥ 0\).
| case | solution | conditions | intuition |
|---|---|---|---|
| 1 | \(T(n) = \Theta(n^{\log_b a})\) | \(c < \log_b a\) | Work done at leaves dominates |
| 2 | \(T(n) = \Theta(n^{c}\log n)\) | \(c = \log_b a\) | Work balanced across the tree |
| 3 | \(T(n) = \Theta(n^c)\) | \(c > \log_b a\) | Work done at root dominates |
  • This special case is straight-forward to prove by substitution (this can be done in recitation).
  • To apply the Master Theorem (or this simpler special case), you should state which case applies, and show that your recurrence relation satisfies all conditions required by the relevant case.
  • There are even stronger (more general) formulas to solve recurrences, but we will not use them in this class.
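
As a quick check of the simpler polynomial form, here are the cases applied to some recurrences seen in this class (the parenthetical labels name the matching algorithms):

\[\begin{aligned}
T(n) &= 2T(n/2) + \Theta(n): & c = 1 = \log_2 2 &\implies T(n) = \Theta(n \log n) && \text{(case 2; merge sort)}\\
T(n) &= T(n/2) + \Theta(1): & c = 0 = \log_2 1 &\implies T(n) = \Theta(\log n) && \text{(case 2; binary search)}\\
T(n) &= 4T(n/2) + \Theta(n): & c = 1 < \log_2 4 &\implies T(n) = \Theta(n^2) && \text{(case 1; leaves dominate)}\\
T(n) &= 2T(n/2) + \Theta(n^2): & c = 2 > \log_2 2 &\implies T(n) = \Theta(n^2) && \text{(case 3; root dominates)}
\end{aligned}\]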

4 Hashing


Comparison Model

  • In this model, assume algorithm can only differentiate items via comparisons
  • Comparable items: black boxes only supporting comparisons between pairs
  • Comparisons are \(<, ≤, >, ≥, =, \neq\), outputs are binary: True or False
  • Goal: Store a set of n comparable items, support find(k) operation
  • Running time is lower bounded by # comparisons performed, so count comparisons!

Decision Tree

  • Any algorithm can be viewed as a decision tree of operations performed
  • An internal node represents a binary comparison, branching either True or False
  • For a comparison algorithm, the decision tree is binary (draw example)
  • A leaf represents algorithm termination, resulting in an algorithm output
  • A root-to-leaf path represents an execution of the algorithm on some input
  • Need at least one leaf for each algorithm output, so search requires \(≥ n + 1\) leaves

Comparison Search Lower Bound

  • What is worst-case running time of a comparison search algorithm?
  • running time \(≥\) # comparisons \(≥\) max length of any root-to-leaf path \(≥\) height of tree
  • What is minimum height of any binary tree on \(≥ n\) nodes?
  • Minimum height when binary tree is complete (all rows full except last)
  • \(Height ≥ \lceil lg(n + 1) \rceil - 1 = Ω(log n)\), so running time of any comparison search is \(Ω(log n)\)
  • Sorted arrays achieve this bound! Yay!
  • More generally, height of tree with \(Θ(n)\) leaves and max branching factor \(b\) is \(Ω(\log_b n)\)
  • To get faster, need an operation that allows super-constant \(ω(1)\) branching factor. How??

Direct Access Array

  • Exploit Word-RAM \(O(1)\) time random access indexing! Linear branching factor!
  • Idea! Give item unique integer key k in \(\{0, . . . , u - 1\}\), store item in an array at index \(k\)
  • Associate a meaning with each index of array.
  • If keys fit in a machine word, i.e. \(u ≤ 2^w\), worst-case \(O(1)\) find/dynamic operations! Yay!
  • 6.006: assume input numbers/strings fit in a word, unless length explicitly parameterized
  • Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
  • But space \(O(u)\), so really bad if \(n \ll u\) … :(
  • Example: if keys are ten-letter names, for one bit per name, requires \(26^{10} ≈ 17.6\) TB space
  • How can we use less space?


  • Idea! If \(n \ll u\), map keys to a smaller range \(m = Θ(n)\) and use smaller direct access array
  • Hash function: \(h(k) : \{0, . . . , u - 1\} → \{0, . . . , m - 1\}\) (also hash map)
  • Direct access array called hash table, \(h(k)\) called the hash of key k
  • If \(m \ll u\), no hash function is injective by pigeonhole principle
  • There always exist keys \(a, b\) such that \(h(a) = h(b)\) \(→\) Collision! :(
  • Can’t store both items at same index, so where to store? Either:
    • store somewhere else in the array (open addressing)
      • complicated analysis, but common and practical
    • store in another data structure supporting dynamic set interface (chaining)


  • Idea! Store collisions in another data structure (a chain)
  • If keys roughly evenly distributed over indices, chain size is \(n/m = n/Ω(n) = O(1)!\)
  • If chain has \(O(1)\) size, all operations take \(O(1)\) time! Yay!
  • If not, many items may map to same location, e.g. \(h(k) = constant\), chain size is \(Θ(n)\) :(
  • Need good hash function! So what’s a good hash function?

Hash Functions

Division (bad): \(h(k) = k \ \mod \ m\)

  • Heuristic, good when keys are uniformly distributed!
  • $m$ should avoid symmetries of the stored keys
  • Large primes far from powers of 2 and 10 can be reasonable
  • Python uses a version of this with some additional mixing
  • If \(u \gg n\), every hash function will have some input set that will create a \(O(n)\)-size chain
  • Idea! Don’t use a fixed hash function! Choose one randomly (but carefully)!

Universal (good, theoretically): \(h_{ab}{(k)}=(((ak+b) \mod \ p) \mod \ m)\)

  • Hash Family: \(\mathcal{H}(p,m) = \{h_{ab} \mid a, b \in \{0,\ldots,p-1\} \text{ and } a \neq 0\}\)
  • Parameterized by a fixed prime \(p > u\), with \(a\) and \(b\) chosen from range \(\{0,...,p-1\}\)
  • \(\mathcal{H}\) is a Universal family: \(\underset{h\in\mathcal{H}}{Pr}\{h(k_i) = h(k_j)\} \leq 1/m \qquad \forall k_i \neq k_j \in \{0,...,u-1\}\)
  • Why is universality useful? Implies short chain lengths! (in expectation)
  • \(X_{ij}\) is an indicator random variable over \(h \in \mathcal{H}\): \(X_{ij}=1\) if \(h(k_i) = h(k_j)\), \(X_{ij} = 0\) otherwise
  • Size of chain at index \(h(k_i)\) is random variable \(X_i = \Sigma_{j}{X_{ij}}\)
  • Expected size of chain at index \(h(k_i)\):
\[\begin{align}\underset{h\in\mathcal{H}}{\mathbb{E}}{X_i} = \underset{h\in\mathcal{H}}{\mathbb{E}} \{\Sigma_{j}{X_{ij}}\} = \Sigma_{j}\underset{h\in\mathcal{H}}{\mathbb{E}}{X_{ij}} & = 1 + \Sigma_{j\neq i}\underset{h\in\mathcal{H}}{\mathbb{E}}{X_{ij}}\\ &= 1 + \Sigma_{j\neq i}(1)\underset{h\in\mathcal{H}}{Pr}\{h(k_i) = h(k_j)\} + (0)\underset{h\in\mathcal{H}}{Pr}\{h(k_i) \neq h(k_j)\}\\ &\leq 1 + \Sigma_{j\neq i} 1/m = 1 + (n-1) /m\end{align}\]
  • Since \(m = Ω(n)\), load factor \(α = n/m = O(1)\), so \(O(1)\) in expectation!


  • If \(n/m\) far from 1, rebuild with new randomly chosen hash function for new size m
  • Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
  • So a hash table can implement dynamic set operations in expected amortized O(1) time! :)
Set data structure operations, worst-case \(O(\cdot)\) (\(_{(a)}\) amortized, \(_{(e)}\) expected):

| Set Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | \(n\) | \(n\) | \(n\) | \(n\) | \(n\) |
| Sorted Array | \(n \log n\) | \(\log n\) | \(n\) | \(1\) | \(\log n\) |
| Direct Access Array | \(u\) | \(1\) | \(1\) | \(u\) | \(u\) |
| Hash Table | \(n_{(e)}\) | \(1_{(e)}\) | \(1_{(a)(e)}\) | \(n\) | \(n\) |
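
For concreteness, a minimal sketch of a chaining hash table that draws a random hash of the universal form \(h_{ab}(k) = ((ak+b) \bmod p) \bmod m\). The class name, the prime choice \(p = 2^{31}-1\) (assumes keys fit in 31 bits), and the fixed table size are illustrative assumptions; it also omits the resizing/rebuilding step described above.

import random

class HashTableSet:
    """Chaining hash table sketch; expected O(1) find/insert for a good random hash."""
    _p = (1 << 31) - 1                          # a fixed prime p > u (illustrative choice)

    def __init__(self, m=8):
        self.m = m                              # number of slots, ideally m = Theta(n)
        self.chains = [[] for _ in range(m)]    # one chain (Python list) per slot
        self.a = random.randrange(1, self._p)   # randomly chosen hash parameters (a != 0)
        self.b = random.randrange(0, self._p)

    def _hash(self, k):
        return ((self.a * k + self.b) % self._p) % self.m

    def insert(self, x):                        # x has x.key; replace any equal-key item
        chain = self.chains[self._hash(x.key)]
        for i, y in enumerate(chain):
            if y.key == x.key:
                chain[i] = x
                return
        chain.append(x)

    def find(self, k):                          # return stored item with key k, or None
        for y in self.chains[self._hash(k)]:
            if y.key == k:
                return y
        return None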

5 Linear Sorting


Comparison Sort Lower Bound

  • Comparison model implies that algorithm decision tree is binary (constant branching factor)
  • Requires # leaves L ≥ # possible outputs
  • Tree height lower bounded by $Ω(log L)$, so worst-case running time is $Ω(log L)$
  • To sort array of n elements, # outputs is n! permutations
  • Thus height lower bounded by $log(n!) ≥ log((n/2)^{n/2}) = Ω(n log n) $
  • So merge sort is optimal in comparison model
  • Can we exploit a direct access array to sort faster?

Direct Access Array Sort

  • Example: $[5, 2, 7, 0, 4]$
  • Suppose all keys are unique non-negative integers in range $\{0, \ldots, u - 1\}$, so $n ≤ u$
  • Insert each item into a direct access array with size \(u\) in $Θ(n)$
  • Return items in order they appear in direct access array in $Θ(u)$
  • Running time is $Θ(u)$, which is $Θ(n)$ if $u = Θ(n)$. Yay!
def direct_access_sort(A):
    """Sort A assuming items have distinct non-negative keys."""
    u = 1 + max([x.key for x in A])
    D = [None] * u
    for x in A:
        D[x.key] = x
    i = 0
    for key in range(u):
        if D[key] is not None:
            A[i] = D[key]
            i += 1
  • What if keys are in a larger range, like $u < n^2$ (but $u \gg n$)?
  • Idea! Represent each key $k$ by tuple $(a, b)$ where $k = an + b$ and $0 ≤ b < n$
  • Specifically $a = \lfloor k/n\rfloor < n$ and $b = (k \mod n)$ (just a 2-digit base-n number!)
  • This is a built-in Python operation $(a, b) = divmod(k, n)$
  • Example: $[17, 3, 24, 22, 12] ⇒ [(3,2), (0,3), (4,4), (4,2), (2,2)] ⇒ [32, 03, 44, 42, 22]_{(n=5)}$
  • How can we sort tuples?

Tuple Sort

  • Item keys are tuples of equal length, i.e. item $x.key = (x.k_1, x.k_2, x.k_3, . . .)$.
  • Want to sort on all entries lexicographically, so first key $k_1$ is most significant
  • How to sort? Idea! Use other auxiliary sorting algorithms to separately sort each key
  • (Like sorting rows in a spreadsheet by multiple columns)
  • What order to sort them in? Least significant to most significant!
  • Exercise: $[32, 03, 44, 42, 22] =⇒ [42, 22, 32, 03, 44] =⇒ [03, 22, 32, 42, 44]_{(n=5)}$
  • Idea! Use tuple sort with auxiliary direct access array sort to sort tuples (a, b).
  • Problem! Many integers could have the same a or b value, even if input keys distinct
  • Need sort allowing repeated keys which preserves input order
  • Want sort to be stable: repeated keys appear in output in same order as input
  • Direct access array sort cannot even sort arrays having repeated keys!
  • Can we modify direct access array sort to admit multiple keys in a way that is stable?

Counting Sort

  • Instead of storing a single item at each array index, store a chain, just like hashing!
  • For stability, chain data structure should remember the order in which items were added
  • Use a sequence data structure which maintains insertion order
  • To insert item x, insert_last to end of the chain at index $x.key$
  • Then to sort, read through all chains in sequence order, returning items one by one
def counting_sort(A):
    """Sort A assuming items have non-negative keys."""
    u = 1 + max([x.key for x in A])
    D = [[] for i in range(u)]
    for x in A:
        D[x.key].append(x)      # append x to chain at index x.key (preserves input order)
    i = 0
    for chain in D:
        for x in chain:
            A[i] = x
            i += 1

Radix Sort

  • Idea! If $u < n^2$, use tuple sort with auxiliary counting sort to sort tuples (a, b)
  • Sort least significant key b, then most significant key a
  • Stability ensures previous sorts stay sorted
  • Running time for this algorithm is $O(2n) = O(n)$. Yay!
  • If every key is $< n^c$ for some positive constant $c = \log_n u$, every key has at most $c$ digits base \(n\)
  • A c-digit number can be written as a c-element tuple in $O(c)$ time
  • We sort each of the c base-n digits in $O(n)$ time
  • So tuple sort with auxiliary counting sort runs in $O(cn)$ time in total
  • If c is constant, so each key is $≤ n^c$ , this sort is linear $O(n)$!
def radix_sort(A):
    """Sort A assuming items have non-negative keys"""
    n = len(A)
    u = 1 + max([x.key for x in A])
    c = 1 + (u.bit_length() // n.bit_length())
    class Obj: pass
    D = [Obj() for a in A]
    for i in range(n):              # O(nc): make digit tuples
        D[i].digits = []
        D[i].item = A[i]
        high = A[i].key
        for j in range(c):          # O(c): write key as c digits base n
            high, low = divmod(high, n)
            D[i].digits.append(low)
    for i in range(c):              # O(nc): sort by each digit, least significant first
        for j in range(n):          # O(n): assign digit i as the key of each tuple
            D[j].key = D[j].digits[i]
        counting_sort(D)            # O(n): stable sort on digit i
    for i in range(n):              # O(n): read items back into A
        A[i] = D[i].item
| Algorithm | Time $O(\cdot)$ | In-place? | Stable? | Comments |
|---|---|---|---|---|
| Insertion Sort | $n^2$ | Y | Y | $O(nk)$ for k-proximate |
| Selection Sort | $n^2$ | Y | N | $O(n)$ swaps |
| Merge Sort | \(n \log n\) | N | Y | stable, optimal comparison |
| Counting Sort | $n + u$ | N | Y | $O(n)$ when $u = O(n)$ |
| Radix Sort | $n + n\log_n u$ | N | Y | $O(n)$ when $u = O(n^c)$ |

6 Binary Trees, Part 1


Sequence data structure operations, worst-case $O(\cdot)$ (\(_{(a)}\) denotes amortized):

| Sequence Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Static Array | \(n\) | \(1\) | \(n\) | \(n\) | \(n\) |
| Linked List | \(n\) | \(n\) | \(1\) | \(n\) (\(1\) if we keep track of the tail) | \(n\) |
| Dynamic Array | \(n\) | \(1\) | \(n\) | $1_{(a)}$ | \(n\) |
| Goal | \(n\) | \(\log n\) | \(\log n\) | \(\log n\) | \(\log n\) |

Set data structure operations, worst-case $O(\cdot)$:

| Set Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | \(n\) | \(n\) | \(n\) | \(n\) | \(n\) |
| Sorted Array | \(n \log n\) | \(\log n\) | \(n\) | \(1\) | \(\log n\) |
| Goal | \(n \log n\) | \(\log n\) | \(\log n\) | \(\log n\) | \(\log n\) |

How? Binary Trees!

  • Pointer-based data structures (like Linked List) can achieve worst-case performance
  • Binary tree is pointer-based data structure with three pointers per node
  • Node representation: node.{item, parent, left, right}
  • Example:

example tree

class TreeNode:
    def __init__(self, x):
        self.item = x
        self.left = None
        self.right = None
        self.parent = None


  • The root of a tree has no parent (Ex: <A>)
  • A leaf of a tree has no children (Ex: <C>, <E>, and <F>)
  • Define depth(<X>) of node <X> in a tree rooted at <A> to be length of path from <A> to <X>
  • Define height(<X>) of node <X> to be max depth of any node in the subtree rooted at <X>
  • Idea: Design operations to run in $O(h)$ time for root height $h$, and maintain $h = O(log n)$
  • A binary tree has an inherent order: its traversal order (In-order traversal)
    • every node in node <X>’s left subtree is before <X>
    • every node in node <X>’s right subtree is after <X>
def subtree_iter(A):
    if A.left: yield from A.left.subtree_iter()
    yield A
    if A.right: yield from A.right.subtree_iter()
  • List nodes in traversal order via a recursive algorithm starting at root:
    • Recursively list left subtree, list self, then recursively list right subtree
    • Runs in $O(n)$ time, since $O(1)$ work is done to list each node
    • Example: Traversal order is (<F>,<D>,<B>,<E>,<A>,<C>)
  • Right now, traversal order has no meaning relative to the stored items
  • Later, assign semantic meaning to traversal order to implement Sequence/Set interfaces

Tree Navigation

  • Find first node in the traversal order of node <X>’s subtree (last is symmetric)
    • If <X> has a left child, recursively return the first node in the left subtree
    • Otherwise, <X> is the first node, so return it
    • Running time is $O(h)$ where h is the height of the tree
    • Example: first node in <A>’s subtree is <F>
def subtree_first(A):
    if A.left: return A.left.subtree_first()
    return A

def subtree_last(A):
    if A.right: return A.right.subtree_last()
    return A
  • Find successor of node <X> in the traversal order (predecessor is symmetric)
    • If <X> has right child, return first of right subtree
    • Otherwise, return lowest ancestor of <X> for which <X> is in its left subtree
    • Running time is $O(h)$ where $h$ is the height of the tree
    • Example: Successor of: <B> is <E>, <E> is <A>, and <C> is None.
def successor(A):
    if A.right: return A.right.subtree_first()
    while A.parent and (A is A.parent.right):
        A = A.parent
    return A.parent

def predecessor(A):
    if A.left: return A.left.subtree_last()
    while A.parent and (A is A.parent.left):
        A = A.parent
    return A.parent

Dynamic Operations

  • Change the tree by a single item (only add or remove leaves):
    • add a node after another in the traversal order (before is symmetric)
    • remove an item from the tree
  • Insert node <Y> after <X> in the traversal order
    • If <X> has no right child, make <Y> the right child of <X>
    • Otherwise, make <Y> the left child of <X>’s successor (which cannot have a left child)
    • Running time is $O(h)$ where $h$ is the height of the tree


def subtree_insert_before(A, B):
    if A.left:                          # A has a left subtree:
        A = A.left.subtree_last()       # B goes after the last node in it
        A.right, B.parent = B, A
    else:                               # otherwise B becomes A's left child
        A.left, B.parent = B, A
def subtree_insert_after(A, B):
    if A.right:                         # A has a right subtree:
        A = A.right.subtree_first()     # B goes before the first node in it
        A.left, B.parent = B, A
    else:                               # otherwise B becomes A's right child
        A.right, B.parent = B, A
  • Delete the item in node <X> from <X>’s subtree
    • If <X> is a leaf, detach from parent and return
    • Otherwise, <X> has a child
      • If <X> has a left child, swap items with the predecessor of <X> and recurse
      • Otherwise <X> has a right child, swap items with the successor of <X> and recurse
    • Running time is $O(h)$ where $h$ is the height of the tree


def subtree_delete(A):
    if A.left or A.right:               # A is not a leaf:
        if A.left: B = A.predecessor()
        else:      B = A.successor()
        A.item, B.item = B.item, A.item # swap item down toward a leaf
        return B.subtree_delete()       # recurse until the item reaches a leaf
    if A.parent:                        # A is a leaf: detach it from its parent
        if A.parent.left is A: A.parent.left = None
        else:                  A.parent.right = None
    return A

Application: Set

  • Idea! Set Binary Tree (a.k.a Binary Search Tree / BST)
  • Traversal order(In-order) is sorted order increasing by key
    • Equivalent to BST Property: for every node, every key in left subtree $\leq$ node’s key $\leq$ every key in right subtree
  • Then can find the node with key $k$ in node <X>’s subtree in $O(h)$ time like binary search:
    • If $k$ is smaller than the key at <X>, recurse in left subtree (or return None)
    • If $k$ is larger than the key at <X>, recurse in right subtree (or return None)
    • Otherwise, return the item stored at <X>
  • Other Set operations follow a similar pattern
class BSTNode(TreeNode):
    def subtree_find(A, k):
        if k == A.item.key: return A
        if k < A.item.key and A.left: return A.left.subtree_find(k)
        if k > A.item.key and A.right: return A.right.subtree_find(k)
    def subtree_find_next(A, k):
        if A.item.key <= k:
            if A.right: return A.right.subtree_find_next(k)
            else:       return None
        if A.left:
            B = A.left.subtree_find_next(k)
            if B: return B
        return A
    def subtree_find_prev(A, k):
        if A.item.key >= k:
            if A.left: return A.left.subtree_find_prev(k)
            else:      return None
        if A.right:
            B = A.right.subtree_find_prev(k)
            if B: return B
        return A

    def subtree_insert(A, B):
        if B.item.key < A.item.key:
            if A.left: A.left.subtree_insert(B)
            else:      A.subtree_insert_before(B)
        elif B.item.key > A.item.key:
            if A.right: A.right.subtree_insert(B)
            else:       A.subtree_insert_after(B)
        else: A.item = B.item

class BinaryTree:
    def __init__(self, node_type=TreeNode):
        self.root = None
        self.size = 0
        self.node_type = node_type
    def __len__(self): return self.size
    def __iter__(self):
        if self.root:
            for node in self.root.subtree_iter():
                yield node.item
class BinaryTreeSet(BinaryTree):
    def __init__(self):
        super().__init__(node_type=BSTNode)
    def iter_order(self): yield from self
    def build(self, X):
        for x in X: self.insert(x)
    def find_min(self):
        if self.root: return self.root.subtree_first().item
    def find_max(self):
        if self.root: return self.root.subtree_last().item
    def find(self, k):
        if self.root:
            node = self.root.subtree_find(k)
            if node: return node.item
    def find_next(self, k):
        if self.root:
            node = self.root.subtree_find_next(k)
            if node: return node.item
    def find_prev(self, k):
        if self.root:
            node = self.root.subtree_find_prev(k)
            if node: return node.item
    def insert(self, x):
        new_node = self.node_type(x)
        if self.root:
            self.root.subtree_insert(new_node)
            if new_node.parent is None: return False   # item with this key replaced
        else:
            self.root = new_node
        self.size += 1
        return True
    def delete(self, k):
        assert self.root
        node = self.root.subtree_find(k)
        assert node
        ext = node.subtree_delete()
        if ext.parent is None: self.root = None
        self.size -= 1
        return ext.item

Application: Sequence

  • Idea! Sequence Binary Tree: Traversal order is sequence order
  • How do we find the $i^{th}$ node in traversal order of a subtree? Call this operation subtree_at(i)
  • Could just iterate through entire traversal order, but that’s bad, $O(n)$
  • However, if we could compute a subtree’s size in $O(1)$, then can solve in $O(h)$ time
    • How? Check the size $n_L$ of the left subtree and compare to $i$
    • If $i < n_L$, recurse on the left subtree
    • If $i > n_L$, recurse on the right subtree with $i’ = i - n_L - 1$
    • Otherwise, $i = n_L$, and you’ve reached the desired node!
  • Maintain the size of each node’s subtree at the node via augmentation
    • Add node.size field to each node
    • When adding new leaf, add $+1$ to a.size for all ancestors a in $O(h)$ time
    • When deleting a leaf, add $-1$ to a.size for all ancestors a in $O(h)$ time
  • Sequence operations follow directly from a fast subtree_at(i) operation (see the sketch after this list)
  • Naively, build(X) takes $O(nh)$ time, but can be done in $O(n)$ time; see recitation
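
A sketch of subtree_at(i) under the size augmentation described above (assumes every node stores its subtree size in node.size; written in the same standalone style as the TreeNode helpers earlier):

def subtree_at(A, i):
    """Return the i-th node (0-indexed) in the traversal order of A's subtree,
    assuming every node stores the size of its subtree in node.size."""
    assert 0 <= i < A.size
    n_left = A.left.size if A.left else 0            # size of the left subtree
    if i < n_left:
        return subtree_at(A.left, i)                 # desired node is in the left subtree
    elif i > n_left:
        return subtree_at(A.right, i - n_left - 1)   # skip the left subtree and A itself
    else:
        return A                                     # i == n_left: A is the i-th node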

7 Binary Tree II: AVL


Height Balance

  • How to maintain height $h=O(logn)$ where \(n\) is number of nodes in tree?
  • A binary tree that maintains $O(logn)$ height under dynamic operations is called balanced
    • There are many balancing schemes (Red-Black Trees, Splay Trees, 2-3 Trees, …)
    • The first proposed balancing scheme was the AVL Tree (Adelson-Velsky and Landis, 1962)


  • Need to reduce height of tree without changing its traversal order, so that we represent the same sequence of items.

  • How to change the structure of a tree, while preserving traversal order? Rotations!

    tree rotations

  • A rotation relinks $O(1)$ pointers to modify tree structure and maintains traversal order

def subtree_rotate_right(D):
    assert D.left
    B, E = D.left, D.right
    A, C = B.left, B.right
    # swap roles (and items) so the node still attached to D's old parent becomes the new root B
    D, B = B, D
    D.item, B.item = B.item, D.item
    B.left, B.right = A, D
    D.left, D.right = C, E
    if A: A.parent = B
    if E: E.parent = D
def subtree_rotate_left(B):
    assert B.right
    A, D = B.left, B.right
    C, E = D.left, D.right
    B, D = D, B
    B.item, D.item = D.item, B.item
    D.left, D.right = B, E
    B.left, B.right = A, C
    if A: A.parent = B
    if E: E.parent = D

Rotations Suffice

  • Claim: $O(n)$ rotations can transform a binary tree to any other with same traversal order
  • Proof: Repeatedly perform last possible right rotation in traversal order; resulting tree is a canonical chain. Each rotation increases depth of the last node by 1. Depth of last node in final chain is $n - 1$, so at most $n-1$ rotations are performed. Reverse canonical rotations to reach target tree. Q.E.D
  • Can maintain height-balance by using $O(n)$ rotations to fully balance the tree, but slow :(
  • We will keep the tree balanced in $O(logn)$ time per operation!

AVL Trees: Height Balance

  • AVL trees maintain height-balance (also called the AVL property)
    • A node is height-balanced if heights of its left and right subtree differ by at most 1
    • Let skew of a node be the height of its right subtree minus that of its left subtree
    • Then a node is height-balanced if its skew is $-1,0$ or \(1\)
  • Claim: A binary tree with height-balanced nodes has height $h=O(logn)$ (i.e., $n=2^{\Omega(h)}$)

  • Proof: Suffices to show the fewest nodes \(F(h)\) in any height-\(h\) height-balanced tree satisfies \(F(h)=2^{\Omega(h)}\): \(F(0) = 1,\ F(1)=2,\ F(h)=1+F(h-1)+F(h-2)\geq 2F(h-2)\implies F(h)\geq 2^{h/2}\qquad \square\)

  • Suppose adding or removing leaf from a height-balanced tree results in imbalance

    • Only the subtrees of the leaf’s ancestors have changed in height or skew
    • Heights changed by only $\pm 1$, so skews still have magnitude $\leq 2$
    • Idea: Fix height-balance of ancestors starting from the leaf up to the root
    • Repeatedly rebalance the lowest ancestor that is not height-balanced; wlog assume its skew is 2
  • Local Rebalance: Given binary tree node <B>:
    • whose skew is 2, and
    • every other node in <B>’s subtree is height-balanced,
    • then <B>’s subtree can be made height-balanced via one or two rotations
    • (after which <B>’s height is the same or one less than before)
  • Proof:
    • Since skew of <B> is 2, <B>’s right child <F> exists
    • Case 1: skew of <F> is 0, or Case 2: skew of <F> is 1
      • Perform a left rotation on <B>


Computing Height

  • How to tell whether node is height-balanced? Compute heights of subtrees!
  • How to compute the height of node <X> ? Naive algorithm:
    • Recursively compute height of the left and right subtrees of <X>
    • Add \(1\) to the max of the two heights
    • Runs in $Ω(n)$ time, since we recurse on every node :(
  • Idea: Augment each node with the height of its subtree! (Save for later!)
  • Height of <X> can be computed in $O(1)$ time from the heights of its children:
    • Look up the stored heights of left and right subtrees in $O(1)$ time
    • Add \(1\) to the max of the two heights
  • During dynamic operations, we must maintain our augmentation as the tree changes shape
  • Recompute subtree augmentations at every node whose subtree changes:
    • Update relinked nodes in a rotation operation in $O(1)$ time (ancestors don’t change)
    • Update all ancestors of an inserted or deleted node in $O(h)$ time by walking up the tree

Steps to Augment a Binary Tree

  • In general, to augment a binary tree with a subtree property P, you must:
    • State the subtree property P(<X>) you want to store at each node <X>
    • Show how to compute P(<X>) from the augmentations of <X>’s children in $O(1)$ time
    • Then stored property P(<X>) can be maintained without changing dynamic operation costs

Application: Sequence

  • For sequence binary tree, we needed to know subtree sizes
  • For just inserting/deleting a leaf, this was easy, but now need to handle rotations
  • Subtree size is a subtree property, so can maintain via augmentation
    • Can compute size from sizes of children by summing them and adding 1
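
A sketch of how such augmentations might be maintained (the helper names height and subtree_update are hypothetical; subtree_update would be invoked on every node whose subtree changes, e.g., the relinked nodes of a rotation and the ancestors of an inserted or deleted leaf):

def height(A):
    return A.height if A else -1    # height of an empty subtree is -1

def subtree_update(A):
    """Recompute A's subtree augmentations from its children in O(1) time."""
    A.size = 1                                        # count A itself
    if A.left:  A.size += A.left.size                 # plus the left subtree size
    if A.right: A.size += A.right.size                # plus the right subtree size
    A.height = 1 + max(height(A.left), height(A.right))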


  • Set AVL trees achieve $O(log n)$ time for all set operations
  • except $O(n log n)$ time for build and $O(n)$ time for iter
  • Sequence AVL trees achieve $O(log n)$ time for all sequence operations
  • except $O(n)$ time for build and iter

Application: Sorting

  • Any Set data structure defines a sorting algorithm: build (or repeatedly insert) then iter
  • For example, Direct Access Array Sort from Lecture 5
  • AVL Sort is a new $O(n log n)$ time sorting algorithm

8 Binary Heaps


Priority Queue Interface

  • Keep track of many items, quickly access/remove the most important

    • Example: router with limited bandwidth, must prioritize certain kinds of messages
    • Example: process scheduling in operating system kernels
    • Example: discrete-event simulation (when is next occurring event?)
    • Example: graph algorithms (later in the course)
  • Order items by key = priority so Set interface (not Sequence interface)

  • Optimized for a particular subset of Set operations:

    Operation Specification
    build(X) build priority queue from iterable X
    insert(x) add item x to data structure
    delete_max() remove and return stored item with largest key
    find_max() return stored item with largest key
  • (Usually optimized for max or min, not both)
  • Focus on insert and delete_max operations: build can repeatedly insert; find_max() can delete_max() and then re-insert the returned item
class PriorityQueue:
    def __init__(self):
        self.A = []
    def insert(self, x):
        self.A.append(x)
    def delete_max(self):
        assert len(self.A) > 0
        return self.A.pop()   # not correct by itself; subclasses make this correct
    @classmethod
    def sort(cls, A):
        pq = cls()
        for x in A: pq.insert(x)
        out = [pq.delete_max() for _ in A]
        return list(reversed(out))

Priority Queue Sort

  • Any priority queue data structure translates into a sorting algorithm:
    • build(A), e.g., insert items one by one in input order
    • Repeatedly delete_min() (or delete_max()) to determine (reverse) sorted order
  • All the hard work happens inside the data structure
  • Running time is $T_{build} + n · T_{delete_max} ≤ n · T_{insert} + n · T_{delete_max}$
  • Many sorting algorithms we’ve seen can be viewed as priority queue sort:
Priority queue operations, worst-case $O(\cdot)$ ($_{(a)}$ denotes amortized):

| Priority Queue Data Structure | build(A) | insert(x) | delete_max() | PQ Sort Time | In-place? | PQ Sort Algorithm |
|---|---|---|---|---|---|---|
| Dynamic Array | \(n\) | $1_{(a)}$ | \(n\) | $n^2$ | Y | Selection Sort |
| Sorted Dynamic Array | \(n \log n\) | \(n\) | $1_{(a)}$ | $n^2$ | Y | Insertion Sort |
| Set AVL Tree | \(n \log n\) | \(\log n\) | \(\log n\) | \(n \log n\) | N | AVL Sort |
| Goal | \(n\) | $\log n_{(a)}$ | $\log n_{(a)}$ | \(n \log n\) | Y | Heap Sort |

Priority Queue: Set AVL Tree

  • Set AVL trees support insert(x), find_min(), find_max(), delete_min(), and delete_max() in $O(log n)$ time per operation
  • So priority queue sort runs in $O(n log n)$ time
    • This is (essentially) AVL sort from Lecture 7
  • Can speed up find_min() and find_max() to $O(1)$ time via subtree augmentation
  • But this data structure is complicated and resulting sort is not in-place
  • Is there a simpler data structure for just priority queue, and in-place $O(n lg n)$ sort? YES, binary heap and heap sort
  • Essentially implement a Set data structure on top of a Sequence data structure (array), using what we learned about binary trees

Priority Queue: Array

  • Store elements in an unordered dynamic array
  • insert(x): append x to end in amortized $O(1)$ time
  • delete_max(): find max in $O(n)$, swap max to the end and remove
  • insert is quick, but delete_max is slow
  • Priority queue sort is selection sort! (plus some copying)
class PQArray(PriorityQueue):
    def delete_max(self):  # O(n)
        n, A, m = len(self.A), self.A, 0
        for i in range(1, n):
            m = i if A[m].key < A[i].key else m
        A[m], A[n-1] = A[n-1], A[m]   # swap the max into the last slot
        return super().delete_max()   # pop from end

We give insert an optional argument so that it can take one argument (as makes sense now) or zero arguments; we will need the latter functionality when making the priority queues in-place.

Priority Queue: Sorted Array

  • Store elements in a sorted dynamic array
  • insert(x): append x to end, swap down to sorted position in $O(n)$ time
  • delete_max(): delete from end in $O(1)$ amortized
  • delete_max is quick, but insert is slow
  • Priority queue sort is insertion sort! (plus some copying)
  • Can we find a compromise between these two array priority queue extremes?
class PQSortedArray(PriorityQueue):
    def insert(self, x=None):
        if x is not None: super().insert(x)   # append to end of array
        i, A = len(self.A) - 1, self.A        # swap the new item down to its sorted position
        while 0 < i and A[i].key < A[i-1].key:
            A[i-1], A[i] = A[i], A[i-1]
            i -= 1

Array as a Complete Binary Tree

  • Idea: interpret an array as a complete binary tree, with maximum $2^i$ nodes at depth $i$ except at the largest depth, where all nodes are left-aligned
  • Equivalently, complete tree is filled densely in reading order: root to leaves, left to right
  • Perspective: bijection between arrays and complete binary trees
  • Height of the complete-tree perspective of an array of \(n\) items is $\lceil \log n \rceil$, so it is a balanced binary tree



Implicit Complete Tree

  • Complete binary tree structure can be implicit instead of storing pointers
  • Root is at index 0
  • Compute neighbors by index arithmetic:
\[\begin{align} left(i) &=2i+1 \\ right(i) &=2i+2 \\ parent(i) &= \lfloor \frac{i-1}{2} \rfloor \end{align}\]

Binary Heaps

  • Idea: keep larger elements higher in tree, but only locally
  • Max-Heap Property at node $i$: $Q[i] ≥ Q[j]$ for $j ∈ \{left(i), right(i)\}$
  • Max-heap is an array satisfying max-heap property at all nodes
  • Claim: In a max-heap, every node i satisfies $Q[i] ≥ Q[j]$ for all nodes $j$ in subtree(i)
  • Proof:
    • Induction on $d = depth(j) - depth(i)$
    • Base case: $d = 0$ implies $i = j$ implies $Q[i] ≥ Q[j]$ (in fact, equal)
    • $depth(parent(j)) - depth(i) = d - 1 < d$, so $Q[i] ≥ Q[parent(j)]$ by induction
    • $Q[parent(j)] ≥ Q[j]$ by Max-Heap Property at parent(j)
  • In particular, max item is at root of max-heap
def parent(i):
    p = (i - 1) // 2
    return p if 0 < i else i

def left(i, n):
    l = 2 * i + 1
    return l if l < n else i

def right(i, n):
    r = 2 * i + 2
    return r if r < n else i

Heap Insert

  • Append new item $x$ to end of array in $O(1)$ amortized, making it next leaf $i$ in reading order
  • max_heapify_up(i): swap with parent until Max-Heap Property
    • Check whether $Q[parent(i)] \geq Q[i]$ (part of Max-Heap Property at parent(i))
    • If not, swap items $Q[i]$ and $Q[parent(i)]$, and recursively max_heapify_up(parent(i))
  • Correctness:
    • Max-Heap Property guarantees all nodes $\geq$ descendants, except $Q[i]$ might be > some of its ancestors (unless $i$ is the root, so we’re done)
    • If swap is necessary, same guarantee is true with $Q[parent(i)]$ instead of $Q[i]$
  • Running time: height of tree, so $\Theta(\log n)$

Heap Delete Max

  • Can only easily remove the last element from a dynamic array, but the max key is at the root of the tree
  • So swap item at root node $i = 0$ with last item at node $n-1$ in heap array
  • max_heapify_down(i): swap root with larger child until Max-Heap Property
    • Check whether $Q[i] \geq Q[j]$ for $j \in \{left(i), right(i)\}$ (Max-Heap Property at i)
    • If not, swap $Q[i]$ with $Q[j]$ for the child $j \in \{left(i), right(i)\}$ with maximum key, and recursively max_heapify_down(j)
  • Correctness:
    • Max-Heap Property guarantees all nodes $\geq$ descendants, except $Q[i]$ might be < some descendants (unless $i$ is a leaf, so we’re done)
    • If swap is necessary, same guarantee is true with $Q[j]$ instead of $Q[i]$
  • Running time: height of tree, so $\Theta(\log n)$
class PQHeap(PriorityQueue):
    def insert(self, x=None):
        if x is not None: super().insert(x)   # append new item to end of array
        n, A = self.n, self.A
        max_heapify_up(A, n, n-1)             # restore heap property above new leaf
    def delete_max(self):
        n, A = self.n, self.A
        A[0], A[n-1] = A[n-1], A[0]           # swap max at root with last leaf
        max_heapify_down(A, n-1, 0)           # restore heap property on remaining n-1 items
        return super().delete_max()           # pop max from end
def max_heapify_up(A, n, c):
    p = parent(c)
    if A[p].key < A[c].key:
        A[c], A[p] = A[p], A[c]
        max_heapify_up(A, n, p)
def max_heapify_down(A, n, p):
    l, r = left(p, n), right(p, n)
    c = l if A[r].key < A[l].key else r
    if A[p].key < A[c].key:
        A[c], A[p] = A[p], A[c]
        max_heapify_down(A, n, c)    

Heap Sort

  • Plugging max-heap into priority queue sort gives us a new sorting algorithm
  • Running time is $O(n log n)$ because each insert and delete_max takes $O(log n)$
  • But often include two improvements to this sorting algorithm:

In-place Priority Queue Sort

  • Max-heap $Q$ is a prefix of a larger array $A$; remember how many items $\vert Q \vert$ belong to the heap
  • $\vert Q \vert$ is initially zero, eventually $\vert A \vert$ (after inserts), then zero again (after deletes)
  • insert() absorbs the next item in the array at index $\vert Q \vert$ into the heap
  • delete_max() moves the max item to the end, then abandons it by decrementing $\vert Q \vert$
  • In-place priority queue sort with Array is exactly Selection Sort
  • In-place priority queue sort with Sorted Array is exactly Insertion Sort
  • In-place priority queue sort with binary Max Heap is Heap Sort
class PriorityQueue:
    def __init__(self, A):
        self.n, self.A = 0, A
    def insert(self):
        assert self.n < len(self.A)
        self.n += 1
    def delete_max(self):
        assert self.n >= 1
        self.n -= 1
    def sort(Queue, A):
        pq = Queue(A)
        for i in range(len(A)): pq.insert()
        for i in range(len(A)): pq.delete_max()
        return pq.A

Linear Build Heap

  • Inserting \(n\) items into a heap calls max_heapify_up(i) for $i$ from $0$ to $n-1$ (root down): worst-case swaps \(\approx \Sigma_{i=0}^{n-1}\text{depth}(i)=\Sigma_{i=0}^{n-1}\log i=\log(n!)\geq(n/2)\log(n/2)=\Omega(n\log n)\)

  • Idea! Treat the full array as a complete binary tree from the start, then max_heapify_down(i) for \(i\) from $n-1$ to $0$ (leaves up): worst-case swaps \(\approx \Sigma_{i=0}^{n-1}\text{height}(i)=\Sigma_{i=0}^{n-1}(\log n -\log i)=\log\big(\tfrac{n^n}{n!}\big)=\Theta\big(\log\tfrac{n^n}{\sqrt{n}(n/e)^n}\big)=O(n)\)

  • So can build heap in $O(n)$ time

  • (Doesn’t speed up $O(nlogn)$ performance of heap sort)

def build_max_heap(A):
    n = len(A)
    for i in range(n // 2, -1, -1):
        max_heapify_down(A, n, i)
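
Putting the pieces together, a compact in-place heap sort sketch (assumes items carry a .key attribute as elsewhere in these notes, and reuses build_max_heap and max_heapify_down from above):

def heap_sort(A):
    """Sort array A in place in O(n log n) time using a binary max-heap."""
    build_max_heap(A)                    # O(n): heapify the whole array bottom-up
    for n in range(len(A) - 1, 0, -1):
        A[0], A[n] = A[n], A[0]          # move current max to its final position
        max_heapify_down(A, n, 0)        # restore the heap property on prefix A[:n]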

Sequence AVL Tree Priority Queue

  • Where else have we seen linear build time for an otherwise logarithmic data structure? Sequence AVL Tree!
  • Store items of priority queue in Sequence AVL Tree in arbitrary order (insertion order)
  • Maintain max (and/or min) augmentation:
    • node.max = pointer to node in subtree of node with maximum key
      • This is a subtree property, so constant factor overhead to maintain
  • find_min() and find_max() in $O(1)$ time
  • delete_min() and delete_max() in $O(log n)$ time
  • build(A) in $O(n)$ time
  • Same bounds as binary heaps (and more)

Set vs. Multiset

  • While our Set interface assumes no duplicate keys, we can use these Sets to implement Multisets that allow items with duplicate keys:
    • Each item in the Set is a Sequence (e.g., linked list) storing the Multiset items with the same key, which is the key of the Sequence
  • In fact, without this reduction, binary heaps and AVL trees work directly for duplicate-key items (where e.g. delete_max deletes some item of maximum key), taking care to use ≤ constraints (instead of < in Set AVL Trees)

9 Breadth-First Search


Graph Applications

  • Why? Graphs are everywhere!
  • any network system has direct connection to graphs
  • e.g., road networks, computer networks, social networks
  • the state space of any discrete system can be represented by a transition graph
  • e.g., puzzle & games like Chess, Tetris, Rubik’s cube
  • e.g., application workflows, specifications

Graph Definitions


  • Graph $G = (V, E)$ is a set of vertices \(V\) and a set of pairs of vertices $E \subseteq V \times V$
  • Directed edges are ordered pairs, e.g., $(u, v)$ for $u, v \in V$
  • Undirected edges are unordered pairs, $\{u, v\}$ for $u, v \in V$, i.e., equivalent to both $(u, v)$ and $(v, u)$
  • In this class, we assume all graphs are simple:
    • edges are distinct, e.g., $(u, v)$ only occurs once in E (though $(v, u)$ may appear), and
    • edges are pairs of distinct vertices, e.g., $u \neq v$ for all $(u, v) ∈ E$
    • Simple implies $\vert E \vert = O(\vert V \vert ^2)$, since $\vert E \vert ≤ \binom{\vert V \vert}{2}$ for undirected, $≤ 2 \binom{\vert V \vert}{2}$ for directed

Neighbor Sets/Adjacencies

  • The outgoing neighbor set of $u ∈ V$ is $Adj^+(u) = \{v ∈ V \mid (u, v) ∈ E\}$
  • The incoming neighbor set of $u ∈ V$ is $Adj^-(u) = \{v ∈ V \mid (v, u) ∈ E\}$
  • The out-degree of a vertex $u ∈ V$ is $deg^+(u) = \vert Adj^+(u) \vert$
  • The in-degree of a vertex $u ∈ V$ is $deg^-(u) = \vert Adj^-(u) \vert$
  • For undirected graphs, $Adj^-(u) = Adj^+(u)$ and $deg^-(u) = deg^+(u)$
  • Dropping superscript defaults to outgoing, i.e., $Adj(u) = Adj^+(u)$ and $deg(u) = deg^+(u)$

Graph Representations

  • To store a graph $G = (V, E)$, we need to store the outgoing edges $Adj(u)$ for all $u ∈ V$
  • First, need a Set data structure $Adj$ to map \(u\) to $Adj(u)$
  • Then for each \(u\), need to store $Adj(u)$ in another data structure called an adjacency list
  • Common to use direct access array or hash table for $Adj$, since want lookup fast by vertex
  • Common to use array or linked list for each $Adj(u)$ since usually only iteration is needed
  • For the common representations, $Adj$ has size $Θ(\vert V \vert )$, while each $Adj(u)$ has size $Θ(deg(u))$
  • Since $\Sigma_{u∈V} deg(u) ≤ 2\vert E \vert $ by the handshaking lemma, a graph is storable in $Θ(\vert V \vert + \vert E \vert )$ space
  • Thus, for algorithms on graphs, linear time will mean $Θ(\vert V \vert + \vert E \vert )$ (linear in the size of the graph)


  • A path is a sequence of vertices $p = (v_1, v_2, . . . , v_k)$ where $(v_i, v_{i+1}) ∈ E$ for all $1 ≤ i < k$.
  • A path is simple if it does not repeat vertices
  • The length $\ell(p)$ of a path $p$ is the number of edges in the path
  • The distance $δ(u, v)$ from $u ∈ V$ to $v ∈ V$ is the minimum length of any path from \(u\) to \(v\)
    • i.e., the length of a shortest path from \(u\) to \(v\)
    • (by convention, $δ(u, v) = ∞$ if \(u\) is not connected to \(v\))

Graph Path Problems

  • There are many problems you might want to solve concerning paths in a graph:
  • SINGLE_PAIR_REACHABILITY(G, s, t): is there a path in $G$ from $s ∈ V$ to $t ∈ V$ ?
  • SINGLE_PAIR_SHORTEST_PATH(G, s, t): return distance $δ(s,t)$, and a shortest path in $G = (V, E)$ from $s ∈ V$ to $t ∈ V$
  • SINGLE_SOURCE_SHORTEST_PATHS(G, s): return $δ(s, v)$ for all $v ∈ V$ , and a shortest-path tree containing a shortest path from $s$ to every $v ∈ V$ (defined below)
  • Each problem above is at least as hard as every problem above it (i.e., you can use a black-box that solves a lower problem to solve any higher problem)
  • We won’t show algorithms to solve all of these problems
  • Instead, show one algorithm that solves the hardest in $O(\vert V \vert + \vert E \vert )$ time!

Shortest Paths Tree

  • How to return a shortest path from source vertex $s$ for every vertex in graph?
  • Many paths could have length $Ω(\vert V \vert )$, so returning every path could require $Ω(\vert V \vert ^2) $ time
  • Instead, for all $v ∈ V$ , store its parent $P(v)$: second to last vertex on a shortest path from $s$
  • Let $P(s)$ be null (no second to last vertex on shortest path from $s$ to $s$)
  • Set of parents comprises a shortest-paths tree with $O(\vert V \vert)$ size! (i.e., reversed shortest paths back to $s$ from every vertex reachable from $s$)

Breadth-First Search (BFS)

  • How to compute $δ(s, v)$ and $P(v)$ for all $v ∈ V$ ?
  • Store $δ(s, v)$ and $P(v)$ in Set data structures mapping vertices \(v\) to distance and parent
  • (If no path from $s$ to \(v\), do not store \(v\) in $P$ and set $δ(s, v)$ to $∞$)
  • Idea! Explore graph nodes in increasing order of distance
  • Goal: Compute level sets $L_i = \{v ∈ V \mid δ(s, v) = i\}$ (i.e., all vertices at distance $i$)
  • Claim: Every vertex $v ∈ L_i$ must be adjacent to a vertex $u ∈ L_{i-1}$ (i.e., $v ∈ Adj(u)$)
  • Claim: No vertex that is in $L_j$ for some $j < i$, appears in $L_i$
  • Invariant: $δ(s, v)$ and $P(v)$ have been computed correctly for all \(v\) in any $L_j$ for $j < i$
  • Base case $(i = 1)$: $L_0 = \{s\}$, $δ(s, s) = 0$, $P(s) = None$
  • Inductive Step: To compute $L_i$:
    • for every vertex \(u\) in $L_{i-1}$:
      • for every vertex $v ∈ Adj(u)$ that does not appear in any $L_j$ for $j < i$:
        • add \(v\) to $L_i$, set $δ(s, v) = i$, and set $P(v) = u$
  • Repeatedly compute $L_i$ from $L_j$ for $j < i$ for increasing $i$ until $L_i$ is the empty set
  • Set $δ(s, v) = ∞$ for any $v ∈ V$ for which $δ(s, v)$ was not set
  • Breadth-first search correctly computes all $δ(s, v)$ and $P(v)$ by induction
  • Running time analysis:
    • Store each $L_i$ in data structure with $Θ(\vert L_i \vert )$ time iteration and $O(1)$ time insertion (i.e., in a dynamic array or linked list)
    • Checking for a vertex \(v\) in any $L_j$ for $j < i$ can be done by checking for \(v\) in $P$
    • Maintain $δ$ and $P$ in Set data structures supporting dictionary ops in $O(1)$ time (i.e., direct access array or hash table)
    • Algorithm adds each vertex \(u\) to $≤ 1$ level and spends $O(1)$ time for each $v ∈ Adj(u)$
    • Work upper bounded by $O(1) × \Sigma_{u\in V} deg(u) = O(\vert E \vert )$ by handshake lemma
    • Spend $Θ(\vert V \vert )$ at end to assign $δ(s, v)$ for vertices $v ∈ V$ not reachable from $s$
    • So breadth-first search runs in linear time! $O(\vert V \vert + \vert E \vert )$
def bfs(adj, s):
    parent = [None for v in adj]
    parent[s] = s
    levels = [[s]]              # levels[i] = vertices at distance i from s
    while 0 < len(levels[-1]):
        level = []
        for u in levels[-1]:
            for v in adj[u]:
                if parent[v] is None:
                    parent[v] = u
                    level.append(v)
        levels.append(level)
    return parent

def unweighted_shortest_path(adj, s, t):
    parent = bfs(adj, s)
    if parent[t] is None: return None
    i = t
    path = [t]
    while i != s:
        i = parent[i]
        path.append(i)
    return list(reversed(path))

10 Depth-First Search


Depth-First Search (DFS)

  • Searches a graph from a vertex $s$, similar to BFS
  • Solves Single Source Reachability, not Single Source Shortest Paths. Useful for solving other problems (later)!
  • Return (not necessarily shortest) parent tree of parent pointers back to $s$.
  • Idea! Visit outgoing adjacencies recursively, but never revisit a vertex
  • i.e., follow any path until you get stuck, backtrack until finding an unexplored path to explore
  • $P(s) = None$, then run $visit(s)$, where
  • visit(u)
    • for every $v \in Adj(u)$ that does not appear in $P$:
      • set $P(v)= u$ and recursively call visit(v)
    • (DFS finishes visiting vertex \(u\), for use later!)
def dfs(adj, s, parent=None, order=None):
    if parent is None:
        parent = [None for v in adj]
        parent[s] = s
        order = []
    for v in adj[s]:
        if parent[v] is None:
            parent[v] = s
            dfs(adj, v, parent, order)
    order.append(s)             # s finishes after every vertex reachable from it
    return parent, order


Correctness

  • Claim: DFS visits \(v\) and correctly sets $P(v)$ for every vertex \(v\) reachable from $s$
  • Proof: induct on $k$, for claim on only vertices within distance $k$ from $s$
    • Base case $(k = 0)$: $P(s)$ is set correctly for $s$ and $s$ is visited
    • Inductive step: Assume the claim holds for vertices within distance $k’$; consider vertex \(v\) with $δ(s, v) = k’ + 1$
    • Consider vertex \(u\), the second to last vertex on some shortest path from $s$ to \(v\)
    • By induction, since $δ(s, u) = k’$ , DFS visits \(u\) and sets $P(u)$ correctly
    • While visiting \(u\), DFS considers $v ∈ Adj(u)$
    • Either \(v\) is in $P$, so has already been visited, or \(v\) will be visited while visiting \(u\)
    • In either case, \(v\) will be visited by DFS and will be added correctly to $P$ $\square$

Running Time

  • Algorithm visits each vertex \(u\) at most once and spends $O(1)$ time for each $v ∈ Adj(u)$
  • Work upper bounded by $O(1) × \Sigma_{u \in V}deg(u) = O(\vert E \vert )$
  • Unlike BFS, not returning a distance for each vertex, so DFS runs in $O(\vert E \vert )$ time

Full-BFS and Full-DFS

  • Suppose want to explore entire graph, not just vertices reachable from one vertex
  • Idea! Repeat a graph search algorithm $A$ on any unvisited vertex
  • Repeat the following until all vertices have been visited:
    • Choose an arbitrary unvisited vertex $s$, use $A$ to explore all vertices reachable from $s$
  • We call this algorithm Full-A, specifically Full-BFS or Full-DFS if A is BFS or DFS
  • Visits every vertex once, so both Full-BFS and Full-DFS run in $O(\vert V \vert + \vert E \vert )$ time
def full_dfs(adj):
    parent = [None for v in adj]
    order = []
    for v in range(len(adj)):
        if parent[v] is None:
            parent[v] = v
            dfs(adj, v, parent, order)
    return parent, order

DFS Edge Classification

  • Consider a graph edge from vertex \(u\) to \(v\), we call the edge a tree edge if the edge is part of the DFS tree (i.e. $parent[v] = u$)
  • Otherwise, the edge from \(u\) to \(v\) is not a tree edge, and is either:
    • a back edge - \(u\) is a descendant of \(v\)
    • a forward edge - \(v\) is a descendant of \(u\)
    • a cross edge - neither are descendants of each other

Graph Connectivity

  • An undirected graph is connected if there is a path connecting every pair of vertices
  • In a directed graph, vertex \(u\) may be reachable from \(v\), but $v$ may not be reachable from \(u\)
  • Connectivity is more complicated for directed graphs (we won’t discuss in this class)
  • Connectivity(G): is undirected graph G connected?
  • Connected_Components(G): given undirected graph $G = (V, E)$, return partition of $V$ into subsets $V_i ⊆ V$ (connected components) where each $V_i$ is connected in $G$ and there are no edges between vertices from different connected components
  • Consider a graph algorithm $A$ that solves Single Source Reachability
  • Claim: $A$ can be used to solve Connected Components
  • Proof: Run Full-$A$. For each run of $A$, put visited vertices in a connected component $\square$
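  • A minimal sketch of this reduction (not code from the notes), using recursive reachability as the black-box $A$ and assuming an undirected graph where each edge appears in both adjacency lists:
def connected_components(adj):
    label = [None] * len(adj)       # label[v] = index of v's connected component
    def explore(u, c):              # single-source reachability from u (the black-box A)
        label[u] = c
        for v in adj[u]:
            if label[v] is None:
                explore(v, c)
    components = 0
    for s in range(len(adj)):
        if label[s] is None:        # run A from each still-unlabeled vertex
            explore(s, components)
            components += 1
    return label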

Topological Sort

  • A Directed Acyclic Graph (DAG) is a directed graph that contains no directed cycle
  • A Topological Order of a graph $G = (V, E)$ is an ordering $f$ on the vertices such that: every edge $(u, v) ∈ E$ satisfies $f(u) < f(v)$
  • Exercise: Prove that a directed graph admits a topological ordering if and only if it is a DAG
  • How to find a topological order?
  • A Finishing Order is the order in which a Full-DFS finishes visiting each vertex in G
  • Claim: If $G = (V, E)$ is a DAG, the reverse of a finishing order is a topological order
  • Proof: Need to prove, for every edge $(u, v) ∈ E$, that \(u\) is ordered before $v$, i.e., the visit to $v$ finishes before visiting \(u\). Two cases:
    • If \(u\) visited before $v$:
      • Before visit to \(u\) finishes, will visit $v$ (via $(u, v)$ or otherwise)
      • Thus the visit to $v$ finishes before visiting \(u\)
    • If $v$ visited before \(u\):
      • \(u\) can’t be reached from $v$ since graph is acyclic
      • Thus the visit to $v$ finishes before visiting \(u\)
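  • A minimal sketch of this claim in code, reusing the full_dfs above (which records vertices in finishing order):
def topological_sort(adj):
    _, order = full_dfs(adj)        # order = Full-DFS finishing order
    return list(reversed(order))    # reverse finishing order; topological if adj is a DAG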

Cycle Detection

  • Full-DFS will find a topological order if a graph $G = (V, E)$ is acyclic
  • If reverse finishing order for Full-DFS is not a topological order, then $G$ must contain a cycle
  • Check if $G$ is acyclic: for each edge $(u, v)$, check if $v$ is before \(u\) in reverse finishing order
  • Can be done in $O(\vert E \vert )$ time via a hash table or direct access array
  • To return such a cycle, maintain the set of ancestors along the path back to $s$ in Full-DFS
  • Claim: If $G$ contains a cycle, Full-DFS will traverse an edge from $v$ to an ancestor of $v$
  • Proof: Consider a cycle $(v_0, v_1, . . . , v_k, v_0)$ in $G$
    • Without loss of generality, let $v_0$ be the first vertex visited by Full-DFS on the cycle
    • For each $v_i$, before visit to $v_i$ finishes, will visit $v_{i+1}$ and finish
    • Will consider edge $(v_i, v_{i+1})$, and if $v_{i+1}$ has not been visited, it will be visited now
    • Thus, before visit to $v_0$ finishes, will visit $v_k$ (for the first time, by $v_0$ assumption)
    • So, before visit to $v_k$ finishes, will consider $(v_k, v_0)$, where $v_0$ is an ancestor of $v_k$ $\square$
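  • A minimal sketch (not the notes’ code) of returning such a cycle by maintaining the set of ancestors on the current DFS path:
def find_cycle(adj):
    visited = [False] * len(adj)
    on_path = [False] * len(adj)    # on_path[v]: v is an ancestor in the current DFS call stack
    path = []                       # vertices on the current DFS path, in order
    def visit(u):
        visited[u], on_path[u] = True, True
        path.append(u)
        for v in adj[u]:
            if on_path[v]:          # edge to an ancestor closes a directed cycle
                return path[path.index(v):] + [v]
            if not visited[v]:
                cycle = visit(v)
                if cycle is not None:
                    return cycle
        on_path[u] = False
        path.pop()
        return None
    for s in range(len(adj)):
        if not visited[s]:
            cycle = visit(s)
            if cycle is not None:
                return cycle
    return None                     # graph is acyclic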

11 Weighted Shortest Paths


Weighted Graphs

  • A weighted graph is a graph $G = (V, E)$ together with a weight function $w: E \rightarrow \mathbb{Z}$

  • i.e., assign each edge $e = (u, v) \in E$ an integer weight: $w(e) = w(u, v)$

  • Many applications for edge weights in a graph:

    • distances in road network
    • latency in network connections
    • strength of a relationship in a social network
  • Two common ways to represent weights computationally:

    • Inside graph representation: store edge weight with each vertex in adjacency lists
    • Store separate Set data structure mapping each edge to its weight
  • We assume a representation that allows querying the weight of an edge in $O(1)$ time

  • Examples: (figure of example weighted graphs, including $G_1$, omitted)
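  • A minimal sketch of both options (the names adj_w, W are illustrative assumptions); the code later in these notes only assumes some $O(1)$-time callable w(u, v):
# Option 1: weights stored inside the adjacency structure, one dict per vertex
adj_w = [
    {1: 5, 2: -2},    # edges (0,1) with weight 5 and (0,2) with weight -2
    {2: 3},           # edge (1,2) with weight 3
    {0: 1},           # edge (2,0) with weight 1
]

# Option 2: plain adjacency lists plus a separate Set data structure keyed by edge
adj = [[1, 2], [2], [0]]
W = {(0, 1): 5, (0, 2): -2, (1, 2): 3, (2, 0): 1}

def w(u, v):          # O(1)-time weight lookup
    return W[(u, v)]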

Weighted Paths

  • The weight $w(\pi)$ of a path $\pi$ in a weighted graph is the sum of weights of edges in the path
  • The (weighted) shortest path from $s \in V$ to $t \in V$ is path of minimum weight from $s$ to $t$
  • $\delta(s, t) = \inf\{w(\pi) \mid \text{path } \pi \text{ from } s \text{ to } t\}$ is the shortest-path weight from $s$ to $t$
  • (Often use “distance” for shortest-path weight in weighted graphs, not number of edges)
  • As with unweighted graphs:
    • $\delta(s,t) = \infty$ if no path from $s$ to $t$
    • Subpaths of shortest paths are shortest paths (or else could splice in a shortest path)
  • Why infimum not minimum? Possible that no finite-length minimum-weight path exists
  • When? Can occur if there is a negative-weight cycle in the graph, Ex: $(b,f,g,c,b)$ in $G1$

  • A negative-weight cycle is a path $\pi$ starting and ending at the same vertex with $w(\pi) < 0$
  • $\delta (s, t) = - \infty$ if there is a path from $s$ to $t$ through a vertex on a negative-weight cycle
  • If this occurs, don’t want a shortest path, but may want the negative-weight cycle

Weighted Shortest Paths Algorithms

  • Already know one algorithm: Breadth-First Search! Runs in $O(\vert V \vert + \vert E \vert )$ time when, e.g.:
    • graph has positive weights, and all weights are the same
    • graph has positive integer weights, and the sum of all weights is at most $O(\vert V \vert + \vert E \vert )$ (subdivide each edge into a unit-weight path)
  • For general weighted graphs, we don’t know how to solve SSSP in $O(\vert V \vert + \vert E \vert )$ time
  • But if your graph is a Directed Acyclic Graph you can!
| Graph   | Weights      | SSSP Algorithm | Running Time $O(\cdot)$                             |
|---------|--------------|----------------|-----------------------------------------------------|
| General | Unweighted   | BFS            | $\vert V \vert + \vert E \vert$                     |
| DAG     | Any          | DAG Relaxation | $\vert V \vert + \vert E \vert$                     |
| General | Any          | Bellman-Ford   | $\vert V \vert \cdot \vert E \vert$                 |
| General | Non-negative | Dijkstra       | $\vert V \vert \log \vert V \vert + \vert E \vert$  |

Shortest-Paths Tree

  • For BFS, we kept track of parent pointers during search. Alternatively, compute them after!
  • If know $\delta(s, v)$ for all vertices $v\in V$, can construct shortest-path tree in $O(\vert V \vert + \vert E \vert )$ time
  • For weighted shortest paths from $s$, only need parent pointers for vertices $v$ with finite $\delta(s, v)$
  • Initialize empty $P$ and set $P(s) = None$
  • For each vertex $u\in V$ where $\delta(s, u)$ is finite:
    • For each outgoing neighbor $v \in Adj^+(u)$:
      • If $P(v)$ not assigned and $\delta(s, v) = \delta(s, u) + w(u, v)$
        • There exists a shortest path through edge $(u, v)$, so set $P(v) = u$
  • Parent pointers may traverse cycles of zero weight. Mark each vertex in such a cycle.
  • For each unmarked vertex $u\in V$ (including vertices later marked):
    • For each $v\in Adj^+(u)$ where $v$ is marked and $\delta(s, v) = \delta(s, u) + w(u, v)$
      • Unmark vertices in cycle containing $v$ by traversing parent pointers from $v$
      • Set $P(v) = u$, breaking the cycle
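  • A minimal sketch of the first loop above (an illustrative assumption: distances are given as a list delta indexed by vertex; the zero-weight-cycle fix-up is omitted):
def shortest_paths_tree(adj, w, s, delta):
    P = {s: None}
    for u in range(len(adj)):
        if delta[u] == float('inf'):        # only vertices with finite distance get parents
            continue
        for v in adj[u]:
            if v not in P and delta[v] == delta[u] + w(u, v):
                P[v] = u                    # some shortest path to v ends with edge (u, v)
    return P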


Relaxation

  • A relaxation algorithm searches for a solution to an optimization problem by starting with a solution that is not optimal, then iteratively improving it until it becomes an optimal solution to the original problem.
def try_to_relax(adj, w, d, parent, u, v):
    # relax edge (u, v): lower the estimate d[v] if the path through u is better
    if d[v] > d[u] + w(u, v):
        d[v] = d[u] + w(u, v)
        parent[v] = u

def general_relax(adj, w, s):
    # generic relaxation framework (pseudocode): some_edge_relaxable and
    # get_relaxable_edge are unspecified placeholders; concrete algorithms
    # differ only in the order in which they choose edges to relax
    d = [float('inf') for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    while some_edge_relaxable(adj, w, d):
        (u, v) = get_relaxable_edge(adj, w, d)
        try_to_relax(adj, w, d, parent, u, v)
    return d, parent

DAG Relaxation

  • Idea! Maintain a distance estimate $d(s, v)$ (initially $\infty$ ) for each vertex $v\in V$, that always upper bounds true distance $\delta(s, v)$, then gradually lowers until $d(s, v) = \delta(s, v)$
  • When do we lower? When an edge violates the triangle inequality!
  • Triangle Inequality: the shortest-path weight from \(u\) to $v$ cannot be greater than the shortest-path weight from \(u\) to $v$ through another vertex $x$, i.e., $\delta(u, v) \leq \delta(u, x) + \delta(x, v)$ for all $u, v, x \in V$
  • If $d(s, v) > d(s, u) + w(u, v)$ for some edge $u, v$, then triangle inequality is violated :(
  • Fix by lowering $d(s, v)$ to $d(s, u) + w(u, v)$, i.e., relax $(u, v)$ to satisfy violated constraint
  • Claim: Relaxation is safe: maintains that each $d(s, v)$ is weight of a path to $v$ (or $\infty$) $\forall v \in V$
  • Proof: Assume $d(s, v’)$ is weight of a path (or $\infty$) for $\forall v’ \in V$. Relaxing some edge $(u, v)$ sets $d(s, v)$ to $d(s, u) + w(u, v)$, which is the weight of a path from $s$ to $v$ through \(u\) $\qquad \square$
  • Set $d(s, v) = \infty$ for all $v \in V$, then set $d(s, s) = 0$
  • Process each vertex \(u\) in a topological sort order of G:
    • For each outgoing neighbor $v \in Adj^+(u)$:
      • If $d(s, v) > d(s, u) + w(u, v)$
        • relax edge $(u, v)$, i.e., set $d(s, v) = d(s, u) + w(u, v)$
def DAGRelaxation(adj, w, s):
    _, order = dfs(adj, s)
    order = list(reversed(order))   # reverse of DFS finishing order is a topological order
    d = [float('inf') for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    for u in order:
        for v in adj[u]:
            try_to_relax(adj, w, d, parent, u, v)
    return d, parent


Correctness

  • Claim: At end of DAG Relaxation: $d(s, v) = \delta(s, v)$ for all $v\in V$
  • Proof: Induct on $k$: $d(s, v) = \delta(s, v)$ for all $v$ in first $k$ vertices in topological order
    • Base case: Vertex $s$ and every vertex before $s$ in topological order satisfies claim at start
    • Inductive Step: Assume claim holds for first $k’$ vertices, let $v$ be the $(k’+1)^{th}$
    • Consider a shortest path from $s$ to $v$, and let \(u\) be the vertex preceding $v$ on path
    • \(u\) occurs before $v$ in topological order, so $d(s, u) = \delta(s, u)$ by induction
    • When processing $u, d(s, v)$ is set to be no larger than $\delta(s, u) + w(u, v) = \delta(s, v)$
    • But $d(s, v) \geq \delta(s, v)$ since relaxation is safe, so $d(s, v) = \delta(s, v)$
  • Alternatively:
    • For any vertex $v$, DAG relaxation sets $d(s, v) = \min\{d(s, u) + w(u, v) \mid u \in Adj^-(v)\}$
    • Shortest path to $v$ must pass through some incoming neighbor \(u\) of $v$
    • So if $d(s, u) = \delta(s, u)$ for all $u \in Adj^-(v)$ by induction, then $d(s, v) = \delta(s, v)$

Running Time

  • Initialization takes $O(\vert V \vert )$ time, and Topological Sort takes $O(\vert V \vert + \vert E \vert )$ time
  • Additional work upper bounded by $O(1) \times \Sigma_{u\in V}deg^+(u) = O(\vert E \vert )$
  • Total running time is linear, $O(\vert V \vert + \vert E \vert )$

12 Bellman-Ford


Simple Shortest Paths

  • If graph contains cycles and negative weights, might contain negative-weight cycles :(
  • If graph does not contain negative-weight cycles, shortest paths are simple!
  • Claim 1: If $\delta(s, v)$ is finite, there exists a shortest path to $v$ that is simple
  • Proof: By contradiction:
    • Suppose no simple shortest path: let $\pi$ be a shortest path with fewest vertices
    • $\pi$ not simple, so exists cycle $C$ in $\pi$; $C$ has non-negative weight (or else $\delta(s,v) = -\infty$)
    • Removing $C$ from $\pi$ forms a path $\pi’$ with fewer vertices and weight $w(\pi’) \leq w(\pi)$, contradicting the choice of $\pi$
  • Since simple paths cannot repeat vertices, finite shortest paths contain at most $\vert V \vert - 1$ edges

Negative Cycle Witness

  • k-Edge Distance $\delta_k(s, v)$: the minimum weight of any path from $s$ to $v$ using $\leq k$ edges
  • Idea! Compute $\delta_{\vert V \vert -1}(s, v)$ and $\delta_{\vert V \vert }(s, v)$ for all $v \in V$
    • If $\delta(s, v)\neq - \infty, \delta(s, v) = \delta_{\vert V \vert -1}(s,v)$, since a shortest path is simple (or nonexistent)
    • If $\delta_{\vert V \vert }(s,v) < \delta_{\vert V \vert - 1}(s, v)$
      • there exists a shorter non-simple path to $v$, so $\delta_{\vert V \vert }(s, v) = -\infty$
      • call $v$ a (negative cycle) witness
    • However, there may be vertices with $-\infty$ shortest-path weight that are not witnesses
  • Claim 2: if $\delta(s,v) = - \infty$, then $v$ is reachable from a witness
  • Proof: Suffices to prove: every negative-weight cycle reachable from s contains a witness
    • Consider a negative-weight cycle $C$ reachable from $s$
    • For $v ∈ C$, let $v’ ∈ C$ denote $v$’s predecessor in $C$, where $\Sigma_{v∈C} w(v’ , v) < 0$
    • Then $\delta_{\vert V \vert }(s, v) \leq \delta_{\vert V \vert - 1}(s, v’) + w(v’, v)$ (RHS weight of some path on $\leq \vert V \vert $ vertices)
    • so $\Sigma_{v\in C}\delta_{\vert V \vert }(s, v) \leq \Sigma_{v\in C}\delta_{\vert V \vert -1}(s, v’) + \Sigma_{v\in C}w(v’, v) < \Sigma_{v\in C}\delta_{\vert V \vert - 1}(s, v’)$
    • If $C$ contains no witness, $\delta_{\vert V \vert }(s, v) \geq \delta_{\vert V \vert - 1}(s, v)$ for all $v \in C$, a contradiction $\qquad \square$


  • Idea! Use graph duplication: make multiple copies (or levels) of the graph
  • $\vert V \vert + 1$ levels: vertex $v_k$ in level $k$ represents reaching vertex $v$ from $s$ using $\leq k$ edges
  • If edges only increase in level, resulting graph is a DAG!
  • Construct new DAG $G’ = (V’, E’)$ from $G = (V, E)$
    • $G’$ has $\vert V \vert (\vert V \vert + 1)$ vertices $v_k$ for all $v\in V$ and $k \in \{0, \ldots, \vert V \vert \}$
    • $G’$ has $\vert V \vert (\vert V \vert + \vert E \vert )$ edges:
      • \(\vert V \vert\) edges $(v_{k-1}, v_k)$ for $k \in \{1, \ldots, \vert V \vert \}$ of weight zero for each $v \in V$
      • \(\vert V \vert\) edges $(u_{k-1}, v_k)$ for $k \in \{1, \ldots, \vert V \vert \}$ of weight $w(u, v)$ for each $(u, v) \in E$
  • Run DAG Relaxation on $G’$ from $s_0$ to compute $\delta(s_0, v_k)$ for all $v_k \in V’$
  • For each vertex: set $d(s, v) = \delta(s_0, v_{\vert V \vert - 1})$
  • For each witness $u \in V$ where $\delta (s_0, u_{\vert V \vert }) < \delta(s_0, u_{\vert V \vert - 1})$
    • For each vertex $v$ reachable from \(u\) in $G$:
      • set $d(s, v) = -\infty$


INF = float('inf')

def bellman_ford(adj, w, s):
    # initialization
    d = [INF for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    V = len(adj)
    # construct shortest paths in rounds: after i rounds, d[v] is at most the
    # weight of a shortest path from s to v using at most i edges
    for k in range(V - 1):
        for u in range(V):
            for v in adj[u]:
                try_to_relax(adj, w, d, parent, u, v)
    # check for negative-weight cycles reachable from s
    for u in range(V):
        for v in adj[u]:
            if d[v] > d[u] + w(u, v):
                raise Exception('negative-weight cycle reachable from s!')
    return d, parent


Running Time

  • $G’$ has size $O(\vert V \vert (\vert V \vert + \vert E \vert ))$ and can be constructed in as much time
  • Running DAG Relaxation on $G’$ takes linear time in the size of $G’$
  • Does $O(1)$ work for each vertex reachable from a witness
  • Finding reachability of a witness takes $O(\vert E \vert )$ time, with at most $O(\vert V \vert )$ witnesses: $O(\vert V \vert \vert E \vert )$
  • (Alternatively, connect super node $x$ to witnesses via 0-weight edges, linear search from $x$)
  • Pruning $G$ at start to only subgraph reachable from $s$ yields $O(\vert V \vert \vert E \vert )$ time algorithm

13 Dijkstra’s Algorithm


Non-negative Edge Weights

  • Idea! Generalize BFS approach to weighted graphs:
    • Grow a sphere centered at source s
    • Repeatedly explore closer vertices before further ones
    • But how to explore closer vertices if you don’t know distances beforehand? :(
  • Observation 1: If weights non-negative, monotonic distance increasing along shortest paths
    • i.e., if vertex \(u\) appears on a shortest path from $s$ to $v$, then $\delta(s, u) \leq \delta(s, v)$
    • Let $V_x \subset V$ be the subset of vertices reachable within distance $\leq x$ from $s$
    • If $v \in V_x$ then any shortest path from $s$ to $v$ only contains vertices from $V_x$
    • Perhaps grow $V_x$ one vertex at a time! (but growing for every $x$ is slow if weights large)
  • Observation 2: Can solve SSSP fast if given order of vertices in increasing distance from $s$
    • Remove edges that go against this order (since cannot participate in shortest paths)
    • May still have cycles if zero-weight edges: repeatedly collapse into single vertices
    • Compute $\delta(s, v)$ for each $v\in V$ using DAG relaxation in $O(\vert V \vert + \vert E \vert )$ time

Dijkstra’s Algorithm

  • Idea! Relax edges from each vertex in increasing order of distance from source $s$

  • Idea! Efficiently find next vertex in the order using a data structure

  • Changeable Priority Queue $Q$ on items with keys and unique IDs, supporting operations:

    | Operation             | Specification                                    |
    |-----------------------|--------------------------------------------------|
    | Q.build(X)            | initialize $Q$ with items in iterator X          |
    | Q.delete_min()        | remove an item with minimum key                  |
    | Q.decrease_key(id, k) | find stored item with ID id and change key to k  |
  • Implement by cross-linking a Priority Queue $Q’$ and a Dictionary $D$ mapping IDs into $Q’$

  • Assume vertex IDs are integers from $0$ to $\vert V \vert -1$ so can use a direct access array for D

  • For brevity, say item x is the tuple $(x.id, x.key)$

  • Set $d(s, v) = \infty$ for all $v \in V$, then set $d(s,s) = 0$

  • Build changeable priority queue $Q$ with an item $(v, d(s, v))$ for each vertex $v \in V$

  • While $Q$ is not empty, delete an item $(u, d(s, u))$ from $Q$ that has minimum key $d(s, u)$:
    • For vertex $v$ in outgoing adjacencies $Adj^+(u)$:
      • If $d(s, v)>d(s,u) + w(u, v)$:
        • Relax edge $(u, v)$, i.e., set $d(s, v) = d(s, u) + w(u, v)$
        • Decrease the key of $v$ in $Q$ to new estimate $d(s, v)$


def dijkstra(adj, w, s):
    d = [INF for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    Q = PriorityQueue()
    V = len(adj)
    for v in range(V):
        Q.insert(v, d[v])           # label and key
    for _ in range(V):
        u = Q.extract_min()         # get label for item with min key
        for v in adj[u]:
            try_to_relax(adj, w, d, parent, u, v)
            Q.decrease_key(v, d[v]) # alter key for given label
    return d, parent

class PriorityQueue:
    def __init__(self):
        self.A = {}
    def insert(self, label, key):
        self.A[label] = key
    def extract_min(self):
        min_label = None
        for label in self.A:
            if (min_label is None) or (self.A[label] < self.A[min_label]):
                min_label = label
        del self.A[min_label]
        return min_label
    def decrease_key(self, label, key):
        if (label in self.A) and (key < self.A[label]):
            self.A[label] = key
class Item:
    def __init__(self, label, key):
        self.label, self.key = label, key

class PriorityQueue:                         # binary min-heap cross-linked with a label -> index map
    def __init__(self):
        self.A = []
        self.label_2_idx = dict()
    def _swap(self, i, j):                   # swap heap slots i, j and keep label_2_idx in sync
        self.A[i], self.A[j] = self.A[j], self.A[i]
        self.label_2_idx[self.A[i].label] = i
        self.label_2_idx[self.A[j].label] = j
    def _sift_up(self, i):
        while i > 0 and self.A[i].key < self.A[(i - 1) // 2].key:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2
    def _sift_down(self, i):
        n = len(self.A)
        while True:
            c = 2 * i + 1                    # smaller child (if any)
            if c + 1 < n and self.A[c + 1].key < self.A[c].key:
                c += 1
            if c >= n or self.A[i].key <= self.A[c].key:
                break
            self._swap(i, c)
            i = c
    def insert(self, label, key):
        self.A.append(Item(label, key))
        self.label_2_idx[label] = len(self.A) - 1
        self._sift_up(len(self.A) - 1)
    def extract_min(self):
        self._swap(0, len(self.A) - 1)       # move the minimum to the end, then remove it
        label = self.A.pop().label
        del self.label_2_idx[label]
        if self.A:
            self._sift_down(0)
        return label
    def decrease_key(self, label, key):
        if label in self.label_2_idx:
            idx = self.label_2_idx[label]
            if key < self.A[idx].key:        # only ever lower a key
                self.A[idx].key = key
                self._sift_up(idx)


Correctness

  • Claim: At end of Dijkstra’s algorithm $d(s, v) = \delta(s, v)$ for all $v \in V$

  • Proof:

    • If relaxation sets $d(s, v)$ to $\delta(s, v)$, then $d(s, v) = \delta(s, v)$ at the end of the algorithm

      • Relaxation can only decrease estimates $d(s, v)$
      • Relaxation is safe, i.e., maintains that each $d(s, v)$ is weight of a path to $v$ (or $\infty$)
    • Suffices to show $d(s, v) = \delta(s, v)$ when vertex $v$ is removed from $Q$

      • Proof by induction on first $k$ vertices removed from $Q$

      • Base Case ($k$ = 1): $s$ is first vertex removed from $Q$, and $d(s, s) = 0 = δ(s, s)$

      • Inductive Step: Assume true for $k < k’$, and consider the $k’$th vertex $v’$ removed from $Q$

      • Consider some shortest path $π$ from $s$ to $v’$, with $w(π) = δ(s, v’)$

      • Let $(x, y)$ be the first edge in $π$ where $y$ is not among the first $k’ - 1$ vertices removed from $Q$ (perhaps $y = v’$)

      • When $x$ was removed from $Q$, $d(s, x) = δ(s, x)$ by induction, so: \(\begin{align} d(s, y) &\leq \delta(s, x) + w(x, y) \\ &=\delta(s, y) \\ &\leq \delta(s, v') \\ &\leq d(s, v') \\ &\leq d(s, y) \\ \end{align}\)

      • So $d(s, v’) = \delta(s, v’)$ as desired

Running Time

  • Count operations on changeable priority queue Q, assuming it contains n items:

    | Operation             | Time  | Occurrences in Dijkstra |
    |-----------------------|-------|-------------------------|
    | Q.build(X)            | $B_n$ | 1                       |
    | Q.delete_min()        | $M_n$ | $\vert V \vert$         |
    | Q.decrease_key(id, k) | $D_n$ | $\vert E \vert$         |
  • Total running time is $O(B_{\vert V \vert } + \vert V \vert \cdot M_{\vert V \vert } + \vert E \vert \cdot D_{\vert V \vert })$

  • Assume pruned graph to search only vertices reachable from the source, so $\vert V \vert = O(\vert E \vert )$


15 Recursive Algorithms


Design your own recursive algorithm

  • Constant-sized program to solve arbitrary input
  • Need looping or recursion, analyze by induction
  • Recursive function call: vertex in a graph, directed edge from $A\rightarrow B$ if $B$ calls $A$
  • Dependency graph of recursive calls must be acyclic (if can terminate)
  • Classify based on shape of graph
| Class                | Graph    |
|----------------------|----------|
| Brute Force          | Star     |
| Decrease & Conquer   | Chain    |
| Divide & Conquer     | Tree     |
| Dynamic Programming  | DAG      |
| Greedy / Incremental | Subgraph |
  • Hard part is thinking inductively to construct recurrence on subproblems
  • How to solve a problem recursively (SRT BOT)
    • Subproblem definition
    • Relate subproblem solutions recursively
    • Topological order on subproblems ($\Rightarrow$ subproblem DAG)
    • Base cases of relation
    • Original problem solution via subproblem(s)
    • Time analysis

Merge Sort in SRT BOT Framework

  • Merge sorting an array $A$ of \(n\) elements can be expressed in SRT BOT as follows:
  • Subproblems: $S(i,j)=$ sorted array on elements of $A[i:j]$ for $0 \leq i \leq j \leq n$
  • Relation: $S(i,j) = merge(S(i, m), S(m, j))$ where $m=\lfloor (i+j) / 2 \rfloor$
  • Topological order: Increasing $j - i$
  • Base cases: $S(i, i + 1) = [A[i]]$
  • Original: $S(0, n)$
  • Time: $T(n) = 2T(n/2) + O(n) = O(n \log n)$
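  • A minimal top-down sketch of these subproblems (an illustrative implementation computing $S(i, j)$ on the index range $[i, j)$):
def merge_sort(A, i=0, j=None):
    if j is None: j = len(A)
    if j - i <= 1:                              # base case S(i, i + 1) = [A[i]]
        return A[i:j]
    m = (i + j) // 2
    L, R = merge_sort(A, i, m), merge_sort(A, m, j)   # S(i, m) and S(m, j)
    S, l, r = [], 0, 0
    while l < len(L) and r < len(R):            # merge two sorted halves in O(j - i) time
        if L[l] <= R[r]:
            S.append(L[l]); l += 1
        else:
            S.append(R[r]); r += 1
    return S + L[l:] + R[r:]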

Fibonacci Numbers

  • Compute the \(n\)th Fibonacci number $F_n$
  • Subproblems: $F(i)=$ the $i$th Fibonacci number $F_i$ for $i \in \{0, 1, \ldots, n\}$
  • Relation: $F(i) = F(i-1)+F(i-2)$ (definition of Fibonacci numbers)
  • Topological order: Increasing $i$
  • Base cases: $F(0) = 0, F(1) = 1$
  • Original problem: $F(n)$
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
  • Divide and conquer implies a tree of recursive calls
  • Time: $T(n) = T(n-1)+T(n-2) + O(1) > 2T(n-2), T(n) = \Omega(2 ^{n/2})$ exponential… :(
  • Subproblem $F(k)$ computed more than once! ($F(n-k)$ times)
  • Can we avoid this waste?

Re-using Subproblem Solutions

  • Either:
    • Top down: record subproblem solutions in a memo and re-use
    • Bottom up: solve subproblems in topological sort order (usually via loops)
  • For Fibonacci, $n + 1$ subproblems (vertices) and $<2n$ dependencies (edges)
  • Time to compute is then $O(n)$ additions
def fib(n):
    memo = dict()
    def F(i):
        if i < 2: return i
        if i not in memo:
            memo[i] = F(i-1) + F(i-2)
        return memo[i]
    return F(n)

def fib(n):
    F = dict()
    F[0], F[1] = 0, 1
    for i in range(2, n+1):
        F[i] = F[i-1] + F[i-2]
    return F[n]
  • A subtlety is that Fibonacci numbers grow to $\Theta(n)$ bits long, potentially $\gg$ word size $w$
  • Each addition costs $O(\lceil n / w \rceil)$ time
  • So total cost is $O(n\lceil n / w \rceil) = O(n + n^2/w)$ time

Dynamic Programming

  • Weird name coined by Richard Bellman
    • Wanted government funding, needed cool name to disguise doing mathematics!
    • Updating (dynamic) a plan or schedule (program)
  • Existence of recursive solution implies decomposable subproblems
  • Recursive algorithm implies a graph of computation
  • Dynamic programming if subproblem dependencies overlap (DAG, in-degree > 1)
  • “Recurse but re-use” (Top down: record and lookup subproblem solutions)
  • “Careful brute force” (Bottom up: do each subproblem in order)
  • Often useful for counting/optimization problems: almost trivially correct recurrences

How to Solve a Problem Recursively (SRT BOT)

  • Subproblem definition: subproblem $x\in X$
    • Describe the meaning of a subproblem in words, in terms of parameters
    • Often subsets of input: prefixes, suffixes, contiguous substrings of a sequence
    • Often record partial state: add subproblems by incrementing some auxiliary variable
  • Relate subproblem solutions recursively $x(i) = f(x(j),…)$ for one or more $j < i$
  • Topological order: to argue relation is acyclic and subproblems form a DAG
  • Base cases
    • State solutions for all (reachable) independent subproblems where relation breaks down
  • Original problem
    • Show how to compute solution to original problem from solutions to subproblem(s)
    • Possibly use parent pointers to recover actual solution, not just objective function
  • Time analysis
    • $\Sigma_{x\in X}work(x)$, or if $work(x)=O(W)$ for all $x\in X$, then $\vert X \vert \cdot O(W)$
    • $work(x)$ measures non-recursive work in relation; treat recursions as taking $O(1)$ time

DAG Shortest Paths

  • DAG SSSP problem: given a DAG G and vertex $s$, compute $\delta(s, v)$ for all $v\in V$
  • Subproblems: $\delta(s, v)$ for all $v\in V$
  • Relation: $\delta(s, v) = \min\big(\{\delta(s, u) + w(u, v) \mid u \in Adj^-(v)\} \cup \{\infty\}\big)$
  • Topological order: Topological order of G
  • Base case: $\delta(s, s) = 0$
  • Original: All subproblems ($\delta(s, v)$ for every $v \in V$)
  • Time: $\Sigma_{v\in V}O(1+\vert Adj^-(v)\vert ) = O(\vert V \vert +\vert E \vert )$
  • DAG Relaxation computes the same min values as this dynamic program, just
    • step-by-step (if new value < min, update min via edge relaxation), and
    • from the perspective of \(u\) and $Adj^+(u)$ instead of $v$ and $Adj^-(v)$

How to Relate Subproblem Solutions

  • The general approach we’re following to define a relation on subproblem solutions:
    • Identify a question about a subproblem solution that, if you knew the answer to, would reduce to “smaller” subproblem(s)
    • Then locally brute-force the question by trying all possible answers, and taking the best
    • Alternatively, we can think of correctly guessing the answer to the question, and directly recursing; but then we actually check all possible guesses, and return the “best”
  • The key for efficiency is for the question to have a small (polynomial) number of possible answers, so brute forcing is not too expensive
  • Often (but not always) the non-recursive work to compute the relation is equal to the number of answers we’re trying

16 Dynamic Programming Subproblems


Longest Common Subsequence (LCS)

  • Given two strings $A$ and $B$, find a longest (not necessarily contiguous) subsequence of $A$ that is also a subsequence of $B$.
  • Example: $A=hieroglyphology \ B=michaelangelo$
  • Solution: $hello$ or $heglo$ or $iello$ or $ieglo$, all length 5
  • Maximization problem on length of subsequence
  1. Subproblems:
    • $x(i, j)=$ length of the longest common subsequence of suffixes $A[i:]$ and $B[j:]$
    • For $0\leq i \leq \vert A \vert $ and $0 \leq j \leq \vert B \vert $
  2. Relate:
    • Either first characters match or they don’t
    • If first characters match, some longest common subsequence will use them
    • (if no LCS uses first matched pair, using it will only improve solution)
    • (if an LCS uses first in $A[i]$ but not first in $B[j]$, matching $B[j]$ is also optimal)
    • If they do not match, they cannot both be in a longest common subsequence
    • Guess whether $A[i]$ or $B[j]$ is not in LCS
    • $x(i, j) = \begin{cases} x(i + 1, j + 1) + 1 & \text{if } A[i] = B[j] \\ \max\{x(i + 1, j),\ x(i, j + 1)\} & \text{otherwise} \end{cases}$
  3. Topological order:
    • Subproblem $x(i, j)$ depends only on strictly larger $i$ or $j$ or both
    • Simplest order to state: Decreasing $i + j$
    • Nice order for bottom-up code: Decreasing $i$, then decreasing $j$
  4. Base
    • $x(i, \vert B \vert ) = x(\vert A \vert , j)=0$ (one string is empty)
  5. Original problem
    • Length of longest common subsequence of $A$ and $B$ is $x(0,0)$
    • Store parent pointers to reconstruct subsequence
    • If the parent pointer increases both indices, add that character to LCS
  6. Time:
    • # subproblems: $(\vert A \vert + 1)\cdot(\vert B \vert + 1)$
    • work per subproblem: $O(1)$
    • $O(\vert A \vert \cdot \vert B \vert )$ running time
def lcs(A, B):                      # bottom-up over suffixes
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in reversed(range(a)):
        for j in reversed(range(b)):
            if A[i] == B[j]:
                x[i][j] = x[i + 1][j + 1] + 1
            else:
                x[i][j] = max(x[i + 1][j], x[i][j + 1])
    return x[0][0]

def lcs(A, B):                      # bottom-up over prefixes
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            if A[i - 1] == B[j - 1]:
                x[i][j] = x[i - 1][j - 1] + 1
            else:
                x[i][j] = max(x[i - 1][j], x[i][j - 1])
    return x[a][b]
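
  • A minimal sketch of reconstructing an actual LCS (an illustrative addition: it rebuilds the suffix table above, then walks it from $(0, 0)$, taking a character whenever both indices advance):
def lcs_string(A, B):
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in reversed(range(a)):
        for j in reversed(range(b)):
            if A[i] == B[j]:
                x[i][j] = x[i + 1][j + 1] + 1
            else:
                x[i][j] = max(x[i + 1][j], x[i][j + 1])
    out, i, j = [], 0, 0
    while i < a and j < b:
        if A[i] == B[j]:
            out.append(A[i]); i += 1; j += 1    # this character is part of the LCS
        elif x[i + 1][j] >= x[i][j + 1]:
            i += 1
        else:
            j += 1
    return ''.join(out)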

Longest Increasing Subsequence (LIS)

  • Given a string \(A\), find a longest (not necessarily contiguous) subsequence of \(A\) that strictly increases (lexicographically).
  • Example: \(A= carbohydrate\)
  • Solution: \(abort\) of length 5
  • Maximization problem on length of subsequence
  • Attempted solution:
    • Natural subproblems are prefixes or suffixes of \(A\), say suffix \(A[i :]\)
    • Natural question about LIS of \(A[i :]\): is \(A[i]\) in the LIS? (2 possible answers)
    • But then how do we recurse on \(A[i + 1 :]\) and guarantee increasing subsequence?
    • Fix: add constraint to subproblems to give enough structure to achieve increasing property
  1. Subproblems
    • \(x(i)\) = length of longest increasing subsequence of suffix \(A[i:]\) that includes \(A[i]\)
    • For \(0\leq i < \vert A \vert\)
  2. Relate
    • We’re told that \(A[i]\) is in LIS (first element)
    • Next question: what is the second element of LIS?
      • Could be any \(A[j]\) where \(j > i\) and \(A[j] > A[i]\) (so increasing)
      • Or \(A[i]\) might be the last element of LIS
    • \[x(i) = \max\big(\{1 + x(j) \mid i < j < \vert A \vert,\ A[j] > A[i]\} \cup \{1\}\big)\]
  3. Topological order:
    • Decreasing \(i\)
  4. Base
    • No base case necessary, because we consider the possibility that \(A[i]\) is last
  5. Original problem
    • What is the first element of LIS? Guess!
    • Length of LIS of \(A\) is \(max\{x(i) \vert 0\leq i < \vert A \vert \}\)
    • Store parent pointers to reconstruct subsequence
  6. Time
    • # subproblems: \(\vert A \vert\)
    • work per subproblem \(O( \vert A \vert )\)
    • \(O( \vert A \vert ^2)\) running time
    • speed up to \(O( \vert A \vert \log \vert A \vert )\) by doing only \(O(\log \vert A \vert )\) work per subproblem, via AVL tree augmentation
def lis(A):
    a = len(A)
    x = [1] * a                     # x[i] counts A[i] itself
    for i in reversed(range(a)):
        for j in range(i + 1, a):
            if A[j] > A[i]:
                x[i] = max(x[i], 1 + x[j])
    return max(x)

Alternating Coin Game

  • Given sequence of \(n\) coins of value \(v_0, v_1, \ldots, v_{n-1}\)
  • Two players (“me” and “you”) take turns
  • In a turn, take first or last coin among remaining coins
  • My goal is to maximize total value of my taken coins, where I go first
  • First solution exploits that this is a zero-sum game: I take all coins you don’t
  1. Subproblems
    • Choose subproblems that correspond to the state of the game
    • For every contiguous subsequence of coins from \(i\) to \(j\), \(0\leq i \leq j < n\)
    • \(x(i, j)=\) maximum total value I can take starting from coins of values \(v_i, ..., v_j\)
  2. Relate
    • I must choose either coin \(i\) or coin \(j\) (Guess!)
    • Then it’s your turn, so you’ll get values \(x(i+1, j)\) or \(x(i, j-1)\) respectively
    • To figure out how much value I get, subtract this from total coin values
    • \(x(i, j) = \max\{v_i + \Sigma_{k=i+1}^{j} v_k - x(i + 1, j),\ \ v_j + \Sigma_{k=i}^{j-1} v_k - x(i, j-1)\}\)
  3. Topological order
    • Increasing \(j - i\)
  4. Base
    • \[x(i,i) = v_i\]
  5. Original problem
    • \[x(0, n- 1)\]
    • store parent pointers to reconstruct strategy
  6. Time
    • # subproblems: \(\Theta(n^2)\)
    • work per subproblem: \(\Theta(n)\) to compute sums
    • \(\Theta(n^3)\) running time
    • Speed up to \(\Theta(n^2)\) time by pre-computing all sums \(\Sigma_{k=i}^{j}v_k\) in \(\Theta(n^2)\) time via dynamic programming
  • Second solution uses subproblem expansion: add subproblems for when you move next
  1. Subproblems
    • Choose subproblems that correspond to the full state of the game
    • Contiguous subsequence of coins from \(i\) to \(j\), and which player \(p\) goes next
    • \(x(i, j, p) =\) maximum total value I can take when player \(p \in \{me, you\}\) starts from coins of values \(v_i,...,v_j\)
  2. Relate
    • Player \(p\) must choose either coin \(i\) or coin \(j\) (Guess!)
    • If \(p =\) me, then I get the value; otherwise, I get nothing
    • Then it’s the other player’s turn
    • \[x(i, j, me) = max\{v_i + x(i + 1, j, you), v_j + x(i, j - 1, you)\}\]
    • \[x(i, j, you) = min\{x(i + 1, j, me), x(i, j-1, me)\}\]
  3. Topological order
    • Increasing \(j - i\)
  4. Base
    • \[x(i, i, me) = v_i\]
    • \[x(i,i,you)=0\]
  5. Original problem
    • \[x(0, n-1, me)\]
    • Store parent pointers to reconstruct strategy
  6. Time
    • # subproblems: \(\Theta(n^2)\)
    • work per subproblem: \(\Theta(1)\)
    • \(\Theta(n^2)\) running time
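  • A minimal memoized sketch of this expanded-state formulation (an illustrative top-down implementation of \(x(i, j, p)\)):
from functools import lru_cache

def coin_game_value(v):
    n = len(v)
    @lru_cache(maxsize=None)
    def x(i, j, me):    # max total value I can take on coins v[i..j]; `me` is True if I move next
        if i == j:
            return v[i] if me else 0
        if me:          # I pick the end that maximizes my total
            return max(v[i] + x(i + 1, j, False), v[j] + x(i, j - 1, False))
        else:           # you pick the end that minimizes my total
            return min(x(i + 1, j, True), x(i, j - 1, True))
    return x(0, n - 1, True)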

Yet another alternative solution, tracking the difference between the two players’ totals instead of my total.

def coin_game(coins):
    n = len(coins)
    # dp[i][j] = (first player's total) - (second player's total) on coins[i..j],
    # with both players playing optimally
    dp = [[0] * n for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(i, n):
            if i == j:
                dp[i][j] = coins[i]
            else:
                dp[i][j] = max(coins[i] - dp[i+1][j], coins[j] - dp[i][j-1])
    return dp[0][n-1] >= 0          # True if the first player can at least tie

def coin_game(coins):
    n = len(coins)
    dp = [[0] * n for _ in range(n)]
    parents = dict()                # (i, j) -> (next state, coin taken)
    for i in reversed(range(n)):
        for j in range(i, n):
            if i == j:
                dp[i][j] = coins[i]
                parents[(i, j)] = ((i, j), coins[i])
            else:
                a = coins[i] - dp[i+1][j]
                b = coins[j] - dp[i][j-1]
                if a > b:
                    dp[i][j] = a
                    parents[(i, j)] = ((i+1, j), coins[i])
                else:
                    dp[i][j] = b
                    parents[(i, j)] = ((i, j-1), coins[j])
    if dp[0][n-1] >= 0:             # reconstruct the moves via parent pointers
        state = (0, n-1)
        turn = 1
        while parents[state][0] != state:
            print(f"player {turn % 2} took {parents[state][1]}")
            state = parents[state][0]
            turn += 1
    return dp[0][n-1] >= 0

Subproblem Constraints and Expansion

  • We’ve now seen two examples of constraining or expanding subproblems
  • If you find yourself lacking information to check the desired conditions of the problem, or lack the natural subproblem to recurse on, try subproblem constraint/expansion!
  • More subproblems and constraints give the relation more to work with, so can make DP more feasible
  • Usually a trade-off between number of subproblems and branching/complexity of relation

17 Dynamic Programming III


Single-Source Shortest Paths Revisited

  1. Subproblems
    • Expand subproblems to add information to make acyclic!
    • \(\delta_k(s,v)=\) weight of shortest path from \(s\) to \(v\) using at most \(k\) edges
    • For \(v\in V\) and \(0 \leq k \leq \vert V \vert\)
  2. Relate:
    • Guess last edge \((u, v)\) on shortest path from \(s\) to \(v\)
    • \[\delta_k(s, v) = \min\big(\{\delta_{k-1}(s, u) + w(u, v) \mid (u, v) \in E\} \cup \{\delta_{k-1}(s, v)\}\big)\]
  3. Topological order:
    • Increasing k: subproblems depend on subproblems only with strictly smaller \(k\).
  4. Base
    • \(\delta_{0}(s, s)=0\) and \(\delta_0(s, v) = \infty\) for \(v \neq s\) (no edges)
  5. Original problem
    • If \(v\) has a finite shortest path, then \(\delta(s, v) = \delta_{\vert V \vert- 1}(s, v)\)
    • Otherwise some \(\delta_{\vert V \vert }(s, v) < \delta_{\vert V \vert - 1}(s, v)\), so path contains a negative-weight cycle
    • Can keep track of parent pointers to subproblem that minimized recurrence
  6. Time
    • # subproblems: \(\vert V \vert \times (\vert V \vert +1)\)
    • Work for subproblem \(\delta_k(s,v)\): \(O(deg_{in}(v))\), so total \(\Sigma_{k=0}^{\vert V \vert } \Sigma_{v\in V}O(deg_{in}(v)) = \Sigma_{k=0}^{\vert V \vert } O(\vert E \vert ) = O(\vert V \vert \cdot\vert E \vert )\)
  • This is just Bellman-Ford! (computed in a slightly different order)
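  • A minimal bottom-up sketch of this dynamic program (an illustrative implementation; it assumes incoming adjacency lists adj_in and an \(O(1)\)-time weight function w):
def sssp_dp(adj_in, w, s):
    INF = float('inf')
    V = len(adj_in)
    # delta[k][v] = weight of a shortest path from s to v using at most k edges
    delta = [[INF] * V for _ in range(V + 1)]
    delta[0][s] = 0
    for k in range(1, V + 1):
        for v in range(V):
            best = delta[k - 1][v]                  # use at most k - 1 edges
            for u in adj_in[v]:                     # or guess the last edge (u, v)
                best = min(best, delta[k - 1][u] + w(u, v))
            delta[k][v] = best
    # v is a negative-cycle witness iff one more edge still improves its distance
    witnesses = [v for v in range(V) if delta[V][v] < delta[V - 1][v]]
    return delta[V - 1], witnesses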

All-Pairs Shortest Paths: Floyd-Warshall

  • Could define subproblem \(\delta_k(u, v)=\) minimum weight of path from \(u\) to \(v\) using at most \(k\) edges, as in Bellman-Ford
  • Resulting running time is \(\vert V \vert\) times Bellman-Ford, i.e., \(O(\vert V \vert ^2 \cdot \vert E \vert ) = O(\vert V \vert ^4)\)
  • Know a better algorithm from L14: Johnson achieves \(O(\vert V \vert ^2\log \vert V \vert +\vert V \vert \cdot\vert E \vert )=O(\vert V \vert ^3)\)
  • Can achieve \(\Theta(\vert V \vert ^3)\) running time (matching Johnson for dense graphs) with a simple dynamic program, called Floyd-Warshall.
  • Number vertices so that \(V=\{1,...,\vert V \vert \}\)
  1. Subproblems:
    • \(x(u, v, k)=\) minimum weight of a path from \(u\) to \(v\) that only uses vertices from \(\{1,2,...,k\} \cup \{u, v\}\)
    • For \(u, v \in V\) and \(0\leq k \leq \vert V \vert\)
  2. Relate
    • \[x(u, v, k)= min\{x(u, k, k-1) + x(k,v,k-1), x(u, v, k-1)\}\]
    • Only constant branching! No longer guessing previous vertex/edge
  3. Topological order
    • Increasing \(k\): relation depends only on smaller \(k\)
  4. Base
    • \[x(u, u, 0) = 0\]
    • \(x(u, v, 0) = w(u, v)\) if \((u, v)\in E\)
    • \(x(u,v,0)=\infty\) if none of the above
  5. Original problem
    • \(x(u, v, \vert V \vert )\) for all \(u, v \in V\)
  6. Time
    • \(O(\vert V \vert ^3)\) subproblems
    • Each \(O(1)\) work
    • \(O(\vert V \vert ^3)\) in total
    • Constant number of dependencies per subproblem brings the factor of \(O(\vert E \vert)\) in the running time down to \(O(\vert V \vert)\)
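  • A minimal sketch of Floyd-Warshall as described above (an illustrative implementation; it assumes no negative-weight cycles, labels vertices \(0, \ldots, \vert V \vert - 1\), and drops the \(k\) index since \(x(\cdot,\cdot,k)\) depends only on \(x(\cdot,\cdot,k-1)\)):
def floyd_warshall(V, edges):
    # V: number of vertices labeled 0..V-1; edges: dict mapping (u, v) to weight w(u, v)
    INF = float('inf')
    x = [[INF] * V for _ in range(V)]
    for u in range(V):
        x[u][u] = 0                                 # base case x(u, u, 0) = 0
    for (u, v), wt in edges.items():
        x[u][v] = min(x[u][v], wt)                  # base case x(u, v, 0) = w(u, v)
    for k in range(V):                              # allow intermediate vertices {0, ..., k}
        for u in range(V):
            for v in range(V):
                if x[u][k] + x[k][v] < x[u][v]:
                    x[u][v] = x[u][k] + x[k][v]
    return x                                        # x[u][v] = δ(u, v)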