# 1 Introduction

## Notes

• Problem: binary relation from problem inputs to correct outputs

• Algorithm: procedure mapping each input to a single output

• An algorithm solves a problem if it returns a correct output for each and every problem input
• Correctness:

• For small inputs: can use case analysis
• For arbitrarily large inputs: algorithm must recurse or loop in some way; use induction
• Efficiency: how fast does an algorithm produce a correct output?

• Count the number of fixed-time operations the algorithm takes to return
• Asymptotic Notation: ignore constant factors and low order terms
| input | constant | logarithmic | linear | log-linear | quadratic | polynomial | exponential |
|---|---|---|---|---|---|---|---|
| $n$ | $Θ(1)$ | $Θ(\log n)$ | $Θ(n)$ | $Θ(n \log n)$ | $Θ(n^2)$ | $Θ(n^c)$ | $2^{Θ(n^c)}$ |
| $1000$ | $1$ | $≈ 10$ | $1000$ | $≈ 10{,}000$ | $1{,}000{,}000$ | $1000^c$ | $2^{1000} ≈ 10^{301}$ |
| time | $1$ ns | $10$ ns | $1$ µs | $10$ µs | $1$ ms | $10^{3c-9}$ s | $10^{281}$ millennia |
• Model of Computation: what operations on the machine can be performed in $$O(1)$$ time.

• Machine word: block of w bits (w is word size of a w-bit Word-RAM)
• Memory: Addressable sequence of machine words
• Processor supports many constant time operations on a $$O(1)$$ number of words (integers):
• integer arithmetic: (+, -, *, //, %)
• logical operators: (&&, ||, !, ==, <, >, <=, >=)
• bitwise arithmetic: (&, |, <<, >>, …)
• Data Structure: a way to store non-constant data, that supports a set of operations
• A collection of operations is called an interface
• Example:
• Sequence: Extrinsic order to items (first, last, nth)
• Set: Intrinsic order to items (queries based on item keys)
• Data structures may implement the same interface with different performance
• Example: Static Array - fixed width slots, fixed length, static sequence interface
• StaticArray(n): allocate static array of size n initialized to 0 in $$Θ(n)$$ time
• StaticArray.get_at(i): return word stored at array index i in $$Θ(1)$$ time
• StaticArray.set_at(i, x): write word x to array index i in $$Θ(1)$$ time
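A minimal sketch of this interface in Python (added for illustration; a Python list stands in for a block of machine words):

class StaticArray:
    def __init__(self, n):          # Θ(n): allocate slots initialized to 0
        self.data = [0] * n
    def get_at(self, i):            # Θ(1): word-RAM random access
        assert 0 <= i < len(self.data)
        return self.data[i]
    def set_at(self, i, x):         # Θ(1)
        assert 0 <= i < len(self.data)
        self.data[i] = x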

### More on Asymptotic Notation

• $$O$$ Notation:
• Non-negative function $$g(n)$$ is in $$O(f(n))$$ if and only if there exists a positive real number $$c$$ and positive integer $$n_0$$ such that $$g(n) ≤ c · f(n)$$ for all $$n ≥ n_0$$.
• $$Ω$$ Notation:
• Non-negative function $$g(n)$$ is in $$Ω(f(n))$$ if and only if there exists a positive real number c and positive integer $$n_0$$ such that $$c · f(n) ≤ g(n)$$ for all $$n ≥ n_0$$.
• $$Θ$$ Notation:
• Non-negative $$g(n)$$ is in $$Θ(f(n))$$ if and only if $$g(n) ∈ O(f(n))∩Ω(f(n))$$
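A quick worked example (added for illustration): to verify $$3n^2 + 2n ∈ Θ(n^2)$$, exhibit constants for both bounds: $$3n^2 \leq 3n^2 + 2n \leq 5n^2 \quad \text{for all } n \geq 1,$$ so $$c = 3, n_0 = 1$$ witnesses $$Ω(n^2)$$ and $$c = 5, n_0 = 1$$ witnesses $$O(n^2)$$.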

# 2 Data Structures

## Notes

### Data Structure Interfaces

• A data structure is a way to store data, with algorithms that support operations on the data
• Collection of supported operations is called an interface (also API or ADT)
• Interface is a specification: what operations are supported (the problem!)
• Data structure is a representation: how operations are supported (the solution!)

### Sequence Interface (L02, L07)

• Maintain a sequence of items (order is extrinsic)
• Ex: ($$x_0$$, $$x_1$$, $$x_2$$, . . . , $$x_{n-1}$$) (zero indexing)
• use n to denote the number of items stored in the data structure
• Supports sequence operations:
| Type | Interface | Specification |
|---|---|---|
| Container | build(X) | given an iterable X, build sequence from items in X |
| | len() | return the number of stored items |
| Static | iter_seq() | return the stored items one-by-one in sequence order |
| | get_at(i) | return the $$i^{th}$$ item |
| | set_at(i, x) | replace the $$i^{th}$$ item with x |
| Dynamic | insert_at(i, x) | add $$x$$ as the $$i^{th}$$ item |
| | delete_at(i) | remove and return the $$i^{th}$$ item |
| | insert_first(x) | add $$x$$ as the first item |
| | delete_first() | remove and return the first item |
| | insert_last(x) | add $$x$$ as the last item |
| | delete_last() | remove and return the last item |
• Special case interfaces:
• stack: insert_last(x) and delete_last()
• queue: insert_last(x) and delete_first()

### Set Interface (L03-L08)

• Maintain a set of items having unique keys (e.g., item x has key x.key)
• Set or multi-set? We restrict to unique keys for now.
• Often we let key of an item be the item itself, but may want to store more info than just key
• Supports set operations:

| Type | Interface | Specification |
|---|---|---|
| Container | build(X) | given an iterable X, build set from items in X |
| | len() | return the number of stored items |
| Static | find(k) | return the stored item with key k |
| Dynamic | insert(x) | add x to set (replace item with key x.key if one already exists) |
| | delete(k) | remove and return the stored item with key k |
| Order | iter_ord() | return the stored items one-by-one in key order |
| | find_min() | return the stored item with smallest key |
| | find_max() | return the stored item with largest key |
| | find_next(k) | return the stored item with smallest key larger than k |
| | find_prev(k) | return the stored item with largest key smaller than k |
• Special case interfaces:
• dictionary: set without the Order operations

### Array Sequence

• Array is great for static operations! get_at(i) and set_at(i, x) in $$Θ(1)$$ time!
• But not so great at dynamic operations…
• For consistency, we maintain the invariant that array is full.
• Then inserting and removing items requires:
• reallocating the array
• shifting all items after the modified item
Sequence operations, worst case $O(\cdot)$:

| Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Array | $n$ | $1$ | $n$ | $n$ | $n$ |

### Linked List Sequence

• Pointer data structure (this is not related to a Python “list”)
• Each item stored in a node which contains a pointer to the next node in sequence
• Each node has two fields: node.item and node.next
• Can manipulate nodes simply by relinking pointers!
• Maintain pointers to the first node in sequence (called the head)
• Can now insert and delete from the front in $$Θ(1)$$ time! Yay!
• (Inserting/deleting efficiently from back is also possible; you will do this in PS1)
• But now get_at(i) and set_at(i, x) each take $$O(n)$$ time… :(
Sequence operations, worst case $O(\cdot)$:

| Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Linked List | $n$ | $n$ | $1$ | $n$ † | $n$ |

† $1$ if we also keep a pointer to the tail
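A minimal sketch of the pointer structure just described (illustrative; method names follow the interface table):

class LinkedListNode:
    def __init__(self, item, next=None):
        self.item, self.next = item, next

class LinkedListSeq:
    def __init__(self):
        self.head, self.size = None, 0
    def insert_first(self, x):      # Θ(1): relink the head pointer
        self.head = LinkedListNode(x, self.head)
        self.size += 1
    def delete_first(self):         # Θ(1)
        x = self.head.item
        self.head = self.head.next
        self.size -= 1
        return x
    def get_at(self, i):            # O(n): walk i pointers from the head
        node = self.head
        for _ in range(i): node = node.next
        return node.item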

### Dynamic Array Sequence

• Make an array efficient for dynamic operations at the end (last)
• Python “list” is a dynamic array
• Idea! Allocate extra space so reallocation does not occur with every dynamic operation
• Fill ratio: $$0 ≤ r ≤ 1$$ the ratio of items to space
• Whenever the array is full ($$r = 1$$), allocate $$Θ(n)$$ extra space at the end so that the fill ratio becomes some $$r_i$$ (e.g., 1/2); a code sketch follows the tables below
• Will have to insert $$Θ(n)$$ items before the next reallocation
• A single operation can take $$Θ(n)$$ time for reallocation
• However, any sequence of $$Θ(n)$$ operations takes $$Θ(n)$$ time
• So each operation takes $$Θ(1)$$ time “on average”
Sequence operations, worst case $O(\cdot)$; (a) = amortized:

| Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Dynamic Array | $n$ | $1$ | $n$ | $1_{(a)}$ | $n$ |

### Amortized Analysis

• Data structure analysis technique to distribute cost over many operations
• Operation has amortized cost $$T(n)$$ if any $$k$$ operations cost at most $$kT(n)$$
• “$$T(n)$$ amortized” roughly means $$T(n)$$ “on average” over many operations
• Inserting into a dynamic array takes $$Θ(1)$$ amortized time

### Dynamic Array Deletion

• Delete from back? $$Θ(1)$$ time without effort, yay!
• However, can be very wasteful in space. Want size of data structure to stay $$Θ(n)$$
• Attempt: if very empty, resize to r = 1. Alternating insertion and deletion could be bad…
• Idea! When $$r < r_d$$, resize array to ratio $$r_i$$ where $$r_d < r_i$$ (e.g., $$r_d = 1/4, r_i = 1/2$$)
• Then $$Θ(n)$$ cheap operations must be made before next expensive resize
• Can limit extra space usage to $$(1 + ε)n$$ for any $$ε > 0$$ (set $$r_d = \frac{1}{1+\epsilon}, r_i = \frac{r_d + 1}{2}$$)
• Dynamic arrays only support dynamic last operations in $$Θ(1)$$ time
• Python List append and pop are amortized $$O(1)$$ time, other operations can be $$O(n)$$!
• (Inserting/deleting efficiently from front is also possible; you will do this in PS1)
Sequence operations, worst case $O(\cdot)$; (a) = amortized:

| Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Static Array | $n$ | $1$ | $n$ | $n$ | $n$ |
| Linked List | $n$ | $n$ | $1$ | $n$ † | $n$ |
| Dynamic Array | $n$ | $1$ | $n$ | $1_{(a)}$ | $n$ |

† $1$ if we also keep a pointer to the tail
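A sketch of the resizing policy described above, using $$r_d = 1/4$$ and $$r_i = 1/2$$ (illustrative only; a plain Python list stands in for a fixed-size allocation):

class DynamicArraySeq:
    def __init__(self):
        self.n, self.data = 0, [None]
    def _resize(self):
        # rebuild so the fill ratio becomes r_i = 1/2 (at least one slot)
        self.data = self.data[:self.n] + [None] * max(1, self.n)
    def insert_last(self, x):            # amortized Θ(1)
        if self.n == len(self.data):     # full (r = 1): grow
            self._resize()
        self.data[self.n] = x
        self.n += 1
    def delete_last(self):               # amortized Θ(1)
        self.n -= 1
        x = self.data[self.n]
        self.data[self.n] = None
        if 4 * self.n < len(self.data):  # r < r_d = 1/4: shrink
            self._resize()
        return x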

# 3 Sorting

## Notes

### Set Interface

• Storing items in an array in arbitrary order can implement a (not so efficient) set
• Stored items sorted increasing by key allows:
• faster find min/max (at first and last index of array)
• faster finds via binary search: $$O(log n)$$
Set operations, worst case $O(\cdot)$:

| Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | $n$ | $n$ | $n$ | $n$ | $n$ |
| Sorted Array | $n \log n$ | $\log n$ | $n$ | $1$ | $\log n$ |
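A sketch of the binary-search find behind the $\log n$ column above (assuming items with a .key attribute, as elsewhere in these notes):

def sorted_array_find(A, k):
    """Return the item with key k in sorted array A, else None, in O(log n)."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if A[mid].key == k:   return A[mid]
        elif A[mid].key < k:  lo = mid + 1
        else:                 hi = mid - 1
    return None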
• But how to construct a sorted array efficiently?

### Sorting

• Given a sorted array, we can leverage binary search to make an efficient set data structure.
• Input: (static) array A of n numbers
• Output: (static) array B which is a sorted permutation of A
• Permutation: array with same elements in a different order
• Sorted: $$B[i-1] ≤ B[i]$$ for all $$i ∈ \{1, \ldots, n-1\}$$
• Example: $$[8, 2, 4, 9, 3] → [2, 3, 4, 8, 9]$$
• A sort is destructive if it overwrites $$A$$ (instead of making a new array $$B$$ that is a sorted version of $$A$$)
• A sort is in place if it uses $$O(1)$$ extra space (implies destructive: in place ⊆ destructive)

#### Permutation Sort

• There are $$n!$$ permutations of A, at least one of which is sorted (more than one if A contains duplicate items)
• For each permutation, check whether sorted in $$Θ(n)$$
• Example: $$[2, 3, 1] → {[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]}$$
from itertools import permutations  # yields all n! permutations

def is_sorted(B):  # O(n) check, assumed by the notes
    return all(B[i-1] <= B[i] for i in range(1, len(B)))

def permutation_sort(A):
    """Sort A"""
    for B in permutations(A):  # O(n!)
        if is_sorted(B):       # O(n)
            return list(B)
• permutation sort analysis:
• Correct by case analysis: try all possibilities (Brute Force)
• Running time: $$Ω(n! · n)$$ which is exponential :(

### Solving Recurrences

• Substitution: Guess a solution, replace with representative function, recurrence holds true
• Recurrence Tree: Draw a tree representing the recursive calls and sum computation at nodes
• Master Theorem: A formula to solve many recurrences (R03)

### Selection Sort

• Find a largest number in prefix A[:i + 1] and swap it to A[i]
• Recursively sort prefix A[:i]
• Example: $$[8, 2, 4, 9, 3], [8, 2, 4, 3, 9], [3, 2, 4, 8, 9], [3, 2, 4, 8, 9], [2, 3, 4, 8, 9]$$
def selection_sort(A, i=None):
    """Sort A[:i+1]"""
    if i is None: i = len(A) - 1
    if i > 0:
        j = prefix_max(A, i)
        A[i], A[j] = A[j], A[i]
        selection_sort(A, i - 1)

def prefix_max(A, i):
    """Return index of maximum in A[:i+1]"""
    if i > 0:
        j = prefix_max(A, i - 1)
        if A[i] < A[j]:
            return j
    return i
• prefix_max analysis:
• Base case: for i = 0, array has one element, so index of max is $$i$$
• Induction: assume correct for $i$, maximum is either the maximum of A[:i] or A[i], returns correct index in either case.
• $S(1) = Θ(1); S(n) = S(n - 1) + Θ(1)$
• Substitution: Guess $$S(n) = cn$$: $$cn = Θ(1) + c(n - 1) \implies c = Θ(1)$$, which holds
• Recurrence tree: chain of $$n$$ nodes with $$Θ(1)$$ work per node, $$\Sigma_{i=0}^{n-1}1=Θ(n)$$

### Insertion Sort

• Recursively sort prefix A[:i]
• Sort prefix A[:i + 1] assuming that prefix A[:i] is sorted by repeated swaps
• Example: $$[8, 2, 4, 9, 3], [2, 8, 4, 9, 3], [2, 4, 8, 9, 3], [2, 4, 8, 9, 3], [2, 3, 4, 8, 9]$$
def insertion_sort(A, i=None):
    """Sort A[:i+1]"""
    if i is None: i = len(A) - 1
    if i > 0:
        insertion_sort(A, i - 1)
        insert_last(A, i)

def insert_last(A, i):
    """Sort A[:i+1] assuming sorted A[:i]"""
    if i > 0 and A[i] < A[i-1]:
        A[i], A[i-1] = A[i-1], A[i]
        insert_last(A, i - 1)
• insert_last analysis:
• Base case: for $$i = 0$$, array has one element so is sorted
• Induction: assume correct for $$i$$, if $$A[i] >= A[i - 1]$$, array is sorted; otherwise, swapping last two elements allows us to sort A[:i] by induction. □
• $S(1) = Θ(1); S(n) = S(n - 1) + Θ(1) =⇒ S(n) = Θ(n)$
• insertion_sort analysis:
• Base case: for $$i = 0$$, array has one element so is sorted
• Induction: assume correct for $i$, algorithm sorts A[:i] by induction, and then insert last correctly sorts the rest as proved above. □
• $T(1) = Θ(1); T(n) = T(n - 1) + Θ(n) =⇒ T(n) = Θ(n^2)$

### Merge Sort

• Recursively sort first half and second half (may assume power of two)
• Merge sorted halves into one sorted list (two finger algorithm)
• Example: $$[7, 1, 5, 6, 2, 4, 9, 3], [1, 7, 5, 6, 2, 4, 3, 9], [1, 5, 6, 7, 2, 3, 4, 9], [1, 2, 3, 4, 5, 6, 7, 9]$$
def merge_sort(A, lo=0, hi=None):
    """Sort A[lo:hi]"""
    if hi is None: hi = len(A)
    if hi - lo > 1:
        mid = (lo + hi + 1) // 2
        merge_sort(A, lo, mid)
        merge_sort(A, mid, hi)
        left, right = A[lo:mid], A[mid:hi]
        merge(left, right, A, len(left), len(right), lo, hi)

def merge(left, right, A, i, j, lo, hi):
    """Merge sorted left[:i] and right[:j] into A[lo:hi]"""
    if lo < hi:
        if (j <= 0) or (i > 0 and left[i-1] > right[j-1]):
            A[hi-1] = left[i-1]
            i -= 1
        else:
            A[hi-1] = right[j-1]
            j -= 1
        merge(left, right, A, i, j, lo, hi - 1)
• merge analysis:
• Base case: for $$n = 0$$, arrays are empty, so vacuously correct
• Induction: assume correct for $$n - 1$$; the item placed at A[hi-1] must be a largest number from the remaining prefixes of left and right, and since both are sorted, taking the larger of their last items suffices; the remainder is merged by induction. □
• $S(0) = Θ(1); S(n) = S(n - 1) + Θ(1) =⇒ S(n) = Θ(n)$
• merge_sort analysis:
• Base case: for $$n = 1$$, array has one element so is sorted
• Induction: assume correct for $$k < n$$, algorithm sorts smaller halves by induction, and then merge merges into a sorted array as proved above. □
• $T(1) = Θ(1); T(n) = 2T(n/2) + Θ(n)$
• Substitution: Guess $$T(n) = cn \log n$$: $$cn \log n = Θ(n) + 2c(n/2)\log(n/2) \implies cn \log 2 = Θ(n)$$, which holds
• Recurrence Tree: complete binary tree with depth $$\log_2 n$$ and $$n$$ leaves; level $$i$$ has $$2^i$$ nodes with $$O(n/2^i)$$ work each, total: $$\sum_{i=0}^{\log_2 n} (2^i)(n/2^i) = \sum_{i=0}^{\log_2 n} n = Θ(n \log n)$$

### Master Theorem

• The Master Theorem provides a way to solve recurrence relations in which recursive calls decrease problem size by a constant factor.
• Given a recurrence relation of the form $$T(n) = aT(n/b) + f(n)$$ and $$T(1) = Θ(1)$$, with branching factor $$a ≥ 1$$, problem size reduction factor $$b > 1$$, and asymptotically non-negative function $$f(n)$$, the Master Theorem gives the solution to the recurrence by comparing $$f(n)$$ to $$a^{\log_b n} = n^{\log_b a}$$, the number of leaves at the bottom of the recursion tree.
• When $$f(n)$$ grows asymptotically faster than $$n^{\log_b a}$$, the work done at each level decreases geometrically going down the tree, so the work at the root dominates;
• alternatively, when $$f(n)$$ grows slower, the work done at each level increases geometrically and the work at the leaves dominates.
• When their growth rates are comparable, the work is evenly spread over the tree’s $$O(log n)$$ levels.

| case | solution | conditions |
|---|---|---|
| 1 | $T(n) = \Theta(n^{\log_b a})$ | $f(n) = O(n^{\log_b a - \varepsilon})$ for some constant $\varepsilon > 0$ |
| 2 | $T(n) = \Theta(n^{\log_b a} \log^{k+1} n)$ | $f(n) = \Theta(n^{\log_b a} \log^k n)$ for some constant $k \geq 0$ |
| 3 | $T(n) = \Theta(f(n))$ | $f(n) = \Omega(n^{\log_b a + \varepsilon})$ for some constant $\varepsilon > 0$, and $af(n/b) \leq cf(n)$ for some constant $0 < c < 1$ |
• The Master Theorem takes on a simpler form when $$f(n)$$ is a polynomial, such that the recurrence has the form $$T(n) = aT(n/b) + Θ(n^c)$$ for some constant $$c ≥ 0$$.
| case | solution | conditions | intuition |
|---|---|---|---|
| 1 | $T(n) = \Theta(n^{\log_b a})$ | $c < \log_b a$ | work done at leaves dominates |
| 2 | $T(n) = \Theta(n^c \log n)$ | $c = \log_b a$ | work balanced across the tree |
| 3 | $T(n) = \Theta(n^c)$ | $c > \log_b a$ | work done at root dominates |
• This special case is straightforward to prove by substitution (this can be done in recitation).
• To apply the Master Theorem (or this simpler special case), you should state which case applies, and show that your recurrence relation satisfies all conditions required by the relevant case.
• There are even stronger (more general) formulas to solve recurrences, but we will not use them in this class.
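Two quick checks of the simpler polynomial form (examples added for illustration):

• Merge sort: $$T(n) = 2T(n/2) + Θ(n)$$ has $$a = 2, b = 2, c = 1$$ and $$\log_2 2 = 1 = c$$, so case 2 gives $$T(n) = Θ(n \log n)$$.
• $$T(n) = 4T(n/2) + Θ(n)$$ has $$\log_2 4 = 2 > 1 = c$$, so case 1 gives $$T(n) = Θ(n^{\log_2 4}) = Θ(n^2)$$.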

# 4 Hashing

## Notes

### Comparison Model

• In this model, assume algorithm can only differentiate items via comparisons
• Comparable items: black boxes only supporting comparisons between pairs
• Comparisons are $$<, ≤, >, ≥, =, \neq$$, outputs are binary: True or False
• Goal: Store a set of n comparable items, support find(k) operation
• Running time is lower bounded by # comparisons performed, so count comparisons!

### Decision Tree

• Any algorithm can be viewed as a decision tree of operations performed
• An internal node represents a binary comparison, branching either True or False
• For a comparison algorithm, the decision tree is binary (draw example)
• A leaf represents algorithm termination, resulting in an algorithm output
• A root-to-leaf path represents an execution of the algorithm on some input
• Need at least one leaf for each algorithm output, so search requires $$≥ n + 1$$ leaves

### Comparison Search Lower Bound

• What is worst-case running time of a comparison search algorithm?
• running time $$≥$$ # comparisons $$≥$$ max length of any root-to-leaf path $$≥$$ height of tree
• What is minimum height of any binary tree on $$≥ n$$ nodes?
• Minimum height when binary tree is complete (all rows full except last)
• $$Height ≥ \lceil \lg(n + 1) \rceil - 1 = Ω(\log n)$$, so the running time of any comparison search is $$Ω(\log n)$$
• Sorted arrays achieve this bound! Yay!
• More generally, the height of a tree with $$Θ(n)$$ leaves and max branching factor $$b$$ is $$Ω(\log_b n)$$
• To get faster, need an operation that allows super-constant $$ω(1)$$ branching factor. How??

### Direct Access Array

• Exploit Word-RAM $$O(1)$$ time random access indexing! Linear branching factor!
• Idea! Give item unique integer key k in $$\{0, . . . , u - 1\}$$, store item in an array at index $$k$$
• Associate a meaning with each index of array.
• If keys fit in a machine word, i.e. $$u ≤ 2^w$$, worst-case $$O(1)$$ find/dynamic operations! Yay!
• 6.006: assume input numbers/strings fit in a word, unless length explicitly parameterized
• Anything in computer memory is a binary integer, or use (static) 64-bit address in memory
• But space $$O(u)$$, so really bad if $$n \ll u$$ … :(
• Example: if keys are ten-letter names, for one bit per name, requires $$26^{10} ≈ 17.6$$ TB space
• How can we use less space?

### Hashing

• Idea! If $$n \ll u$$, map keys to a smaller range $$m = Θ(n)$$ and use smaller direct access array
• Hash function: $$h(k) : \{0, . . . , u - 1\} → \{0, . . . , m - 1\}$$ (also hash map)
• Direct access array called hash table, $$h(k)$$ called the hash of key k
• If $$m \ll u$$, no hash function is injective by pigeonhole principle
• There always exist keys $$a, b$$ such that $$h(a) = h(b)$$ $$→$$ Collision! :(
• Can’t store both items at same index, so where to store? Either:
• store somewhere else in the array (open addressing)
• complicated analysis, but common and practical
• store in another data structure supporting dynamic set interface (chaining)

### Chaining

• Idea! Store collisions in another data structure (a chain)
• If keys roughly evenly distributed over indices, chain size is $$n/m = n/Ω(n) = O(1)!$$
• If chain has $$O(1)$$ size, all operations take $$O(1)$$ time! Yay!
• If not, many items may map to same location, e.g. $$h(k) = constant$$, chain size is $$Θ(n)$$ :(
• Need good hash function! So what’s a good hash function?

### Hash Functions

#### Division (bad): $$h(k) = k \bmod m$$

• Heuristic, good when keys are uniformly distributed!
• $m$ should avoid symmetries of the stored keys
• Large primes far from powers of 2 and 10 can be reasonable
• Python uses a version of this with some additional mixing
• If $$u \gg n$$, every hash function will have some input set that will create a $$Θ(n)$$-size chain
• Idea! Don’t use a fixed hash function! Choose one randomly (but carefully)!

#### Universal (good, theoretically): $$h_{ab}(k) = (((ak + b) \bmod p) \bmod m)$$

• Hash Family $$\mathcal{H}(p, m) = \{h_{ab} \mid a, b \in \{0, \ldots, p-1\}\ \text{and}\ a \neq 0\}$$
• Parameterized by a fixed prime $$p > u$$, with $$a$$ and $$b$$ chosen from range $$\{0,...,p-1\}$$
• $$\mathcal{H}$$ is a Universal family: $$\underset{h\in\mathcal{H}}{Pr}\{h(k_i) = h(k_j)\} \leq 1/m \qquad \forall k_i \neq k_j \in \{0,...,u-1\}$$
• Why is universality useful? Implies short chain lengths! (in expectation)
• $$X_{ij}$$: indicator random variable over $$h \in \mathcal{H}$$: $$X_{ij} = 1$$ if $$h(k_i) = h(k_j)$$, $$X_{ij} = 0$$ otherwise
• Size of chain at index $$h(k_i)$$ is random variable $$X_i = \Sigma_{j}{X_{ij}}$$
• Expected size of chain at index $$h(k_i)$$:
\begin{align}\underset{h\in\mathcal{H}}{\mathbb{E}}\{X_i\} = \underset{h\in\mathcal{H}}{\mathbb{E}}\Big\{\sum_{j} X_{ij}\Big\} = \sum_{j}\underset{h\in\mathcal{H}}{\mathbb{E}}\{X_{ij}\} &= 1 + \sum_{j\neq i}\underset{h\in\mathcal{H}}{\mathbb{E}}\{X_{ij}\}\\ &= 1 + \sum_{j\neq i}(1)\underset{h\in\mathcal{H}}{\Pr}\{h(k_i) = h(k_j)\} + (0)\underset{h\in\mathcal{H}}{\Pr}\{h(k_i) \neq h(k_j)\}\\ &\leq 1 + \sum_{j\neq i} 1/m = 1 + (n-1)/m\end{align}
• Since $$m = Ω(n)$$, load factor $$α = n/m = O(1)$$, so $$O(1)$$ in expectation!
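A sketch of drawing one hash function from this family (the helper name and the particular prime are illustrative assumptions; in practice $$p$$ must exceed $$u$$):

import random

def random_universal_hash(m, p=2**31 - 1):  # p: a prime, assumed > u
    a = random.randrange(1, p)              # a ∈ {1, ..., p-1}
    b = random.randrange(0, p)              # b ∈ {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m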

#### Dynamic

• If $$n/m$$ far from 1, rebuild with new randomly chosen hash function for new size m
• Same analysis as dynamic arrays, cost can be amortized over many dynamic operations
• So a hash table can implement dynamic set operations in expected amortized O(1) time! :)
Set operations, worst case $O(\cdot)$; (a) = amortized, (e) = expected:

| Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | $n$ | $n$ | $n$ | $n$ | $n$ |
| Sorted Array | $n \log n$ | $\log n$ | $n$ | $1$ | $\log n$ |
| Direct Access Array | $u$ | $1$ | $1$ | $u$ | $u$ |
| Hash Table | $n_{(e)}$ | $1_{(e)}$ | $1_{(a)(e)}$ | $n$ | $n$ |

# 5 Linear Sorting

## Notes

### Comparison Sort Lower Bound

• Comparison model implies that algorithm decision tree is binary (constant branching factor)
• Requires # leaves L ≥ # possible outputs
• Tree height lower bounded by $Ω(log L)$, so worst-case running time is $Ω(log L)$
• To sort array of n elements, # outputs is n! permutations
• Thus height lower bounded by $log(n!) ≥ log((n/2)^{n/2}) = Ω(n log n)$
• So merge sort is optimal in comparison model
• Can we exploit a direct access array to sort faster?

### Direct Access Array Sort

• Example: $[5, 2, 7, 0, 4]$
• Suppose all keys are unique non-negative integers in range $\{0, \ldots, u-1\}$, so $n ≤ u$
• Insert each item into a direct access array with size $$u$$ in $Θ(n)$
• Return items in order they appear in direct access array in $Θ(u)$
• Running time is $Θ(u)$, which is $Θ(n)$ if $u = Θ(n)$. Yay!
def direct_access_sort(A):
    """Sort A assuming items have distinct non-negative keys."""
    u = 1 + max([x.key for x in A])    # O(n) find maximum key
    D = [None] * u                     # O(u) direct access array
    for x in A:                        # O(n) insert items
        D[x.key] = x
    i = 0
    for key in range(u):               # O(u) read out items in order
        if D[key] is not None:
            A[i] = D[key]
            i += 1
• What if keys are in a larger range, like $u < n^2$?
• Idea! Represent each key $k$ by tuple $(a, b)$ where $k = an + b$ and $0 ≤ b < n$
• Specifically $a = \lfloor k/n\rfloor < n$ and $b = (k \mod n)$ (just a 2-digit base-n number!)
• This is a built-in Python operation $(a, b) = divmod(k, n)$
• Example: $[17, 3, 24, 22, 12] ⇒ [(3,2), (0,3), (4,4), (4,2), (2,2)] ⇒ [32, 03, 44, 42, 22]_{(n=5)}$
• How can we sort tuples?

### Tuple Sort

• Item keys are tuples of equal length, i.e. item $x.key = (x.k_1, x.k_2, x.k_3, . . .)$.
• Want to sort on all entries lexicographically, so first key $k_1$ is most significant
• How to sort? Idea! Use other auxiliary sorting algorithms to separately sort each key
• (Like sorting rows in a spreadsheet by multiple columns)
• What order to sort them in? Least significant to most significant!
• Exercise: $[32, 03, 44, 42, 22] =⇒ [42, 22, 32, 03, 44] =⇒ [03, 22, 32, 42, 44]_{(n=5)}$
• Idea! Use tuple sort with auxiliary direct access array sort to sort tuples (a, b).
• Problem! Many integers could have the same a or b value, even if input keys distinct
• Need sort allowing repeated keys which preserves input order
• Want sort to be stable: repeated keys appear in output in same order as input
• Direct access array sort cannot even sort arrays having repeated keys!
• Can we modify direct access array sort to admit multiple keys in a way that is stable?

### Counting Sort

• Instead of storing a single item at each array index, store a chain, just like hashing!
• For stability, chain data structure should remember the order in which items were added
• Use a sequence data structure which maintains insertion order
• To insert item x, insert_last to end of the chain at index $x.key$
• Then to sort, read through all chains in sequence order, returning items one by one
def counting_sort(A):
    """Sort A assuming items have non-negative keys."""
    u = 1 + max([x.key for x in A])    # O(n) find maximum key
    D = [[] for i in range(u)]         # O(u) direct access array of chains
    for x in A:                        # O(n) insert into chain at x.key
        D[x.key].append(x)
    i = 0
    for chain in D:                    # O(u) read out chains in order
        for x in chain:
            A[i] = x
            i += 1

• Idea! If $u < n^2$, use tuple sort with auxiliary counting sort to sort tuples (a, b)
• Sort least significant key b, then most significant key a
• Stability ensures previous sorts stay sorted
• Running time for this algorithm is $O(2n) = O(n)$. Yay!
• If every key is $< n^c$ for some positive $c = \log_n u$, every key has at most $c$ digits base $$n$$
• A c-digit number can be written as a c-element tuple in $O(c)$ time
• We sort each of the c base-n digits in $O(n)$ time
• So tuple sort with auxiliary counting sort runs in $O(cn)$ time in total
• If $c$ is constant, so every key is $< n^c$, this sort is linear $O(n)$!
"""Sort A assuming items have non-negative keys"""
n = len(A)
u = 1 + max([x.key for x in A])
c = 1 + (u.bit_length() // n.bit_length())

class Obj: pass

D = [Obj() for a in A]
for i in range(n):
D[i].digits = []
D[i].item = A[i]
high = A[i].key
for j in range(c):
high, low = divmod(high, n)
D[i].digits.append(low)
for i in range(c):
for j in range(n):
D[j].key = D[j].digits[i]
counting_sort(D)
for i in range(n);
A[i] = D[i].item
| Algorithm | Time $O(\cdot)$ | In-place? | Stable? | Comments |
|---|---|---|---|---|
| Insertion Sort | $n^2$ | Y | Y | $O(nk)$ for $k$-proximate |
| Selection Sort | $n^2$ | Y | N | $O(n)$ swaps |
| Merge Sort | $n \log n$ | N | Y | stable, optimal comparison sort |
| Counting Sort | $n + u$ | N | Y | $O(n)$ when $u = O(n)$ |
| Radix Sort | $n + n \log_n u$ | N | Y | $O(n)$ when $u = O(n^c)$ |

# 6 Binary Trees, Part 1

## Notes

Sequence operations, worst case $O(\cdot)$; (a) = amortized:

| Data Structure | build(X) | get_at(i), set_at(i, x) | insert_first(x), delete_first() | insert_last(x), delete_last() | insert_at(i, x), delete_at(i) |
|---|---|---|---|---|---|
| Static Array | $n$ | $1$ | $n$ | $n$ | $n$ |
| Linked List | $n$ | $n$ | $1$ | $n$ † | $n$ |
| Dynamic Array | $n$ | $1$ | $n$ | $1_{(a)}$ | $n$ |
| Goal | $n$ | $\log n$ | $\log n$ | $\log n$ | $\log n$ |

† $1$ if we also keep a pointer to the tail

Set operations, worst case $O(\cdot)$:

| Data Structure | build(X) | find(k) | insert(x), delete(k) | find_min(), find_max() | find_prev(k), find_next(k) |
|---|---|---|---|---|---|
| Array | $n$ | $n$ | $n$ | $n$ | $n$ |
| Sorted Array | $n \log n$ | $\log n$ | $n$ | $1$ | $\log n$ |
| Goal | $n \log n$ | $\log n$ | $\log n$ | $\log n$ | $\log n$ |

### How? Binary Trees!

• Pointer-based data structures (like Linked List) can achieve worst-case performance
• Binary tree is pointer-based data structure with three pointers per node
• Node representation: node.{item, parent, left, right}
• Example:

class TreeNode:
    def __init__(self, x):
        self.item = x
        self.left = None
        self.right = None
        self.parent = None

### Terminology

• The root of a tree has no parent (Ex: <A>)
• A leaf of a tree has no children (Ex: <C>, <E>, and <F>)
• Define depth(<X>) of node <X> in a tree rooted at <A> to be length of path from <A> to <X>
• Define height(<X>) of node <X> to be max depth of any node in the subtree rooted at <X>
• Idea: Design operations to run in $O(h)$ time for root height $h$, and maintain $h = O(log n)$
• A binary tree has an inherent order: its traversal order (In-order traversal)
• every node in node <X>’s left subtree is before <X>
• every node in node <X>’s right subtree is after <X>
# The following node operations are written as TreeNode methods,
# in the style of the recitation code (self is named A).
def subtree_iter(A):
    if A.left: yield from A.left.subtree_iter()
    yield A
    if A.right: yield from A.right.subtree_iter()
• List nodes in traversal order via a recursive algorithm starting at root:
• Recursively list left subtree, list self, then recursively list right subtree
• Runs in $O(n)$ time, since $O(1)$ work is done to list each node
• Example: Traversal order is (<F>,<D>,<B>,<E>,<A>,<C>)
• Right now, traversal order has no meaning relative to the stored items
• Later, assign semantic meaning to traversal order to implement Sequence/Set interfaces

• Find the first node in the traversal order of node <X>’s subtree (last is symmetric)
• If <X> has a left child, recursively return the first node in the left subtree
• Otherwise, <X> is the first node, so return it
• Running time is $O(h)$ where h is the height of the tree
• Example: first node in <A>’s subtree is <F>
def subtree_first(A):
    if A.left: return A.left.subtree_first()
    return A

def subtree_last(A):
    if A.right: return A.right.subtree_last()
    return A
• Find successor of node <X> in the traversal order (predecessor is symmetric)
• If <X> has right child, return first of right subtree
• Otherwise, return lowest ancestor of <X> for which <X> is in its left subtree
• Running time is $O(h)$ where $h$ is the height of the tree
• Example: Successor of: <B> is <E>, <E> is <A>, and <C> is None.
def successor(A):
    if A.right: return A.right.subtree_first()
    while A.parent and (A is A.parent.right):
        A = A.parent
    return A.parent

def predecessor(A):
    if A.left: return A.left.subtree_last()
    while A.parent and (A is A.parent.left):
        A = A.parent
    return A.parent

### Dynamic Operations

• Change the tree by a single item (only add or remove leaves):
• add a node after another in the traversal order (before is symmetric)
• remove an item from the tree
• Insert node <Y> after <X> in the traversal order
• If <X> has no right child, make <Y> the right child of <X>
• Otherwise, make <Y> the left child of <X>’s successor (which cannot have a left child)
• Running time is $O(h)$ where $h$ is the height of the tree

def subtree_insert_before(A, B):
    if A.left:
        A = A.left.subtree_last()
        A.right, B.parent = B, A
    else:
        A.left, B.parent = B, A

def subtree_insert_after(A, B):
    if A.right:
        A = A.right.subtree_first()
        A.left, B.parent = B, A
    else:
        A.right, B.parent = B, A
• Delete the item in node <X> from <X>’s subtree
• If <X> is a leaf, detach from parent and return
• Otherwise, <X> has a child
• If <X> has a left child, swap items with the predecessor of <X> and recurse
• Otherwise <X> has a right child, swap items with the successor of <X> and recurse
• Running time is $O(h)$ where $h$ is the height of the tree

def subtree_delete(A):
    if A.left or A.right:          # A is not a leaf: swap item down
        if A.left: B = A.predecessor()
        else:      B = A.successor()
        A.item, B.item = B.item, A.item
        return B.subtree_delete()  # recurse until the item reaches a leaf
    if A.parent:                   # A is a leaf: detach it from its parent
        if A.parent.left is A: A.parent.left = None
        else:                  A.parent.right = None
    return A

### Application: Set

• Idea! Set Binary Tree (a.k.a Binary Search Tree / BST)
• Traversal order(In-order) is sorted order increasing by key
• Equivalent to BST Property: for every node, every key in left subtree $\leq$ node’s key $\leq$ every key in right subtree
• Then can find the node with key $k$ in node <X>’s subtree in $O(h)$ time like binary search:
• If $k$ is smaller than the key at <X>, recurse in left subtree (or return None)
• If $k$ is larger than the key at <X>, recurse in right subtree (or return None)
• Otherwise, return the item stored at <X>
• Other Set operations follow a similar pattern
class BSTNode(TreeNode):
    def subtree_find(A, k):
        if k == A.item.key: return A
        if k < A.item.key and A.left:  return A.left.subtree_find(k)
        if k > A.item.key and A.right: return A.right.subtree_find(k)

    def subtree_find_next(A, k):
        if A.item.key <= k:
            if A.right: return A.right.subtree_find_next(k)
            else:       return None
        if A.left:
            B = A.left.subtree_find_next(k)
            if B: return B
        return A

    def subtree_find_prev(A, k):
        if A.item.key >= k:
            if A.left: return A.left.subtree_find_prev(k)
            else:      return None
        if A.right:
            B = A.right.subtree_find_prev(k)
            if B: return B
        return A

    def subtree_insert(A, B):
        if B.item.key < A.item.key:
            if A.left: A.left.subtree_insert(B)
            else:      A.subtree_insert_before(B)
        elif B.item.key > A.item.key:
            if A.right: A.right.subtree_insert(B)
            else:       A.subtree_insert_after(B)
        else: A.item = B.item

class BinaryTree:
    def __init__(self, node_type=TreeNode):
        self.root = None
        self.size = 0
        self.node_type = node_type

    def __len__(self): return self.size
    def __iter__(self):
        if self.root:
            for node in self.root.subtree_iter():
                yield node.item

class BinaryTreeSet(BinaryTree):
    def __init__(self):
        super().__init__(node_type=BSTNode)

    def iter_order(self): yield from self

    def build(self, X):
        for x in X: self.insert(x)

    def find_min(self):
        if self.root: return self.root.subtree_first().item

    def find_max(self):
        if self.root: return self.root.subtree_last().item

    def find(self, k):
        if self.root:
            node = self.root.subtree_find(k)
            if node: return node.item

    def find_next(self, k):
        if self.root:
            node = self.root.subtree_find_next(k)
            if node: return node.item

    def find_prev(self, k):
        if self.root:
            node = self.root.subtree_find_prev(k)
            if node: return node.item

    def insert(self, x):
        new_node = self.node_type(x)
        if self.root:
            self.root.subtree_insert(new_node)
            if new_node.parent is None: return False  # duplicate key: item replaced
        else:
            self.root = new_node
        self.size += 1
        return True

    def delete(self, k):
        assert self.root
        node = self.root.subtree_find(k)
        assert node
        ext = node.subtree_delete()
        if ext.parent is None: self.root = None  # deleted node was the root
        self.size -= 1
        return ext.item

### Application: Sequence

• Idea! Sequence Binary Tree: Traversal order is sequence order
• How do we find the $i^{th}$ node in the traversal order of a subtree? Call this operation subtree_at(i)
• Could just iterate through entire traversal order, but that’s bad, $O(n)$
• However, if we could compute a subtree’s size in $O(1)$, then can solve in $O(h)$ time
• How? Check the size $n_L$ of the left subtree and compare to $i$
• If $i < n_L$, recurse on the left subtree
• If $i > n_L$, recurse on the right subtree with $i' = i - n_L - 1$
• Otherwise, $i = n_L$, and you’ve reached the desired node!
• Maintain the size of each node’s subtree at the node via augmentation
• Add node.size field to each node
• When adding new leaf, add $+1$ to a.size for all ancestors a in $O(h)$ time
• When deleting a leaf, add $-1$ to a.size for all ancestors a in $O(h)$ time
• Sequence operations follow directly from a fast subtree_at(i) operation
• Naively, build(X) takes $O(nh)$ time, but can be done in $O(n)$ time; see recitation
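A sketch of subtree_at following the recipe above (assuming the node.size augmentation just described; written as a node method in the style of the earlier tree code):

def subtree_at(A, i):  # a TreeNode method; requires node.size augmentation
    n_L = A.left.size if A.left else 0
    if i < n_L:   return A.left.subtree_at(i)
    elif i > n_L: return A.right.subtree_at(i - n_L - 1)
    else:         return A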

# 7 Binary Tree II: AVL

## Notes

### Height Balance

• How to maintain height $h = O(\log n)$ where $$n$$ is the number of nodes in the tree?
• A binary tree that maintains $O(\log n)$ height under dynamic operations is called balanced
• There are many balancing schemes (Red-Black Trees, Splay Trees, 2-3 Trees, …)
• The first proposed balancing scheme was the AVL Tree (Adelson-Velsky and Landis, 1962)

### Rotations

• Need to reduce height of tree without changing its traversal order, so that we represent the same sequence of items.

• How to change the structure of a tree, while preserving traversal order? Rotations!

• A rotation relinks $O(1)$ pointers to modify tree structure and maintains traversal order

def subtree_rotate_right(D):
    assert D.left
    B, E = D.left, D.right
    A, C = B.left, B.right

    # swap the two nodes' names and items, so the node object that stays
    # in place keeps its connection to D's parent
    D, B = B, D
    D.item, B.item = B.item, D.item

    B.left, B.right = A, D
    D.left, D.right = C, E

    if A: A.parent = B
    if E: E.parent = D

def subtree_rotate_left(B):
    assert B.right
    A, D = B.left, B.right
    C, E = D.left, D.right

    B, D = D, B
    B.item, D.item = D.item, B.item

    D.left, D.right = B, E
    B.left, B.right = A, C

    if A: A.parent = B
    if E: E.parent = D

### Rotations Suffice

• Claim: $O(n)$ rotations can transform a binary tree to any other with same traversal order
• Proof: Repeatedly perform last possible right rotation in traversal order; resulting tree is a canonical chain. Each rotation increases depth of the last node by 1. Depth of last node in final chain is $n - 1$, so at most $n-1$ rotations are performed. Reverse canonical rotations to reach target tree. Q.E.D
• Can maintain height-balance by using $O(n)$ rotations to fully balance the tree, but slow :(
• We will keep the tree balanced in $O(\log n)$ time per operation!

### AVL Trees: Height Balance

• AVL trees maintain height-balance (also called the AVL property)
• A node is height-balanced if heights of its left and right subtree differ by at most 1
• Let the skew of a node be the height of its right subtree minus that of its left subtree
• Then a node is height-balanced if its skew is $-1$, $0$, or $1$
• Claim: A binary tree with height-balanced nodes has height $h = O(\log n)$ (i.e., $n = 2^{\Omega(h)}$)

• Proof: Suffices to show the fewest nodes $F(h)$ in any height-$h$ tree satisfies $F(h) = 2^{\Omega(h)}$: $$F(0) = 1,\ F(1) = 2,\ F(h) = 1 + F(h-1) + F(h-2) \geq 2F(h-2) \implies F(h) \geq 2^{h/2} \qquad \square$$

• Suppose adding or removing a leaf from a height-balanced tree results in imbalance

• Only the subtrees of the leaf’s ancestors have changed in height or skew
• Heights changed by only $\pm 1$, so skews still have magnitude $\leq 2$
• Idea: Fix height-balance of ancestors starting from the leaf up to the root
• Repeatedly rebalance the lowest ancestor that is not height-balanced; wlog assume skew 2
• Local Rebalance: Given binary tree node <B>:
• whose skew is 2, and
• every other node in <B>’s subtree is height-balanced,
• then <B>’s subtree can be made height-balanced via one or two rotations
• (after which <B>’s height is the same or one less than before)
• Proof: Let <F> be <B>’s right child, which must exist since the skew of <B> is 2
• Case 1: skew of <F> is 0, or Case 2: skew of <F> is 1
• Perform a left rotation on <B>
• Case 3: skew of <F> is -1
• Perform a right rotation on <F>, then a left rotation on <B>
• In each case, checking the resulting subtree heights confirms height-balance □ (a code sketch follows the Computing Height section below)

### Computing Height

• How to tell whether node is height-balanced? Compute heights of subtrees!
• How to compute the height of node <X> ? Naive algorithm:
• Recursively compute height of the left and right subtrees of <X>
• Add $$1$$ to the max of the two heights
• Runs in $Ω(n)$ time, since we recurse on every node :(
• Idea: Augment each node with the height of its subtree! (Save for later!)
• Height of <X> can be computed in $O(1)$ time from the heights of its children:
• Look up the stored heights of left and right subtrees in $O(1)$ time
• Add $$1$$ to the max of the two heights
• During dynamic operations, we must maintain our augmentation as the tree changes shape
• Recompute subtree augmentations at every node whose subtree changes:
• Update relinked nodes in a rotation operation in $O(1)$ time (ancestors don’t change)
• Update all ancestors of an inserted or deleted node in $O(h)$ time by walking up the tree
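With this height augmentation, the Local Rebalance cases above can be sketched in code (an illustrative sketch, not the full recitation implementation: it assumes a maintained node.height field with height(None) = -1, uses the rotation functions from earlier, and omits recomputing stored heights after each rotation):

def height(A):
    return A.height if A else -1

def skew(A):
    return height(A.right) - height(A.left)

def rebalance(A):  # assumes skew(A) == 2; the skew -2 case is symmetric
    F = A.right
    if skew(F) == -1:           # Case 3: double rotation
        subtree_rotate_right(F)
    subtree_rotate_left(A)      # Cases 1 and 2 (and the finish of Case 3)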

### Steps to Augment a Binary Tree

• In general, to augment a binary tree with a subtree property P, you must:
• State the subtree property P(<X>) you want to store at each node <X>
• Show how to compute P(<X>) from the augmentations of <X>’s children in $O(1)$ time
• Then stored property P(<X>) can be maintained without changing dynamic operation costs

### Application: Sequence

• For sequence binary tree, we needed to know subtree sizes
• For just inserting/deleting a leaf, this was easy, but now need to handle rotations
• Subtree size is a subtree property, so can maintain via augmentation
• Can compute size from sizes of children by summing them and adding 1

### Conclusion

• Set AVL trees achieve $O(log n)$ time for all set operations
• except $O(n log n)$ time for build and $O(n)$ time for iter
• Sequence AVL trees achieve $O(log n)$ time for all sequence operations
• except $O(n)$ time for build and iter

### Application: Sorting

• Any Set data structure defines a sorting algorithm: build (or repeatedly insert) then iter
• For example, Direct Access Array Sort from Lecture 5
• AVL Sort is a new $O(n log n)$ time sorting algorithm

# 8 Binary Heaps

## Notes

### Priority Queue Interface

• Keep track of many items, quickly access/remove the most important

• Example: router with limited bandwidth, must prioritize certain kinds of messages
• Example: process scheduling in operating system kernels
• Example: discrete-event simulation (when is next occurring event?)
• Example: graph algorithms (later in the course)
• Order items by key = priority so Set interface (not Sequence interface)

• Optimized for a particular subset of Set operations:

Operation Specification
build(X) build priority queue from iterable X
insert(x) add item x to data structure
delete_max() remove and return stored item with largest key
find_max() return stored item with largest key
• (Usually optimized for max or min, not both)
• Focus on insert and delete_max operations: build can repeatedly insert; find_max() can be implemented as insert(delete_max()) (delete the max, re-insert it, and return it)
class PriorityQueue:
    def __init__(self):
        self.A = []

    def insert(self, x):
        self.A.append(x)

    def delete_max(self):
        assert len(self.A) > 0
        return self.A.pop()  # not correct by itself: subclasses must
                             # first move a max item to the end

    @classmethod
    def sort(PQ, A):
        pq = PQ()
        for x in A: pq.insert(x)
        out = [pq.delete_max() for _ in A]
        return reversed(out)

### Priority Queue Sort

• Any priority queue data structure translates into a sorting algorithm:
• build(A), e.g., insert items one by one in input order
• Repeatedly delete_min() (or delete_max()) to determine (reverse) sorted order
• All the hard work happens inside the data structure
• Running time is $T_{build} + n · T_{delete_max} ≤ n · T_{insert} + n · T_{delete_max}$
• Many sorting algorithms we’ve seen can be viewed as priority queue sort:
Operations, worst case $O(\cdot)$; (a) = amortized:

| Priority Queue Data Structure | build(A) | insert(x) | delete_max() | PQ Sort Time | In-place? | Sort Algorithm |
|---|---|---|---|---|---|---|
| Dynamic Array | $n$ | $1_{(a)}$ | $n$ | $n^2$ | Y | Selection Sort |
| Sorted Dynamic Array | $n \log n$ | $n$ | $1_{(a)}$ | $n^2$ | Y | Insertion Sort |
| Set AVL Tree | $n \log n$ | $\log n$ | $\log n$ | $n \log n$ | N | AVL Sort |
| Goal | $n$ | $\log n_{(a)}$ | $\log n_{(a)}$ | $n \log n$ | Y | Heap Sort |

### Priority Queue: Set AVL Tree

• Set AVL trees support insert(x), find_min(), find_max(), delete_min(), and delete_max() in $O(log n)$ time per operation
• So priority queue sort runs in $O(n log n)$ time
• This is (essentially) AVL sort from Lecture 7
• Can speed up find_min() and find_max() to $O(1)$ time via subtree augmentation
• But this data structure is complicated and resulting sort is not in-place
• Is there a simpler data structure for just priority queue, and in-place $O(n lg n)$ sort? YES, binary heap and heap sort
• Essentially implement a Set data structure on top of a Sequence data structure (array), using what we learned about binary trees

### Priority Queue: Array

• Store elements in an unordered dynamic array
• insert(x): append x to end in amortized $O(1)$ time
• delete_max(): find max in $O(n)$, swap max to the end and remove
• insert is quick, but delete_max is slow
• Priority queue sort is selection sort! (plus some copying)
class PQArray(PriorityQueue):
    # PQArray.insert is already correct: amortized O(1) append
    def delete_max(self):  # O(n)
        n, A, m = len(self.A), self.A, 0
        for i in range(1, n):
            m = i if A[m].key < A[i].key else m  # find index of max key
        A[m], A[n-1] = A[n-1], A[m]              # swap max to the end
        return super().delete_max()              # pop from end

We allow insert to take one argument (as makes sense now) or zero arguments; we will need the zero-argument form when making the priority queues in-place later.

### Priority Queue: Sorted Array

• Store elements in a sorted dynamic array
• insert(x): append x to end, swap down to sorted position in $O(n)$ time
•  delete_max(): delete from end in $O(1)$ amortized
• delete_max is quick, but insert is slow
• Priority queue sort is insertion sort! (plus some copying)
• Can we find a compromise between these two array priority queue extremes?
class PQSortedArray(PriorityQueue):
    # PQSortedArray.delete_max is already correct: amortized O(1) pop
    def insert(self, x=None):  # O(n)
        if x is not None: super().insert(x)
        i, A = len(self.A) - 1, self.A
        while 0 < i and A[i].key < A[i-1].key:  # swap down to sorted position
            A[i], A[i-1] = A[i-1], A[i]
            i -= 1

### Array as a Complete Binary Tree

• Idea: interpret an array as a complete binary tree, with maximum $2^i$ nodes at depth $i$ except at the largest depth, where all nodes are left-aligned
• Equivalently, complete tree is filled densely in reading order: root to leaves, left to right
• Perspective: bijection between arrays and complete binary trees
• Height of the complete tree on an array of $$n$$ items is $\lceil \lg n \rceil$, so it acts as a balanced binary tree

### Implicit Complete Tree

• Complete binary tree structure can be implicit instead of storing pointers
• Root is at index 0
• Compute neighbors by index arithmetic:
\begin{align} left(i) &=2i+1 \\ right(i) &=2i+2 \\ parent(i) &= \lfloor \frac{i-1}{2} \rfloor \end{align}

### Binary Heaps

• Idea: keep larger elements higher in tree, but only locally
• Max-Heap Property at node $i$: $Q[i] ≥ Q[j]$ for $j \in \{\text{left}(i), \text{right}(i)\}$
• Max-heap is an array satisfying max-heap property at all nodes
• Claim: In a max-heap, every node i satisfies $Q[i] ≥ Q[j]$ for all nodes $j$ in subtree(i)
• Proof:
• Induction on $d = depth(j) - depth(i)$
• Base case: $d = 0$ implies $i = j$ implies $Q[i] ≥ Q[j]$ (in fact, equal)
• Inductive step: $depth(parent(j)) - depth(i) = d - 1 < d$, so $Q[i] ≥ Q[parent(j)]$ by induction
• $Q[parent(j)] ≥ Q[j]$ by Max-Heap Property at parent(j)
• In particular, max item is at root of max-heap
def parent(i):
    p = (i - 1) // 2
    return p if 0 < i else i

def left(i, n):
    l = 2 * i + 1
    return l if l < n else i

def right(i, n):
    r = 2 * i + 2
    return r if r < n else i

### Heap Insert

• Append new item $x$ to end of array in $O(1)$ amortized, making it next leaf $i$ in reading order
• max_heapify_up(i): swap with parent until Max-Heap Property
• Check whether $Q[parent(i)] \geq Q[i]$ (part of Max-Heap Property at parent(i))
• If not, swap items $Q[i]$ and $Q[parent(i)]$, and recursively max_heapify_up(parent(i))
• Correctness:
• Max-Heap Property guarantees all nodes $\geq$ descendants, except $Q[i]$ might be > some of its ancestors (unless $i$ is the root, so we’re done)
• If swap is necessary, same guarantee is true with $Q[parent(i)]$ instead of $Q[i]$
• Running time: height of tree, so $\Theta(\log n)$

### Heap Delete Max

• Can only easily remove the last element from a dynamic array, but the max key is at the root of the tree
• So swap item at root node $i = 0$ with last item at node $n-1$ in heap array
• max_heapify_down(i): swap root with larger child until Max-Heap Property
• Check whether $Q[i] \geq Q[j]$ for $j \in \{\text{left}(i), \text{right}(i)\}$ (Max-Heap Property at $i$)
• If not, swap $Q[i]$ with $Q[j]$ for the child $j \in \{\text{left}(i), \text{right}(i)\}$ with maximum key, and recursively max_heapify_down(j)
• Correctness:
• Max-Heap Property guarantees all nodes $\geq$ descendants, except $Q[i]$ might be < some descendants (unless $i$ is a leaf, so we’re done)
• If swap is necessary, same guarantee is true with $Q[j]$ instead of $Q[i]$
• Running time: height of tree, so $\Theta(\log n)$
class PQHeap(PriorityQueue):
    # builds on the in-place PriorityQueue below, which tracks the heap
    # as the prefix A[:n] of array A
    def insert(self, x=None):  # O(log n)
        super().insert(x)
        n, A = self.n, self.A
        max_heapify_up(A, n, n - 1)

    def delete_max(self):  # O(log n)
        n, A = self.n, self.A
        A[0], A[n-1] = A[n-1], A[0]    # swap max to end of heap prefix
        max_heapify_down(A, n - 1, 0)  # restore heap property on A[:n-1]
        return super().delete_max()

def max_heapify_up(A, n, c):
    p = parent(c)
    if A[p].key < A[c].key:
        A[c], A[p] = A[p], A[c]
        max_heapify_up(A, n, p)

def max_heapify_down(A, n, p):
    l, r = left(p, n), right(p, n)
    c = l if A[r].key < A[l].key else r  # child with the larger key
    if A[p].key < A[c].key:
        A[c], A[p] = A[p], A[c]
        max_heapify_down(A, n, c)

### Heap Sort

• Plugging max-heap into priority queue sort gives us a new sorting algorithm
• Running time is $O(n log n)$ because each insert and delete_max takes $O(log n)$
• But two improvements are often made to this sorting algorithm:

### In-place Priority Queue Sort

• Max-heap $Q$ is a prefix of a larger array $A$; remember how many items of $A$ belong to the heap
• The heap size is initially zero, eventually all of $A$ (after inserts), then zero again (after deletes)
•  insert() absorbs next item in array at index $Q$ into heap
•  delete_max() moves max item to end, then abandons it by decrementing $Q$
• In-place priority queue sort with Array is exactly Selection Sort
• In-place priority queue sort with Sorted Array is exactly Insertion Sort
• In-place priority queue sort with binary Max Heap is Heap Sort
class PriorityQueue:
    def __init__(self, A):
        self.n, self.A = 0, A        # heap is the prefix A[:n] of array A

    def insert(self, x=None):        # absorb next item A[n] into the heap
        assert self.n < len(self.A)  # (x, if given, must already be A[n])
        self.n += 1

    def delete_max(self):            # abandon last item of the heap prefix
        assert self.n >= 1
        self.n -= 1
        return self.A[self.n]

    @classmethod
    def sort(Queue, A):
        pq = Queue(A)
        for i in range(len(A)): pq.insert()
        for i in range(len(A)): pq.delete_max()
        return pq.A

### Linear Build Heap

• Inserting $$n$$ items into the heap calls max_heapify_up(i) for $i$ from $0$ to $n-1$ (root down): $$\text{worst-case swaps} \approx \sum_{i=0}^{n-1} \text{depth}(i) = \sum_{i=0}^{n-1} \log i = \log(n!) \geq (n/2)\log(n/2) = \Omega(n \log n)$$

• Idea! Treat the full array as a complete binary tree from the start, then max_heapify_down(i) for $i$ from $n-1$ to $0$ (leaves up): $$\text{worst-case swaps} \approx \sum_{i=0}^{n-1} \text{height}(i) = \sum_{i=0}^{n-1} (\log n - \log i) = \log\frac{n^n}{n!} = \Theta\Big(\log\frac{n^n}{\sqrt{n}\,(n/e)^n}\Big) = O(n)$$

• So can build heap in $O(n)$ time

• (Doesn’t speed up $O(nlogn)$ performance of heap sort)

def build_max_heap(A):
    n = len(A)
    for i in range(n // 2, -1, -1):  # leaves are already heaps
        max_heapify_down(A, n, i)
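Combining build_max_heap with repeated swap-and-heapify-down gives the usual in-place heap sort (a sketch added for illustration; items are assumed to have a .key attribute):

def heap_sort(A):
    """Sort A in place in O(n log n)."""
    build_max_heap(A)                   # O(n)
    for n in range(len(A) - 1, 0, -1):  # n = size of remaining heap prefix
        A[0], A[n] = A[n], A[0]         # move current max just after prefix
        max_heapify_down(A, n, 0)       # O(log n) restore heap on A[:n]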

### Sequence AVL Tree Priority Queue

• Where else have we seen linear build time for an otherwise logarithmic data structure? Sequence AVL Tree!
• Store items of priority queue in Sequence AVL Tree in arbitrary order (insertion order)
• Maintain max (and/or min) augmentation:
• node.max = pointer to node in subtree of node with maximum key
• This is a subtree property, so constant factor overhead to maintain
• find_min() and find_max() in $O(1)$ time
• delete_min() and delete_max() in $O(log n)$ time
• build(A) in $O(n)$ time
• Same bounds as binary heaps (and more)

### Set vs. Multiset

• While our Set interface assumes no duplicate keys, we can use these Sets to implement Multisets that allow items with duplicate keys:
• Each item in the Set is a Sequence (e.g., linked list) storing the Multiset items with the same key, which is the key of the Sequence
• In fact, without this reduction, binary heaps and AVL trees work directly for duplicate-key items (where e.g. delete_max deletes some item of maximum key), taking care to use ≤ constraints (instead of < in Set AVL Trees)

# 9 Breadth-First Search

## Notes

### Graph Applications

• Why? Graphs are everywhere!
• any network system has direct connection to graphs
• e.g., road networks, computer networks, social networks
• the state space of any discrete system can be represented by a transition graph
• e.g., puzzle & games like Chess, Tetris, Rubik’s cube
• e.g., application workflows, specifications

### Graph Definitions

• Graph $G = (V, E)$ is a set of vertices $$V$$ and a set of pairs of vertices $E \subseteq V \times V$
• Directed edges are ordered pairs, e.g., $(u, v)$ for $u, v \in V$
• Undirected edges are unordered pairs, $\{u, v\}$ for $u, v \in V$, i.e., $(u, v)$ and $(v, u)$
• In this class, we assume all graphs are simple:
• edges are distinct, e.g., $(u, v)$ only occurs once in E (though $(v, u)$ may appear), and
• edges are pairs of distinct vertices, e.g., $u \neq v$ for all $(u, v) ∈ E$
• Simple implies $\vert E \vert = O(\vert V \vert^2)$, since $\vert E \vert \leq \binom{\vert V \vert}{2}$ for undirected, $\leq 2\binom{\vert V \vert}{2}$ for directed

• The outgoing neighbor set of $u ∈ V$ is $Adj^+(u) = \{v \in V \mid (u, v) \in E\}$
• The incoming neighbor set of $u ∈ V$ is $Adj^-(u) = \{v \in V \mid (v, u) \in E\}$
• The out-degree of a vertex $u ∈ V$ is $\deg^+(u) = \vert Adj^+(u) \vert$
• The in-degree of a vertex $u ∈ V$ is $\deg^-(u) = \vert Adj^-(u) \vert$
• For undirected graphs, $Adj^-(u) = Adj^+(u)$ and $deg^-(u) = deg^+(u)$
• Dropping superscript defaults to outgoing, i.e., $Adj(u) = Adj^+(u)$ and $deg(u) = deg^+(u)$

### Graph Representations

• To store a graph $G = (V, E)$, we need to store the outgoing edges $Adj(u)$ for all $u ∈ V$
• First, need a Set data structure $Adj$ to map $$u$$ to $Adj(u)$
• Then for each $$u$$, need to store $Adj(u)$ in another data structure called an adjacency list
• Common to use direct access array or hash table for $Adj$, since want lookup fast by vertex
• Common to use array or linked list for each $Adj(u)$ since usually only iteration is needed
• For the common representations, $Adj$ has size $Θ(\vert V \vert )$, while each $Adj(u)$ has size $Θ(deg(u))$
• Since $\Sigma_{u∈V} \deg(u) ≤ 2\vert E \vert$ by the handshaking lemma, the graph is storable in $Θ(\vert V \vert + \vert E \vert)$ space
• Thus, for algorithms on graphs, linear time will mean $Θ(\vert V \vert + \vert E \vert)$ (linear in the size of the graph)
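A small illustrative example of the common representation (hypothetical graph; vertices labeled 0..3, a plain list of lists for Adj):

# Directed graph with edges (0,1), (0,2), (1,2), (2,0):
adj = [
    [1, 2],  # Adj(0)
    [2],     # Adj(1)
    [0],     # Adj(2)
    [],      # Adj(3): isolated vertex
]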

### Paths

• A path is a sequence of vertices $p = (v_1, v_2, . . . , v_k)$ where $(v_i, v_{i+1}) ∈ E$ for all $1 ≤ i < k$.
• A path is simple if it does not repeat vertices
• The length $\ell(p)$ of a path $p$ is the number of edges in the path
• The distance $δ(u, v)$ from $u ∈ V$ to $v ∈ V$ is the minimum length of any path from $$u$$ to $$v$$
• i.e., the length of a shortest path from $$u$$ to $$v$$
• (by convention, $δ(u, v) = ∞$ if $$u$$ is not connected to $$v$$)

### Graph Path Problems

• There are many problems you might want to solve concerning paths in a graph:
• SINGLE_PAIR_REACHABILITY(G, s, t): is there a path in $G$ from $s ∈ V$ to $t ∈ V$ ?
• SINGLE_PAIR_SHORTEST_PATH(G, s, t): return distance $δ(s,t)$, and a shortest path in $G = (V, E)$ from $s ∈ V$ to $t ∈ V$
• SINGLE_SOURCE_SHORTEST_PATHS(G, s): return $δ(s, v)$ for all $v ∈ V$ , and a shortest-path tree containing a shortest path from $s$ to every $v ∈ V$ (defined below)
• Each problem above is at least as hard as every problem above it (i.e., you can use a black-box that solves a lower problem to solve any higher problem)
• We won’t show algorithms to solve all of these problems
• Instead, we show one algorithm that solves the hardest in $O(\vert V \vert + \vert E \vert)$ time!

### Shortest Paths Tree

• How to return a shortest path from source vertex $s$ for every vertex in graph?
• Many paths could have length $Ω(\vert V \vert )$, so returning every path could require $Ω(\vert V \vert ^2)$ time
• Instead, for all $v ∈ V$ , store its parent $P(v)$: second to last vertex on a shortest path from $s$
• Let $P(s)$ be null (no second to last vertex on shortest path from $s$ to $s$)
• The set of parents comprises a shortest paths tree with $O(\vert V \vert)$ size (i.e., reversed shortest paths back to $s$ from every vertex reachable from $s$)

• How to compute $δ(s, v)$ and $P(v)$ for all $v ∈ V$ ?
• Store $δ(s, v)$ and $P(v)$ in Set data structures mapping vertices $$v$$ to distance and parent
• (If no path from $s$ to $$v$$, do not store $$v$$ in $P$ and set $δ(s, v)$ to $∞$)
• Idea! Explore graph nodes in increasing order of distance
• Goal: Compute level sets $L_i = \{v \in V \mid \delta(s, v) = i\}$ (i.e., all vertices at distance $i$)
• Claim: Every vertex $v ∈ L_i$ must be adjacent to a vertex $u ∈ L_{i-1}$ (i.e., $v ∈ Adj(u)$)
• Claim: No vertex that is in $L_j$ for some $j < i$, appears in $L_i$
• Invariant: $δ(s, v)$ and $P(v)$ have been computed correctly for all $$v$$ in any $L_j$ for $j < i$
• Base case $(i = 1)$: $L_0 = \{s\}$, $δ(s, s) = 0$, $P(s) = \text{None}$
• Inductive Step: To compute $L_i$:
• for every vertex $$u$$ in $L_{i-1}$:
• for every vertex $v ∈ Adj(u)$ that does not appear in any $L_j$ for $j < i$:
• add $$v$$ to $L_i$, set $δ(s, v) = i$, and set $P(v) = u$
• Repeatedly compute $L_i$ from $L_j$ for $j < i$ for increasing $i$ until $L_i$ is the empty set
• Set $δ(s, v) = ∞$ for any $v ∈ V$ for which $δ(s, v)$ was not set
• Breadth-first search correctly computes all $δ(s, v)$ and $P(v)$ by induction
• Running time analysis:
• Store each $L_i$ in a data structure with $Θ(\vert L_i \vert)$-time iteration and $O(1)$-time insertion (e.g., a dynamic array or linked list)
• Checking for a vertex $$v$$ in any $L_j$ for $j < i$ can be done by checking for $$v$$ in $P$
• Maintain $δ$ and $P$ in Set data structures supporting dictionary ops in $O(1)$ time (i.e., direct access array or hash table)
• Algorithm adds each vertex $$u$$ to $≤ 1$ level and spends $O(1)$ time for each $v ∈ Adj(u)$
• Work upper bounded by $O(1) × \Sigma_{u\in V} deg(u) = O(\vert E \vert )$ by handshake lemma
• Spend $Θ(\vert V \vert)$ at end to assign $δ(s, v)$ for vertices $v ∈ V$ not reachable from $s$
• So breadth-first search runs in linear time! $O(\vert V \vert + \vert E \vert)$
def bfs(adj, s):                           # breadth-first search from s
    parent = [None for v in adj]
    parent[s] = s
    levels = [[s]]
    while 0 < len(levels[-1]):
        level = []
        for u in levels[-1]:
            for v in adj[u]:               # explore u's outgoing adjacencies
                if parent[v] is None:      # v not yet reached: u is its parent
                    parent[v] = u
                    level.append(v)
        levels.append(level)
    return parent

def unweighted_shortest_path(adj, s, t):   # reconstruct a path from bfs parents
    parent = bfs(adj, s)
    if parent[t] is None:                  # t is unreachable from s
        return None
    i = t
    path = [t]
    while i != s:                          # follow parent pointers back to s
        i = parent[i]
        path.append(i)
    return list(reversed(path))            # reverse to run from s to t
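For concreteness, a small usage example with the functions above (the graph here is made up for illustration):

# hypothetical undirected path graph 0-1-2-3-4, plus isolated vertex 5
adj = [[1], [0, 2], [1, 3], [2, 4], [3], []]
print(bfs(adj, 0))                          # [0, 0, 1, 2, 3, None]; 5 is unreachable
print(unweighted_shortest_path(adj, 0, 4))  # [0, 1, 2, 3, 4]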

# 10 Depth-First Search

## Notes

### Depth-First Search (DFS)

• Searches a graph from a vertex $s$, similar to BFS
• Solves Single Source Reachability, not Single Source Shortest Paths. Useful for solving other problems (later)!
• Return (not necessarily shortest) parent tree of parent pointers back to $s$.
• Idea! Visit outgoing adjacencies recursively, but never revisit a vertex
• i.e., follow any path until you get stuck, backtrack until finding an unexplored path to explore
• $P(s) = None$, then run $visit(s)$, where
• visit(u)
• for every $v \in Adj(u)$ that does not appear in $P$:
• set $P(v)= u$ and recursively call visit(v)
• (DFS finishes visiting vertex $$u$$, for use later!)
def dfs(adj, s, parent=None, order=None):  # recursive DFS from s
    if parent is None:                     # top-level call: initialize
        parent = [None for v in adj]
        parent[s] = s
        order = []
    for v in adj[s]:
        if parent[v] is None:              # never revisit a vertex
            parent[v] = s
            dfs(adj, v, parent, order)
    order.append(s)                        # s finished: record finishing order
    return parent, order

### Correctness

• Claim: DFS visits $$v$$ and correctly sets $P(v)$ for every vertex $$v$$ reachable from $s$
• Proof: induct on $k$, restricting the claim to vertices within distance $k$ from $s$
• Base case $(k = 0): P(s)$ is set correctly for $s$ and $s$ is visited
• Inductive step: Consider vertex $$v$$ with $δ(s, v) = k' + 1$
• Consider vertex $$u$$, the second to last vertex on some shortest path from $s$ to $$v$$
• By induction, since $δ(s, u) = k'$, DFS visits $$u$$ and sets $P(u)$ correctly
• While visiting $$u$$, DFS considers $v ∈ Adj(u)$
• Either $$v$$ is in $P$, so has already been visited, or $$v$$ will be visited while visiting $$u$$
• In either case, $$v$$ will be visited by DFS and will be added correctly to $P$ $\square$

### Running Time

• Algorithm visits each vertex $$u$$ at most once and spends $O(1)$ time for each $v ∈ Adj(u)$
• Work upper bounded by $O(1) × \Sigma_{u \in V}deg(u) = O(\vert E \vert )$
• Unlike BFS, not returning a distance for each vertex, so DFS runs in $O(\vert E \vert )$ time

### Full-BFS and Full-DFS

• Suppose want to explore entire graph, not just vertices reachable from one vertex
• Idea! Repeat a graph search algorithm $A$ on any unvisited vertex
• Repeat the following until all vertices have been visited:
• Choose an arbitrary unvisited vertex $s$, use $A$ to explore all vertices reachable from $s$
• We call this algorithm Full-A, specifically Full-BFS or Full-DFS if A is BFS or DFS
• Visits every vertex once, so both Full-BFS and Full-DFS run in $O(\vert V \vert + \vert E \vert )$ time
def full_dfs(adj):                         # run DFS from every unvisited vertex
    parent = [None for v in adj]
    order = []
    for v in range(len(adj)):
        if parent[v] is None:              # v not yet visited by any search
            parent[v] = v
            dfs(adj, v, parent, order)
    return parent, order

### DFS Edge Classification

• Consider a graph edge from vertex $$u$$ to $$v$$, we call the edge a tree edge if the edge is part of the DFS tree (i.e. $parent[v] = u$)
• Otherwise, the edge from $$u$$ to $$v$$ is not a tree edge, and is either:
• a back edge - $$u$$ is a descendant of $$v$$
• a forward edge - $$v$$ is a descendant of $$u$$
• a cross edge - neither are descendants of each other
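One way to compute this classification (a sketch, not part of the notes): record entry/exit timestamps during DFS; an edge $(u, v)$ is then a back edge if $v$ has been entered but not exited, a forward edge if $v$ finished and was entered after $u$, and a cross edge otherwise.

def classify_edges(adj):
    n = len(adj)
    enter, leave = [None] * n, [None] * n
    kind = {}                                # maps directed edge (u, v) -> class
    time = 0
    def visit(u):
        nonlocal time
        enter[u] = time; time += 1
        for v in adj[u]:
            if enter[v] is None:
                kind[(u, v)] = 'tree'        # v first discovered via (u, v)
                visit(v)
            elif leave[v] is None:
                kind[(u, v)] = 'back'        # v is an ancestor, still being visited
            elif enter[v] > enter[u]:
                kind[(u, v)] = 'forward'     # v is an already-finished descendant
            else:
                kind[(u, v)] = 'cross'       # neither is a descendant of the other
        leave[u] = time; time += 1
    for u in range(n):                       # Full-DFS, so every edge is classified
        if enter[u] is None:
            visit(u)
    return kind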

### Graph Connectivity

• An undirected graph is connected if there is a path connecting every pair of vertices
• In a directed graph, vertex $$u$$ may be reachable from $$v$$, but $v$ may not be reachable from $$u$$
• Connectivity is more complicated for directed graphs (we won’t discuss in this class)
• Connectivity(G): is undirected graph G connected?
• Connected_Components(G): given undirected graph $G = (V, E)$, return partition of $V$ into subsets $V_i ⊆ V$ (connected components) where each $V_i$ is connected in $G$ and there are no edges between vertices from different connected components
• Consider a graph algorithm $A$ that solves Single Source Reachability
• Claim: $A$ can be used to solve Connected Components
• Proof: Run Full-$A$. For each run of $A$, put visited vertices in a connected component $\square$

### Topological Sort

• A Directed Acyclic Graph (DAG) is a directed graph that contains no directed cycle
• A Topological Order of a graph $G = (V, E)$ is an ordering $f$ on the vertices such that: every edge $(u, v) ∈ E$ satisfies $f(u) < f(v)$
• Exercise: Prove that a directed graph admits a topological ordering if and only if it is a DAG
• How to find a topological order?
• A Finishing Order is the order in which a Full-DFS finishes visiting each vertex in G
• Claim: If $G = (V, E)$ is a DAG, the reverse of a finishing order is a topological order
• Proof: Need to prove, for every edge $(u, v) ∈ E$, that $$u$$ is ordered before $v$, i.e., the visit to $v$ finishes before visiting $$u$$. Two cases:
• If $$u$$ visited before $v$:
• Before visit to $$u$$ finishes, will visit $v$ (via $(u, v)$ or otherwise)
• Thus the visit to $v$ finishes before visiting $$u$$
• If $v$ visited before $$u$$:
• $$u$$ can’t be reached from $v$ since graph is acyclic
• Thus the visit to $v$ finishes before visiting $$u$$
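A minimal sketch of this claim as code, reusing the `dfs` and `full_dfs` routines above:

def topological_order(adj):
    # reverse of the Full-DFS finishing order; a topological order iff adj is a DAG
    _, order = full_dfs(adj)
    return list(reversed(order))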

### Cycle Detection

• Full-DFS will find a topological order if a graph $G = (V, E)$ is acyclic
• If reverse finishing order for Full-DFS is not a topological order, then $G$ must contain a cycle
• Check if $G$ is acyclic: for each edge $(u, v)$, check if $v$ is before $$u$$ in reverse finishing order
• Can be done in $O(\vert E \vert )$ time via a hash table or direct access array
• To return such a cycle, maintain the set of ancestors along the path back to $s$ in Full-DFS
• Claim: If $G$ contains a cycle, Full-DFS will traverse an edge from $v$ to an ancestor of $v$
• Proof: Consider a cycle $(v_0, v_1, . . . , v_k, v_0)$ in $G$
• Without loss of generality, let $v_0$ be the first vertex visited by Full-DFS on the cycle
• For each $v_i$, before visit to $v_i$ finishes, will visit $v_{i+1}$ and finish
• Will consider edge $(v_i, v_{i+1})$, and if $v_{i+1}$ has not been visited, it will be visited now
• Thus, before visit to $v_0$ finishes, will visit $v_k$ (for the first time, by $v_0$ assumption)
• So, before visit to $v_k$ finishes, will consider $(v_k, v_0)$, where $v_0$ is an ancestor of $v_k$ $\square$

# 11 Weighted Shortest Paths

## Notes

### Weighted Graphs

• A weighted graph is a graph $G = (V, E)$ together with a weight function $w: E \rightarrow \mathbb{Z}$

• i.e., assign each edge $e = (u, v) \in E$ an integer weight: $w(e) = w(u, v)$

• Many applications for edge weights in a graph:

• latency in network connections
• strength of a relationship in a social network
• Two common ways to represent weights computationally:

• Inside graph representation: store each edge's weight with its entry in the adjacency lists
• Store separate Set data structure mapping each edge to its weight
• We assume a representation that allows querying the weight of an edge in $O(1)$ time
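As one concrete option (a sketch; the notes do not fix a representation), store weights in a separate dictionary keyed by edge, so the `w(u, v)` callable assumed by the relaxation code later is an $O(1)$ lookup:

# hypothetical 3-vertex weighted digraph
adj = [[1, 2], [2], [0]]
W = {(0, 1): 3, (0, 2): -1, (1, 2): 7, (2, 0): 4}    # Set data structure: edge -> weight
def w(u, v): return W[(u, v)]                        # O(1) weight query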

• Examples: (figures of example weighted graphs $G_1$ and $G_2$ omitted)

### Weighted Paths

• The weight $w(\pi)$ of a path $\pi$ in a weighted graph is the sum of weights of edges in the path
• The (weighted) shortest path from $s \in V$ to $t \in V$ is path of minimum weight from $s$ to $t$
• $\delta(s, t) = \inf\{w(\pi) \mid \pi \text{ is a path from } s \text{ to } t\}$ is the shortest-path weight from $s$ to $t$
• (Often use “distance” for shortest-path weight in weighted graphs, not number of edges)
• As with unweighted graphs:
• $\delta(s, t) = \infty$ if there is no path from $s$ to $t$
• Subpaths of shortest paths are shortest paths (or else could splice in a shortest path)
• Why infimum not minimum? Possible that no finite-length minimum-weight path exists
• When? Can occur if there is a negative-weight cycle in the graph, Ex: $(b, f, g, c, b)$ in $G_1$

• A negative-weight cycle is a path $\pi$ starting and ending at the same vertex with $w(\pi) < 0$
• $\delta (s, t) = - \infty$ if there is a path from $s$ to $t$ through a vertex on a negative-weight cycle
• If this occurs, don’t want a shortest path, but may want the negative-weight cycle

### Weighted Shortest Paths Algorithms

• Already know one algorithm: Breadth-First Search! Runs in $O(\vert V \vert + \vert E \vert )$ time when, e.g.:
• graph has positive weights, and all weights are the same
• graph has positive weights, and sum of all weights at most $O(\vert V \vert + \vert E \vert )$
• For general weighted graphs, we don’t know how to solve SSSP in $O(\vert V \vert + \vert E \vert )$ time
• But if your graph is a Directed Acyclic Graph you can!
| Graph | Weights | SSSP Algorithm | Running Time $O(\cdot)$ |
|-------|---------|----------------|--------------------------|
| General | Unweighted | BFS | $\vert V \vert + \vert E \vert$ |
| DAG | Any | DAG Relaxation | $\vert V \vert + \vert E \vert$ |
| General | Any | Bellman-Ford | $\vert V \vert \cdot \vert E \vert$ |
| General | Non-negative | Dijkstra | $\vert V \vert \log \vert V \vert + \vert E \vert$ |

### Shortest-Paths Tree

• For BFS, we kept track of parent pointers during search. Alternatively, compute them after!
• If we know $\delta(s, v)$ for all vertices $v\in V$, we can construct a shortest-paths tree in $O(\vert V \vert + \vert E \vert)$ time (see the sketch after this list)
• For weighted shortest paths from $s$, only need parent pointers for vertices $v$ with finite $\delta(s, v)$
• Initialize empty $P$ and set $P(s) = None$
• For each vertex $u\in V$ where $\delta(s, u)$ is finite:
• For each outgoing neighbor $v \in Adj^+(u)$:
• If $P(v)$ not assigned and $\delta(s, v) = \delta(s, u) + w(u, v)$
• There exists a shortest path through edge $(u, v)$, so set $P(v) = u$
• Parent pointers may traverse cycles of zero weight. Mark each vertex in such a cycle.
• For each unmarked vertex $u\in V$ (including vertices later marked):
• For each $v\in Adj^+(u)$ where $v$ is marked and $\delta(s, v) = \delta(s, u) + w(u, v)$
• Unmark vertices in cycle containing $v$ by traversing parent pointers from $v$
• Set $P(v) = u$, breaking the cycle
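A sketch of the first pass above (it omits the zero-weight-cycle repair, so it assumes no zero-weight cycles among shortest paths; `delta` is the precomputed list of distances):

def shortest_paths_tree(adj, w, s, delta):
    parent = [None] * len(adj)
    parent[s] = s                            # convention used by the code in these notes
    for u in range(len(adj)):
        if delta[u] == float('inf'):         # only vertices with finite distance
            continue
        for v in adj[u]:
            if parent[v] is None and delta[v] == delta[u] + w(u, v):
                parent[v] = u                # some shortest path to v ends with (u, v)
    return parent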

### Relaxation

• A relaxation algorithm searches for a solution to an optimization problem by starting with a solution that is not optimal, then iteratively improves the solution until it becomes an optimal solution to the original problem.
def try_to_relax(adj, w, d, parent, u, v):
    if d[v] > d[u] + w(u, v):          # edge (u, v) violates the triangle inequality
        d[v] = d[u] + w(u, v)          # relax: lower estimate to weight of path via u
        parent[v] = u

def general_relax(adj, w, s):          # relax edges until no violations remain
    d = [float('inf') for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    while True:
        edge = get_relaxable_edge(adj, w, d)   # assumed: a relaxable edge, or None
        if edge is None:
            return d, parent
        (u, v) = edge
        try_to_relax(adj, w, d, parent, u, v)

### DAG Relaxation

• Idea! Maintain a distance estimate $d(s, v)$ (initially $\infty$ ) for each vertex $v\in V$, that always upper bounds true distance $\delta(s, v)$, then gradually lowers until $d(s, v) = \delta(s, v)$
• When do we lower? When an edge violates the triangle inequality!
• Triangle Inequality: the shortest-path weight from $$u$$ to $v$ cannot be greater than the shortest-path weight from $$u$$ to $v$ through another vertex $x$, i.e., $\delta(u, v) \leq \delta(u, x) + \delta(x, v)$ for all $u, v, x \in V$
• If $d(s, v) > d(s, u) + w(u, v)$ for some edge $u, v$, then triangle inequality is violated :(
• Fix by lowering $d(s, v)$ to $d(s, u) + w(u, v)$, i.e., relax $(u, v)$ to satisfy violated constraint
• Claim: Relaxation is safe: maintains that each $d(s, v)$ is weight of a path to $v$ (or $\infty$) $\forall v \in V$
• Proof: Assume $d(s, v')$ is the weight of a path (or $\infty$) for all $v' \in V$. Relaxing some edge $(u, v)$ sets $d(s, v)$ to $d(s, u) + w(u, v)$, which is the weight of a path from $s$ to $v$ through $$u$$ $\qquad \square$
• Set $d(s, v) = \infty$ for all $v \in V$, then set $d(s, s) = 0$
• Process each vertex $$u$$ in a topological sort order of G:
• For each outgoing neighbor $v \in Adj^+(u)$:
• If $d(s, v) > d(s, u) + w(u, v)$
• relax edge $(u, v)$, i.e., set $d(s, v) = d(s, u) + w(u, v)$
def dag_relax(adj, w, s, order):       # order: topological order of the DAG
    d = [float('inf') for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    for u in order:                    # process vertices in topological order
        for v in adj[u]:               # relax each outgoing edge of u
            try_to_relax(adj, w, d, parent, u, v)
    return d, parent

### Correctness

• Claim: At end of DAG Relaxation: $d(s, v) = \delta(s, v)$ for all $v\in V$
• Proof: Induct on $k$: $d(s, v) = \delta(s, v)$ for all $v$ in first $k$ vertices in topological order
• Base case: Vertex $s$ and every vertex before $s$ in topological order satisfies claim at start
• Inductive Step: Assume claim holds for first $k'$ vertices, let $v$ be the $(k'+1)^{th}$
• Consider a shortest path from $s$ to $v$, and let $$u$$ be the vertex preceding $v$ on path
• $$u$$ occurs before $v$ in topological order, so $d(s, u) = \delta(s, u)$ by induction
• When processing $u$, $d(s, v)$ is set to be no larger than $\delta(s, u) + w(u, v) = \delta(s, v)$
• But $d(s, v) \geq \delta(s, v)$ since relaxation is safe, so $d(s, v) = \delta(s, v)$
• Alternatively:
• For any vertex $v$, DAG relaxation sets $d(s, v) = \min(\{d(s, u) + w(u, v) \mid u \in Adj^-(v)\} \cup \{\infty\})$
• Shortest path to $v$ must pass through some incoming neighbor $$u$$ of $v$
• So if $d(s, u) = \delta(s, u)$ for all $u \in Adj^-(v)$ by induction, then $d(s, v) = \delta(s, v)$

### Running Time

• Initialization takes $O(\vert V \vert )$ time, and Topological Sort takes $O(\vert V \vert + \vert E \vert )$ time
• Additional work upper bounded by $O(1) \times \Sigma_{u\in V}deg^+(u) = O(\vert E \vert )$
• Total running time is linear, $O(\vert V \vert + \vert E \vert )$

# 12 Bellman-Ford

## Notes

### Simple Shortest Paths

• If graph contains cycles and negative weights, might contain negative-weight cycles :(
• If graph does not contain negative-weight cycles, shortest paths are simple!
• Claim 1: If $\delta(s, v)$ is finite, there exists a shortest path to $v$ that is simple
• Suppose no simple shortest path: let $\pi$ be a shortest path with fewest vertices
• $\pi$ not simple, so exists cycle $C$ in $\pi$; $C$ has non-negative weight (or else $\delta(s,v) = -\infty$)
• Removing $C$ from $\pi$ forms a path $\pi'$ with fewer vertices and weight $w(\pi') \leq w(\pi)$, a contradiction
• Since simple paths cannot repeat vertices, finite shortest paths contain at most $\vert V \vert - 1$ edges

### Negative Cycle Witness

• k-Edge Distance $\delta_k(s, v)$: the minimum weight of any path from $s$ to $v$ using $\leq k$ edges
• Idea! Compute $\delta_{\vert V \vert -1}(s, v)$ and $\delta_{\vert V \vert }(s, v)$ for all $v \in V$
• If $\delta(s, v) \neq -\infty$, then $\delta(s, v) = \delta_{\vert V \vert - 1}(s, v)$, since a shortest path is simple (or nonexistent)
• If $\delta_{\vert V \vert }(s,v) < \delta_{\vert V \vert - 1}(s, v)$
• there exists a shorter non-simple path to $v$, so $\delta_{\vert V \vert }(s, v) = -\infty$
• call $v$ a (negative cycle) witness
• However, there may be vertices with $-\infty$ shortest-path weight that are not witnesses
• Claim 2: if $\delta(s,v) = - \infty$, then $v$ is reachable from a witness
• Proof: Suffices to prove: every negative-weight cycle reachable from s contains a witness
• Consider a negative-weight cycle $C$ reachable from $s$
• For $v ∈ C$, let $v' ∈ C$ denote $v$'s predecessor in $C$, where $\Sigma_{v∈C} w(v', v) < 0$
• Then $\delta_{\vert V \vert}(s, v) \leq \delta_{\vert V \vert - 1}(s, v') + w(v', v)$ (RHS is the weight of some path on $\leq \vert V \vert$ vertices)
• so $\Sigma_{v\in C}\delta_{\vert V \vert}(s, v) \leq \Sigma_{v\in C}\delta_{\vert V \vert - 1}(s, v') + \Sigma_{v\in C}w(v', v) < \Sigma_{v\in C}\delta_{\vert V \vert - 1}(s, v')$
• If $C$ contains no witness, $\delta_{\vert V \vert }(s, v) \geq \delta_{\vert V \vert - 1}(s, v)$ for all $v \in C$, a contradiction $\qquad \square$

### Bellman-Ford

• Idea! Use graph duplication: make multiple copies (or levels) of the graph
• $\vert V \vert + 1$ levels: vertex $v_k$ in level $k$ represents reaching vertex $v$ from $s$ using $\leq k$ edges
• If edges only increase in level, resulting graph is a DAG!
• Construct new DAG $G’ = (V’, E’)$ from $G = (V, E)$
• $G'$ has $\vert V \vert(\vert V \vert + 1)$ vertices $v_k$ for all $v\in V$ and $k \in \{0, \ldots, \vert V \vert\}$
• $G'$ has $\vert V \vert(\vert V \vert + \vert E \vert)$ edges:
• $\vert V \vert$ edges $(v_{k-1}, v_k)$ for $k \in \{1, \ldots, \vert V \vert\}$ of weight zero for each $v \in V$
• $\vert V \vert$ edges $(u_{k-1}, v_k)$ for $k \in \{1, \ldots, \vert V \vert\}$ of weight $w(u, v)$ for each $(u, v) \in E$
• Run DAG Relaxation on $G’$ from $s_0$ to compute $\delta(s_0, v_k)$ for all $v_k \in V’$
• For each vertex $v \in V$: set $d(s, v) = \delta(s_0, v_{\vert V \vert - 1})$
• For each witness $u \in V$ where $\delta (s_0, u_{\vert V \vert }) < \delta(s_0, u_{\vert V \vert - 1})$
• For each vertex $v$ reachable from $$u$$ in $G$:
• set $d(s, v) = -\infty$
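A sketch that materializes $G'$ under the adjacency-list-plus-`w` conventions used throughout (the integer encoding of $v_k$ as $k \cdot \vert V \vert + v$ is our choice):

def level_graph(adj, w):
    V = len(adj)
    adj2 = [[] for _ in range((V + 1) * V)]      # |V| + 1 levels of |V| vertices
    W2 = {}
    for k in range(1, V + 1):
        for v in range(V):                       # zero-weight edge v_{k-1} -> v_k
            u2, v2 = (k - 1) * V + v, k * V + v
            adj2[u2].append(v2); W2[(u2, v2)] = 0
        for u in range(V):                       # level-(k-1) copy of each edge (u, v)
            for v in adj[u]:
                u2, v2 = (k - 1) * V + u, k * V + v
                adj2[u2].append(v2); W2[(u2, v2)] = w(u, v)
    return adj2, (lambda u, v: W2[(u, v)])

Running DAG Relaxation on `adj2` from $s_0 = s$ then yields $\delta(s_0, v_k) = \delta_k(s, v)$ for all $v_k$.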

# Equivalent round-based formulation: relax every edge in each of |V| - 1 rounds
def bellman_ford(adj, w, s):
    # initialization
    INF = float('inf')
    d = [INF for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    V = len(adj)

    # construct shortest paths in rounds
    for k in range(V - 1):
        for u in range(V):
            for v in adj[u]:
                try_to_relax(adj, w, d, parent, u, v)

    # check for negative-weight cycles reachable from s
    for u in range(V):
        for v in adj[u]:
            if d[v] > d[u] + w(u, v):
                raise Exception('found a negative-weight cycle!')
    return d, parent

TBC

### Running Time

• $G'$ has size $O(\vert V \vert(\vert V \vert + \vert E \vert))$ and can be constructed in as much time
• Running DAG Relaxation on $G'$ takes time linear in the size of $G'$
• Does $O(1)$ work for each vertex reachable from a witness
• Finding reachability of a witness takes $O(\vert E \vert )$ time, with at most $O(\vert V \vert )$ witnesses: $O(\vert V \vert \vert E \vert )$
• (Alternatively, connect super node $x$ to witnesses via 0-weight edges, linear search from $x$)
• Pruning $G$ at start to only subgraph reachable from $s$ yields $O(\vert V \vert \vert E \vert )$ time algorithm

# 13 Dijkstra’s Algorithm

## Notes

### Non-negative Edge Weights

• Idea! Generalize BFS approach to weighted graphs:
• Grow a sphere centered at source s
• Repeatedly explore closer vertices before further ones
• But how to explore closer vertices if you don’t know distances beforehand? :(
• Observation 1: If weights are non-negative, distances increase monotonically along shortest paths
• i.e., if vertex $$u$$ appears on a shortest path from $s$ to $v$, then $\delta(s, u) \leq \delta(s, v)$
• Let $V_x \subset V$ be the subset of vertices reachable within distance $\leq x$ from $s$
• If $v \in V_x$ then any shortest path from $s$ to $v$ only contains vertices from $V_x$
• Perhaps grow $V_x$ one vertex at a time! (but growing for every $x$ is slow if weights large)
• Observation 2: Can solve SSSP fast if given order of vertices in increasing distance from $s$
• Remove edges that go against this order (since cannot participate in shortest paths)
• May still have cycles if zero-weight edges: repeatedly collapse into single vertices
• Compute $\delta(s, v)$ for each $v\in V$ using DAG relaxation in $O(\vert V \vert + \vert E \vert )$ time

### Dijkstra’s Algorithm

• Idea! Relax edges from each vertex in increasing order of distance from source $s$

• Idea! Efficiently find next vertex in the order using a data structure

• Changeable Priority Queue $Q$ on items with keys and unique IDs, supporting operations:

| Operation | Specification |
|-----------|---------------|
| `Q.build(X)` | initialize $Q$ with items in iterator `X` |
| `Q.delete_min()` | remove an item with minimum key |
| `Q.decrease_key(id, k)` | find stored item with ID `id` and change its key to `k` |
• Implement by cross-linking a Priority Queue $Q’$ and a Dictionary $D$ mapping IDs into $Q’$

• Assume vertex IDs are integers from $0$ to $\vert V \vert -1$ so can use a direct access array for D

• For brevity, say item x is the tuple $(x.id, x.key)$

• Set $d(s, v) = \infty$ for all $v \in V$, then set $d(s,s) = 0$

• Build changeable priority queue $Q$ with an item $(v, d(s, v))$ for each vertex $v \in V$

• While $Q$ is not empty: delete an item $(u, d(s, u))$ from $Q$ with minimum key
• For each vertex $v$ in outgoing adjacencies $Adj^+(u)$:
• If $d(s, v)>d(s,u) + w(u, v)$:
• Relax edge $(u, v)$, i.e., set $d(s, v) = d(s, u) + w(u, v)$
• Decrease the key of $v$ in $Q$ to new estimate $d(s, v)$

def dijkstra(adj, w, s):
    d = [float('inf') for _ in adj]
    parent = [None for _ in adj]
    d[s], parent[s] = 0, s
    V = len(adj)
    Q = PriorityQueue()
    for v in range(V):
        Q.insert(v, d[v])               # label and key
    for _ in range(V):
        u = Q.extract_min()             # get label for item with min key
        for v in adj[u]:                # relax each outgoing edge of u
            try_to_relax(adj, w, d, parent, u, v)
            Q.decrease_key(v, d[v])     # alter key for given label
    return d, parent
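Example run on a small made-up weighted digraph (works with either `PriorityQueue` implementation below):

adj = [[1, 2], [3], [1, 3], []]                          # 0->1, 0->2, 2->1, 1->3, 2->3
W = {(0, 1): 4, (0, 2): 1, (2, 1): 2, (1, 3): 1, (2, 3): 5}
d, parent = dijkstra(adj, lambda u, v: W[(u, v)], 0)
print(d)                                  # [0, 3, 1, 4]; shortest path to 3 is 0->2->1->3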

class PriorityQueue:                    # dictionary-based: O(1) insert, O(n) extract_min
    def __init__(self):
        self.A = {}

    def insert(self, label, key):
        self.A[label] = key

    def extract_min(self):
        min_label = None
        for label in self.A:
            if (min_label is None) or (self.A[label] < self.A[min_label]):
                min_label = label
        del self.A[min_label]
        return min_label

    def decrease_key(self, label, key):
        if (label in self.A) and (key < self.A[label]):
            self.A[label] = key

class Item:
    def __init__(self, label, key):
        self.label, self.key = label, key

    def __lt__(self, other):
        return self.key < other.key

class PriorityQueue:                    # binary heap: O(log n) operations
    def __init__(self):
        self.A = []
        self.label_2_idx = dict()       # label -> index in A, for decrease_key

    # custom sift helpers keep label_2_idx consistent (heapq's sifts would not)
    def _swap(self, i, j):
        self.A[i], self.A[j] = self.A[j], self.A[i]
        self.label_2_idx[self.A[i].label] = i
        self.label_2_idx[self.A[j].label] = j

    def _sift_up(self, idx):            # move item up while smaller than its parent
        while 0 < idx:
            p = (idx - 1) // 2
            if not self.A[idx] < self.A[p]:
                break
            self._swap(idx, p)
            idx = p

    def _sift_down(self, idx):          # move item down while larger than a child
        n = len(self.A)
        while True:
            c = 2 * idx + 1
            if n <= c:
                break
            if c + 1 < n and self.A[c + 1] < self.A[c]:
                c += 1
            if not self.A[c] < self.A[idx]:
                break
            self._swap(idx, c)
            idx = c

    def insert(self, label, key):
        self.A.append(Item(label, key))
        idx = len(self.A) - 1
        self.label_2_idx[label] = idx
        self._sift_up(idx)

    def extract_min(self):
        label = self.A[0].label
        self._swap(0, len(self.A) - 1)  # move min to the end, then remove it
        self.label_2_idx.pop(label)
        self.A.pop()
        if self.A:
            self._sift_down(0)
        return label

    def decrease_key(self, label, key):
        if label in self.label_2_idx:
            idx = self.label_2_idx[label]
            if key < self.A[idx].key:   # only ever lower a key
                self.A[idx].key = key
                self._sift_up(idx)

### Correctness

• Claim: At end of Dijkstra’s algorithm $d(s, v) = \delta(s, v)$ for all $v \in V$

• Proof:

• If relaxation sets $d(s, v)$ to $\delta(s, v)$, then $d(s, v) = \delta(s, v)$ at the end of the algorithm

• Relaxation can only decrease estimates $d(s, v)$
• Relaxation is safe, i.e., maintains that each $d(s, v)$ is weight of a path to $v$ (or $\infty$)
• Suffices to show $d(s, v) = \delta(s, v)$ when vertex $v$ is removed from $Q$

• Proof by induction on first $k$ vertices removed from $Q$

• Base Case ($k$ = 1): $s$ is first vertex removed from $Q$, and $d(s, s) = 0 = δ(s, s)$

• Inductive Step: Assume true for $k < k'$, consider the $k'$-th vertex $v'$ removed from $Q$

• Consider some shortest path $π$ from $s$ to $v'$, with $w(π) = δ(s, v')$

• Let $(x, y)$ be the first edge in $π$ where $y$ is not among the first $k' - 1$ removed (perhaps $y = v'$)

• When $x$ was removed from $Q$, $d(s, x) = δ(s, x)$ by induction, so: \begin{align} d(s, y) &\leq \delta(s, x) + w(x, y) \\ &=\delta(s, y) \\ &\leq \delta(s, v') \\ &\leq d(s, v') \\ &\leq d(s, y) \\ \end{align}

• So $d(s, v') = \delta(s, v')$ as desired

### Running Time

• Count operations on changeable priority queue Q, assuming it contains n items:

| Operation | Time | Occurrences in Dijkstra |
|-----------|------|-------------------------|
| `Q.build(X)` | $B_n$ | 1 |
| `Q.delete_min()` | $M_n$ | $\vert V \vert$ |
| `Q.decrease_key(id, k)` | $D_n$ | $\vert E \vert$ |
• Total running time is $O(B_{\vert V \vert } + \vert V \vert \cdot M_{\vert V \vert } + \vert E \vert \cdot D_{\vert V \vert })$

• Assume the graph is pruned to search only vertices reachable from the source, so $\vert V \vert = O(\vert E \vert)$
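For intuition (standard bounds, not derived in these notes), substituting common implementations into this formula:

• Dictionary or direct access array (first implementation above): $B_n = O(n)$, $M_n = O(n)$, $D_n = O(1)$, giving $O(\vert V \vert^2 + \vert E \vert) = O(\vert V \vert^2)$: good for dense graphs
• Binary heap (second implementation above): $B_n = O(n)$, $M_n = D_n = O(\log n)$, giving $O((\vert V \vert + \vert E \vert) \log \vert V \vert)$: good for sparse graphs
• Fibonacci heap (not implemented here): $M_n = O(\log n)$ and $D_n = O(1)$ amortized, giving the $O(\vert V \vert \log \vert V \vert + \vert E \vert)$ bound quoted in the algorithm table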

# 15 Recursive Algorithms

## Notes

### Design your own recursive algorithm

• Constant-sized program to solve arbitrary input
• Need looping or recursion, analyze by induction
• Recursive function call: vertex in a graph, directed edge from $A\rightarrow B$ if $B$ calls $A$
• Dependency graph of recursive calls must be acyclic (if can terminate)
• Classify based on shape of graph
| Class | Graph |
|-------|-------|
| Brute Force | Star |
| Decrease & Conquer | Chain |
| Divide & Conquer | Tree |
| Dynamic Programming | DAG |
| Greedy / Incremental | Subgraph |
• Hard part is thinking inductively to construct recurrence on subproblems
• How to solve a problem recursively (SRT BOT)
• Subproblem definition
• Relate subproblem solutions recursively
• Topological order on subproblems ($\Rightarrow$ subproblem DAG)
• Base cases of relation
• Original problem solution via subproblem(s)
• Time analysis

### Merge Sort in SRT BOT Framework

• Merge sorting an array $A$ of $$n$$ elements can be expressed in SRT BOT as follows:
• Subproblems: $S(i,j)=$ sorted array on elements of $A[i:j]$ for $0 \leq i \leq j \leq n$
• Relation: $S(i,j) = merge(S(i, m), S(m, j))$ where $m=\lfloor (i+j) / 2 \rfloor$
• Topological order: Increasing $j - i$
• Base cases: $S(i, i + 1) = [A[i]]$
• Original: $S(0, n)$
• Time: $T(n) = 2T(n/2) + O(n) = O(n \log n)$
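A direct transcription of this SRT BOT specification into code (a sketch; index conventions as above):

def merge_sort(A, i=0, j=None):             # returns sorted array on A[i:j]
    if j is None: j = len(A)
    if j - i <= 1:                          # base case: at most one element
        return A[i:j]
    m = (i + j) // 2                        # split and solve the two halves
    L, R = merge_sort(A, i, m), merge_sort(A, m, j)
    S, l, r = [], 0, 0                      # merge sorted halves in O(j - i) time
    while l < len(L) or r < len(R):
        if r == len(R) or (l < len(L) and L[l] <= R[r]):
            S.append(L[l]); l += 1
        else:
            S.append(R[r]); r += 1
    return S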

### Fibonacci Numbers

• Compute the $$n$$th Fibonacci number $F_n$
• Subproblems: $F(i)=$ the $i$th Fibonacci number $F_i$ for $i \in \{0, 1, \ldots, n\}$
• Relation: $F(i) = F(i-1)+F(i-2)$ (definition of Fibonacci numbers)
• Topological order: Increasing $i$
• Base cases: $F(0) = 0, F(1) = 1$
• Original problem: $F(n)$
def fib(n):
    if n < 2: return n
    return fib(n-1) + fib(n-2)
• Divide and conquer implies a tree of recursive calls
• Time: $T(n) = T(n-1) + T(n-2) + O(1) > 2T(n-2)$, so $T(n) = \Omega(2^{n/2})$: exponential… :(
• Subproblem $F(k)$ computed more than once! ($F(n-k)$ times)
• Can we avoid this waste?

### Re-using Subproblem Solutions

• Either:
• Top down: record subproblem solutions in a memo and re-use
• Bottom up: solve subproblems in topological sort order (usually via loops)
• For Fibonacci, $n + 1$ subproblems (vertices) and $<2n$ dependencies (edges)
• Time to compute is then $O(n)$ additions
def fib(n):
    memo = dict()
    def F(i):
        if i < 2: return i
        if i not in memo:
            memo[i] = F(i-1) + F(i-2)
        return memo[i]
    return F(n)

def fib(n):
    F = dict()
    F[0], F[1] = 0, 1
    for i in range(2, n+1):
        F[i] = F[i-1] + F[i-2]
    return F[n]
• A subtlety is that Fibonacci numbers grow to $\Theta(n)$ bits long, potentially $\gg$ word size $w$
• Each addition costs $O(\lceil n / w \rceil)$ time
• So total cost is $O(n\lceil n / w \rceil) = O(n + n^2/w)$ time

### Dynamic Programming

• Weird name coined by Richard Bellman
• Wanted government funding, needed cool name to disguise doing mathematics!
• Updating (dynamic) a plan or schedule (program)
• Existence of recursive solution implies decomposable subproblems
• Recursive algorithm implies a graph of computation
• Dynamic programming if subproblem dependencies overlap (DAG, in-degree > 1)
• “Recurse but re-use” (Top down: record and lookup subproblem solutions)
• “Careful brute force” (Bottom up: do each subproblem in order)
• Often useful for counting/optimization problems: almost trivially correct recurrences

### How to Solve a Problem Recursively (SRT BOT)

• Subproblem definition: subproblem $x\in X$
• Describe the meaning of a subproblem in words, in terms of parameters
• Often subsets of input: prefixes, suffixes, contiguous substrings of a sequence
• Often record partial state: add subproblems by incrementing some auxiliary variable
• Relate subproblem solutions recursively $x(i) = f(x(j),…)$ for one or more $j < i$
• Topological order: to argue relation is acyclic and subproblems form a DAG
• Base cases
• State solutions for all (reachable) independent subproblems where relation breaks down
• Original problem
• Show how to compute solution to original problem from solutions to subproblem(s)
• Possibly use parent pointers to recover actual solution, not just objective function
• Time analysis
• $\Sigma_{x\in X} \text{work}(x)$, or if $\text{work}(x) = O(W)$ for all $x\in X$, then $\vert X \vert \cdot O(W)$
• $work(x)$ measures non-recursive work in relation; treat recursions as taking $O(1)$ time

### DAG Shortest Paths

• DAG SSSP problem: given a DAG G and vertex $s$, compute $\delta(s, v)$ for all $v\in V$
• Subproblems: $\delta(s, v)$ for all $v\in V$
• Relation: $\delta(s, v) = \min(\{\delta(s, u) + w(u, v) \mid u \in Adj^-(v)\} \cup \{\infty\})$
• Topological order: Topological order of G
• Base case: $\delta(s, s) = 0$
• Original: All subproblems
• Time: $\Sigma_{v\in V}O(1 + \vert Adj^-(v)\vert) = O(\vert V \vert + \vert E \vert)$
• DAG Relaxation computes the same min values as this dynamic program, just
• step-by-step (if new value < min, update min via edge relaxation), and
• from the perspective of $$u$$ and $Adj^+(u)$ instead of $v$ and $Adj^-(v)$

### How to Relate Subproblem Solutions

• The general approach we’re following to define a relation on subproblem solutions:
• Identify a question about a subproblem solution that, if you knew the answer to, would reduce to “smaller” subproblem(s)
• Then locally brute-force the question by trying all possible answers, and taking the best
• Alternatively, we can think of correctly guessing the answer to the question, and directly recursing; but then we actually check all possible guesses, and return the “best”
• The key for efficiency is for the question to have a small (polynomial) number of possible answers, so brute forcing is not too expensive
• Often (but not always) the non-recursive work to compute the relation is equal to the number of answers we’re trying

# 16 Dynamic Programming Subproblems

## Notes

### Longest Common Subsequence (LCS)

• Given two strings $A$ and $B$, find a longest (not necessarily contiguous) subsequence of $A$ that is also a subsequence of $B$.
• Example: $A = \text{hieroglyphology}$, $B = \text{michaelangelo}$
• Solution: $hello$ or $heglo$ or $iello$ or $ieglo$, all length 5
• Maximization problem on length of subsequence
1. Subproblems:
• $x(i, j)=$ length of the longest common subsequence of suffixes $A[i:]$ and $B[j:]$
• For $0\leq i \leq \vert A \vert$ and $0 \leq j \leq \vert B \vert$
2. Relate:
• Either first characters match or they don’t
• If first characters match, some longest common subsequence will use them
• (if no LCS uses first matched pair, using it will only improve solution)
• (if an LCS uses first in $A[i]$ but not first in $B[j]$, matching $B[j]$ is also optimal)
• If they do not match, they cannot both be in a longest common subsequence
• Guess whether $A[i]$ or $B[j]$ is not in LCS
• $x(i, j) = \begin{cases} x(i+1, j+1) + 1 & \text{if } A[i] = B[j] \\ \max\{x(i+1, j),\ x(i, j+1)\} & \text{otherwise} \end{cases}$
3. Topological order:
• Subproblem $x(i, j)$ depends only on subproblems with strictly larger $i$ or $j$ or both
• Simplest order to state: Decreasing $i + j$
• Nice order for bottom-up code: Decreasing $i$, then decreasing $j$
4. Base
• $x(i, \vert B \vert ) = x(\vert A \vert , j)=0$ (one string is empty)
5. Original problem
• Length of longest common subsequence of $A$ and $B$ is $x(0,0)$
• Store parent pointers to reconstruct the subsequence
• If the parent pointer increases both indices, add that character to LCS
6. Time:
• # subproblems: $(\vert A \vert + 1)\cdot(\vert B \vert + 1)$
• work per subproblem: $O(1)$
• $O(\vert A \vert \cdot \vert B \vert)$ running time
def lcs(A, B):                          # suffix version: x[i][j] for A[i:], B[j:]
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in reversed(range(a)):
        for j in reversed(range(b)):
            if A[i] == B[j]:
                x[i][j] = x[i + 1][j + 1] + 1
            else:
                x[i][j] = max(x[i + 1][j], x[i][j + 1])
    return x[0][0]

def lcs(A, B):                          # prefix version: x[i][j] for A[:i], B[:j]
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in range(1, a + 1):
        for j in range(1, b + 1):
            if A[i - 1] == B[j - 1]:    # compare last characters of the prefixes
                x[i][j] = x[i - 1][j - 1] + 1
            else:
                x[i][j] = max(x[i - 1][j], x[i][j - 1])
    return x[a][b]                      # answer for the full strings
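To recover an actual subsequence rather than just its length, walk the suffix table from $(0, 0)$, taking matched characters and otherwise moving toward the larger subproblem (a sketch built on the first implementation above; the helper name `lcs_string` is ours):

def lcs_string(A, B):
    a, b = len(A), len(B)
    x = [[0] * (b + 1) for _ in range(a + 1)]
    for i in reversed(range(a)):            # fill the table as in lcs above
        for j in reversed(range(b)):
            if A[i] == B[j]:
                x[i][j] = x[i + 1][j + 1] + 1
            else:
                x[i][j] = max(x[i + 1][j], x[i][j + 1])
    i = j = 0
    out = []
    while i < a and j < b:                  # replay the relation's choices
        if A[i] == B[j]:
            out.append(A[i]); i += 1; j += 1    # matched pair joins the LCS
        elif x[i + 1][j] >= x[i][j + 1]:
            i += 1
        else:
            j += 1
    return ''.join(out)

print(lcs_string('hieroglyphology', 'michaelangelo'))   # one of the length-5 answers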

### Longest Increasing Subsequence (LIS)

• Given a string $$A$$, find a longest (not necessarily contiguous) subsequence of $$A$$ that strictly increases (lexicographically).
• Example: $$A= carbohydrate$$
• Solution: $$abort$$ of length 5
• Maximization problem on length of subsequence
• Attempted solution:
• Natural subproblems are prefixes or suffixes of $$A$$, say suffix $$A[i :]$$
• Natural question about LIS of $$A[i :]$$: is $$A[i]$$ in the LIS? (2 possible answers)
• But then how do we recurse on $$A[i + 1 :]$$ and guarantee increasing subsequence?
• Fix: add constraint to subproblems to give enough structure to achieve increasing property
1. Subproblems
• $$x(i)$$ = length of longest increasing subsequence of suffix $$A[i:]$$ that includes $$A[i]$$
• For $$0\leq i \leq \vert A \vert$$
2. Relate
• We’re told that $$A[i]$$ is in LIS (first element)
• Next question: what is the second element of LIS?
• Could be any $$A[j]$$ where $$j > i$$ and $$A[j] > A[i]$$ (so increasing)
• Or $$A[i]$$ might be the last element of LIS
• $x(i) = \max(\{1 + x(j) \mid i < j < \vert A \vert,\ A[j] > A[i]\} \cup \{1\})$
3. Topological order:
• Decreasing $$i$$
4. Base
• No base case necessary, because we consider the possibility that $$A[i]$$ is last
5. Original problem
• What is the first element of LIS? Guess!
• Length of LIS of $$A$$ is $$\max\{x(i) \mid 0 \leq i < \vert A \vert\}$$
• Store parent pointers to reconstruct subsequence
6. Time
• # subproblems: $$\vert A \vert$$
• work per subproblem $$O( \vert A \vert )$$
• $$O( \vert A \vert ^2)$$ running time
• speed up to $$O(\vert A \vert \log \vert A \vert)$$ by doing only $$O(\log \vert A \vert)$$ work per subproblem, via AVL tree augmentation
def lis(A):
    a = len(A)
    x = [1] * a                          # x[i]: length of LIS of A[i:] that includes A[i]
    for i in reversed(range(a)):
        for j in range(i + 1, a):
            if A[j] > A[i]:              # A[j] can follow A[i] in an increasing subsequence
                x[i] = max(x[i], 1 + x[j])
    return max(x)
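Sanity check against the example above:

print(lis('carbohydrate'))   # 5, via e.g. the subsequence 'abort'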

### Alternating Coin Game

• Given sequence of $$n$$ coins of value $$v_0, v_1, \ldots, v_{n-1}$$
• Two players (“me” and “you”) take turns
• In a turn, take first or last coin among remaining coins
• My goal is to maximize total value of my taken coins, where I go first
• First solution exploits that this is a zero-sum game: I take all coins you don’t
1. Subproblems
• Choose subproblems that correspond to the state of the game
• For every contiguous subsequence of coins from $$i$$ to $$j$$, $$0\leq i \leq j < n$$
• $$x(i, j)=$$ maximum total value I can take starting from coins of values $$v_i, ..., v_j$$
2. Relate
• I must choose either coin $$i$$ or coin $$j$$ (Guess!)
• Then it’s your turn, so you’ll get values $$x(i+1, j)$$ or $$x(i, j-1)$$ respectively
• To figure out how much value I get, subtract this from total coin values
• $$x(i, j) = \max\{v_i + \Sigma_{k=i+1}^{j} v_k - x(i + 1, j),\ v_j + \Sigma_{k=i}^{j-1} v_k - x(i, j - 1)\}$$
3. Topological order
• Increasing $$j - i$$
4. Base
• $x(i,i) = v_i$
5. Original problem
• $x(0, n- 1)$
• store parent pointers to reconstruct strategy
6. Time
• # subproblems: $$\Theta(n^2)$$
• work per subproblem: $$\Theta(n)$$ to compute sums
• $$\Theta(n^3)$$ running time
• Speed up to $$\Theta(n^2)$$ time by pre-computing all sums $$\Sigma_{k=i}^{j}v_k$$ in $$\Theta(n^2)$$ time via dynamic programming
• Second solution uses subproblem expansion: add subproblems for when you move next
1. Subproblems
• Choose subproblems that correspond to the full state of the game
• Contiguous subsequence of coins from $$i$$ to $$j$$, and which player $$p$$ goes next
• $$x(i, j, p) =$$ maximum total value I can take when player $$p \in \{me, you\}$$ starts from coins of values $$v_i,...,v_j$$
2. Relate
• Player $$p$$ must choose either coin $$i$$ or coin $$j$$ (Guess!)
• If $$p =$$ me, then I get the value; otherwise, I get nothing
• Then it’s the other player’s turn
• $x(i, j, me) = max\{v_i + x(i + 1, j, you), v_j + x(i, j - 1, you)\}$
• $x(i, j, you) = min\{x(i + 1, j, me), x(i, j-1, me)\}$
3. Topological order
• Increasing $$j - i$$
4. Base
• $x(i, i, me) = v_i$
• $x(i,i,you)=0$
5. Original problem
• $x(0, n-1, me)$
• Store parent pointers to reconstruct strategy
6. Time
• # subproblems: $$\Theta(n^2)$$
• work per subproblem: $$\Theta(1)$$
• $$\Theta(n^2)$$ running time

Yet another alternative solution tracks the difference between my total and yours (one table, no player dimension), returning whether I can at least tie:

def coin_game(coins):
    n = len(coins)
    dp = [[0] * n for _ in range(n)]     # dp[i][j]: mover's total minus opponent's on coins[i..j]
    for i in reversed(range(n)):
        for j in range(i, n):
            if i == j:
                dp[i][j] = coins[i]      # one coin left: the mover takes it
            else:
                dp[i][j] = max(coins[i] - dp[i + 1][j], coins[j] - dp[i][j - 1])
    return dp[0][n - 1] >= 0             # True iff the first player can at least tie

def coin_game(coins):
    n = len(coins)
    dp = [[0] * n for _ in range(n)]         # dp[i][j]: mover's total minus opponent's on coins[i..j]
    parents = dict()                         # (i, j) -> (next state, coin taken)

    for i in reversed(range(n)):
        for j in range(i, n):
            if i == j:
                dp[i][j] = coins[i]          # one coin left: the mover takes it
                parents[(i, j)] = ((i, j), coins[i])
            else:
                a = coins[i] - dp[i + 1][j]  # value of taking the first coin
                b = coins[j] - dp[i][j - 1]  # value of taking the last coin
                if a > b:
                    dp[i][j] = a
                    parents[(i, j)] = ((i + 1, j), coins[i])
                else:
                    dp[i][j] = b
                    parents[(i, j)] = ((i, j - 1), coins[j])

    if dp[0][n - 1] >= 0:                    # replay the optimal line of play
        state, turn = (0, n - 1), 1
        while True:
            next_state, coin = parents[state]
            print(f"player {turn % 2} took {coin}")
            if next_state == state:          # base state: last coin just reported
                break
            state, turn = next_state, turn + 1

    return dp[0][n - 1] >= 0

### Subproblem Constraints and Expansion

• We’ve now seen two examples of constraining or expanding subproblems
• If you find yourself lacking information to check the desired conditions of the problem, or lack the natural subproblem to recurse on, try subproblem constraint/expansion!
• More subproblems and constraints give the relation more to work with, so can make DP more feasible
• Usually a trade-off between number of subproblems and branching/complexity of relation

# 17 Dynamic Programming III

## Notes

### Single-Source Shortest Paths Revisited

1. Subproblems
• Expand subproblems to add information to make acyclic!
• $$\delta_k(s,v)=$$ weight of shortest path from $$s$$ to $$v$$ using at most $$k$$ edges
• For $$v\in V$$ and $$0 \leq k \leq \vert V \vert$$
2. Relate:
• Guess last edge $$(u, v)$$ on shortest path from $$s$$ to $$v$$
• $\delta_k(s, v) = \min(\{\delta_{k-1}(s, u) + w(u, v) \mid (u, v) \in E\} \cup \{\delta_{k-1}(s, v)\})$
3. Topological order:
• Increasing k: subproblems depend on subproblems only with strictly smaller $$k$$.
4. Base
• $$\delta_{0}(s, s)=0$$ and $$\delta_0(s, v) = \infty$$ for $$v \neq s$$ (no edges)
5. Original problem
• If $$v$$ has a finite shortest-path weight, then $$\delta(s, v) = \delta_{\vert V \vert- 1}(s, v)$$
• Otherwise some $$\delta_{\vert V \vert }(s, v) < \delta_{\vert V \vert - 1}(s, v)$$, so path contains a negative-weight cycle
• Can keep track of parent pointers to subproblem that minimized recurrence
6. Time
• # subproblems: $$\vert V \vert \times (\vert V \vert +1)$$
• Work for subproblem $$\delta_k(s, v)$$: $$O(\deg_{in}(v))$$, so total: $$\Sigma_{k=0}^{\vert V \vert} \Sigma_{v\in V} O(\deg_{in}(v)) = \Sigma_{k=0}^{\vert V \vert} O(\vert E \vert) = O(\vert V \vert \cdot \vert E \vert)$$
• This is just Bellman-Ford! (computed in a slightly different order)

### All-Pairs Shortest Paths: Floyd-Warshall

• Could define subproblem $$\delta_k(u, v)=$$ minimum weight of path from $$u$$ to $$v$$ using at most $$k$$ edges, as in Bellman-Ford
• Resulting running time is $$\vert V \vert$$ times Bellman-Ford, i.e., $$O(\vert V \vert ^2 \cdot \vert E \vert ) = O(\vert V \vert ^4)$$
• Know a better algorithm from L14: Johnson achieves $$O(\vert V \vert^2 \log \vert V \vert + \vert V \vert \cdot \vert E \vert) = O(\vert V \vert^3)$$
• Can achieve $$\Theta(\vert V \vert ^3)$$ running time (matching Johnson for dense graphs) with a simple dynamic program, called Floyd-Warshall.
• Number vertices so that $$V=\{1,...,\vert V \vert \}$$
1. Subproblems:
• $$x(u, v, k)=$$ minimum weight of a path from $$u$$ to $$v$$ that only uses vertices from $$\{1, 2, \ldots, k\} \cup \{u, v\}$$
• For $$u, v \in V$$ and $$0 \leq k \leq \vert V \vert$$
2. Relate
• $x(u, v, k) = \min\{x(u, k, k-1) + x(k, v, k-1),\ x(u, v, k-1)\}$
• Only constant branching! No longer guessing previous vertex/edge
3. Topological order
• Increasing $$k$$: relation depends only on smaller $$k$$
4. Base
• $x(u, u, 0) = 0$
• $$x(u, v, 0) = w(u, v)$$ if $$(u, v)\in E$$
• $$x(u,v,0)=\infty$$ if none of the above
5. Original problem
• $$x(u, v, \vert V \vert )$$ for all $$u, v \in V$$
6. Time
• $$O(\vert V \vert ^3)$$ subproblems
• Each $$O(1)$$ work
• $$O(\vert V \vert ^3)$$ in total
• Constant number of dependencies per subproblem brings the factor of $$O(\vert E \vert)$$ in the running time down to $$O(\vert V \vert)$$
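A bottom-up sketch of this dynamic program (assuming a $\vert V \vert \times \vert V \vert$ matrix `w` with `w[u][v]` the edge weight, `inf` for non-edges, `0` on the diagonal, and no negative-weight cycles), collapsing the $k$ dimension since round $k$ only reads round $k - 1$:

def floyd_warshall(w):
    n = len(w)                               # vertices 0..n-1 (0-indexed here)
    x = [row[:] for row in w]                # x[u][v]: current value of d(u, v, k)
    for k in range(n):                       # allow intermediate vertex k
        for u in range(n):
            for v in range(n):
                if x[u][k] + x[k][v] < x[u][v]:
                    x[u][v] = x[u][k] + x[k][v]   # routing through k is shorter
    return x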