Notes on python built-in modules

1. `itertools`

1.1 `pairwise(iterable)`

pairwise('ABCDEFG') --> AB BC CD DE EF FG
Roughly equivalent to:

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

1.2 `Chain(*iterables)`

chain('ABC', 'DEF') --> A B C D E F
Roughly equivalent to:

def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element

Example use case: with certain sliding window algorithm to simplify code to avoid a dangling boundry case evaluation.

def large_group_positions(s):
    curr, l, large_groups = '', 0, []
    for r, char in enumerate(chain(s, " ")):
        if char != curr:
            if r - l >= 3: large_groups.append([l, r - 1])
            curr, l = char, r
    return large_groups

1.3 `product(*iterables, repeat=1)`

Roughly equivalent to the following, the actual implementation does not build the intermediate results in memory:

def product(*iterables, repeat=1):
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for product in result:
        yield tuple(prod)

Use cases: Simplify nested loops.

nrow, ncol = len(matrix), len(matrix[0])
for r, c in product(range(nrow), range(ncol)):
  if matrix[r][c] in visited: continue
  dfs(r, c)

2. `collections`

2.1 `Counter`

A dict that keep track of the counts for elemnts from a collection.
elements() : returns the elemnts with non-negative counts, each unique key is duplicated to match its count.
most_common(n=None) : returns (element, count) tuple sorted by the count in decreasing order stop at nth tuple if specified.
total() : returns the sum of counts.
a.substract(b) : a[e] = a.get(e, 0) - b[e] for e in set.union(set(a.keys()), set(b.keys()))
a.update(b) : a[e] = a.get(e, 0) + b[e] for e in set.union(set(a.keys()), set(b.keys()))

> c = Counter(a=3, b=1)
> d = Counter(a=1, b=2)
> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})
> c == d                      # equality:  c[x] == d[x]
False
> c <= d                      # inclusion:  c[x] <= d[x]
False

2.2 `OrderedDict`

Since Python 3.7, regular dict will remember the order of keys inserted, but not able to rearrange the order of keys. OrderedDict provides the ability to do so.
Under the hood, it is implemented with crossing doubly linked list with hash map. See here
The rearrange of key can only happen to either end of the linkedlist:
- move_to_end(key, last=True)
  - move existing key to either end
  - last = True -> right end; last = False -> left end
- This is done by short the link for the given key node. e.g. node = map[key]; node.prev.next = node.next
- Then attach it to the specified end of the linked list. e.g. node.prev, node.next, sentinel.next = sentinel, sentinel.next, node
As a result, we can not only pop a given key, we can also pop key from either end.
- pop(key)
- popitem(last=True)
  - return and removes a key, value pair
  - with argument last = True -> LIFO;
  - with argument last = False -> FIFO
Also supports reversed() on the key, value, item views
Can be used to implement LRU cache.

class LastUpdatedOrderedDict(OrderedDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().move_to_end(key)

class LRUCache(OrderedDict):
    def __init__(self, capacity: int):
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key not in self: return None
        super().move_to_end(key) # get is a use. 
        return self[key]

    def put(self, key: int, value: int) -> None:
        if key not in self and len(self) == self.capacity: 
            super().popitem(last=False) # pop the least recent used to make space
        self[key] = value
        super().move_to_end(key) # put is also a use

2.3 `defaultdict(default_factory=None,/[,...])`

A subclass of dict which overrides the __missing__(key) method(which is called by __getitem__()).
If a key is not present in the object, it create a value for the key by calling the default_factory.
default_factory should be a callable, and the result by calling the callable without argument will be the default value for the missing key.
As a result __getitem__(key) will always creates an entry when key is missing, which sometimes may cause un-wanted behavior. We can reset default_factory to None once we are done inserting.
Use Cases:

adj_lists = defaultdict(list)
for f, t in edge_lists:
    adj_lists[f].append(t)  # instead of adj_lists.setdefault(f, []).append(t)

3 `sortedcontainers`

SortedList is a data sctructure consists of:
1. a sorted list of sorted sublists to supports ~O(logN) search, insert and delete
2. and coupled with a tree strcutre which to supports ~O(logN) time indexed access
3. and iterator / slice operations on the list also have a ~O(logN) overhead

l = SortedList([1,2,3,3,3,5,7,9])

# binary search 
print(l.bisect_left(3)) # -> 2    O(logN)
print(l.bisect_right(3)) # -> 5   O(logN)

# indexed access
print(l[6])  # -> 7               O(logN)
print(l[0])  # 1                  O(1)                     
print(l[~0]) # 9                  O(1)   

# index based slice access took O(logN) to create the iterator, O(1) per `next`
print(l[3:5])               # -> 3,3
print(list(l.islice(3, 5))) # -> 3,3

it = l.islice(1, 4)
print(next(it))             # 2
print(next(it))             # 3
print(next(it))             # 3
print(next(it))             # StopIteration exception

# both ends inclusive by default, set inclusive=(True, False) etc to alter behavior
# under the hood, two binary search are conducted to find the the two boundries, 
# then chained iterators of the sublists are returned
# thus O(logN) to create the iterator, O(1) for each `next`
print(list(l.irange(3, 5))) # -> 3,3,3,5  

# append element with `add` method, since this is really a set data structure.  
l.add(4)                        # O(logN)
# extend by `update` O(mlog(n+m))
l.update([4, 8])

# delete element by value with `remove` or `discard`
l.remove(4)                     # O(logN)
l.discard(9)                    # safe when element is not in list

# index based `pop`             # O(logN)
l.pop()    # remove the largest        
l.pop(-2)  # remove second to the last 
l.pop(3)   # remove the 4th item         

# count number of repeats for a given value
l.count(3)                      # O(logN) two binary search?

SortedSet is a data sctructure crossing a SortedList with a HashSet:
1. That is, a python set with SortedList index
2. Since indexed by a SortedList, insert and delete are now O(logN)
3. And the update, union, difference, intersection operations are all now O(NlogN)
4. contains check is still O(1), and also has a count method pretty much does the same thing
5. And slice / range in sorted order provided by the sorted list

s = SortedSet([1,1,2,3,3,3,4,5])

print(s[1:3]) # -> [2, 3]

print(list(s.irange(2, 5)))  # -> [2,3,4,5]

SortedDict is a data sctructure crossing a SortedList with a HashMap:
1. That is, a python dict with SortedList index
2. Since indexed by a SortedList, insert and delete are now O(logN)
3. And the update operations is now O(NlogN)
4. contains check is still O(1)
5. And slice / range on the keys in sorted order provided by the sorted list
6. Additionally a indexed key-value look up method peekitem(index=-1), which return the key-value pair, where based of the key’s index in sorted keys.

m = SortedDict({3:4, 5:6, 7:0, 1:1, 2:3, })  # O(NlogN)

print(m.keys()[1:3]) # -> [2, 3]  # O(logN + M)

print(list(m.islice(1:3))) # -> [2, 3]  # O(logN + M)
print(list(s.irange(2, 5)))  # -> [2,3,5] # O(logN + M)

print(m.peekitem(3))     # -> (5, 6)

Notes on python built-in modules

1. itertools

1.1 pairwise(iterable)

1.2 Chain(*iterables)

1.3 product(*iterables, repeat=1)

2. collections

2.1 Counter

2.2 OrderedDict

2.3 defaultdict(default_factory=None,/[,...])

3 sortedcontainers

1. `itertools`

1.1 `pairwise(iterable)`

1.2 `Chain(*iterables)`

1.3 `product(*iterables, repeat=1)`

2. `collections`

2.1 `Counter`

2.2 `OrderedDict`

2.3 `defaultdict(default_factory=None,/[,...])`

3 `sortedcontainers`