Notes on python built-in modules

1. itertools

1.1 pairwise(iterable)

  • pairwise('ABCDEFG') --> AB BC CD DE EF FG
  • Roughly equivalent to:
def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

1.2 Chain(*iterables)

  • chain('ABC', 'DEF') --> A B C D E F
  • Roughly equivalent to:
def chain(*iterables):
    for it in iterables:
        for element in it:
            yield element
  • Example use case: with certain sliding window algorithm to simplify code to avoid a dangling boundry case evaluation.
def large_group_positions(s):
    curr, l, large_groups = '', 0, []
    for r, char in enumerate(chain(s, " ")):
        if char != curr:
            if r - l >= 3: large_groups.append([l, r - 1])
            curr, l = char, r
    return large_groups

1.3 product(*iterables, repeat=1)

  • Roughly equivalent to the following, the actual implementation does not build the intermediate results in memory:
def product(*iterables, repeat=1):
    pools = [tuple(pool) for pool in iterables] * repeat
    result = [[]]
    for pool in pools:
        result = [x + [y] for x in result for y in pool]
    for product in result:
        yield tuple(prod)
  • Use cases: Simplify nested loops.
    nrow, ncol = len(matrix), len(matrix[0])
    for r, c in product(range(nrow), range(ncol)):
      if matrix[r][c] in visited: continue
      dfs(r, c)
    

2. collections

2.1 Counter

  • A dict that keep track of the counts for elemnts from a collection.
  • elements() : returns the elemnts with non-negative counts, each unique key is duplicated to match its count.
  • most_common(n=None) : returns (element, count) tuple sorted by the count in decreasing order stop at nth tuple if specified.
  • total() : returns the sum of counts.
  • a.substract(b) : a[e] = a.get(e, 0) - b[e] for e in set.union(set(a.keys()), set(b.keys()))
  • a.update(b) : a[e] = a.get(e, 0) + b[e] for e in set.union(set(a.keys()), set(b.keys()))
> c = Counter(a=3, b=1)
> d = Counter(a=1, b=2)
> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
> c & d                       # intersection:  min(c[x], d[x])
Counter({'a': 1, 'b': 1})
> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})
> c == d                      # equality:  c[x] == d[x]
False
> c <= d                      # inclusion:  c[x] <= d[x]
False

2.2 OrderedDict

  • Since Python 3.7, regular dict will remember the order of keys inserted, but not able to rearrange the order of keys. OrderedDict provides the ability to do so.
  • Under the hood, it is implemented with crossing doubly linked list with hash map. See here
  • The rearrange of key can only happen to either end of the linkedlist:
    • move_to_end(key, last=True)
      • move existing key to either end
      • last = True -> right end; last = False -> left end
    • This is done by short the link for the given key node. e.g. node = map[key]; node.prev.next = node.next
    • Then attach it to the specified end of the linked list. e.g. node.prev, node.next, sentinel.next = sentinel, sentinel.next, node
  • As a result, we can not only pop a given key, we can also pop key from either end.
    • pop(key)
    • popitem(last=True)
      • return and removes a key, value pair
      • with argument last = True -> LIFO;
      • with argument last = False -> FIFO
  • Also supports reversed() on the key, value, item views
  • Can be used to implement LRU cache.
class LastUpdatedOrderedDict(OrderedDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        super().move_to_end(key)

class LRUCache(OrderedDict):
    def __init__(self, capacity: int):
        self.capacity = capacity

    def get(self, key: int) -> int:
        if key not in self: return None
        super().move_to_end(key) # get is a use. 
        return self[key]

    def put(self, key: int, value: int) -> None:
        if key not in self and len(self) == self.capacity: 
            super().popitem(last=False) # pop the least recent used to make space
        self[key] = value
        super().move_to_end(key) # put is also a use

2.3 defaultdict(default_factory=None,/[,...])

  • A subclass of dict which overrides the __missing__(key) method(which is called by __getitem__()).
  • If a key is not present in the object, it create a value for the key by calling the default_factory.
  • default_factory should be a callable, and the result by calling the callable without argument will be the default value for the missing key.
  • As a result __getitem__(key) will always creates an entry when key is missing, which sometimes may cause un-wanted behavior. We can reset default_factory to None once we are done inserting.
  • Use Cases:
adj_lists = defaultdict(list)
for f, t in edge_lists:
    adj_lists[f].append(t)  # instead of adj_lists.setdefault(f, []).append(t)

3 sortedcontainers

  • SortedList is a data sctructure consists of:
    1. a sorted list of sorted sublists to supports ~O(logN) search, insert and delete
    2. and coupled with a tree strcutre which to supports ~O(logN) time indexed access
    3. and iterator / slice operations on the list also have a ~O(logN) overhead
l = SortedList([1,2,3,3,3,5,7,9])

# binary search 
print(l.bisect_left(3)) # -> 2    O(logN)
print(l.bisect_right(3)) # -> 5   O(logN)

# indexed access
print(l[6])  # -> 7               O(logN)
print(l[0])  # 1                  O(1)                     
print(l[~0]) # 9                  O(1)   

# index based slice access took O(logN) to create the iterator, O(1) per `next`
print(l[3:5])               # -> 3,3
print(list(l.islice(3, 5))) # -> 3,3

it = l.islice(1, 4)
print(next(it))             # 2
print(next(it))             # 3
print(next(it))             # 3
print(next(it))             # StopIteration exception

# both ends inclusive by default, set inclusive=(True, False) etc to alter behavior
# under the hood, two binary search are conducted to find the the two boundries, 
# then chained iterators of the sublists are returned
# thus O(logN) to create the iterator, O(1) for each `next`
print(list(l.irange(3, 5))) # -> 3,3,3,5  

# append element with `add` method, since this is really a set data structure.  
l.add(4)                        # O(logN)
# extend by `update` O(mlog(n+m))
l.update([4, 8])

# delete element by value with `remove` or `discard`
l.remove(4)                     # O(logN)
l.discard(9)                    # safe when element is not in list

# index based `pop`             # O(logN)
l.pop()    # remove the largest        
l.pop(-2)  # remove second to the last 
l.pop(3)   # remove the 4th item         

# count number of repeats for a given value
l.count(3)                      # O(logN) two binary search?
  • SortedSet is a data sctructure crossing a SortedList with a HashSet:
    1. That is, a python set with SortedList index
    2. Since indexed by a SortedList, insert and delete are now O(logN)
    3. And the update, union, difference, intersection operations are all now O(NlogN)
    4. contains check is still O(1), and also has a count method pretty much does the same thing
    5. And slice / range in sorted order provided by the sorted list
s = SortedSet([1,1,2,3,3,3,4,5])

print(s[1:3]) # -> [2, 3]

print(list(s.irange(2, 5)))  # -> [2,3,4,5]
  • SortedDict is a data sctructure crossing a SortedList with a HashMap:
    1. That is, a python dict with SortedList index
    2. Since indexed by a SortedList, insert and delete are now O(logN)
    3. And the update operations is now O(NlogN)
    4. contains check is still O(1)
    5. And slice / range on the keys in sorted order provided by the sorted list
    6. Additionally a indexed key-value look up method peekitem(index=-1), which return the key-value pair, where based of the key’s index in sorted keys.
m = SortedDict({3:4, 5:6, 7:0, 1:1, 2:3, })  # O(NlogN)

print(m.keys()[1:3]) # -> [2, 3]  # O(logN + M)

print(list(m.islice(1:3))) # -> [2, 3]  # O(logN + M)
print(list(s.irange(2, 5)))  # -> [2,3,5] # O(logN + M)

print(m.peekitem(3))     # -> (5, 6)