Python
Notes on python built-in modules
1. itertools
1.1 pairwise(iterable)
pairwise('ABCDEFG') --> AB BC CD DE EF FG- Roughly equivalent to:
def pairwise(iterable):
a, b = tee(iterable)
next(b, None)
return zip(a, b)
1.2 Chain(*iterables)
chain('ABC', 'DEF') --> A B C D E F- Roughly equivalent to:
def chain(*iterables):
for it in iterables:
for element in it:
yield element
- Example use case: with certain sliding window algorithm to simplify code to avoid a dangling boundry case evaluation.
def large_group_positions(s):
curr, l, large_groups = '', 0, []
for r, char in enumerate(chain(s, " ")):
if char != curr:
if r - l >= 3: large_groups.append([l, r - 1])
curr, l = char, r
return large_groups
1.3 product(*iterables, repeat=1)
- Roughly equivalent to the following, the actual implementation does not build the intermediate results in memory:
def product(*iterables, repeat=1):
pools = [tuple(pool) for pool in iterables] * repeat
result = [[]]
for pool in pools:
result = [x + [y] for x in result for y in pool]
for product in result:
yield tuple(prod)
- Use cases: Simplify nested loops.
nrow, ncol = len(matrix), len(matrix[0]) for r, c in product(range(nrow), range(ncol)): if matrix[r][c] in visited: continue dfs(r, c)
2. collections
2.1 Counter
- A
dictthat keep track of the counts for elemnts from a collection. elements(): returns the elemnts with non-negative counts, each unique key is duplicated to match its count.most_common(n=None): returns (element, count) tuple sorted by the count in decreasing order stop at nth tuple if specified.total(): returns the sum of counts.a.substract(b):a[e] = a.get(e, 0) - b[e]foreinset.union(set(a.keys()), set(b.keys()))a.update(b):a[e] = a.get(e, 0) + b[e]foreinset.union(set(a.keys()), set(b.keys()))
> c = Counter(a=3, b=1)
> d = Counter(a=1, b=2)
> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
> c == d # equality: c[x] == d[x]
False
> c <= d # inclusion: c[x] <= d[x]
False
2.2 OrderedDict
- Since Python 3.7, regular
dictwill remember the order of keys inserted, but not able to rearrange the order of keys.OrderedDictprovides the ability to do so. - Under the hood, it is implemented with crossing doubly linked list with hash map. See here
- The rearrange of key can only happen to either end of the linkedlist:
move_to_end(key, last=True)- move existing key to either end
- last = True -> right end; last = False -> left end
- This is done by short the link for the given key node. e.g.
node = map[key]; node.prev.next = node.next - Then attach it to the specified end of the linked list. e.g.
node.prev, node.next, sentinel.next = sentinel, sentinel.next, node
- As a result, we can not only pop a given key, we can also pop key from either end.
pop(key)popitem(last=True)- return and removes a key, value pair
- with argument
last = True-> LIFO; - with argument
last = False-> FIFO
- Also supports
reversed()on the key, value, item views - Can be used to implement LRU cache.
class LastUpdatedOrderedDict(OrderedDict):
def __setitem__(self, key, value):
super().__setitem__(key, value)
super().move_to_end(key)
class LRUCache(OrderedDict):
def __init__(self, capacity: int):
self.capacity = capacity
def get(self, key: int) -> int:
if key not in self: return None
super().move_to_end(key) # get is a use.
return self[key]
def put(self, key: int, value: int) -> None:
if key not in self and len(self) == self.capacity:
super().popitem(last=False) # pop the least recent used to make space
self[key] = value
super().move_to_end(key) # put is also a use
2.3 defaultdict(default_factory=None,/[,...])
- A subclass of
dictwhich overrides the__missing__(key)method(which is called by__getitem__()). - If a key is not present in the object, it create a value for the key by calling the
default_factory. default_factoryshould be a callable, and the result by calling the callable without argument will be the default value for the missing key.- As a result
__getitem__(key)will always creates an entry when key is missing, which sometimes may cause un-wanted behavior. We can resetdefault_factorytoNoneonce we are done inserting. - Use Cases:
adj_lists = defaultdict(list)
for f, t in edge_lists:
adj_lists[f].append(t) # instead of adj_lists.setdefault(f, []).append(t)
3 sortedcontainers
SortedListis a data sctructure consists of:- a sorted list of sorted sublists to supports
~O(logN)search, insert and delete - and coupled with a tree strcutre which to supports
~O(logN)time indexed access - and iterator / slice operations on the list also have a
~O(logN)overhead
- a sorted list of sorted sublists to supports
l = SortedList([1,2,3,3,3,5,7,9])
# binary search
print(l.bisect_left(3)) # -> 2 O(logN)
print(l.bisect_right(3)) # -> 5 O(logN)
# indexed access
print(l[6]) # -> 7 O(logN)
print(l[0]) # 1 O(1)
print(l[~0]) # 9 O(1)
# index based slice access took O(logN) to create the iterator, O(1) per `next`
print(l[3:5]) # -> 3,3
print(list(l.islice(3, 5))) # -> 3,3
it = l.islice(1, 4)
print(next(it)) # 2
print(next(it)) # 3
print(next(it)) # 3
print(next(it)) # StopIteration exception
# both ends inclusive by default, set inclusive=(True, False) etc to alter behavior
# under the hood, two binary search are conducted to find the the two boundries,
# then chained iterators of the sublists are returned
# thus O(logN) to create the iterator, O(1) for each `next`
print(list(l.irange(3, 5))) # -> 3,3,3,5
# append element with `add` method, since this is really a set data structure.
l.add(4) # O(logN)
# extend by `update` O(mlog(n+m))
l.update([4, 8])
# delete element by value with `remove` or `discard`
l.remove(4) # O(logN)
l.discard(9) # safe when element is not in list
# index based `pop` # O(logN)
l.pop() # remove the largest
l.pop(-2) # remove second to the last
l.pop(3) # remove the 4th item
# count number of repeats for a given value
l.count(3) # O(logN) two binary search?
SortedSetis a data sctructure crossing aSortedListwith aHashSet:- That is, a python
setwithSortedListindex - Since indexed by a
SortedList, insert and delete are nowO(logN) - And the update, union, difference, intersection operations are all now
O(NlogN) containscheck is stillO(1), and also has acountmethod pretty much does the same thing- And slice / range in sorted order provided by the sorted list
- That is, a python
s = SortedSet([1,1,2,3,3,3,4,5])
print(s[1:3]) # -> [2, 3]
print(list(s.irange(2, 5))) # -> [2,3,4,5]
SortedDictis a data sctructure crossing aSortedListwith aHashMap:- That is, a python
dictwithSortedListindex - Since indexed by a
SortedList, insert and delete are nowO(logN) - And the update operations is now
O(NlogN) containscheck is stillO(1)- And slice / range on the keys in sorted order provided by the sorted list
- Additionally a indexed key-value look up method
peekitem(index=-1), which return the key-value pair, where based of the key’s index in sorted keys.
- That is, a python
m = SortedDict({3:4, 5:6, 7:0, 1:1, 2:3, }) # O(NlogN)
print(m.keys()[1:3]) # -> [2, 3] # O(logN + M)
print(list(m.islice(1:3))) # -> [2, 3] # O(logN + M)
print(list(s.irange(2, 5))) # -> [2,3,5] # O(logN + M)
print(m.peekitem(3)) # -> (5, 6)