Python
Notes on python built-in modules
1. itertools
1.1 pairwise(iterable)
pairwise('ABCDEFG') --> AB BC CD DE EF FG
- Roughly equivalent to:
def pairwise(iterable):
a, b = tee(iterable)
next(b, None)
return zip(a, b)
1.2 Chain(*iterables)
chain('ABC', 'DEF') --> A B C D E F
- Roughly equivalent to:
def chain(*iterables):
for it in iterables:
for element in it:
yield element
- Example use case: with certain sliding window algorithm to simplify code to avoid a dangling boundry case evaluation.
def large_group_positions(s):
curr, l, large_groups = '', 0, []
for r, char in enumerate(chain(s, " ")):
if char != curr:
if r - l >= 3: large_groups.append([l, r - 1])
curr, l = char, r
return large_groups
1.3 product(*iterables, repeat=1)
- Roughly equivalent to the following, the actual implementation does not build the intermediate results in memory:
def product(*iterables, repeat=1):
pools = [tuple(pool) for pool in iterables] * repeat
result = [[]]
for pool in pools:
result = [x + [y] for x in result for y in pool]
for product in result:
yield tuple(prod)
- Use cases: Simplify nested loops.
nrow, ncol = len(matrix), len(matrix[0]) for r, c in product(range(nrow), range(ncol)): if matrix[r][c] in visited: continue dfs(r, c)
2. collections
2.1 Counter
- A
dict
that keep track of the counts for elemnts from a collection. elements()
: returns the elemnts with non-negative counts, each unique key is duplicated to match its count.most_common(n=None)
: returns (element, count) tuple sorted by the count in decreasing order stop at nth tuple if specified.total()
: returns the sum of counts.a.substract(b)
:a[e] = a.get(e, 0) - b[e]
fore
inset.union(set(a.keys()), set(b.keys()))
a.update(b)
:a[e] = a.get(e, 0) + b[e]
fore
inset.union(set(a.keys()), set(b.keys()))
> c = Counter(a=3, b=1)
> d = Counter(a=1, b=2)
> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
> c - d # subtract (keeping only positive counts)
Counter({'a': 2})
> c & d # intersection: min(c[x], d[x])
Counter({'a': 1, 'b': 1})
> c | d # union: max(c[x], d[x])
Counter({'a': 3, 'b': 2})
> c == d # equality: c[x] == d[x]
False
> c <= d # inclusion: c[x] <= d[x]
False
2.2 OrderedDict
- Since Python 3.7, regular
dict
will remember the order of keys inserted, but not able to rearrange the order of keys.OrderedDict
provides the ability to do so. - Under the hood, it is implemented with crossing doubly linked list with hash map. See here
- The rearrange of key can only happen to either end of the linkedlist:
move_to_end(key, last=True)
- move existing key to either end
- last = True -> right end; last = False -> left end
- This is done by short the link for the given key node. e.g.
node = map[key]; node.prev.next = node.next
- Then attach it to the specified end of the linked list. e.g.
node.prev, node.next, sentinel.next = sentinel, sentinel.next, node
- As a result, we can not only pop a given key, we can also pop key from either end.
pop(key)
popitem(last=True)
- return and removes a key, value pair
- with argument
last = True
-> LIFO; - with argument
last = False
-> FIFO
- Also supports
reversed()
on the key, value, item views - Can be used to implement LRU cache.
class LastUpdatedOrderedDict(OrderedDict):
def __setitem__(self, key, value):
super().__setitem__(key, value)
super().move_to_end(key)
class LRUCache(OrderedDict):
def __init__(self, capacity: int):
self.capacity = capacity
def get(self, key: int) -> int:
if key not in self: return None
super().move_to_end(key) # get is a use.
return self[key]
def put(self, key: int, value: int) -> None:
if key not in self and len(self) == self.capacity:
super().popitem(last=False) # pop the least recent used to make space
self[key] = value
super().move_to_end(key) # put is also a use
2.3 defaultdict(default_factory=None,/[,...])
- A subclass of
dict
which overrides the__missing__(key)
method(which is called by__getitem__()
). - If a key is not present in the object, it create a value for the key by calling the
default_factory
. default_factory
should be a callable, and the result by calling the callable without argument will be the default value for the missing key.- As a result
__getitem__(key)
will always creates an entry when key is missing, which sometimes may cause un-wanted behavior. We can resetdefault_factory
toNone
once we are done inserting. - Use Cases:
adj_lists = defaultdict(list)
for f, t in edge_lists:
adj_lists[f].append(t) # instead of adj_lists.setdefault(f, []).append(t)
3 sortedcontainers
SortedList
is a data sctructure consists of:- a sorted list of sorted sublists to supports
~O(logN)
search, insert and delete - and coupled with a tree strcutre which to supports
~O(logN)
time indexed access - and iterator / slice operations on the list also have a
~O(logN)
overhead
- a sorted list of sorted sublists to supports
l = SortedList([1,2,3,3,3,5,7,9])
# binary search
print(l.bisect_left(3)) # -> 2 O(logN)
print(l.bisect_right(3)) # -> 5 O(logN)
# indexed access
print(l[6]) # -> 7 O(logN)
print(l[0]) # 1 O(1)
print(l[~0]) # 9 O(1)
# index based slice access took O(logN) to create the iterator, O(1) per `next`
print(l[3:5]) # -> 3,3
print(list(l.islice(3, 5))) # -> 3,3
it = l.islice(1, 4)
print(next(it)) # 2
print(next(it)) # 3
print(next(it)) # 3
print(next(it)) # StopIteration exception
# both ends inclusive by default, set inclusive=(True, False) etc to alter behavior
# under the hood, two binary search are conducted to find the the two boundries,
# then chained iterators of the sublists are returned
# thus O(logN) to create the iterator, O(1) for each `next`
print(list(l.irange(3, 5))) # -> 3,3,3,5
# append element with `add` method, since this is really a set data structure.
l.add(4) # O(logN)
# extend by `update` O(mlog(n+m))
l.update([4, 8])
# delete element by value with `remove` or `discard`
l.remove(4) # O(logN)
l.discard(9) # safe when element is not in list
# index based `pop` # O(logN)
l.pop() # remove the largest
l.pop(-2) # remove second to the last
l.pop(3) # remove the 4th item
# count number of repeats for a given value
l.count(3) # O(logN) two binary search?
SortedSet
is a data sctructure crossing aSortedList
with aHashSet
:- That is, a python
set
withSortedList
index - Since indexed by a
SortedList
, insert and delete are nowO(logN)
- And the update, union, difference, intersection operations are all now
O(NlogN)
contains
check is stillO(1)
, and also has acount
method pretty much does the same thing- And slice / range in sorted order provided by the sorted list
- That is, a python
s = SortedSet([1,1,2,3,3,3,4,5])
print(s[1:3]) # -> [2, 3]
print(list(s.irange(2, 5))) # -> [2,3,4,5]
SortedDict
is a data sctructure crossing aSortedList
with aHashMap
:- That is, a python
dict
withSortedList
index - Since indexed by a
SortedList
, insert and delete are nowO(logN)
- And the update operations is now
O(NlogN)
contains
check is stillO(1)
- And slice / range on the keys in sorted order provided by the sorted list
- Additionally a indexed key-value look up method
peekitem(index=-1)
, which return the key-value pair, where based of the key’s index in sorted keys.
- That is, a python
m = SortedDict({3:4, 5:6, 7:0, 1:1, 2:3, }) # O(NlogN)
print(m.keys()[1:3]) # -> [2, 3] # O(logN + M)
print(list(m.islice(1:3))) # -> [2, 3] # O(logN + M)
print(list(s.irange(2, 5))) # -> [2,3,5] # O(logN + M)
print(m.peekitem(3)) # -> (5, 6)