Ren’s Cabinet of Curiosities

BPE Tokenizer Implementation Exercise

2025-02-14T19:03:43+00:00

Partial solution to BPE Tokenizer Implementation Exercise from Andrej Karpathy.

Corresponding youtube video on the tokenizer topic.

import regex
import requests
from collections import Counter


GPT4_SPLIT_PATTERN = r"""'(?i:[sdmt]|ll|ve|re)|[^\r\n\p{L}\p{N}]?+\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]++[\r\n]*|\s*[\r\n]|\s+(?!\S)|\s+"""
SHAKESPEAR_TEXT_URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"


class BPETokenizer:
    def __init__(self, *, pattern=GPT4_SPLIT_PATTERN, special_tokens=[]):
        self.pattern = pattern  # regex pattern to split text into words
        self.special_tokens = special_tokens  # pre-allocated special tokens
        self.vocab = self._init_vocab()  # map byte to token id
        self.itob = {} # the reverse map of vocab, map token id to byte

    def _init_vocab(self) -> dict:
        vocab = {}
        for i in range(2**8):
            vocab[bytes([i])] = i
        for special_token in self.special_tokens:
            vocab[bytes(special_token.encode("utf-8"))] = len(vocab)
        return vocab

    def _get_stats(self, bytes_of_words: list[list[bytes]]) -> Counter:
        counts = Counter()
        # count the frequencey of each adjacent byte pairs result stat will be used to find merge rules.
        for bytes_of_word in bytes_of_words:
            for byte_pair in zip(bytes_of_word, bytes_of_word[1:]):
                counts[byte_pair] += 1
        return counts

    # split the text into words then for each word further split into bytes.
    def _parse_text(self, text: str) -> list[list[bytes]]:
        return [
            [bytes([b]) for b in word.encode("utf-8")]
            for word in regex.findall(self.pattern, text)
        ]

    def train(self, text: str, vocab_size: int, verbose: int = 0):
        bytes_of_words = self._parse_text(text)
        num_merges = vocab_size - len(self.vocab)
        if verbose:
            print(f"total {num_merges} merges to learn")

        for step in range(num_merges):
            # find the merge
            counts = self._get_stats(bytes_of_words)
            pair_to_merge = max(counts.keys(), key=counts.get)
            byte_pair = b"".join(pair_to_merge)
            self.vocab[byte_pair] = len(self.vocab)

            # apply the merge to training data
            temp_bytes_of_words = []
            for bytes_of_word in bytes_of_words:
                temp_bytes_of_word = []
                just_merged = False
                for first, second in zip(bytes_of_word, bytes_of_word[1:]):
                    if just_merged:
                        just_merged = False
                        continue
                    if (first, second) == pair_to_merge:
                        temp_bytes_of_word.append(byte_pair)
                        just_merged = True
                    else:
                        temp_bytes_of_word.append(first)
                if not just_merged:
                    temp_bytes_of_word.append(bytes_of_word[-1])
                temp_bytes_of_words.append(temp_bytes_of_word)
            bytes_of_words = temp_bytes_of_words

            if verbose and (step + 1) % verbose == 0:
                print(
                    f"merge discovered at step {step + 1} is : ",
                    f"{pair_to_merge[0]} + {pair_to_merge[1]} -> {byte_pair}",
                )

    def encode(self, text):
        bytes_of_words = self._parse_text(text)
        for bytes_of_word in bytes_of_words:
            # speed this up? only one instance of the lowest rank pair gets updated each time
            while True:
                min_idx = min_rank = merged_bytes = None
                # find the mergeable byte pairs with the lowest rank
                for i, byte_pair in enumerate(zip(bytes_of_word, bytes_of_word[1:])):
                    rank = self.vocab.get(byte_pair, None)
                    if rank is None:
                        continue

                    if min_rank is None or min_rank > rank:
                        min_rank = rank
                        min_idx = i
                        merged_bytes = b"".join(byte_pair)

                if min_rank is None:
                    break
                bytes_of_word = (
                    bytes_of_word[:min_idx] + [merged_bytes] + bytes_of_word[min_idx + 2:]
                )
        token_ids = [
            self.vocab[b] for bytes_of_word in bytes_of_words for b in bytes_of_word
        ]
        return token_ids

    def decode(self, token_ids):
        if not self.itob: self.itob = {v:k for k,v in self.vocab.items()}
        return b"".join((self.itob[i] for i in token_ids)).decode("utf-8") 

    def save(self):
        pass

    def load(self):
        pass

def read_text_from_url(url):
    try:
        response = requests.get(url)
        response.raise_for_status()  # Raise HTTPError for bad responses (4xx or 5xx)
        return response.text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data from URL: {e}")
        return None


if __name__ == "__main__":
    text = read_text_from_url(SHAKESPEAR_TEXT_URL)
    a = BPETokenizer(special_tokens=["", "", "", ""])
    a.train(text, vocab_size=1024, verbose=100)
    encoded = a.encode(text[:512])
    print(text[:512], "\n encoded as: \n", encoded)
    decoded = a.decode(encoded)
    print("decoded: ", decoded)
    print("equal to original text? ", decoded == text[:512])

Some random after thoughts:

The text used to train the tokenizer should ideally match the training/inference text distribution. If the training and inference distribution are quite different, maybe use a separated tokenizer. For example, the output is English comment and code only, while the input can be multi-language and more descriptive of the code we want to generate. Can we use a tokenizer of a smaller vocab size for output?
If a token id is never seen during the training run, its embedding will be random, prompting the model with such a token will cause undefined behavior. Eg., “solidgoldmagikarp”. Maybe run frequency counter on the token id seen during training, reject bad input, or reserve a unk token and map bad token to it?
For multilingual models, the tokenizer might be an important factor in determining the less performant language. Balancing the language mixture in tokenizer training data may help.
Larger vocabulary size leads to shorter encoded sequences, which allows more information to be retrained in the limited context window and, therefore, improves performance. On the flip side, it will require more memory for training and make the softmax more expensive at inference.

RLHF Reading Notes 1

2025-02-11T03:53:43+00:00

Glossed Overview: RLHF for LLM
Further Background Readings

Glossed Overview: RLHF for LLM

Reinforcement Learning from Human Feedback (RLHF) is a technique to integrate human preferences into AI systems, particularly for problems that are difficult to define explicitly.

The core RLHF process involves three steps:

training a capable language model
collecting human preference data to train a reward model
optimizing the language model using reinforcement learning guided by the reward model.

RLHF is a crucial part of “post-training” for LLM, a set of techniques to enhance model usability, including:

Supervised Instructional Finetuning for learning features of language that form the basis of the desired output format and the ability of instruction following.
Preference Finetuning for learning the output style and subtle alignment with human preferences and
Reinforcement Finetuning for further performance boosts in verifiable domains

Further Background Readings

RLHF in Deep Reinforcement Learning

P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in neural information processing systems, vol. 30, 2017.

Challenge: Difficulty in Specifying Reward Functions

Manually designing reward functions for complex tasks is incredibly difficult and often leads to unintended or suboptimal agent behavior.

Proposal: Learning from Human Preferences Instead of Explicit Rewards

Instead of trying to define a reward function directly, learn a reward function from human judgments about which behavior is better.

Details:

Generate Trajectory Pairs: The agent performs the task and generates pairs of trajectories (sequences of actions and states).
Human Preference Judgments: Humans are presented with these pairs of trajectories and asked to choose which one they prefer (which trajectory is “better” according to some criteria). Crucially, humans don’t need to explicitly define why one is better, just to indicate their preference.
Reward Model Training: These human preference judgments are used to train a reward model. This reward model learns to predict which trajectory a human would prefer. Essentially, it learns to approximate the underlying, implicit reward function based on human feedback.
Reinforcement Learning with Learned Reward Model: The trained reward model is then used as the reward signal for a standard deep reinforcement learning algorithm (like policy gradients or Q-learning). The agent is trained to maximize the reward predicted by the reward model, which in turn is aligned with human preferences.

Learning to summarize from human feedback

N. Stiennon et al., “Learning to summarize with human feedback,” Advances in Neural Information Processing Systems, vol. 33, pp. 3008–3021, 2020.

Challenge: Limitations of Traditional Summarization Metrics and Methods.

Traditional automatic summarization methods, often optimized using metrics like ROUGE, don’t always align well with human preferences for good summaries. ROUGE primarily measures n-gram overlap with reference summaries, which can be a crude proxy for summary quality. Furthermore, directly optimizing for metrics like ROUGE can lead to models that generate summaries that are grammatically correct but lack coherence, focus, or truly capture the essence of the original text as a human would.

Proposal: Training Summarization Models with Human Preference Feedback.

Similar to the Christiano et al. (2017) paper, this work proposes to move away from solely relying on automatic metrics and instead train summarization models using direct human feedback on the quality of generated summaries. The idea is to teach the model to generate summaries that humans prefer, rather than just those that score well on automatic metrics.

Details:

Pre-training a Summarization Model: Pre-train a sequence-to-sequence model for summarization.
Collecting Human Preference Data (Comparison Data): Collect human judgments by presenting human annotators with pairs of summaries generated by different models (or different versions of the same model). The annotators are asked to choose which summary is better based on criteria like:
- Helpfulness: Is the summary informative and useful?
- Relevance: Does the summary accurately reflect the content of the original document?
- Readability: Is the summary well-written and easy to understand?
- Non-redundancy: Does the summary avoid unnecessary repetition?
Training a Reward Model: The collected human preference data (pairs of summaries and the preferred one) is used to train a reward model. This reward model learns to predict which summary a human would prefer given an input document. The reward model is trained to assign higher scores to summaries that humans tend to prefer.
Fine-tuning the Summarization Model with Reinforcement Learning: The pre-trained summarization model is then fine-tuned using reinforcement learning. The reward signal for RL is provided by the trained reward model. The RL objective is to generate summaries that maximize the score given by the reward model, effectively guiding the summarization model towards generating summaries that are more human-preferred. They used Proximal Policy Optimization (PPO) algorithm for this RL fine-tuning stage.

Webgpt: Browser-assisted question-answering with human feedback

R. Nakano et al., “Webgpt: Browser-assisted question-answering with human feedback,” arXiv preprint arXiv:2112.09332, 2021.

Challenge: Limitations of Traditional Question Answering and the Need for Browser Assistance:

Traditional question-answering (QA) models rely solely on their internal knowledge or pre-indexed datasets. Many real-world questions require accessing and processing information from the open web to provide comprehensive and up-to-date answers. Furthermore, simply retrieving documents isn’t enough; the model needs to effectively browse, extract relevant information, and synthesize it into a coherent answer.

Proposal: WebGPT - A Browser-Assisted QA Model Trained with Human Feedback

WebGPT, a model that is trained to use a web browser to answer questions. It’s not just a language model; it’s an agent that can interact with the web in a controlled manner, including searching, clicking links, scrolling, and reading web pages. Crucially, WebGPT is trained using Reinforcement Learning from Human Feedback (RLHF) to generate answers that are helpful, truthful, and harmless.

Details: Browser-in-the-Loop Question Answering with RLHF:

Browser Environment: They created a simulated browser environment that WebGPT can interact with. This environment provides actions like searching, clicking links, scrolling, and observing the rendered web page content.
WebGPT Agent: WebGPT is a Transformer-based language model trained to act as an agent within this browser environment. Given a question, it decides on a sequence of browser actions to gather information and ultimately generate an answer.
Human Feedback Collection: Human evaluators are crucial. They are asked to compare pairs of answers generated by different models (including WebGPT and baseline models) and indicate which answer is better based on criteria like:
- Helpfulness: Is the answer useful and informative?
- Truthfulness/Accuracy: Is the answer factually correct and supported by evidence?
- Harmlessness: Is the answer safe and avoids harmful or biased content?
- Browser Usage Quality: Was the browsing process efficient and effective in finding relevant information?
Reward Model Training: The human preference data is used to train a reward model. This reward model learns to predict which answer a human would prefer, based on the quality criteria. It also learns to reward efficient and effective browser usage.
Reinforcement Learning Fine-tuning: WebGPT’s policy (how it decides to act in the browser and generate answers) is then fine-tuned using reinforcement learning (Proximal Policy Optimization). The reward signal comes from the trained reward model. The RL objective is to train WebGPT to perform browser actions and generate answers that maximize the reward predicted by the reward model, thus aligning with human preferences for helpful, truthful, and harmless answers.

Training language models to follow instructions with human feedback

L. Ouyang et al., “Training language models to follow instructions with human feedback,” Advances in neural information processing systems, vol. 35, pp. 27730–27744, 2022.

Challenge: Mismatch between Language Model Objectives and User Intent

A key problem with standard language models trained for next-token prediction: they are good at generating text that is statistically likely but not necessarily helpful, truthful, or harmless (the “alignment problem”). These models often generate outputs that are:

Unhelpful: Not actually answering the user’s question or fulfilling the user’s request.
Untruthful: Generating factually incorrect or misleading information.
Harmful: Producing biased, toxic, or unsafe content.

The core issue is that optimizing for next-token prediction alone doesn’t incentivize models to align with human intent and values.

Proposal: InstructGPT - Training Language Models to Follow Instructions via RLHF

The central solution proposed is InstructGPT, a language model specifically trained to follow instructions using Reinforcement Learning from Human Feedback (RLHF). The goal is to directly train the model to be helpful, truthful, and harmless, aligning its behavior with what humans actually want.

Details: A Three-Step RLHF Pipeline for Instruction Following:

Supervised Fine-tuning (SFT) on Instruction Data: First, fine-tune a pre-trained language model (in this case, a GPT-3 model) on a dataset of human-written demonstrations of instruction following. This dataset consists of prompts (instructions) and desired responses. This step teaches the model to initially understand and attempt to follow instructions.
Reward Model Training from Human Preference Data: Next, collect human preference data. Humans are presented with multiple responses generated by the SFT model for a given instruction. They are asked to rank these responses based on which one is better, considering factors like helpfulness, truthfulness, and harmlessness. This preference data is used to train a reward model. The reward model learns to predict which response a human would prefer for a given instruction. It essentially learns to score responses based on alignment with human values.
Reinforcement Learning Fine-tuning with the Reward Model: Finally, the SFT model is further fine-tuned using reinforcement learning (Proximal Policy Optimization). The reward signal for RL is provided by the trained reward model. The RL objective is to train the model to generate responses that maximize the reward predicted by the reward model. This step directly optimizes the language model for alignment with human preferences as captured by the reward model.

Training a helpful and harmless assistant with reinforcement learning from human feedback

Y. Bai et al., “Training a helpful and harmless assistant with reinforcement learning from human feedback,” arXiv preprint arXiv:2204.05862, 2022.

Challenge: Ensuring Harmlessness in AI Assistants trained with RLHF

While previous RLHF work (like InstructGPT) focused on helpfulness and truthfulness, this paper specifically tackles the challenge of ensuring harmlessness in AI assistants. They argue that directly relying on human feedback for all aspects of harmlessness can be problematic and potentially lead to inconsistent or biased judgments. It’s difficult for humans to consistently and comprehensively define “harmlessness” in all situations.

Proposal: Constitutional AI (CAI) - Using a Constitution to Guide Harmlessness Learning

Instead of directly asking humans to rate harmlessness in every instance, they propose to use a set of principles, or a “constitution,” to define and guide what constitutes harmless behavior. This constitution is used to:

Self-Critique: The AI assistant itself uses the constitution to critique its own responses and identify potentially harmful outputs.
Guide Reward Model Training: The constitution informs the training of the reward model, so the model learns to penalize responses that violate the constitutional principles.

Details: Two-Phase RLHF with Constitutional Guidance

Constitutional Reinforcement Learning (Constitutional RL):
- Agent Generates Responses: The AI assistant generates responses to prompts.
- Constitutional Critique: The assistant then uses the pre-defined constitution to critique its own generated responses. This critique identifies potential violations of the constitutional principles.
- Self-Correction: Based on the critique, the assistant refines or regenerates its response to better align with the constitution.
- Reward based on Constitutional Alignment: A reward signal is generated based on how well the response aligns with the constitution (i.e., how few constitutional violations it has). This phase trains the assistant to be constitutionally aligned.
Human Preference Reinforcement Learning (Preference RL):
- Agent Generates Pairs of Responses: The constitutionally trained assistant generates pairs of responses (often one from the constitutional RL phase and one from a baseline model, or variations of constitutionally aligned responses).
- Human Preference Judgments (Helpfulness): Humans are then asked to compare these pairs of responses and choose which one is more helpful (ignoring harmlessness at this stage, as harmlessness is already addressed in phase 1).
- Reward Model Training (Helpfulness Reward): Human preference data is used to train a reward model that specifically focuses on predicting human preferences for helpfulness.
- RL Fine-tuning with Helpfulness Reward: The constitutionally aligned assistant is further fine-tuned using reinforcement learning, but now with the reward signal from the helpfulness reward model. This phase trains the assistant to be helpful, while retaining the harmlessness learned in phase 1.

Python dictionary get with default value

2022-03-10T15:43:43+00:00

Yesterday I spent way more time than neccessary debugging a piece of python code. It has something to do with how the python dictionary’s get method works with default arguments. TLDR: it is ok when default is a simple value, not recommended if default is a function call.

Here is what the problematic code looks like:

def get_option(option_name):
    option = # some logic
    return option

x = d.get("option_1", default=get_option("option_1"))

The problem here is that the default argument is a pointer to the return value of a function call, thus the function will be evaluated before the body of the get method. So no matter whether the key exists or not, the function will always be evaluated. It can break if:

the function is not defined.
the function call raises an Error or Exception.

The fix is change it to:

x = d.get("option_1", default=None) or get_option("option_1")

So that the function call will be used as a last resort when key is not in the dictionary.

But one drawback is that, if the value is logically False, like a int value 0, it will still fallback to the function call.

So a better rewrite is:

x = d["option_1"] if "option_1" in d else get_option("option_1")

This is slightly different to collections.defaultdict. In which, the fallback default_factory callable is only called after the key check. But ofcourse, default_factory does not accept any argument, thus does not suit the usecase here.

Python sequentially unpacks tuple with assignment expression

2021-04-14T15:43:43+00:00

The other day, I shadowed an interview with a data science candidate. The primary focus is obviously not on coding skills, but we do want to assess basic knowledge of the programming language of his choice. So, my colleague asked a very simple python question to warm him up. The question is: ‘how do you swap values of two variables wtihout using a temprary variable?’. To my surprise the candidate had no clue it is as simple as a, b = b, a.

In most other languages this is not a valid statement. The reason it works in python is as follows:

the expression on the right hand side gets evaluated. As a result, a tuple of two elements (a, b) is created.
then python unpack this tuple, assign the values to each variables on the left hand side sequentially in the left to right order.

So if a = 5; b = 3. When python evaluates a, b = b, a:

It first create a tuple of (3, 5)
Then it assigns 3 to a
Finally it assigns 5 to b

This is very handy and readable. However, users may think, as long as they aligned elements on the left with the corresponding elements on the right, the swap should always work. Indeed, if there is no error, code will be excuted, but not always as intened. The sequential order of unpacking should be considered when we write multiple assignments via tuple unpacking to avoid any ‘suprising’ behavior.

Let’s say that we want to do a simple linked list reversal. In other programming language, we will usually use a temprary variable to hold the next node from curr to make sure we can advance to it after we rewire curr.next to prev:

def reverse_linked_list(head):
    prev, curr = None, head
    while curr:
        next_ = curr.next
        curr.next = prev
        prev = curr
        curr = next_
    return prev

With a quick drawing, you can picture how this rewiring works and verify it is correct.

With tuple unpacking, we can do the same thing in python like this:

while curr: curr.next, prev, curr = prev, curr, curr.next

This follows the assignments order from above with temprary variable swap pattern.

However this will also work.

while curr: prev, curr.next, curr = curr, prev, curr.next

At first read, one may feel the first two elements on both sides are out of order. But it does not matter, because the reference to the node objects are already stored in the tuple before the unpacking. However, this does not mean any order will work:

while curr: curr, prev, curr.next = curr.next, curr, prev
while curr: prev, curr, curr.next = curr, curr.next, prev

Both above will not work. Because the sequential unpacking. In the last two versions, after the first two unpacking, curr will in both cases become referencing to the second node in the linked list. And the last unpacking assignment will wirte this node’s next pointer to what prev was at the begining of this unpacking happened, which is None. As a result, the loop will throw error in the second iteration, as we will try to access .next from None.

If the above is hard to wrap your head around, the one-liner is roughly equivlent to:

while curr:
    snapshot = (prev, curr, curr.next)
    curr.next = snapshot[0]
    prev = snapshot[1]
    curr = snapshot[2]

Learning to rank

2019-10-31T15:43:43+00:00

Notes on Learning To Rank

Task

We want to learn a function $f(q, D)$ which takes in a query $q$ and a list of documents $D=\{d_1, d_2, ..., d_n\}$, and produces scores using which we can rank/order the list of documents.

Types

There are multiple ways we can formulate the problem:

Pointwise
Pairwise
Listwise

Pointwise

In this approach we learn $f(q,d)$, which scores the match-ness between the query and document independently. When scoring a data point, the function does not take other document in the list into consideration.

To train a model in this approach, the data would be in the long format where each row contains a $(q,d)$ pair and we need labels for every row. Either the label is binary(classification) or relevance scores(regression).

Pairwise

In this approach we learn $Pr(rank(d_i,q)\succ rank(d_j,q))$, that is to learn to determine relevant preference between two documents given a query.

It can be treated as binary classification problem, the data would be in the format where each row contains a triplet of $(q,d_i,d_j)$ and we need a binary label for each row. We can hand crafting features that captures the difference between $d_i,d_j$ with respect to $q$ and feed that difference to a binary classifier.

Or more often we learn it in a pointwise fashion, by learning the intermediate rank function. Let $rank(d_i,q)=s_i$ , the pairwise classification problem becomes classification on the difference between rank scores. That is:

\[Pr(rank(d_i,q) > rank(d_j,q))=\frac{1}{1+exp(-(s_i-s_j))}\]

The loss would be the negative log of this likelihood, which is: $L_{ij}=log(1+exp(s_j-s_i))$ and we can train the rank function to minimize this loss.

If we work out the graident with respect to the parameter in the rank function, it is:

\[\begin{aligned} \frac{\partial L_{ij}}{\partial \theta}&=\frac{\partial L_{ij}}{\partial s_i}\frac{\partial s_i}{\partial \theta} + \frac{\partial L_{ij}}{\partial s_j}\frac{\partial s_j}{\partial \theta} \\ &=-\frac{1}{1 + exp(s_i-s_j)}(\frac{\partial s_i}{\theta} - \frac{\partial s_j}{\theta}) \\ &=\lambda_{ij}(\frac{\partial s_i}{\theta} - \frac{\partial s_j}{\theta}) \end{aligned}\]

As a result, a single gradient descent step with this gradient is doing a gradient ascent for $s_i$ and gradient descent for $s_j$ together with a weight of $\lambda_{ij}$. That is, for a given pair of documents, we make the score of a more relevant document higher, and make the score of a less relevant document lower, and how much we perform the update is determined by the score difference.

Listwise

To be continued.

Just when you think you know shake up enough

2019-10-25T18:48:00+00:00

Yester day I experienced my biggest shakeup ever since I started participating on Kaggle. I dropped from silver zone to nowhere. The competition is an image segmentation competition, and based on past experience, CV competitions has relatively low shakeups(compare to tabular/transaction/time series predictions). I have seen CV competitions with low shakeups even when there is obvious train/test distribution difference. So this comes a bit surprise, but honestly, it is not all that bad.

I have chosen a risky strategy by training models in full training set instead of using cross validations, and relying solely on public leaderboard for feedbacks, in order to fit 5 times more models from different architectures. This is actually all fine, my ensemble of 7 models with no threshold tuning based on public board still could retain my silver medal position. It was the thresholding that sunk my ranks. A lesson well learnt.

Iterate over an iterable multiple times

2019-08-22T19:50:00+00:00

I was working on a piece of code today and I need to iterate over a iterable multiple times to do some computes. The body of code is the same for all passes. One obvious thing I can do is to do double loops:

results = []
for t in range(n_repeats):
    for item in get_iterable():
        results.append(f.process(item))

The reason that I need to loop over it multiple times rather than loop once and duplicate results is that f has an internal state, it gives a different output based on the number of times it has seen the input. The above code did work, but not as nice as I’d like. I utilized chain and repeat from itertools to clean up it a bit:

from itertools import chain, repeat
results = []
for item in chain(*repeat(get_iterable(), n_repeats)):
    results.append(f.process(item))

It does the exactly same thing, but the code reads more like straight English.

Clean python code to get a reverse mapping

2019-08-21T15:40:00+00:00

When working with machine learning problems, often I use python dictionary to map categorical values to its integer encoded values. Something like:

import string
feature_encoder = {v:i for i, v in enumerate(string.ascii_lowercase)}

To get back the original value, I need to have a reverse mapping, I used to create it with:

feature_decoder = {v:k for k, v in feature_encoder.items()}

This is fine, but I dislike it for that I have to spend some mental effort to read it to know what I was doing the next time I read my code. And today I found a nicer way to get the reverse mapping.

feature_decoder = dict(map(reversed, feature_encoder.items()))

It is doing exactly the same thing, but I can just read it as an English sentence to know what is going on.

Hello Jekyll on Github Pages

2019-05-22T02:39:43+00:00

My current website hosting costs me $7.99 a month plus ~$20 for domain name per year. I have been paying this amount for a little more than 3 years, and finally decided to cut the hosting fee by migrating over to github pages.