# Notes on Learning To Rank

We want to learn a function $f(q, D)$ which takes in a query $q$ and a list of documents $D=\{d_1, d_2, ..., d_n\}$, and produces scores using which we can rank/order the list of documents.

## Types

There are multiple ways we can formulate the problem:

1. Pointwise
2. Pairwise
3. Listwise

### Pointwise

In this approach we learn $f(q,d)$, which scores the match-ness between the query and document independently. When scoring a data point, the function does not take other document in the list into consideration.

To train a model in this approach, the data would be in the long format where each row contains a $(q,d)$ pair and we need labels for every row. Either the label is binary(classification) or relevance scores(regression).

### Pairwise

In this approach we learn $Pr(rank(d_i,q)\succ rank(d_j,q))$, that is to learn to determine relevant preference between two documents given a query.

It can be treated as binary classification problem, the data would be in the format where each row contains a triplet of $(q,d_i,d_j)$ and we need a binary label for each row. We can hand crafting features that captures the difference between $d_i,d_j$ with respect to $q$ and feed that difference to a binary classifier.

Or more often we learn it in a pointwise fashion, by learning the intermediate rank function. Let $rank(d_i,q)=s_i$ , the pairwise classification problem becomes classification on the difference between rank scores. That is:

The loss would be the negative log of this likelihood, which is: $L_{ij}=log(1+exp(s_j-s_i))$ and we can train the rank function to minimize this loss.

If we work out the graident with respect to the parameter in the rank function, it is:

As a result, a single gradient descent step with this gradient is doing a gradient ascent for $s_i$ and gradient descent for $s_j$ together with a weight of $\lambda_{ij}$. That is, for a given pair of documents, we make the score of a more relevant document higher, and make the score of a less relevant document lower, and how much we perform the update is determined by the score difference.

To be continued.