Notes on Learning To Rank


We want to learn a function that takes in a query and a list of documents, and produces scores with which we can rank/order the list of documents.


There are multiple ways we can formulate the problem:

  1. Pointwise
  2. Pairwise
  3. Listwise


Pointwise

In this approach we learn a function $f(q, d)$, which scores the match between the query $q$ and a document $d$ independently. When scoring a data point, the function does not take the other documents in the list into consideration.

To train a model in this approach, the data would be in the long format where each row contains a $(q, d)$ pair, and we need a label for every row. The label is either binary (classification) or a graded relevance score (regression).
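A minimal sketch of the pointwise setup, with made-up match features and relevance labels: fit a linear scoring function on long-format $(q, d)$ rows by least squares, then score each document independently and sort.

```python
import numpy as np

# Long-format data: one row per (query, document) pair.
# Columns are hypothetical hand-crafted match features (e.g. BM25, title overlap).
X = np.array([
    [2.1, 0.8],   # (q1, d1)
    [0.3, 0.1],   # (q1, d2)
    [1.5, 0.9],   # (q2, d1)
    [0.7, 0.2],   # (q2, d2)
])
y = np.array([3.0, 0.0, 2.0, 1.0])  # graded relevance labels (regression)

# Fit a linear scoring function f(q, d) = w . [x, 1] by least squares.
A = np.c_[X, np.ones(len(X))]
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# At ranking time, score each document independently and sort by score.
scores = A @ w
order_q1 = np.argsort(-scores[:2])  # ranking of q1's documents
```

Any regressor (or classifier, for binary labels) can stand in for the least-squares fit; the key property is that each row is scored in isolation.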


Pairwise

In this approach we learn a function $f(q, d_i, d_j)$, that is, we learn to determine the relative preference between two documents $d_i$ and $d_j$ given a query $q$.

It can be treated as a binary classification problem: the data would be in a format where each row contains a triplet $(q, d_i, d_j)$, and we need a binary label for each row. We can hand-craft features that capture the difference between $d_i$ and $d_j$ with respect to $q$, and feed that difference to a binary classifier.
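A sketch of this hand-crafted variant (document names, feature values, and preference labels are all made up): the feature for a triplet is the difference of per-document match features, and any binary classifier works on top; here, logistic regression trained by plain gradient descent.

```python
import numpy as np

def phi(features_i, features_j):
    # Difference of match features of d_i and d_j with respect to the query.
    return features_i - features_j

# Hypothetical match features x(q, d) for three documents under one query.
x = {
    "d1": np.array([2.0, 0.9]),
    "d2": np.array([0.5, 0.3]),
    "d3": np.array([1.2, 0.6]),
}

# Triplets (q, d_i, d_j) with label 1 if d_i is preferred over d_j.
pairs = [("d1", "d2", 1), ("d2", "d3", 0), ("d3", "d2", 1), ("d2", "d1", 0)]
X = np.array([phi(x[i], x[j]) for i, j, _ in pairs])
y = np.array([label for _, _, label in pairs])

# Logistic regression on the difference features, by gradient descent.
w = np.zeros(X.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / len(y)

predicted = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
```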

Or, more often, we learn it in a pointwise fashion by learning an intermediate rank function $s$. Let $s_i = s(q, d_i)$ and $s_j = s(q, d_j)$; the pairwise classification problem becomes classification on the difference between rank scores. That is:

$$P(d_i \succ d_j \mid q) = \sigma(s_i - s_j) = \frac{1}{1 + e^{-(s_i - s_j)}}$$

The loss is the negative log of this likelihood:

$$L = -\log \sigma(s_i - s_j) = \log\left(1 + e^{-(s_i - s_j)}\right)$$

and we can train the rank function to minimize this loss.
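This loss can be sketched in a few lines (using `log1p` for the numerically stable form); it is small when the preferred document already scores higher and large when the pair is mis-ordered:

```python
import numpy as np

def pairwise_loss(s_i, s_j):
    # -log sigma(s_i - s_j) = log(1 + exp(-(s_i - s_j)))
    return np.log1p(np.exp(-(s_i - s_j)))

low = pairwise_loss(2.0, -1.0)   # well-ordered pair: small loss
high = pairwise_loss(-1.0, 2.0)  # mis-ordered pair: large loss
```

At $s_i = s_j$ the loss is exactly $\log 2$, reflecting the model's indifference between the two orderings.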

If we work out the gradient with respect to the parameters $\theta$ of the rank function, it is:

$$\frac{\partial L}{\partial \theta} = -\sigma(s_j - s_i)\left(\frac{\partial s_i}{\partial \theta} - \frac{\partial s_j}{\partial \theta}\right)$$

As a result, a single gradient descent step with this gradient performs gradient ascent on $s_i$ and gradient descent on $s_j$, both with a weight of $\sigma(s_j - s_i)$. That is, for a given pair of documents, we push the score of the more relevant document up and the score of the less relevant document down, and the size of the update is determined by the score difference: the weight approaches $1$ when the pair is badly mis-ordered ($s_j \gg s_i$) and approaches $0$ when the pair is already well ordered.
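A minimal sketch of one such update for a linear rank function $s(x) = w \cdot x$ (the feature values are made up): the weight $\sigma(s_j - s_i)$ scales how hard the step pushes the two scores apart.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Linear rank function s(x) = w . x on hypothetical match features.
w = np.array([0.1, -0.2])
x_i = np.array([1.0, 0.5])   # features of the more relevant document
x_j = np.array([0.4, 0.9])   # features of the less relevant document

s_i, s_j = w @ x_i, w @ x_j
weight = sigmoid(s_j - s_i)   # large when the pair is mis-ordered

# dL/dw = -weight * (ds_i/dw - ds_j/dw) = -weight * (x_i - x_j)
grad = -weight * (x_i - x_j)
lr = 0.5
w = w - lr * grad             # one gradient descent step

# After the step, the preferred document's score has moved up
# relative to the other one: the gap (s_i - s_j) increases.
new_gap = w @ x_i - w @ x_j
```

Substituting the updated $w$ back in shows the gap grows by $\mathrm{lr} \cdot \sigma(s_j - s_i) \cdot \lVert x_i - x_j \rVert^2$, so every step strictly improves the ordering of this pair.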


To be continued.