Notes on Learning To Rank
We want to learn a function that takes in a query and a list of documents, and produces scores with which we can rank/order the list of documents.
There are multiple ways we can formulate the problem:
In the pointwise approach we learn f(q, d), which scores the match between the query q and a document d independently. When scoring a data point, the function does not take the other documents in the list into consideration.
To train a model in this approach, the data would be in long format, where each row contains a (q, d) pair, and we need a label for every row. The label is either binary (classification) or a relevance score (regression).
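As a minimal sketch of the pointwise setup (the feature values and relevance labels below are made up for illustration), we can treat it as plain regression on (query, document) feature vectors:

```python
import numpy as np

# Each row is a hand-crafted feature vector for one (q, d) pair
# (e.g. BM25 score, title match, doc length); values are made up.
X = np.array([
    [0.9, 1.0, 0.2],
    [0.4, 0.0, 0.8],
    [0.1, 0.0, 0.5],
], dtype=float)
y = np.array([3.0, 1.0, 0.0])  # graded relevance labels (regression)

# Fit a linear scorer f(q, d) = w . x by least squares (closed form).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# At serving time, score each document independently, then sort by score.
scores = X @ w
ranking = np.argsort(-scores)  # indices of documents, most relevant first
print(ranking)  # → [0 1 2]
```

Any pointwise regressor or classifier can stand in for the least-squares fit here; the key property is that each document is scored without looking at the others.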
In the pairwise approach we learn f(q, d_i, d_j), that is, we learn to determine the relative preference between two documents given a query.
It can be treated as a binary classification problem: the data would be in a format where each row contains a triplet (q, d_i, d_j), and we need a binary label for each row. We can hand-craft features that capture the difference between d_i and d_j with respect to q, and feed that difference to a binary classifier.
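A sketch of the difference-feature idea, with made-up feature values and a made-up pre-trained linear classifier w:

```python
import numpy as np

# Hand-crafted features for two documents under the same query
# (values are illustrative, not from any real dataset).
x_i = np.array([0.9, 1.0, 0.2])  # features of (q, d_i)
x_j = np.array([0.4, 0.0, 0.8])  # features of (q, d_j)

# The difference vector is one training row for a binary classifier;
# label 1 would mean d_i is preferred over d_j.
row = x_i - x_j

# A hypothetical linear classifier w on difference features:
w = np.array([1.0, 0.5, -0.3])
p = 1.0 / (1.0 + np.exp(-(row @ w)))  # predicted P(d_i preferred over d_j)
print(p > 0.5)  # → True
```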
Or, more often, we learn it in a pointwise fashion, by learning an intermediate rank function. Let s_i = f(q, d_i); then the pairwise classification problem becomes classification on the difference between rank scores. That is:

P(d_i ≻ d_j | q) = σ(s_i − s_j) = 1 / (1 + exp(−(s_i − s_j)))
The loss would be the negative log of this likelihood, which is:

L = −log σ(s_i − s_j) = log(1 + exp(−(s_i − s_j)))

and we can train the rank function to minimize this loss.
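A minimal sketch of this pairwise loss (the scores passed in are arbitrary illustrative numbers):

```python
import numpy as np

def pairwise_loss(s_i, s_j):
    # -log sigmoid(s_i - s_j), written as log(1 + exp(-(s_i - s_j)))
    # via log1p for numerical stability when the difference is large.
    return np.log1p(np.exp(-(s_i - s_j)))

# The better the model orders the pair, the smaller the loss.
print(pairwise_loss(2.0, 0.0))  # correctly ordered with a margin
print(pairwise_loss(0.0, 0.0))  # tied: loss = log 2
print(pairwise_loss(0.0, 2.0))  # wrongly ordered: largest loss
```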
If we work out the gradient with respect to a parameter θ of the rank function, it is:

∂L/∂θ = −(1 − σ(s_i − s_j)) · (∂s_i/∂θ − ∂s_j/∂θ)
As a result, a single gradient descent step with this gradient does gradient ascent on s_i and gradient descent on s_j together, with a weight of 1 − σ(s_i − s_j). That is, for a given pair of documents, we make the score of the more relevant document higher and the score of the less relevant document lower, and how much we update is determined by the score difference: the more wrongly ordered the pair currently is, the larger the weight.
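To see the update behaviour concretely, here is a sketch where the scores themselves are treated as the parameters (so ∂s_i/∂θ simply picks out s_i); the numbers are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# d_i is the more relevant document, but the model currently
# scores it below d_j (a wrongly ordered pair).
s_i, s_j = 0.5, 1.5
lr = 1.0

# Weight of the update: 1 - sigma(s_i - s_j).
weight = 1.0 - sigmoid(s_i - s_j)

# Gradient descent on L = -log sigma(s_i - s_j):
#   dL/ds_i = -(1 - sigma(s_i - s_j)),  dL/ds_j = +(1 - sigma(s_i - s_j))
s_i_new = s_i + lr * weight  # score of more relevant doc goes up
s_j_new = s_j - lr * weight  # score of less relevant doc goes down

print(s_i_new > s_i, s_j_new < s_j)  # → True True
```

Both scores move by the same amount, and the weight shrinks toward 0 as the pair becomes correctly ordered with a large margin, so well-ordered pairs contribute little to the update.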
To be continued.