BPR and Recommendation Systems

Bayesian Personalized Ranking from Implicit Feedback

When users shop online, they usually browse only the first few pages of results. Moreover, more and more people use tablets and mobile phones to shop online, so personalized ranking is indispensable.

When a user clicks an item to see its details, it usually means some characteristic of the item is attractive to them. Thus, we can assume that users prefer the items they clicked to the ones they did not, and use these preferences to rank products after a user has browsed the website for a while.

We will introduce the paper BPR: Bayesian Personalized Ranking from Implicit Feedback [1], which provides a method for this problem.

We can build a matrix U (#items × #items) for the preferences of user u. If the user clicked item i but didn't click item j, then U[i][j] is positive and U[j][i] is negative. If the user clicked both or neither of them, we don't assign any value to U[i][j] or U[j][i].
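As a concrete illustration, here is a minimal sketch of how such a matrix could be built for one user. The variable names and the choice of +1/−1/NaN entries are my own, not from the paper.

```python
import numpy as np

n_items = 5
clicked = {0, 2}  # hypothetical items user u clicked

# U[i][j] > 0 means "u prefers item i to item j"; NaN means "no information".
U = np.full((n_items, n_items), np.nan)
for i in clicked:
    for j in set(range(n_items)) - clicked:
        U[i, j] = 1.0   # clicked vs. not clicked: positive
        U[j, i] = -1.0  # the mirrored entry is negative
```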

Then, we can use Bayesian Personalized Ranking (BPR) to rank movies for users.

— BPR Concepts —

The authors propose BPR, which consists of the optimization criterion BPR-Opt and the algorithm LearnBPR that optimizes it.

Rather than predicting a precise rating for each item, we only have to predict the user's relative preference for each pair of items (i, j).

So we should maximize the number of correct predictions over all pairs (i, j), which is equivalent to maximizing AUC, the area under the ROC curve.

The formula for the average AUC over all users is shown below, where I is the set of all items and I_u^+ is the set of items user u clicked.
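$$\mathrm{AUC} = \frac{1}{|U|} \sum_{u \in U} \frac{1}{|I_u^+|\,|I \setminus I_u^+|} \sum_{i \in I_u^+} \sum_{j \in I \setminus I_u^+} H(\hat{x}_{uij})$$

Here $\hat{x}_{uij}$ is the model's score for "user u prefers item i to item j", and $H$ is the Heaviside step function, which is 1 when its argument is positive and 0 otherwise, so each correctly ordered pair counts once.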

However, since AUC contains the Heaviside function, which is not differentiable, we cannot use gradient descent to optimize AUC directly.


BPR-Opt therefore replaces the Heaviside function with the logistic sigmoid, so we can use gradient descent for optimization.
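For reference, the logistic sigmoid is

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

a smooth, differentiable approximation of the Heaviside step.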

Figure from the original paper [1]: BPR-Opt replaces the Heaviside function with the sigmoid.

With this replacement we obtain a new expression for the probability that user u prefers item i to item j. What LearnBPR does is find the model parameters Θ that maximize BPR-Opt.
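$$p(i >_u j \mid \Theta) := \sigma(\hat{x}_{uij}(\Theta))$$

where $\hat{x}_{uij}(\Theta)$ is a real-valued score produced by the underlying model; matrix factorization, used below, is one such model.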

Furthermore, the user-specific likelihood function p(>u |Θ) can be rewritten as a product of these pairwise probabilities under the following assumptions.

  1. All users act independently of each other.
  2. The ordering of each pair of items (i, j) for a specific user is independent of the ordering of every other pair.

Hence, the probability of all users' preferences can be written as the following product.
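$$\prod_{u \in U} p(>_u \mid \Theta) = \prod_{(u,i,j) \in D_S} \sigma(\hat{x}_{uij})$$

where $D_S$ is the set of all triples $(u, i, j)$ in which user $u$ prefers item $i$ to item $j$ (defined precisely below).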

To obtain BPR-Opt, the paper takes the maximum a posteriori estimator, placing a normal prior p(Θ) ~ N(0, λ_Θ I) on the parameters. Since the log function is strictly increasing, maximizing the posterior is equivalent to maximizing its logarithm, so we can define BPR-Opt as:
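$$\text{BPR-Opt} := \ln p(\Theta \mid >_u) = \ln p(>_u \mid \Theta)\,p(\Theta) = \sum_{(u,i,j) \in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda_\Theta \lVert\Theta\rVert^2$$

The regularization term $\lambda_\Theta \lVert\Theta\rVert^2$ falls out of the log of the normal prior on Θ.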

BPR-Opt is differentiable, so we can calculate its gradient.
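Using $\tfrac{d}{dx}\ln\sigma(x) = \tfrac{e^{-x}}{1+e^{-x}}$, the gradient is

$$\frac{\partial\,\text{BPR-Opt}}{\partial \Theta} = \sum_{(u,i,j) \in D_S} \frac{e^{-\hat{x}_{uij}}}{1+e^{-\hat{x}_{uij}}} \cdot \frac{\partial \hat{x}_{uij}}{\partial \Theta} - \lambda_\Theta \Theta$$

(absorbing constant factors into $\lambda_\Theta$).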

From the matrix U we can build a training dataset D_S that contains all triples (user u, item i, item j) where the user prefers item i to item j, i.e. $D_S := \{(u, i, j) \mid i \in I_u^+ \wedge j \in I \setminus I_u^+\}$.

Besides, the authors of the paper also mention that training on data sorted by user leads to poor convergence, so they recommend drawing triples by bootstrap sampling (uniformly, with replacement) instead of sweeping through all the data in each epoch. The algorithm is sketched below.
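Here is a minimal Python sketch of the generic LearnBPR loop. The function names, the `update_step` callback, and the step count are illustrative choices of mine, not from the paper.

```python
import numpy as np

def learn_bpr(D_S, theta, update_step, n_steps=100_000, seed=0):
    """Generic LearnBPR: stochastic gradient ascent on BPR-Opt.

    Each step draws one (u, i, j) triple uniformly with replacement
    (bootstrap sampling) and applies a model-specific update to theta.
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        u, i, j = D_S[rng.integers(len(D_S))]  # bootstrap sample
        update_step(theta, u, i, j)            # Theta <- Theta + lr * gradient
    return theta
```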

— LearnBPR + Matrix Factorization Implementation —

LearnBPR can be used with different models by changing the definition of x_uij.

Matrix factorization is a popular way to implement recommendation systems. Let’s see how to combine BPR and matrix factorization.

What matrix factorization does is find a user matrix and an item matrix whose product approximates the existing rating matrix.

We build a user matrix W and an item matrix H and obtain a user-preference matrix X̂ by multiplying W by Hᵗ. However, with LearnBPR we don't have to approximate the existing rating matrix; we only need to decide whether the user would rank item i higher than item j.
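Concretely, following the paper, each user u has a latent factor vector $w_u$ (a row of W) and each item i a latent factor vector $h_i$ (a row of H), and the pairwise score decomposes as

$$\hat{x}_{uij} := \hat{x}_{ui} - \hat{x}_{uj} = \langle w_u, h_i \rangle - \langle w_u, h_j \rangle$$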

Step 1.

We use the existing rating matrix to build the training set Ds.
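A small sketch of this step, assuming a toy rating matrix R where 0 means "no interaction" (the values are made up for illustration):

```python
import numpy as np

# Hypothetical toy rating matrix R (users x items); 0 means "no interaction".
R = np.array([
    [5, 0, 3, 0],
    [0, 4, 0, 2],
    [1, 0, 0, 5],
])

# D_S: all triples (u, i, j) where user u rated item i but not item j.
D_S = [
    (u, i, j)
    for u in range(R.shape[0])
    for i in np.flatnonzero(R[u] > 0)
    for j in np.flatnonzero(R[u] == 0)
]
```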

Step 2.

Since X̂ is built by multiplying W and Hᵗ, the partial derivatives of x̂_uij with respect to the model parameters are as follows.
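Following the paper, for the factorization model the derivatives with respect to each kind of parameter are

$$\frac{\partial \hat{x}_{uij}}{\partial \theta} = \begin{cases} h_{if} - h_{jf} & \text{if } \theta = w_{uf} \\ w_{uf} & \text{if } \theta = h_{if} \\ -w_{uf} & \text{if } \theta = h_{jf} \\ 0 & \text{otherwise} \end{cases}$$

where $f$ indexes the latent dimensions.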

Combining this with the LearnBPR algorithm, we can update Θ every time we sample a triple (u, i, j), as in the following code.
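A minimal sketch of that update step, plugging the derivatives above into the gradient of BPR-Opt. The parameter layout, learning rate, and regularization constant are illustrative; this is the `update_step` expected by the `learn_bpr` sketch earlier.

```python
import numpy as np

def mf_update_step(theta, u, i, j, lr=0.05, reg=0.01):
    """One stochastic gradient-ascent step of BPR with matrix factorization.

    theta = {"W": user factors (n_users x k), "H": item factors (n_items x k)}
    """
    W, H = theta["W"], theta["H"]
    x_uij = W[u] @ (H[i] - H[j])                     # x_ui - x_uj
    coeff = np.exp(-x_uij) / (1.0 + np.exp(-x_uij))  # d/dx ln sigmoid(x)
    w_u = W[u].copy()                                # pre-update copy for H's gradients
    W[u] += lr * (coeff * (H[i] - H[j]) - reg * W[u])
    H[i] += lr * (coeff * w_u - reg * H[i])
    H[j] += lr * (-coeff * w_u - reg * H[j])
```

With the earlier sketch, training would then look like `learn_bpr(D_S, theta, mf_update_step)`.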

— Result —

The upper section shows the movies ranked by user 19, and the lower section shows the movies predicted by our model.

— Summary —

Evaluation results from the paper [1].

Pros of LearnBPR:

  1. LearnBPR allows us to use gradient descent on a differentiable surrogate of the non-differentiable AUC.
  2. LearnBPR can be combined with different models and achieves better ranking results.

The paper also includes an implementation that combines BPR with k-Nearest-Neighbor. If you are interested, read the paper.

— Code —

https://github.com/leafinity/gradient_dscent_svd/blob/master/bpr.ipynb

— Reference —

[1] Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.

