COMPARATIVE ANALYSIS OF NEIGHBORHOOD-BASED APPROACHES AND MATRIX FACTORIZATION IN RECOMMENDER SYSTEMS

O. Chertov, Doctor of Technical Sciences, Head of the Department*, E-mail: chertov@i.ua
A. Brun, PhD, Associate Professor**, E-mail: armelle.brun@loria.fr
A. Boyer, PhD, Professor, Head of the KIWI research team**, E-mail: anne.boyer@loria.fr
M. Aleksandrova*, PhD Student**, E-mail: rita.v.aleksandrova@gmail.com
*Applied Mathematics Department, National Technical University of Ukraine "Kyiv Polytechnic Institute", 37, Prospect Peremohy, Kyiv, Ukraine, 03056
**Lorraine Research Laboratory in Computer Science and its Applications (LORIA), University of Lorraine, Campus scientifique, BP 239, Vandoeuvre-lès-Nancy Cedex, France, 54506

Abstract: The article describes the relation between two collaborative filtering methods, the nearest neighbors method and matrix factorization, which are usually presented as opposites. This work shows that the two approaches are interconnected: the rating estimation process is similar and, under certain conditions, the elements used by both approaches have a high mutual correlation, although they are not identical.
Keywords: collaborative filtering, nearest neighbors method, matrix factorization, interpretation of latent features


Introduction
The amount of digital information produced by humanity grows exponentially from year to year [1], which makes the search for useful information more and more difficult. That is why the development of different approaches and systems that help people navigate the available digital information is in demand.
One of the classes of systems that help solve such tasks is the class of recommender systems (RS). Recommender systems aim at recommending to users items that are likely to interest them. They are intensively used in many domains, such as e-commerce, e-tourism, e-learning, etc., and not only contribute to the satisfaction of the user, but also increase the profits of commercial systems. The task of rating prediction by an RS can be considered as the task of filling in the unknown values of a rating matrix, in which each row represents a user and each column an item. The intersection of a specific row and column contains the rating of the corresponding user on the corresponding item.
There are three categories of recommendation algorithms [2]: content-based, collaborative filtering and hybrid approaches. Content-based approaches [3] recommend to the active user those items which are similar to the items he/she has already highly appreciated. The main drawback of this kind of method is that the system cannot follow changes in the preferences and tastes of the user. Collaborative filtering (CF) [4] relies on the ratings of other users when estimating unknown user preferences. Hybrid approaches [5] use the ideas of both content-based and collaborative recommendation algorithms.
Collaborative filtering is proven to result in accurate recommendations and is widely used, especially when no or insufficient content information (information about the items and their similarity) is available. Two major approaches are used in CF-based recommender systems: the neighborhood-based approach and the matrix factorization-based approach. The neighborhood-based approach (NB) [6] relies on the preferences of the user's neighbors (other users with similar preferences) to estimate his/her preferences. Matrix Factorization (MF) [7] is a relatively new approach. MF represents the relation between users and items through a set of latent factors (also called features). It forms two low-rank matrices, each representing the relation between users (or items) and this set of features. The multiplication of these two matrices allows estimating users' future preferences. Although Matrix Factorization does not have the same intuitive interpretation as NB-based approaches, it was proven to result in accurate recommendations, especially in the case of sparse input rating matrices [7].
We believe that the interpretation of features as real users can reveal a deep ideological interconnection between these two approaches. This can lead to a qualitatively new understanding of the basic collaborative filtering algorithms and can open new possibilities for their joint usage.

Analysis of Published Works and Problem Statement
MF and NB are usually presented as opposed approaches [8], as they rely on different elements: either neighbors or latent features (the latter have no specific physical meaning). They have never been compared in other terms than their respective performance, for example in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) [9]. However, the objective of this paper is a joint analysis of the rating estimation processes of the Matrix Factorization and Neighborhood-based approaches, which, to our knowledge, has not been presented in other works.
Recently, we proposed to interpret the features of the MF-based approach as users [10, 11], referred to as representative users (RU). We assumed that a feature k represents a user if the vector of this user has a canonical form: it is a unit column with a single non-zero element, equal to 1, at position k. We have shown that Non-negative Matrix Factorization with Multiplicative Update Rules (for the sake of simplicity further referred to as NMF) naturally results in factorization matrices that have vectors with a form close to the one described above [11]. Other works dedicated to feature interpretation in MF-based approaches proposed to interpret features as behavioral patterns [12] or groups of users [13]. However, we believe that the interpretation of features as real users can make a link between the otherwise very different NB and MF approaches.

Purpose and objectives of the study
The objective of this paper is to propose a connection between NB and MF through the notion of representative users.
In order to fulfill this goal, the following tasks were performed: 1. Comparative analysis of the MF and NB-based approaches.
2. Proposition of a model for connecting MF and NB through the notion of representative users.
3. Validation of the proposed model.

Algorithmic Analysis of NB and MF
To recommend items to a user, called the active user u_a, both NB and MF aim at estimating u_a's ratings on the items that he/she has not rated yet. Let U be the set of users (of size M) and I the set of items (of size N). In order to perform this estimation, both approaches rely on users' ratings, represented as a rating matrix R, where r_{u,i} is the rating that user u assigned to item i.

1. The MF Approach
Matrix factorization is an unsupervised learning method for latent variable decomposition [14]. It has recently gained great popularity, especially since the Netflix Prize competition [7].
MF assumes that a small number of latent factors influences users' ratings. It aims at forming two low-rank matrices W and V, with dim(W) = M × K and dim(V) = N × K, where K is the number of features. The product of the two matrices approximates the rating matrix: R ≈ W·V^T. W and V respectively represent the extent to which users and items are related to these latent factors.
To get the estimated rating of an active user u_a on an item i, MF calculates the dot product of the two vectors in W and V that correspond to u_a and i. Features obtained with MF algorithms usually do not have any direct physical meaning.
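This dot-product estimation can be sketched as follows (a minimal illustration; the random factor matrices and the chosen sizes are stand-ins, not learned factors):

```python
import numpy as np

# Illustrative shapes: W is M x K (users x features), V is N x K (items x features).
M, N, K = 4, 5, 2
rng = np.random.default_rng(0)
W = rng.random((M, K))   # user-feature matrix (stand-in values)
V = rng.random((N, K))   # item-feature matrix (stand-in values)

a, i = 1, 3              # active user u_a and item i
r_hat = W[a] @ V[i]      # estimated rating: dot product of the two K-vectors

# Equivalently, the full estimated rating matrix is W V^T
R_hat = W @ V.T
assert np.isclose(r_hat, R_hat[a, i])
```

Estimating a single rating and reading the corresponding entry of W·V^T give the same value by construction.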
Factor matrices W and V correspond to the solution of the optimization task

min_{W,V} ||R − W·V^T||²,

where ||·|| denotes the Euclidean norm. This solution can be obtained with such algorithms as Alternating Least Squares (ALS) [15] and Stochastic Gradient Descent (SGD) [16]. Non-negative Matrix Factorization is a variant of MF which forces the values in both matrices to be non-negative. Non-negative factor matrices can be obtained either by imposing corresponding constraints on the solutions obtained with the ALS and SGD methods (first group), or through a special optimization procedure that ensures the non-negativity of the matrix elements (second group). One popular approach of the second group is Multiplicative Update Rules [17], which updates the factor matrices according to the formulae

W ← W ∘ (R·V) ⊘ (W·V^T·V),    V ← V ∘ (R^T·W) ⊘ (V·W^T·W),

where ∘ and ⊘ denote element-wise multiplication and division.
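The multiplicative update scheme can be sketched as follows. This is a minimal illustration on a hypothetical toy rating matrix, not the authors' implementation; for simplicity it treats zero entries as observed ratings rather than masking unknown values, and a small eps guards against division by zero:

```python
import numpy as np

def nmf_multiplicative(R, K, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung style multiplicative updates for R ~ W V^T (sketch)."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    W = rng.random((M, K)) + eps
    V = rng.random((N, K)) + eps
    for _ in range(n_iter):
        W *= (R @ V) / (W @ V.T @ V + eps)    # update user factors
        V *= (R.T @ W) / (V @ W.T @ W + eps)  # update item factors
    return W, V

# Hypothetical 4-user, 4-item rating matrix
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.]])
W, V = nmf_multiplicative(R, K=2)
print(np.round(W @ V.T, 1))  # approximates R
```

Because the updates only multiply by non-negative ratios, non-negativity of W and V is preserved at every iteration.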

2. The NB Approach
The NB approach, which emerged at the beginning of CF [18], assumes that users' preferences are correlated and that similar users rate items similarly. To estimate the rating of an active user u_a on item i, this approach exploits the ratings of a set of users similar to u_a: his/her neighbors. The NB technique defines the neighbors of u_a as the set of his/her K most similar users who rated item i.
The identification of neighbors thus relies on a similarity measure between users (for this reason a similarity matrix S, with dim(S) = M × M, is computed). This measure is generally the Cosine similarity or the Pearson correlation coefficient [6]. The Cosine similarity, contrary to the Pearson correlation, always results in non-negative values if the input vectors are non-negative, and is computed by the formula

sim(u, v) = (r_u · r_v) / (||r_u|| · ||r_v||),

where r_u and r_v are the rating vectors of users u and v. The similarity measure is also used, as the weight associated with the neighbors, to estimate the rating of u_a on item i. Estimated ratings are usually evaluated with the equation

r̂_{u_a,i} = Σ_{n ∈ U_{u_a}} sim(u_a, n)·r_{n,i} / Σ_{n ∈ U_{u_a}} sim(u_a, n),

where U_{u_a} is the set of the K nearest neighbors of u_a who have rated i.
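The neighborhood-based estimation can be sketched as follows. The toy rating matrix, the helper names `cosine_sim` and `predict`, the choice K = 2, and the convention that 0 marks an unknown rating are all illustrative assumptions:

```python
import numpy as np

# Hypothetical toy rating matrix; 0 denotes an unknown rating.
R = np.array([[5., 3., 0., 1.],
              [4., 0., 4., 1.],
              [1., 1., 5., 5.],
              [1., 0., 4., 4.]])

def cosine_sim(u, v):
    """Cosine similarity of two rating vectors."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def predict(R, a, i, K=2):
    """Similarity-weighted average over the K most similar users who rated i."""
    cands = [u for u in range(R.shape[0]) if u != a and R[u, i] > 0]
    sims = {u: cosine_sim(R[a], R[u]) for u in cands}
    top = sorted(cands, key=lambda u: -sims[u])[:K]
    num = sum(sims[u] * R[u, i] for u in top)
    den = sum(sims[u] for u in top)
    return num / den

print(round(predict(R, a=0, i=2), 2))
```

Note that the neighborhood depends on the (user, item) pair: only users who actually rated item i can enter the sum.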

1. Identification of Representative Users
In [11] we have shown that the features of NMF can be associated with a set of real users (representative users); an algorithm for RU identification was also proposed. This algorithm consists of 6 steps, presented in Fig. 1 and detailed below.
Step 1. A traditional matrix factorization is performed, resulting in both matrices W and V with K features.
Step 2. A normalization of each of the M column vectors of the matrix W is performed, resulting in unit columns. The resulting normalized matrix is denoted W_norm and the set of normalization coefficients is denoted C.
Step 3. This step is dedicated to the identification of the representative users in the W_norm matrix. As shown in [11], all users are first divided into groups of preimage candidates, one group per feature, according to the position of the maximum element of their column vector w_norm: a user for whom the maximum of the column vector w_norm is situated at position k belongs to the preimage group of the k-th feature. After this, the quality score q of each preimage candidate w_m is computed by the formula

q(w_m) = 1 − dist(w_m, f_k) / max_dist(K),

where dist(v_1, v_2) is the Euclidean distance between vectors v_1 and v_2; f_k is the k-th canonical column vector, with one non-zero element situated at position k; max_dist(K) is the maximum distance between a preimage candidate and a canonical column vector of dimensionality K. As shown in [11], this maximum distance is computed by the formula

max_dist(K) = sqrt(2 − 2/√K),

reached by the unit vector with all components equal to 1/√K. The user with the highest quality score among all candidates of a group is considered the representative user of feature k. Once all RU are identified, the matrix W_norm is modified in the following way: for every column vector w'_k that corresponds to the representative user of feature k, all values are set to 0, except the one at position k, which is set to 1. This transformation performs a one-to-one mapping between representative users and the corresponding features. The resulting modified matrix is W_norm^mod. Fig. 2 presents an example of such a transformation. For the sake of simplicity, all representative users are grouped in the left part of the matrix. In some cases, a feature, say feature k, may have no candidate preimage. In this case, we can either decrease the number of features considered for factorization or search for a vector whose second maximum is situated at that specific position.
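Steps 2 and 3 can be sketched as follows. This is our reading of the procedure in [11]: the random matrix is a hypothetical stand-in for learned factors, user vectors are taken as the columns of a K × M matrix as in Step 2, and the score q = 1 − dist/max_dist with max_dist(K) = sqrt(2 − 2/√K) is assumed:

```python
import numpy as np

def identify_representatives(W):
    """W: K x M matrix whose columns are user feature vectors (sketch of Steps 2-3)."""
    K, M = W.shape
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)  # Step 2: unit columns
    max_dist = np.sqrt(2.0 - 2.0 / np.sqrt(K))
    reps = {}
    for k in range(K):
        # Step 3: preimage candidates are users whose maximum is at position k
        cands = [m for m in range(M) if np.argmax(Wn[:, m]) == k]
        if not cands:
            continue  # feature k has no candidate preimage
        f_k = np.eye(K)[k]  # canonical vector for feature k
        q = {m: 1.0 - np.linalg.norm(Wn[:, m] - f_k) / max_dist for m in cands}
        reps[k] = max(q, key=q.get)  # highest quality score wins
    return reps

rng = np.random.default_rng(1)
W = rng.random((3, 8))  # hypothetical factor matrix: K = 3 features, M = 8 users
print(identify_representatives(W))
```

A user whose normalized column is exactly canonical gets q = 1, the best possible score for its feature.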
Step 4. Each column of the matrix W_norm^mod is multiplied by the appropriate normalization factor from the set C (Fig. 2). After this, the representative users remain preimages of the features, but with scaling coefficients.
Step 5. In order to obtain the best model, matrix V can be modified under the condition of minimal loss. The modification of V can be performed using optimization methods, with the starting value obtained after the first step.
Step 6.The resulting recommendation model is made up of matrices mod

W
and V (or mod W and mod V ).

2. Connection between NB and MF
Let us compare the rating estimation processes of MF and NB.
Both equations perform a sum. In the case of NB, this sum is made over the K nearest neighbors; in the case of MF, it is made over the K features. If the number of neighbors is equal to the number of features, then both sums use the same number of terms. Focusing in detail on the terms that are summed up, we can find additional similarities. First, the element r_{n,i} in the NB equation is the rating of the neighbor n on item i, while the element v_{k,i} in the MF equation represents to what extent item i is related to feature k. If features are interpreted as representative users, we raise question (1): does v_{k,i} correspond to the rating of the k-th representative user on item i? If yes, both elements r_{n,i} and v_{k,i} can be linked to each other, and matrix V can be considered as an approximation of a rating matrix. Second, the element sim(u_a, n) in the NB equation represents the similarity between user u_a and his/her neighbor n, while the element w_{u_a,k} in the MF equation represents to what extent user u_a is related to feature k. As this feature is interpreted as a user, we raise question (2): does w_{u_a,k} correspond to the similarity between u_a and the k-th representative user? If yes, these elements can also be linked to each other, and matrix W can be considered as an approximation of the similarity matrix S. If there is actually a correspondence between these elements, we can conclude that the estimation processes of NB and MF are similar. The questions we raise are schematically presented in Fig. 3.
We have to mention here that there is a big difference between the two processes: the set of features (representative users) is unique, whereas the set of neighbors depends on each (user, item) pair. However, it was shown in [19] that exploiting a unique set of neighbor users in NB leads to a high quality of recommendations (low MAE). NB and MF may thus be considered as similar.
In the following section, we conduct experiments on a benchmark dataset to determine whether the elements used in the estimation processes of NB and NMF are similar. Because the NMF algorithm was used to perform matrix factorization, NB with cosine similarity was considered, to ensure the non-negativity of both models.

Experimental Analysis of NB and MF Rating Estimation Processes
We conduct the experiments on the 100k MovieLens dataset [20], which contains 100 000 ratings, ranging from 1 to 5, assigned by 943 users to 1682 items. 80 % of the ratings are randomly chosen to form the learning set and the remaining 20 % form the test set. The number of features used for NMF is K = 10 (following the experiments in [11], where the best results were obtained with K = 10), and the number of neighbors used for NB is also K = 10. The accuracy of the models is evaluated with the standard mean absolute error (MAE), computed by the formula

MAE = (1/L) Σ_{l=1}^{L} |r_l − r*_l|,

where L corresponds to the number of ratings in the test set, r_l represents a rating value from the test set and r*_l the corresponding estimated value.
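The MAE computation can be illustrated with hypothetical values:

```python
import numpy as np

# MAE: mean absolute difference between the L test ratings r_l and their
# estimates r*_l. The values below are purely illustrative.
r_true = np.array([4.0, 3.0, 5.0, 2.0])
r_est  = np.array([3.5, 3.0, 4.0, 2.5])
mae = np.mean(np.abs(r_true - r_est))
print(mae)  # -> 0.5
```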
We first aim at answering question (1): can matrix V from NMF be considered as a rating matrix? With NMF, after identifying the users that correspond to the features (the representative users), we study whether the values in V correspond to the ratings of the representative users in matrix R. We calculate the cosine similarity between the corresponding lines in the two matrices.
The resulting average similarity is 0.972, with a standard deviation equal to 0.013. This shows that matrix V is highly similar to the lines of R that correspond to representative users. We can thus answer question (1): matrix V can be considered as an approximation of the rating matrix of the representative users.
Based on this answer, we can now raise question (2): can matrix W be considered as a similarity matrix between representative users and all users in the system?As in the previous case, we calculate the cosine similarity between lines of matrix W and lines of matrix S that correspond to representative users.The resulting average similarity value is 0.666, with a standard deviation equal to 0.110.We can conclude that matrices W and S are fairly similar, even if they are less similar than V and R.
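The comparison protocol, row-wise cosine similarity between two matrices, can be sketched with synthetic data (the matrices below are illustrative stand-ins for V and R, or W and S, restricted to the rows being compared):

```python
import numpy as np

def rowwise_cosine(A, B):
    """Cosine similarity between corresponding rows of A and B."""
    num = np.sum(A * B, axis=1)
    den = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return num / den

rng = np.random.default_rng(2)
A = rng.random((10, 50))
B = A + 0.1 * rng.random((10, 50))  # a slightly perturbed copy of A
sims = rowwise_cosine(A, B)
print(round(sims.mean(), 3), round(sims.std(), 3))
```

Reporting the mean and standard deviation of these row similarities mirrors how the 0.972 ± 0.013 and 0.666 ± 0.110 figures above are obtained.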
As V is highly related to R, we perform an additional experiment. We force V to contain the rating values from R (those of the representative users). We run one additional iteration of NMF to update W, and we study whether the resulting matrix W is closer to S or not. First, we assign the value 0 to the unknown rating values (first model). In this case, the mean and standard deviation of the similarity between vectors in matrix W and the corresponding vectors in S are equal to 0.741 and 0.089. Second, we assign the values of V (from NMF) to the unknown rating values (second model). The resulting mean and standard deviation values are 0.671 and 0.110. The results of the similarity analysis of the different elements of the MF and NB approaches are summarized in Table 1.
We can conclude that filling V with ratings increases the similarity between W and S, especially when V is initialized with the value 0 in the case of unknown ratings.

Fig. 1. RU identification algorithm