Repository: spark
Updated Branches:
  refs/heads/master 1c04652c8 -> a6e0afdcf


SPARK-2085: [MLlib] Apply user-specific regularization instead of uniform 
regularization in ALS

The current implementation of ALS takes a single regularization parameter and 
apply it on both of the user factors and the product factors. This kind of 
regularization can be less effective while user number is significantly larger 
than the number of products (and vice versa). For example, if we have 10M users 
and 1K product, regularization on user factors will dominate. Following the 
discussion in [this 
thread](http://apache-spark-user-list.1001560.n3.nabble.com/possible-bug-in-Spark-s-ALS-implementation-tt2567.html#a2704),
 the implementation in this PR will regularize each factor vector by #ratings * 
lambda.

Author: Shuo Xiang <[email protected]>

Closes #1026 from coderxiang/als-reg and squashes the following commits:

93dfdb4 [Shuo Xiang] Merge remote-tracking branch 'upstream/master' into als-reg
b98f19c [Shuo Xiang] merge latest master
52c7b58 [Shuo Xiang] Apply user-specific regularization instead of uniform 
regularization in Alternating Least Squares (ALS)


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6e0afdc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a6e0afdc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a6e0afdc

Branch: refs/heads/master
Commit: a6e0afdcf0174425e8a6ff20b2bc2e3a7a374f19
Parents: 1c04652
Author: Shuo Xiang <[email protected]>
Authored: Thu Jun 12 17:37:06 2014 -0700
Committer: Xiangrui Meng <[email protected]>
Committed: Thu Jun 12 17:37:06 2014 -0700

----------------------------------------------------------------------
 .../scala/org/apache/spark/mllib/recommendation/ALS.scala    | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/a6e0afdc/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index f1ae7b8..cc56fd6 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -507,6 +507,9 @@ class ALS private (
     val tempXtX = DoubleMatrix.zeros(triangleSize)
     val fullXtX = DoubleMatrix.zeros(rank, rank)
 
+    // Count the number of ratings each user gives to provide user-specific 
regularization
+    val numRatings = Array.fill(numUsers)(0)
+
     // Compute the XtX and Xy values for each user by adding products it rated 
in each product
     // block
     for (productBlock <- 0 until numProductBlocks) {
@@ -519,6 +522,7 @@ class ALS private (
         if (implicitPrefs) {
           var i = 0
           while (i < us.length) {
+            numRatings(us(i)) += 1
             // Extension to the original paper to handle rs(i) < 0. confidence 
is a function
             // of |rs(i)| instead so that it is never negative:
             val confidence = 1 + alpha * abs(rs(i))
@@ -534,6 +538,7 @@ class ALS private (
         } else {
           var i = 0
           while (i < us.length) {
+            numRatings(us(i)) += 1
             userXtX(us(i)).addi(tempXtX)
             SimpleBlas.axpy(rs(i), x, userXy(us(i)))
             i += 1
@@ -550,9 +555,10 @@ class ALS private (
       // Compute the full XtX matrix from the lower-triangular part we got 
above
       fillFullMatrix(userXtX(index), fullXtX)
       // Add regularization
+      val regParam = numRatings(index) * lambda
       var i = 0
       while (i < rank) {
-        fullXtX.data(i * rank + i) += lambda
+        fullXtX.data(i * rank + i) += regParam
         i += 1
       }
       // Solve the resulting matrix, which is symmetric and positive-definite

Reply via email to