Repository: spark
Updated Branches:
  refs/heads/master 024482bf5 -> e298ac91e
[SPARK-12632][PYSPARK][DOC] PySpark fpm and als parameter desc to consistent format Part of task for [SPARK-11219](https://issues.apache.org/jira/browse/SPARK-11219) to make PySpark MLlib parameter description formatting consistent. This is for the fpm and recommendation modules. Closes #10602 Closes #10897 Author: Bryan Cutler <[email protected]> Author: somideshmukh <[email protected]> Closes #11186 from BryanCutler/param-desc-consistent-fpmrecc-SPARK-12632. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e298ac91 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e298ac91 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e298ac91 Branch: refs/heads/master Commit: e298ac91e3f6177c6da83e2d8ee994d9037466da Parents: 024482b Author: Bryan Cutler <[email protected]> Authored: Mon Feb 22 12:48:37 2016 +0200 Committer: Nick Pentreath <[email protected]> Committed: Mon Feb 22 12:48:37 2016 +0200 ---------------------------------------------------------------------- docs/mllib-collaborative-filtering.md | 3 +- .../org/apache/spark/mllib/fpm/FPGrowth.scala | 2 +- .../org/apache/spark/mllib/fpm/PrefixSpan.scala | 6 +- .../apache/spark/mllib/recommendation/ALS.scala | 114 +++++++++---------- .../MatrixFactorizationModel.scala | 4 +- python/pyspark/mllib/fpm.py | 47 +++++--- python/pyspark/mllib/recommendation.py | 89 ++++++++++++--- 7 files changed, 164 insertions(+), 101 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/docs/mllib-collaborative-filtering.md ---------------------------------------------------------------------- diff --git a/docs/mllib-collaborative-filtering.md b/docs/mllib-collaborative-filtering.md index b8f0566..5c33292 100644 --- a/docs/mllib-collaborative-filtering.md +++ b/docs/mllib-collaborative-filtering.md @@ -21,7 +21,8 @@ following parameters: * *numBlocks* is the number of blocks used to parallelize computation (set to -1 to auto-configure). * *rank* is the number of latent factors in the model. -* *iterations* is the number of iterations to run. +* *iterations* is the number of iterations of ALS to run. ALS typically converges to a reasonable + solution in 20 iterations or less. * *lambda* specifies the regularization parameter in ALS. * *implicitPrefs* specifies whether to use the *explicit feedback* ALS variant or one adapted for *implicit feedback* data. http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala ---------------------------------------------------------------------- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala index 1250bc1..85d6093 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala @@ -152,7 +152,7 @@ object FPGrowthModel extends Loader[FPGrowthModel[_]] { * [[http://dx.doi.org/10.1145/335191.335372 Han et al., Mining frequent patterns without candidate * generation]]. 
* - * @param minSupport the minimal support level of the frequent pattern, any pattern appears + * @param minSupport the minimal support level of the frequent pattern, any pattern that appears * more than (minSupport * size-of-the-dataset) times will be output * @param numPartitions number of partitions used by parallel FP-growth * http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---------------------------------------------------------------------- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala index ed49c94..94a24b5 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala @@ -38,9 +38,9 @@ import org.apache.spark.storage.StorageLevel * The PrefixSpan algorithm is described in J. Pei, et al., PrefixSpan: Mining Sequential Patterns * Efficiently by Prefix-Projected Pattern Growth ([[http://doi.org/10.1109/ICDE.2001.914830]]). * - * @param minSupport the minimal support level of the sequential pattern, any pattern appears - * more than (minSupport * size-of-the-dataset) times will be output - * @param maxPatternLength the maximal length of the sequential pattern, any pattern appears + * @param minSupport the minimal support level of the sequential pattern, any pattern that appears + * more than (minSupport * size-of-the-dataset) times will be output + * @param maxPatternLength the maximal length of the sequential pattern, any pattern that appears * less than maxPatternLength will be output * @param maxLocalProjDBSize The maximum number of items (including delimiters used in the internal * storage format) allowed in a projected database before local http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---------------------------------------------------------------------- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala index 33aaf85..3e619c4 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala @@ -218,7 +218,7 @@ class ALS private ( } /** - * Run ALS with the configured parameters on an input RDD of (user, product, rating) triples. + * Run ALS with the configured parameters on an input RDD of [[Rating]] objects. * Returns a MatrixFactorizationModel with feature vectors for each user and product. */ @Since("0.8.0") @@ -279,18 +279,17 @@ class ALS private ( @Since("0.8.0") object ALS { /** - * Train a matrix factorization model given an RDD of ratings given by users to some products, - * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the - * product of two lower-rank matrices of a given rank (number of features). To solve for these - * features, we run a given number of iterations of ALS. This is done using a level of - * parallelism given by `blocks`. + * Train a matrix factorization model given an RDD of ratings by users for a subset of products. + * The ratings matrix is approximated as the product of two lower-rank matrices of a given rank + * (number of features). To solve for these features, ALS is run iteratively with a configurable + * level of parallelism. 
* - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter * @param blocks level of parallelism to split computation into - * @param seed random seed + * @param seed random seed for initial matrix factorization model */ @Since("0.9.1") def train( @@ -305,16 +304,15 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of ratings given by users to some products, - * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the - * product of two lower-rank matrices of a given rank (number of features). To solve for these - * features, we run a given number of iterations of ALS. This is done using a level of - * parallelism given by `blocks`. + * Train a matrix factorization model given an RDD of ratings by users for a subset of products. + * The ratings matrix is approximated as the product of two lower-rank matrices of a given rank + * (number of features). To solve for these features, ALS is run iteratively with a configurable + * level of parallelism. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter * @param blocks level of parallelism to split computation into */ @Since("0.8.0") @@ -329,16 +327,15 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of ratings given by users to some products, - * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the - * product of two lower-rank matrices of a given rank (number of features). To solve for these - * features, we run a given number of iterations of ALS. The level of parallelism is determined - * automatically based on the number of partitions in `ratings`. + * Train a matrix factorization model given an RDD of ratings by users for a subset of products. + * The ratings matrix is approximated as the product of two lower-rank matrices of a given rank + * (number of features). To solve for these features, ALS is run iteratively with a level of + * parallelism automatically based on the number of partitions in `ratings`. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter */ @Since("0.8.0") def train(ratings: RDD[Rating], rank: Int, iterations: Int, lambda: Double) @@ -347,15 +344,14 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of ratings given by users to some products, - * in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the - * product of two lower-rank matrices of a given rank (number of features). 
To solve for these - * features, we run a given number of iterations of ALS. The level of parallelism is determined - * automatically based on the number of partitions in `ratings`. + * Train a matrix factorization model given an RDD of ratings by users for a subset of products. + * The ratings matrix is approximated as the product of two lower-rank matrices of a given rank + * (number of features). To solve for these features, ALS is run iteratively with a level of + * parallelism automatically based on the number of partitions in `ratings`. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) + * @param iterations number of iterations of ALS */ @Since("0.8.0") def train(ratings: RDD[Rating], rank: Int, iterations: Int) @@ -372,11 +368,11 @@ object ALS { * * @param ratings RDD of (userID, productID, rating) pairs * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter * @param blocks level of parallelism to split computation into * @param alpha confidence parameter - * @param seed random seed + * @param seed random seed for initial matrix factorization model */ @Since("0.8.1") def trainImplicit( @@ -392,16 +388,15 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of 'implicit preferences' given by users - * to some products, in the form of (userID, productID, preference) pairs. We approximate the - * ratings matrix as the product of two lower-rank matrices of a given rank (number of features). - * To solve for these features, we run a given number of iterations of ALS. This is done using - * a level of parallelism given by `blocks`. + * Train a matrix factorization model given an RDD of 'implicit preferences' of users for a + * subset of products. The ratings matrix is approximated as the product of two lower-rank + * matrices of a given rank (number of features). To solve for these features, ALS is run + * iteratively with a configurable level of parallelism. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter * @param blocks level of parallelism to split computation into * @param alpha confidence parameter */ @@ -418,16 +413,16 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of 'implicit preferences' given by users to - * some products, in the form of (userID, productID, preference) pairs. We approximate the - * ratings matrix as the product of two lower-rank matrices of a given rank (number of features). - * To solve for these features, we run a given number of iterations of ALS. The level of - * parallelism is determined automatically based on the number of partitions in `ratings`. + * Train a matrix factorization model given an RDD of 'implicit preferences' of users for a + * subset of products. 
The ratings matrix is approximated as the product of two lower-rank + * matrices of a given rank (number of features). To solve for these features, ALS is run + * iteratively with a level of parallelism determined automatically based on the number of + * partitions in `ratings`. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) - * @param lambda regularization factor (recommended: 0.01) + * @param iterations number of iterations of ALS + * @param lambda regularization parameter * @param alpha confidence parameter */ @Since("0.8.1") @@ -437,16 +432,15 @@ object ALS { } /** - * Train a matrix factorization model given an RDD of 'implicit preferences' ratings given by - * users to some products, in the form of (userID, productID, rating) pairs. We approximate the - * ratings matrix as the product of two lower-rank matrices of a given rank (number of features). - * To solve for these features, we run a given number of iterations of ALS. The level of - * parallelism is determined automatically based on the number of partitions in `ratings`. - * Model parameters `alpha` and `lambda` are set to reasonable default values + * Train a matrix factorization model given an RDD of 'implicit preferences' of users for a + * subset of products. The ratings matrix is approximated as the product of two lower-rank + * matrices of a given rank (number of features). To solve for these features, ALS is run + * iteratively with a level of parallelism determined automatically based on the number of + * partitions in `ratings`. * - * @param ratings RDD of (userID, productID, rating) pairs + * @param ratings RDD of [[Rating]] objects with userID, productID, and rating * @param rank number of features to use - * @param iterations number of iterations of ALS (recommended: 10-20) + * @param iterations number of iterations of ALS */ @Since("0.8.1") def trainImplicit(ratings: RDD[Rating], rank: Int, iterations: Int) http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---------------------------------------------------------------------- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala index 0dc4048..628cf1d 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala @@ -206,7 +206,7 @@ class MatrixFactorizationModel @Since("0.8.0") ( } /** - * Recommends topK products for all users. + * Recommends top products for all users. * * @param num how many products to return for every user. * @return [(Int, Array[Rating])] objects, where every tuple contains a userID and an array of @@ -224,7 +224,7 @@ class MatrixFactorizationModel @Since("0.8.0") ( /** - * Recommends topK users for all products. + * Recommends top users for all products. * * @param num how many users to return for every product. 
* @return [(Int, Array[Rating])] objects, where every tuple contains a productID and an array http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/python/pyspark/mllib/fpm.py ---------------------------------------------------------------------- diff --git a/python/pyspark/mllib/fpm.py b/python/pyspark/mllib/fpm.py index 2039dec..7a2d77a 100644 --- a/python/pyspark/mllib/fpm.py +++ b/python/pyspark/mllib/fpm.py @@ -29,7 +29,6 @@ __all__ = ['FPGrowth', 'FPGrowthModel', 'PrefixSpan', 'PrefixSpanModel'] @inherit_doc @ignore_unicode_prefix class FPGrowthModel(JavaModelWrapper): - """ .. note:: Experimental @@ -68,11 +67,15 @@ class FPGrowth(object): """ Computes an FP-Growth model that contains frequent itemsets. - :param data: The input data set, each element contains a - transaction. - :param minSupport: The minimal support level (default: `0.3`). - :param numPartitions: The number of partitions used by - parallel FP-growth (default: same as input data). + :param data: + The input data set, each element contains a transaction. + :param minSupport: + The minimal support level. + (default: 0.3) + :param numPartitions: + The number of partitions used by parallel FP-growth. A value + of -1 will use the same number as input data. + (default: -1) """ model = callMLlibFunc("trainFPGrowthModel", data, float(minSupport), int(numPartitions)) return FPGrowthModel(model) @@ -128,17 +131,27 @@ class PrefixSpan(object): @since("1.6.0") def train(cls, data, minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000): """ - Finds the complete set of frequent sequential patterns in the input sequences of itemsets. - - :param data: The input data set, each element contains a sequnce of itemsets. - :param minSupport: the minimal support level of the sequential pattern, any pattern appears - more than (minSupport * size-of-the-dataset) times will be output (default: `0.1`) - :param maxPatternLength: the maximal length of the sequential pattern, any pattern appears - less than maxPatternLength will be output. (default: `10`) - :param maxLocalProjDBSize: The maximum number of items (including delimiters used in - the internal storage format) allowed in a projected database before local - processing. If a projected database exceeds this size, another - iteration of distributed prefix growth is run. (default: `32000000`) + Finds the complete set of frequent sequential patterns in the + input sequences of itemsets. + + :param data: + The input data set, each element contains a sequence of + itemsets. + :param minSupport: + The minimal support level of the sequential pattern, any + pattern that appears more than (minSupport * + size-of-the-dataset) times will be output. + (default: 0.1) + :param maxPatternLength: + The maximal length of the sequential pattern, any pattern + that appears less than maxPatternLength will be output. + (default: 10) + :param maxLocalProjDBSize: + The maximum number of items (including delimiters used in the + internal storage format) allowed in a projected database before + local processing. If a projected database exceeds this size, + another iteration of distributed prefix growth is run. 
+ (default: 32000000) """ model = callMLlibFunc("trainPrefixSpanModel", data, minSupport, maxPatternLength, maxLocalProjDBSize) http://git-wip-us.apache.org/repos/asf/spark/blob/e298ac91/python/pyspark/mllib/recommendation.py ---------------------------------------------------------------------- diff --git a/python/pyspark/mllib/recommendation.py b/python/pyspark/mllib/recommendation.py index 93e47a7..7e60255 100644 --- a/python/pyspark/mllib/recommendation.py +++ b/python/pyspark/mllib/recommendation.py @@ -138,7 +138,8 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): @since("0.9.0") def predictAll(self, user_product): """ - Returns a list of predicted ratings for input user and product pairs. + Returns a list of predicted ratings for input user and product + pairs. """ assert isinstance(user_product, RDD), "user_product should be RDD of (user, product)" first = user_product.first() @@ -165,28 +166,33 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): @since("1.4.0") def recommendUsers(self, product, num): """ - Recommends the top "num" number of users for a given product and returns a list - of Rating objects sorted by the predicted rating in descending order. + Recommends the top "num" number of users for a given product and + returns a list of Rating objects sorted by the predicted rating in + descending order. """ return list(self.call("recommendUsers", product, num)) @since("1.4.0") def recommendProducts(self, user, num): """ - Recommends the top "num" number of products for a given user and returns a list - of Rating objects sorted by the predicted rating in descending order. + Recommends the top "num" number of products for a given user and + returns a list of Rating objects sorted by the predicted rating in + descending order. """ return list(self.call("recommendProducts", user, num)) def recommendProductsForUsers(self, num): """ - Recommends top "num" products for all users. The number returned may be less than this. + Recommends the top "num" number of products for all users. The + number of recommendations returned per user may be less than "num". """ return self.call("wrappedRecommendProductsForUsers", num) def recommendUsersForProducts(self, num): """ - Recommends top "num" users for all products. The number returned may be less than this. + Recommends the top "num" number of users for all products. The + number of recommendations returned per product may be less than + "num". """ return self.call("wrappedRecommendUsersForProducts", num) @@ -234,11 +240,34 @@ class ALS(object): def train(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, nonnegative=False, seed=None): """ - Train a matrix factorization model given an RDD of ratings given by users to some products, - in the form of (userID, productID, rating) pairs. We approximate the ratings matrix as the - product of two lower-rank matrices of a given rank (number of features). To solve for these - features, we run a given number of iterations of ALS. This is done using a level of - parallelism given by `blocks`. + Train a matrix factorization model given an RDD of ratings by users + for a subset of products. The ratings matrix is approximated as the + product of two lower-rank matrices of a given rank (number of + features). To solve for these features, ALS is run iteratively with + a configurable level of parallelism. + + :param ratings: + RDD of `Rating` or (userID, productID, rating) tuple. 
+ :param rank: + Rank of the feature matrices computed (number of features). + :param iterations: + Number of iterations of ALS. + (default: 5) + :param lambda_: + Regularization parameter. + (default: 0.01) + :param blocks: + Number of blocks used to parallelize the computation. A value + of -1 will use an auto-configured number of blocks. + (default: -1) + :param nonnegative: + A value of True will solve least-squares with nonnegativity + constraints. + (default: False) + :param seed: + Random seed for initial matrix factorization model. A value + of None will use system time as the seed. + (default: None) """ model = callMLlibFunc("trainALSModel", cls._prepare(ratings), rank, iterations, lambda_, blocks, nonnegative, seed) @@ -249,11 +278,37 @@ class ALS(object): def trainImplicit(cls, ratings, rank, iterations=5, lambda_=0.01, blocks=-1, alpha=0.01, nonnegative=False, seed=None): """ - Train a matrix factorization model given an RDD of 'implicit preferences' given by users - to some products, in the form of (userID, productID, preference) pairs. We approximate the - ratings matrix as the product of two lower-rank matrices of a given rank (number of - features). To solve for these features, we run a given number of iterations of ALS. - This is done using a level of parallelism given by `blocks`. + Train a matrix factorization model given an RDD of 'implicit + preferences' of users for a subset of products. The ratings matrix + is approximated as the product of two lower-rank matrices of a + given rank (number of features). To solve for these features, ALS + is run iteratively with a configurable level of parallelism. + + :param ratings: + RDD of `Rating` or (userID, productID, rating) tuple. + :param rank: + Rank of the feature matrices computed (number of features). + :param iterations: + Number of iterations of ALS. + (default: 5) + :param lambda_: + Regularization parameter. + (default: 0.01) + :param blocks: + Number of blocks used to parallelize the computation. A value + of -1 will use an auto-configured number of blocks. + (default: -1) + :param alpha: + A constant used in computing confidence. + (default: 0.01) + :param nonnegative: + A value of True will solve least-squares with nonnegativity + constraints. + (default: False) + :param seed: + Random seed for initial matrix factorization model. A value + of None will use system time as the seed. + (default: None) """ model = callMLlibFunc("trainImplicitALSModel", cls._prepare(ratings), rank, iterations, lambda_, blocks, alpha, nonnegative, seed) --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
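
As a quick illustration of the fpm APIs whose parameter descriptions are reworked in this patch, here is a minimal PySpark sketch of FPGrowth.train and PrefixSpan.train. The transactions, sequences, and the SparkContext are made up for illustration; only the method names, parameters, and defaults follow the docstrings in the diff above.

from pyspark import SparkContext
from pyspark.mllib.fpm import FPGrowth, PrefixSpan

sc = SparkContext(appName="FPMExample")

# FPGrowth: each RDD element is one transaction (a list of items).
transactions = sc.parallelize([
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
])
# minSupport defaults to 0.3; numPartitions=-1 uses the input data's partitioning.
fp_model = FPGrowth.train(transactions, minSupport=0.5, numPartitions=-1)
for itemset in fp_model.freqItemsets().collect():
    print(itemset)  # FreqItemset(items=[...], freq=...)

# PrefixSpan: each RDD element is a sequence of itemsets.
sequences = sc.parallelize([
    [["a"], ["a", "b", "c"], ["a", "c"]],
    [["a"], ["c"], ["b", "c"]],
])
# Defaults: minSupport=0.1, maxPatternLength=10, maxLocalProjDBSize=32000000.
ps_model = PrefixSpan.train(sequences, minSupport=0.5, maxPatternLength=5)
for seq in ps_model.freqSequences().collect():
    print(seq)  # FreqSequence(sequence=[...], freq=...)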
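
A similar sketch for the recommendation APIs touched by this patch (ALS.train, ALS.trainImplicit, and the MatrixFactorizationModel recommend methods), continuing with the SparkContext sc from the sketch above. The ratings data and parameter values are illustrative, not taken from the patch.

from pyspark.mllib.recommendation import ALS, Rating

ratings = sc.parallelize([
    Rating(1, 1, 5.0), Rating(1, 2, 1.0),
    Rating(2, 1, 4.0), Rating(2, 2, 2.0),
])

# Explicit-feedback ALS; defaults are iterations=5, lambda_=0.01, blocks=-1.
model = ALS.train(ratings, rank=10, iterations=10, lambda_=0.01,
                  blocks=-1, nonnegative=False, seed=42)

print(model.predict(1, 2))            # predicted rating for (user 1, product 2)
print(model.recommendProducts(1, 2))  # top 2 products for user 1
top_per_user = model.recommendProductsForUsers(2).collect()

# Implicit-feedback variant; alpha is the confidence constant (default 0.01).
implicit_model = ALS.trainImplicit(ratings, rank=10, iterations=10, alpha=0.01)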
