Repository: spark
Updated Branches:
  refs/heads/master c48053773 -> 2b36344f5


SPARK-1675. Make clear whether computePrincipalComponents requires centered data

Just closing out this small JIRA, resolving with a comment change.

Author: Sean Owen <[email protected]>

Closes #1171 from srowen/SPARK-1675 and squashes the following commits:

45ee9b7 [Sean Owen] Add simple note that data need not be centered for computePrincipalComponents


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2b36344f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2b36344f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2b36344f

Branch: refs/heads/master
Commit: 2b36344f588d4e7357ce9921dc656e2389ba1dea
Parents: c480537
Author: Sean Owen <[email protected]>
Authored: Thu Jul 3 11:54:51 2014 -0700
Committer: Xiangrui Meng <[email protected]>
Committed: Thu Jul 3 11:54:51 2014 -0700

----------------------------------------------------------------------
 .../org/apache/spark/mllib/linalg/distributed/RowMatrix.scala | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2b36344f/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
index 1a0073c..695e03b 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala
@@ -347,6 +347,8 @@ class RowMatrix(
    * The principal components are stored a local matrix of size n-by-k.
    * Each column corresponds for one principal component,
    * and the columns are in descending order of component variance.
+   * The row data do not need to be "centered" first; it is not necessary for
+   * the mean of each column to be 0.
    *
    * @param k number of top principal components.
    * @return a matrix of size n-by-k, whose columns are principal components
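
The added note can be illustrated with a small sketch. It assumes the principal components are derived from the sample covariance matrix of the rows; the commit itself does not show that. Because the covariance computation subtracts each column mean, shifting or pre-centering the data leaves the covariance matrix, and hence the components, unchanged. The code is plain Scala and does not call Spark; `CovarianceShiftDemo` and its helper names are hypothetical and exist only for illustration.

```scala
object CovarianceShiftDemo {
  type Row = Array[Double]

  /** Mean of each column over all rows. */
  def columnMeans(rows: Seq[Row]): Array[Double] = {
    val n = rows.head.length
    val sums = new Array[Double](n)
    for (r <- rows; j <- 0 until n) sums(j) += r(j)
    sums.map(_ / rows.size)
  }

  /** Sample covariance matrix (n-by-n), dividing by m - 1.
    * The column means are subtracted here, so callers need not center the data. */
  def covariance(rows: Seq[Row]): Array[Array[Double]] = {
    val m = rows.size
    val n = rows.head.length
    val mu = columnMeans(rows)
    Array.tabulate(n, n) { (i, j) =>
      rows.map(r => (r(i) - mu(i)) * (r(j) - mu(j))).sum / (m - 1)
    }
  }

  /** Largest absolute entrywise difference between two equally sized matrices. */
  def maxAbsDiff(a: Array[Array[Double]], b: Array[Array[Double]]): Double =
    (for (i <- a.indices; j <- a(i).indices) yield math.abs(a(i)(j) - b(i)(j))).max

  def main(args: Array[String]): Unit = {
    // Made-up sample data: four rows, two columns, with non-zero column means.
    val rows = Seq(Array(1.0, 2.0), Array(3.0, 5.0), Array(4.0, 4.0), Array(6.0, 9.0))
    val mu = columnMeans(rows)
    // Explicitly centered copy of the same data.
    val centered = rows.map(r => r.zip(mu).map { case (x, m) => x - m })
    val diff = maxAbsDiff(covariance(rows), covariance(centered))
    // Expected to be zero up to floating-point rounding.
    println(s"max |cov(raw) - cov(centered)| = $diff")
  }
}
```

Since the raw and centered inputs give the same covariance matrix, an eigendecomposition of either one would yield the same principal directions, which is consistent with the comment added in this commit.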
