SoftImpute is a new package for matrix completion - i.e. for imputing missing 
values in matrices.
SoftImpute was written by myself and Rahul Mazumder.
softImpute uses uses squared-error loss with nuclear norm regularization - one 
can think of it as 
the "lasso" for matrix approximation - to find a low-rank approximation to the 
observed entries in the matrix.
This low-rank approximation is then used to impute the missing entries.

softImpute works in a kind of "EM" fashion. Given a current guess, it fills in 
the missing entries.
Then it computes a soft-thresholded SVD of this complete matrix,  which yields 
the next guess.
These steps are iterated till convergence to the solution of the 
convex-optimation problem.

The algorithm can work with large matrices, such as the "netflix" matrix (400K 
x 20K) by making heavy use
of sparse-matrix methods in the Matrix package. It creates new S4 classes such 
as "Incomplete"  for storing the large   
data matrix, and "SparseplusLowRank" for representing the completed matrix. SVD 
computations are done using
a specially built block-alternating algorithm, svd.als, that exploits these 
structures and uses warm starts.


Some of the methods used are described in 
Rahul Mazumder, Trevor Hastie and Rob Tibshirani: 
Spectral Regularization Algorithms for Learning Large Incomplete Matrices. 
JMLR 2010 11 2287-2322 

Other newer and more efficient methods that inter-weave the alternating block 
algorithm steps with imputation steps will
be described in a forthcoming article.

Some of the features of softImpute are

* works with large matrices using sparse matrix methods, or smaller matrices 
using standard svd methods.
* one can control the maximum rank of the solution, to avoid overly expensive 
operations.
* warm starts can be used to move from one solution to a new solution with a 
different value for the nuclear-norm regularization parameter lambda (and/or a 
different rank)
* with lambda=0 and a specified rank, one automatically gets an implementation 
of "hardImpute" - iterative svd imputation
* softImpute has an option "type" which can be "svd" or "als" (alternating 
least squares), for specifying which of the two approaches above should be used.
*included in the package is svd.als, an efficient rank-restricted svd algorithm 
that can exploit sparsity and other special structure, and accept warm starts.
* a function biScale is provided, for centering and scaling both rows and 
columns of matrix to have means zero and variance 1. The centering and scaling
        constants are stored on the object. For sparse matrices with centering, 
the centered object is stored in "SparseplusLowRank" form to preserve its 
special structure
* prediction functions impute and complete are provided.


Trevor Hastie
 
----------------------------------------------------------------------------------------
  Trevor Hastie                                   has...@stanford.edu  
  Professor, Department of Statistics, Stanford University
  Phone: (650) 725-2231                 Fax: (650) 725-8977  
  URL: http://www.stanford.edu/~hastie  
   address: room 104, Department of Statistics, Sequoia Hall
           390 Serra Mall, Stanford University, CA 94305-4065  
 
_______________________________________________
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to