Hi,

On Mon, Jun 23, 2014 at 9:11 AM, ashwin ittoo <ashwin.itt...@gmail.com> wrote:
> Hello,
> I have been using R for some text pre-processing. I have 2 questions
> concerning the tm package.
> 1) The function removeSparseTerms takes as parameters a matrix and a
> sparse factor. Can anyone please tell me how the sparse factor is calculated?
> I have tried playing around with different values and then inspecting the
> matrix, but I still could not grasp the maths behind the sparse factor.
The help says percentage, although since sparse can range from 0 to 1 this is
more likely a proportion. But you could always look at the source yourself
if you want to know for certain.

>
> 2) Similarly, the function findAssocs() takes as parameters a matrix, a
> term and an association threshold, e.g. findAssocs(mat, "test", .5) will
> return all the tokens in the matrix mat (created from a corpus) that have
> an association strength of 0.5 with the term "test". Can anyone please tell
> me what association metric is being used, e.g. chi-squared, mutual
> information, ...? The documentation, help.search("findAssocs"), does not say
> anything. I read on a web page (which I cannot retrieve now) that
> findAssocs is a *generic* function, but this is still very vague.

The help says correlation, and the vignette "Introduction to the tm Package"
confirms that. Again, you could check the source, or you could contact the
package maintainer, which is the appropriate thing to do for questions of
this sort.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
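The two answers above (sparse as a proportion, association as correlation) can be illustrated with a minimal base-R sketch. It does not use tm itself. It assumes, without confirming it here against the tm source, two things about tm's behaviour. First, removeSparseTerms() keeps a term when its sparsity, meaning the proportion of documents in which the term does not occur, is strictly below `sparse`. Second, findAssocs() reports Pearson correlations between term-count columns that reach the threshold. The toy matrix and the helper names keep_sparse_terms() and term_assocs() are hypothetical. The real tm functions operate on DocumentTermMatrix/TermDocumentMatrix objects, and their exact boundary comparisons (`>` versus `>=`) may differ.

```r
# Hypothetical toy document-term matrix: 5 documents (rows) x 4 terms (columns).
# The counts are made up purely for illustration.
dtm <- matrix(c(2, 0, 1, 0,
                1, 1, 0, 0,
                0, 0, 1, 0,
                3, 1, 0, 1,
                0, 0, 0, 0),
              nrow = 5, byrow = TRUE,
              dimnames = list(paste0("doc", 1:5),
                              c("test", "exam", "apple", "rare")))

# Sparsity of a term = proportion of documents in which it does NOT occur.
term_sparsity <- colMeans(dtm == 0)

# Hypothetical helper (assumed rule, not tm's code): keep a term only if its
# sparsity is strictly below `sparse`, i.e. it occurs in more than
# nrow(m) * (1 - sparse) documents.
keep_sparse_terms <- function(m, sparse) {
  stopifnot(is.numeric(sparse), sparse > 0, sparse < 1)
  m[, colSums(m > 0) > nrow(m) * (1 - sparse), drop = FALSE]
}

# Hypothetical helper (assumed measure): Pearson correlation between the
# counts of `term` and every other term, keeping values >= corlimit.
term_assocs <- function(m, term, corlimit) {
  others <- m[, colnames(m) != term, drop = FALSE]
  # cor() of a vector and a matrix gives a 1 x k matrix; NA (with a warning)
  # appears for constant columns, so warnings are silenced and NAs dropped.
  r <- suppressWarnings(cor(m[, term], others))[1, ]
  sort(round(r[!is.na(r) & r >= corlimit], 2), decreasing = TRUE)
}

print(term_sparsity)                          # test 0.4, exam 0.6, apple 0.6, rare 0.8
print(colnames(keep_sparse_terms(dtm, 0.7)))  # "test" "exam" "apple"
print(term_assocs(dtm, "test", 0.5))          # rare 0.77, exam 0.56
```

With sparse = 0.7, "rare" (absent from 80% of documents) is dropped and the other three terms survive. Under the correlation reading, "apple" (r of about -0.14 with "test") falls below the 0.5 threshold.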