Hello,

I have been using R for some text pre-processing, and I have two questions concerning the tm package.

1) The function removeSparseTerms() takes as parameters a matrix and a sparse factor. Can anyone please tell me how the sparse factor is calculated? I have tried different values and inspected the resulting matrix each time, but I still could not grasp the maths behind the sparse factor.
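
To make question 1 concrete, here is a small base-R sketch (no tm needed) of one plausible reading of the sparse factor: the fraction of documents in which a term has a zero count, with a term kept only if that fraction is below the threshold. This reading is an assumption to be confirmed against the tm help page, not an established answer. The toy matrix and the names m, term_sparsity, sparse and kept are made up for illustration.

```r
# Toy term-document matrix (made-up counts): rows are terms, columns are documents.
m <- matrix(c(1, 0, 0, 0,
              2, 1, 0, 3,
              0, 0, 0, 1,
              1, 1, 1, 1),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("apple", "test", "rare", "the"),
                            paste0("doc", 1:4)))

# Fraction of documents in which each term does not occur.
term_sparsity <- rowMeans(m == 0)
print(term_sparsity)

# Assumption: a term survives only if its sparsity is below the threshold.
sparse <- 0.6
kept <- m[term_sparsity < sparse, , drop = FALSE]
print(rownames(kept))
```

If this reading is right, lowering sparse toward 0 should drop more and more terms, which can be compared with what removeSparseTerms() does on the same counts.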
2) Similarly, the function findAssocs() takes as parameters a matrix, a term, and an association threshold. For example, findAssocs(mat, "test", 0.5) returns all the tokens in the matrix mat (created from a corpus) that have an association strength of at least 0.5 with the term "test". Can anyone please tell me which association metric is being used, e.g. chi-squared, mutual information, ...? The documentation found via help.search("findAssocs") does not say. I read on a web page (which I cannot find again) that findAssocs is a *generic* function, but that still does not tell me which metric is computed.

Kind regards,
Ashwin
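
For reference on question 2, a minimal base-R sketch of one candidate metric: the Pearson correlation between the count vector of the chosen term and that of every other term across documents. Treating this as the association measure is an assumption to check against the tm sources, not something established here. The toy matrix and the names m, assoc, threshold and candidates are made up for illustration.

```r
# Toy term-document matrix (made-up counts): rows are terms, columns are documents.
m <- matrix(c(1, 0, 2, 0,
              2, 0, 4, 0,
              0, 3, 0, 1,
              1, 1, 1, 2),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("test", "exam", "cat", "the"),
                            paste0("doc", 1:4)))

# Pearson correlation of the "test" row with every term row (cor works on columns,
# so the matrix is transposed first).
assoc <- cor(t(m))["test", ]
assoc <- assoc[names(assoc) != "test"]

# Assumption: terms are reported when the correlation reaches the threshold.
threshold <- 0.5
candidates <- sort(assoc[assoc >= threshold], decreasing = TRUE)
print(round(candidates, 2))
```

Comparing these values with the output of findAssocs() on the same counts would show whether correlation is the metric or whether something like chi-squared is used instead.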