Hello
I have been using R for some text pre-processing, and I have two questions
concerning the tm package.
1) The function removeSparseTerms() takes as parameters a matrix and a
sparse factor. Can anyone please tell me how the sparse factor is calculated?
I have tried different values and inspected the resulting matrix each time,
but I still could not grasp the maths behind the sparse factor.
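
To make question 1 concrete, here is a minimal base-R sketch of my best guess.
It assumes (and this is only an assumption, not something I have confirmed
from the tm documentation) that a term's sparsity is the fraction of documents
in which the term does not occur, and that a term is kept when that fraction
is below the sparse factor. The matrix m, its term names and the helper
keep_terms() are made up for illustration and are not part of tm.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up data).
m <- matrix(c(1, 0, 2, 1,
              0, 0, 3, 0,
              1, 1, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("oil", "test", "the"), paste0("doc", 1:4)))

# ASSUMPTION: a term's sparsity is the share of documents with a zero count.
term_sparsity <- rowMeans(m == 0)

# ASSUMPTION: a term is kept when its sparsity is strictly below the threshold.
# keep_terms() is a hypothetical helper, not a tm function.
keep_terms <- function(mat, sparse) {
  mat[rowMeans(mat == 0) < sparse, , drop = FALSE]
}

kept <- keep_terms(m, 0.5)
print(term_sparsity)
print(rownames(kept))
```

If this guess is right, calling removeSparseTerms() with a sparse factor of 0.5
on the equivalent term-document matrix should keep the same terms.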


2) Similarly, the function findAssocs() takes as parameters a matrix, a
term and an association threshold; e.g. findAssocs(mat, "test", .5) will
return all the tokens in the matrix mat (created from a corpus) that have
an association strength of at least 0.5 with the term "test". Can anyone
please tell me which association metric is being used, e.g. chi-squared,
mutual information, ...? The documentation, help.search("findAssocs"), does
not say anything. I read on a web page (which I cannot retrieve now) that
findAssocs is a *generic* function, but this is still very vague.
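
To make question 2 concrete, here is a minimal base-R sketch of one candidate
metric. It assumes (purely as a guess that I have not been able to confirm
from the documentation) that the association is the Pearson correlation
between the count vectors of two terms across documents. The matrix dtm, its
term names and the helper assoc_guess() are made up for illustration and are
not part of tm.

```r
# Toy document-term matrix: rows are documents, columns are terms (made-up data).
dtm <- cbind(test  = c(1, 2, 3, 4),
             exam  = c(2, 4, 6, 8),
             oil   = c(4, 3, 2, 1),
             price = c(0, 1, 0, 1))
rownames(dtm) <- paste0("doc", 1:4)

# ASSUMPTION: association = Pearson correlation of term counts across documents.
# assoc_guess() is a hypothetical helper, not a tm function.
assoc_guess <- function(mat, term, limit) {
  others <- mat[, colnames(mat) != term, drop = FALSE]
  r <- as.vector(cor(mat[, term], others))
  names(r) <- colnames(others)
  # Keep only the terms whose correlation reaches the threshold.
  sort(r[!is.na(r) & r >= limit], decreasing = TRUE)
}

res <- assoc_guess(dtm, "test", 0.5)
print(res)
```

If findAssocs() returns the same terms and values on an equivalent matrix, that
would suggest correlation is the metric; if not, the guess is wrong.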

kind regards
ashwin
