I think there is a problem with the weightTfIdf function in R's tm package. The manual says that the idf is calculated as

    idf(term) = log(|D| / number of documents that contain the term)

When a dictionary is passed in the control list, as below:

    dtm = DocumentTermMatrix(myCorpus,
                             control = list(dictionary = myDict,
                                            weighting = function(x) weightTfIdf(x, normalize = FALSE)))

it is possible that no document contains a given dictionary term. In that case the denominator in the idf becomes 0, leading to a NaN.

How can I raise this issue and get the code fixed so that it uses the following instead?

    idf(term) = log(|D| / (number of documents that contain the term + 1))

Any help on this would be appreciated!

Shivani
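
P.S. To make the suggestion concrete, here is a minimal sketch in base R of the smoothed weighting I have in mind. It works on a plain documents-by-terms count matrix rather than on a tm object, and the function name smoothed_tfidf and the toy data are made up for illustration; this is not tm's actual code.

```r
# Hypothetical helper (not part of tm): tf-idf on a plain documents-by-terms
# count matrix, adding 1 to the document frequency to avoid division by zero.
smoothed_tfidf <- function(counts) {
  n_docs <- nrow(counts)
  # Number of documents containing each term (0 for unseen dictionary terms).
  doc_freq <- colSums(counts > 0)
  idf <- log(n_docs / (doc_freq + 1))
  # Multiply each column (term) by its idf.
  sweep(counts, 2, idf, `*`)
}

# Toy data: "kiwi" is in the dictionary but appears in no document.
counts <- matrix(c(2, 0, 0,
                   1, 1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2"),
                                 c("apple", "pear", "kiwi")))

# Unsmoothed idf for comparison: the "kiwi" column is 0 * log(2 / 0), i.e. NaN.
unsmoothed <- sweep(counts, 2, log(nrow(counts) / colSums(counts > 0)), `*`)

smoothed <- smoothed_tfidf(counts)
```

With the +1 in the denominator the "kiwi" column comes out as 0 instead of NaN. One side effect I noticed: a term that occurs in every document now gets a slightly negative idf, since log(|D| / (|D| + 1)) < 0, so a different smoothing may be preferable.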