Hi, wishing you all well.

I am exploring text mining with R. Here is where I need help:

1. The starting point is a data frame

    worder1 <- c("I am, taking 2", "are these the three samples?",
                 "He speaks differently to you, aint it !",
                 "This is distilled - my dear, now give me $3",
                 "I saved 2500 this month.")
    df1 <- data.frame(id = 1:5, words = worder1)

here in dput format:

    dput(df1)
    structure(list(id = 1:5, words = structure(c(3L, 1L, 2L, 5L, 4L),
        .Label = c("are these the three samples?",
        "He speaks differently to you, aint it !",
        "I am, taking 2", "I saved 2500 this month.",
        "This is distilled - my dear, now give me $3"),
        class = "factor")), .Names = c("id", "words"),
        row.names = c(NA, -5L), class = "data.frame")

2. The corpus rituals ...

    library(tm)  # provides Corpus, tm_map, TermDocumentMatrix, etc.

    corp1 <- Corpus(VectorSource(df1$words))
    inspect(corp1)
    class(corp1)
    corp1 <- tm_map(corp1, removeNumbers)
    corp1 <- tm_map(corp1, removePunctuation)
    corp1 <- tm_map(corp1, removeWords, stopwords("english"))
    corp1 <- tm_map(corp1, stripWhitespace)
    class(corp1)

3. Getting to the analysis

    tdm1 <- TermDocumentMatrix(corp1)
    inspect(tdm1[1:5,])
    dtm1 <- DocumentTermMatrix(corp1)
    inspect(dtm1[1:5,])

4. Now here is the problem

If I apply a transformation that is not listed in getTransformations(), I am unable to convert the corpus to a tdm or dtm:

    corp1 <- tm_map(corp1, tolower)
    class(corp1)
    tdm1.2 <- TermDocumentMatrix(corp1)
    dtm1.2 <- DocumentTermMatrix(corp1)

The error returned is:

    Error: inherits(doc, "TextDocument") is not TRUE

5. The explanations on the internet suggest either

a) corp1 <- tm_map(corp1, content_transformer(tolower))

which in my case returns the error:

    Error in UseMethod("content", x) :
      no applicable method for 'content' applied to an object of class "character"

b) corpus_clean <- tm_map(corp1, PlainTextDocument)

which results in the loss of all the metadata.

I will appreciate any help.

Lastly, to keep the doc ids with the R corpus, should step 2 be changed to:

    corp1 <- Corpus(DataframeSource(df1))

from:

    corp1 <- Corpus(VectorSource(df1$words))

Thanks

-----------------------------------------------------------------------
Some of the references I explored:

http://stackoverflow.com/questions/25638503/tm-loses-the-metadata-when-applying-tm-map
http://stackoverflow.com/questions/24191728/documenttermmatrix-error-on-corpus-argument
http://stackoverflow.com/questions/24771165/r-project-no-applicable-method-for-meta-applied-to-an-object-of-class-charact
http://stackoverflow.com/questions/25551514/termdocumentmatrix-errors-in-r
http://stackoverflow.com/questions/20699111/tm-map-error-message-in-r
http://stackoverflow.com/questions/31996891/error-in-usemethodmeta-x-no-applicable-method-for-meta-applied-to-an-ob
http://stackoverflow.com/questions/11876740/r-stemming-a-string-document-corpus

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.