Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files: short.txt, myTextFile.txt and myTextFile.csv.
short.txt contains one line: THE CAT IN THE HAT
myTextFile.txt contains some tweets from Twitter. Its first few lines are:

@oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON JAPAN TIME NOW. NO SLEEP!!!
SAKURA at Niigata, Japan http://ff.im/-29ufG
19:30 [BS Japan] 絶対可憐チルドレン #50 「一意奮闘!オーバー・ザ・フューチャー」
RT@ kvsrinath Japan's New Flat Screens: The Eco-Friendly TV . http://is.gd/sIS7 #green
Mold99 says: Introduction to Chiropractic and manual therapeutics when unfit.Choice of schools in Japan, and mo... http://i.sitesays.com/lc7
Japan Said to Sell 17 Trillion Yen of Extra Bonds - Bloomberg

Actually there were no newlines in the original file, but I inserted a newline before every occurrence of http.

I ran the following code:

library("tm")
my.path <- 'C:\\dataForR\\textsTweet1\\'
my.path.csv <- 'C:\\dataForR\\textsTweet1\\myTextFile.csv'
(ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la")))

R responded:

A text document collection with 3 text documents
Warning message:
In readLines(filename, encoding = encoding) :
  incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt'

Then I ran the TermDocMatrix function. It is supposed to take a file and, more or less, count the occurrences of each word in it; as the documentation says, it "Constructs a term-document matrix".

> tdm <- TermDocMatrix(ovid)
> Data(tdm)[1:2, 105:107]
2 x 3 sparse Matrix of class "dgCMatrix"
  revealed said sakura
1        .    .      .
2       15   15     15
> Data(tdm)[1:21, 100:105]
Error in intI(i, n = di[1], dn = dn[[1]]) : index larger than maximal 3

I don't understand why I am getting only two rows. I can see that the first row is for the short.txt file and the second row seems to be for the whole myTextFile.txt file. How can I get TermDocMatrix to output each line of myTextFile.txt as a separate row?

Thanks very much.
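For comparison, the structure being asked for (one row per line of the file, one column per word) can be sketched in base R without tm. This is a minimal sketch, not tm's own method: it assumes each line holds exactly one tweet, and the sample strings and the names tweets, tokens, vocab and dtm are hypothetical.

```r
# Hypothetical sample: each element stands for one line (one tweet) of the file.
# In practice the vector could come from readLines("myTextFile.txt", warn = FALSE).
tweets <- c("THE CAT IN THE HAT",
            "SAKURA at Niigata, Japan",
            "Japan Said to Sell 17 Trillion Yen of Extra Bonds")

# Lower-case each line and split it into word tokens on runs of non-alphanumerics.
tokens <- lapply(tolower(tweets), function(line) {
  words <- unlist(strsplit(line, "[^[:alnum:]]+"))
  words[nzchar(words)]  # drop empty strings left by leading separators
})

# Vocabulary: every distinct term across all lines, sorted.
vocab <- sort(unique(unlist(tokens)))

# One row per line (document), one column per term: count each term per line.
dtm <- t(vapply(tokens,
                function(words) as.vector(table(factor(words, levels = vocab))),
                integer(length(vocab))))
dimnames(dtm) <- list(paste0("line", seq_along(tweets)), vocab)

print(dtm[, c("the", "japan", "sakura")])
```

In tm itself, the corresponding idea would be to read the file with readLines() and build the corpus from a VectorSource, e.g. Corpus(VectorSource(readLines(...))), so that each element of the character vector (each line) becomes its own document, whereas DirSource makes each file one document. The name of the matrix constructor differs between tm versions, so check the documentation of the installed version.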
--
View this message in context: http://www.nabble.com/question-about-the-Text-Mining-package-tm-tp23091573p23091573.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.