Let's say I have a corpus and want to find the two-word, three-word, etc. phrases that occur most frequently in the data. I normally do this with the code below, but I am now getting an error message and am having difficulty diagnosing what is going wrong. Given the following data, I would expect a count of 2 for the two-word phrase "that sucks", since it appears twice.
library(tm)          # Corpus, VectorSource, findFreqTerms
library(RTextTools)  # create_matrix

dat = c("love it", "who goes there", "what is wrong", "that sucks", "that sucks")
(corpus <- Corpus(VectorSource(dat)))
matrix <- create_matrix(corpus, ngramLength=2)
bww_freq = findFreqTerms(matrix, lowfreq=5)

Here is the error message I get when I attempt to create the matrix:

> (corpus <- Corpus(VectorSource(dat)))
<<VCorpus (documents: 5, metadata (corpus/indexed): 0/0)>>
> matrix <- create_matrix(corpus, ngramLength=2)
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x), :
  dims [product 5] do not match the length of object [3]

Can anyone tell me what could be going wrong, or suggest a workaround or another package that could give me the desired result more efficiently?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
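
To make the desired result concrete, here is a minimal base-R sketch of the phrase count described in the question. It does not use tm or RTextTools and does not diagnose the error above. The helper name ngrams_in is hypothetical and introduced only for illustration, and the sketch assumes that lower-casing and splitting on whitespace is an acceptable tokenization.

```r
# Example data from the post.
dat <- c("love it", "who goes there", "what is wrong", "that sucks", "that sucks")

# Hypothetical helper (not part of tm or RTextTools): return all n-word
# phrases found within a single string, after lower-casing and splitting
# on runs of whitespace.
ngrams_in <- function(text, n = 2) {
  words <- strsplit(tolower(text), "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]               # drop empty tokens
  if (length(words) < n) return(character(0)) # too short to hold an n-gram
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Count every two-word phrase across all documents, most frequent first.
bigram_counts <- sort(table(unlist(lapply(dat, ngrams_in, n = 2))),
                      decreasing = TRUE)

# Keep only the phrases that occur at least twice.
frequent <- bigram_counts[bigram_counts >= 2]
print(frequent)
```

On the sample data, the only phrase left in frequent is "that sucks", with a count of 2. Changing n to 3 in the lapply() call counts three-word phrases instead. Phrases are formed within each document only, so no phrase spans two documents.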