Dear all, I'm sorry, but like all newbies I have a lot of problems to solve. I'm using R 3.1.2 under OS X 10.10.2. I'm working with tm to analyze some tweets, and I get some strange errors when I try to remove stopwords (see Error 1 below), transform content (see Error 2 below), and create a document-term matrix (see Error 3 below). Could anyone help me?
Error 1:

> tweets = searchTwitter("rimini", n=1000)
> tweets = sapply(tweets, function(x) x$getText())
> tweets_corpus = Corpus(VectorSource(tweets))
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "(f|ht)tp(s?)://(.*)[.][a-z]+")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "RT |via ")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "@[^\\s]+")
> tweets_corpus <- tm_map(tweets_corpus, removeNumbers)
> tweets_corpus <- tm_map(tweets_corpus, removePunctuation)
> tweets_corpus <- tm_map(tweets_corpus, removeWords, c("rimini", "Rimini", "Riviera", "riviera"))
> tweets_corpus <- tm_map(tweets_corpus, stopwords("italian"))
Warning message:
In mclapply(content(x), FUN, ...) :
  all scheduled cores encountered errors in user code

Error 2:

> tweets = searchTwitter("rimini", n=1000)
> tweets = sapply(tweets, function(x) x$getText())
> tweets_corpus = Corpus(VectorSource(tweets))
> tweets_corpus
<<VCorpus (documents: 1000, metadata (corpus/indexed): 0/0)>>
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "(f|ht)tp(s?)://(.*)[.][a-z]+")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "RT |via ")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "@[^\\s]+")
> tweets_corpus <- tm_map(tweets_corpus, removeNumbers)
> tweets_corpus <- tm_map(tweets_corpus, removePunctuation)
> tweets_corpus <- tm_map(tweets_corpus, removeWords, c("rimini", "Rimini", "Riviera", "riviera"))
> tweets_corpus <- tm_map(tweets_corpus, content_transformer(tolower))
Warning message:
In mclapply(content(x), FUN, ...) :
  all scheduled cores encountered errors in user code

Error 3:

> tweets = searchTwitter("rimini", n=1000)
> tweets = sapply(tweets, function(x) x$getText())
> tweets_corpus = Corpus(VectorSource(tweets))
> tweets_corpus
<<VCorpus (documents: 1000, metadata (corpus/indexed): 0/0)>>
> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "(f|ht)tp(s?)://(.*)[.][a-z]+")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "RT |via ")
> tweets_corpus <- tm_map(tweets_corpus, toSpace, "@[^\\s]+")
> tweets_corpus <- tm_map(tweets_corpus, removeNumbers)
> tweets_corpus <- tm_map(tweets_corpus, removePunctuation)
> tweets_corpus <- tm_map(tweets_corpus, removeWords, c("rimini", "Rimini", "Riviera", "riviera"))
> dtm <- DocumentTermMatrix(tweets_corpus)
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  'i, j, v' different lengths
In addition: Warning messages:
1: In mclapply(unname(content(x)), termFreq, control) :
  all scheduled cores encountered errors in user code
2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms), :
  NAs introduced by coercion

Thank you for your help.
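P.S. In case it helps, here is a minimal, self-contained sketch of the same preprocessing steps on a small invented character vector, so it can be run without Twitter credentials. The three example "tweets" are made up by me, and the mc.cores = 1 argument is something I added after reading that it makes tm_map run serially, so the real error message appears instead of the generic mclapply warning; I am not sure it is the right approach.

library(tm)

## Hand-made example texts (invented; my real data comes from searchTwitter as above)
tweets <- c("RT @qualcuno: Bella giornata a Rimini! http://example.com/foto1",
            "Weekend in riviera, 25 gradi e mare calmo #rimini",
            "via @altro: eventi a Rimini il 14 marzo 2015")
## With the real data I would also try tweets <- iconv(tweets, to = "UTF-8", sub = ""),
## since I read that unusual characters in tweets can break the transformers.

tweets_corpus <- Corpus(VectorSource(tweets))
toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

## Same cleaning steps as above, but forced onto a single core (mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, toSpace, "(f|ht)tp(s?)://(.*)[.][a-z]+", mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, toSpace, "RT |via ", mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, toSpace, "@[^\\s]+", mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, removeNumbers, mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, removePunctuation, mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, removeWords,
                        c("rimini", "Rimini", "Riviera", "riviera"), mc.cores = 1)
## In my original code I had tm_map(tweets_corpus, stopwords("italian"));
## here I use removeWords, which I believe is the intended form.
tweets_corpus <- tm_map(tweets_corpus, removeWords, stopwords("italian"), mc.cores = 1)
tweets_corpus <- tm_map(tweets_corpus, content_transformer(tolower), mc.cores = 1)

dtm <- DocumentTermMatrix(tweets_corpus)
inspect(dtm)

The idea is that with a tiny hand-made corpus and serial processing, any failure should point at a specific document and transformation rather than the generic "all scheduled cores encountered errors" warning.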