Hello! I am performing a sentiment analysis of 2,000 reviews (positive and negative). I think my code needs improvement: the accuracy is only 68% and the code takes 20 minutes to run! Part of the code is below.
# Packages providing Corpus/DocumentTermMatrix (tm) and naiveBayes (e1071)
library(tm)
library(e1071)

# Read data from their directories
pos <- Corpus(DirSource(pos_dir), readerControl=list(language="english", reader=readPlain))
neg <- Corpus(DirSource(neg_dir), readerControl=list(language="english", reader=readPlain))

# Create training and testing corpuses
print("Creating training and testing corpuses...")
split.percentage <- 0.75
split.pos.size <- length(pos)
split.neg.size <- length(neg)
split.pos.train.size <- floor(split.pos.size * split.percentage)
split.neg.train.size <- floor(split.neg.size * split.percentage)
split.pos.test.size <- split.pos.size - split.pos.train.size
split.neg.test.size <- split.neg.size - split.neg.train.size
corpus.train <- c(pos[1:split.pos.train.size], neg[1:split.neg.train.size])
corpus.test <- c(pos[(split.pos.train.size + 1) : split.pos.size],
                 neg[(split.neg.train.size + 1) : split.neg.size])

# Perform preprocessing
print("Pre-processing corpuses...")
corpus.train <- preProcess(corpus.train)
corpus.test <- preProcess(corpus.test)

# Create the Document Term Matrix
print("Creating document term matrices...")
corpus.train.dtm <- DocumentTermMatrix(corpus.train, control=list(minWordLength = 2))
corpus.test.dtm <- DocumentTermMatrix(corpus.test, control=list(minWordLength = 2))

# Create the Data Frame
print("Creating data matrices...")
corpus.train.df <- as.matrix(corpus.train.dtm)
corpus.test.df <- as.matrix(corpus.test.dtm)

# Generate vector with class values
print("Creating class information...")
class.train <- c(rep("pos", split.pos.train.size), rep("neg", split.neg.train.size))
class.test <- c(rep("pos", split.pos.test.size), rep("neg", split.neg.test.size))

# Train classifier
print("Training classifier...")
classifier <- naiveBayes(corpus.train.df, as.factor(class.train))

# Evaluate Classifier
print("Evaluating... Please be patient. This will take a while...")
corpus.predictions <- predict(classifier, corpus.test.df)
table(corpus.predictions, class.test)

I could use some ideas.

Thank you for your time.
V.
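P.S. For anyone who wants to check the split and labelling logic without the review files, here is a minimal base-R sketch of that step only. The character vectors are hypothetical stand-ins for the two corpora, and 1,000 reviews per class is an illustrative assumption, not a figure from the real data.

```r
# Hypothetical stand-ins for the two corpora (the real ones are tm corpora)
pos <- paste("pos review", 1:1000)
neg <- paste("neg review", 1:1000)

split.percentage <- 0.75
n.pos.train <- floor(length(pos) * split.percentage)
n.neg.train <- floor(length(neg) * split.percentage)

# First 75% of each class goes to training, the remainder to testing
train <- c(head(pos, n.pos.train), head(neg, n.neg.train))
test  <- c(tail(pos, length(pos) - n.pos.train),
           tail(neg, length(neg) - n.neg.train))

# Class labels in the same order as the documents
class.train <- factor(c(rep("pos", n.pos.train), rep("neg", n.neg.train)))
class.test  <- factor(c(rep("pos", length(pos) - n.pos.train),
                        rep("neg", length(neg) - n.neg.train)))

# Sanity checks: one label per document, no overlap between the two sets
stopifnot(length(train) == length(class.train),
          length(test) == length(class.test),
          length(intersect(train, test)) == 0)

table(class.train)
table(class.test)
```

The checks only confirm that the labels line up with the documents and that no review lands in both sets; they say nothing about the accuracy or the running time.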
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.