Hi,
I'm trying to process a text corpus of novels with the koRpus package and
TreeTagger. The script lists all the txt files (11 in all) in a directory
and processes them one by one.
##########
rm(list=ls())
library(koRpus)
library(koRpus.lang.en)
set.kRp.env(TT.cmd = "/pathto/tree-tagger-english", lang = "en")
outdir <- "/pathto/corpora"
corpdir <- file.path(outdir, "morrison11")
# list.files() takes a regular expression, not a shell glob
files <- list.files(path = corpdir, pattern = "\\.txt$", full.names = FALSE)
n <- length(files)
# Open the results file in append mode
output <- file(file.path(outdir, "calc_results_morrison11.txt"), open = "at")
for (i in seq_len(n)) {
  # Header line: index and file name
  cat(i, " - ", files[i], "\n", file = output)
  # Tag the text with TreeTagger, using the settings from set.kRp.env()
  tagged.results <- treetag(file.path(corpdir, files[i]),
                            treetagger = "kRp.env")
  # Append readability, type-token ratio and text features to the results file
  capture.output(flesch(tagged.results), file = output)
  cat("\n", file = output)
  capture.output(TTR(tagged.results), file = output)
  cat("\n", file = output)
  capture.output(textFeatures(tagged.results), file = output)
  cat("\n===========================\n", file = output)
}
close(output)
#########
The problem is that the script always throws the following error while
processing the last txt file, and then exits prematurely:
Error in all.patterns[[word.length]] : subscript out of bounds
I can't figure out what this message means. The directories are correct,
there's no problem with the TreeTagger installation, and n and files have
the correct values.
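One way to narrow this down might be to wrap each step in tryCatch(), so
that the loop keeps going and reports which step fails for which file. A
minimal sketch in base R follows; run_step() and the dummy steps are
hypothetical stand-ins for the koRpus calls above, not part of koRpus:

```r
# run_step() is a hypothetical helper (not a koRpus function): it runs one
# step and returns its result, or the error message if the step fails.
run_step <- function(label, step) {
  tryCatch(
    list(label = label, ok = TRUE, value = step(), error = NA_character_),
    error = function(e) {
      list(label = label, ok = FALSE, value = NULL,
           error = conditionMessage(e))
    }
  )
}

# Dummy steps standing in for treetag(), flesch() and TTR()
steps <- list(
  tag    = function() "tagged",
  flesch = function() list(a = 1)[[2]],  # fails: subscript out of bounds
  TTR    = function() 0.5
)

# Run every step, then report only the ones that failed
results <- lapply(names(steps), function(nm) run_step(nm, steps[[nm]]))
failed <- Filter(function(r) !r$ok, results)
for (r in failed) cat("step", r$label, "failed:", r$error, "\n")
```

In the real script, each dummy function would be replaced by the
corresponding call on the last file, e.g. function() flesch(tagged.results).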
Please help, many thanks!
Jiayue