Hi,

I'm trying to do text corpus processing on some novels with the koRpus package and TreeTagger. The script lists all txt files (11 in all) in a directory and processes them one by one.

##########
rm(list=ls())
library(koRpus)
library(koRpus.lang.en)
set.kRp.env(TT.cmd = "/pathto/tree-tagger-english", lang = "en")
outdir <- "/pathto/corpora"
corpdir <- paste0(outdir,"/","morrison11")

# pattern is a regular expression, not a shell glob; match names ending in ".txt"
files <- list.files(path=corpdir, pattern = "\\.txt$", full.names = FALSE)
n <- length(files)

output <- file(paste0(outdir,"/calc_results_morrison11.txt"), open="at")
for (i in seq_along(files)) {  # safe even if no files were found
  cat(i," - ",files[i],"\n", file = output)
  tagged.results <- treetag(paste0(corpdir,'/',files[i]),
     treetagger="kRp.env")
  capture.output(flesch(tagged.results), file = output)
  cat("\n", file=output)
  capture.output(TTR(tagged.results), file = output)
  cat("\n", file=output)
  capture.output(textFeatures(tagged.results), file=output)
  cat("\n===========================\n", file = output)
}
close(output)
#########

The problem is, the script always throws the following error when it works on the last txt file and prematurely exits:

  Error in all.patterns[[word.length]] : subscript out of bounds

I can't figure out what this message means. The directories are correct, there is no problem with the TreeTagger installation, and n and files have the correct values.
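
To narrow down which call raises the error, each analysis step could be wrapped so that a failure is recorded instead of ending the loop. This is only a minimal sketch: try_step is a hypothetical helper, not part of koRpus, and the two toy functions below merely stand in for the real calls (the second one imitates an out-of-bounds subscript on a list).

```r
# Hypothetical helper (not part of koRpus): run one step and, on error,
# return the error message instead of stopping the script.
try_step <- function(label, step_fun) {
  tryCatch(
    list(step = label, ok = TRUE, value = step_fun(), message = NA_character_),
    error = function(e) {
      list(step = label, ok = FALSE, value = NULL, message = conditionMessage(e))
    }
  )
}

# Toy stand-ins for the real analysis calls (assumption, for illustration only).
patterns <- list("a", "b")
steps <- list(
  works = function() patterns[[1]],
  fails = function() patterns[[5]]   # index beyond the end of the list
)

# Run every step and report which ones failed, without aborting.
results <- lapply(names(steps), function(nm) try_step(nm, steps[[nm]]))
for (r in results) {
  cat(r$step, if (r$ok) "ok" else paste("FAILED:", r$message), "\n")
}
```

In the loop above, a call such as flesch(tagged.results) would then become try_step("flesch", function() flesch(tagged.results)), and likewise for treetag, TTR and textFeatures, so the output file would show which step fails on the last novel.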

Please help, many thanks!

Jiayue

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
