[R] Stemming functions only work on the last word of plain text documents

Christian Timmermann Mon, 05 Sep 2011 01:15:17 -0700

Hello,


I want to use the SnowballStemmer on a collection of plain text documents. 
However, when I apply it to my corpus using the tm_map function, it only stems 
the last word of each document. (The problem is the same for wordStem, and 
stemDocument does not work at all.)  An example:


> library(tm)
> path <- "c:/path/to/directory"       # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain,
+     language = "en_US", load = TRUE))

> inspect(corp)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator 
Available variables in the data frame are:
  MetaID 

$`1.txt`
running runs runners

$`2.txt`
happyness happies

> corp2<-tm_map(corp, SnowballStemmer)
> inspect(corp2)
A corpus with 2 text documents

The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
  create_date creator 
Available variables in the data frame are:
  MetaID 

$`1.txt`
[1] running runs runn

$`2.txt`
[1] happyness happi


How can I get the stemming function to work?
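
My guess is that the stemmer treats each document as one long string, so only 
the suffix at the very end is stripped. Below is a minimal base-R sketch of 
that guess and of a per-word workaround. The names stem_each_word and toy_stem 
are hypothetical and made up for illustration; toy_stem is a crude stand-in 
for a real stemmer, not the Snowball algorithm.

```r
# Hypothetical helper: split a document into words, stem each word
# separately, and paste the results back into one string.
stem_each_word <- function(text, stem_fun) {
  words <- strsplit(text, "[[:space:]]+")[[1]]
  words <- words[nzchar(words)]              # drop empty tokens
  stems <- vapply(words, stem_fun, character(1), USE.NAMES = FALSE)
  paste(stems, collapse = " ")
}

# Toy stand-in for a stemmer (assumption, not Snowball): strips one
# trailing "ing" or "s" from whatever string it is given.
toy_stem <- function(w) sub("(ing|s)$", "", w)

doc <- "running runs runners"

# Applied to the whole document, only the end of the string changes.
whole <- toy_stem(doc)

# Applied word by word, every word is stemmed.
per_word <- stem_each_word(doc, toy_stem)

print(whole)
print(per_word)
```

If this guess is right, passing a real single-word stemmer in place of 
toy_stem should stem every word, but I have not confirmed that this is what 
happens inside tm_map.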