I suggest looking at http://www.inside-r.org/packages/cran/tm/docs/stemDocument
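
As a rough illustration of how stemming and concordance might be combined without going through a DTM, here is a minimal sketch. It assumes the SnowballC package is installed and calls its wordStem() function directly; kwic_stem(), its arguments, and the sample text are hypothetical names made up for this example, not part of tm or SnowballC. The idea is to stem a copy of the tokens for matching while keeping the original tokens for display.

```r
# Minimal sketch: match on stems, display the original words in context.
# Assumes SnowballC is installed; kwic_stem() is a hypothetical helper.
library(SnowballC)

kwic_stem <- function(text, keyword, window = 3) {
  # Split on anything that is not a letter or apostrophe; drop empty tokens
  tokens <- unlist(strsplit(text, "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]
  # Stem a lower-cased copy; the original tokens are left untouched
  stems  <- wordStem(tolower(tokens), language = "porter")
  target <- wordStem(tolower(keyword), language = "porter")
  hits   <- which(stems == target)
  # Build one context line per hit, with the matched word in brackets
  vapply(hits, function(i) {
    left  <- tail(tokens[seq_len(i - 1)], window)
    right <- head(tokens[-seq_len(i)], window)
    paste(c(left, paste0("[", tokens[i], "]"), right), collapse = " ")
  }, character(1))
}

# Made-up sample text for illustration only
txt <- "The tool facilitates reading. It facilitated our work, facilitating reuse."
kwic_stem(txt, "facilitate")
```

Because the stems are only used to locate positions, the context lines still show the unstemmed words, which is usually what a concordance needs. The tokenizer here is deliberately crude and would need adjusting for real corpora.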



-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 08:10
To: r-help@r-project.org
Subject: [R] word stemming for corpus linguistics

Hi list

I'm doing a piece of work in corpus linguistics, using a combination of two 
texts, Gries's "Quantitative Corpus Linguistics with R: A Practical 
Introduction" and Jockers's "Text Analysis with R for Students of Literature", 
both of which are really excellent. I want to stem or lemmatize the words so 
that, for example, 'facilitating', 'facilitated', and 'facilitates' all become 
'facilit'.

In text mining, this is feasible using a combination of the packages 'tm' and 
'SnowballC', but I am finding that working with the DTM (document-term matrix) 
becomes difficult when I want to do concordance (keyword-in-context) analysis.

So, two questions:

(1) is there a package for R version 3.3.1 that can work with corpus 
linguistics? and/or

(2) is there a way of doing concordance analysis using the tm package as part 
of the whole text mining process?

I appreciate any help. Thanks.

Andy



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
