Re: [R] word stemming for corpus linguistics

Andy Wolfe Tue, 26 Jul 2016 01:15:38 -0700

Hi Paul

I have seen this; it's part of the tm package mentioned originally. So, I've tried it again, and perhaps I'm using stemDocument incorrectly, but this is what I am doing:


> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')

But it isn't changing anything in the body of the text I'm passing to it: the words are unlemmatized/unstemmed.
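
One possible reason, offered as an assumption because it depends on the tm version: SnowballC's wordStem() is documented to take a vector of individual words, so whole lines of text may pass through almost unchanged. A minimal sketch that splits each line into words before stemming is given here. stem_lines is a hypothetical helper name, and the whitespace tokenization is deliberately naive (punctuation and capitalisation are not handled):

```r
library(SnowballC)

# Hypothetical helper: stem every whitespace-separated word in each line.
stem_lines <- function(lines, language = "english") {
  vapply(lines, function(line) {
    # Split the line on runs of whitespace and drop empty tokens.
    tokens <- strsplit(line, "\\s+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) == 0) return("")
    # Stem each word separately, then rebuild the line.
    paste(wordStem(tokens, language = language), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

stem_lines("facilitating facilitated facilitates")
# [1] "facilit facilit facilit"
```

Applied to the vector read with scan(), this would be called as stem_lines(text.v).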

When I try using SnowballC, the error returned is that tm_map doesn't have a method to work with objects of class 'character'.
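
For reference, tm_map operates on a corpus object rather than on a plain character vector, which is consistent with that error. A minimal sketch, assuming the tm and SnowballC packages are installed, with a made-up two-line vector standing in for the file read with scan() (the names corpus, corpus.stem and stemmed.v are illustrative only):

```r
library(tm)

# Illustrative stand-in for the lines read from file with scan().
text.v <- c("facilitating facilitated facilitates",
            "the committee facilitates access")

# Wrap the character vector in a corpus: one document per element.
corpus <- VCorpus(VectorSource(text.v))

# Apply the stemming transformation to every document in the corpus.
corpus.stem <- tm_map(corpus, stemDocument)

# Pull the stemmed text back out as a plain character vector.
stemmed.v <- vapply(seq_len(length(corpus.stem)), function(i) {
  paste(content(corpus.stem[[i]]), collapse = " ")
}, character(1))

stemmed.v[1]
```

Getting the text back out as a character vector keeps word order intact, which matters for any later concordance work.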

Again, the problem is that tm doesn't seem to allow for concordance analysis, or perhaps it does and I just haven't figured out how to do it. I am happy to be shown some documentation on that process, and on whether it is applied before or after the text is transformed into a DTM, because searching online hasn't (yet) turned anything up.
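
To make the concordance question concrete, here is a rough keyword-in-context sketch in base R only. It works on the raw lines of text, i.e. before any conversion to a DTM, since a DTM records term counts and discards word order. kwic is a hypothetical helper, not a function from tm, and it treats all lines as one running text, so a context window can cross line boundaries:

```r
# Hypothetical helper: keyword-in-context lines for one search term.
kwic <- function(lines, keyword, window = 3) {
  # Lower-case, then split on anything that is not a letter or apostrophe.
  tokens <- unlist(strsplit(tolower(lines), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]
  n <- length(tokens)
  hits <- which(tokens == tolower(keyword))
  vapply(hits, function(i) {
    # Up to `window` words on either side of the hit.
    left  <- if (i > 1) tokens[max(1, i - window):(i - 1)] else character(0)
    right <- if (i < n) tokens[(i + 1):min(n, i + window)] else character(0)
    paste(paste(left, collapse = " "),
          paste0("[", tokens[i], "]"),
          paste(right, collapse = " "))
  }, character(1))
}

# Made-up example text.
example.v <- c("The committee facilitates access.",
               "Nothing facilitates it better.")
kwic(example.v, "facilitates", window = 2)
# [1] "the committee [facilitates] access nothing"
# [2] "access nothing [facilitates] it better"
```

The same idea could be run on stemmed tokens instead of surface forms, so that one search term matches every inflected variant.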


Thanks.
Andy


On 26/07/16 08:50, Paul Johnston wrote:

I suggest looking at http://www.inside-r.org/packages/cran/tm/docs/stemDocument

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 08:10
To: r-help@r-project.org
Subject: [R] word stemming for corpus linguistics

Hi list

For a piece of work I'm doing in corpus linguistics, using a combination of the texts by Gries,
"Quantitative Corpus Linguistics with R: A Practical Introduction", and Jockers, "Text
Analysis with R for Students of Literature" (both really excellent, by the way), I
want to stem or lemmatize the words so that, e.g., 'facilitating', 'facilitated', and
'facilitates' all become 'facilit'.

In text mining, this is feasible using a combination of the packages 'tm' and
'SnowballC', but I am finding that working with the DTM (document
term matrix) becomes difficult when I want to do concordance (or key word
in context) analysis.

So, two questions:

(1) is there a package for R version 3.3.1 that can work with corpus
linguistics? and/or

(2) is there a way of doing concordance analysis using the tm package as part 
of the whole text mining process?

I appreciate any help. Thanks.

Andy


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


