Hi

Thanks for following up on this thread.

I've opted for a route that works, albeit a circuitous one: use the tm package to stem the document, then use writeCorpus to write the stemmed document to disk, so that I can open it up and do the concordancing piece.
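A minimal sketch of that route, for anyone following along. The sample text, the output directory and the file names are made up for illustration; only VCorpus, VectorSource, tm_map, stemDocument and writeCorpus come from tm, and stemDocument needs the SnowballC package installed.

```r
library(tm)  # stemDocument() relies on the SnowballC package being installed

# Hypothetical stand-in for the real text (one element per document).
text.v <- c("facilitating facilitated facilitates",
            "the committee was facilitating talks")

# tm_map() needs a corpus, not a bare character vector, so wrap the text first.
corpus <- VCorpus(VectorSource(text.v))
corpus <- tm_map(corpus, stemDocument)

# Write one file per document into a scratch directory (names are assumptions).
out.dir <- file.path(tempdir(), "stemmed")
dir.create(out.dir, showWarnings = FALSE)
out.files <- c("doc1.txt", "doc2.txt")
writeCorpus(corpus, path = out.dir, filenames = out.files)

# Read the stemmed text back in as plain character data for concordancing.
stemmed.v <- unlist(lapply(file.path(out.dir, out.files), readLines))
print(stemmed.v)
```

Reading the files back gives an ordinary character vector again, which is what concordancing code written for plain text expects.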

Many thanks - this'll do me fine until I come across a better (read: more elegant) solution.
Best
Andy


On 26/07/16 14:05, Paul Johnston wrote:
Hi

I use tm_map() with stemDocument as the argument.

Looking at a particular file before stemming:

writeLines(as.character(data_mined_volatile[[1]]))

## The European Union is a "force for social injustice" which backs "the haves 
rather than the have-nots", Iain Duncan Smith has said.
## The ex-work and pensions secretary said "uncontrolled migration" drove down 
wages and increased the cost of living.
## He appealed to people "who may have done OK from the EU" to "think about the 
people that haven't".
## But Labour's Alan Johnson said the EU protected workers and stopped them from being 
"exploited".
## The former Labour home secretary accused the Leave campaign of dismissing such 
protections as "red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretaries of state and defence and national security advisers, including 
Madeleine Albright and Leon Panetta, say in a letter to the Times that the UK's "place and 
influence" in the world would be diminished if it left the EU - and Europe would be 
"dangerously weakened"
## A British Chambers of Commerce survey suggests most business people back 
Remain but the gap with those backing Leave has narrowed.
## Five former heads of Nato claimed the UK would lose influence and "give succour 
to its enemies" by leaving the EU - claims dismissed as scaremongering by Boris 
Johnson
## Mr Corbyn is launching his party's battle bus, saying Labour votes will be 
crucial if the Remain side is to win
## The official Scottish campaign to keep the UK in the European Union is due 
to be launched in Edinburgh
## Mr Duncan Smith's speech came after he told the Sun Germany had a "de facto veto" over 
David Cameron's EU renegotiations, with Angela Merkel blocking the PM's plans for an 
"emergency brake" on EU migration.
## Downing Street said curbs it negotiated on in-work benefits for EU migrants were a 
"more effective" way forward.
## Follow the latest developments on BBC EU referendum live
## Laura Kuenssberg: Can Leave win over the have-nots


Now look at the same text after stemming:

corpus <- data_mined_volatile
corpus <- tm_map(corpus, stemDocument)

writeLines(as.character(corpus[[1]]))

## The European Union is a "forc for social injustice" which back "the have rather 
than the have-nots", Iain Duncan Smith has said.
## The ex-work and pension secretari said "uncontrol migration" drove down wage 
and increas the cost of living.
## He appeal to peopl "who may have done OK from the EU" to "think about the peopl 
that haven't".
## But Labour Alan Johnson said the EU protect worker and stop them from be 
"exploited".
## The former Labour home secretari accus the Leav campaign of dismiss such protect as 
"red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretari of state and defenc and nation secur advisers, includ Madelein 
Albright and Leon Panetta, say in a letter to the Time that the UK "place and influence" 
in the world would be diminish if it left the EU - and Europ would be "danger weakened"
## A British Chamber of Commerc survey suggest most busi peopl back Remain but 
the gap with those back Leav has narrowed.
## Five former head of Nato claim the UK would lose influenc and "give succour to it 
enemies" by leav the EU - claim dismiss as scaremong by Bori Johnson
## Mr Corbyn is launch his parti battl bus, say Labour vote will be crucial if 
the Remain side is to win
## The offici Scottish campaign to keep the UK in the European Union is due to 
be launch in Edinburgh
## Mr Duncan Smith speech came after he told the Sun Germani had a "de facto veto" over 
David Cameron EU renegotiations, with Angela Merkel block the PM plan for an "emerg 
brake" on EU migration.
## Down Street said curb it negoti on in-work benefit for EU migrant were a "more 
effective" way forward.
## Follow the latest develop on BBC EU referendum live
## Laura Kuenssberg: Can Leav win over the have-not

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 09:14
To: r-help@r-project.org
Subject: Re: [R] word stemming for corpus linguistics

Hi Paul

I have seen this - it's part of the tm package mentioned originally. So, I've 
tried it again and perhaps I'm using stemDocument incorrectly, but this is what 
I am doing:

> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')

But it isn't changing anything in the body of the text I'm passing to it - the 
words are unlemmatized/unstemmed.

When I try using SnowballC, the error returned is that tm_map doesn't have a 
method to work with objects of class 'character'.
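One possible explanation, not confirmed in this thread, is that some tm versions treat each element of a plain character vector as a single word, so a whole line passes through unchanged. A sketch that sidesteps both problems calls SnowballC::wordStem on the individual tokens and then reassembles each line. The helper name stem_lines is hypothetical, and splitting on whitespace is a simplifying assumption.

```r
library(SnowballC)

# Hypothetical helper: stem every word of every line, returning one output
# line per input line. Splitting on whitespace is a simplification, so any
# punctuation stays attached to its word and such words may not be stemmed.
stem_lines <- function(lines, language = "english") {
  vapply(strsplit(lines, "[[:space:]]+"), function(tokens) {
    paste(wordStem(tokens, language = language), collapse = " ")
  }, character(1))
}

# Illustrative input only; in the session above this would be text.v.
sample.v <- c("facilitating facilitated facilitates")
sample.stem.v <- stem_lines(sample.v)
print(sample.stem.v)
```

Because the result is still a character vector with the original line structure, it can be passed on to concordancing code without building a corpus first.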

Again, the problem is that tm doesn't seem to allow for concordance analysis 
... or perhaps it does and I just haven't figured out how to do it, so am happy 
to be shown some documentation on that process, and whether that is applied 
before or after the text is transformed into a DTM because searching on-line 
hasn't (yet) thrown anything back.
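On the before-or-after question: a document term matrix only stores term counts per document, so word order is gone by that stage and a concordance has to be built from the text itself, before the DTM step. A minimal base-R sketch of key word in context on a token vector follows. The function name kwic, its arguments and the sample sentence are all assumptions made for illustration, not part of tm.

```r
# Hypothetical key-word-in-context helper working on a plain token vector.
kwic <- function(tokens, keyword, window = 3) {
  hits <- which(tolower(tokens) == tolower(keyword))

  # Join the tokens between two positions, dropping out-of-range indices.
  context <- function(from, to) {
    idx <- seq(from, to)
    idx <- idx[idx >= 1 & idx <= length(tokens)]
    paste(tokens[idx], collapse = " ")
  }

  data.frame(
    position = hits,
    left     = vapply(hits, function(i) context(i - window, i - 1), character(1)),
    keyword  = tokens[hits],
    right    = vapply(hits, function(i) context(i + 1, i + window), character(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up, already stemmed sentence used only to exercise the function.
stemmed.text <- "the talks were facilit by the union and the union facilit a deal"
tokens <- strsplit(stemmed.text, "[[:space:]]+")[[1]]
res <- kwic(tokens, "facilit", window = 2)
print(res)
```

Each row of the result is one occurrence of the keyword with its left and right neighbours, and the window argument controls how many tokens of context are kept on each side.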

Thanks.
Andy


On 26/07/16 08:50, Paul Johnston wrote:
I suggest looking at
http://www.inside-r.org/packages/cran/tm/docs/stemDocument



-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 08:10
To: r-help@r-project.org
Subject: [R] word stemming for corpus linguistics

Hi list

For a piece of work I'm doing in corpus linguistics, I'm using a combination of two texts, 
Gries's "Quantitative Corpus Linguistics with R: A Practical Introduction" and Jockers's 
"Text Analysis with R for Students of Literature", both of which are really excellent, by 
the way. I want to stem or lemmatize the words so that, e.g., 'facilitating', 'facilitated', 
and 'facilitates' all become 'facilit'.

In text mining, this is feasible using a combination of the packages 'tm' and 
'SnowballC', but I am finding that working with the DTM (document term matrix) 
becomes difficult when I want to do concordance (or key word in context) 
analysis.

So, two questions:

(1) is there a package for R version 3.3.1 that can work with corpus
linguistics? and/or

(2) is there a way of doing concordance analysis using the tm package as part 
of the whole text mining process?

I appreciate any help. Thanks.

Andy



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
