Hi
Thanks for following up on this thread.
I've opted for this, albeit circuitous, route: use the tm package to
stem the document, then use writeCorpus() to write the stemmed document
to disk so that I can open it again and do the concordancing.
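A minimal sketch of that route, assuming the tm and SnowballC packages are installed. The sample sentences, the scratch directory and the doc1.txt-style filenames are illustrative choices, not part of the original workflow:

```r
# Assumes tm and SnowballC are installed (stemDocument() relies on SnowballC)
library(tm)

# Illustrative stand-in for the text being analysed
text.v <- c("The ex-work and pensions secretary said uncontrolled migration drove down wages.",
            "He appealed to people who may have done OK from the EU.")

# One document per element; tm_map() applies stemDocument() to every document
corpus <- VCorpus(VectorSource(text.v))
corpus <- tm_map(corpus, stemDocument)

# Write one plain-text file per document to a scratch directory
out.dir <- file.path(tempdir(), "stemmed")
dir.create(out.dir, showWarnings = FALSE)
files <- sprintf("doc%d.txt", seq_len(length(corpus)))
writeCorpus(corpus, path = out.dir, filenames = files)

# Read the stemmed text back in for concordancing
stemmed.v <- unlist(lapply(file.path(out.dir, files), readLines))
writeLines(stemmed.v)
```

If the files aren't needed afterwards, as.character(corpus[[1]]) returns the same stemmed text in memory, which skips the disk step.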
Many thanks - this'll do me fine until I come across a better (read:
more elegant) solution.
Best
Andy
On 26/07/16 14:05, Paul Johnston wrote:
Hi
I use tm_map() with stemDocument passed as the function argument.
Here is a particular document before stemming:
writeLines(as.character(data_mined_volatile[[1]]))
## The European Union is a "force for social injustice" which backs "the haves
rather than the have-nots", Iain Duncan Smith has said.
## The ex-work and pensions secretary said "uncontrolled migration" drove down
wages and increased the cost of living.
## He appealed to people "who may have done OK from the EU" to "think about the
people that haven't".
## But Labour's Alan Johnson said the EU protected workers and stopped them from being
"exploited".
## The former Labour home secretary accused the Leave campaign of dismissing such
protections as "red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretaries of state and defence and national security advisers, including
Madeleine Albright and Leon Panetta, say in a letter to the Times that the UK's "place and
influence" in the world would be diminished if it left the EU - and Europe would be
"dangerously weakened"
## A British Chambers of Commerce survey suggests most business people back
Remain but the gap with those backing Leave has narrowed.
## Five former heads of Nato claimed the UK would lose influence and "give succour
to its enemies" by leaving the EU - claims dismissed as scaremongering by Boris
Johnson
## Mr Corbyn is launching his party's battle bus, saying Labour votes will be
crucial if the Remain side is to win
## The official Scottish campaign to keep the UK in the European Union is due
to be launched in Edinburgh
## Mr Duncan Smith's speech came after he told the Sun Germany had a "de facto veto" over
David Cameron's EU renegotiations, with Angela Merkel blocking the PM's plans for an
"emergency brake" on EU migration.
## Downing Street said curbs it negotiated on in-work benefits for EU migrants were a
"more effective" way forward.
## Follow the latest developments on BBC EU referendum live
## Laura Kuenssberg: Can Leave win over the have-nots
And here is the same document after stemming:
corpus <- data_mined_volatile
corpus <- tm_map(corpus,stemDocument)
writeLines(as.character(corpus[[1]]))
## The European Union is a "forc for social injustice" which back "the have rather
than the have-nots", Iain Duncan Smith has said.
## The ex-work and pension secretari said "uncontrol migration" drove down wage
and increas the cost of living.
## He appeal to peopl "who may have done OK from the EU" to "think about the peopl
that haven't".
## But Labour Alan Johnson said the EU protect worker and stop them from be
"exploited".
## The former Labour home secretari accus the Leav campaign of dismiss such protect as
"red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretari of state and defenc and nation secur advisers, includ Madelein
Albright and Leon Panetta, say in a letter to the Time that the UK "place and influence"
in the world would be diminish if it left the EU - and Europ would be "danger weakened"
## A British Chamber of Commerc survey suggest most busi peopl back Remain but
the gap with those back Leav has narrowed.
## Five former head of Nato claim the UK would lose influenc and "give succour to it
enemies" by leav the EU - claim dismiss as scaremong by Bori Johnson
## Mr Corbyn is launch his parti battl bus, say Labour vote will be crucial if
the Remain side is to win
## The offici Scottish campaign to keep the UK in the European Union is due to
be launch in Edinburgh
## Mr Duncan Smith speech came after he told the Sun Germani had a "de facto veto" over
David Cameron EU renegotiations, with Angela Merkel block the PM plan for an "emerg
brake" on EU migration.
## Down Street said curb it negoti on in-work benefit for EU migrant were a "more
effective" way forward.
## Follow the latest develop on BBC EU referendum live
## Laura Kuenssberg: Can Leav win over the have-not
-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 09:14
To: r-help@r-project.org
Subject: Re: [R] word stemming for corpus linguistics
Hi Paul
I have seen this - it's part of the tm package I mentioned originally. So I've
tried it again; perhaps I'm using stemDocument incorrectly, but this is what
I am doing:
> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')
But it isn't changing anything in the body of the text I'm passing to it
- the words are left unlemmatized/unstemmed.
When I try using SnowballC, the error returned is that tm_map doesn't have a
method to work with objects of class 'character'.
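That error is what tm reports when tm_map() is handed a plain character vector: its methods are defined for corpus objects. A minimal sketch of the usual fix, assuming tm and SnowballC are installed and the lines come from scan() as above (the two sample lines are illustrative), is to wrap the vector with VCorpus(VectorSource(...)) first:

```r
library(tm)  # assumes SnowballC is also installed, since stemDocument() uses it

# Stand-in for the vector returned by scan(file.choose(), what = 'char', sep = '\n')
text.v <- c("Thirteen former US secretaries of state and defence",
            "Five former heads of Nato claimed the UK would lose influence")

# tm_map() works on corpora, not on bare character vectors,
# so wrap the lines in a corpus first (each line becomes one document)
corpus <- VCorpus(VectorSource(text.v))
corpus <- tm_map(corpus, stemDocument, language = "english")

# Inspect the first stemmed document
writeLines(as.character(corpus[[1]]))
```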
Again, the problem is that tm doesn't seem to allow for concordance analysis
... or perhaps it does and I just haven't figured out how. I'd be happy to be
pointed to documentation on that process, including whether it is applied
before or after the text is transformed into a DTM, because searching online
hasn't (yet) turned anything up.
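On the before/after question: a DTM keeps only per-document word counts, so word order (and with it the context) is gone by that point, and concordancing has to work on the running text. A minimal base-R sketch of a keyword-in-context (KWIC) concordance follows; kwic_lines() is a hypothetical helper, and the tokenizer (lower-case, split on anything that isn't an ASCII letter, digit, apostrophe or hyphen) is a simplifying assumption:

```r
# Hypothetical helper: keyword-in-context lines for an exact, case-insensitive keyword
kwic_lines <- function(text, keyword, context = 3) {
  # Treat all lines as one running text, lower-cased and split into word tokens
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9'-]+"))
  tokens <- tokens[tokens != ""]
  hits <- which(tokens == tolower(keyword))
  # One line per hit: up to `context` words either side, keyword in brackets
  vapply(hits, function(i) {
    left  <- tail(tokens[seq_len(i - 1)], context)
    right <- head(tokens[-seq_len(i)], context)
    sprintf("%s [%s] %s",
            paste(left, collapse = " "), tokens[i], paste(right, collapse = " "))
  }, character(1))
}

text <- c("The EU protected workers and stopped them from being exploited.",
          "Labour said the EU protected workers.")
kwic_lines(text, "protected", context = 2)
# returns c("the eu [protected] workers and", "the eu [protected] workers")
```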
Thanks.
Andy
On 26/07/16 08:50, Paul Johnston wrote:
I suggest looking at
http://www.inside-r.org/packages/cran/tm/docs/stemDocument
-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 08:10
To: r-help@r-project.org
Subject: [R] word stemming for corpus linguistics
Hi list
On a piece of work I'm doing in corpus linguistics, using a combination of texts by Gries
("Quantitative Corpus Linguistics with R: A Practical Introduction") and Jockers ("Text
Analysis with R for Students of Literature"), which are both really excellent by the way, I
want to stem or lemmatize the words so that, e.g., 'facilitating', 'facilitated', and
'facilitates' all become 'facilit'.
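For reference, the Snowball stemmer in SnowballC operates on individual words rather than running text. A short sketch, assuming SnowballC is installed, showing the three forms above collapsing to a single stem:

```r
library(SnowballC)  # provides wordStem(); each element is treated as one word

words.v <- c("facilitating", "facilitated", "facilitates")
stems.v <- wordStem(words.v, language = "english")
stems.v
# "facilit" "facilit" "facilit"
```

Stemming strips suffixes by rule, so the result need not be a dictionary word; that is why 'facilit' is a stem rather than a lemma.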
In text mining this is feasible using a combination of the packages 'tm' and 'SnowballC',
but I am finding that working with the DTM (document-term matrix) becomes difficult
when I want to do concordance (or key-word-in-context) analysis.
So, two questions:
(1) is there a package for R version 3.3.1 that supports corpus
linguistics? and/or
(2) is there a way of doing concordance analysis using the tm package as part
of the whole text mining process?
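One way to combine stemming and concordancing without writing a stemmed copy to disk is to stem the tokens only for matching and print the original word forms in context. A sketch assuming SnowballC is installed; kwic_by_stem() is a hypothetical helper, the example sentence is made up, and all lines are treated as one continuous stream of tokens:

```r
library(SnowballC)  # wordStem()

# Hypothetical helper: concordance lines for every token whose stem matches
# the stem of `keyword`, showing the original (unstemmed) forms in context
kwic_by_stem <- function(text, keyword, context = 3, language = "english") {
  # Lines are joined into one token stream (ASCII-only tokenizer, an assumption)
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9'-]+"))
  tokens <- tokens[tokens != ""]
  stems  <- wordStem(tokens, language = language)  # stems aligned with tokens
  target <- wordStem(tolower(keyword), language = language)
  hits   <- which(stems == target)
  vapply(hits, function(i) {
    left  <- tail(tokens[seq_len(i - 1)], context)
    right <- head(tokens[-seq_len(i)], context)
    sprintf("%s [%s] %s",
            paste(left, collapse = " "), tokens[i], paste(right, collapse = " "))
  }, character(1))
}

text <- "The EU protected workers; such protections still protect workers from being exploited."
kwic_by_stem(text, "protect", context = 2)
# returns c("the eu [protected] workers such",
#           "workers such [protections] still protect",
#           "protections still [protect] workers from")
```

The stems are used only for lookup, so the concordance still shows the words as they appear in the text.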
I appreciate any help. Thanks.
Andy
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.