Hi

I use tm_map() with stemDocument passed as the function argument.

Here is one particular document before stemming:

writeLines(as.character(data_mined_volatile[[1]]))

## The European Union is a "force for social injustice" which backs "the haves 
rather than the have-nots", Iain Duncan Smith has said.
## The ex-work and pensions secretary said "uncontrolled migration" drove down 
wages and increased the cost of living.
## He appealed to people "who may have done OK from the EU" to "think about the 
people that haven't".
## But Labour's Alan Johnson said the EU protected workers and stopped them 
from being "exploited".
## The former Labour home secretary accused the Leave campaign of dismissing 
such protections as "red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretaries of state and defence and national security 
advisers, including Madeleine Albright and Leon Panetta, say in a letter to the 
Times that the UK's "place and influence" in the world would be diminished if 
it left the EU - and Europe would be "dangerously weakened"
## A British Chambers of Commerce survey suggests most business people back 
Remain but the gap with those backing Leave has narrowed.
## Five former heads of Nato claimed the UK would lose influence and "give 
succour to its enemies" by leaving the EU - claims dismissed as scaremongering 
by Boris Johnson
## Mr Corbyn is launching his party's battle bus, saying Labour votes will be 
crucial if the Remain side is to win
## The official Scottish campaign to keep the UK in the European Union is due 
to be launched in Edinburgh
## Mr Duncan Smith's speech came after he told the Sun Germany had a "de facto 
veto" over David Cameron's EU renegotiations, with Angela Merkel blocking the 
PM's plans for an "emergency brake" on EU migration.
## Downing Street said curbs it negotiated on in-work benefits for EU migrants 
were a "more effective" way forward.
## Follow the latest developments on BBC EU referendum live
## Laura Kuenssberg: Can Leave win over the have-nots


Now look at the same text after stemming:

corpus <- data_mined_volatile
corpus <- tm_map(corpus,stemDocument)

writeLines(as.character(corpus[[1]]))

## The European Union is a "forc for social injustice" which back "the have 
rather than the have-nots", Iain Duncan Smith has said.
## The ex-work and pension secretari said "uncontrol migration" drove down wage 
and increas the cost of living.
## He appeal to peopl "who may have done OK from the EU" to "think about the 
peopl that haven't".
## But Labour Alan Johnson said the EU protect worker and stop them from be 
"exploited".
## The former Labour home secretari accus the Leav campaign of dismiss such 
protect as "red tape".
## In other EU referendum campaign developments:
## Thirteen former US secretari of state and defenc and nation secur advisers, 
includ Madelein Albright and Leon Panetta, say in a letter to the Time that the 
UK "place and influence" in the world would be diminish if it left the EU - and 
Europ would be "danger weakened"
## A British Chamber of Commerc survey suggest most busi peopl back Remain but 
the gap with those back Leav has narrowed.
## Five former head of Nato claim the UK would lose influenc and "give succour 
to it enemies" by leav the EU - claim dismiss as scaremong by Bori Johnson
## Mr Corbyn is launch his parti battl bus, say Labour vote will be crucial if 
the Remain side is to win
## The offici Scottish campaign to keep the UK in the European Union is due to 
be launch in Edinburgh
## Mr Duncan Smith speech came after he told the Sun Germani had a "de facto 
veto" over David Cameron EU renegotiations, with Angela Merkel block the PM 
plan for an "emerg brake" on EU migration.
## Down Street said curb it negoti on in-work benefit for EU migrant were a 
"more effective" way forward.
## Follow the latest develop on BBC EU referendum live
## Laura Kuenssberg: Can Leav win over the have-not

-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy Wolfe
Sent: 26 July 2016 09:14
To: r-help@r-project.org
Subject: Re: [R] word stemming for corpus linguistics

Hi Paul

I have seen this - it's part of the tm package mentioned originally. So, I've 
tried it again and perhaps I'm using stemDocument incorrectly, but this is what 
I am doing:

> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')

But it isn't changing anything in the body of the text I'm passing to it:
the words remain unlemmatized/unstemmed.
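
One possible explanation, offered as an assumption rather than a diagnosis: depending on the installed tm version, the character method of stemDocument may hand each element of the vector to the stemmer as if it were a single word, so whole lines come back essentially untouched. A minimal sketch of a workaround is to split each line into tokens and call SnowballC::wordStem() on the tokens directly. The helper name stem_lines and the whitespace tokenisation are illustrative choices, not part of tm or SnowballC.

```r
library(SnowballC)

# Hypothetical helper: stem every whitespace-separated token of each line.
stem_lines <- function(lines, language = "english") {
  vapply(lines, function(line) {
    tokens <- strsplit(line, "[[:space:]]+")[[1]]
    tokens <- tokens[nzchar(tokens)]          # drop empty tokens from leading spaces
    if (length(tokens) == 0) return("")       # keep blank lines as blank lines
    paste(wordStem(tokens, language = language), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

# The three forms mentioned in the original post, passed as one line of text.
stem_lines("facilitating facilitated facilitates")
```

Punctuation stays attached to its token in this sketch, so a word followed directly by a comma or full stop may not be stemmed as expected; stripping punctuation first is one way around that.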

When I try using SnowballC, the error returned is that tm_map doesn't have a 
method to work with objects of class 'character'.
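
That error is consistent with tm_map() expecting a corpus rather than a bare character vector. A minimal sketch, assuming the lines read by scan() should each become one document: wrap the vector with VectorSource() and VCorpus() before calling tm_map(). The two sample sentences below are made up purely for illustration and stand in for text.v.

```r
library(tm)  # stemDocument additionally needs the SnowballC package installed

# Stand-in for the character vector returned by scan(); one element per line.
text.v <- c("The ministers were facilitating talks",
            "Talks facilitated by the ministers")

# tm_map() works on corpus objects, so build one from the character vector.
corpus <- VCorpus(VectorSource(text.v))
corpus.stem <- tm_map(corpus, stemDocument)

# Print the stemmed content of each document.
for (i in seq_along(corpus.stem)) {
  writeLines(as.character(corpus.stem[[i]]))
}
```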

Again, the problem is that tm doesn't seem to allow for concordance analysis 
... or perhaps it does and I just haven't figured out how to do it, so am happy 
to be shown some documentation on that process, and whether that is applied 
before or after the text is transformed into a DTM because searching on-line 
hasn't (yet) thrown anything back.
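
On the before-or-after question: a DTM only stores term counts per document, so word order is no longer available once it has been built, and a concordance has to be run on the token sequence itself. As a rough base-R sketch (not a tm feature; the function name kwic_simple and its arguments are hypothetical), key word in context can be done like this:

```r
# Hypothetical helper: key word in context over a character vector of text.
kwic_simple <- function(text, keyword, window = 3) {
  # Lower-case, then split on anything that is not a letter or an apostrophe.
  tokens <- unlist(strsplit(tolower(text), "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]
  n <- length(tokens)
  hits <- which(tokens == tolower(keyword))

  # Join `len` tokens starting at position `from` into one string.
  span <- function(from, len) {
    paste(tokens[from - 1L + seq_len(len)], collapse = " ")
  }

  data.frame(
    left    = vapply(hits, function(i) span(max(1L, i - window), min(window, i - 1L)),
                     character(1)),
    keyword = tokens[hits],
    right   = vapply(hits, function(i) span(i + 1L, min(window, n - i)),
                     character(1)),
    stringsAsFactors = FALSE
  )
}

# One sentence from the article quoted earlier in the thread, inner quotes removed.
sample_text <- "But Labour's Alan Johnson said the EU protected workers and stopped them from being exploited."
kwic_simple(sample_text, "workers", window = 2)
```

Running the same function on stemmed text, with the stemmed form as the keyword, would give a concordance over stems rather than surface forms.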

Thanks.
Andy


On 26/07/16 08:50, Paul Johnston wrote:
> Suggest look at 
> http://www.inside-r.org/packages/cran/tm/docs/stemDocument
>
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Andy 
> Wolfe
> Sent: 26 July 2016 08:10
> To: r-help@r-project.org
> Subject: [R] word stemming for corpus linguistics
>
> Hi list
>
> On a piece of work I'm doing in corpus linguistics, using a combo of texts by 
> Gries "Quantitative Corpus Linguistics with R: A Practical Introduction" and 
> Jockers "Text Analysis with R for Students of Literature", which are both 
> really excellent by the way, I want to stem or lemmatize the words so that, 
> for e.g., 'facilitating', 'facilitated', and 'facilitates' all become 
> 'facilit'.
>
> In text mining, using a combination of the packages 'tm' and 'SnowballC',
> this is feasible, but I am finding that working with the DTM (document 
> term matrix) becomes difficult when I want to do concordance (or key word 
> in context) analysis.
>
> So, two questions:
>
> (1) is there a package for R version 3.3.1 that can work with corpus 
> linguistics? and/ or
>
> (2) is there a way of doing concordance analysis using the tm package as part 
> of the whole text mining process?
>
> I appreciate any help. Thanks.
>
> Andy
>
>
>       [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
