Hello Sun,

The order of the tm transformations makes a big difference.

It isn't a shortcut, but if you identify all the names you could create your own stop-word list:

corpus <- tm_map(corpus, removeWords, c("name1", "name2"))   # substitute the names you identified
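
To make that self-contained (a rough sketch only; the toy documents and the words in myStops are placeholders for your own data):

library(tm)

docs <- c("The walls of York are Roman.", "New York is not York.")   # toy documents
corpus <- Corpus(VectorSource(docs))

corpus <- tm_map(corpus, content_transformer(tolower))   # usual cleanup first
corpus <- tm_map(corpus, removePunctuation)

myStops <- c("york")                                     # plus whatever other names you identified
corpus <- tm_map(corpus, removeWords, c(stopwords("english"), myStops))

inspect(corpus)

Because removeWords runs after lowercasing here, the custom list needs to be lower case too.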

In the case of York, Key Word in Context (KWIC) searches could be used to check how certain words are used. You could identify the usages you want to remove or retain and rename the relevant instances accordingly, as in the sketch below.
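
For instance, if the problem is separating the city of York from "New York" (my assumption, adjust the patterns to your own texts), a crude way to do the renaming is to join the instances you want to keep into a single token before removing the bare word:

txt <- c("She flew to New York.", "The walls of York are Roman.")
txt <- gsub("New York", "New_York", txt, fixed = TRUE)   # retain these instances under a new token
library(tm)
corpus <- Corpus(VectorSource(txt))
corpus <- tm_map(corpus, removeWords, "York")            # removes only the bare instances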

This is labour intensive, but Gries, in his Quantitative Corpus Linguistics, notes that time spent trying to refine code can sometimes be better spent on manual analysis (p. 164). That book includes a KWIC-type function (p. 127), but I haven't been able to work out how to modify it to read more than six words either side of the specified word; six should be adequate for your purpose. Jockers' book also includes a KWIC function, but I don't believe it searches the entire corpus, only a specified text.

I recently checked and tm doesn't have a KWIC function, but for the R-talented (which excludes me) it might be possible to write one. For example, Jim Holtman once wrote a KWIC function to identify word use in a CSV file.
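
As a very rough sketch of the sort of thing I mean (base R only, with a window argument so it isn't tied to six words either side; the function name and defaults are mine, not from any package):

kwic <- function(text, word, window = 6) {
  toks <- unlist(strsplit(text, "\\s+"))                      # crude whitespace tokeniser
  hits <- which(tolower(gsub("[[:punct:]]", "", toks)) == tolower(word))
  for (i in hits) {
    left  <- if (i > 1) toks[max(1, i - window):(i - 1)] else character(0)
    right <- if (i < length(toks)) toks[(i + 1):min(length(toks), i + window)] else character(0)
    cat(paste(left, collapse = " "), "[", toks[i], "]", paste(right, collapse = " "), "\n")
  }
}

# e.g. kwic("the walls of York are Roman and the minster at York is gothic", "york", window = 3)

It only reads a single character string, so you would have to paste a whole document (or the whole corpus) together first.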

Bob
