Hi Jim

The name's come up on my radar, but that's about it. I'll look into it.
Thanks for the reference.

All the best
S

On 10/04/15 23:36, Jim Lemon wrote:
> Hi Sun,
> No, I was thinking of something like hunspell, which seems to fit into
> the sort of work that you are doing.
>
> Jim
>
> On Fri, Apr 10, 2015 at 11:42 PM, Sun Shine <phaedr...@gmail.com> wrote:
>
>> Thanks Jeff.
>>
>> I'll add that to the ever-growing list my current studies are
>> generating daily. :-)
>>
>> Cheers
>> S
>>
>> On 10/04/15 14:32, Jeff Newmiller wrote:
>>
>>> "I suspect that it might have something to do with regular
>>> expressions, but to be honest, I'm (currently) pretty crap
>>> with those."
>>>
>>> I cannot think of a better incentive to take action on this
>>> hole in your education and buckle down to learn regular
>>> expressions. There are many books and tutorials available.
>>>
>>> Jeff Newmiller
>>> DCN: <jdnew...@dcn.davis.ca.us>
>>> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
>>>
>>> On April 10, 2015 3:19:51 AM PDT, Sun Shine <phaedr...@gmail.com> wrote:
>>>
>>>> Hi list
>>>>
>>>> Using the tm package, part of the pre-processing work is to remove
>>>> words, etc. from the corpus.
>>>>
>>>> I wish to remove people's names and also their initials which are
>>>> peppered throughout the corpus. But, because some people's initials
>>>> are the same as parts of common words - e.g. 'am' = 'became' =>
>>>> 'bec e' or 'ec' = 'because' => 'b ause' or 'ar' = 'arrival' =>
>>>> 'rival' (which has a completely different meaning).
>>>>
>>>> Is there any way of doing this without leaving a trail of nonsense
>>>> half-terms behind? I suspect that it might have something to do with
>>>> regular expressions, but to be honest, I'm (currently) pretty crap
>>>> with those.
>>>>
>>>> Would it make a difference if I removed initials and names *prior* to
>>>> converting all text to lower case, so I remove 'AM' and because
>>>> 'became' is lower case, it should remain unaffected?
>>>>
>>>> Any recommendations on how best to proceed with this?
>>>>
>>>> Thanks as always.
>>>> Sun

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
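
For anyone finding this thread later: the original question (removing initials such as 'am' without turning 'became' into 'bec e') is usually handled with regular-expression word boundaries. The sketch below is only an illustration, written in Python rather than R; the function name `remove_tokens`, the sample sentence and the list of initials are all made up for the example and do not come from the thread. The `\b` anchor it relies on is also understood by R's `gsub()`, so the same pattern idea should carry over. Matching is case-sensitive by default, which corresponds to the suggestion of removing 'AM' before the text is converted to lower case.

```python
import re


def remove_tokens(text, tokens, ignore_case=False):
    """Delete each token only where it stands alone as a whole word.

    Hypothetical helper for illustration; not part of tm or any library.
    """
    if not tokens:
        # An empty alternation would match at every word boundary,
        # so return the text unchanged instead.
        return text

    # re.escape guards against initials containing regex metacharacters.
    alternation = "|".join(re.escape(token) for token in tokens)

    # \b matches only between a word character and a non-word character,
    # so 'am' inside 'became' is never matched.
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(r"\b(?:" + alternation + r")\b", flags)

    # Replace each hit with a space, then collapse the leftover whitespace.
    stripped = pattern.sub(" ", text)
    return re.sub(r"\s+", " ", stripped).strip()


# Assumed example data: upper-case initials, removed before lower-casing.
initials = ["AM", "EC", "AR"]
sample = "AM said I am glad the arrival became late because EC and AR were delayed"

print(remove_tokens(sample, initials))
# prints: said I am glad the arrival became late because and were delayed
```

Passing `ignore_case=True` would also delete the ordinary word 'am', which is why running the removal before lower-casing, as proposed in the original post, is the more conservative choice here.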