Thank you, Michael and David. I am now trying agrep and adist, and they look
very useful for what I want to do. My initial results are promising!
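agrep and adist answer slightly different questions. adist returns a matrix of whole-string (generalized Levenshtein) distances, while agrep looks for approximate matches of a pattern within each string. The following is a rough illustration of the agrep idea: a minimal Python sketch of Sellers' approximate substring matching with unit costs. It is not R's implementation. agrep uses the TRE library, and its max.distance argument also accepts a fraction of the pattern length. The names approx_substring_distance and approx_grep, and the absolute max_distance count, are hypothetical.

```python
def approx_substring_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`.

    Sellers' variant of the Levenshtein DP: row 0 is all zeros, so a match
    may start anywhere in `text`, and taking the minimum of the final row
    lets it end anywhere.
    """
    n = len(text)
    prev = [0] * (n + 1)  # empty pattern matches everywhere at cost 0
    for i in range(1, len(pattern) + 1):
        curr = [i] + [0] * n  # pattern[:i] against empty text: i deletions
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # pattern char left unmatched
                          curr[j - 1] + 1,     # extra text char inside the match
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return min(prev)


def approx_grep(pattern, strings, max_distance=1):
    """Indices of `strings` containing `pattern` within `max_distance` edits
    (hypothetical helper; the threshold is an absolute edit count)."""
    return [k for k, s in enumerate(strings)
            if approx_substring_distance(pattern, s) <= max_distance]
```

For example, approx_grep("nuggets", ["chicken nuggets", "chicke, nuggets, jss", "bigmc heartkcatta"]) returns [0, 1], because "nuggets" occurs exactly inside the first two strings.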

Brian

On Nov 17, 2012, at 6:20 PM, R. Michael Weylandt wrote:

> On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfe...@mac.com> wrote:
>> I am looking for a library/function in R that can compare two phrases and
>> give me a similarity score, or otherwise classify them as accurately as possible.
>> 
>> The "phrases" are obfuscated/messy. I am not concerned with which one is
>> "correct" (as in spell checking); I only want to group them so that I know
>> which ones are the closest match.
>> 
>> Example:
>> 
>> I have ROW1 and ROW2 like so:
>> 
>> ROW1                  ROW2
>> hamburger helper      bigmc heartkcatta
>> chicken nuggets       chicke, nuggets, jss
>> bigmac heartattack    some sombody somehwere
>> somebody somehwere    repleh regrubmah
>> 
>> I am looking for something that can tell me that the best match for
>> hamburger helper is repleh regrubmah, and likewise for every other row.
>> 
>> So my goal is to write a program that, for each phrase in ROW1, runs this
>> function against every phrase in ROW2 and gives me the best-scoring one.
>> 
>> I have read through many of the NLP packages at
>> http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>> 
>> I thought lsa might be a good fit, but I am not sure. I have limited time,
>> so I am hoping someone can point me in the right direction.
>> 
>> I have been searching for "text classifiers"; perhaps this problem goes by
>> another name.
>> 
> 
> This is outside my expertise, but if memory serves, you might benefit
> from searching for the Levenshtein distance, which allows this sort of
> fuzzy string matching.
> 
> MW
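To make the Levenshtein suggestion concrete: the distance is the minimum number of single-character insertions, deletions and substitutions that turn one string into the other, and it is the quantity adist is based on. The following is a minimal Python sketch, assuming unit costs, of the per-row best match described in the question. It is roughly what which.min over each row of an adist matrix would give. The names levenshtein, best_matches and try_reversed are hypothetical. The reversed-candidate option is an untested assumption, motivated by the reversed text (repleh regrubmah, kcatta) in the example data.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions turning `a` into `b` (classic two-row DP, unit costs)."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def best_matches(row1, row2, try_reversed=False):
    """Map each phrase in row1 to its closest phrase in row2.

    Ties go to the earliest candidate, as with which.min in R.
    """
    def score(p, q):
        d = levenshtein(p, q)
        if try_reversed:
            # Hypothetical tweak: also compare against the candidate read backwards.
            d = min(d, levenshtein(p, q[::-1]))
        return d

    return {p: min(row2, key=lambda q: score(p, q)) for p in row1}
```

Edit distance compares characters in order, so without such a tweak a reversed phrase like repleh regrubmah will generally not score as close to hamburger helper as the example expects. Scoring each candidate both forwards and backwards doubles the work per pair.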

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.