I am looking for a library/function in R that can compare two phrases and give 
me a similarity score, or otherwise match them up as accurately as possible.

The "phrases" are obfuscated/messy.  I am not concerned about which version is 
"correct" (as in spell checking); I only care about grouping them so that I 
know which ones are the closest match.

Example:

I have ROW1 and ROW2 like so:

ROW1                  ROW2
hamburger helper      bigmc heartkcatta
chicken nuggets       chicke, nuggets, jss
bigmac heartattack    some sombody somehwere
somebody somehwere    repleh regrubmah

I am looking for something that can tell me that the best match for "hamburger 
helper" is "repleh regrubmah", and likewise for every other row.

So my goal is to write a program that, for each phrase in ROW1, runs this 
function against every phrase in ROW2 and gives me the one that scored best.
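
A minimal sketch of that loop in base R, assuming cosine similarity on letter 
counts as a stand-in for the scoring function. This is only one possible 
choice of score, not a recommendation from any package, and the helper names 
letter_counts and cosine are made up for illustration:

```r
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# Hypothetical helper: how often each letter a-z occurs in a phrase.
# Case is ignored and anything that is not a letter is dropped.
letter_counts <- function(phrase) {
  chars <- strsplit(gsub("[^a-z]", "", tolower(phrase)), "")[[1]]
  as.vector(table(factor(chars, levels = letters)))
}

# Hypothetical helper: cosine similarity of two count vectors.
# Returns NaN if either phrase contains no letters at all.
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

counts1 <- lapply(row1, letter_counts)
counts2 <- lapply(row2, letter_counts)

# scores[i, j] is the similarity between row1[i] and row2[j]
scores <- sapply(counts2, function(b) sapply(counts1, function(a) cosine(a, b)))

# For each phrase in row1, pick the row2 phrase with the highest score
best <- apply(scores, 1, which.max)
result <- data.frame(row1 = row1,
                     best_match = row2[best],
                     score = scores[cbind(seq_along(row1), best)])
print(result)
```

Letter counts ignore character order, which is why the reversed phrase can 
still be matched; an edit distance such as adist() from the utils package 
would probably need extra handling for that case.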

I have read through many of the NLP packages listed at 
http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

I thought lsa might be a good fit, but I am not sure.  I have limited time, so 
I am hoping someone can point me in the right direction.

I have been searching for "text classifiers", but perhaps this problem is 
known by another name.

Brian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
