Thank you Michael and David. I am now looking at agrep and adist, and they look very useful for what I want to do. My initial results are promising!
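
For anyone following along, here is a minimal sketch of how adist could be applied to the example from the quoted message below. The variable names (row1, row2, d, best, matches) are illustrative only and not from the thread. It assumes plain generalized Levenshtein distance with default costs, so a fully reversed phrase such as "repleh regrubmah" is not guaranteed to come out as the closest match.

```r
# Example phrases copied from the quoted message; names are hypothetical.
row1 <- c("hamburger helper", "chicken nuggets",
          "bigmac heartattack", "somebody somehwere")
row2 <- c("bigmc heartkcatta", "chicke, nuggets, jss",
          "some sombody somehwere", "repleh regrubmah")

# adist() (package utils) returns a matrix of generalized Levenshtein
# distances: one row per element of row1, one column per element of row2.
d <- adist(row1, row2)
dimnames(d) <- list(row1, row2)

# For each phrase in row1, take the row2 phrase with the smallest distance.
best <- row2[apply(d, 1, which.min)]

matches <- data.frame(row1 = row1,
                      best_match = best,
                      distance = apply(d, 1, min),
                      row.names = NULL)
print(matches)
```

agrep works differently: it searches for approximate matches of one pattern within a vector of strings, so it is more of a filter than a scoring function. Which of the two fits better probably depends on how messy the phrases are.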

Brian

On Nov 17, 2012, at 6:20 PM, R. Michael Weylandt wrote:

> On Sat, Nov 17, 2012 at 11:00 PM, Brian Feeny <bfe...@mac.com> wrote:
>> I am looking for a library/function in R that can compare two phrases and
>> give me a score, or somehow classify them as correct as possible.
>>
>> The "phrases" are obfuscated/messy. I am not concerned about which is
>> "correct" (for example spell checking), I am only concerned in grouping them
>> so that I know they are the closest match.
>>
>> Example:
>>
>> I have ROW1 and ROW2 like so:
>>
>> ROW1                  ROW2
>> hamburger helper      bigmc heartkcatta
>> chicken nuggets       chicke, nuggets, jss
>> bigmac heartattack    some sombody somehwere
>> somebody somehwere    repleh regrubmah
>>
>> I am looking for something that can tell me that the best match for
>> hamburger helper is repleh regrubmah, and the same for each other row.
>>
>> So my goal is to write a program that foreach phrase in ROW1 runs this
>> function against ROW2 and gives me the phrase that scored best.
>>
>> I have read over much of the NLP packages at
>> http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
>>
>> I thought lsa might be a good fit, but I am not sure. I have limited time,
>> so I am hoping someone can point me in a direction of what I am looking for.
>>
>> I have been searching for "text classifiers", perhaps this problem is
>> referred to as something else.
>>
>
> This is outside my expertise, but if memory serves, you might benefit
> from googling the Levenshtein (spelling?) distance which allows this
> sort of fuzzy matching of strings.
>
> MW

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.