Hello Ashim, and thank you for taking the time to reply.
> library(fuzzyjoin)
> ?stringdist_left_join

This will join two tables, but what I am trying to do is standardize the similarly spelled duplicate names in the first column of a single table. I don't think fuzzyjoin will help me in that regard.

Thanks.
Gregg
Arizona, USA

------- Original Message -------
On Wednesday, June 15th, 2022 at 8:04 AM, Ashim Kapoor <ashimkap...@gmail.com> wrote:

> Dear Gregg,
>
> Check this out:
>
> library(fuzzyjoin)
> ?stringdist_left_join
>
> Best Regards,
> Ashim
>
> On Wed, Jun 15, 2022 at 8:28 PM Gregg Powell via R-help
> r-help@r-project.org wrote:
>
> > Have data sets where there are names, in the first column, client names in
> > the second, and Client start date in the third.
> >
> > There are thousands of these records with thousands of names/clients/client
> > start dates. The name is entered each time the person begins with a new
> > client such that each person has many entries in the name column. Often the
> > names were not entered in a consistent way. With and without middle
> > initial, middle name, or various abbreviations such as ",RN" at the end of
> > the name.
> >
> > Is there a package that can do fuzzy name matching so that the names in
> > name column get replaced with a "standardized" format - where some type of
> > machine learning can pick the most common spelling of each repeat name and
> > replace the different variations with the common spelling?
> >
> > I included an example below. First table includes the names with the
> > various spellings. Second table depicts what I hope to achieve.
> >
> > Again - this is on a large scale - there are something like 10,000 records
> > with names that need to be standardized.
> >
> > Name               Client     Client Start Date
> > John Good          Client 1   1/1/2020
> > Joe Jackson        Client 2   6/1/2020
> > Bob A. Barker      Client 3   8/1/2020
> > John B. Good       Client 4   10/1/2020
> > Joe J. Jackson     Client 5   12/1/2020
> > Bob Allen Barker   Client 6   1/1/2021
> > John Good          Client 7   5/1/2021
> > Joe Jack Jackson   Client 8   8/1/2021
> > Bob Barker         Client 9   12/1/2021
> >
> > Name               Client     Client Start Date
> > John Good          Client 1   1/1/2020
> > Joe J. Jackson     Client 2   6/1/2020
> > Bob A. Barker      Client 3   8/1/2020
> > John Good          Client 4   10/1/2020
> > Joe J. Jackson     Client 5   12/1/2020
> > Bob A. Barker      Client 6   1/1/2021
> > John Good          Client 7   5/1/2021
> > Joe J. Jackson     Client 8   8/1/2021
> > Bob A. Barker      Client 9   12/1/2021
> >
> > THANKS!
> >
> > Gregg Powell
> > Arizona, USA
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
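
To make the goal concrete, here is a minimal base-R sketch of the kind of single-column standardization described in the original message. It is an illustration under stated assumptions, not a recommended solution: the helpers `name_key()` and `most_common()` and the column `Name_std` are hypothetical names invented here, and the matching rule (same first and last word, ignoring case, middle names, and anything after a comma such as ",RN") is a deliberately crude stand-in for real fuzzy matching.

```r
# Example names and clients taken from the first table in the original message.
df <- data.frame(
  Name = c("John Good", "Joe Jackson", "Bob A. Barker",
           "John B. Good", "Joe J. Jackson", "Bob Allen Barker",
           "John Good", "Joe Jack Jackson", "Bob Barker"),
  Client = paste("Client", 1:9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build a crude matching key from the first and last word.
# Assumption: two entries refer to the same person if these two words agree.
name_key <- function(x) {
  x <- sub(",.*$", "", x)              # drop trailing suffixes such as ",RN"
  x <- trimws(gsub("\\s+", " ", x))    # collapse repeated whitespace
  parts <- strsplit(tolower(x), " ", fixed = TRUE)
  vapply(parts, function(p) paste(p[1], p[length(p)]), character(1))
}

# Hypothetical helper: most frequent spelling in a group.
# Ties are broken in favour of the spelling that appears first.
most_common <- function(x) {
  counts <- table(factor(x, levels = unique(x)))
  names(counts)[which.max(counts)]
}

key <- name_key(df$Name)

# Replace every spelling with the most common spelling of its group.
df$Name_std <- ave(df$Name, key, FUN = most_common)

print(df[, c("Name", "Name_std")])
```

On the example data, "John B. Good" becomes "John Good". For Joe and Bob each spelling occurs only once, so the tie-break picks the first one entered ("Joe Jackson", "Bob A. Barker"), which differs from the hoped-for "Joe J. Jackson"; a different tie-break rule would be needed to reproduce the second table exactly. This key also merges different people who share a first and last name and does not catch typos in those words; an edit-distance comparison (for example with `adist()` from base R) could be layered on top for that.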