Hello Ashim, and thank you for taking the time to reply.

> library(fuzzyjoin)
> ?stringdist_left_join

That function joins two tables, but what I am trying to do is standardize the
similarly spelled duplicate names in the first column of a single table.

I don't think fuzzyjoin will help me in that regard.
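That said, standardizing one column is still a pairwise string-distance problem: the column is compared against itself instead of against a second table. The following is a minimal base R sketch, not something proposed in this thread. It assumes the distinct spellings from the example table below and a hypothetical cutoff max_dist = 3, chosen only for this data and not as a recommended value. utils::adist() computes generalized Levenshtein distances, and each close pair is listed once.

```r
# Distinct spellings taken from the example table in this thread
nms <- c("John Good", "Joe Jackson", "Bob A. Barker", "John B. Good",
         "Joe J. Jackson", "Bob Allen Barker", "Joe Jack Jackson", "Bob Barker")

# All pairwise Levenshtein distances within the single column (base R, utils)
d <- adist(nms)

# Keep each unordered pair once (upper triangle) and apply a hypothetical cutoff
max_dist <- 3
idx <- which(upper.tri(d) & d <= max_dist, arr.ind = TRUE)

# One row per candidate duplicate pair
pairs <- data.frame(name1 = nms[idx[, "row"]],
                    name2 = nms[idx[, "col"]],
                    dist  = d[idx],
                    stringsAsFactors = FALSE)
print(pairs)
```

With this cutoff, four pairs are returned: John Good/John B. Good, Joe Jackson/Joe J. Jackson, Joe J. Jackson/Joe Jack Jackson, and Bob A. Barker/Bob Barker. Bob Allen Barker is left out because its distance to Bob A. Barker is 4, since raw edit distance penalizes a spelled-out middle name. Chaining these pairs into groups and picking a threshold would still have to be worked out on the real data.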

Thanks.
Gregg
Arizona, USA

------- Original Message -------
On Wednesday, June 15th, 2022 at 8:04 AM, Ashim Kapoor <ashimkap...@gmail.com> wrote:

> Dear Gregg,
>
> Check this out:
>
> library(fuzzyjoin)
> ?stringdist_left_join
>
> Best Regards,
> Ashim
>
> On Wed, Jun 15, 2022 at 8:28 PM Gregg Powell via R-help
> r-help@r-project.org wrote:
>
> > I have data sets with names in the first column, client names in the
> > second, and client start dates in the third.
> >
> > There are thousands of these records, with thousands of names, clients,
> > and client start dates. A name is entered each time the person begins
> > with a new client, so each person has many entries in the name column.
> > The names were often not entered consistently: with and without a middle
> > initial or middle name, or with abbreviations such as ",RN" at the end of
> > the name.
> >
> > Is there a package that can do fuzzy name matching so that the names in
> > the name column are replaced with a "standardized" format, where some
> > type of machine learning picks the most common spelling of each repeated
> > name and replaces the other variations with that spelling?
> >
> > I included an example below. The first table shows the names with their
> > various spellings; the second shows what I hope to achieve.
> >
> > Again, this is on a large scale: there are something like 10,000 records
> > with names that need to be standardized.
> >
> > Name              Client    Client Start Date
> > John Good         Client 1  1/1/2020
> > Joe Jackson       Client 2  6/1/2020
> > Bob A. Barker     Client 3  8/1/2020
> > John B. Good      Client 4  10/1/2020
> > Joe J. Jackson    Client 5  12/1/2020
> > Bob Allen Barker  Client 6  1/1/2021
> > John Good         Client 7  5/1/2021
> > Joe Jack Jackson  Client 8  8/1/2021
> > Bob Barker        Client 9  12/1/2021
> >
> > Name              Client    Client Start Date
> > John Good         Client 1  1/1/2020
> > Joe J. Jackson    Client 2  6/1/2020
> > Bob A. Barker     Client 3  8/1/2020
> > John Good         Client 4  10/1/2020
> > Joe J. Jackson    Client 5  12/1/2020
> > Bob A. Barker     Client 6  1/1/2021
> > John Good         Client 7  5/1/2021
> > Joe J. Jackson    Client 8  8/1/2021
> > Bob A. Barker     Client 9  12/1/2021
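One way to get close to the second table without machine learning is a simple rule-based key. This is a swapped-in technique, not fuzzy matching. The minimal sketch below uses two hypothetical helpers, name_key() and standardize_names(). They group names by their first and last tokens after dropping anything after a comma (such as ",RN") and any punctuation, then replace every spelling in a group with the group's most frequent spelling. Ties go to the spelling seen first.

```r
# Hypothetical helper: reduce a name to a "first last" key by dropping
# anything after a comma (e.g. ",RN"), punctuation, and middle tokens
name_key <- function(x) {
  x <- sub(",.*$", "", x)
  x <- gsub("[[:punct:]]", "", x)
  x <- tolower(trimws(x))
  parts <- strsplit(x, "[[:space:]]+")
  vapply(parts, function(p) paste(p[1], p[length(p)]), character(1))
}

# Hypothetical helper: replace each name with the most frequent spelling among
# names sharing the same key; ties go to the spelling that appears first
standardize_names <- function(x) {
  key <- name_key(x)
  groups <- split(x, key)  # spellings per key, in original row order
  best <- vapply(groups, function(v) {
    counts <- table(factor(v, levels = unique(v)))  # levels in order of appearance
    names(counts)[which.max(counts)]                # which.max picks the first maximum
  }, character(1))
  unname(best[key])
}

# The first table from this thread
df <- data.frame(
  Name = c("John Good", "Joe Jackson", "Bob A. Barker", "John B. Good",
           "Joe J. Jackson", "Bob Allen Barker", "John Good",
           "Joe Jack Jackson", "Bob Barker"),
  Client = paste("Client", 1:9),
  StartDate = as.Date(c("1/1/2020", "6/1/2020", "8/1/2020", "10/1/2020",
                        "12/1/2020", "1/1/2021", "5/1/2021", "8/1/2021",
                        "12/1/2021"), format = "%m/%d/%Y"),
  stringsAsFactors = FALSE
)

df$Name <- standardize_names(df$Name)
print(df)
```

On this data, every John row becomes "John Good", which appears twice. For Joe and Bob, each spelling occurs only once, so the tie-break returns "Joe Jackson" and "Bob A. Barker" rather than the hand-picked "Joe J. Jackson". The key also merges different people who share a first and last name, so the groups are worth reviewing before the original column is overwritten.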
> >
> > THANKS!
> >
> > Gregg Powell
> > Arizona, USA
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
