Hi, I'm pretty new to R and am trying to develop a reusable set of scripts
that I can use to profile various data types and common fields in our
database. I know that what I'm asking is a can of worms, so please bear
with me. :)

For example, we store a person's first name, last name, phone number, email
address, last gift amount, gift date, etc., as well as integer-typed fields.
I'm wondering if there's a "best practice" for validating a field that
holds, for example, first name or last name. A couple of things I've come
up with are:
1) Count of characters (nchar) in the first (or last) name field
2) Number of unique tokens
3) Patterns (converting alpha to A and numeric to N), counting the
frequency of each unique pattern that results. I suppose I could make lower
case alpha 'a' and upper case 'A' to be more specific.
4) Min and max name (helps identify those with leading spaces, numbers)
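
Here is a rough sketch of those four checks in base R, in case it helps
show what I mean. The sample vector and the function name profile_names are
made up for illustration, and I'm assuming the field arrives as a character
vector. I use sort(method = "radix") only so that the min/max ordering does
not depend on the locale; that choice is mine and may not suit every data set.

```r
# Hypothetical sample data; real values would come from the database.
first_name <- c("Jeff", " Anne", "O'Brien", "jeff", "J3ff", NA, "Mary Ann")

# profile_names() is an illustrative helper, not part of any package.
profile_names <- function(x) {
  x <- as.character(x)
  non_missing <- x[!is.na(x)]

  # 1) Frequency of character counts
  length_freq <- table(nchar(non_missing))

  # 2) Number of unique whitespace-separated tokens
  tokens <- unlist(strsplit(trimws(non_missing), "[[:space:]]+"))
  n_unique_tokens <- length(unique(tokens))

  # 3) Patterns: upper -> "A", lower -> "a", digit -> "N";
  #    everything else (spaces, punctuation) is left as-is.
  pattern <- gsub("[[:upper:]]", "A", non_missing)
  pattern <- gsub("[[:lower:]]", "a", pattern)
  pattern <- gsub("[[:digit:]]", "N", pattern)
  pattern_freq <- sort(table(pattern), decreasing = TRUE)

  # 4) Min and max value, using C-locale (byte) ordering so that
  #    leading spaces and digits sort ahead of letters.
  sorted <- sort(non_missing, method = "radix")

  list(
    n_missing       = sum(is.na(x)),
    length_freq     = length_freq,
    n_unique_tokens = n_unique_tokens,
    pattern_freq    = pattern_freq,
    min_name        = sorted[1],
    max_name        = sorted[length(sorted)]
  )
}

prof <- profile_names(first_name)
print(prof)
```

With the sample data, the leading-space entry shows up as the minimum and
the all-lower-case entry as the maximum, which is the kind of thing I'd
want the profile to surface.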

Does anyone have more suggestions for techniques that are common or that
you'd recommend for name fields? Ultimately, I'm looking to develop a
common set of profiles for various data types, so if there's a white paper
(I've googled, but not found any that hit the mark yet) I'd love to see it.

Perhaps there's even a package for this type of thing?

Thanks much!

-- 
Jeff

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.