[R] Patterns on postal codes

Jeff Johnson Tue, 07 Jan 2014 20:24:13 -0800

Hi all,

I'm pretty new to R and have a question. I have a postal_code field which
can have a variety of values such as:
For US postal codes: 22942-0173 or 32601
For Canada postal codes: N9YZE6 or S7V 1J9


What I want to do is represent these as patterns, such as:
US: NNNNN-NNNN or NNNNN
Canada: ANAAAN or ANA NAN
where N = any digit, A = any alphabetic character, and a space stays a space.
Other characters, such as ', should be represented as themselves.

Ultimately I want to count these to see how many have a pattern of
NNNNN-NNNN, ANA NAN, etc so that I can visualize the outliers.

Does anyone know if there is a built-in function in R to do this?
Currently, the str() function on the postal_code field shows a factor with
90,993 levels, which isn't particularly helpful.
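
To make the goal concrete, here is a rough sketch of the mapping and the count using only base R (gsub() and table()). It is one possible approach, not a single built-in function. The sample vector and the helper name to_pattern are made up for illustration; only postal_code is the real field name.

```r
# Hypothetical sample data standing in for the real postal_code factor
postal_code <- factor(c("22942-0173", "32601", "N9YZE6", "S7V 1J9", "O'12"))

# Hypothetical helper: letters become "A", digits become "N",
# and every other character (space, hyphen, apostrophe) is left as-is
to_pattern <- function(x) {
  x <- as.character(x)            # convert the factor to plain strings first
  x <- gsub("[[:alpha:]]", "A", x)  # letters first, so the "N" added next is not re-mapped
  x <- gsub("[[:digit:]]", "N", x)
  x
}

to_pattern(postal_code)
# "NNNNN-NNNN" "NNNNN" "ANAAAN" "ANA NAN" "A'NN"

# Count how often each pattern occurs, most frequent first
pattern_counts <- sort(table(to_pattern(postal_code)), decreasing = TRUE)
pattern_counts
```

The order of the two gsub() calls matters: replacing digits first would produce "N", which the letter substitution would then turn into "A". Rare patterns at the tail of pattern_counts would be the outliers.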

Thanks in advance!

-- 
Jeff


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
