Hi all,

I'm pretty new to R and have a question. I have a postal_code field that can hold a variety of values, such as:

For US postal codes: 22942-0173 or 32601
For Canada postal codes: N9YZE6 or S7V 1J9

What I want to do is represent these as patterns, such as:

US: NNNNN-NNNN or NNNNN
Canada: ANAAAN or ANA NAN

where N = any digit, A = any alphabetic character, and a space stays a space. Any other character, such as ', should be represented as itself. Ultimately I want to count how many values have each pattern (NNNNN-NNNN, ANA NAN, etc.) so that I can visualize the outliers.

Does anyone know if there is a built-in function in R to do this? Currently, calling str() on the postal_code field shows a factor with 90,993 levels, which isn't particularly helpful.

Thanks in advance!

-- Jeff

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
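
For reference, the mapping described above could be sketched with base R's gsub() and table(). This is only a minimal sketch, not a dedicated built-in: the helper name to_pattern and the sample vector are made up for illustration, and it assumes that only ASCII letters and digits need to be mapped, with every other character kept as-is.

```r
# Hypothetical helper (not a built-in): replace each character by its class symbol.
to_pattern <- function(x) {
  x <- as.character(x)           # factor levels -> plain character strings
  x <- gsub("[A-Za-z]", "A", x)  # letters first, so the N's added below are not re-mapped
  x <- gsub("[0-9]", "N", x)     # then digits
  x                              # spaces, hyphens, apostrophes etc. are left unchanged
}

# Made-up sample data standing in for the real postal_code field
postal_code <- factor(c("22942-0173", "32601", "N9YZE6", "S7V 1J9", "90210"))

patterns <- to_pattern(postal_code)

# Count how often each pattern occurs, most frequent first
counts <- sort(table(patterns), decreasing = TRUE)
print(counts)
```

The order of the two gsub() calls matters: mapping digits first would turn the inserted N's into A's in the second step. The resulting counts table could then be passed to something like barplot(counts) to make rare patterns stand out.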