Dear all,
I have several character vectors, each with a large number of distinct values:
length(unique(x)) is between 100 and 200. This causes problems because I would
like to use them as predictors in a coxph model.
I would therefore like to convert each of these vectors into a new vector,
x_new. x_new should be equal to x for the top n categories (i.e. the n values
that occur most often) and NA elsewhere.
For example, for n = 3, x_new would have three levels (the three most common
values of x), and every other value would be set to NA.
Is there some convenient way of doing this?
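One possible approach in base R is sketched below. It assumes x is a character vector and that "NA" (R's missing value) is what is meant for the non-top values. The helper name top_n_levels() and the toy vector x are hypothetical, chosen for illustration only. Ties at the cut-off are resolved by whatever order sort() leaves tied counts in, so they are not handled specially.

```r
# Hypothetical helper: keep the n most frequent values of x, set the rest to NA.
top_n_levels <- function(x, n) {
  counts <- sort(table(x), decreasing = TRUE)             # frequency of each distinct value
  keep <- names(counts)[seq_len(min(n, length(counts)))]  # the n most common values
  x_new <- x
  x_new[!(x %in% keep)] <- NA                             # everything else becomes missing
  factor(x_new, levels = keep)                            # factor with exactly the kept levels
}

# Toy data for illustration
x <- c("a", "b", "a", "c", "d", "a", "b", "c", "e")
x_new <- top_n_levels(x, 3)
table(x_new, useNA = "ifany")
```

Note that coxph typically drops rows with NA predictors under the default na.action, so the non-top observations would be excluded from the fit. If they should stay in the model, an explicit level such as "Other" is an alternative; the forcats package offers fct_lump_n() for that style of lumping.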
Thanks in advance,
Michael
Michael Haenlein
Associate Professor of Marketing
ESCP Europe
Paris, France
______________________________________________
R-help mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.