Thank you so much, Jessica.

The specifics of my case are that I have a very detailed variable 'Interests', which may take several thousand possible values. Each customer usually has 3-10 different interests. For example:

customer_id|...|interests
10000001 |...| cycling, swimming, cooking
10000002 |...| cooking, singing, dancing

The total number of possible distinct values is several thousand. I'm curious how to use these interests in an SVM (represent them as a vector of real numbers with several thousand elements?). If you have any ideas, please let me know.

Thank you,
-Alex

________________________________
From: Jessica Streicher [j.streic...@micromata.de]
Sent: 27 March 2012 11:18
To: Alekseiy Beloshitskiy
Subject: Re: [R] normalization of multi-value string variable

Well, I'm not sure what you mean by scaling and normalizing strings, but if you want to represent the interests as numbers, you can do something like this:

n <- seq(1, length(unique(my_strings)))[factor(my_strings)]

On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:

Hi All,

I need to normalize/scale a string variable that represents the interests of customers (e.g., 'cycling, rollerblading, swimming', etc.). Does anybody know how to do this? I then want to use it along with other numeric variables for SVM classification.

I'd appreciate any advice.

-Alex

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
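
One possible way to turn the multi-valued interests discussed in this thread into numbers for an SVM is a binary indicator ("one-hot") encoding: one column per distinct interest, with a 1 where the customer has that interest. With several thousand distinct values and only 3-10 per customer, a sparse matrix keeps this manageable. The sketch below is only an illustration under assumptions: the data frame `customers` and its two rows are made up to mirror the example in the message, and the Matrix package is assumed to be installed (it normally ships with R).

```r
library(Matrix)  # assumed available; normally installed with R

# Hypothetical toy data mirroring the example in the message
customers <- data.frame(
  customer_id = c(10000001L, 10000002L),
  interests   = c("cycling, swimming, cooking", "cooking, singing, dancing"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string, trim whitespace, drop repeated entries
interest_list <- lapply(strsplit(customers$interests, ","),
                        function(s) unique(trimws(s)))

# Vocabulary: every distinct interest, sorted for a stable column order
vocab <- sort(unique(unlist(interest_list)))

# Row (customer) and column (interest) index of every non-zero cell
row_idx <- rep(seq_along(interest_list), lengths(interest_list))
col_idx <- match(unlist(interest_list), vocab)

# Sparse 0/1 matrix: one row per customer, one column per interest
X <- sparseMatrix(
  i = row_idx, j = col_idx, x = 1,
  dims = c(nrow(customers), length(vocab)),
  dimnames = list(as.character(customers$customer_id), vocab)
)

print(X)
```

Because the indicator columns are already 0 or 1, they may not need further scaling, while the other numeric variables probably still do. The documentation for `svm()` in the e1071 package lists sparse matrices among the accepted inputs, but whether a sparse `X` can be combined directly with the other predictors depends on the package and version in use, so that part is worth checking before relying on it.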