Re: [R] normalization of multi-value string variable

Jessica Streicher Tue, 27 Mar 2012 06:28:09 -0700

Hm, so what you need is either:

- one new feature per activity, each with a binary (0/1) value,
  e.g.:
  cust_id, cycling, swimming, cooking
  1001,    1,       0,        1


- one new feature whose value encodes a particular combination of
  activities, so with just the three activities above you would have
  2^3 = 8 possible values. I'm not sure how useful that would be for the
  classification, though.
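
A minimal sketch of the first option in R, assuming the interests arrive as one comma-separated string per customer. The two example rows are taken from Alex's message below; the names `customers`, `interest_list`, `vocab` and `indicators` are made up for illustration.

```r
# Hypothetical input: one row per customer, interests as a single
# comma-separated string (example rows from the question).
customers <- data.frame(
  cust_id   = c(10000001L, 10000002L),
  interests = c("cycling, swimming, cooking", "cooking, singing, dancing"),
  stringsAsFactors = FALSE
)

# Split each string on commas and strip the surrounding spaces.
interest_list <- lapply(strsplit(customers$interests, ","), trimws)

# Vocabulary: every distinct interest, sorted for a stable column order.
vocab <- sort(unique(unlist(interest_list)))

# One row per customer, one 0/1 column per interest
# (vapply builds interests x customers, so transpose it).
indicators <- t(vapply(interest_list,
                       function(x) as.integer(vocab %in% x),
                       integer(length(vocab))))
dimnames(indicators) <- list(as.character(customers$cust_id), vocab)

print(indicators)
```

With several thousand distinct interests this gives several thousand columns that are almost all zero, so a sparse matrix class (for example from the Matrix package) may be worth considering to save memory.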

(I would need to think about how to compute this; I'm new to R as well. I
would probably just iterate over the data.)
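
For the combination feature, an explicit loop may not be needed: if the activities are already 0/1 columns, each row can be read as a binary number. A sketch, assuming a small hypothetical indicator matrix `ind` (customer IDs and values made up):

```r
# Hypothetical 0/1 indicator matrix: one row per customer,
# one column per activity.
ind <- matrix(c(1, 0, 1,
                0, 1, 1,
                1, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("1001", "1002", "1003"),
                              c("cycling", "swimming", "cooking")))

# Column j contributes 2^(j-1), so with k activities every combination
# gets a unique code in 0 .. 2^k - 1 (here 0 .. 7).
weights <- 2^(seq_len(ncol(ind)) - 1)
combo   <- drop(ind %*% weights)   # 1001 = 5, 1002 = 6, 1003 = 7

# The code is a category label, not a magnitude, so store it as a factor.
combo_factor <- factor(combo, levels = 0:(2^ncol(ind) - 1))
print(combo)
```

With the several thousand interests mentioned below, 2^k is astronomically large (and doubles stop representing integers exactly beyond 2^53), so this encoding only makes sense for a handful of activities.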

If you make one feature per activity and end up with too many features to
compute the SVM properly, you might try to reduce their number by other
methods. PCA comes to mind, for example, though I have never used it on
"binary" data before.
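
A rough sketch of that idea with base R's `prcomp`, run on randomly generated (hypothetical) 0/1 data rather than real interests; the matrix sizes and the 90% variance cutoff are arbitrary assumptions:

```r
set.seed(1)

# Hypothetical data: 200 customers, 50 binary interest indicators,
# each customer having between 3 and 10 interests.
n_cust <- 200
n_int  <- 50
X <- matrix(0L, nrow = n_cust, ncol = n_int,
            dimnames = list(NULL, paste0("interest_", seq_len(n_int))))
for (i in seq_len(n_cust)) {
  X[i, sample.int(n_int, sample(3:10, 1))] <- 1L
}

# Centre the indicators; they are already on a common 0/1 scale.
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Keep the smallest number of components explaining >= 90% of the variance.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.9)[1]
reduced <- pca$x[, seq_len(k), drop = FALSE]
```

New customers would have to be projected with the same rotation, e.g. `predict(pca, newdata = X_new)` for a hypothetical matrix `X_new` with the same columns, before being passed to the SVM.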


On 27.03.2012 at 11:34, Alekseiy Beloshitskiy wrote:

> Thank you so much, Jessica,
> 
> What is specific about my case is that I have a very detailed variable 'Interests'
> which may have several thousand possible values. Usually each customer
> has 3-10 different interests. For example:
> customer_id|...|interests
> 10000001   |...| cycling, swimming, cooking
> 10000002   |...| cooking, singing, dancing
> 
> The total number of possible distinct values is several thousand. I'm curious
> how to use these interests in an SVM (represent them as a vector of real numbers
> with several thousand elements?).
> 
> If you have any ideas please let me know.
> 
> 
> Thank you,
> -Alex
> 
> From: Jessica Streicher [j.streic...@micromata.de]
> Sent: 27 March 2012 11:18
> To: Alekseiy Beloshitskiy
> Subject: Re: [R] normalization of multi-value string variable
> 
> Well, I'm not sure what you mean by scaling and normalizing strings, but if you
> want to represent the interests as numbers, you can do something like this:
> 
> # integer code for each string: its position among the sorted factor levels
> n <- seq(1, length(unique(my_strings)))[factor(my_strings)]
> 
> 
> On 26.03.2012 at 18:50, Alekseiy Beloshitskiy wrote:
> 
>> Hi All,
>> 
>> I need to normalize/scale a string variable which represents the interests of
>> customers (e.g., 'cycling, rollerblading, swimming', etc.).
>> 
>> Does anybody know how to do this? I then want to use it along with other
>> numeric variables for SVM classification.
>> 
>> I would appreciate any advice.
>> 
>> -Alex
>> 
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 



