> > The best encoding depends upon which language you would like to manipulate
> > the variable in. In R, genders are most naturally represented as factors.
> > That means that in an external data source (like a spreadsheet of data),
> > you should ideally have the gender recorded as human-understandable text
> > ("male" and "female", or "M" and "F"). Once the data is read into R, by
> > default R will convert the string to factors (keeping the human readable
> > labels). This way you avoid having to remember that 1 means male (or
> > whatever).
> >
> > If you were manipulating the data in a different language that didn't have
> > factors, then it might be more appropriate to use an integer. Which
> > integers you use doesn't matter, you need to have a look-up table to know
> > what each number refers to, whatever you choose.
> >
> Yes, that's what I thought. However somebody told me that it is better
> to use 1/2 rather than 0/1 for a 2 level factor such as gender, and I've
> no idea why. I told them it didn't matter, but have since seen quite a
> few examples where they use 1/2 (admittedly in SPSS).

The only benefit that I can see of using 1/2 instead of 0/1 is fairly
minor. If you have cases where there are missing values, and you are
working in a language that doesn't support NA values for integers (or
factors; I'm thinking of something like C), then you could encode your
genders as

    0: not recorded
    1: female
    2: male

Then you can include logic like

    if(gender)
    {
      /* do something */
    }

The alternative encoding of 0/1 would be something like

    -1: not recorded
     0: female
     1: male

This makes the code slightly less pretty.

    if(gender != -1)
    {
      /* do something */
    }

Again, none of this really applies to R, since you should be using
factors for this sort of variable.

Regards,
Richie.

Mathematical Sciences Unit
HSL
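
For illustration only, here is a minimal sketch in R of the two approaches
discussed above. The vector gender_code and its values are made up for the
example; they are not from the original question. It assumes the 0/1/2
encoding, with 0 meaning "not recorded".

```r
# Hypothetical data: 0 = not recorded, 1 = female, 2 = male.
gender_code <- c(1L, 2L, 0L, 2L)

# Sentinel-style test: 0 converts to FALSE, any other number to TRUE,
# which is what makes if(gender) work under the 0/1/2 encoding.
recorded <- as.logical(gender_code)

# The factor approach: readable labels, and any code that is not listed
# in 'levels' (here the sentinel 0) becomes NA.
gender <- factor(gender_code, levels = c(1L, 2L),
                 labels = c("female", "male"))

# Missing values are then handled by R's usual NA machinery.
print(is.na(gender))
print(table(gender, useNA = "ifany"))
```

With the factor there is no look-up table to remember, and the missing
value is an NA rather than a magic number. Note that since R 4.0.0,
read.csv() and data.frame() no longer convert strings to factors by
default, so an explicit call to factor() (or stringsAsFactors = TRUE) is
needed.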