Remember that s1 and s2 are only the row indices, within Ad_0 and Ad_1, of the data you want to keep.
Therefore traindata <- rbind(s1,s2) will not work: it stacks the two index vectors as the rows of a matrix (recycling the shorter one, which is what the warning is about) instead of binding the rows they point to. You need to take the data from Ad_0 and Ad_1 for that, as you did for s3 and s4, i.e. traindata <- rbind(Ad_0[s1,], Ad_1[s2,]).

On 26.04.2012 at 12:56, Dwaipayan Dasgupta wrote:
> Hi,
> Thanks again for helping me out.
> Here is the code I am using:
> Ad_1 <- subset(Attrition_data_1,Attrition_ind=="1")
> Ad_0 <- subset(Attrition_data_1,Attrition_ind=="0")
>
> s1<-sample(1:dim(Ad_0)[1],0.8*dim(Ad_0)[1])# 80% of the non-attritees
> s2<-sample(1:dim(Ad_1)[1],0.8*dim(Ad_1)[1])# 80% of attritees
>
> s3<- Ad_0 [-s1,]
> summary(s3)
>
> s4<- Ad_1 [-s2,]
> summary(s4)
>
> traindata <- rbind(s1,s2)
> testdata <- rbind(s3,s4)
>
> This works for the test dataset but throws up an error of
> Warning message:
> In rbind(s1, s2) :
>   number of columns of result is not a multiple of vector length (arg 2)
>
> I understand that I am trying to append vectors of unequal lengths but
> don't know how to work around this.
> Would you help please?
>
> Thanks,
> Dwaipayan
>
> From: Jessica Streicher [mailto:j.streic...@micromata.de]
> Sent: Wednesday, April 25, 2012 9:25 PM
> To: Dwaipayan Dasgupta
> Cc: r-help@r-project.org
> Subject: Re: [R] Splitting data into test and train (80:20) kepping
> attributes similar
>
> Don't know what's wrong there (except if you're using the eclipse R plugin on
> a mac like me and the window for choosing the download site doesn't pop up..
> did it?^^)
>
> Anyway, you could just split all of your data into 2 datasets, one that has
> all the data labeled 0, the other for all labeled 1, then take a random 80%
> of both, put them back together into the 80% data, and put the rest back
> together to form the 20%.
>
> Since I don't know your data, here's an example:
> > M<-cbind(c(rep(0,10),rep(1,10)),1:20)
> > M
>       [,1] [,2]
>  [1,]    0    1
>  [2,]    0    2
>  [3,]    0    3
>  [4,]    0    4
>  [5,]    0    5
>  [6,]    0    6
>  [7,]    0    7
>  [8,]    0    8
>  [9,]    0    9
> [10,]    0   10
> [11,]    1   11
> [12,]    1   12
> [13,]    1   13
> [14,]    1   14
> [15,]    1   15
> [16,]    1   16
> [17,]    1   17
> [18,]    1   18
> [19,]    1   19
> [20,]    1   20
> > index1<-which(M[,1]==1)
> > index1
>  [1] 11 12 13 14 15 16 17 18 19 20
>
> > M1<-M[index1,]
> > M1
>       [,1] [,2]
>  [1,]    1   11
>  [2,]    1   12
>  [3,]    1   13
>  [4,]    1   14
>  [5,]    1   15
>  [6,]    1   16
>  [7,]    1   17
>  [8,]    1   18
>  [9,]    1   19
> [10,]    1   20
>
> > M0<-M[-index1,]
> > M0
>       [,1] [,2]
>  [1,]    0    1
>  [2,]    0    2
>  [3,]    0    3
>  [4,]    0    4
>  [5,]    0    5
>  [6,]    0    6
>  [7,]    0    7
>  [8,]    0    8
>  [9,]    0    9
> [10,]    0   10
>
> > s1<-sample(1:dim(M1)[1],0.8*dim(M1)[1])
> > s1
> [1] 10  3  5  9  2  6  8  7
>
> > s0<-sample(1:dim(M0)[1],0.8*dim(M0)[1])
> > s0
> [1]  8 10  9  3  7  4  2  1
>
> > data80<-rbind(M1[s1,],M0[s0,])
> > data80
>       [,1] [,2]
>  [1,]    1   20
>  [2,]    1   13
>  [3,]    1   15
>  [4,]    1   19
>  [5,]    1   12
>  [6,]    1   16
>  [7,]    1   18
>  [8,]    1   17
>  [9,]    0    8
> [10,]    0   10
> [11,]    0    9
> [12,]    0    3
> [13,]    0    7
> [14,]    0    4
> [15,]    0    2
> [16,]    0    1
>
> > data20<-rbind(M1[-s1,],M0[-s0,])
> > data20
>      [,1] [,2]
> [1,]    1   11
> [2,]    1   14
> [3,]    0    5
> [4,]    0    6
>
> which is probably not how you really do things efficiently, but it should
> work.
>
> greetings
> Jessi
>
>
> On 25.04.2012 at 17:18, Dwaipayan Dasgupta wrote:
>
> Thank you so much for replying. I tried what you said but it still throws the
> same error, i.e. could not find function "sample.split"
> Might be because of the version of R I am running (R version 2.12.2). I do not
> have admin rights to upgrade to the newest version.
> Is there anything else I can try? I'm trying to split my data into 80:20
> keeping the ratio of 0,1 in the Y variable (binary) constant.
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Jessica Streicher
> Sent: Wednesday, April 25, 2012 7:17 PM
> To: r-help@r-project.org
> Subject: Re: [R] Splitting data into test and train (80:20) kepping
> attributes similar
>
> Well, it throws an error because there is no such function in default R. A
> bit of googling showed it might be the one in the caTools package.
>
> Execute this:
> install.packages("caTools")
> library(caTools)
>
> before executing your code.
>
>
> On 25.04.2012 at 12:39, Dwaipayan Dasgupta wrote:
>
> Hi,
> Could someone help me with this please? I'm trying to use
> Y = Attrition_data[,1] # extract labels from the data
> msk = sample.split (Y, SplitRatio=3/4)
> table(Y,msk)
> to do the splitting but it keeps throwing up an error:
> Error: could not find function "sample.split"
> Could you please help?
>
> Thanks in advance
> doy
>
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Dwaipayan Dasgupta
> Sent: Tuesday, April 24, 2012 9:08 PM
> To: r-help@r-project.org
> Subject: [R] Splitting data into test and train (80:20) kepping attributes
> similar
>
> Hi,
> I am trying to do some predictive modeling around attrition and want to split
> the dataset into test and train (80:20) and keep the ratio of attritees to
> non-attritees the same.
> In my dataset the attrition indicator is coded as 0 (for non-attritees) and 1
> (for attritees), and I want to keep the ratio of 0's to 1's similar.
> I apologize for this trivial question but this is my second week with R.
>
> Thanks,
> Doy
>
>
> American Express made the following annotations on Tue Apr 24 2012 08:38:50
>
> ******************************************************************************
>
> "This message and any attachments are solely for the intended recipient and
> may contain confidential or privileged information.
> If you are not the intended recipient, any disclosure, copying, use, or
> distribution of the information included in this message and any attachments
> is prohibited. If you have received this communication in error, please
> notify us by reply e-mail and immediately and permanently delete this message
> and any attachments. Thank you."
>
> ******************************************************************************
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
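A minimal base-R sketch of the same stratified 80:20 split, for the case where caTools cannot be installed. It is not tested against the real attrition data: the data frame below is made up for illustration (only the column name Attrition_ind is taken from the code above), and stratified_split is a hypothetical helper name, not a function from any package. The point it shows is that the sampled indices s1 and s2 are used to subset Ad_0 and Ad_1, and rbind() is then applied to the resulting data frames rather than to the indices.

```r
# Made-up example data (assumption): 80 non-attritees (0) and 20 attritees (1).
# Only the column name Attrition_ind comes from the thread.
set.seed(42)
Attrition_data_1 <- data.frame(
  Attrition_ind = rep(c(0, 1), times = c(80, 20)),
  x = rnorm(100)
)

# Hypothetical helper: sample train_frac of the rows within each class
# separately, so the 0:1 ratio is (nearly) the same in train and test.
# Assumes each class has enough rows that s1 and s2 are non-empty;
# otherwise Ad_0[-s1, ] would return zero rows instead of all of them.
stratified_split <- function(df, label, train_frac = 0.8) {
  Ad_0 <- df[df[[label]] == 0, ]
  Ad_1 <- df[df[[label]] == 1, ]

  # Row indices (not rows) of the training part of each class
  s1 <- sample(seq_len(nrow(Ad_0)), floor(train_frac * nrow(Ad_0)))
  s2 <- sample(seq_len(nrow(Ad_1)), floor(train_frac * nrow(Ad_1)))

  # Use the indices to pick rows, then rbind() the data frames
  list(
    train = rbind(Ad_0[s1, ], Ad_1[s2, ]),
    test  = rbind(Ad_0[-s1, ], Ad_1[-s2, ])
  )
}

parts <- stratified_split(Attrition_data_1, "Attrition_ind")
traindata <- parts$train
testdata  <- parts$test

nrow(traindata)                 # 80
nrow(testdata)                  # 20
table(traindata$Attrition_ind)  # 64 zeros, 16 ones
table(testdata$Attrition_ind)   # 16 zeros, 4 ones
```

If caTools can be installed, its sample.split(Y, SplitRatio = 0.8) should return a logical vector that preserves the label ratio in the same way, so Attrition_data_1[msk, ] and Attrition_data_1[!msk, ] would be the 80% and 20% parts.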