Re: [R] Randomly drop a percent of data from a data.frame

Richard Kwock Fri, 16 Aug 2013 17:27:15 -0700

Try this:

data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
data <- round(data,digits=3)


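#total number of cells in the data frame (5 x 4 = 20)
n <- prod(dim(data))

#logical vector with one entry per cell of x3 and x4 (n/2 = 10 cells);
#mark 20% of the total n cells (4 of them) as TRUE
dummy <- rep(FALSE, n/2)
dummy[sample(1:(n/2), round(n*.2))] <- TRUE

# 5x2 dummy matrix with TRUE and FALSE
matrix(dummy, ncol = 2)

#replace the TRUE positions in x3 and x4 with NAs
data[,c("x3", "x4")][matrix(dummy, ncol = 2)]  <- NA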

data   # no seed is set above, so the values and NA positions differ on every run

#      x1     x2     x3     x4
#1 -1.310  0.659     NA  0.510
#2 -3.003 -0.004     NA     NA
#3  0.584  0.310     NA -0.087
#4  1.644 -2.792 -0.390 -0.382
#5 -1.791  0.840  1.137  0.820
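
The same idea can be wrapped in a small function so that the column names and the fraction are not hard-coded. This is only a sketch: `drop_cells` and its arguments `cols` and `frac` are names made up for illustration, not anything from base R, and it assumes that "20% of the data" means 20% of all cells in the data frame, as in Chris's examples.

```r
# Hypothetical helper (not part of base R): set a fraction `frac` of ALL
# cells in `df` to NA, choosing the cells only from the columns in `cols`.
drop_cells <- function(df, cols, frac = 0.2) {
  n_drop <- round(frac * prod(dim(df)))   # number of cells to blank out
  n_pool <- nrow(df) * length(cols)       # cells eligible for dropping
  stopifnot(n_drop <= n_pool)
  # logical mask over the eligible columns, TRUE = cell becomes NA
  mask <- matrix(FALSE, nrow = nrow(df), ncol = length(cols))
  mask[sample.int(n_pool, n_drop)] <- TRUE
  df[cols][mask] <- NA
  df
}

set.seed(6245)
data <- round(data.frame(x1 = rnorm(5), x2 = rnorm(5),
                         x3 = rnorm(5), x4 = rnorm(5)), digits = 3)
res <- drop_cells(data, c("x3", "x4"))
sum(is.na(res))       # 4, i.e. 20% of the 20 cells
colSums(is.na(res))   # how the NAs are split between the columns
```

Because the cells are drawn in a single `sample.int()` call over all eligible positions, the number of NAs is always the same; only their split between x3 and x4 varies.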

Richard


On Fri, Aug 16, 2013 at 2:34 PM, arun <[email protected]> wrote:

> Hi,
> Maybe this helps:
> #data1 (changed `data` to `data1`)
> set.seed(6245)
>  data1 <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
>  data1<- round(data1,digits=3)
>
> data2<- data1
>
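> # note: sample() is redrawn for each column, so the number of NAs
> # varies from run to run (between 0 and 4 here)
> data1[,3:4] <- lapply(data1[,3:4], function(x) {
>   x1 <- match(x, sample(unlist(data1[,3:4]),
>                         round(0.8*length(unlist(data1[,3:4])))))
>   x[is.na(x1)] <- NA
>   x
> })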
>  data1
> #      x1     x2     x3     x4
> #1  0.482  1.320     NA -0.142
> #2 -0.753 -0.041 -0.063  0.886
> #3  0.028 -0.256 -0.069  0.354
> #4 -0.086  0.475  0.244  0.781
> #5  0.690 -0.181  1.274  1.633
>
>
> #or
> data2[,3:4] <- lapply(data2[,3:4], function(x) {
>   x1 <- match(x, sample(unlist(data2[,3:4]),
>                         round(0.8*length(unlist(data2[,3:4])))))
>   x[is.na(x1)] <- NA
>   x
> })
>  data2
> #      x1     x2     x3     x4
> #1  0.482  1.320 -0.859 -0.142
> #2 -0.753 -0.041     NA     NA
> #3  0.028 -0.256 -0.069  0.354
> #4 -0.086  0.475  0.244  0.781
> #5  0.690 -0.181  1.274  1.633
> A.K.
>
>
>
> ----- Original Message -----
> From: Christopher Desjardins <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Sent: Friday, August 16, 2013 3:02 PM
> Subject: [R] Randomly drop a percent of data from a data.frame
>
> Hi,
> I have the following data.
>
> > set.seed(6245)
> > data <- data.frame(x1=rnorm(5),x2=rnorm(5),x3=rnorm(5),x4=rnorm(5))
> > round(data,digits=3)
>       x1     x2     x3     x4
> 1  0.482  1.320 -0.859 -0.142
> 2 -0.753 -0.041 -0.063  0.886
> 3  0.028 -0.256 -0.069  0.354
> 4 -0.086  0.475  0.244  0.781
> 5  0.690 -0.181  1.274  1.633
>
> What I would like to do is drop 20% of the data, but I want this 20% to
> come only from x3 and x4. It doesn't have to be split evenly, i.e. I don't
> need to drop 2 from x3 and 2 from x4, or to make sure that an observation
> is missing data on only one variable. I just want to drop 20% of the data
> through x3 and x4 only. In other words,
>
>       x1     x2     x3     x4
> 1  0.482  1.320 -0.859     NA
> 2 -0.753 -0.041 -0.063  0.886
> 3  0.028 -0.256     NA  0.354
> 4 -0.086  0.475     NA  0.781
> 5  0.690 -0.181     NA  1.633
>
> OR
>
>       x1     x2     x3     x4
> 1  0.482  1.320     NA -0.142
> 2 -0.753 -0.041 -0.063  0.886
> 3  0.028 -0.256     NA     NA
> 4 -0.086  0.475  0.244     NA
> 5  0.690 -0.181  1.274  1.633
>
> OR
>
>       x1     x2     x3     x4
> 1  0.482  1.320 -0.859 -0.142
> 2 -0.753 -0.041 -0.063     NA
> 3  0.028 -0.256 -0.069     NA
> 4 -0.086  0.475  0.244     NA
> 5  0.690 -0.181  1.274     NA
>
> ETC. are all fine.
>
> Any ideas how I can do this?
> Chris
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
