Re: [R] Sampling letters

Dimitris Rizopoulos Tue, 04 Mar 2008 07:56:09 -0800

You could try a simple for() loop over the columns, e.g.

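N <- 100   # number of rows
k <- 10    # number of columns (300 in your case)

set.seed(12345)
# example binary matrix and answer key
mat <- matrix(sample(0:1, N * k, TRUE), N, k)
key <- sample(letters[1:4], k, TRUE)

out <- matrix("", N, k)
unq.key <- unique(key)
for (i in seq_len(k)) {
    ind <- mat[, i] == 1
    # 1's get the key letter for this column
    out[ind, i] <- key[i]
    # 0's get a random draw from the remaining letters
    vals <- unq.key[!unq.key %in% key[i]]
    out[!ind, i] <- sample(vals, sum(!ind), TRUE)
}
out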
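
For comparison, a loop-free variant is sketched below. It is only an illustration under stated assumptions, not part of the code above: the names lets, key.idx, shift, out2 and keymat are made up for the example, and it draws the replacements from the full set letters[1:4] rather than from unique(key), so a letter that happens not to occur in key can still be used for the 0's. Because it uses the random number stream differently, it will not reproduce the loop's output cell for cell.

```r
N <- 100
k <- 10
lets <- letters[1:4]   # assumed alphabet, as in the original question

set.seed(12345)
mat <- matrix(sample(0:1, N * k, TRUE), N, k)
key <- sample(lets, k, TRUE)

# position of each column's key letter in 'lets', repeated down the rows
key.idx <- matrix(match(key, lets), N, k, byrow = TRUE)

# shift 0 keeps the key letter (cells equal to 1); a random shift of
# 1, 2 or 3 moves cyclically to one of the other three letters (cells equal to 0)
shift <- matrix(sample(1:3, N * k, TRUE), N, k)
shift[mat == 1] <- 0

out2 <- matrix(lets[(key.idx - 1 + shift) %% 4 + 1], N, k)

# sanity checks: 1's carry the key letter, 0's never do
keymat <- matrix(key, N, k, byrow = TRUE)
stopifnot(all(out2[mat == 1] == keymat[mat == 1]))
stopifnot(all(out2[mat == 0] != keymat[mat == 0]))
```

Since a shift of 1, 2 or 3 modulo 4 can never land back on the key letter, each 0 is recoded to one of the three remaining letters with equal probability.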


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "Doran, Harold" <[EMAIL PROTECTED]>
To: <r-help@r-project.org>
Sent: Tuesday, March 04, 2008 4:33 PM
Subject: [R] Sampling letters


> I have a binary matrix of size N x 300. I then create the following:
>
>> set.seed(1234)
>> (key_file <- sample(letters[1:4], 300, replace=TRUE))
>   [1] "a" "c" "c" "c" "d" "c" "a" "a" "c" "c" "c" "c" "b" "d" "b" "d" "b" "b" "a" "a" "b" "b" "a"
>  [24] "a" "a" "d" "c" "d" "d" "a" "b" "b" "b" "c" "a" "d" "a" "b" "d" "d" "c" "c" "b" "c" "b" "c"
>  [47] "c" "b" "a" "d" "a" "b" "c" "c" "a" "c" "b" "d" "a" "d" "d" "a" "b" "a" "a" "c" "b" "c" "a"
>  [70] "c" "a" "d" "a" "d" "a" "c" "b" "a" "b" "c" "d" "b" "a" "c" "a" "d" "b" "b" "a" "d" "a" "d"
>  [93] "a" "a" "a" "c" "b" "a" "b" "c" "a" "c" "b" "a" "a" "b" "a" "a" "b" "a" "c" "a" "d" "a" "a"
> [116] "d" "d" "b" "a" "d" "c" "d" "d" "d" "b" "b" "b" "c" "b" "b" "d" "c" "a" "b" "d" "b" "d" "c"
> [139] "c" "d" "c" "d" "b" "b" "b" "c" "c" "c" "d" "c" "b" "a" "a" "d" "a" "d" "c" "d" "b" "c" "b"
> [162] "c" "b" "a" "a" "c" "b" "a" "d" "b" "d" "c" "c" "b" "b" "d" "b" "c" "a" "b" "b" "b" "c" "a"
> [185] "d" "a" "d" "c" "b" "c" "c" "d" "a" "d" "d" "d" "d" "c" "d" "c" "c" "c" "b" "d" "c" "c" "b"
> [208] "b" "a" "d" "c" "b" "a" "d" "c" "d" "c" "c" "b" "d" "b" "a" "b" "b" "a" "b" "d" "b" "c" "b"
> [231] "d" "c" "a" "d" "c" "a" "c" "b" "b" "d" "b" "a" "a" "c" "d" "b" "d" "c" "d" "d" "c" "c" "b"
> [254] "b" "a" "c" "b" "a" "c" "c" "d" "a" "c" "b" "a" "a" "c" "a" "a" "c" "b" "d" "b" "d" "a" "c"
> [277] "d" "c" "b" "b" "b" "b" "d" "d" "c" "b" "b" "b" "c" "d" "c" "b" "d" "a" "c" "d" "c" "a" "c"
> [300] "b"
>
> I now replace all 1's in column 1 with key_file[1], I replace all 1's in
> column 2 with key_file[2], and so on through column 300. This part is
> simple.
>
> Now, I want to replace the 0's in column 1 with either b, c, or d, but
> not with an a, since that was used to replace the 1's. For column 2 I
> want to replace all 0's with either a, b, or d, but not with c, since
> that was used to replace the 1's.
>
> However, I do not want all 0's in column 1 to be the same letter. That
> is, I would not want them all to be replaced with a 'b'. Rather, I want
> to randomly recode the 0's as either b, c, or d. So, some 0's will be
> recoded as b, some as c, and some as d.
>
> If I were replacing the zeros with the same letter, this would be a
> simple ifelse command. But because I want randomness, I'm not sure how I
> can do this other than with a costly loop that goes through the data
> matrix cell by cell and does the replacement. That would be fine, but
> very time consuming.
>
> Does anyone have thoughts on how else I could tackle this?
>
> Harold
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 



