Thanks everyone for your help and advice. For the R-help archives, here is
what I ended up doing.
First creating a separate function to handle one day at a time -
byrow.gen2 <- function(genmat,rownum,use1,use2,num,ortho_obs_used){
prev = rownum-1
ran = runif(length(rownum),0,1)
if(genmat[rownum,use1]==0 & genmat[rownum,use2]==0 & genmat[prev,num]==0) {
if(ran<ortho_obs_used$Pr[1]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==0 & genmat[rownum,use2]==0 & genmat[prev,num]==1) {
if(ran<ortho_obs_used$Pr[4]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==0 & genmat[rownum,use2]==1 & genmat[prev,num]==0) {
if(ran<ortho_obs_used$Pr[2]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==0 & genmat[rownum,use2]==1 & genmat[prev,num]==1) {
if(ran<ortho_obs_used$Pr[5]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==1 & genmat[rownum,use2]==0 & genmat[prev,num]==0) {
if(ran<ortho_obs_used$Pr[3]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==1 & genmat[rownum,use2]==0 & genmat[prev,num]==1) {
if(ran<ortho_obs_used$Pr[7]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==1 & genmat[rownum,use2]==1 & genmat[prev,num]==0) {
if(ran<ortho_obs_used$Pr[6]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
if(genmat[rownum,use1]==1 & genmat[rownum,use2]==1 & genmat[prev,num]==1) {
if(ran<ortho_obs_used$Pr[8]){ genmat[rownum,num] = 1 }else{
genmat[rownum,num] = 0}
}
genmat
}
Then applying the foreach package in the original function
event.gen3 = function(genmat,use1,use2,num,ortho_obs_used){
rownum = 2:nrow(genmat)
test = foreach(r=iter(rownum,by='row')) %dopar% { genmat =
byrow.gen2(genmat,r,use1,use2,num,ortho_obs_used) }
rm(test)
genmat
}
The final results were exactly as I needed them to be in my initial post,
but the processing time dropped from 2 seconds per station to 0.05 seconds
per station.
Thanks to everyone for giving me the advice and the idea to try this!
Adrienne
On Thu, Aug 12, 2010 at 8:15 AM, Adrienne Wootten <[email protected]> wrote:
> Not quite what I was trying to say. The process generates a random uniform
> number between 0 and 1 and compares to a specific conditional probability.
> It is looking for this in particular:
>
> random number < Pr( rain(station=i,day=d)=1 | rain(station=i,day=d-1)=0 &
> rain(station=j,day=d)=0 & rain(station=k,day=d)=0)
>
> In this particular example, if the random number is less than the
> probability the value for station i and day d will be given as 1, otherwise
> it will be zero.
>
> There are 8 possible combinations. i is the station to be generated, j and
> k are the two stations most strongly correlated with station i. Stations j
> and k have already been generated in the example I gave previously. So I
> want to know given what is going on at stations j and k during day d and at
> station i for day d-1 if the value for station i day d will be 1 or 0.
>
> Hope this provides some clarification.
> A
>
>
> On Thu, Aug 12, 2010 at 3:21 AM, Petr PIKAL <[email protected]>wrote:
>
>> Hi
>>
>> without toy example it is rather complicated to check your function. So
>> only few remarks:
>>
>> Instead of generating 1 random number inside a loop generate whole vector
>> of random numbers outside a loop and choose a number
>>
>> Do not mix ifelse with if. ifelse is intended to work with whole vector.
>>
>> Work with matrices instead of data frames whenever possible if speed is an
>> issue.
>>
>> If I understand correctly you want to put 1 or 0 into one column based on:
>>
>> previous value in the same column
>> comparison of some random number with predefined probabilities in vector
>> of 6 values
>>
>> So here is vectorised version of your 4 ifs based on assumption
>>
>> 0 in col1 0 in col 2 = 5
>> 0 in col1 1 in col 2 = 9
>> 1 in col1 0 in col 2 = 6
>> 1 in col1 1 in col 2 =10
>>
>>
>> col1<-sample(1:2, 20, replace=T)
>> col2<-sample(c(4,8), 20, replace=T)
>>
>> col1+col2
>> [1] 5 6 9 6 6 5 9 10 9 9 6 9 10 6 10 9 10 9 5 5
>> cols<-as.numeric(as.factor(col1+col2))
>>
>> cols
>> [1] 1 2 3 2 2 1 3 4 3 3 2 3 4 2 4 3 4 3 1 1
>>
>>
>> And here is computed comparison of six values p (ortho obs used) with 20
>> generated random values
>>
>> ran<-runif(20)
>> p<-runif(8)
>> comparison <- outer(ran,p, "<")
>> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
>> [1,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [2,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [3,] FALSE TRUE FALSE TRUE FALSE TRUE TRUE FALSE
>> [4,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [5,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [6,] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
>> [7,] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
>> [8,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [9,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [10,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [11,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [12,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [13,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>> [14,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [15,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
>> [16,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [17,] FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE
>> [18,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [19,] FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE
>> [20,] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
>>
>>
>> Now the only what you need to put in loop is to select appropriate column
>> from matrix comparison based on value on cols vector and 0 or 1 in
>> previous row of station column.
>>
>> Something like (untested)
>>
>> gen.log<-rep(NA, nrow(genmat)-1)
>>
>> for (i in 2:nrow(genmat)) {
>>
>> gen.log[i] <- if( genmat[i-1, num] ==0) comparison[i, cols[i]] else
>> comparison[i,cols[i+5]]
>>
>> }
>>
>> genmat[2:nrow(genmat), num] <- gen.log*1
>>
>> Regards
>> Petr
>>
>>
>> [email protected] napsal dne 11.08.2010 18:35:37:
>>
>> > Hello Everyone!
>> >
>> > Here's what I'm trying to do. I'm working on generating occurrences of
>> > precipitation based upon precipitation occurrence for a station during
>> the
>> > previous day and two stations that have already been generated by joint
>> > probablities and 1st order Markov chains or by the same generation
>> process.
>> > This has to be done for each remaining stations for each month.
>> >
>> > > genmat # 7 stations in this example, line_before is the climatology of
>> the
>> > last day of the previous month. Stations 4 and 6 have been generated
>> already
>> > in this example
>> > [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>> > line_before 1 1 1 0 1 1 1
>> > NA NA NA 1 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 1 NA 0 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 0 NA 0 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 1 NA 1 NA
>> > NA NA NA 0 NA 0 NA
>> > > num # station to generate
>> > [1] 2
>> > > use1 # 1st station to use in generation
>> > [1] 6
>> > > use2 # 2nd station to use in generation
>> > [1] 4
>> >
>> > > genmat = event.gen2(genmat,use1,use2,num,ortho_obs_used) # Generation
>> > function shown below
>> > > genmat # genmat - after it has gone through station 2
>> > [,1] [,2] [,3] [,4] [,5] [,6] [,7]
>> > line_before 1 1 1 0 1 1 1
>> > NA 0 NA 1 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 1 NA 0 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 1 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 0 NA 1 NA 1 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 1 NA 0 NA 1 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 0 NA 0 NA 0 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 1 NA 1 NA 1 NA
>> > NA 0 NA 0 NA 0 NA
>> >
>> > Where event.gen2 is this function:
>> >
>> > event.gen2 = function(genmat,use1,use2,num,ortho_obs_used){
>> >
>> > for(r in 2:nrow(genmat)){
>> >
>> > ran = runif(1,0,1)
>> >
>> > if(genmat[r,use1]==0 & genmat[r,use2]==0){
>> >
>>
>> genmat[r,num]<-ifelse(genmat[r-1,num]==0,ifelse(ran<ortho_obs_used$Pr[1],1,
>> > 0),ifelse(ran<ortho_obs_used$Pr[4],1,0))
>> > }
>> >
>> > if(genmat[r,use1]==0 & genmat[r,use2]==1){
>> >
>>
>> genmat[r,num]<-ifelse(genmat[r-1,num]==0,ifelse(ran<ortho_obs_used$Pr[2],1,
>> > 0),ifelse(ran<ortho_obs_used$Pr[5],1,0))
>> > }
>> >
>> > if(genmat[r,use1]==1 & genmat[r,use2]==0){
>> >
>>
>> genmat[r,num]<-ifelse(genmat[r-1,num]==0,ifelse(ran<ortho_obs_used$Pr[3],1,
>> > 0),ifelse(ran<ortho_obs_used$Pr[7],1,0))
>> > }
>> >
>> > if(genmat[r,use1]==1 & genmat[r,use2]==1){
>> >
>>
>> genmat[r,num]<-ifelse(genmat[r-1,num]==0,ifelse(ran<ortho_obs_used$Pr[6],1,
>> > 0),ifelse(ran<ortho_obs_used$Pr[8],1,0))
>> > }
>> >
>> > gc()
>> > }
>> >
>> > genmat
>> >
>> > }
>> >
>> > ####
>> >
>> > ortho_obs_used is a data frame that contains the probablity of
>> precipitation
>> > occurring on a given day for a specific set of condtions.
>> > For instance ortho_obs_used$Pr[1] is the probablity of rain at station s
>> for
>> > day d, given that there was no rain at station s for day d-1 and no rain
>> at
>> > either of the other two stations for day d.
>> >
>> > The event.gen2 function handles the generation, and it runs quickly for
>> the
>> > 5 remaining stations and one month, but I have to run this for 317
>> stations
>> > over 48 months or more, and it becomes a really bad bottleneck. So what
>> I'd
>> > like to know is if there is anyway that I can re-write this function to
>> work
>> > without a loop. I couldn't find anything from previous posts about
>> getting
>> > out of loops where the previous iteration is required to determine the
>> next
>> > calculation.
>> >
>> > Sorry for the length of the post, but I thought it best to try to
>> explain
>> > what I was doing first, before diving into my question
>> >
>> > Thanks in advance!
>> >
>> > Adrienne Wootten
>> > Graduate Research Assistant/Environmental Meteorologist
>> > M.S. Atmospheric Science
>> > NC State University
>> > State Climate Office of North Carolina
>> > Raleigh, NC 27695
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > [email protected] mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.