`Research' involves looking at all the competitor methods, devising a near-optimal strategy and selecting amongst methods according to that strategy. It is not a quick fix we are looking for but something that will be good for the long term.
On Thu, 23 Jun 2005, Bo Peng wrote: >> We suggest that you take up your own suggestion, research this area and >> offer the R project the results of your research for consideration as your >> contribution. > > I implemented Walker's alias method and re-compiled R. Here is what > I did: > > 1. replace function ProcSampleReplace in R-2.1.0/src/main/random.c > with the following one > > static void ProbSampleReplace(int n, double *p, int *perm, int nans, int *ans) > { > /* allocate memory for a, p and HL */ > double * q = Calloc(n, double); > int * a = Calloc(n, int); > int * HL = Calloc(n, int); > int * H = HL; > int * L = HL+n-1; > int i, j, k; > double rU; /* U[0,1)*n */ > > /* set up alias table */ > /* initialize q with n*p0,...n*p_n-1 */ > for(i=0; i<n; ++i) > q[i] = p[i]*n; > > /* initialize a with indices */ > for(i=0; i<n; ++i) > a[i] = i; > > /* set up H and L */ > for(i=0; i<n; ++i) { > if( q[i] >= 1.) > *H++ = i; > else > *L-- = i; > } > > while( H != HL && L != HL+n-1) { > j = *(L+1); > k = *(H-1); > a[j] = k; > q[k] += q[j] - 1; > L++; /* remove j from L */ > if( q[k] < 1. ) { > *L-- = k; /* add k to L */ > --H; /* remove k */ > } > } > > /* generate sample */ > for (i = 0; i < nans; ++i) { > rU = unif_rand() * n; > > k = (int)(rU); > rU -= k; /* rU becomes rU-[rU] */ > > if( rU < q[k] ) > ans[i] = k+1; > else > ans[i] = a[k]+1; > } > Free(HL); > Free(a); > Free(q); > } > > 2. make and make install > > 3. test the new sample function by code like > >> b=sample(seq(1,100), prob=seq(1,100), replace=TRUE, size=1000000) >> table(b)/1000000*sum(seq(1,100)) > > 4. run the following code in current R 2.1.0 and updated R. > > for(prob in seq(1,4)){ > for(sample in seq(1,4)){ > x = seq(1:(10^prob)) # short to long x > p = abs(rnorm(length(x))) # prob vector > times = 10^(6-prob) # run shorter cases more times > Rprof(paste("sample_", prob, "_", sample, ".prof", sep='')) > for(t in seq(1,times)){ > sample(x, prob=p, size=10^sample, replace=TRUE ) > } > Rprof(NULL) > } > } > > Basically, I tried to test the performance of sample(replace=TRUE, prob=..) > with different length of x and size. > > 5. process the profiles and here is the result. > p: length of prob and x > size: size of sample > cell: execution time of old/updated sample() > > size\p 10 10^2 10^3 10^4 > 10 2.4/1.6 0.32/0.22 0.20/0.08 0.24/0.06 > 10^2 3.1/2.6 0.48/0.28 0.28/0.06 0.30/0.06 > 10^3 11.8/11.1 1.84/1.14 0.94/0.18 0.96/0.08 > 10^4 96.8/96.6 15.34/9.68 7.54/1.06 7.48/0.16 > run: 10000 1000 100 10 times > > We can see that the alias method is quicker than the linear search > method in all cases. The performance difference is greatest (>50 times) > when the original algorithm need to search in a long prob vector. > > I have not thoroughly tested the new function. I will do so if you > (the developers) think that this has the potential to be incorporated > into R. > > Thanks. > > Bo Peng > Department of Statistics > Rice University > > -- Brian D. Ripley, [EMAIL PROTECTED] Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel