I didn't describe the problem clearly. My question is about the number of
distinct values, so please ignore the cycle issue.
My tests were:
RNGkind(kind = "Knuth-TAOCP")
sum(duplicated(runif(1e7)))  # returns 46552
RNGkind(kind = "Knuth-TAOCP-2002")
sum(duplicated(runif(1e7)))  # returns 46415
# By the birthday problem, these collision counts suggest about 2^30
# distinct values.
RNGkind(kind = "Marsaglia-Multicarry")
sum(duplicated(runif(1e7)))  # returns 11682
RNGkind(kind = "Super-Duper")
sum(duplicated(runif(1e7)))  # returns 11542
RNGkind(kind = "Mersenne-Twister")
sum(duplicated(runif(1e7)))  # returns 11656
# These counts indicate about 2^32 distinct values, which agrees with the
# help page.
RNGkind(kind = "Wichmann-Hill")
sum(duplicated(runif(1e7)))  # returns 0
# So for this method, there should be more than 2^32 distinct values.
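The birthday-problem reasoning in the comments above can be checked numerically. What follows is only a minimal sketch, assuming the draws are independent and uniform over N equally likely values; the function name expected_duplicates is hypothetical and not part of R.

```r
# Expected number of duplicated draws when n values are drawn independently
# and uniformly from N equally likely distinct values:
#   E[duplicates] = n - N * (1 - (1 - 1/N)^n)
# expected_duplicates is a hypothetical helper name, not an R function.
expected_duplicates <- function(n, N) {
  # log1p() and expm1() keep precision when 1/N is tiny
  n + N * expm1(n * log1p(-1 / N))
}

expected_duplicates(1e7, 2^30)  # roughly 4.6e4
expected_duplicates(1e7, 2^32)  # roughly 1.2e4

# Going the other way: for n much smaller than N, duplicates are about
# n^2 / (2 * N), so an observed count d suggests N of about n^2 / (2 * d).
log2((1e7)^2 / (2 * 46552))     # close to 30
log2((1e7)^2 / (2 * 11682))     # close to 32
```

Under these assumptions the observed counts sit within ordinary sampling variation of the 2^30 and 2^32 estimates.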
You may not get exactly these numbers, but they should be close. So how can
these results be explained?
I need to generate a large sample without any ties, and it seems to me that
"Wichmann-Hill" is the only choice right now.
========================================
Shengqiao Li
The Department of Statistics
PO Box 6330
West Virginia University
Morgantown, WV 26506-6330
========================================
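Regarding the need stated above for a large sample without ties, one possible workaround (not something proposed in this thread) is to combine two draws into a single value with 53-bit resolution. This is a rough sketch only: runif53 is a hypothetical name, R does not provide such a function, and the sketch assumes each runif() value carries at least 27 usable random bits.

```r
# Combine two runif() draws into one value on a 2^-53 grid in [0, 1).
# runif53 is a hypothetical helper, not part of R. Assumes each runif()
# value has at least 27 usable random bits.
runif53 <- function(n) {
  hi <- floor(runif(n) * 2^26)  # integer in 0 .. 2^26 - 1
  lo <- floor(runif(n) * 2^27)  # integer in 0 .. 2^27 - 1
  # hi * 2^27 + lo is an integer below 2^53, so it is exact in a double
  (hi * 2^27 + lo) / 2^53
}

x <- runif53(1e6)
anyDuplicated(x)  # 0 means no ties were found in this sample
```

Ties are still possible with this approach, just far less likely, and unlike runif() it can return exactly 0, so checking with anyDuplicated() afterwards remains worthwhile.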
On Thu, 14 Aug 2008, Peter Dalgaard wrote:
Shengqiao Li wrote:
Hello all,
I am generating large samples of random numbers. The RNG help page says:
"All the supplied uniform generators return 32-bit integer values that are
converted to doubles, so they take at most 2^32 distinct values and long
runs will return duplicated values." But I find that the cycle lengths do not
all match the 32-bit range.
My test indicated that the cycle length for Knuth's methods was 2^30, while
Wichmann-Hill's cycle was longer than 2^32! No numbers were duplicated in
10M numbers generated by runif using Wichmann-Hill. The other three methods
had a cycle length of 2^32.
So, can anybody explain this? And can the implementation be improved to
increase the cycle length, as in the Wichmann-Hill method?
What test? These are not simple linear congruential generators. Just because
you get the same value twice, it doesn't mean that the sequence is repeating.
Perhaps you should read the entire help page rather than just the note.
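The point that a repeated value does not imply a repeating sequence can be seen directly. A small sketch, assuming the Mersenne-Twister generator and an arbitrary seed chosen only for reproducibility:

```r
# Arbitrary seed; this call also sets the session's generator to
# Mersenne-Twister (R's default).
set.seed(42, kind = "Mersenne-Twister")
x <- runif(1e6)

j <- which(duplicated(x))[1]  # position of the first repeated value
i <- match(x[j], x)           # position of its earlier occurrence

x[i] == x[j]                  # TRUE: the same value appears twice

# If the generator were cycling, the values after position j would repeat
# the values after position i. Compare the next few values of each stretch:
k <- seq_len(min(5, length(x) - j))
rbind(after_first = x[i + k], after_repeat = x[j + k])
```

The two rows differ because the generator's internal state is much larger than the 32 bits it outputs per draw, so the same output can come from different states.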
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - ([EMAIL PROTECTED]) FAX: (+45) 35327907
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.