Hi all,

Figured I'd put my two cents in here, as the Welch Lab's LIGER package currently uses the Mann-Whitney test on datasets much larger than m = 200. Our current version uses a modified PRESTO (https://github.com/immunogenomics/presto) implementation instead of the built-in tests, because the built-in tests do not scale. I stumbled onto this thread while working on some improvements for it and would like to make it known that there is absolutely an audience for the large-sample use case.
Best,
-Andrew Robbins

On 1/17/2024 5:55 AM, Andreas Löffler wrote:
Performance statistics are interesting. If we assume the two populations have a total of `m` members, then this implementation runs slightly slower for m < 20 and much slower for 50 < m < 100. However, it runs significantly *faster* for m > 200. The breakpoint is precisely when each population has a size of 50: `qwilcox(0.5, 50, 50)` runs in 8 microseconds in the current version, but `qwilcox(0.5, 50, 51)` runs in 5 milliseconds. The new version runs in roughly 1 millisecond for both. This is probably because of internal logic that requires many more `free`/`calloc` calls if either population is larger than `WILCOX_MAX`, which is set to 50. The other factor is that `cwilcox_sigma` has to be evaluated, which is slightly more demanding since it uses `k % d`.

There is a trade-off here between memory usage and execution time. I am not a heavy user of the U test, but I think the typical use case does not involve several hundred tests in a session, so execution time is less important (my 2 cents). But if R crashes, even a single execution is already a problem.

The takeaway is probably that we should implement both approaches in the code and leave it to the user which one she prefers. If time is important, memory is not an issue, and m and n are small, go for the "traditional" approach. Otherwise, use my formula?

PS (@Aidan): I applied for a Bugzilla account two days ago and have not heard back. My spam folder is also empty. Is that OK, or should I do something?

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
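To make the memory-versus-time trade-off concrete, here is a minimal Python sketch (not R's C code and not the proposed patch) of two ways to get the exact null frequencies c(k; m, n) of the U statistic, where m and n are the two sample sizes (not the combined total used above). The first uses the classic recurrence c(k; m, n) = c(k - n; m - 1, n) + c(k; m, n - 1), which, as I understand it, is what R's `cwilcox` memoizes. The second is a divisor-sum recurrence derived from the generating function prod_{i=1}^{m} (1 - q^(n+i)) / (1 - q^i). My assumption is that this is the kind of formula `cwilcox_sigma` evaluates, since `k % d` is exactly a divisor check. The names `counts_classic`, `sigma`, `counts_sigma` and `quantile` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


def counts_classic(m, n):
    """Frequencies c(k; m, n), k = 0..m*n, via the classic recurrence.
    The memo table can grow to the order of m^2 * n^2 entries,
    which is the memory side of the trade-off."""
    @lru_cache(maxsize=None)
    def c(k, i, j):
        if k < 0 or k > i * j:
            return 0
        if i == 0 or j == 0:
            return 1  # only k == 0 survives the range check here
        # The largest observation is either from sample 1 (beats all j
        # of sample 2, adding j to U) or from sample 2 (adds nothing).
        return c(k - j, i - 1, j) + c(k, i, j - 1)
    return [c(k, m, n) for k in range(m * n + 1)]


def sigma(k, m, n):
    """Coefficient of q^k in q * d/dq log G(q): divisors d of k with
    d <= m count +d, divisors with n < d <= n + m count -d."""
    s = 0
    for d in range(1, min(k, m + n) + 1):
        if k % d == 0:
            if d <= m:
                s += d
            if n < d <= n + m:
                s -= d
    return s


def counts_sigma(m, n):
    """Same frequencies via k * c_k = sum_{i=1}^{k} sigma(i) * c_{k-i},
    keeping a single array of length m*n + 1 (more arithmetic, less memory)."""
    top = m * n
    sig = [0] + [sigma(i, m, n) for i in range(1, top + 1)]
    c = [1] + [0] * top
    for k in range(1, top + 1):
        total = sum(sig[i] * c[k - i] for i in range(1, k + 1))
        c[k] = total // k  # exact: total is always a multiple of k
    return c


def quantile(p, m, n):
    """Smallest q with P(U <= q) >= p, computed exactly. A simplified
    stand-in for qwilcox(p, m, n); R's own version adds a small
    floating-point tolerance that is omitted here."""
    counts = counts_sigma(m, n)
    total = comb(m + n, m)
    cumulative = 0
    for q, c in enumerate(counts):
        cumulative += c
        if Fraction(cumulative, total) >= p:
            return q
    return m * n
```

Both functions return identical frequencies (for example, `counts_sigma(2, 2)` gives `[1, 1, 2, 1, 1]`). The sigma version trades the large memo table for an extra divisor loop, which mirrors the "implement both and let the user choose" suggestion above.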
--
Andrew Robbins
Systems Analyst, Welch Lab <https://welch-lab.github.io>
University of Michigan
Department of Computational Medicine and Bioinformatics