------- Comment #8 from jaffe at broadinstitute dot org 2009-10-20 10:55 -------
Subject: Re: [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

Regarding comment #7, I just ran this now on a machine with 32 processors and 512 GB memory.

(a) Sorting 4 x 10^9 ints took 0.9 minutes.
(b) Sorting 5 x 10^9 ints took 16 minutes. This test used about 40 GB, which is a small fraction of the available memory.
(c) Sorting 2.5 x 10^9 structures having 2 ints each took 1.1 minutes.

Regarding comment #6, repeating (a) and (b) with __gnu_parallel::balanced_quicksort_tag():

(a') 6.3 minutes
(b') 8.1 minutes

so the algorithm is slower on these data but does not exhibit the same jump in runtime. I also tried __gnu_parallel::quicksort_tag(), which was about the same for (b) [(a) not tested].

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852
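
For readers who want to repeat the tagged runs mentioned in the comment, here is a minimal sketch of how a sort algorithm can be selected by tag in libstdc++ parallel mode. It is not the test program from this report: the helper names sort_balanced and make_descending and the HAVE_GNU_PARALLEL_SORT macro are made up for illustration, and the input sizes are left to the caller. The sketch assumes GCC with -fopenmp; when OpenMP or the <parallel/algorithm> header is unavailable it falls back to std::sort, so it still compiles elsewhere.

```cpp
// Build with OpenMP to get the parallel path, e.g.:
//   g++ -O2 -fopenmp example.cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Pull in the libstdc++ parallel-mode header only when OpenMP is enabled and
// the header is actually present; otherwise the code below uses std::sort.
#if defined(_OPENMP) && defined(__has_include)
#  if __has_include(<parallel/algorithm>)
#    include <parallel/algorithm>
#    define HAVE_GNU_PARALLEL_SORT 1  // hypothetical macro, for this sketch only
#  endif
#endif

// Hypothetical helper (not from the bug report): sort v in place, requesting
// the balanced quicksort variant when parallel mode is available.
inline void sort_balanced(std::vector<int>& v)
{
#ifdef HAVE_GNU_PARALLEL_SORT
    __gnu_parallel::sort(v.begin(), v.end(),
                         __gnu_parallel::balanced_quicksort_tag());
#else
    std::sort(v.begin(), v.end());  // sequential fallback
#endif
    // Postcondition check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(v.begin(), v.end()));
}

// Hypothetical helper: build the sequence n, n-1, ..., 1 as test input.
inline std::vector<int> make_descending(std::size_t n)
{
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = static_cast<int>(n - i);
    return v;
}
```

Replacing the tag with __gnu_parallel::quicksort_tag() or __gnu_parallel::multiway_mergesort_tag() should select the other variants in the same way, which makes it easy to time them against each other on identical input.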