[Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
--- Comment #2 from jaffe at broadinstitute dot org 2009-07-24 20:43 --- Subject: Re: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 If instead of sorting a vec, one sorts a vec, there is still a ten-fold slowdown, as one increases the vector size from 4 to 5 billion. So it's not the total amount of memory that matters, but rather the number of entries in the vector. I don't think this is about cache effects. Best, David rguenth at gcc dot gnu dot org wrote: > --- Comment #1 from rguenth at gcc dot gnu dot org 2009-07-24 20:29 > --- > I suppose you are running into cache effects. Why do you think this is a GCC > bug? > > -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852
[Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
--- Comment #4 from jaffe at broadinstitute dot org 2009-07-24 21:20 --- Subject: Re: parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 Oh crap, yes I did, and now I see that I accidentally left off the first three lines of sort_test.cc. They are: #define _GLIBCXX_PARALLEL #include #include David === paolo dot carlini at oracle dot com wrote: > --- Comment #3 from paolo dot carlini at oracle dot com 2009-07-24 21:15 > --- > Out of curiosity, did you try parallel-mode on that machine? Basically, just > add -D_GLIBCXX_PARALLEL, but refer to the documentation of course: > > http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html#manual.ext.parallel_mode.intro > > I'm also adding Johannes, in CC... > > Note, I don't think we have any specific issue with the normal, serial, > std::sort... > > -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852
[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
--- Comment #8 from jaffe at broadinstitute dot org 2009-10-20 10:55 --- Subject: Re: [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 Regarding comment #7, I just ran this now on a machine with 32 processors and 512 GB memory. (a) Sorting 4 x 10^9 ints took 0.9 minutes. (b) Sorting 5 x 10^9 ints took 16 minutes. The second test used about 40 GB, which is a small fraction of the available memory. (c) Sorting 2.5 x 10^9 structures having 2 ints each took 1.1 minutes. Regarding comment #6, repeating (a) and (b) with __gnu_parallel::balanced_quicksort_tag( ): (a') 6.3 minutes (b') 8.1 minutes, so the algorithm is slower on these data but does not exhibit the same jump in runtime. I also tried __gnu_parallel::quicksort_tag( ) which was about the same for (b) [(a) not tested]. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852
[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
--- Comment #15 from jaffe at broadinstitute dot org 2009-10-22 10:22 --- Subject: Re: [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 Wonderful! Thank you very much for fixing this problem. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852
[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9
--- Comment #21 from jaffe at broadinstitute dot org 2009-10-27 09:45 --- Subject: Re: [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9 I tested the patch from comment #19, sorting X billion integers on a machine having 32 processors and 256 GB memory, X = 4, 6, ..., 26. The overall behavior is very close to linear. For example, X = 4 took 1.02 minutes, whereas X = 20 took 5.22 minutes. Very nice! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852