[Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

2009-07-24 Thread jaffe at broadinstitute dot org


--- Comment #2 from jaffe at broadinstitute dot org  2009-07-24 20:43 
---
Subject: Re:  parallel sort run time increases ~10 fold
 when vector size gets over ~4*10^9

If instead of sorting a vec, one sorts a vec, there is still a
ten-fold
slowdown, as one increases the vector size from 4 to 5 billion.  So it's not
the total
amount of memory that matters, but rather the number of entries in the vector. 
I don't
think this is about cache effects.

Best,

David



rguenth at gcc dot gnu dot org wrote:
> --- Comment #1 from rguenth at gcc dot gnu dot org  2009-07-24 20:29 
> ---
> I suppose you are running into cache effects.  Why do you think this is a GCC
> bug?
> 
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



[Bug libstdc++/40852] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

2009-07-24 Thread jaffe at broadinstitute dot org


--- Comment #4 from jaffe at broadinstitute dot org  2009-07-24 21:20 
---
Subject: Re:  parallel sort run time increases ~10 fold
 when vector size gets over ~4*10^9

Oh crap, yes I did, and now I see that I accidentally left off the first three
lines of sort_test.cc.
They are:

#define _GLIBCXX_PARALLEL
#include 
#include 

David

===

paolo dot carlini at oracle dot com wrote:
> --- Comment #3 from paolo dot carlini at oracle dot com  2009-07-24 21:15 
> ---
> Out of curiosity, did you try parallel-mode on that machine? Basically, just
> add -D_GLIBCXX_PARALLEL, but refer to the documentation of course:
> 
> http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html#manual.ext.parallel_mode.intro
> 
> I'm also adding Johannes, in CC...
> 
> Note, I don't think we have any specific issue with the normal, serial,
> std::sort...
> 
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

2009-10-20 Thread jaffe at broadinstitute dot org


--- Comment #8 from jaffe at broadinstitute dot org  2009-10-20 10:55 
---
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

Regarding comment #7, I just ran this now on a machine with 32 processors and
512 GB memory.

(a) Sorting 4 x 10^9 ints took 0.9 minutes.
(b) Sorting 5 x 10^9 ints took 16 minutes.

The second test used about 40 GB, which is a small fraction of the available
memory.

(c) Sorting 2.5 x 10^9 structures having 2 ints each took 1.1 minutes.

Regarding comment #6, repeating (a) and (b) with
__gnu_parallel::balanced_quicksort_tag( ):

(a') 6.3 minutes
(b') 8.1 minutes,

so the algorithm is slower on these data but does not exhibit the same jump in
runtime.
I also tried __gnu_parallel::quicksort_tag( ) which was about the same for (b)
[(a) not tested].


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

2009-10-22 Thread jaffe at broadinstitute dot org


--- Comment #15 from jaffe at broadinstitute dot org  2009-10-22 10:22 
---
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

Wonderful!  Thank you very much for fixing this problem.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852



[Bug libstdc++/40852] [parallel-mode] parallel sort run time increases ~10 fold when vector size gets over ~4*10^9

2009-10-27 Thread jaffe at broadinstitute dot org


--- Comment #21 from jaffe at broadinstitute dot org  2009-10-27 09:45 
---
Subject: Re:  [parallel-mode] parallel sort run time
 increases ~10 fold when vector size gets over ~4*10^9

I tested the patch from comment #19, sorting X billion integers on a machine
having
32 processors and 256 GB memory, X = 4, 6, ..., 26.  The overall behavior is
very
close to linear.  For example, X = 4 took 1.02 minutes, whereas X = 20 took
5.22
minutes.  Very nice!


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40852