------- Comment #6 from fang at csl dot cornell dot edu 2007-03-19 18:51 -------
Subject: Re: std::valarray should be annotated with OpenMP directives
> "bangerth at dealii dot org" <[EMAIL PROTECTED]> writes:
>
> | (In reply to comment #3)
> | > I suspect that parallelizing for SSE/Altivec might be more beneficial
> | > in most cases than for OpenMP -- OpenMP is a 1,000 pounds gorilla.
> |
> | I certainly agree. The beauty is that one may have both: SSE/Altivec/... if
> | the template argument of std::valarray is float/double/int (in which case one
> | would have to have explicit specializations of the member functions), and
> | OpenMP if it is anything else.
>
> on my single node AMD64 machine, I would prefer the compiler to
> generate codes that takes advantage of SSE than launch OpenMP. On
> the other hand, if I had multiple nodes, I might be contemplating
> OpenMP for some of the valarray<double>s, so I'm not sure the issue is
> that simply cut...

Thinking out loud...

Is there any interest/effort in placing vectorizable operations somewhere
outside of valarray, so that other STL algorithms/containers might be able
to leverage them? For example, I'd like to be able to use tr1/array on
basic numeric types and have the benefits of valarray operations without
first having to copy to a valarray, which uses heap-allocated memory.

I'm imagining something like vectorize_traits that would combine the
vectorizability of the operation (e.g. std::plus) with the vectorizability
of the value_type (e.g. _Integral). A subset of algorithms (<numeric>,
among others) would then have an additional level of template-wrapping to
dispatch to the appropriate __algorithm() based on vectorize_traits and
iterator_traits.

One issue, however, might be assumptions about the aliasing of input/output
iterators. We're aware that many optimizations rely on non-aliasing
assumptions, whereas the standard algorithms make no such assumptions
(valarray's operations being the exception). A run-time overlap check on
random_access_iterators would incur a slight penalty.
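
To make the dispatch idea above a bit more concrete, here is a rough sketch.
None of it exists in libstdc++: vectorize_traits is only the name floated
above, and transform_impl, ranges_overlap, vectorizing_transform and the path
enum are hypothetical names made up for illustration. Raw pointers stand in
for "contiguous random-access iterators", and the "vectorizable" path is just
a plain indexed loop over non-overlapping arrays that an auto-vectorizer could
pick up (a real implementation might use SSE/Altivec intrinsics instead).

```cpp
#include <cstddef>
#include <functional>
#include <iterator>
#include <type_traits>

// Hypothetical trait: is applying Op to values of type T a candidate for a
// vectorized loop? Defaults to "no"; (operation, value type) pairs opt in.
template <typename Op, typename T>
struct vectorize_traits : std::false_type {};

// Assumption for this sketch: std::plus on arithmetic types is vectorizable.
template <typename T>
struct vectorize_traits<std::plus<T>, T>
    : std::integral_constant<bool, std::is_arithmetic<T>::value> {};

// Returned only so that the chosen code path is observable.
enum class path { generic, vectorizable };

// Generic path: ordinary element-wise loop, no aliasing assumptions.
template <typename In1, typename In2, typename Out, typename Op>
path transform_impl(In1 first1, In1 last1, In2 first2, Out out, Op op,
                    std::false_type) {
    for (; first1 != last1; ++first1, ++first2, ++out)
        *out = op(*first1, *first2);
    return path::generic;
}

// True if [a, a+n) and [b, b+n) share at least one element.
template <typename T>
bool ranges_overlap(const T* a, const T* b, std::size_t n) {
    std::less<const T*> lt;
    return lt(a, b + n) && lt(b, a + n);
}

// Candidate-for-vectorization path, taken only for raw pointers.
template <typename T, typename Op>
path transform_impl(const T* first1, const T* last1, const T* first2, T* out,
                    Op op, std::true_type) {
    const std::size_t n = static_cast<std::size_t>(last1 - first1);
    // Run-time overlap check: if the output aliases either input, fall back
    // to the generic loop, which keeps the usual in-order semantics.
    if (ranges_overlap(out, first1, n) || ranges_overlap(out, first2, n))
        return transform_impl(first1, last1, first2, out, op,
                              std::false_type());
    for (std::size_t i = 0; i < n; ++i)
        out[i] = op(first1[i], first2[i]);
    return path::vectorizable;
}

// Extra level of template-wrapping: choose a path at compile time from
// vectorize_traits plus what iterator_traits says about the iterators.
template <typename In1, typename In2, typename Out, typename Op>
path vectorizing_transform(In1 first1, In1 last1, In2 first2, Out out, Op op) {
    typedef typename std::iterator_traits<In1>::value_type value_type;
    typedef typename std::iterator_traits<In2>::value_type value_type2;
    typedef typename std::iterator_traits<Out>::value_type out_type;
    typedef std::integral_constant<bool,
        vectorize_traits<Op, value_type>::value &&
        std::is_pointer<In1>::value && std::is_pointer<In2>::value &&
        std::is_pointer<Out>::value &&
        std::is_same<value_type, value_type2>::value &&
        std::is_same<value_type, out_type>::value> tag;
    return transform_impl(first1, last1, first2, out, op, tag());
}
```

With three distinct int arrays and std::plus<int>, the call takes the
vectorizable path. An output range that overlaps an input, or an operation
that has not opted in through vectorize_traits (std::minus<int> here), falls
back to the generic loop. The overlap test costs a few pointer comparisons per
call, which is the "slight penalty" mentioned above.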

But yes, having STL take advantage of low-level acceleration through
abstraction and compile-time polymorphism is a good thing, IMHO.

Fang

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31000