Hello,

You probably want to disable this transformation when the number of iterations
is predicted to be small, right?
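
To make the suggestion concrete, this is roughly the shape of runtime
versioning I have in mind -- the threshold and the cblas_ddot interface are
just placeholders, not a claim about what the patch would actually emit:

#include <stddef.h>

/* Reference-BLAS-style prototype, for illustration only.  */
extern double cblas_ddot (int n, const double *x, int incx,
                          const double *y, int incy);

double
dot (const double *x, const double *y, size_t n)
{
  /* Hypothetical cutoff: for small trip counts the call overhead
     presumably outweighs any benefit from the library kernel.  */
  if (n < 64)
    {
      double s = 0.0;
      for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
      return s;
    }
  return cblas_ddot ((int) n, x, 1, y, 1);
}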

Shouldn't the dot product transform be predicated on -fassociative-math?
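
To spell out why I am asking: a library ddot is free to accumulate in a
different order than the source loop (partial sums, blocking, FMA), and that
can change the rounded result.  A toy example, with values picked to make the
difference visible:

#include <stdio.h>

int
main (void)
{
  double x[3] = { 1e16, 1.0, -1e16 };
  double y[3] = { 1.0, 1.0, 1.0 };

  /* Source order: (1e16 + 1.0) + -1e16 == 0.0, because 1e16 + 1.0
     rounds back to 1e16 in double precision.  */
  double in_order = (x[0] * y[0] + x[1] * y[1]) + x[2] * y[2];

  /* A reassociated order, as a blocked/vectorized dot product might
     use: (1e16 + -1e16) + 1.0 == 1.0.  */
  double reassoc = (x[0] * y[0] + x[2] * y[2]) + x[1] * y[1];

  printf ("%g vs %g\n", in_order, reassoc);
  return 0;
}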

Do you have a vision for a generalized pattern matcher that would allow adding
other routines easily?

I'm curious what the gap is between GCC's vectorizer output and fine-tuned BLAS
libraries. [*] Or is the intention here to enable use of accelerated BLAS on
HSA-like architectures?  Or to use BLAS when the vectorizer can't possibly
match it (matmult -- but then again it's not easy to pattern-match in the
first place; or non-trivial strides -- but what can a BLAS lib do in that
case)?

  [*] The gap is definitely huge on something like ia64 (IIRC vectorization 
  is not important there, but you need to unroll and schedule carefully), but
  I presume you're mostly interested in x86-64.

GCC currently has a feature for the vectorizer that is somewhat similar in
spirit -- -mveclibabi.  Is it known how it is used in practice?
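
For reference, my understanding of how it is driven today (SVML flavor shown;
-ftree-vectorize and -funsafe-math-optimizations are required, and an
ABI-compatible library has to be supplied at link time):

  gcc -O2 -ftree-vectorize -funsafe-math-optimizations \
      -mveclibabi=svml sinloop.c

so that a loop like

#include <math.h>

void
vsin (float *restrict out, const float *restrict in, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = sinf (in[i]);
}

gets its sinf calls replaced with the library's vector entry points.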

Thanks.

Alexander
