Hello,

You probably want to disable this transformation when the number of iterations is predicted to be small, right?
Shouldn't the dot-product transform be predicated on -fassociative-math?

Do you have a vision of a generalized pattern matcher that would allow adding other routines easily?

I'm curious what the gap is between GCC's vectorizer output and fine-tuned BLAS libraries.[*] Or is the intention here to enable the use of accelerated BLAS on HSA-like architectures? Or to use BLAS where the vectorizer cannot possibly match it (matmult, though that is not easy to pattern-match in the first place; or non-trivial strides, though what can a BLAS library do in that case)?

[*] The gap is definitely huge on something like ia64 (IIRC vectorization is not important there, but you need to unroll and schedule carefully), but I presume you're mostly interested in x86-64.

GCC currently has a feature somewhat similar in spirit for the vectorizer: -mveclibabi. Is it known how it is used in practice?

Thanks.
Alexander