"Singh, Ravi Kumar (Ravi)" <ravi.si...@lsi.com> wrote:

> None of the generated code contains the NEON instructions. Code
> generated with case 1 is taking 3000 cycles, and code generated by
> option 2 is taking 2500 cycles.
>
> Even if vectorization failed in case1, it should not generate more
> inefficient code than case 2. My belief was that the executables
> from both would take same cycles, any thing done for doing
> unsuccessful vectorization must be reverted if it did not succeed.
I suspect the reason vectorization fails is the direct reference to the
loop counter in the inner loop:

    index = j;

After vectorization, the loop counter is no longer available, so code
that accesses it as in your example usually cannot be automatically
vectorized.

As to why -ftree-vectorize still generates different code, that is
probably because the flag actually enables two other optimizations
that are distinct from the vectorizer, but usually enable it to do
a better job: if-conversion and store-sinking.

I suspect that in your test case, if-conversion actually transforms the
"if" in the inner loop. However, if the result is then still not
vectorizable, that transformation might happen to be a net loss ...

You can switch off those extra transformations while still enabling
vectorization using something like:

    -ftree-vectorize -fno-tree-if-conversion --param max-stores-to-sink=0

(Note that this might cause some loops that would otherwise have been
vectorized to become non-vectorizable ...)

Best regards,
Ulrich Weigand

--
Dr. Ulrich Weigand
STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
IBM Deutschland Research & Development GmbH
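For readers without the original test case at hand, here is a minimal sketch of the kind of loop being discussed. It is only a guess at the shape of the code: the function name row_argmax, the flat row-major array layout, and the parameter names are all assumptions made for illustration. The only detail taken from the thread is the assignment index = j inside a conditional in the inner loop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical reconstruction, not the original poster's code.
 * For each row of a row-major rows x cols matrix, record the column
 * of the first largest element.
 */
void row_argmax(const int *a, size_t rows, size_t cols, size_t *out)
{
    assert(cols > 0);   /* every row needs at least one element */

    for (size_t i = 0; i < rows; i++) {
        int max = a[i * cols];
        size_t index = 0;

        for (size_t j = 1; j < cols; j++) {
            /* The "if" that if-conversion may rewrite into selects. */
            if (a[i * cols + j] > max) {
                max = a[i * cols + j];
                index = j;   /* direct reference to the loop counter */
            }
        }
        out[i] = index;
    }
}
```

Building a loop like this once with plain -ftree-vectorize and once with the extra flags quoted above, then comparing the generated assembly, is one way to check whether if-conversion or store-sinking accounts for the difference in cycle counts.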