> Even if you fix that, gcc will only vectorize if you pass the
> -ftree-vectorize option.  And it will only vectorize code in loops.

Supporting straight-line code vectorization is in the works, but at first
we'll look for such opportunities only in loops (i.e. exploit
vector-parallelism within an iteration rather than only across
iterations). So we'll be able to vectorize unrolled loops and such, but
not the code example in question, if it's not enclosed in any loop.

> And it unfortunately doesn't do a good job of using movups, so it will
> mess around with checking the alignment.

Yes, the way we handle unaligned stores is to peel the loop to make that
access aligned. We do use movups for the load though. This is just a
random restriction that can easily be fixed - I have an old patch for
misaligned stores sitting around - I'll just go ahead and send it.

> And there isn't a good way
> to specify alignment.
>
> I do see use of the vector instructions for this example
>
>
> float *vector_add4f(float * __restrict va, float * __restrict vb)
> {
>   int i;
>
>   for (i = 0; i < 4; ++i)
>     va[i] += vb[i];
>   return va;
> }
>
> if I compile with -O2 -ftree-vectorize.  Frankly the generated code is
> really awful, and I wouldn't be surprised if it runs more slowly than
> the non-vectorized code.

If the va access is unaligned, the vectorized code will definitely be
slower than the original code, because the vector code will not be
executed at all - just the peel-loop before the vector-loop (which tries
to align the store) and the peel-loop after the vector-loop (for the
remaining iterations). So we will just have executed extra ifs/branches,
and the code size will have increased. Even if the store is aligned, it
will probably not be much of a win for such a small trip count. I have a
small patch that lets you specify the minimum number of vector iterations
under which you don't want to allow vectorization. I'll go ahead and send
that too.

> This is evidently an area where the compiler
> could use more work.
>

True, and indeed there are people who are currently looking into adding a
cost model to the vectorizer.
The patch I mentioned above would be a first (trivial) step towards that.

dorit

> Ian
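
P.S. To make the peeling argument concrete, here is a rough scalar sketch
of the loop structure described above: a peel-loop that runs until the
store to va is aligned, a vector-loop, and a peel-loop for the leftover
iterations. This is only an illustration, not the code gcc actually
generates. The name peeled_add, the returned iteration count, and the
vectorization factor of 4 floats per 16-byte vector are assumptions made
for the example, and the "vector" body is written as a plain inner loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { VF = 4 };  /* assumed vectorization factor: 4 floats per 16-byte vector */

/* Hypothetical helper: adds vb into va using the peel/vector/peel shape
   and returns how many full vector iterations were executed. */
size_t peeled_add(float *restrict va, const float *restrict vb, size_t n)
{
    size_t i = 0;
    size_t vector_iters = 0;

    assert(va != NULL && vb != NULL);

    /* Peel-loop before the vector-loop: scalar iterations until the
       store address va + i is 16-byte aligned. */
    while (i < n && ((uintptr_t)(va + i) % (VF * sizeof(float))) != 0) {
        va[i] += vb[i];
        ++i;
    }

    /* Vector-loop: each trip stands in for one aligned vector store
       (the load from vb may still be unaligned, as with movups). */
    for (; i + VF <= n; i += VF) {
        for (size_t j = 0; j < VF; ++j)
            va[i + j] += vb[i + j];
        ++vector_iters;
    }

    /* Peel-loop after the vector-loop: remaining scalar iterations. */
    for (; i < n; ++i)
        va[i] += vb[i];

    return vector_iters;
}
```

With n = 4 and a 16-byte-aligned va, this returns 1. With va offset by one
float from an aligned address, the first peel-loop consumes three
iterations, only one is left, and the function returns 0: the vector-loop
never runs, which is the slow case for the 4-element example.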