Hi everyone,
I'm trying to autovectorize loops for our target, and thanks to the powerful
macros, everything has worked well so far. Recently, though, I needed to
optimize the loops further and ran into some problems.
Because our vector instructions process 16 elements at a time, a for loop is
autovectorized when its iteration count is 16 or more. For
example:
for (int i = 0; i <16; i++) c[i] = a[i] + b[i];
is compiled to:
vld v0, a0
vld v1, a1
vadd v0,v0,v1
vfst v0, a2
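To make the question concrete, here is a small C model of what that sequence does. It is only a sketch: the element type (int), the register type vreg and the helper names vld/vadd/vst are my own stand-ins (vst plays the role of vfst), and every operation is assumed to touch exactly 16 lanes.

```c
#include <stddef.h>

/* Hypothetical model of the fixed-width case: every vector op works
 * on exactly VLMAX = 16 lanes. The int element type is an assumption. */
#define VLMAX 16

typedef struct {
    int lane[VLMAX];
} vreg;

/* vld: load VLMAX consecutive elements starting at p. */
static vreg vld(const int *p)
{
    vreg r;
    for (size_t i = 0; i < VLMAX; i++)
        r.lane[i] = p[i];
    return r;
}

/* vadd: lane-wise addition of all VLMAX lanes. */
static vreg vadd(vreg x, vreg y)
{
    vreg r;
    for (size_t i = 0; i < VLMAX; i++)
        r.lane[i] = x.lane[i] + y.lane[i];
    return r;
}

/* vst: store all VLMAX lanes to p (stand-in for "vfst"). */
static void vst(int *p, vreg v)
{
    for (size_t i = 0; i < VLMAX; i++)
        p[i] = v.lane[i];
}

/* for (int i = 0; i < 16; i++) c[i] = a[i] + b[i];  as one vector pass */
void add16(int *c, const int *a, const int *b)
{
    vreg v0 = vld(a);   /* vld  v0, a0     */
    vreg v1 = vld(b);   /* vld  v1, a1     */
    v0 = vadd(v0, v1);  /* vadd v0, v0, v1 */
    vst(c, v0);         /* vfst v0, a2     */
}
```

In this model a 15-iteration loop cannot reuse the same sequence as-is, because the 16-lane load and store would also touch a[15], b[15] and c[15].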
But if I write code like for (int i = 0; i < 15; i++) c[i] = a[i] + b[i]; the
autovectorizer misses it. However, we have an instruction, "vlen", that sets the
length of the following vector operations, and I would like to generate assembly
like this when the loop count is 15:
vlen 15
vld v0, a0
vld v1, a1
vadd v0,v0,v1
vfst v0, a2
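Here is the same kind of C model extended with a vector-length register, to show the behavior I expect from that sequence. It assumes (for illustration only) that "vlen n" sets a length vl that limits every following vld/vadd/vst to lanes 0..vl-1, and that values above 16 are clamped to 16; the names and the int element type are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model: "vlen n" sets vl, and each following vector op
 * only touches lanes 0..vl-1. Clamping n to VLMAX is an assumption. */
#define VLMAX 16

typedef struct {
    int lane[VLMAX];
} vreg;

static size_t vl = VLMAX;   /* current vector length */

/* vlen n */
static void vlen(size_t n)
{
    vl = n < VLMAX ? n : VLMAX;
}

/* vld: load only the first vl elements; inactive lanes are zeroed. */
static vreg vld(const int *p)
{
    vreg r = {{0}};
    for (size_t i = 0; i < vl; i++)
        r.lane[i] = p[i];
    return r;
}

/* vadd: add only the active lanes. */
static vreg vadd(vreg x, vreg y)
{
    vreg r = {{0}};
    for (size_t i = 0; i < vl; i++)
        r.lane[i] = x.lane[i] + y.lane[i];
    return r;
}

/* vst: store only the active lanes; memory past p[vl-1] is untouched. */
static void vst(int *p, vreg v)
{
    for (size_t i = 0; i < vl; i++)
        p[i] = v.lane[i];
}

/* for (int i = 0; i < 15; i++) c[i] = a[i] + b[i]; */
void add15(int *c, const int *a, const int *b)
{
    vlen(15);           /* vlen 15         */
    vreg v0 = vld(a);   /* vld  v0, a0     */
    vreg v1 = vld(b);   /* vld  v1, a1     */
    v0 = vadd(v0, v1);  /* vadd v0, v0, v1 */
    vst(c, v0);         /* vfst v0, a2     */
}
```

With vl = 15 the store leaves c[15] untouched, which is the property I would like the compiler to exploit.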
What should I do to achieve this? I've tried defining TARGET_HAVE_DOLOOP_BEGIN
and adding a define_expand for "doloop_begin", but "doloop_begin" is never
called. Is there another way? Also, if the loop count is larger than 16, such as
30 or 31, or is only known at run time (a variable), how should I use "vlen"?
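For counts above 16, or counts only known at run time, the shape I have in mind is strip-mining: each pass sets vlen to the number of elements left, capped at 16. Below is a sketch in the same hypothetical C model; here vlen also returns the length it granted, similar in spirit to RISC-V's vsetvli, which is an assumption about our instruction rather than its documented behavior. All names are illustrative.

```c
#include <stddef.h>

/* Hypothetical model: vlen(n) sets vl = min(n, VLMAX) and returns it. */
#define VLMAX 16

typedef struct {
    int lane[VLMAX];
} vreg;

static size_t vl = VLMAX;   /* current vector length */

static size_t vlen(size_t n)
{
    vl = n < VLMAX ? n : VLMAX;
    return vl;
}

/* Vector ops touch only lanes 0..vl-1. */
static vreg vld(const int *p)
{
    vreg r = {{0}};
    for (size_t i = 0; i < vl; i++)
        r.lane[i] = p[i];
    return r;
}

static vreg vadd(vreg x, vreg y)
{
    vreg r = {{0}};
    for (size_t i = 0; i < vl; i++)
        r.lane[i] = x.lane[i] + y.lane[i];
    return r;
}

static void vst(int *p, vreg v)
{
    for (size_t i = 0; i < vl; i++)
        p[i] = v.lane[i];
}

/* Strip-mined form of  for (i = 0; i < n; i++) c[i] = a[i] + b[i];
 * Each pass handles min(remaining, 16) elements, so no scalar
 * epilogue is needed, whether n is a constant or a variable. */
void add_n(int *c, const int *a, const int *b, size_t n)
{
    while (n > 0) {
        size_t len = vlen(n);   /* vlen min(n, 16) */
        vreg v0 = vld(a);
        vreg v1 = vld(b);
        v0 = vadd(v0, v1);
        vst(c, v0);
        a += len;
        b += len;
        c += len;
        n -= len;
    }
}
```

With n = 30 this runs one 16-lane pass and one 14-lane pass; with n = 31, 16 + 15. Another option would be a single vlen n % 16 for one shorter pass (when n % 16 is nonzero) plus full 16-lane passes; both avoid a scalar epilogue.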
Any hint would be helpful. Thank you very much.