Hi everyone,
  I'm trying to autovectorize loops, and thanks to the powerful macros, 
everything has gone well so far. But now that I need to optimize the loops 
further, I have run into a problem.
  Since our vector instructions can process 16 numbers at a time, a for loop 
is autovectorized if its trip count is equal to or larger than 16. For 
example:
  for (int i = 0; i < 16; i++) c[i] = a[i] + b[i];
  becomes:
  vld v0, a0
  vld v1, a1
  vadd v0,v0,v1
  vfst v0, a2
  But if I write code like: for (int i = 0; i < 15; i++) c[i] = a[i] + b[i]; the 
autovectorizer skips it. However, we have an instruction, "vlen", which can 
change the length of the vector operations, and I would like to generate 
assembly like this when the trip count is 15:
  vlen 15
  vld v0, a0
  vld v1, a1
  vadd v0,v0,v1
  vfst v0, a2
  What should I do to achieve this? I've tried to define 
TARGET_HAVE_DOLOOP_BEGIN and add a define_expand for "doloop_begin", but 
"doloop_begin" is never called. Is there any other way? Also, if the trip count 
is larger than 16, such as 30 or 31, or is just a variable, how should I handle 
"vlen"? Any hint would be helpful. Thank you very much.
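
  P.S. To make the intent concrete, here is a rough scalar C model of the 
behaviour I am hoping the generated code could have for any trip count (15, 30, 
31, or a variable n): full chunks of 16, then one shorter chunk with "vlen" set 
to the remainder. This is only a sketch. VLEN_MAX, set_vlen, vadd_chunk and 
vec_add are hypothetical names I made up to stand in for our instructions; they 
are not GCC or hardware APIs, and I am assuming "vlen" stays in effect until it 
is set again.

```c
#include <stddef.h>

#define VLEN_MAX 16  /* assumed: 16 elements per vector instruction */

/* Hypothetical model of the vector length state set by "vlen". */
static size_t vlen_reg = VLEN_MAX;

static void set_vlen(size_t n) { vlen_reg = n; }

/* Models one vld/vld/vadd/vfst sequence acting on vlen_reg elements. */
static void vadd_chunk(int *c, const int *a, const int *b)
{
    for (size_t i = 0; i < vlen_reg; i++)
        c[i] = a[i] + b[i];
}

/* Strip-mined loop: each pass handles min(remaining, VLEN_MAX) elements.
   Returns the number of vector chunks issued. */
size_t vec_add(int *c, const int *a, const int *b, size_t n)
{
    size_t chunks = 0;
    for (size_t i = 0; i < n; ) {
        size_t remaining = n - i;
        size_t len = remaining < VLEN_MAX ? remaining : VLEN_MAX;
        set_vlen(len);               /* "vlen len" */
        vadd_chunk(c + i, a + i, b + i);
        i += len;
        chunks++;
    }
    return chunks;
}
```

  With n = 15 this issues a single chunk with vlen 15, which corresponds to the 
assembly I wrote above; with n = 31 it issues one chunk of 16 and one of 15.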
