Hi,

I'm porting a gcc backend (4.6.1) for a 16-bit MCU with PSImode as
Pmode and SImode as ptr_mode.

I have a QoR problem with loops: the IVs chosen are often poor.
I looked at tree-ssa-loop-ivopts.c, but that code is hard to
understand. So sorry if my questions are a bit confused; I would like
to understand what is happening.

First of all, I checked many times and the rtx_cost function is right.

The choice of IVs seems to depend on the cost of the IV candidates
themselves, but also on their uses, register pressure, and so on, so
it is difficult for me to understand why one candidate is preferred
over another.
But my impression is that gcc tries to use "important" candidates to
satisfy all uses. For example, in a simple copy from one int array to
another ( for (i=0; i<N; i++) M1[i] = M2[i]; ), i is extended to SI
(ptr_mode), addresses are computed in SImode from i, and then truncated
to PSImode. When the code is modified so that the IV is made explicit as
a pointer (e.g. for (ptr1=M1; ptr1<XXX;) *ptr1++=*ptr2++;), the
generated code shrinks by 20%.
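
The two loop forms above can be written out as a minimal, self-contained
C sketch. The function names, the parameter lists and the explicit end
pointer are assumptions made for illustration (the original uses the
arrays M1 and M2 and a bound written as XXX). It only shows the two
source shapes and says nothing about the code generated for any target.

```c
/* Indexed form: the induction variable is the integer index i,
   and each address is derived from it. */
void copy_indexed(int *dst, const int *src, int n)
{
    int i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Pointer form: the induction variables are the pointers themselves.
   'end' is a hypothetical bound standing in for XXX. */
void copy_pointer(int *dst, const int *src, int n)
{
    int *end = dst + n;
    while (dst < end)
        *dst++ = *src++;
}
```

Both functions copy the same n elements; only the form of the induction
variable differs.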

Moreover, in loop-intensive computations, setting
--param iv-max-considered-uses=2 (thus preventing the optimization on
complex loops) can make code size much better (at -Os), with up to a 30%
reduction! So it seems that, in such test cases, trying to optimize
loops is worse than doing nothing.


Here are my questions:

- Is there a probable explanation for such behaviors when optimizing loops?

- Is there a document (other than gccint) describing loops and their
optimization?

- It seems that keeping computations and IVs in PSI is often preferable,
but there is no Pmode in the tree representation, right? So when and
where is the mode for pointer operations chosen (ptr_mode vs Pmode)?

- PSImode is used as Pmode in very few backends (m32c). Is it really
handled well by the middle-end algorithms/heuristics?

- Looking at the code, it seems there are different sets of IVs, for
instance origset and set in find_optimal_iv_set. Sometimes, forcing
one of them (often origset) generates better code. But what is the
difference between origset and set?

- And finally, is there something I can do from the back-end to make
loop code better?


Thank you in advance!
Aurélien
