Hi,

I'm porting a gcc backend (4.6.1) to a 16-bit MCU where Pmode is PSImode and ptr_mode is SImode.
I have a QoR problem with loops: the IVs chosen are often poor. I looked at tree-ssa-loop-ivopts.c, but that code is hard to follow, so apologies if my questions are a bit confused; I would like to understand what is happening. First of all, I have checked my rtx_cost function many times and I believe it is correct.

The choice of IVs seems to depend on the cost of the IV candidates themselves, but also on the cost of their uses, on register pressure, and so on, which makes it difficult for me to understand why one candidate is preferred over another. My impression is that gcc tries to pick a few "important" candidates that can satisfy all uses.

For example, in a simple copy from one int array to another ( for (i=0; i<N; i++) M1[i] = M2[i]; ), i is extended to SImode (ptr_mode), the addresses are computed in SImode from i, and then truncated to PSImode. When I modify the code so that the IV is written explicitly as a pointer (e.g. for (ptr1=M1; ptr1<XXX;) *ptr1++=*ptr2++;), the code can be reduced by 20%.

Moreover, in loop-intensive computations, setting --param iv-max-considered-uses=2 (which prevents the optimization on complex loops) can make code size much better at -Os, by up to 30%! So in such test cases, trying to optimize the loops seems to be worse than doing nothing.

Here are my questions:

- Is there a likely explanation for this behavior when optimizing loops?
- Is there a document (other than gccint) describing loops and their optimization?
- Keeping computations and IVs in PSImode often seems preferable, but there is no Pmode in the tree representation, right? So when and where is the mode for pointer operations chosen (ptr_mode vs Pmode)?
- PSImode is used as Pmode in very few backends (m32c). Is it really handled well by the middle-end algorithms and heuristics?
- Looking at the code, there seem to be different sets of IVs, for instance origset and set in find_optimal_iv_set. Sometimes forcing one of them (often origset) generates better code. What is the difference between origset and set?
- And finally, is there something I can do from the back-end to make loop code better?

Thank you in advance!

Aurélien
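
P.S. In case it helps, the two loop forms I am comparing can be sketched as below. This is a minimal reduction for illustration and not my actual test case: the array length N, the global arrays and the copy_is_correct helper are made up here, and the XXX bound above is written as M1 + N.

```c
#define N 16                      /* hypothetical array length */

int M1[N], M2[N];

/* Form 1: integer IV. On my port, i is extended to SImode (ptr_mode),
   the addresses are computed in SImode and then truncated to PSImode. */
void copy_indexed(void)
{
    int i;
    for (i = 0; i < N; i++)
        M1[i] = M2[i];
}

/* Form 2: the IVs are written explicitly as pointers. */
void copy_pointers(void)
{
    int *ptr1 = M1;
    const int *ptr2 = M2;
    while (ptr1 < M1 + N)
        *ptr1++ = *ptr2++;
}

/* Illustration-only helper: fill M2 with a pattern, clear M1, run one
   of the copy routines, and return 1 if M1 now equals M2. */
int copy_is_correct(void (*copy)(void))
{
    int i;
    for (i = 0; i < N; i++) {
        M2[i] = 3 * i + 1;
        M1[i] = 0;
    }
    copy();
    for (i = 0; i < N; i++)
        if (M1[i] != M2[i])
            return 0;
    return 1;
}
```

Both functions do the same copy, so any difference in the generated code should come from how the IVs are selected and in which mode the addresses are computed. Compiling with -Os -fdump-tree-ivopts-details should show the candidates considered for each form.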