Sorry for the delayed reply.

On Mon, 2014-03-31 at 09:44 +0200, Christian Bruel wrote:
> On 03/30/2014 11:02 PM, Oleg Endo wrote:
> > Hi,
> >
> > On Wed, 2014-03-26 at 08:58 +0100, Christian Bruel wrote:
> >
> >> This patches adds a few instructions to the inlined builtin_strlen to
> >> unroll the remaining bytes for word-at-a-time loop. This enables to have
> >> 2 distinct execution paths (no fall-thru in the byte-at-a-time loop),
> >> allowing block alignment assignation. This partially improves the
> >> problem reported with by Oleg. in [Bug target/0539] New: [SH] builtin
> >> string functions ignore loop and label alignment
> > Actually, my original concern was the (mis)alignment of the 4 byte inner
> > loop. AFAIR it's better for the SH pipeline if the first insn of a loop
> > is 4 byte aligned.
>
> yes, this is why I haven't closed the PR. IMHO the problem is with the
> non-aligned loop stems from to the generic alignment code in final.c.
> changing branch frequencies is quite impacting to BB reordering as well.
> Further tuning of static branch estimations, or tuning of the LOOP_ALIGN
> macro is needed.

OK, I've updated PR 60539 accordingly.

> Note that my branch estimations in this code is very
> empirical, a dynamic profiling benchmarking would be nice as well.
> My point was just that forcing a local .align in this code is a
> workaround, as we should be able to rely on generic reordering/align
> code for this. So the tuning of loop alignment is more global (and well
> exhibited here indeed)

I think that those two are separate issues. I've opened a new PR 60884
for this. Let's continue the discussions and experiments there.

Cheers,
Oleg