Mike Stump <mikest...@comcast.net> writes:

>>>> Silvermont processors have penalty for instructions having 4+ bytes of
>>>> prefixes (including escape bytes in opcode). This situation happens
>>>> when REX prefix is used in SSE4 instructions. This patch tries to
>>>> avoid such situation by preferring xmm0-xmm7 usage over xmm8-xmm15 in
>>>> those instructions. I achieved it by adding new tuning flag and new
>>>> alternatives affected by tuning.
>>>
>>> Why make it a tuning flag? Shouldn't this help unconditionally for code
>>> size everywhere? Or is there some drawback?
>>
>> I don't think it will make code smaller, if you already have some value in
>> xmm8..xmm15 register, then by not allowing those registers directly on SSE4
>> insns just means it reloading and larger code.
>
> I can’t help but think a better way to do this is to explain the costs
> of the REX registers as being more expensive, then let the register
> allocator prefer the cheaper registers. You then leave them all as
> valid, which I think is better than disappearing 1/2 of the registers.

Yes that would sound like a much better strategy.

BTW I thought gcc already did that for the integer registers to avoid
unnecessary prefixes, but maybe I misremember. Copying Honza.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only
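
For illustration, here is a minimal C sketch of the byte counting behind the "4+ bytes of prefixes (including escape bytes in opcode)" remark quoted above. It is not taken from the patch or from GCC. The helper names (is_legacy_prefix, prefix_and_escape_bytes) are hypothetical, and it assumes that the count covers legacy prefixes, an optional REX prefix, and the 0F / 0F 38 / 0F 3A opcode escapes of a non-VEX instruction in 64-bit mode. The two byte sequences are the standard encodings of pmulld (66 0F 38 40 /r), once with xmm0 and once with xmm8 as the destination.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if b is a legacy prefix: operand/address size override,
   LOCK, REP/REPNE, or a segment override. */
int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x66: case 0x67:                       /* operand/address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E:            /* CS, SS, DS */
    case 0x26: case 0x64: case 0x65:            /* ES, FS, GS */
        return 1;
    default:
        return 0;
    }
}

/* Count the legacy prefixes, the optional REX prefix and the opcode
   escape bytes at the start of a non-VEX instruction in 64-bit mode.
   Assumption: this is the quantity the quoted "4+ bytes" rule refers to. */
size_t prefix_and_escape_bytes(const uint8_t *insn, size_t len)
{
    size_t i = 0;

    while (i < len && is_legacy_prefix(insn[i]))
        i++;
    if (i < len && (insn[i] & 0xF0) == 0x40)    /* REX is 0x40..0x4F */
        i++;
    if (i < len && insn[i] == 0x0F) {           /* two-byte opcode escape */
        i++;
        if (i < len && (insn[i] == 0x38 || insn[i] == 0x3A))
            i++;                                /* three-byte opcode escape */
    }
    return i;
}

/* pmulld %xmm1, %xmm0 (AT&T syntax): no REX prefix needed. */
const uint8_t pmulld_xmm0_xmm1[] = { 0x66, 0x0F, 0x38, 0x40, 0xC1 };

/* pmulld %xmm1, %xmm8: REX.R (0x44) selects the high register. */
const uint8_t pmulld_xmm8_xmm1[] = { 0x66, 0x44, 0x0F, 0x38, 0x40, 0xC1 };
```

Under that assumption the xmm0 form has 3 such bytes and stays below the threshold, while the xmm8 form has 4 and reaches it. That one extra byte is what a register-cost approach would have to make visible to the allocator.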