On 15 Jul 10:42, Uros Bizjak wrote:
> On Tue, Jul 15, 2014 at 10:25 AM, Ilya Enkovich <[email protected]>
> wrote:
>
> >>>>> Also fully restrict xmm8-15 does not seem right. It is just costly
> >>>>> but not fully disallowed.
> >>>>
> >>>> As said earlier, you can try "Ya*x" as a constraint.
> >>>
> >>> I tried it. It does not seem to affect allocation much. I do not see
> >>> any gain on targeted tests.
> >>
> >> Strange, because the documentation claims:
> >>
> >> '*'
> >> Says that the following character should be ignored when choosing
> >> register preferences. '*' has no effect on the meaning of the
> >> constraint as a constraint, and no effect on reloading. For LRA
> >> '*' additionally disparages slightly the alternative if the
> >> following character matches the operand.
> >>
> >> Let me rethink this a bit. Prehaps we could reconsider Jakub's
> >> proposal with "Ya,!x" (with two alternatives). IIRC this approach was
> >> needed for some MMX alternatives, where we didn't want RA to allocate
> >> a MMX register when the value could be passed in integer regs, but the
> >> value was still allowed in MMX register.
> >
> > That's is what my patch already does, but with '?' instead of '!'.
>
> Yes, I know. The problem is, that Ya*x type conditional allocation
> worked OK in the past for "not preferred, but still alowed regclass"
> registers, There are several patterns in i386.md that live by this
> premise, including movsf_internal and movdf_internal. If this approach
> doesn't work anymore, then we have to either figure out what is the
> reason, or invent a new strategy that will be applicable to all cases.
>
> Can you please post a small test that illustrates the case where Ya,!x
> works, but Ya*x doesn't?
It's hard to compose a small testcase which will have SSE4 instructions
generated with required register usage. I use tcpjumbo test from TCPmark for
initial check of how my patch works. This test has a lot of pmovzxwd
instructions generated and many of them use xmm8-15. I tried two versions of a
simple patch which modifies only pmovzxwd instruction.
Patch1:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..6b03b72 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@
(set_attr "mode" "OI")])
(define_insn "sse4_1_<code>v4hiv4si2"
- [(set (match_operand:V4SI 0 "register_operand" "=x")
+ [(set (match_operand:V4SI 0 "register_operand" "=Yr,!x")
(any_extend:V4SI
(vec_select:V4HI
- (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+ (match_operand:V8HI 1 "nonimmediate_operand" "Yr,!xm")
(parallel [(const_int 0) (const_int 1)
(const_int 2) (const_int 3)]))))]
"TARGET_SSE4_1"
Patch2:
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index d907353..b3721c4 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -11852,10 +11852,10 @@
(set_attr "mode" "OI")])
(define_insn "sse4_1_<code>v4hiv4si2"
- [(set (match_operand:V4SI 0 "register_operand" "=x")
+ [(set (match_operand:V4SI 0 "register_operand" "=Yr*x")
(any_extend:V4SI
(vec_select:V4HI
- (match_operand:V8HI 1 "nonimmediate_operand" "xm")
+ (match_operand:V8HI 1 "nonimmediate_operand" "Yr*xm")
(parallel [(const_int 0) (const_int 1)
(const_int 2) (const_int 3)]))))]
"TARGET_SSE4_1"
Here are results of looking for pmovzxwd in resulting binaries:
#objdump -d tcpjumbo-orig | grep pmovzxwd | grep
"xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76
#objdump -d tcpjumbo-patch1 | grep pmovzxwd | grep
"xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
0
#objdump -d tcpjumbo-patch2 | grep pmovzxwd | grep
"xmm8\|xmm9\|xmm10\|xmm11\|xmm12\|xmm13\|xmm14\|xmm15" | wc -l
76
Therefore I make a conclusion that Yr*x does not really differ much from x.
Thanks,
Ilya
>
> I have added Vlad to CC for his opinion on this matter. This is
> clearly a RA issue that has to be resolved before your patch is
> committed.
>
> Thanks,
> Uros.