Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-13 Thread Richard Earnshaw
On 12/08/14 18:03, Kyrill Tkachov wrote:
> 
> On 12/08/14 16:16, Kyrill Tkachov wrote:
>> On 12/08/14 16:11, Kyrill Tkachov wrote:
>>> On 12/08/14 15:22, Jeff Law wrote:
 On 08/12/14 04:31, Kyrill Tkachov wrote:
> On 12/08/14 10:39, Richard Biener wrote:
>> On Mon, Aug 11, 2014 at 9:56 PM, Jeff Law  wrote:
>>> On 08/11/14 07:41, Kyrill Tkachov wrote:
 I haven't been able to get combine to match the comparison+xor+neg+plus
 RTL and it seems like it would be just a workaround to undo the
 tree-level transformation.
>>> Yea, it'd just be a workaround, but it's probably the easiest way to
>>> deal
>>> with this problem.  Can you describe in further detail why you
>>> weren't able
>>> to get this to work?
>> Too many instructions to combine I guess.  You might want to add
>> intermediate "combine" insn-and-splits.  If that's still a no-go then
>> read on.
 My guess was too many insns as well..  But that's often solvable.

> From the combine dump I can see that it tried to combine up to:
> (set (reg/i:SI 0 x0)
> (plus:SI (xor:SI (neg:SI (reg:SI 84 [ D.2565 ]))
> (reg:SI 73 [ D.2564 ]))
> (reg:SI 84 [ D.2565 ])))
 And did it find a match for this?   What happens if (just for testing
 purposes), you create a pattern for this?  Does combine then try
 something even more complex, possibly getting your conditional negation?
>>> I managed to get combine to recognise this pattern:
>>> (set (match_operand:GPI 0 "register_operand" "=r")
>>>(plus:GPI (xor:GPI (neg:GPI (match_operand:GPI 1
>>> "register_operand" "r"))
>>>   (match_operand:GPI 2 "register_operand" "r"))
>>>  (match_dup 1)))
>>>
>>> Now what I need is for operand 1 to instead be a cc_reg comparison, but
>>> when I change operand 1 to this pattern:
>>>
>>> (set (match_operand:GPI 0 "register_operand" "=r")
>>>(plus:GPI (xor:GPI (neg:GPI (match_operator:GPI 1
>>> "aarch64_comparison_operator"
>>>[(match_operand:CC 2 "cc_register" "")
>>> (const_int 0)]))
>>>   (match_operand:GPI 3 "register_operand" "r"))
>>>  (match_dup 1)))
>>>
>>> This doesn't match. Is there any way to express this in a combineable
>>> pattern?
>> argh, Thunderbird enforced the 80 character limit...
>>
>> The pattern that matched was:
>> (set (match_operand:GPI 0 "register_operand" "=r")
>>   (plus:GPI
>> (xor:GPI
>>   (neg:GPI (match_operand:GPI 1 "register_operand" "r"))
>>   (match_operand:GPI 2 "register_operand" "r"))
>> (match_dup 1)))
> 
> If I match that to a define_and_split and split it into two sets:
> (set (reg1) (neg reg2))
> (set (plus (xor (reg1) (reg3))
>  (reg2)))
> 
> and then add a define_insn with the full pattern:
> 
> (set (match_operand:GPI 0 "register_operand" "=r")
>   (plus:GPI
> (xor:GPI
>   (neg:GPI
> (match_operator:GPI 1 "aarch64_comparison_operator"
>   [(match_operand:CC 2 "cc_register" "")
>  (const_int 0)]))
> (match_operand:GPI 3 "register_operand" "r"))
> (match_dup 1)))
> 
> Then this does manage to match the full thing that I want. But I had to 
> add another define_split to break up the plus+xor Frankenstein's monster 
> back into two separate insns for the cases where it picks up 
> non-conditional-negate patterns.
> 
> Does that seem reasonable or too much of a hack? Any plus+xor rtxs that 
> get left after combine should be split up again relatively quickly in 
> split1 and shouldn't inhibit further optimisation too badly, no?
> 
> 

The problem with the frankenmonster patterns is that they tend to
proliferate into the machine description, and before you know where you
are the back-end is full of them.

Furthermore, they are very sensitive to the greedy first-match nature of
combine: a better, later, combination is missed because a less good,
earlier, optimization matched.  If the first insn in the sequence is
merged into an earlier instruction then you can end up with a junk
sequence that completely fails to simplify.  That ends up with
super-frankenmonster patterns to deal with all the subcases and the
problems grow exponentially from there.

I really do think that the best solution would be to try and catch this
during expand if possible and generate the right pattern from the start;
then you don't risk combine failing to come to the rescue after several
intermediate transformations have taken place.

> Kyrill
>>
>> And the one that failed to combine (with operand 1 substituted for a
>> cc_reg comparison) was:
>>
>> (set (match_operand:GPI 0 "register_operand" "=r")
>>   (plus:GPI
>> (xor:GPI
>>   (neg:GPI
>> (match

Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-13 Thread Segher Boessenkool
On Wed, Aug 13, 2014 at 03:57:31PM +0100, Richard Earnshaw wrote:
> The problem with the frankenmonster patterns is that they tend to
> proliferate into the machine description, and before you know where you
> are the back-end is full of them.
> 
> Furthermore, they are very sensitive to the greedy first-match nature of
> combine: a better, later, combination is missed because a less good,
> earlier, optimization matched.  If the first insn in the sequence is
> merged into an earlier instruction then you can end up with a junk
> sequence that completely fails to simplify.  That ends up with
> super-frankenmonster patterns to deal with all the subcases and the
> problems grow exponentially from there.

Right.  Of course, combine should be fixed, yadda yadda.

> I really do think that the best solution would be to try and catch this
> during expand if possible and generate the right pattern from the start;
> then you don't risk combine failing to come to the rescue after several
> intermediate transformations have taken place.

I think ssa-phiopt should simply not do this obfuscation at all.  Without
it, RTL ifcvt picks it up just fine on targets with conditional assignment
instructions.  I agree on targets without expand should do a better job
(also for more generic conditional assignment).

Instruction selection belongs in RTL land.


Segher


Re: LTO bootstrap compare errors for ARM64

2014-08-13 Thread Jan Hubicka
> Hi Honza,
> 
> I did not find any differences in tree level dumps. These are the dump
> differences in IPA
> 
> In gimple-fold.c.000i.cgraph
> 
> (--Snip--)
> < _Z25gimple_build_omp_continueP9tree_nodeS0_/761 
> (gimple_build_omp_continue(tree_node*, tree_node*)) @0x3ff7ebda548
> ---
> > _Z25gimple_build_omp_continueP9tree_nodeS0_/761 
> > (gimple_build_omp_continue(tree_node*, tree_node*)) @0x3ff92b5a548
> 28865c28865
> < _Z26gimple_build_omp_taskgroupP21gimple_statement_base/760
> (gimple_build_omp_taskgroup(gimple_statement_base*)) @0x3ff7ebda400
> ---
> > _Z26gimple_build_omp_taskgroupP21gimple_statement_base/760 
> > (gimple_build_omp_taskgroup(gimple_statement_base*)) @0x3ff92b5a400
> 28875c28875
> < _Z23gimple_build_omp_masterP21gimple_statement_base/759
> (gimple_build_omp_master(gimple_statement_base*)) @0x3ff7ebda2b8
> ---
> > _Z23gimple_build_omp_masterP21gimple_statement_base/759 
> > (gimple_build_omp_master(gimple_statement_base*)) @0x3ff92b5a2b8
> 28885c28885
> < _Z24gimple_build_omp_sectionP21gimple_statement_base/758
> (gimple_build_omp_section(gimple_statement_base*)) @0x3ff7ebda170
> ---
> > _Z24gimple_build_omp_sectionP21gimple_statement_base/758 
> > (gimple_build_omp_section(gimple_statement_base*)) @0x3ff92b5a170
> (--Snip--)

Those are only differences in addesses that is possible in between stages...
> 
> 
> In gimple.c.044i.profile_estimate
> 
> (--Snip--)
> 
> 1987c1987
> < vec::qsort(int (*)(void const*, void const*)) 
> (struct vec * const this, int (*) (const void *, const void *) cmp)
> ---
> > vec::qsort(int (*)(void const*, void const*)) 
> > (struct vec * const this, int (*) (const void *, const void *) cmp)
> (--Snip--)
> 
> gimple.c.048i.inline
> 
> (--Snip--)
> 
> <   min size:   6
> ---
> >   min size:   0
> 6590c6590
> <   min size:   14
> ---
> >   min size:   0
> 6607c6607
> <   min size:   28
> (--Snip--)

This may actually be the problem. Differences in min_size may cause different 
optimization results.
Can I have both inline dumps with -fdump-ipa-inline-details?
I will double check the min size compiltatuion. it is kind of odd it is 0 in 
your case.

Honza


gcc-4.9-20140813 is now available

2014-08-13 Thread gccadmin
Snapshot gcc-4.9-20140813 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20140813/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 213932

You'll find:

 gcc-4.9-20140813.tar.bz2 Complete GCC

  MD5=85c4d7f58664a00323dd4e2c807cd295
  SHA1=0bfc8e76f643ff67fdecf82e26a83513ec78203d

Diffs from 4.9-20140806 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.