Re: RISC-V and Ada: undefined references to `__gnat_raise_nodefer_with_msg'

2018-07-03 Thread Eric Botcazou
> It seems the a-except.adb was replaced by  a-except-2005.adb in this commit:

Right, it's by design, the old support for SJLJ exceptions has been ditched 
for full runtimes.  You probably just need to swap the values of

   Frontend_Exceptions   : constant Boolean := True;
   ZCX_By_Default: constant Boolean := False;

in system-rtems.ads.
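For reference, a sketch of what the swapped declarations in system-rtems.ads would look like after that change (the exact surrounding layout is assumed):

```ada
   --  Swapped per the advice above: disable front-end SJLJ exceptions,
   --  enable zero-cost exceptions by default.
   Frontend_Exceptions   : constant Boolean := False;
   ZCX_By_Default        : constant Boolean := True;
```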

-- 
Eric Botcazou


For help: Unexpected FAILs in the GCC testsuite

2018-07-03 Thread 陈龙
Hi,

 

I have run the GCC testsuite and compared my results with those from a 
similar configuration on the gcc-testresults mailing list; the results 
differ only slightly. Both runs have many unexpected failures. I want to 
know why they failed, but the log doesn't provide enough information. Is it 
normal to have so many FAILs, and could you help me understand why they 
failed? Thanks.

  

Test Environment

 

- x86_64-pc-mingw64 and msys2

- gcc8.1.0

 

Part of Log

 

WARNING: Couldn't find the global config file.

Test run by 320022753 on Mon Jul  2 09:23:09 2018

Native configuration is x86_64-pc-mingw64

 

=== gcc tests ===

 

Schedule of variations:

unix

 

Running target unix

Using ./testsuite/config/default.exp as tool-and-target-specific interface file.

Running ./testsuite/gcc.c-torture/compile/compile.exp ...

FAIL: gcc.dg/pr65658.c (test for excess errors)

FAIL: gcc.dg/pr70859-2.c (test for excess errors)

FAIL: gcc.dg/pr71558.c (test for excess errors)

FAIL: gcc.dg/pr7356.c  (test for errors, line 3)

...




Running ./testsuite/gcc.dg/goacc-gomp/goacc-gomp.exp ...

Running ./testsuite/gcc.dg/goacc/goacc.exp ...

Running ./testsuite/gcc.dg/gomp/gomp.exp ...

Running ./testsuite/gcc.dg/graphite/graphite.exp ...

FAIL: gcc.dg/graphite/scop-19.c scan-tree-dump-times graphite "number of SCoPs: 
0" 1

FAIL: gcc.dg/graphite/id-15.c (test for excess errors)

...




=== gcc Summary ===

 

# of expected passes		122342

# of unexpected failures	866

# of unexpected successes   25

# of expected failures  440

# of unresolved testcases   70

# of unsupported tests  2679

/mingw64/bin/gcc  version 8.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 
project)

 

There is more failure information that I have not included here due to 
email policy restrictions.




Best regards

CL

Re: For help: Unexpected FAILs in the GCC testsuite

2018-07-03 Thread Jonathan Wakely
On Tue, 3 Jul 2018 at 08:11, 陈龙 wrote:
>
> Hi,
>
>
>
> I have run the GCC testsuite and compared my results with those from a
> similar configuration on the gcc-testresults mailing list; the results
> differ only slightly. Both runs have many unexpected failures. I want to
> know why they failed, but the log doesn't provide enough information. Is
> it normal to have so many FAILs, and could you help me understand why
> they failed? Thanks.

Firstly, this is the wrong mailing list; please use gcc-help instead.

Secondly, you didn't show the logs, you only showed the console
output. Look for *.log files under the $build/gcc/testsuite directory
which give more information.
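For example, here is a rough sketch of how to locate and search those logs. The path `$build` is a stand-in for your actual build directory; this snippet mocks one up so the commands are runnable as shown:

```shell
# $build stands for your GCC build directory; here we create a mock
# layout so the commands below can be run as-is.
build=$(mktemp -d)
mkdir -p "$build/gcc/testsuite/gcc"
printf 'FAIL: gcc.dg/pr65658.c (test for excess errors)\n' \
    > "$build/gcc/testsuite/gcc/gcc.log"

# List all DejaGnu logs under the build tree.
find "$build/gcc/testsuite" -name '*.log'

# Each console FAIL has its full compiler output in the log, e.g.:
grep -n 'FAIL: gcc.dg/pr65658.c' "$build/gcc/testsuite/gcc/gcc.log"
```

In a real tree you would point `build` at your build directory instead of the mock one.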


DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

Hi All,

I'm trying to implement maskload/maskstore for AMD GCN, which has up to 
64-lane, 512-byte fully-masked vectors. All seems fine as far as the 
vector operations themselves go, but I've found a problem with the RTL 
Dead Store Elimination pass.


Testcase gcc.c-torture/execute/20050826-2.c uses a maskstore to write 
the 14 DImode pointers all in one go. The problem is that DSE doesn't 
know that the store is masked and judges the width at 512 bytes, not the 
true 56 bytes. This leads it to eliminate prior writes to nearby stack 
locations, and therefore to bad code.


Has anyone encountered this problem with SVE or AVX maskstore at all?

I was thinking of solving the problem by adding a target hook to query 
the true length of vector types. Any thoughts on that?


Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 11:59 AM Andrew Stubbs  wrote:
>
> Hi All,
>
> I'm trying to implement maskload/maskstore for AMD GCN, which has up-to
> 64-lane, 512-byte fully-masked vectors. All seems fine as far as the
> vector operations themselves go, but I've found a problem with the RTL
> Dead Store Elimination pass.
>
> Testcase gcc.c-torture/execute/20050826-2.c uses a maskstore to write
> the 14 DImode pointers all in one go. The problem is that DSE doesn't
> know that the store is masked and judges the width at 512 bytes, not the
> true 56 bytes. This leads it to eliminate prior writes to nearby stack
> locations, and therefore bad code.
>
> Has anyone encountered this problem with SVE or AVX maskstore at all?

AVX ones are all UNSPECs I believe - how do your patterns look like?

> I was thinking of solving the problem by adding a target hook to query
> the true length of vector types. Any thoughts on that?

It isn't about the length but about the mask because there can be mask
values that do not affect the length?

Richard.

>
> Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 11:15, Richard Biener wrote:

AVX ones are all UNSPECs I believe - how do your patterns look like?


AVX has both unspec and vec_merge variants (at least for define_expand, 
in GCC8), but in any case, AFAICT dse.c only cares about the destination 
MEM, and all the AVX and SVE patterns appear to use nothing special there.



I was thinking of solving the problem by adding a target hook to query
the true length of vector types. Any thoughts on that?


It isn't about the length but about the mask because there can be mask
values that do not affect the length?


The problem I have right now is that the vector write conflicts with 
writes to distinct variables, in which case the vector length is what's 
important, and it's probably(?) safe to assume that if the vector mask 
is not constant then space for the whole vector has been allocated on 
the stack.


But yes, in general it's true that subsequent writes to the same vector 
could well write distinct elements, in which case the value of the mask 
is significant to DSE analysis.


Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 11:33, Andrew Stubbs wrote:

On 03/07/18 11:15, Richard Biener wrote:

AVX ones are all UNSPECs I believe - how do your patterns look like?


AVX has both unspec and vec_merge variants (at least for define_expand, 
in GCC8), but in any case, AFAICT dse.c only cares about the destination 
MEM, and all the AVX and SVE patterns appear to use nothing special there.


Sorry, my patterns look something like this:

(set (mem:V64DI (reg:DI))
 (vec_merge:V64DI (reg:V64DI) (unspec ...) (reg:DI)))

Where the unspec just means that the destination remains unchanged. We 
could also use (match_dup 0) there, but we don't, so probably there was 
an issue with that at some point.



I was thinking of solving the problem by adding a target hook to query
the true length of vector types. Any thoughts on that?


It isn't about the length but about the mask because there can be mask
values that do not affect the length?


The problem I have right now is that the vector write conflicts with 
writes to distinct variables, in which case the vector length is what's 
important, and it's probably(?) safe to assume that if the vector mask 
is not constant then space for the whole vector has been allocated on 
the stack.


But yes, in general it's true that subsequent writes to the same vector 
could well write distinct elements, in which case the value of the mask 
is significant to DSE analysis.


Andrew




Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 12:57 PM Andrew Stubbs  wrote:
>
> On 03/07/18 11:33, Andrew Stubbs wrote:
> > On 03/07/18 11:15, Richard Biener wrote:
> >> AVX ones are all UNSPECs I believe - how do your patterns look like?
> >
> > AVX has both unspec and vec_merge variants (at least for define_expand,
> > in GCC8), but in any case, AFAICT dse.c only cares about the destination
> > MEM, and all the AVX and SVE patterns appear to use nothing special there.
>
> Sorry, my patterns look something like this:
>
> (set (mem:V64DI (reg:DI))
>   (vec_merge:V64DI (reg:V64DI) (unspec ...) (reg:DI)))
>
> Where the unspec just means that the destination remains unchanged. We
> could also use (match_dup 0) there, but we don't, so probably there was
> an issue with that at some point.

I believe that the AVX variants like

(define_expand "maskstore"
  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
(vec_merge:V48_AVX512VL
  (match_operand:V48_AVX512VL 1 "register_operand")
  (match_dup 0)
  (match_operand: 2 "register_operand")))]
  "TARGET_AVX512F")

are OK since they represent a use of the memory due to the (match_dup 0),
while your UNSPEC one doesn't; since the store doesn't actually write
all of operand 0, your insn variant doesn't represent the observable behavior.

> >>> I was thinking of solving the problem by adding a target hook to query
> >>> the true length of vector types. Any thoughts on that?
> >>
> >> It isn't about the length but about the mask because there can be mask
> >> values that do not affect the length?
> >
> > The problem I have right now is that the vector write conflicts with
> > writes to distinct variables, in which case the vector length is what's
> > important, and it's probably(?) safe to assume that if the vector mask
> > is not constant then space for the whole vector has been allocated on
> > the stack.
> >
> > But yes, in general it's true that subsequent writes to the same vector
> > could well write distinct elements, in which case the value of the mask
> > is significant to DSE analysis.
> >
> > Andrew
>


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 12:02, Richard Biener wrote:

I believe that the AVX variants like

(define_expand "maskstore"
   [(set (match_operand:V48_AVX512VL 0 "memory_operand")
 (vec_merge:V48_AVX512VL
   (match_operand:V48_AVX512VL 1 "register_operand")
   (match_dup 0)
   (match_operand: 2 "register_operand")))]
   "TARGET_AVX512F")

are OK since they represent a use of the memory due to the match_dup 0
while your UNSPEC one doesn't so as the store doesn't actually take
place to all of 0 your insn variant doesn't represent observable behavior.


Hmm, so they're safe, but may prevent the optimization of nearby variables?

What about the unspec AVX variant?

Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 1:06 PM Andrew Stubbs  wrote:
>
> On 03/07/18 12:02, Richard Biener wrote:
> > I believe that the AVX variants like
> >
> > (define_expand "maskstore"
> >[(set (match_operand:V48_AVX512VL 0 "memory_operand")
> >  (vec_merge:V48_AVX512VL
> >(match_operand:V48_AVX512VL 1 "register_operand")
> >(match_dup 0)
> >(match_operand: 2 "register_operand")))]
> >"TARGET_AVX512F")
> >
> > are OK since they represent a use of the memory due to the match_dup 0
> > while your UNSPEC one doesn't so as the store doesn't actually take
> > place to all of 0 your insn variant doesn't represent observable behavior.
>
> Hmm, so they're safe, but may prevent the optimization of nearby variables?

Yes, they prevent earlier stores into lanes that are "really" written
to to be DSEd.

> What about the unspec AVX variant?

They also use match_dup 0 for an input to the UNSPEC so DSE should consider
those similarly (as use).

I think the non-unspec ones exist because AVX512 masks are integer registers
and actually match the vec_merge requirements for the mask operand.  For
the AVX2 variants the mask is a regular vector which is where we do not have
any suitable way to represent things on RTL.  Well, maybe sth like

 (vec_select (vec_concat A B) (mult (const_vector -1 -2 -3 ...) mask-vector))

might work (not exactly, sth more complex is required).  That said,
a "vector"_cond_exec or some other variant of explicit masked-vector
operation is really required to model this stuff exactly on RTL.

Richard.

>
> Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 12:30, Richard Biener wrote:

Hmm, so they're safe, but may prevent the optimization of nearby variables?


Yes, they prevent earlier stores into lanes that are "really" written
to to be DSEd.


Right, but I have unrelated variables allocated to the stack within the 
"shadow" of the masked vector. I didn't ask it to do that, it just does, 
so I presume this is an expected feature of masked vectors with a known mask.


Surely this prevents valid optimizations on those variables?

Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs  wrote:
>
> On 03/07/18 12:30, Richard Biener wrote:
> >> Hmm, so they're safe, but may prevent the optimization of nearby variables?
> >
> > Yes, they prevent earlier stores into lanes that are "really" written
> > to to be DSEd.
>
> Right, but I have unrelated variables allocated to the stack within the
> "shadow" of the masked vector. I didn't ask it to do that, it just does,
> so I presume this is an expect feature of masked vectors with a known mask.

Huh, I don't think so.  I guess that's the real error and I wonder how
that happens.
Are those just spills or real allocations?

> Surely this prevents valid optimizations on those variables?
>
> Andrew


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 12:45, Richard Biener wrote:

On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs  wrote:


On 03/07/18 12:30, Richard Biener wrote:

Hmm, so they're safe, but may prevent the optimization of nearby variables?


Yes, they prevent earlier stores into lanes that are "really" written
to to be DSEd.


Right, but I have unrelated variables allocated to the stack within the
"shadow" of the masked vector. I didn't ask it to do that, it just does,
so I presume this is an expect feature of masked vectors with a known mask.


Huh, I don't think so.  I guess that's the real error and I wonder how
that happens.
Are those just spills or real allocations?


The code from the testcase looks like this:

struct rtattr rt[2];
struct rtattr *rta[14];
int i;

rt[0].rta_len = sizeof (struct rtattr) + 8;
rt[0].rta_type = 0;
rt[1] = rt[0];
for (i = 0; i < 14; i++)
  rta[i] = &rt[0];

The initializations of rt[0] and rt[1] are being deleted because the 
write to rta[0..13] would overwrite rt if it had actually been the 
maximum rta[0..63].


That, or I've been staring at dumps too long and gone crazy.

Andrew

P.S. I'm trying to test with (match_dup 0), but LRA exploded.


Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 1:57 PM Andrew Stubbs  wrote:
>
> On 03/07/18 12:45, Richard Biener wrote:
> > On Tue, Jul 3, 2018 at 1:38 PM Andrew Stubbs  
> > wrote:
> >>
> >> On 03/07/18 12:30, Richard Biener wrote:
>  Hmm, so they're safe, but may prevent the optimization of nearby 
>  variables?
> >>>
> >>> Yes, they prevent earlier stores into lanes that are "really" written
> >>> to to be DSEd.
> >>
> >> Right, but I have unrelated variables allocated to the stack within the
> >> "shadow" of the masked vector. I didn't ask it to do that, it just does,
> >> so I presume this is an expect feature of masked vectors with a known mask.
> >
> > Huh, I don't think so.  I guess that's the real error and I wonder how
> > that happens.
> > Are those just spills or real allocations?
>
> The code from the testcase looks like this:
>
>  struct rtattr rt[2];
>  struct rtattr *rta[14];
>  int i;
>
>  rt[0].rta_len = sizeof (struct rtattr) + 8;
>  rt[0].rta_type = 0;
>  rt[1] = rt[0];
>  for (i = 0; i < 14; i++)
>rta[i] = &rt[0];
>
> The initialization of rt[0] and rt[1] are being deleted because the
> write to rta[0..13] would overwrite rt if it had actually been the
> maximum rta[0..63].

Ok, so if we vectorize the above with 64 element masked stores
then indeed the RTL representation is _not_ safe.  That is because
while the uses in the masked stores should prevent things from
going bad there is also TBAA to consider which means those
uses might not actually _be_ uses (TBAA-wise) of the earlier
stores.  In the above case rtattr * doesn't alias int (or whatever
types rta_type or rta_len have).  That means that, to DSE, the earlier
stores appear dead.

The fix would be to, instead of (match_dup 0) use
(match_dup_but_change_MEM_ALIAS_SET_to_zero 0) ...
at least for the expanders.  That is, adjust the expanders
to not use (match_dup 0) and add insn variants that also
match alias-set zero (match_dup 0).  Or sth along that lines.

Richard.

> That, or I've been staring at dumps too long and gone crazy.
>
> Andrew
>
> P.S. I'm trying to test with (match_dup 0), but LRA exploded.


Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 13:21, Richard Biener wrote:

Ok, so if we vectorize the above with 64 element masked stores
then indeed the RTL representation is _not_ safe.  That is because
while the uses in the masked stores should prevent things from
going bad there is also TBAA to consider which means those
uses might not actually _be_ uses (TBAA-wise) of the earlier
stores.  In the above case rtattr * doesn't alias int (or whatever
types rta_type or rta_len have).  That means to DSE the earlier
stores are dead.


I managed to get it to generate maskstore without the unspec, and the 
code now runs correctly.


I don't follow your AA reasoning. You say the use stops it being bad, 
and then you say the stores are dead, which sounds bad, yet it's not 
deleting them now.


Confused. :-(

Andrew


Re: [GSOC] LTO dump tool project

2018-07-03 Thread Hrishikesh Kulkarni
Hi,

Thanks for suggestions. I have started incorporating them. As a first:

I have added command line options:
-print-size  to print the size of functions and variables
size of variables is in bits and size of functions is represented as
number of basic blocks.

-print-value  to print the value of initialization of global variables.

-size-sort  to sort the symbol names according to the size.

for example command line options:

 ../stage1-build/gcc/lto-dump test_hello.o -fdump-lto-list -print-size
-print-value -size-sort

the dump is:

Symbol Table
Name      Type      Visibility   Size   Value

printf    function  default      0
main      function  default      3
foo       function  default      3
bar       function  default      6

z         variable  default      897
k         variable  default      325
p         variable  default      32     4.40095367431640625e+0

I have also tried to make memory allocation dynamic to the best of my knowledge.
I have pushed the changes to the repo. Please find the diff file
attached herewith.

Regards,

Hrishikesh


On Fri, Jun 29, 2018 at 12:55 PM, Martin Liška  wrote:
> On 06/27/2018 10:06 PM, Hrishikesh Kulkarni wrote:
>> Hi,
>>
>> I have added new command line options:
>> -no-demangle -> for the default non demangled output
>> -no-sort -> for the list of symbols in order of their occurrence
>> -alpha-sort -> for the list of symbols in their alphabetical order
>> -reverse-sort -> for the list of symbols in reversed order
>> -defined-only -> for only the defined symbols
>
> Hi.
>
> Good progress.
>
>>
>> for example:
>>
>> ../stage1-build/gcc/lto-dump test_hello.o -fdump-lto-list -alpha-sort
>> -demangle -defined-only -reverse-sort
>
> Now as you have a separate tool (lto-dump), you can strip 'fdump-lto' prefix
> from the older options.
>
>>
>> will dump
>>
>> Symbol Table
>> Name Type Visibility Size
>> mainfunctiondefault
>> kvariabledefault
>> foofunctiondefault
>> barfunctiondefault
>>
>> It is a reversed alphabetical order of demangled symbol names which
>> have been defined(hence printf excluded).
>> Along with this I have also added all previous progress with reference
>> to symbol table to the new branch.
>>
>> For further options to add like -size-sort and -print-size I tried to
>> access size of the symbol with symtab node using
>> TREE_INT_CST_LOW(DECL_SIZE(node->decl));
>> but I am unable to do so.
>> So how should I proceed with it?
>
> Sent advises via instant messaging.
>
> Martin
>
>>
>> Please find the diff file attached herewith. I have also pushed the
>> changes to the new branch.
>>
>> Please advise,
>>
>> Hrishikesh>
>> On Wed, Jun 27, 2018 at 1:15 AM, Hrishikesh Kulkarni
>>  wrote:
>>> Hi,
>>>
>>> I have created another branch lto-dump-tool-improved as suggested.
>>> I have applied the patch for separation to lto-dump binary, made a few
>>> necessary changes in other files and it is running successfully.
>>> I will keep on adding previous work to this branch incrementally.
>>>
>>> Please find the diff file attached for dumping of TREE statistics and
>>> GIMPLE statistics.
>>>
>>> for example:
>>> (after configuring with --enable-gather-detailed-mem-stats)
>>> -fdump-lto-gimple-stats will dump
>>> GIMPLE statements
>>> Kind   Stmts  Bytes
>>> ---
>>> assignments0  0
>>> phi nodes  0  0
>>> conditionals   0  0
>>> everything else0  0
>>> ---
>>> Total  0  0
>>> ---
>>>
>>> -fdump-lto-tree-stats will dump
>>>
>>> Tree Statistics
>>>
>>> Kind   Nodes  Bytes
>>> 
>>> decls   4327 932672
>>> types   1531 257208
>>> blocks 0  0
>>> stmts  0  0
>>> refs   0  0
>>> exprs  4    128
>>> constants 82   2060
>>> identifiers 4430 177200
>>> vecs  16  28544
>>> binfos 0  0
>>> ssa names  0  0
>>> constructors   0  0
>>> random kinds7301 291952
>>> lang_decl kinds0  0
>>> lang_type kinds0  0
>>> omp clauses0  0
>>> 
>>> Total  17691    1689764
>>>
>>>
>>>
>>> Please advise,
>>>
>>> Hrishikesh
>>>
>>> On Wed, Jun 27, 2018 at 1:00

Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On Tue, Jul 3, 2018 at 2:46 PM Andrew Stubbs  wrote:
>
> On 03/07/18 13:21, Richard Biener wrote:
> > Ok, so if we vectorize the above with 64 element masked stores
> > then indeed the RTL representation is _not_ safe.  That is because
> > while the uses in the masked stores should prevent things from
> > going bad there is also TBAA to consider which means those
> > uses might not actually _be_ uses (TBAA-wise) of the earlier
> > stores.  In the above case rtattr * doesn't alias int (or whatever
> > types rta_type or rta_len have).  That means to DSE the earlier
> > stores are dead.
>
> I managed to get it to generate maskstore without the unspec, and the
> code now runs correctly.

OK, that is good.

> I don't follow your AA reasoning. You say the use stops it being bad,
> and then you say the stores are dead, which sounds bad, yet it's not
> deleting them now.

If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you should
see MEM_ALIAS_SETs differing for the earlier stores and the masked
store uses.

Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)

Richard.

>
> Confused. :-(
>
> Andrew


Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Michael Ploujnikov
On 2018-06-20 04:23 AM, Richard Biener wrote:
> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>
>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>>> Hi everyone,
>>>
>>> (I hope this is the right place to ask, if not my apologies; please
>>> point me in the right direction)
>>>
>>> I'm trying to get a better understanding of the following part in
>>> tree_swap_operands_p():
>>>
>>>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>>>  for commutative and comparison operators.  Ensuring a canonical
>>>  form allows the optimizers to find additional redundancies without
>>>  having to explicitly check for both orderings.  */
>>>   if (TREE_CODE (arg0) == SSA_NAME
>>>   && TREE_CODE (arg1) == SSA_NAME
>>>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>>> return 1;
>>>
>>> My questions in no particular order: It looks like this was added in
>>> 2004. I couldn't find any info other than what's in the corresponding
>>> commit (cc0bdf913) so I'm wondering if the canonical form/order still
>>> relevant/needed today? Does the ordering have to be done based on the
>>> name versions specifically? Or can it be based on something more
>>> intrinsic to the input source code rather than a GCC internal value, eg:
>>> would alphabetic sort order of the variable names be a reasonable
>>> replacement?
>> Canonicalization is still important and useful.
> 
> Indeed.
> 
>> However, canonicalizing on SSA_NAMEs is problematical due to the way we
>> recycle nodes and re-pack them.
> 
> In the past we made sure to not disrupt order - hopefully that didn't change
> so the re-packing shoudln't invaidate previous canonicalization:
> 
> static void
> release_free_names_and_compact_live_names (function *fun)
> {
> ...
>   /* And compact the SSA number space.  We make sure to not change the
>  relative order of SSA versions.  */
>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
> {
> 
> 
>> I think defining additional rules for canonicalization prior to using
>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
> 
> I don't see a good reason to do that, it will be harder to spot 
> canonicalization
> issues and it will take extra compile-time.
> 
>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs are
>> temporaries generated by the compiler and do not correspond to any
>> declared/defined object in the original source.  So you'll still need
>> the SSA_NAME_VERSION (or something as stable or better) canonicalization
>> to handle those cases.
> 
> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE names).
> 
> Richard.
> 
>> Jeff

After a bit more digging I found that insert_phi_nodes inserts PHIs in
the order of UIDs, which indirectly affects the order of vars in
old_ssa_names, which in turn affects the order in which make_ssa_name_fn
is called to pick SSA versions from FREE_SSANAMES so in the end
ordering by SSA_NAME_VERSION's is more or less equivalent to ordering by
UIDs. I'm trying to figure out if there's a way to avoid depending on
UIDs being ordered in a certain way. So if tree_swap_operands_p stays
the same I'm wondering if there's some other info available at the point
of insert_phi_nodes that would be a good replacement for UID. From my
very limited experience with a very small source input, and if I
understand things correctly, all of the var_infos have a var which is
DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
general case? I don't fully understand all of the things that
insert_phi_nodes iterates over.

- Michael





Re: DSE and maskstore trouble

2018-07-03 Thread Andrew Stubbs

On 03/07/18 14:52, Richard Biener wrote:

If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you should
see MEM_ALIAS_SETs differing for the earlier stores and the masked
store uses.

Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)


Ok, I see that the stores have MEMs with different alias sets, indeed. I 
can't quite work out if that means it's safe, or unsafe? Do I still need 
to zero the set?


For masked stores, clearly the current DSE implementation must be 
sub-optimal because it ignores the mask. Writing it as a 
load/modify/write means that stores are not erroneously removed, but 
also means that redundant writes (to masked vector destinations) are 
never removed. Anyway, at least I know how to make that part safe now.


Thanks

Andrew


Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao
Hi,

In order to collect complete information on all the inlining transformations 
that GCC applies to a given program,
I searched online, and found that the option -fopt-info-inline might be the 
right option to use:

https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html 


in which, it mentioned:

"As another example,
gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
outputs information about missed optimizations as well as optimized locations 
from all the inlining passes into inline.txt. 

“

Then I checked a very small testcase with GCC9 as following:

[qinzhao@localhost inline_report]$ cat inline_1.c
static int foo (int a)
{
  return a + 10;
}

static int bar (int b)
{
  return b - 20;
}

static int boo (int a, int b)
{
  return foo (a) + bar (b);
}

extern int v_a, v_b;
extern int result;

int compute ()
{
  result = boo (v_a, v_b);
  return result; 
}

[qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc -O3 
-fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
[qinzhao@localhost inline_report]$ ls -l inline.txt
-rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
[qinzhao@localhost inline_report]$ cat inline_1.s
.file   "inline_1.c"
.text
.p2align 4,,15
.globl  compute
.type   compute, @function
compute:
.LFB3:
.cfi_startproc
movlv_a(%rip), %edx
movlv_b(%rip), %eax
leal-10(%rdx,%rax), %eax
movl%eax, result(%rip)
ret
.cfi_endproc
.LFE3:
.size   compute, .-compute
.ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
.section.note.GNU-stack,"",@progbits

From the above, we can see:
1. the call chain "compute" -> "boo" -> "foo"/"bar" is completely inlined 
into "compute";
2. however, NO inlining information is dumped into "inline.txt".


So, My questions are:

1. Is -fopt-info-inline the right option to use to get the complete 
inlining transformation info from GCC?
2. Is it a bug that the current -fopt-info-inline dumps nothing for this 
test case?


Thanks a lot for your help.

Qing

Re: DSE and maskstore trouble

2018-07-03 Thread Richard Biener
On July 3, 2018 5:19:24 PM GMT+02:00, Andrew Stubbs  
wrote:
>On 03/07/18 14:52, Richard Biener wrote:
>> If you look at RTL dumps (with -fstrict-aliasing, thus -O2+) you
>should
>> see MEM_ALIAS_SETs differing for the earlier stores and the masked
>> store uses.
>> 
>> Now I'm of course assuming DSE is perfect, maybe it isn't ... ;)
>
>Ok, I see that the stores have MEMs with different alias sets, indeed.
>I 
>can't quite work out if that means it's safe, or unsafe? Do I still
>need 
>to zero the set?

I think you need to zero the set of the load in the masked store. The AVX 
patterns suffer from the same issue here. 
Of course I'm not sure if we can construct a miscompilation here but I wouldn't 
be surprised if we can. 

>For masked stores, clearly the current DSE implementation must be 
>sub-optimal because it ignores the mask. Writing it as a 
>load/modify/write means that stores are not erroneously removed, but 
>also means that redundant writes (to masked vector destinations) are 
>never removed. Anyway, at least I know how to make that part safe now.

Yeah. 

Richard. 
>
>Thanks
>
>Andrew



Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Richard Biener
On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov 
 wrote:
>On 2018-06-20 04:23 AM, Richard Biener wrote:
>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>>>
>>> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
 Hi everyone,

 (I hope this is the right place to ask, if not my apologies; please
 point me in the right direction)

 I'm trying to get a better understanding of the following part in
 tree_swap_operands_p():

   /* It is preferable to swap two SSA_NAME to ensure a canonical
>form
  for commutative and comparison operators.  Ensuring a
>canonical
  form allows the optimizers to find additional redundancies
>without
  having to explicitly check for both orderings.  */
   if (TREE_CODE (arg0) == SSA_NAME
   && TREE_CODE (arg1) == SSA_NAME
   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
 return 1;

 My questions in no particular order: It looks like this was added
>in
 2004. I couldn't find any info other than what's in the
>corresponding
 commit (cc0bdf913) so I'm wondering if the canonical form/order
>still
 relevant/needed today? Does the ordering have to be done based on
>the
 name versions specifically? Or can it be based on something more
 intrinsic to the input source code rather than a GCC internal
>value, eg:
 would alphabetic sort order of the variable names be a reasonable
 replacement?
>>> Canonicalization is still important and useful.
>> 
>> Indeed.
>> 
>>> However, canonicalizing on SSA_NAMEs is problematical due to the way
>we
>>> recycle nodes and re-pack them.
>> 
>> In the past we made sure to not disrupt order - hopefully that didn't
>change
>> so the re-packing shouldn't invalidate previous canonicalization:
>> 
>> static void
>> release_free_names_and_compact_live_names (function *fun)
>> {
>> ...
>>   /* And compact the SSA number space.  We make sure to not change
>the
>>  relative order of SSA versions.  */
>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>> {
>> 
>> 
>>> I think defining additional rules for canonicalization prior to
>using
>>> SSA_NAME_VERSION as the fallback would be looked upon favorably.
>> 
>> I don't see a good reason to do that, it will be harder to spot
>canonicalization
>> issues and it will take extra compile-time.
>> 
>>> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>are
>>> temporaries generated by the compiler and do not correspond to any
>>> declared/defined object in the original source.  So you'll still
>need
>>> the SSA_NAME_VERSION (or something as stable or better)
>canonicalization
>>> to handle those cases.
>> 
>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>names).
>> 
>> Richard.
>> 
>>> Jeff
>
>After a bit more digging I found that insert_phi_nodes inserts PHIs in
>the order of UIDs, which indirectly affects the order of vars in
>old_ssa_names, which in turn affects the order in which
>make_ssa_name_fn
>is called to pick SSA versions from FREE_SSANAMES so in the end
>ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>by
>UIDs. I'm trying to figure out if there's a way to avoid depending on
>UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>the same I'm wondering if there's some other info available at the
>point
>of insert_phi_nodes that would be a good replacement for UID. From my
>very limited experience with a very small source input, and if I
>understand things correctly, all of the var_infos have a var which is
>DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>general case? I don't fully understand what are all the things that
>insert_phi_nodes iterates over.

Why do you want to remove the dependence on UID ordering? It's pervasive 
throughout the whole compilation... 

Richard. 

>- Michael
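The payoff of the canonical ordering in the quoted tree_swap_operands_p snippet can be shown with a small C sketch. This is a hedged illustration only: `canon_add` and `same_expr` are invented stand-ins, with the smaller integer playing the role of the lower SSA_NAME_VERSION:

```c
/* Toy "canonicalize then compare" sketch.  Operands of a commutative
   operation are put in a fixed order by an arbitrary but stable key
   (here: smaller value first, standing in for SSA_NAME_VERSION).
   After canonicalization, a+b and b+a compare equal, so a CSE or
   value-numbering pass only has to check one ordering. */
typedef struct { int op0, op1; } add_expr;

static add_expr canon_add(int x, int y)
{
    add_expr e;
    if (x > y) { int t = x; x = y; y = t; }  /* swap, like version > version */
    e.op0 = x;
    e.op1 = y;
    return e;
}

static int same_expr(add_expr a, add_expr b)
{
    return a.op0 == b.op0 && a.op1 == b.op1;
}
```

Once both `a + b` and `b + a` canonicalize to the same pair, the optimizers find the redundancy without checking both orderings, which is the rationale given in the comment.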



Re: Question on -fopt-info-inline

2018-07-03 Thread Richard Biener
On July 3, 2018 6:01:19 PM GMT+02:00, Qing Zhao  wrote:
>Hi,
>
>In order to collect complete information on all the inlining
>transformation that GCC applies on a given program,
>I searched online, and found that the option -fopt-info-inline might be
>the right option to use:
>
>https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html
>
>
>in which, it mentioned:
>
>"As another example,
>gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>outputs information about missed optimizations as well as optimized
>locations from all the inlining passes into inline.txt. 
>
>“
>
>Then I checked a very small testcase with GCC9 as follows:
>
>[qinzhao@localhost inline_report]$ cat inline_1.c
>static int foo (int a)
>{
>  return a + 10;
>}
>
>static int bar (int b)
>{
>  return b - 20;
>}
>
>static int boo (int a, int b)
>{
>  return foo (a) + bar (b);
>}
>
>extern int v_a, v_b;
>extern int result;
>
>int compute ()
>{
>  result = boo (v_a, v_b);
>  return result; 
>}
>
>[qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>-O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>[qinzhao@localhost inline_report]$ ls -l inline.txt
>-rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>[qinzhao@localhost inline_report]$ cat inline_1.s
>   .file   "inline_1.c"
>   .text
>   .p2align 4,,15
>   .globl  compute
>   .type   compute, @function
>compute:
>.LFB3:
>   .cfi_startproc
>   movlv_a(%rip), %edx
>   movlv_b(%rip), %eax
>   leal-10(%rdx,%rax), %eax
>   movl%eax, result(%rip)
>   ret
>   .cfi_endproc
>.LFE3:
>   .size   compute, .-compute
>   .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>   .section.note.GNU-stack,"",@progbits
>
>From the above, we can see:
>1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>are completely inlined into “compute”;
>2. However, NO inline information is dumped into
>“inline.txt”.
>
>
>So, my questions are:
>
>1. Is the option -fopt-info-inline  the right option to use to get the
>complete inlining transformation info from GCC?
>2. is this a bug that the current -fopt-info-inline cannot dump
>anything for this testing case?

I think the early inliner doesn't use opt-info yet. 

Richard. 

>
>Thanks a lot for your help.
>
>Qing
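For what it's worth, the assembly above shows the whole chain folded into one instruction: `leal -10(%rdx,%rax), %eax` computes `v_a + v_b - 10`, i.e. `(a + 10) + (b - 20)`. A hedged C check of that arithmetic (the `folded` helper is made up for comparison):

```c
/* The testcase's call chain next to the closed form the compiler
   reduced it to: boo(a, b) = foo(a) + bar(b) = (a + 10) + (b - 20)
   = a + b - 10, matching the single leal in compute. */
static int foo(int a)  { return a + 10; }
static int bar(int b)  { return b - 20; }
static int boo(int a, int b)    { return foo(a) + bar(b); }
static int folded(int a, int b) { return a + b - 10; }
```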



Re: Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao


> On Jul 3, 2018, at 11:48 AM, Richard Biener wrote:
> 
> On July 3, 2018 6:01:19 PM GMT+02:00, Qing Zhao wrote:
>> Hi,
>> 
>> In order to collect complete information on all the inlining
>> transformation that GCC applies on a given program,
>> I searched online, and found that the option -fopt-info-inline might be
>> the right option to use:
>> 
>> https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html
>> 
>> in which, it mentioned:
>> 
>> "As another example,
>> gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>> outputs information about missed optimizations as well as optimized
>> locations from all the inlining passes into inline.txt. 
>> 
>> “
>> 
>> Then I checked a very small testcase with GCC9 as follows:
>> 
>> [qinzhao@localhost inline_report]$ cat inline_1.c
>> static int foo (int a)
>> {
>> return a + 10;
>> }
>> 
>> static int bar (int b)
>> {
>> return b - 20;
>> }
>> 
>> static int boo (int a, int b)
>> {
>> return foo (a) + bar (b);
>> }
>> 
>> extern int v_a, v_b;
>> extern int result;
>> 
>> int compute ()
>> {
>> result = boo (v_a, v_b);
>> return result; 
>> }
>> 
>> [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>> -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>> [qinzhao@localhost inline_report]$ ls -l inline.txt
>> -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>> [qinzhao@localhost inline_report]$ cat inline_1.s
>>  .file   "inline_1.c"
>>  .text
>>  .p2align 4,,15
>>  .globl  compute
>>  .type   compute, @function
>> compute:
>> .LFB3:
>>  .cfi_startproc
>>  movlv_a(%rip), %edx
>>  movlv_b(%rip), %eax
>>  leal-10(%rdx,%rax), %eax
>>  movl%eax, result(%rip)
>>  ret
>>  .cfi_endproc
>> .LFE3:
>>  .size   compute, .-compute
>>  .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>>  .section.note.GNU-stack,"",@progbits
>> 
>> From the above, we can see:
>> 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>> are completely inlined into “compute”;
>> 2. However, NO inline information is dumped into
>> “inline.txt”.
>> 
>> 
>> So, my questions are:
>> 
>> 1. Is the option -fopt-info-inline  the right option to use to get the
>> complete inlining transformation info from GCC?
>> 2. is this a bug that the current -fopt-info-inline cannot dump
>> anything for this testing case?
> 
> I think the early inliner doesn't use opt-info yet. 

So, shall we add opt-info support to the early inliner?

Qing
> 
> Richard. 
> 
>> 
>> Thanks a lot for your help.
>> 
>> Qing



Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Michael Ploujnikov
On 2018-07-03 12:46 PM, Richard Biener wrote:
> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov wrote:
>> On 2018-06-20 04:23 AM, Richard Biener wrote:
>>> On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:

 On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
> Hi everyone,
>
> (I hope this is the right place to ask, if not my apologies; please
> point me in the right direction)
>
> I'm trying to get a better understanding of the following part in
> tree_swap_operands_p():
>
>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>  for commutative and comparison operators.  Ensuring a canonical
>  form allows the optimizers to find additional redundancies without
>  having to explicitly check for both orderings.  */
>   if (TREE_CODE (arg0) == SSA_NAME
>   && TREE_CODE (arg1) == SSA_NAME
>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
> return 1;
>
> My questions in no particular order: It looks like this was added
>> in
> 2004. I couldn't find any info other than what's in the
>> corresponding
> commit (cc0bdf913) so I'm wondering if the canonical form/order
>> still
> relevant/needed today? Does the ordering have to be done based on
>> the
> name versions specifically? Or can it be based on something more
> intrinsic to the input source code rather than a GCC internal
>> value, eg:
> would alphabetic sort order of the variable names be a reasonable
> replacement?
 Canonicalization is still important and useful.
>>>
>>> Indeed.
>>>
 However, canonicalizing on SSA_NAMEs is problematical due to the way
>> we
 recycle nodes and re-pack them.
>>>
>>> In the past we made sure to not disrupt order - hopefully that didn't
>> change
>>> so the re-packing shouldn't invalidate previous canonicalization:
>>>
>>> static void
>>> release_free_names_and_compact_live_names (function *fun)
>>> {
>>> ...
>>>   /* And compact the SSA number space.  We make sure to not change
>> the
>>>  relative order of SSA versions.  */
>>>   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
>>> {
>>>
>>>
 I think defining additional rules for canonicalization prior to
>> using
 SSA_NAME_VERSION as the fallback would be looked upon favorably.
>>>
>>> I don't see a good reason to do that, it will be harder to spot
>> canonicalization
>>> issues and it will take extra compile-time.
>>>
 Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>> are
 temporaries generated by the compiler and do not correspond to any
 declared/defined object in the original source.  So you'll still
>> need
 the SSA_NAME_VERSION (or something as stable or better)
>> canonicalization
 to handle those cases.
>>>
>>> And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>> names).
>>>
>>> Richard.
>>>
 Jeff
>>
>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
>> the order of UIDs, which indirectly affects the order of vars in
>> old_ssa_names, which in turn affects the order in which
>> make_ssa_name_fn
>> is called to pick SSA versions from FREE_SSANAMES so in the end
>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>> by
>> UIDs. I'm trying to figure out if there's a way to avoid depending on
>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>> the same I'm wondering if there's some other info available at the
>> point
>> of insert_phi_nodes that would be a good replacement for UID. From my
>> very limited experience with a very small source input, and if I
>> understand things correctly, all of the var_infos have a var which is
>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>> general case? I don't fully understand what are all the things that
>> insert_phi_nodes iterates over.
> 
> Why do you want to remove the dependence on UID ordering? It's pervasive 
> throughout the whole compilation... 
> 
> Richard. 
> 
>> - Michael
> 


Well, I'm working on a reduction of the number of changes seen with
binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
since current UID assignment is essentially tied to the order of things
in the input source code one function's changes can cascade to others
(even when they're unchanged). As you said, UID dependence is quite
pervasive, and although finding and improving individual cases (such as
tree_swap_operands_p) won't make it perfect, I think it will be a step
in the positive direction.

Also, I have some ideas for a UID assignment scheme that might improve
things overall, that I'll try to share after I get back from vacation.

- Michael





Re: Question on -fopt-info-inline

2018-07-03 Thread Qing Zhao


>> 
>>> 
>>> In order to collect complete information on all the inlining
>>> transformation that GCC applies on a given program,
>>> I searched online, and found that the option -fopt-info-inline might be
>>> the right option to use:
>>> 
>>> https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html
>>> 
>>> in which, it mentioned:
>>> 
>>> "As another example,
>>> gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
>>> outputs information about missed optimizations as well as optimized
>>> locations from all the inlining passes into inline.txt. 
>>> 
>>> “
>>> 
>>> Then I checked a very small testcase with GCC9 as follows:
>>> 
>>> [qinzhao@localhost inline_report]$ cat inline_1.c
>>> static int foo (int a)
>>> {
>>> return a + 10;
>>> }
>>> 
>>> static int bar (int b)
>>> {
>>> return b - 20;
>>> }
>>> 
>>> static int boo (int a, int b)
>>> {
>>> return foo (a) + bar (b);
>>> }
>>> 
>>> extern int v_a, v_b;
>>> extern int result;
>>> 
>>> int compute ()
>>> {
>>> result = boo (v_a, v_b);
>>> return result; 
>>> }
>>> 
>>> [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
>>> -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
>>> [qinzhao@localhost inline_report]$ ls -l inline.txt
>>> -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
>>> [qinzhao@localhost inline_report]$ cat inline_1.s
>>> .file   "inline_1.c"
>>> .text
>>> .p2align 4,,15
>>> .globl  compute
>>> .type   compute, @function
>>> compute:
>>> .LFB3:
>>> .cfi_startproc
>>> movlv_a(%rip), %edx
>>> movlv_b(%rip), %eax
>>> leal-10(%rdx,%rax), %eax
>>> movl%eax, result(%rip)
>>> ret
>>> .cfi_endproc
>>> .LFE3:
>>> .size   compute, .-compute
>>> .ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
>>> .section.note.GNU-stack,"",@progbits
>>> 
>>> From the above, we can see:
>>> 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
>>> are completely inlined into “compute”;
>>> 2. However, NO inline information is dumped into
>>> “inline.txt”.
>>> 
>>> 
>>> So, my questions are:
>>> 
>>> 1. Is the option -fopt-info-inline  the right option to use to get the
>>> complete inlining transformation info from GCC?
>>> 2. is this a bug that the current -fopt-info-inline cannot dump
>>> anything for this testing case?
>> 
>> I think the early inliner doesn't use opt-info yet. 
> 
> so, shall we add the opt-info support to early inliner?

I just created the following PR to record this work:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86395 


Let me know if I missed anything.

thanks.

Qing




Re: Understanding tree_swap_operands_p wrt SSA name versions

2018-07-03 Thread Jeff Law
On 07/03/2018 11:55 AM, Michael Ploujnikov wrote:
> On 2018-07-03 12:46 PM, Richard Biener wrote:
>> On July 3, 2018 4:56:57 PM GMT+02:00, Michael Ploujnikov wrote:
>>> On 2018-06-20 04:23 AM, Richard Biener wrote:
 On Wed, Jun 20, 2018 at 7:31 AM Jeff Law  wrote:
>
> On 06/19/2018 12:30 PM, Michael Ploujnikov wrote:
>> Hi everyone,
>>
>> (I hope this is the right place to ask, if not my apologies; please
>> point me in the right direction)
>>
>> I'm trying to get a better understanding of the following part in
>> tree_swap_operands_p():
>>
>>   /* It is preferable to swap two SSA_NAME to ensure a canonical form
>>  for commutative and comparison operators.  Ensuring a canonical
>>  form allows the optimizers to find additional redundancies without
>>  having to explicitly check for both orderings.  */
>>   if (TREE_CODE (arg0) == SSA_NAME
>>   && TREE_CODE (arg1) == SSA_NAME
>>   && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
>> return 1;
>>
>> My questions in no particular order: It looks like this was added
>>> in
>> 2004. I couldn't find any info other than what's in the
>>> corresponding
>> commit (cc0bdf913) so I'm wondering if the canonical form/order
>>> still
>> relevant/needed today? Does the ordering have to be done based on
>>> the
>> name versions specifically? Or can it be based on something more
>> intrinsic to the input source code rather than a GCC internal
>>> value, eg:
>> would alphabetic sort order of the variable names be a reasonable
>> replacement?
> Canonicalization is still important and useful.

 Indeed.

> However, canonicalizing on SSA_NAMEs is problematical due to the way
>>> we
> recycle nodes and re-pack them.

 In the past we made sure to not disrupt order - hopefully that didn't
>>> change
 so the re-packing shouldn't invalidate previous canonicalization:

 static void
 release_free_names_and_compact_live_names (function *fun)
 {
 ...
   /* And compact the SSA number space.  We make sure to not change
>>> the
  relative order of SSA versions.  */
   for (i = 1, j = 1; i < fun->gimple_df->ssa_names->length (); ++i)
 {


> I think defining additional rules for canonicalization prior to
>>> using
> SSA_NAME_VERSION as the fallback would be looked upon favorably.

 I don't see a good reason to do that, it will be harder to spot
>>> canonicalization
 issues and it will take extra compile-time.

> Note however, that many of the _DECL nodes referenced by SSA_NAMEs
>>> are
> temporaries generated by the compiler and do not correspond to any
> declared/defined object in the original source.  So you'll still
>>> need
> the SSA_NAME_VERSION (or something as stable or better)
>>> canonicalization
> to handle those cases.

 And not all SSA_NAMEs have underlying _DECL nodes (or IDENTIFIER_NODE
>>> names).

 Richard.

> Jeff
>>>
>>> After a bit more digging I found that insert_phi_nodes inserts PHIs in
>>> the order of UIDs, which indirectly affects the order of vars in
>>> old_ssa_names, which in turn affects the order in which
>>> make_ssa_name_fn
>>> is called to pick SSA versions from FREE_SSANAMES so in the end
>>> ordering by SSA_NAME_VERSION's is more or less equivalent to ordering
>>> by
>>> UIDs. I'm trying to figure out if there's a way to avoid depending on
>>> UIDs being ordered in a certain way. So if tree_swap_operands_p stays
>>> the same I'm wondering if there's some other info available at the
>>> point
>>> of insert_phi_nodes that would be a good replacement for UID. From my
>>> very limited experience with a very small source input, and if I
>>> understand things correctly, all of the var_infos have a var which is
>>> DECL_P and thus should have an IDENTIFIER_NODE. Is that true in the
>>> general case? I don't fully understand what are all the things that
>>> insert_phi_nodes iterates over.
>>
>> Why do you want to remove the dependence on UID ordering? It's pervasive 
>> throughout the whole compilation... 
>>
>> Richard. 
>>
>>> - Michael
>>
> 
> 
> Well, I'm working on a reduction of the number of changes seen with
> binary diffing (a la https://wiki.debian.org/ReproducibleBuilds) and
> since current UID assignment is essentially tied to the order of things
> in the input source code one function's changes can cascade to others
> (even when they're unchanged). As you said, UID dependence is quite
> pervasive, and although finding and improving individual cases (such as
> tree_swap_operands_p) won't make it perfect, I think it will be a step
> in the positive direction.
> 
> Also, I have some ideas for a UID assignment scheme that might improve
> things overall, that I'll try to share after I get back from vacation.
I'm still not sure what the point is.

Re: Question on -fopt-info-inline

2018-07-03 Thread Jeff Law
On 07/03/2018 12:28 PM, Qing Zhao wrote:
> 
>>>

 In order to collect complete information on all the inlining
 transformation that GCC applies on a given program,
 I searched online, and found that the option -fopt-info-inline might be
 the right option to use:

 https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html

 in which, it mentioned:

 "As another example,
 gcc -O3 -fopt-info-inline-optimized-missed=inline.txt
 outputs information about missed optimizations as well as optimized
 locations from all the inlining passes into inline.txt. 

 “

 Then I checked a very small testcase with GCC9 as follows:

 [qinzhao@localhost inline_report]$ cat inline_1.c
 static int foo (int a)
 {
 return a + 10;
 }

 static int bar (int b)
 {
 return b - 20;
 }

 static int boo (int a, int b)
 {
 return foo (a) + bar (b);
 }

 extern int v_a, v_b;
 extern int result;

 int compute ()
 {
 result = boo (v_a, v_b);
 return result; 
 }

 [qinzhao@localhost inline_report]$ /home/qinzhao/Install/latest/bin/gcc
 -O3 -fopt-info-inline-optimized-missed=inline.txt inline_1.c -S
 [qinzhao@localhost inline_report]$ ls -l inline.txt
 -rw-rw-r--. 1 qinzhao qinzhao 0 Jul  3 11:25 inline.txt
 [qinzhao@localhost inline_report]$ cat inline_1.s
.file   "inline_1.c"
.text
.p2align 4,,15
.globl  compute
.type   compute, @function
 compute:
 .LFB3:
.cfi_startproc
movlv_a(%rip), %edx
movlv_b(%rip), %eax
leal-10(%rdx,%rax), %eax
movl%eax, result(%rip)
ret
.cfi_endproc
 .LFE3:
.size   compute, .-compute
.ident  "GCC: (GNU) 9.0.0 20180702 (experimental)"
.section.note.GNU-stack,"",@progbits

 From the above, we can see:
 1. the call chains to —>“boo”->”foo”, “bar” in the routine “compute”
 are completely inlined into “compute”;
 2. However, NO inline information is dumped into
 “inline.txt”.


 So, my questions are:

 1. Is the option -fopt-info-inline  the right option to use to get the
 complete inlining transformation info from GCC?
 2. is this a bug that the current -fopt-info-inline cannot dump
 anything for this testing case?
>>>
>>> I think the early inliner doesn't use opt-info yet. 
>>
>> so, shall we add the opt-info support to early inliner?
> 
> I just created the following PR to record this work:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86395 
> 
> 
> let me know if I missed anything.
I'm hoping that the work David is doing WRT optimization information
will be usable for the inliner as well.  In fact, inlining and
vectorization are the two use cases we identified internally as the
first targets.


jeff


Re: RISC-V and Ada: undefined references to `__gnat_raise_nodefer_with_msg'

2018-07-03 Thread Sebastian Huber

On 03/07/18 09:10, Eric Botcazou wrote:

It seems the a-except.adb was replaced by  a-except-2005.adb in this commit:

Right, it's by design, the old support for SJLJ exceptions has been ditched
for full runtimes.  You probably just need to swap the values of

Frontend_Exceptions   : constant Boolean := True;
ZCX_By_Default: constant Boolean := False;

in system-rtems.ads.


Thanks, this worked.

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Sched1 stability issue

2018-07-03 Thread Kugan Vivekanandarajah
Hi,

We noticed a difference in the code generated by aarch64 GCC 7.2
hosted on Linux vs. mingw. AFAIK, we are supposed to produce the same
output.

For the testcase we have (quite large, and I am trying to reduce it),
the difference comes from the sched1 pass. If I disable sched1, the
difference goes away.

Is this a known issue? Attached is the sched1 dump snippet where there
is the difference.

Thanks,
Kugan
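One classic source of host-dependent pass output is an ordering decision whose tie-break is left to the environment, e.g. sorting with a comparator that returns 0 for distinct items, so the C library's (unstable) qsort decides. Whether that is what bites sched1 here is only a guess; the pattern and its usual fix look like this (`insn`, `by_priority`, `by_priority_then_uid` are made-up names, not GCC code):

```c
#include <stdlib.h>

/* Two "insns" competing for a slot in a ready list. */
typedef struct { int uid; int priority; } insn;

/* Comparator that returns 0 on a priority tie: the final order of tied
   elements then depends on the qsort implementation, which can differ
   between C libraries (e.g. glibc vs. mingw's runtime). */
static int by_priority(const void *a, const void *b)
{
    return ((const insn *)b)->priority - ((const insn *)a)->priority;
}

/* Deterministic variant: break priority ties on a stable key (the uid),
   so every host produces the same order and the same dump. */
static int by_priority_then_uid(const void *a, const void *b)
{
    const insn *x = a, *y = b;
    if (x->priority != y->priority)
        return y->priority - x->priority;
    return x->uid - y->uid;
}
```

Only the uid-tie-broken comparator is guaranteed to yield identical orderings on both hosts; the first one is correct but leaves tied elements in unspecified order.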


 verify found no changes in insn with uid = 41.
 starting the processing of deferred insns
 ending the processing of deferred insns
 df_analyze called

 Pass 0 for finding pseudo/allocno costs


   r84 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:2
FP_REGS:2 ALL_REGS:2 MEM:8000
   r83 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:2
FP_REGS:2 ALL_REGS:2 MEM:8000
   r80 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:1
FP_REGS:1 ALL_REGS:1 MEM:8000
   r79 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:4000
FP_REGS:4000 ALL_REGS:1 MEM:8000
   r78 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:4000
FP_REGS:4000 ALL_REGS:1 MEM:8000
   r77 costs: CALLER_SAVE_REGS:0 GENERAL_REGS:0 FP_LO_REGS:9000
FP_REGS:9000 ALL_REGS:1 MEM:8000


 Pass 1 for finding pseudo/allocno costs

 r86: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r85: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r84: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r83: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r82: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r81: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r80: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r79: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r77: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r76: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r75: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r74: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r73: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r72: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r71: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r70: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r69: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r68: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
 r67: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS

   r84 costs: GENERAL_REGS:0 FP_LO_REGS:2 FP_REGS:2
ALL_REGS:2 MEM:8000
   r83 costs: GENERAL_REGS:0 FP_LO_REGS:2 FP_REGS:2
ALL_REGS:2 MEM:8000
   r80 costs: GENERAL_REGS:0 FP_LO_REGS:1 FP_REGS:1
ALL_REGS:1 MEM:8000
   r79 costs: GENERAL_REGS:0 FP_LO_REGS:1 FP_REGS:1
ALL_REGS:1 MEM:8000
   r78 costs: GENERAL_REGS:0 FP_LO_REGS:1 FP_REGS:1
ALL_REGS:1 MEM:8000
   r77 costs: GENERAL_REGS:0 FP_LO_REGS:1 FP_REGS:1
ALL_REGS:1 MEM:8000

 ;;   ==
 ;;   -- basic block 2 from 3 to 48 -- before reload
 ;;   ==

 ;;  0--> b  0: i  24 r77=ap-0x40
:cortex_a53_slot_any:GENERAL_REGS+1(1)FP_REGS+0(0)
 ;;  0--> b  0: i  26 r78=0xffc8
:cortex_a53_slot_any:GENERAL_REGS+1(1)FP_REGS+0(0)
 ;;  1--> b  0: i  25 [sfp-0x10]=r77
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:@GENERAL_REGS+0(-1)@FP_REGS+0(0)
--
-;;  1--> b  0: i   9 [ap-0x8]=x7
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:@GENERAL_REGS+0(-1)@FP_REGS+0(0)
--
-;;  2--> b  0: i  22 [sfp-0x20]=ap
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:GENERAL_REGS+0(0)FP_REGS+0(0)
+;;  1--> b  0: i  22 [sfp-0x20]=ap
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:@GENERAL_REGS+0(0)@FP_REGS+0(0)
 ;;  2--> b  0: i  23 [sfp-0x18]=ap
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:GENERAL_REGS+0(0)FP_REGS+0(0)
-;;  3--> b  0: i  27 [sfp-0x8]=r78
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:GENERAL_REGS+0(-1)FP_REGS+0(0)
+;;  2--> b  0: i  27 [sfp-0x8]=r78
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:GENERAL_REGS+0(-1)FP_REGS+0(0)
 ;;  3--> b  0: i  28 r79=0xff80
:cortex_a53_slot_any:GENERAL_REGS+1(1)FP_REGS+0(0)
-;;  4--> b  0: i  10 [ap-0xc0]=v0
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:@GENERAL_REGS+0(0)@FP_REGS+0(-1)
+;;  3--> b  0: i  10 [ap-0xc0]=v0
:(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store:@GENERAL_REGS+0(0)@FP_REGS+0(-1)
 ;;  4--> b  0: i  29 [sfp-0x4]=r79
:(cortex_a53_slot_