Re: RS6000 emitting sign extension for unsigned type

2019-01-15 Thread kamlesh kumar
Hi all,

I analysed this further and found that the
function 'rs6000_promote_function_mode' (rs6000.c) needs modification.
"""
static machine_mode
rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
  machine_mode mode,
  int *punsignedp ATTRIBUTE_UNUSED,
  const_tree, int)
{
  PROMOTE_MODE (mode, *punsignedp, type);
  return mode;
}
"""
Here, the function promotes the mode, but it never touches 'punsignedp',
which is always initialized to zero by default.
So 'punsignedp' remains zero in all cases, even for an unsigned type,
which causes sign extension to happen even for unsigned types.
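
To make the difference concrete, here is a minimal sketch (not GCC code) of the
two ways a 32-bit word can be widened to a 64-bit register. The helper names
sign_extend32 and zero_extend32 are made up for illustration; they mimic what
the PowerPC loads lwa (sign-extending) and lwz (zero-extending) do. It assumes
the usual two's-complement conversion from uint32_t to int32_t.

```cpp
#include <cstdint>

// Hypothetical helper: widen like `lwa`, copying bit 31 into the upper 32 bits.
constexpr std::uint64_t sign_extend32(std::uint32_t w) {
    return static_cast<std::uint64_t>(
        static_cast<std::int64_t>(static_cast<std::int32_t>(w)));
}

// Hypothetical helper: widen like `lwz`, clearing the upper 32 bits.
constexpr std::uint64_t zero_extend32(std::uint32_t w) {
    return static_cast<std::uint64_t>(w);
}

// `unsigned int x = -1;` from the test case has this bit pattern.
constexpr std::uint32_t x = 0xFFFFFFFFu;

// Sign extension hands the callee an all-ones 64-bit register ...
static_assert(sign_extend32(x) == 0xFFFFFFFFFFFFFFFFull, "sign-extended value");
// ... while zero extension keeps the value 4294967295.
static_assert(zero_extend32(x) == 0x00000000FFFFFFFFull, "zero-extended value");
```

For values with bit 31 clear, both helpers give the same result, so the
difference only shows up for unsigned values of 2^31 and above.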

Is there any way to set 'punsignedp' appropriately here?

Thanks


On Tue, Jan 15, 2019 at 12:11 AM kamlesh kumar 
wrote:

> Hi devs,
> consider the testcase below:
> $cat test.c
> void foo(){
> unsigned int x=-1;
> double d=x;
> }
> $./cc1 test.c -msoft-float -m64
> $cat test.s
>
> .foo:
> .LFB0:
> mflr 0
> std 0,16(1)
> stdu 1,-128(1)
> .LCFI0:
> li 9,-1
> stw 9,112(1)
> lwa 9,112(1)
> mr 3,9
> bl .__floatunsidf
> nop
> mr 9,3
> std 9,120(1)
> nop
> addi 1,1,128
> .LCFI1:
> ld 0,16(1)
> mtlr 0
> blr
> .long 0
> .byte 0,0,0,1,128,0,0,1
>
> Here you can see a sign extension before the call to the __floatunsidf
> routine. As I understand it, a zero extension should be emitted here,
> because __floatunsidf takes its argument as an unsigned type.
>
> I would like to know the reason for doing a sign extension here rather
> than a zero extension, or whether this is a bug.
> Is there any workaround or hook?
> Could you also point me in the right direction in the source, where the
> modification would be needed?
>
> Thanks
> ~Kamlesh
>
>
> Thanks !
> Kamlesh
>


Re: __has_include__ is problematic

2019-01-15 Thread Nathan Sidwell




Why not give the weirdo __has_include__ an unspellable name?
('builtinhasinclude') and take care constructing the
__has_include macro expansion to have a token with exactly that
spelling?


Wouldn't that break -dM rather horribly?

pah!




However, the following thinks __DATE__ is a defined macro, so there must 
be some other subtlety with __has_include?


nathans@zathras:6>gcc -xc - <:2:2: error: #error DATE IS A MACRO

(typing that makes me realize why users think it is __has_include__, 
that's a really unfortunate name to use for an implementation detail)


nathan
--
Nathan Sidwell


GCC's ICF vs. gold's ICF

2019-01-15 Thread Frank Tetzel
Hi,

why is the ICF pass in gcc not folding member functions which depend on
a template parameter but happen to generate identical code?
Is it because it is not identical on the IR level in the compiler?
Can I somehow dump the IR in text form?

The ICF pass in the gold linker can do it on binary level which is kind
of mentioned in manpage of gcc. I'm just interested in why the compiler
cannot do it on its own.

There is a test program in a blog post I wrote [1].

Best regards,
Frank

[1] 
https://tetzank.github.io/posts/identical-code-folding/#consolidating-independent-member-functions


Re: GCC's ICF vs. gold's ICF

2019-01-15 Thread Richard Biener
On Tue, Jan 15, 2019 at 2:18 PM Frank Tetzel
 wrote:
>
> Hi,
>
> why is the ICF pass in gcc not folding member functions which depend on
> a template parameter but happen to generate identical code?
> Is it because it is not identical on the IR level in the compiler?
> Can I somehow dump the IR in text form?

You can look at the ICF dump generated when you pass -fdump-ipa-icf-details

And yes, ICF has to consider IL differences that may result in different
allowed followup optimizations while when the IL is final (such as link-time)
no such considerations have to be made.  A very simple example would
be signed vs. unsigned integer multiplication where from the former IL
overflow would be undefined and optimizations can exploit that while not
for the latter.
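
A minimal sketch of that signed vs. unsigned point, assuming 32-bit int and
unsigned (the function names are made up for illustration). Both functions
multiply and divide by two, but only the signed one may be simplified to
"return x", because signed overflow is undefined while unsigned arithmetic
wraps:

```cpp
#include <cstdint>

// Signed: overflow in x * 2 is undefined behaviour, so the optimizer is
// allowed to assume it never happens and fold the body to `return x`.
std::int32_t twice_then_half_signed(std::int32_t x) {
    return (x * 2) / 2;
}

// Unsigned: x * 2 wraps modulo 2^32, so folding to `return x` would be
// wrong for large inputs and the multiply/divide has to stay.
std::uint32_t twice_then_half_unsigned(std::uint32_t x) {
    return (x * 2u) / 2u;
}
```

For example, twice_then_half_unsigned(0x80000001u) yields 1 rather than the
input, so the two bodies must not be treated as interchangeable while
optimizations that exploit overflow are still to run.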

> The ICF pass in the gold linker can do it on binary level which is kind
> of mentioned in manpage of gcc. I'm just interested in why the compiler
> cannot do it on its own.
>
> There is a test program in a blog post I wrote [1].
>
> Best regards,
> Frank
>
> [1] 
> https://tetzank.github.io/posts/identical-code-folding/#consolidating-independent-member-functions


Re: GCC's ICF vs. gold's ICF

2019-01-15 Thread Frank Tetzel
> > why is the ICF pass in gcc not folding member functions which
> > depend on a template parameter but happen to generate identical
> > code? Is it because it is not identical on the IR level in the
> > compiler? Can I somehow dump the IR in text form?  
> 
> You can look at the ICF dump generated when you pass
> -fdump-ipa-icf-details
> 
> And yes, ICF has to consider IL differences that may result in
> different allowed followup optimizations while when the IL is final
> (such as link-time) no such considerations have to be made.  A very
> simple example would be signed vs. unsigned integer multiplication
> where from the former IL overflow would be undefined and
> optimizations can exploit that while not for the latter.

Thanks for the information. If I read the dump correctly, it also
considers the return type and that seems to be the problem in my tiny
test program.

snippet from dump:

  group: with 1 classes:
class with id: 1, hash: 2170673536, items: 2
  MyArray::operator[](unsigned int)/4 MyArray::operator[](unsigned int)/3 
  false returned: 'alias sets are different' (compatible_types_p:244)
  false returned: 'result types are different' (equals_wpa:676)

The bodies of the functions look identical, but the return types differ.
So in C++, ICF is "disabled" for templated functions with a template
parameter as return type.
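
A reduced sketch of the kind of code in question. This is not the exact test
program from the blog post; the element types int and float are assumptions
chosen because they usually have the same size but different alias sets:

```cpp
#include <cstddef>

// Simplified stand-in for the MyArray class in the dump above.
template <typename T, std::size_t N>
struct MyArray {
    T data[N];

    // The address computation is the same for every T of equal size,
    // but the declared result type (T&) differs per instantiation.
    T& operator[](unsigned int i) { return data[i]; }
};

// Force both instantiations to be emitted so ICF gets to compare them.
template struct MyArray<int, 1024>;
template struct MyArray<float, 1024>;
```

Under these assumptions the two operator[] instantiations should compile to
the same instructions on typical targets, yet they differ in result type,
which is what the dump reports.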

But why is the return type preventing code folding? Because we do not
know the calling convention at this point in time?


Re: GCC's ICF vs. gold's ICF

2019-01-15 Thread Richard Biener
On Tue, Jan 15, 2019 at 4:43 PM Frank Tetzel
 wrote:
>
> > > why is the ICF pass in gcc not folding member functions which
> > > depend on a template parameter but happen to generate identical
> > > code? Is it because it is not identical on the IR level in the
> > > compiler? Can I somehow dump the IR in text form?
> >
> > You can look at the ICF dump generated when you pass
> > -fdump-ipa-icf-details
> >
> > And yes, ICF has to consider IL differences that may result in
> > different allowed followup optimizations while when the IL is final
> > (such as link-time) no such considerations have to be made.  A very
> > simple example would be signed vs. unsigned integer multiplication
> > where from the former IL overflow would be undefined and
> > optimizations can exploit that while not for the latter.
>
> Thanks for the information. If I read the dump correctly, it also
> considers the return type and that seems to be the problem in my tiny
> test program.
>
> snippet from dump:
>
>   group: with 1 classes:
> class with id: 1, hash: 2170673536, items: 2
>   MyArray::operator[](unsigned int)/4 MyArray 1024>::operator[](unsigned int)/3
>   false returned: 'alias sets are different' (compatible_types_p:244)
>   false returned: 'result types are different' (equals_wpa:676)
>
> The body of the functions look identical, but the return type differs.
> So in C++, ICF is "disabled" for templated functions with a template
> parameter as return type.
>
> But why is the return type preventing code folding? Because we do not
> know the calling convention at this point in time?

Kind-of.  Probably because ICF is not prepared to "fixup" callers
and insert no-op type conversions to make the IL valid.  Martin should
know.

The analysis could certainly be enhanced to avoid comparing some bits
that will not be relevant in the end.

Richard.


Re: Replacing DejaGNU

2019-01-15 Thread Rainer Orth
Hi Iain,

>> On 14 Jan 2019, at 13:53, Rainer Orth  wrote:
>> 
>> "MCC CS"  writes:
>> 
>>> I've been running the testsuite on my macOS, on which
>>> it is especially unbearable. I want to (at least try to)
>> 
>> that problem may well be macOS specific: since at least macOS 10.13
>> (maybe even 10.12; cannot currently tell for certain) make -jN check
>> times on my Mac mini skyrocketed with between 60 and 80% system time.
>> It seems this is due to lock contention on one specific kernel lock, but
>> I haven't been able to find out more yet.
>
> this PR mentions the compilation, but it’s even more apparent on test.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84257
>
> * Assuming SIP is disabled.
>
> Some testing suggests that each DYLD_LIBRARY_PATH entry adds around 2ms to
> each exe launch.
> So .. when you’re doing something that’s a lot of work per launch, not much
> is seen - but when you’re doing things with a huge number of exe launches -
> e.g. configuring or running the test suite, it bites.
>
> A work-around is to remove the RPATH_ENVAR variable setting in the top
> level Makefile.in (which actually has the same effect as running things
> with SIP enabled)

this change alone helped tremendously: a bootstrap on macOS 10.14 on
20181103 took

   180041.05 real 96489.89 user 180864.44 sys

while the current one was only

44886.30 real 74101.86 user 36879.75 sys
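
As a quick check of what these two runs mean in relative terms, a small sketch
using only the numbers above (the speedup helper is made up for illustration):

```cpp
// Ratio of two timings; a value above 1 means the second run was faster.
constexpr double speedup(double before, double after) {
    return before / after;
}

// Figures taken from the two bootstrap runs quoted above (seconds).
constexpr double real_speedup = speedup(180041.05, 44886.30);
constexpr double user_speedup = speedup(96489.89, 74101.86);
constexpr double sys_speedup  = speedup(180864.44, 36879.75);
```

That works out to roughly 4x less wall-clock time and close to 5x less system
time, with user time down by about a third.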

However, not unexpectedly quite a number of new failures occur,
e.g. many (all?) plugin tests FAIL with

cc1: error: cannot load plugin ./selfassign.so
   dlopen(./selfassign.so, 10): Symbol not found: __ZdlPvm
  Referenced from: ./selfassign.so
  Expected in: /usr/lib/libstdc++.6.dylib
 in ./selfassign.so
compiler exited with status 1

I'll still have to check which are affected this way.

> === DejaGNU on macOS...
>
> DejaGNU / expect are not fantastic on macOS, even given the comments above
> - it’s true.  Writing an interpreter/funnel for the testsuite has crossed
> my mind more than once.
>
> However, I suspect it’s a large job, and it might be more worth investing
> any available effort in debugging the slowness in expect/dejaGNU -
> especially the lock contention that Rainer mentions.

Indeed: I found it when trying to investigate the high system time with
lockstat.  However, I don't know how to relate the lock address
mentioned there to some lock in the darwin sources.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Replacing DejaGNU

2019-01-15 Thread Iain Sandoe
Hey Rainer,

> On 15 Jan 2019, at 17:27, Rainer Orth  wrote:

>>> On 14 Jan 2019, at 13:53, Rainer Orth  wrote:
>>> 
>>> "MCC CS"  writes:
>>> 
>>>> I've been running the testsuite on my macOS, on which
>>>> it is especially unbearable. I want to (at least try to)
>>> 
>>> that problem may well be macOS specific: since at least macOS 10.13
>>> (maybe even 10.12; cannot currently tell for certain) make -jN check
>>> times on my Mac mini skyrocketed with between 60 and 80% system time.
>>> It seems this is due to lock contention on one specific kernel lock, but
>>> I haven't been able to find out more yet.
>> 
>> this PR mentions the compilation, but it’s even more apparent on test.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84257
>> 
>> * Assuming SIP is disabled.
>> 
>> Some testing suggests that each DYLD_LIBRARY_PATH entry adds around 2ms to
>> each exe launch.
>> So .. when you’re doing something that’s a lot of work per launch, not much
>> is seen - but when you’re doing things with a huge number of exe launches -
>> e.g. configuring or running the test suite, it bites.
>> 
>> A work-around is to remove the RPATH_ENVAR variable setting in the top
>> level Makefile.in (which actually has the same effect as running things
>> with SIP enabled)
> 
> this change alone helped tremendously: a bootstrap on macOS 10.14 on
> 20181103 took

>   180041.05 real 96489.89 user 180864.44 sys
> 
> while the current one was only
> 
>    44886.30 real 74101.86 user 36879.75 sys
> 
> However, not unexpectedly quite a number of new failures occur,
> e.g. many (all?) plugin tests FAIL with
> 
> cc1: error: cannot load plugin ./selfassign.so
>   dlopen(./selfassign.so, 10): Symbol not found: __ZdlPvm
>  Referenced from: ./selfassign.so
>  Expected in: /usr/lib/libstdc++.6.dylib
> in ./selfassign.so
> compiler exited with status 1
> 
> I'll still have to check which are affected this way.

I’m afraid that with this (or with SIP enabled) “uninstalled testing” can’t
work: the libraries have to be found at their intended installed path,
so you have to “make install && make check”.

** And remember to delete the install before building the next revision...

>> === DejaGNU on macOS...
>> 
>> DejaGNU / expect are not fantastic on macOS, even given the comments above
>> - it’s true.  Writing an interpreter/funnel for the testsuite has crossed
>> my mind more than once.
>> 
>> However, I suspect it’s a large job, and it might be more worth investing
>> any available effort in debugging the slowness in expect/dejaGNU -
>> especially the lock contention that Rainer mentions.
> 
> Indeed: I found it when trying to investigate the high system time with
> lockstat.  However, I don't know a way how to relate the lock address
> mentioned there to some lock in the darwin sources.

Well.. let’s take this offline, or park it in a BZ somewhere. If you can be
more specific, I would be happy to poke at it a bit: if it’s a genuine OS bug,
we can file a radar, but that doesn’t help the system versions that are out
of support (and there’s enough useful h/w out there that’s tied to 10.11 etc).

Iain



Re: Parallelize the compilation using Threads

2019-01-15 Thread Giuliano Belinassi
Hi

I've managed to compile gimple-match.c with -ftime-report, and "phase opt and
generate" seems to be what takes most of the compilation time. This is captured
by the "TV_PHASE_OPT_GEN" timevar, and all its occurrences seem to be in
toplev.c and lto.c. Any idea which part of what this timevar captures is
the most costly? Also, is the percentage in the "GGC" column the amount of time
spent inside the garbage collector?

Time variable                     usr           sys          wall             GGC
 phase setup                 :   0.01 (  0%)   0.01 (  0%)   0.02 (  0%)     1473 kB (  0%)
 phase parsing               :   3.74 (  4%)   1.43 ( 30%)   5.17 (  5%)   294287 kB ( 16%)
 phase lang. deferred        :   0.08 (  0%)   0.03 (  1%)   0.11 (  0%)     7582 kB (  0%)
 phase opt and generate      :  94.10 ( 95%)   3.26 ( 67%)  97.46 ( 93%)  1543477 kB ( 82%)
 phase last asm              :   0.89 (  1%)   0.09 (  2%)   0.98 (  1%)    39802 kB (  2%)
 phase finalize              :   0.00 (  0%)   0.01 (  0%)   0.50 (  0%)        0 kB (  0%)
 |name lookup                :   0.42 (  0%)   0.12 (  2%)   0.46 (  0%)     6162 kB (  0%)
 |overload resolution        :   0.37 (  0%)   0.13 (  3%)   0.42 (  0%)    18172 kB (  1%)
 garbage collection          :   2.99 (  3%)   0.03 (  1%)   3.02 (  3%)        0 kB (  0%)
 dump files                  :   0.11 (  0%)   0.01 (  0%)   0.16 (  0%)        0 kB (  0%)
 callgraph construction      :   0.35 (  0%)   0.01 (  0%)   0.24 (  0%)    61143 kB (  3%)
 callgraph optimization      :   0.21 (  0%)   0.01 (  0%)   0.17 (  0%)      175 kB (  0%)
 ipa function summary        :   0.12 (  0%)   0.00 (  0%)   0.14 (  0%)     2216 kB (  0%)
 ipa dead code removal       :   0.04 (  0%)   0.01 (  0%)   0.00 (  0%)        0 kB (  0%)
 ipa devirtualization        :   0.00 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 ipa cp                      :   0.33 (  0%)   0.01 (  0%)   0.39 (  0%)     9073 kB (  0%)
 ipa inlining heuristics     :   0.48 (  0%)   0.00 (  0%)   0.48 (  0%)     6175 kB (  0%)
 ipa function splitting      :   0.10 (  0%)   0.01 (  0%)   0.07 (  0%)     9111 kB (  0%)
 ipa comdats                 :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
 ipa various optimizations   :   0.03 (  0%)   0.03 (  1%)   0.01 (  0%)      480 kB (  0%)
 ipa reference               :   0.01 (  0%)   0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
 ipa profile                 :   0.01 (  0%)   0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
 ipa pure const              :   0.13 (  0%)   0.00 (  0%)   0.12 (  0%)        8 kB (  0%)
 ipa icf                     :   0.08 (  0%)   0.00 (  0%)   0.08 (  0%)        6 kB (  0%)
 ipa SRA                     :   1.26 (  1%)   0.28 (  6%)   1.78 (  2%)   165814 kB (  9%)
 ipa free lang data          :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
 ipa free inline summary     :   0.00 (  0%)   0.00 (  0%)   0.03 (  0%)        0 kB (  0%)
 cfg construction            :   0.09 (  0%)   0.00 (  0%)   0.09 (  0%)     7926 kB (  0%)
 cfg cleanup                 :   1.84 (  2%)   0.00 (  0%)   1.73 (  2%)    13673 kB (  1%)
 CFG verifier                :   6.05 (  6%)   0.12 (  2%)   6.80 (  7%)        0 kB (  0%)
 trivially dead code         :   0.32 (  0%)   0.01 (  0%)   0.38 (  0%)        0 kB (  0%)
 df scan insns               :   0.23 (  0%)   0.00 (  0%)   0.30 (  0%)       28 kB (  0%)
 df multiple defs            :   0.13 (  0%)   0.00 (  0%)   0.20 (  0%)        0 kB (  0%)
 df reaching defs            :   0.52 (  1%)   0.00 (  0%)   0.55 (  1%)        0 kB (  0%)
 df live regs                :   2.70 (  3%)   0.02 (  0%)   3.08 (  3%)      425 kB (  0%)
 df live&initialized regs    :   1.28 (  1%)   0.00 (  0%)   1.13 (  1%)        0 kB (  0%)
 df must-initialized regs    :   0.14 (  0%)   0.00 (  0%)   0.16 (  0%)        0 kB (  0%)
 df use-def / def-use chains :   0.32 (  0%)   0.00 (  0%)   0.26 (  0%)        0 kB (  0%)
 df reg dead/unused notes    :   0.96 (  1%)   0.01 (  0%)   0.89 (  1%)    11726 kB (  1%)
 register information        :   0.29 (  0%)   0.00 (  0%)   0.21 (  0%)        0 kB (  0%)
 alias analysis              :   0.54 (  1%)   0.00 (  0%)   0.53 (  1%)    17487 kB (  1%)
 alias stmt walking          :   1.10 (  1%)   0.08 (  2%)   1.22 (  1%)      118 kB (  0%)
 register scan               :   0.08 (  0%)   0.01 (  0%)   0.08 (  0%)      118 kB (  0%)
 rebuild jump labels         :   0.12 (  0%)   0.01 (  0%)   0.11 (  0%)        0 kB (  0%)
 preprocessing               :   0.29 (  0%)   0.43 (  9%)   0.65 (  1%)    37409 kB (  2%)
 parser (global)