Re: RS6000 emitting sign extension for unsigned type
Hi all,

I analysed this further and found that the function
'rs6000_promote_function_mode' (rs6000.c) needs modification.

"""
static machine_mode
rs6000_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
                              machine_mode mode,
                              int *punsignedp ATTRIBUTE_UNUSED,
                              const_tree, int)
{
  PROMOTE_MODE (mode, *punsignedp, type);

  return mode;
}
"""

This function promotes the mode, but it never updates 'punsignedp', which
is initialized to zero by default. So 'punsignedp' remains zero in all
cases, even for an unsigned type, and that causes sign extension to happen
even for unsigned types.

Is there any way to set 'punsignedp' appropriately here?

Thanks

On Tue, Jan 15, 2019 at 12:11 AM kamlesh kumar wrote:

> Hi devs,
> consider below testcase:
> $cat test.c
> void foo(){
> unsigned int x=-1;
> double d=x;
> }
> $./cc1 test.c -msoft-float -m64
> $cat test.s
>
> .foo:
> .LFB0:
>         mflr 0
>         std 0,16(1)
>         stdu 1,-128(1)
> .LCFI0:
>         li 9,-1
>         stw 9,112(1)
>         lwa 9,112(1)
>         mr 3,9
>         bl .__floatunsidf
>         nop
>         mr 9,3
>         std 9,120(1)
>         nop
>         addi 1,1,128
> .LCFI1:
>         ld 0,16(1)
>         mtlr 0
>         blr
>         .long 0
>         .byte 0,0,0,1,128,0,0,1
>
> Here, you can see sign extension before calling the __floatunsidf routine.
> As per my understanding it should emit zero extension here because
> __floatunsidf has it argument as unsigned type.
>
> Like to know , Reason behind doing sign extension here , rather than
> zero extension.
> or if this is a bug?
> is there Any work around or hook?
> Even you can point me to the right direction in the source? where we need
> to do modification?
>
> Thanks
> ~Kamlesh
>
>
> Thanks !
> Kamlesh
>
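To make the difference concrete, here is a minimal C++ sketch of the two ways a 32-bit value can be widened to 64 bits, using the value from the testcase above (unsigned int x = -1). The helper names zero_extend32, sign_extend32 and check_extension_example are made up for this illustration and do not exist in GCC; the sketch only models what the register would hold after a sign-extending load such as lwa versus a zero-extending load such as lwz, and says nothing about how rs6000.c should be changed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: zero extension. The upper 32 bits become 0,
// which is what a zero-extending load (lwz) leaves in a 64-bit register.
constexpr std::uint64_t zero_extend32(std::uint32_t x) {
    return static_cast<std::uint64_t>(x);
}

// Hypothetical helper: sign extension. Bit 31 is copied into the upper
// 32 bits, which is what a sign-extending load (lwa) produces.
// Flipping bit 31 and subtracting 2^31 avoids relying on
// implementation-defined unsigned-to-signed conversions.
constexpr std::int64_t sign_extend32(std::uint32_t x) {
    return static_cast<std::int64_t>(x ^ UINT32_C(0x80000000))
           - INT64_C(0x80000000);
}

// Runs the value from the testcase through both helpers.
inline void check_extension_example() {
    const std::uint32_t x = UINT32_C(0xFFFFFFFF);   // unsigned int x = -1
    assert(zero_extend32(x) == UINT64_C(4294967295)); // upper bits clear
    assert(sign_extend32(x) == -1);                   // upper bits all set
}
```

With the sign-extended form, the 64-bit register holds all ones rather than 4294967295, which is why the choice of extension matters for an argument declared as unsigned.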
Re: __has_include__ is problematic
Why not give the weirdo __has_include__ an unspellable name?
('builtinhasinclude') and take care constructing the __has_include macro
expansion to have a token with exactly that spelling?

Wouldn't that break -dM rather horribly?

pah! However, the following thinks __DATE__ is a defined macro, so there
must be some other subtlety with __has_include?

nathans@zathras:6>gcc -xc - <:2:2: error: #error DATE IS A MACRO

(typing that makes me realize why users think it is __has_include__,
that's a really unfortunate name to use for an implementation detail)

nathan

--
Nathan Sidwell
GCC's ICF vs. gold's ICF
Hi, why is the ICF pass in gcc not folding member functions which depend on a template parameter but happen to generate identical code? Is it because it is not identical on the IR level in the compiler? Can I somehow dump the IR in text form? The ICF pass in the gold linker can do it on binary level which is kind of mentioned in manpage of gcc. I'm just interested in why the compiler cannot do it on its own. There is a test program in a blog post I wrote [1]. Best regards, Frank [1] https://tetzank.github.io/posts/identical-code-folding/#consolidating-independent-member-functions
Re: GCC's ICF vs. gold's ICF
On Tue, Jan 15, 2019 at 2:18 PM Frank Tetzel wrote:
>
> Hi,
>
> why is the ICF pass in gcc not folding member functions which depend on
> a template parameter but happen to generate identical code?
> Is it because it is not identical on the IR level in the compiler?
> Can I somehow dump the IR in text form?

You can look at the ICF dump generated when you pass
-fdump-ipa-icf-details

And yes, ICF has to consider IL differences that may result in
different allowed followup optimizations, while when the IL is final
(such as at link time) no such considerations have to be made. A very
simple example would be signed vs. unsigned integer multiplication,
where for the former IL overflow would be undefined and
optimizations can exploit that, while not for the latter.

> The ICF pass in the gold linker can do it on binary level which is kind
> of mentioned in manpage of gcc. I'm just interested in why the compiler
> cannot do it on its own.
>
> There is a test program in a blog post I wrote [1].
>
> Best regards,
> Frank
>
> [1]
> https://tetzank.github.io/posts/identical-code-folding/#consolidating-independent-member-functions
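The signed vs. unsigned multiplication point can be sketched in a few lines of C++. The function names signed_roundtrip, unsigned_roundtrip and check_roundtrip_example are invented for this illustration. Both bodies use the same multiply and divide, but only in the signed case may a compiler assume the multiplication does not overflow and simplify the expression to x; in the unsigned case wraparound is defined, so that simplification would change the result for large inputs. Whether GCC actually performs this particular simplification depends on version and options, which is an assumption here.

```cpp
#include <cassert>
#include <climits>

// Signed: overflow in x * 2 is undefined behaviour, so an optimizer is
// allowed to assume it never happens and treat (x * 2) / 2 as x.
// Only call this with values for which x * 2 does not overflow.
int signed_roundtrip(int x) { return (x * 2) / 2; }

// Unsigned: x * 2u wraps modulo 2^N by definition, so (x * 2u) / 2u is
// not always x and the same simplification would be wrong.
unsigned unsigned_roundtrip(unsigned x) { return (x * 2u) / 2u; }

// Shows one input where the unsigned result visibly differs from x.
inline void check_roundtrip_example() {
    const unsigned top_bit = UINT_MAX / 2u + 1u; // only the highest bit set
    assert(unsigned_roundtrip(top_bit) == 0u);   // doubled value wrapped to 0
    assert(unsigned_roundtrip(21u) == 21u);      // small values round-trip
    assert(signed_roundtrip(21) == 21);
}
```

This is the kind of IL-level difference that a compiler-side ICF has to respect while further optimizations are still possible, and that a linker comparing final machine code does not have to care about.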
Re: GCC's ICF vs. gold's ICF
> > why is the ICF pass in gcc not folding member functions which
> > depend on a template parameter but happen to generate identical
> > code? Is it because it is not identical on the IR level in the
> > compiler? Can I somehow dump the IR in text form?
>
> You can look at the ICF dump generated when you pass
> -fdump-ipa-icf-details
>
> And yes, ICF has to consider IL differences that may result in
> different allowed followup optimizations while when the IL is final
> (such as link-time) no such considerations have to be made. A very
> simple example would be signed vs. unsigned integer multiplication
> where from the former IL overflow would be undefined and
> optimizations can exploit that while not for the latter.

Thanks for the information. If I read the dump correctly, it also
considers the return type, and that seems to be the problem in my tiny
test program.

snippet from dump:

group: with 1 classes:
  class with id: 1, hash: 2170673536, items: 2
    MyArray::operator[](unsigned int)/4 MyArray::operator[](unsigned int)/3
  false returned: 'alias sets are different' (compatible_types_p:244)
  false returned: 'result types are different' (equals_wpa:676)

The bodies of the functions look identical, but the return types differ.
So in C++, ICF is effectively "disabled" for templated functions that have
a template parameter as return type.

But why is the return type preventing code folding? Because we do not
know the calling convention at this point in time?
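For readers without the blog post at hand, the situation can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual test program: the template arguments did not survive in the dump above, so the element types int and float and the helper check_my_array_example are assumptions. On typical targets where int and float have the same size, both operator[] instantiations compute the same address, yet one returns int& and the other float&, which is the kind of difference the 'result types are different' message refers to.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the MyArray class named in the dump.
template <typename T, std::size_t N>
struct MyArray {
    T data[N];

    // The body depends on T only through the element size and the
    // return type; the address computation is otherwise the same.
    T &operator[](unsigned int i) { return data[i]; }
};

// Two instantiations (element types are assumptions for this sketch).
template struct MyArray<int, 1024>;
template struct MyArray<float, 1024>;

// Exercises both instantiations so their operator[] is actually used.
inline void check_my_array_example() {
    MyArray<int, 1024> a{};
    MyArray<float, 1024> b{};
    a[3] = 7;
    b[3] = 1.5f;
    assert(a[3] == 7);
    assert(b[3] == 1.5f);
}
```

Compiling something like this with optimization and -fdump-ipa-icf-details, as suggested earlier in the thread, should show whether the two operator[] instantiations are considered equal.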
Re: GCC's ICF vs. gold's ICF
On Tue, Jan 15, 2019 at 4:43 PM Frank Tetzel wrote:
>
> > > why is the ICF pass in gcc not folding member functions which
> > > depend on a template parameter but happen to generate identical
> > > code? Is it because it is not identical on the IR level in the
> > > compiler? Can I somehow dump the IR in text form?
> >
> > You can look at the ICF dump generated when you pass
> > -fdump-ipa-icf-details
> >
> > And yes, ICF has to consider IL differences that may result in
> > different allowed followup optimizations while when the IL is final
> > (such as link-time) no such considerations have to be made. A very
> > simple example would be signed vs. unsigned integer multiplication
> > where from the former IL overflow would be undefined and
> > optimizations can exploit that while not for the latter.
>
> Thanks for the information. If I read the dump correctly, it also
> considers the return type and that seems to be the problem in my tiny
> test program.
>
> snippet from dump:
>
> group: with 1 classes:
>   class with id: 1, hash: 2170673536, items: 2
>     MyArray::operator[](unsigned int)/4 MyArray 1024>::operator[](unsigned int)/3
>   false returned: 'alias sets are different' (compatible_types_p:244)
>   false returned: 'result types are different' (equals_wpa:676)
>
> The body of the functions look identical, but the return type differs.
> So in C++, ICF is "disabled" for templated functions with a template
> parameter as return type.
>
> But why is the return type preventing code folding? Because we do not
> know the calling convention at this point in time?

Kind-of. Probably because ICF is not prepared to "fixup" callers and
insert no-op type conversions to make the IL valid. Martin should know.

The analysis could certainly be enhanced to avoid comparing some bits
that will not be relevant in the end.

Richard.
Re: Replacing DejaGNU
Hi Iain,

>> On 14 Jan 2019, at 13:53, Rainer Orth wrote:
>>
>> "MCC CS" writes:
>>
>>> I've been running the testsuite on my macOS, on which
>>> it is especially unbearable. I want to (at least try to)
>>
>> that problem may well be macOS specific: since at least macOS 10.13
>> (maybe even 10.12; cannot currently tell for certain) make -jN check
>> times on my Mac mini skyrocketed with between 60 and 80% system time.
>> It seems this is due to lock contention on one specific kernel lock, but
>> I haven't been able to find out more yet.
>
> this PR mentions the compilation, but it’s even more apparent on test.
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84257
>
> * Assuming SIP is disabled.
>
> Some testing suggests that each DYLD_LIBRARY_PATH entry adds around 2ms to
> each exe launch.
> So .. when you’re doing something that’s a lot of work per launch, not much
> is seen - but when you’re doing things with a huge number of exe launches -
> e.g. configuring or running the test suite, it bites.
>
> A work-around is to remove the RPATH_ENVAR variable setting in the top
> level Makefile.in (which actually has the same effect as running things
> with SIP enabled)

this change alone helped tremendously: a bootstrap on macOS 10.14 on
20181103 took

   180041.05 real     96489.89 user    180864.44 sys

while the current one was only

    44886.30 real     74101.86 user     36879.75 sys

However, not unexpectedly quite a number of new failures occur,
e.g. many (all?) plugin tests FAIL with

cc1: error: cannot load plugin ./selfassign.so
dlopen(./selfassign.so, 10): Symbol not found: __ZdlPvm
  Referenced from: ./selfassign.so
  Expected in: /usr/lib/libstdc++.6.dylib
 in ./selfassign.so
compiler exited with status 1

I'll still have to check which are affected this way.

> === DejaGNU on macOS...
>
> DejaGNU / expect are not fantastic on macOS, even given the comments above
> - it’s true. Writing an interpreter/funnel for the testsuite has crossed
> my mind more than once.
>
> However, I suspect it’s a large job, and it might be more worth investing
> any available effort in debugging the slowness in expect/dejaGNU -
> especially the lock contention that Rainer mentions.

Indeed: I found it when trying to investigate the high system time with
lockstat. However, I don't know a way how to relate the lock address
mentioned there to some lock in the darwin sources.

Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: Replacing DejaGNU
Hey Rainer,

> On 15 Jan 2019, at 17:27, Rainer Orth wrote:

>>> On 14 Jan 2019, at 13:53, Rainer Orth wrote:
>>>
>>> "MCC CS" writes:
>>>
>>>> I've been running the testsuite on my macOS, on which
>>>> it is especially unbearable. I want to (at least try to)
>>>
>>> that problem may well be macOS specific: since at least macOS 10.13
>>> (maybe even 10.12; cannot currently tell for certain) make -jN check
>>> times on my Mac mini skyrocketed with between 60 and 80% system time.
>>> It seems this is due to lock contention on one specific kernel lock, but
>>> I haven't been able to find out more yet.
>>
>> this PR mentions the compilation, but it’s even more apparent on test.
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84257
>>
>> * Assuming SIP is disabled.
>>
>> Some testing suggests that each DYLD_LIBRARY_PATH entry adds around 2ms to
>> each exe launch.
>> So .. when you’re doing something that’s a lot of work per launch, not much
>> is seen - but when you’re doing things with a huge number of exe launches -
>> e.g. configuring or running the test suite, it bites.
>>
>> A work-around is to remove the RPATH_ENVAR variable setting in the top
>> level Makefile.in (which actually has the same effect as running things
>> with SIP enabled)
>
> this change alone helped tremendously: a bootstrap on macOS 10.14 on
> 20181103 took
>
>    180041.05 real     96489.89 user    180864.44 sys
>
> while the current one was only
>
>     44886.30 real     74101.86 user     36879.75 sys
>
> However, not unexpectedly quite a number of new failures occur,
> e.g. many (all?) plugin tests FAIL with
>
> cc1: error: cannot load plugin ./selfassign.so
> dlopen(./selfassign.so, 10): Symbol not found: __ZdlPvm
>   Referenced from: ./selfassign.so
>   Expected in: /usr/lib/libstdc++.6.dylib
>  in ./selfassign.so
> compiler exited with status 1
>
> I'll still have to check which are affected this way.

I’m afraid that with this (or with SIP enabled) “uninstalled testing” can’t
work, the libraries have to be found from their intended installed path, so
you have to “make install && make check” **

** and remember to delete the install before building the next revision...

>> === DejaGNU on macOS...
>>
>> DejaGNU / expect are not fantastic on macOS, even given the comments above
>> - it’s true. Writing an interpreter/funnel for the testsuite has crossed
>> my mind more than once.
>>
>> However, I suspect it’s a large job, and it might be more worth investing
>> any available effort in debugging the slowness in expect/dejaGNU -
>> especially the lock contention that Rainer mentions.
>
> Indeed: I found it when trying to investigate the high system time with
> lockstat. However, I don't know a way how to relate the lock address
> mentioned there to some lock in the darwin sources.

Well.. let’s take this offline - or park it in a BZ somewhere, if you can be
more specific - would be happy to poke at it a bit : if it’s a genuine OS
bug, we can file a radar - but that doesn’t help the system versions out of
support. (and there’s enough useful h/w out there that’s tied to 10.11 etc)

Iain
Re: Parallelize the compilation using Threads
Hi

I've managed to compile gimple-match.c with -ftime-report, and
"phase opt and generate" seems to be what takes most of the compilation
time. This is captured by the "TV_PHASE_OPT_GEN" timevar, and all its
occurrences seem to be in toplev.c and lto.c. Any idea which part of
what this timevar covers is the most costly? Also, is the percentage in
the "GGC" column the amount of time spent inside the garbage collector?

Time variable                    usr           sys          wall               GGC
phase setup                 :   0.01 (  0%)  0.01 (  0%)   0.02 (  0%)     1473 kB (  0%)
phase parsing               :   3.74 (  4%)  1.43 ( 30%)   5.17 (  5%)   294287 kB ( 16%)
phase lang. deferred        :   0.08 (  0%)  0.03 (  1%)   0.11 (  0%)     7582 kB (  0%)
phase opt and generate      :  94.10 ( 95%)  3.26 ( 67%)  97.46 ( 93%)  1543477 kB ( 82%)
phase last asm              :   0.89 (  1%)  0.09 (  2%)   0.98 (  1%)    39802 kB (  2%)
phase finalize              :   0.00 (  0%)  0.01 (  0%)   0.50 (  0%)        0 kB (  0%)
|name lookup                :   0.42 (  0%)  0.12 (  2%)   0.46 (  0%)     6162 kB (  0%)
|overload resolution        :   0.37 (  0%)  0.13 (  3%)   0.42 (  0%)    18172 kB (  1%)
garbage collection          :   2.99 (  3%)  0.03 (  1%)   3.02 (  3%)        0 kB (  0%)
dump files                  :   0.11 (  0%)  0.01 (  0%)   0.16 (  0%)        0 kB (  0%)
callgraph construction      :   0.35 (  0%)  0.01 (  0%)   0.24 (  0%)    61143 kB (  3%)
callgraph optimization      :   0.21 (  0%)  0.01 (  0%)   0.17 (  0%)      175 kB (  0%)
ipa function summary        :   0.12 (  0%)  0.00 (  0%)   0.14 (  0%)     2216 kB (  0%)
ipa dead code removal       :   0.04 (  0%)  0.01 (  0%)   0.00 (  0%)        0 kB (  0%)
ipa devirtualization        :   0.00 (  0%)  0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
ipa cp                      :   0.33 (  0%)  0.01 (  0%)   0.39 (  0%)     9073 kB (  0%)
ipa inlining heuristics     :   0.48 (  0%)  0.00 (  0%)   0.48 (  0%)     6175 kB (  0%)
ipa function splitting      :   0.10 (  0%)  0.01 (  0%)   0.07 (  0%)     9111 kB (  0%)
ipa comdats                 :   0.01 (  0%)  0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
ipa various optimizations   :   0.03 (  0%)  0.03 (  1%)   0.01 (  0%)      480 kB (  0%)
ipa reference               :   0.01 (  0%)  0.00 (  0%)   0.02 (  0%)        0 kB (  0%)
ipa profile                 :   0.01 (  0%)  0.00 (  0%)   0.01 (  0%)        0 kB (  0%)
ipa pure const              :   0.13 (  0%)  0.00 (  0%)   0.12 (  0%)        8 kB (  0%)
ipa icf                     :   0.08 (  0%)  0.00 (  0%)   0.08 (  0%)        6 kB (  0%)
ipa SRA                     :   1.26 (  1%)  0.28 (  6%)   1.78 (  2%)   165814 kB (  9%)
ipa free lang data          :   0.01 (  0%)  0.00 (  0%)   0.00 (  0%)        0 kB (  0%)
ipa free inline summary     :   0.00 (  0%)  0.00 (  0%)   0.03 (  0%)        0 kB (  0%)
cfg construction            :   0.09 (  0%)  0.00 (  0%)   0.09 (  0%)     7926 kB (  0%)
cfg cleanup                 :   1.84 (  2%)  0.00 (  0%)   1.73 (  2%)    13673 kB (  1%)
CFG verifier                :   6.05 (  6%)  0.12 (  2%)   6.80 (  7%)        0 kB (  0%)
trivially dead code         :   0.32 (  0%)  0.01 (  0%)   0.38 (  0%)        0 kB (  0%)
df scan insns               :   0.23 (  0%)  0.00 (  0%)   0.30 (  0%)       28 kB (  0%)
df multiple defs            :   0.13 (  0%)  0.00 (  0%)   0.20 (  0%)        0 kB (  0%)
df reaching defs            :   0.52 (  1%)  0.00 (  0%)   0.55 (  1%)        0 kB (  0%)
df live regs                :   2.70 (  3%)  0.02 (  0%)   3.08 (  3%)      425 kB (  0%)
df live&initialized regs    :   1.28 (  1%)  0.00 (  0%)   1.13 (  1%)        0 kB (  0%)
df must-initialized regs    :   0.14 (  0%)  0.00 (  0%)   0.16 (  0%)        0 kB (  0%)
df use-def / def-use chains :   0.32 (  0%)  0.00 (  0%)   0.26 (  0%)        0 kB (  0%)
df reg dead/unused notes    :   0.96 (  1%)  0.01 (  0%)   0.89 (  1%)    11726 kB (  1%)
register information        :   0.29 (  0%)  0.00 (  0%)   0.21 (  0%)        0 kB (  0%)
alias analysis              :   0.54 (  1%)  0.00 (  0%)   0.53 (  1%)    17487 kB (  1%)
alias stmt walking          :   1.10 (  1%)  0.08 (  2%)   1.22 (  1%)      118 kB (  0%)
register scan               :   0.08 (  0%)  0.01 (  0%)   0.08 (  0%)      118 kB (  0%)
rebuild jump labels         :   0.12 (  0%)  0.01 (  0%)   0.11 (  0%)        0 kB (  0%)
preprocessing               :   0.29 (  0%)  0.43 (  9%)   0.65 (  1%)    37409 kB (  2%)
parser (global)