Re: Some beginner's questions concerning doing instrumentation/analysis in GIMPLE

2019-01-14 Thread Richard Biener
On Thu, Jan 10, 2019 at 3:00 PM Carter Cheng  wrote:
>
> Hello,
>
> I am trying to assess whether it is possible to implement a
> certain idea as a gcc plugin. I have looked over some of the information on
> the web and the gcc internals documentation, but I still cannot figure out
> some basic things concerning manipulating GIMPLE in a plugin.
>
> 1) How does one add and safely remove basic blocks in GIMPLE if one is
> trying to transform a function?
>
> 2) How does one label a basic block with a new label (for conditional
> branches)?
>
> 3) How does one do the same for functions (like in situations when one is
> doing interprocedural analysis and function cloning)?
>
> I apologize if this is in a tutorial somewhere but I could not find it.

Your questions are a bit broad and thus hard to answer in a short e-mail.
I suggest you look at tree-call-cdce.c which creates new control-flow
on GIMPLE.
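[Editor's note: for question 1), the usual building blocks are GCC's internal CFG helpers, which passes like tree-call-cdce.c use. The fragment below is an uncompilable sketch against GCC-internal APIs — it only makes sense inside a GIMPLE pass in the GCC source tree, and the variables `bb`, `stmt`, and `val` are placeholders for whatever your pass is working on:]

```
/* Split BB after STMT and wire up new control flow -- sketch only.  */
gimple_stmt_iterator gsi;
edge e = split_block (bb, stmt);            /* bb now ends after stmt */
basic_block join_bb = e->dest;

basic_block then_bb = create_empty_bb (bb); /* new, initially unreachable */

/* End BB with a conditional jump.  Note (question 2): on GIMPLE,
   branch targets are represented by CFG edges, so explicit
   gimple_build_label calls are rarely needed.  */
gcond *cond = gimple_build_cond (NE_EXPR, val, integer_zero_node,
                                 NULL_TREE, NULL_TREE);
gsi = gsi_last_bb (bb);
gsi_insert_after (&gsi, cond, GSI_NEW_STMT);

remove_edge (e);
make_edge (bb, then_bb, EDGE_TRUE_VALUE);
make_edge (bb, join_bb, EDGE_FALSE_VALUE);
make_edge (then_bb, join_bb, EDGE_FALLTHRU);

/* Safe removal of a block is delete_basic_block (then_bb);  */
```

For question 3), function cloning goes through the cgraph machinery (cgraphclones.c) rather than direct GIMPLE surgery.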

For more specific questions, please briefly elaborate on what you
intend to do IL-wise.

Richard.

> Regards,
>
> Carter.


Re: ISL tiling question (gcc.dg/graphite/interchange-3.c)

2019-01-14 Thread Richard Biener
On Fri, Jan 11, 2019 at 9:02 PM Steve Ellcey  wrote:
>
> Someone here was asking about GCC, ISL, and tiling and we looked at
> the test gcc.dg/graphite/interchange-3.c on Aarch64.  When this
> test is run the graphite pass output file contains the string 'not
> tiled' and since the dg-final scan-tree-dump is just looking for
> the string 'tiled', it matches and the test passes.
>
> Is this intentional?  It seems like if we wanted to check that it was
> not tiled we should grep for 'not tiled', not just 'tiled'.  If we
> want grep to see that it is tiled, then the check for tiling happening
> is wrong.

;)

I _think_ the testcases got annotated with "tiled" a lot and really
meant to test whether tiling would be possible (interchange tests
should check for interchange only).

I think this is probably not intentional, but unless you want extra
FAILs I suggest leaving this alone...

Richard.

> Steve Ellcey
> sell...@marvell.com
>


Re: Parallelize the compilation using Threads

2019-01-14 Thread Giuliano Belinassi
Hi,

I am currently studying the GIMPLE IR documentation and thinking about a
way to easily gather the timing information. I was thinking about adding
this feature to gcc to show/dump the elapsed time spent on GIMPLE. Does
this make sense? Is this already implemented somewhere? What is a good
way to start?

Richard Biener: I would like to know what your nickname on IRC is :)

Thank you,
Giuliano.

On 12/17, Richard Biener wrote:
> On Wed, Dec 12, 2018 at 4:46 PM Giuliano Augusto Faulin Belinassi
>  wrote:
> >
> > Hi, I have some news. :-)
> >
> > I replicated the Martin Liška experiment [1] on a 64-cores machine for
> > gcc [2] and Linux kernel [3] (Linux kernel was fully parallelized),
> > and I am excited to dive into this problem. As a result, I want to
> > propose GSoC project on this issue, starting with something like:
> > 1- Systematically create a benchmark for easy information
> > gathering. Martin Liška already made the first version of it, but I
> > need to improve it.
> > 2- Find and document the global states (Try to reduce the gcc's
> > global states as well).
> > 3- Define the parallelization strategy.
> > 4- First parallelization attempt.
> >
> > I also proposed this issue as a research project to my advisor and he
> > supported me on this idea. So I can work for at least one year on
> > this, and other things related to it.
> >
> > Would anyone be willing to mentor me on this?
> 
> As the one who initially suggested the project I'm certainly willing
> to mentor you on this.
> 
> Richard.
> 
> > [1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > [2] https://www.ime.usp.br/~belinass/64cores-experiment.svg
> > [3] https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
> > On Mon, Nov 19, 2018 at 8:53 AM Richard Biener
> >  wrote:
> > >
> > > On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi
> > >  wrote:
> > > >
> > > > Hi! Sorry for the late reply again :P
> > > >
> > > > On Thu, Nov 15, 2018 at 8:29 AM Richard Biener
> > > >  wrote:
> > > > >
> > > > > On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi
> > > > >  wrote:
> > > > > >
> > > > > > As a brief introduction, I am a graduate student that got interested
> > > > > >
> > > > > > in the "Parallelize the compilation using threads"(GSoC 2018 [1]). I
> > > > > > am a newcomer in GCC, but already have sent some patches, some of
> > > > > > them have already been accepted [2].
> > > > > >
> > > > > > I brought this subject up in IRC, but maybe here is a proper place 
> > > > > > to
> > > > > > discuss this topic.
> > > > > >
> > > > > > From my point of view, parallelizing GCC itself will only speed up 
> > > > > > the
> > > > > > compilation of projects which have a big file that creates a
> > > > > > bottleneck in the whole project compilation (note: by big, I mean 
> > > > > > the
> > > > > > amount of code to generate).
> > > > >
> > > > > That's true.  During GCC bootstrap there are some of those (see 
> > > > > PR84402).
> > > > >
> > > >
> > > > > One way to improve parallelism is to use link-time optimization where
> > > > > even single source files can be split up into multiple link-time 
> > > > > units.  But
> > > > > then there's the serial whole-program analysis part.
> > > >
> > > > Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
> > > > That is a lot of data :-)
> > > >
> > > > It seems that 'phase opt and generate' is the most time-consuming
> > > > part. Is that the 'GIMPLE optimization pipeline' you were talking
> > > > about in this thread:
> > > > https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
> > >
> > > It's everything that comes after the frontend parsing bits, thus this
> > > includes in particular RTL optimization and early GIMPLE optimizations.
> > >
> > > > > > Additionally, I know that GCC must not
> > > > > > change the project layout, but from the software engineering 
> > > > > > perspective,
> > > > > > this may be a bad smell that indicates that the file should be 
> > > > > > broken
> > > > > > into smaller files. Finally, the Makefiles will take care of the
> > > > > > parallelization task.
> > > > >
> > > > > What do you mean by GCC must not change the project layout?  GCC
> > > > > happily re-orders functions and link-time optimization will reorder
> > > > > TUs (well, linking may as well).
> > > > >
> > > >
> > > > That was a response to a comment made on IRC:
> > > >
> > > > On Thu, Nov 15, 2018 at 9:44 AM Jonathan Wakely  
> > > > wrote:
> > > > >I think this is in response to a comment I made on IRC. Giuliano said
> > > > >that if a project has a very large file that dominates the total build
> > > > >time, the file should be split up into smaller pieces. I said  "GCC
> > > > >can't restructure people's code. it can only try to compile it
> > > > >faster". We weren't referring to code transformations in the compiler
> > > > >like re-ordering functions, but physically refactoring the source
> > > > >code.

Re: Parallelize the compilation using Threads

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 12:41 PM Giuliano Belinassi
 wrote:
>
> Hi,
>
> I am currently studying the GIMPLE IR documentation and thinking about a
> way to easily gather the timing information. I was thinking about adding
> this feature to gcc to show/dump the elapsed time spent on GIMPLE. Does
> this make sense? Is this already implemented somewhere? What is a good
> way to start?

There's -ftime-report which more-or-less tells you the time spent in the
individual passes.  I think there's no overall group to count GIMPLE
optimizers vs. RTL optimizers though.

> Richard Biener: I would like to know what your nickname on IRC is :)

It's richi.

Richard.

> Thank you,
> Giuliano.

Question on scheduling of stack_protect_prologue

2019-01-14 Thread Matthew Malcomson
I've found a testcase where the stack protector code generated through
`-fstack-protector-all` doesn't actually protect anything.

#+name stack-reorder.c
#+begin_src c
#include <stddef.h>  /* for size_t; the original header names were lost in archiving */
int foo (int a, int b, int c) {
     char buf[64];
     buf[a] = 1;
     buf[b] = c;

     // Just add something so that the assignments above have some
     // observable behaviour.
     int retval = 0;
     for (size_t i = 0; i < 32; i++)
     {
         retval += buf[i];
     }
     return retval;
}
#+end_src

When compiling on aarch64 with
~gcc -fstack-protector-all -g -S stack-reorder.c  -o test.s -O3
--save-temps -fdump-rtl-all~
(with ~gcc (GCC) 9.0.0 20181214 (experimental)~)

We get an RTL dump on the final pass that has the snippet
#+begin_example
(insn 8 21 130 (parallel [
     (set (mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp)
     (const_int 88 [0x58])) [1 D.4227+0 S8 A64])
     (unspec:DI [
     (mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1
__stack_chk_guard+0 S8 A64])
     ] UNSPEC_SP_SET))
     (set (reg:DI 1 x1 [141])
     (const_int 0 [0]))
     ]) "stack-reorder.c":3:31 1046 {stack_protect_set_di}
  (expr_list:REG_UNUSED (reg:DI 1 x1 [141])
     (nil)))
(note 130 8 117 (var_location b (entry_value:SI (reg:SI 1 x1 [ b ])))
NOTE_INSN_VAR_LOCATION)
(note 117 130 118 stack-reorder.c:4 NOTE_INSN_BEGIN_STMT)
(note 118 117 119 stack-reorder.c:5 NOTE_INSN_BEGIN_STMT)
(note 119 118 120 stack-reorder.c:6 NOTE_INSN_BEGIN_STMT)
(note 120 119 131 stack-reorder.c:10 NOTE_INSN_BEGIN_STMT)
(note 131 120 121 (var_location retval (const_int 0 [0]))
NOTE_INSN_VAR_LOCATION)
(note 121 131 144 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT)
(note 144 121 122 0xb76fd960 NOTE_INSN_BLOCK_BEG)
(note 122 144 132 stack-reorder.c:11 NOTE_INSN_BEGIN_STMT)
(note 132 122 123 (var_location retval (nil)) NOTE_INSN_VAR_LOCATION)
(note 123 132 145 stack-reorder.c:13 NOTE_INSN_BEGIN_STMT)
(note 145 123 75 0xb76fd960 NOTE_INSN_BLOCK_END)
(insn:TI 75 145 133 (parallel [
     (set (reg:DI 1 x1 [137])
     (unspec:DI [
     (mem/v/f/c:DI (plus:DI (reg/f:DI 31 sp)
     (const_int 88 [0x58])) [1 D.4227+0 S8 A64])
     (mem/v/f/c:DI (reg/f:DI 0 x0 [116]) [1
__stack_chk_guard+0 S8 A64])
     ] UNSPEC_SP_TEST))
     (clobber (reg:DI 2 x2 [142]))
     ]) "stack-reorder.c":16:1 1048 {stack_protect_test_di}
  (expr_list:REG_DEAD (reg/f:DI 0 x0 [116])
     (expr_list:REG_UNUSED (reg:DI 2 x2 [142])
     (nil
#+end_example

In this snippet the stack protect set and test patterns are right next to each
other, causing the stack protector to essentially do nothing.
The RTL insns to set the two elements in `buf[]` are left after this snippet.

The stack_protect_set and stack_protect_test patterns are put together in the
sched1 pass (as seen by the change in the RTL between the previous dump and
that one).

I would like to know: what is supposed to stop the RTL emitted by the
stack_protect_set pattern from being reordered around the code it protects
like this?

I recognise this is an unlikely pattern of code and that it doesn't present as
much of a security risk as things like calling memcpy or setting memory through
some sort of loop.
Could the reasoning be that for those patterns likely to cause a security risk
the rescheduling is stopped by jumps/labels/calls?




Replacing DejaGNU

2019-01-14 Thread MCC CS
Hi all,

I've been running the testsuite on macOS, where it is
especially unbearable. I want to (at least try to)
write a DejaGNU replacement that accepts the same
syntax and has no dependencies, and should therefore
be faster. I was wondering whether there have been any
attempts at this? Knowing what went wrong would
help me. What I'll try to write is a GCC-specific
multicore testsuite runner in C (without using a
Makefile).

Regards


Re: Replacing DejaGNU

2019-01-14 Thread Rainer Orth
"MCC CS"  writes:

> I've been running the testsuite on my macOS, on which
> it is especially unbearable. I want to (at least try to)

that problem may well be macOS specific: since at least macOS 10.13
(maybe even 10.12; cannot currently tell for certain) make -jN check
times on my Mac mini skyrocketed with between 60 and 80% system time.
It seems this is due to lock contention on one specific kernel lock, but
I haven't been able to find out more yet.

There's no such problem on other targets, not even e.g. on Mac OS X 10.7.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Replacing DejaGNU

2019-01-14 Thread Richard Biener
On Mon, Jan 14, 2019 at 2:54 PM Rainer Orth  
wrote:
>
> "MCC CS"  writes:
>
> > I've been running the testsuite on my macOS, on which
> > it is especially unbearable. I want to (at least try to)
>
> that problem may well be macOS specific: since at least macOS 10.13
> (maybe even 10.12; cannot currently tell for certain) make -jN check
> times on my Mac mini skyrocketed with between 60 and 80% system time.
> It seems this is due to lock contention on one specific kernel lock, but
> I haven't been able to find out more yet.
>
> There's no such problem on other targets, not even e.g. on Mac OS X 10.7.

If I would take a guess then it's security checks (verifying signatures
for each process invocation?).  IIRC you can disable this system-wide
somehow (of course that's not recommended).

Richard.

> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Replacing DejaGNU

2019-01-14 Thread MCC CS
Thank you for the quick replies. I was inspired by
https://gcc.gnu.org/ml/gcc-help/2012-04/msg00223.html
but it seems, according to your comments, that was outdated.
The problem on my Mac was that each of the processes used no more
than 10% of a core. Now I know that it's not so inefficient
on other platforms, but I might try rewriting it in the future
if I have time, as I believe there's still some room for optimizations.
 
Thanks



Re: Replacing DejaGNU

2019-01-14 Thread Iain Sandoe


> On 14 Jan 2019, at 13:53, Rainer Orth  wrote:
> 
> "MCC CS"  writes:
> 
>> I've been running the testsuite on my macOS, on which
>> it is especially unbearable. I want to (at least try to)
> 
> that problem may well be macOS specific: since at least macOS 10.13
> (maybe even 10.12; cannot currently tell for certain) make -jN check
> times on my Mac mini skyrocketed with between 60 and 80% system time.
> It seems this is due to lock contention on one specific kernel lock, but
> I haven't been able to find out more yet.

This PR mentions compilation, but it's even more apparent when testing.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84257

* Assuming SIP is disabled.

Some testing suggests that each DYLD_LIBRARY_PATH entry adds around 2ms to each 
exe launch.
So .. when you’re doing something that’s a lot of work per launch, not much is 
seen - but when you’re doing things with a huge number of exe launches - e.g. 
configuring or running the test suite, it bites.

A work-around is to remove the RPATH_ENVAR variable setting in the top level 
Makefile.in (which actually has the same effect as running things with SIP 
enabled)

=== Possible solution (partial hacks locally, not ready for posting)

My current investigation (targeted at the GCC 10 time frame, even if changes
are subsequently back-ported) is to replace all use of absolute pathnames in
GCC libraries with @rpath/xxx and figure out a way to get the compiler to
auto-add the relevant rpaths to exes (so that a fixed installation of GCC
behaves the same way as it does currently).

=== DejaGNU on macOS...

DejaGNU / expect are not fantastic on macOS, even given the comments above - 
it’s true.  Writing an interpreter/funnel for the testsuite has crossed my mind 
more than once. 

However, I suspect it’s a large job, and it might be more worth investing any 
available effort in debugging the slowness in expect/dejaGNU - especially the 
lock contention that Rainer mentions.

> There's no such problem on other targets, not even e.g. on Mac OS X 10.7.

indeed.

Iain



Re: Replacing DejaGNU

2019-01-14 Thread Jakub Jelinek
On Mon, Jan 14, 2019 at 03:15:05PM +0100, MCC CS wrote:
> Thank you for the quick replies. I was inspired by
> https://gcc.gnu.org/ml/gcc-help/2012-04/msg00223.html
> but it seems, according to your comments, that was outdated.

Since then the parallelization has been changed, since 2014 all the
instances run the same set of tests and communicate together which one
picks which test, see http://gcc.gnu.org/r215273 .

So, I don't really see what would help you replacing the testsuite
framework, moreover, we have like 400k tests now and many of them
use simpler or more complicated tcl expressions in them, including almost
2.5MB of pure tcl code.  Replacing it with something different and
incompatible is lots of work, especially when all you want is work around a
bug in some broken OS.

Jakub


Re: Replacing DejaGNU

2019-01-14 Thread Paolo Carlini

Hi,

On 14/01/19 15:35, Jakub Jelinek wrote:
> On Mon, Jan 14, 2019 at 03:15:05PM +0100, MCC CS wrote:
> > Thank you for the quick replies. I was inspired by
> > https://gcc.gnu.org/ml/gcc-help/2012-04/msg00223.html
> > but it seems, according to your comments, that was outdated.
> So, I don't really see what would help you replacing the testsuite
> framework, moreover, we have like 400k tests now and many of them
> use simpler or more complicated tcl expressions in them, including almost
> 2.5MB of pure tcl code.  Replacing it with something different and
> incompatible is lots of work, especially when all you want is work around a
> bug in some broken OS.


I'm not an expert but certainly there are long standing issues with 
DejaGNU, well beyond performance, right? I remember Mark Mitchell doing 
some work in this area which, as far as I can remember, had nothing to 
do with performance per se. And, well, some of these issues are obvious 
to explain, like not being able to check for *duplicate* error messages. 
I remember briefly discussing this with Dodji in Manchester.


Just wanted to make sure this kind of public discussion isn't completely 
suppressed.


Paolo.



Re: Replacing DejaGNU

2019-01-14 Thread David Edelsohn
On Mon, Jan 14, 2019 at 9:51 AM Paolo Carlini  wrote:
>
> Hi,
>
> On 14/01/19 15:35, Jakub Jelinek wrote:
> > On Mon, Jan 14, 2019 at 03:15:05PM +0100, MCC CS wrote:
> >> Thank you for the quick replies. I was inspired by
> >> https://gcc.gnu.org/ml/gcc-help/2012-04/msg00223.html
> >> but it seems, according to your comments, that was outdated.
> > So, I don't really see what would help you replacing the testsuite
> > framework, moreover, we have like 400k tests now and many of them
> > use simpler or more complicated tcl expressions in them, including almost
> > 2.5MB of pure tcl code.  Replacing it with something different and
> > incompatible is lots of work, especially when all you want is work around a
> > bug in some broken OS.
>
> I'm not an expert but certainly there are long standing issues with
> DejaGNU, well beyond performance, right? I remember Mark Mitchell doing
> some work in this area which, as far as I can remember, had nothing to
> do with performance per se. And, well, some of these issues are obvious
> to explain, like not being able to check for *duplicate* error messages.
> I remember briefly discussing this with Dodji in Manchester.
>
> Just wanted to make sure this kind of public discussion isn't completely
> suppressed.

A few years ago, Rob Savoye mentioned that he had a plan for a replacement.

- David


Re: Replacing DejaGNU

2019-01-14 Thread Jakub Jelinek
On Mon, Jan 14, 2019 at 03:50:32PM +0100, Paolo Carlini wrote:
> On 14/01/19 15:35, Jakub Jelinek wrote:
> > On Mon, Jan 14, 2019 at 03:15:05PM +0100, MCC CS wrote:
> > > Thank you for the quick replies. I was inspired by
> > > https://gcc.gnu.org/ml/gcc-help/2012-04/msg00223.html
> > > but it seems, according to your comments, that was outdated.
> > So, I don't really see what would help you replacing the testsuite
> > framework, moreover, we have like 400k tests now and many of them
> > use simpler or more complicated tcl expressions in them, including almost
> > 2.5MB of pure tcl code.  Replacing it with something different and
> > incompatible is lots of work, especially when all you want is work around a
> > bug in some broken OS.
> 
> I'm not an expert but certainly there are long standing issues with DejaGNU,
> well beyond performance, right? I remember Mark Mitchell doing some work in
> this area which, as far as I can remember, had nothing to do with
> performance per se. And, well, some of these issues are obvious to explain,
> like not being able to check for *duplicate* error messages. I remember
> briefly discussing this with Dodji in Manchester.

I think several testcases check for duplicate error messages, the regexp is
against the whole text, so you can just check if it occurs more than once
there.

> Just wanted to make sure this kind of public discussion isn't completely
> suppressed.

I don't want to suppress any discussion, all I wanted to say is that
replacing the testsuite framework is a multi-year project and if it is done,
we'd need to decide on the benefits and disadvantages (even if it has a
superset of the features current framework has, having each developer learn
a new framework is non-trivial cost too).

Jakub


Re: Replacing DejaGNU

2019-01-14 Thread Paolo Carlini

Hi,

On 14/01/19 17:28, Jakub Jelinek wrote:

> I think several testcases check for duplicate error messages, the regexp is
> against the whole text, so you can just check if it occurs more than once
> there.


This is essentially https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30612

Paolo.



Re: Replacing DejaGNU

2019-01-14 Thread Jeff Law
On 1/14/19 9:39 AM, Paolo Carlini wrote:
> Hi,
> 
> On 14/01/19 17:28, Jakub Jelinek wrote:
>> I think several testcases check for duplicate error messages, the
>> regexp is
>> against the whole text, so you can just check if it occurs more than once
>> there.
> 
> This is essentially https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30612
Anyone working in this space should probably look at Ian's blogpost.

https://www.airs.com/blog/archives/499

jeff


Re: Replacing DejaGNU

2019-01-14 Thread Paolo Carlini

Hi Jeff,

On 14/01/19 17:43, Jeff Law wrote:

> On 1/14/19 9:39 AM, Paolo Carlini wrote:
> > Hi,
> >
> > On 14/01/19 17:28, Jakub Jelinek wrote:
> > > I think several testcases check for duplicate error messages, the
> > > regexp is
> > > against the whole text, so you can just check if it occurs more than once
> > > there.
> >
> > This is essentially https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30612
> Anyone working in this space should probably look at Ian's blogpost.
>
> https://www.airs.com/blog/archives/499


Thanks for the pointer. The fourth line is already rather encouraging ;)

Paolo.



Re: Replacing DejaGNU

2019-01-14 Thread Joseph Myers
On Mon, 14 Jan 2019, Jeff Law wrote:

> On 1/14/19 9:39 AM, Paolo Carlini wrote:
> > Hi,
> > 
> > On 14/01/19 17:28, Jakub Jelinek wrote:
> >> I think several testcases check for duplicate error messages, the
> >> regexp is
> >> against the whole text, so you can just check if it occurs more than once
> >> there.
> > 
> > This is essentially https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30612
> Anyone working in this space should probably look at Ian's blogpost.
> 
> https://www.airs.com/blog/archives/499

Also in the July 2013 discussion (starting at 
 I listed a series of 
other issues with DejaGnu - especially that neither DejaGnu nor the test 
code in GCC should need to know anything at all about how to make pieces 
of an uninstalled toolchain find each other, that should be entirely the 
responsibility of "make install".

To the issues added there I should also add: the DejaGnu board file 
interface is insufficiently expressive.  Specifically, it only supports 
tests exiting with a DejaGnu status rather than an arbitrary exit code, 
which is an issue if you want to use DejaGnu board files with a 
non-DejaGnu testsuite (so that the board interaction code only has to be 
written once) - it's perfectly possible to construct a DejaGnu "testsuite" 
that takes externally provided tests to run on the board (and for that 
matter to compile, if you wish to reuse ldscript settings from the board 
file by compiling through DejaGnu), and so use a board file for a 
non-DejaGnu testsuite, but that testsuite can't get back any extra 
information it might want from test exit statuses.

-- 
Joseph S. Myers
jos...@codesourcery.com


RS6000 emitting sign extension for unsigned type

2019-01-14 Thread kamlesh kumar
Hi devs,
consider below testcase:
$cat test.c
void foo(){
unsigned int x=-1;
double d=x;
}
$./cc1 test.c -msoft-float -m64
$cat test.s

.foo:
.LFB0:
mflr 0
std 0,16(1)
stdu 1,-128(1)
.LCFI0:
li 9,-1
stw 9,112(1)
lwa 9,112(1)
mr 3,9
bl .__floatunsidf
nop
mr 9,3
std 9,120(1)
nop
addi 1,1,128
.LCFI1:
ld 0,16(1)
mtlr 0
blr
.long 0
.byte 0,0,0,1,128,0,0,1

Here you can see a sign extension before calling the __floatunsidf routine.
As per my understanding it should emit a zero extension here, because
__floatunsidf takes its argument as an unsigned type.

I would like to know the reason for doing a sign extension here rather than
a zero extension, or whether this is a bug. Is there any workaround or hook?
Could you also point me in the right direction in the source, i.e. where a
modification would be needed?

Thanks
~Kamlesh




Re: Replacing DejaGNU

2019-01-14 Thread Jim Wilson

On 1/14/19 5:44 AM, MCC CS wrote:

> I've been running the testsuite on my macOS, on which
> it is especially unbearable. I want to (at least try to)
> rewrite a DejaGNU replacement accepting the same
> syntax and having no dependency, should therefore
> be faster. I was wondering if there have been any
> attempts on this?


CodeSourcery wrote one called qmtest, but there apparently hasn't been 
any work done on it in a while.  Joseph Myers indirectly referred to it. 
 You can find a copy here

https://github.com/MentorEmbedded/qmtest

It used to be possible to run the gcc testsuite using qmtest, but I 
don't know the current status.  I do see that there is still a 
qmtest-g++ makefile rule for running the G++ testsuite via qmtest 
though.  You could try that and see if it still works.


There is so much stuff that depends on dejagnu that replacing it will be 
difficult.


Jim


Re: Replacing DejaGNU

2019-01-14 Thread Joseph Myers
On Mon, 14 Jan 2019, Jim Wilson wrote:

> CodeSourcery wrote one called qmtest, but there apparently hasn't been any
> work done on it in a while.  Joseph Myers indirectly referred to it.  You can
> find a copy here
> https://github.com/MentorEmbedded/qmtest

Note that's a poor git-svn conversion, missing most of the history 
(everything before a repository rearrangement, and everything outside of 
QMTest proper - in particular, the separate qmtest_gcc that provided 
various things needed for the GCC testsuite).  If anyone seriously wants 
to do things with QMTest now, I should do a proper reposurgeon conversion 
of the full history (of qmtest and qmtest_gcc, and properly of qmtc from 
the time it was in a separate repository before being integrated into 
qmtest) to replace it.

> It used to be possible to run the gcc testsuite using qmtest, but I don't know
> the current status.  I do see that there is still a qmtest-g++ makefile rule
> for running the G++ testsuite via qmtest though.  You could try that and see
> if it still works.

README.QMTEST was removed in 2011 as the support was long bit-rotten.  I'm 
not sure why the actual bitrotten makefile support wasn't removed as well.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: __has_include__ is problematic

2019-01-14 Thread Nathan Sidwell

On 1/10/19 9:32 AM, Jakub Jelinek wrote:

On Thu, Jan 10, 2019 at 03:20:59PM +0100, Florian Weimer wrote:

Can we remove __has_include__?


No.


Its availability results in code which is needlessly non-portable
because for some reason, people write __has_include__ instead of
__has_include.  (I don't think there is any difference.)


__has_include needs to be a macro, while __has_include__ is a weirdo
builtin that does all the magic.  But one needs to be able to
#ifdef __has_include
etc.


Why not give the weirdo __has_include__ an unspellable name 
('builtinhasinclude') and take care to construct the 
__has_include macro expansion so it contains a token with exactly that spelling?


nathan

--
Nathan Sidwell


Re: __has_include__ is problematic

2019-01-14 Thread Florian Weimer
* Jakub Jelinek:

> Because the magic builtin is a preprocessor builtin, kind of macro,
> so you can't have a normal macro with the same name.

Could we turn this kind-of-macro into something that can be tested using
#ifdef?

Thanks,
Florian


Re: __has_include__ is problematic

2019-01-14 Thread Florian Weimer
* Nathan Sidwell:

> On 1/10/19 9:32 AM, Jakub Jelinek wrote:
>> On Thu, Jan 10, 2019 at 03:20:59PM +0100, Florian Weimer wrote:
>>> Can we remove __has_include__?
>>
>> No.
>>
>>> Its availability results in code which is needlessly non-portable
>>> because for some reason, people write __has_include__ instead of
>>> __has_include.  (I don't think there is any difference.)
>>
>> __has_include needs to be a macro, while __has_include__ is a weirdo
>> builtin that does all the magic.  But one needs to be able to
>> #ifdef __has_include
>> etc.
>
> Why not give the weirdo __has_include__ an unspellable name?
> ('builtinhasinclude') and take care constructing the
> __has_include macro expansion to have a token with exactly that
> spelling?

Wouldn't that break -dM rather horribly?

Thanks,
Florian


Failing aarch64 tests (PR 87763), no longer combining instructions with hard registers

2019-01-14 Thread Steve Ellcey
I have a question about PR87763, these are aarch64 specific tests
that are failing after r265398 (combine: Do not combine moves from hard
registers).

These tests are all failing when the assembler scan looks for
specific instructions and these instructions are no longer being
generated.  In some cases the new code is no worse than the old code
(just different) but in most cases the new code is a performance
regression from the old code.

Note that these tests are generally *very* small functions where the
body of the function consists of only 1 to 4 instructions so if we
do not combine instructions involving hard registers there isn't much,
if any, combining that can be done.  In larger functions this probably
would not be an issue and I think those cases are where the incentive
for this patch came from.  So my question is, what do we want to
do about these failures?

Find a GCC patch to generate the better code?  If it isn't done by
combine, how would we do it?  Peephole optimizations?

Modify the tests to pass with the current output?  That, in my
opinion, would leave the tests with little value.

Remove the tests?  Tests that search for specific assembly language
output are rather brittle to begin with and if they are no longer
serving a purpose after the combine patch, maybe we don't need them.

The tests in question are:

gcc.target/aarch64/combine_bfi_1.c
gcc.target/aarch64/insv_1.c
gcc.target/aarch64/lsl_asr_sbfiz.c
gcc.target/aarch64/sve/tls_preserve_1.c
gcc.target/aarch64/tst_5.c
gcc.target/aarch64/tst_6.c
gcc.dg/vect/vect-nop-move.c # Scanning combine dump file, not asm file


Re: Failing aarch64 tests (PR 87763), no longer combining instructions with hard registers

2019-01-14 Thread Segher Boessenkool
Hi!

On Mon, Jan 14, 2019 at 09:53:18PM +, Steve Ellcey wrote:
> I have a question about PR87763, these are aarch64 specific tests
> that are failing after r265398 (combine: Do not combine moves from hard
> registers).
> 
> These tests are all failing when the assembler scan looks for
> specific instructions and these instructions are no longer being
> generated.  In some cases the new code is no worse than the old code
> (just different) but in most cases the new code is a performance
> regression from the old code.
> 
> Note that these tests are generally *very* small functions where the
> body of the function consists of only 1 to 4 instructions so if we
> do not combine instructions involving hard registers there isn't much,
> if any, combining that can be done.

That is why all such hard regs are copied to a new pseudo first now.  The
pseudo can usually be combined in the same way as the hard reg could be
before.  The extra copy is optimised away by register allocation, in those
cases where that is a good choice (and, alas, register allocation sometimes
makes bad decisions).

> In larger functions this probably
> would not be an issue and I think those cases are where the incentive
> for this patch came from.  So my question is, what do we want to
> do about these failures?

Fix them :-)

Some are caused by deficiencies in the target code (or, things that were
not required before, but that now are needed).

Some are shortcomings in register allocation (or elsewhere; but in generic
code).

> Find a GCC patch to generate the better code?  If it isn't done by
> combine, how would we do it?  Peephole optimizations?

Sometimes that is needed, sure.  If your ISA is more orthogonal you need
fewer peepholes, which is great.  But most targets can use a few for good
profit.

> Modify the tests to pass with the current output?  Which, in my
> opinion would make the tests of not much value.

Sometimes the tests _are_ of not much value and aren't really testing what
they intended to test.

> Remove the tests?  Tests that search for specific assembly language
> output are rather brittle to begin with and if they are no longer
> serving a purpose after the combine patch, maybe we don't need them.
> 
> The tests in question are:

Ah, not too many, I'll look at them all.  Please correct me where I make
mistakes, I'm no expert on aarch64.

> gcc.target/aarch64/combine_bfi_1.c

f1:
Trying 9, 8 -> 10:
9: r99:SI=r100:SI&0xffffffffff0000ff
  REG_DEAD r100:SI
8: r98:SI=r101:SI<<0x8&0xffff00
  REG_DEAD r101:SI
   10: r96:SI=r98:SI|r99:SI
  REG_DEAD r99:SI
  REG_DEAD r98:SI
Failed to match this instruction:
(set (reg:SI 96)
(ior:SI (and:SI (reg:SI 100)
(const_int -16776961 [0xffffffffff0000ff]))
(and:SI (ashift:SI (reg:SI 101)
(const_int 8 [0x8]))
(const_int 16776960 [0xffff00]))))

Either you need a pattern to match things like this, or combine (or
simplify-rtx) should write is as an lhs zero_extract.

f2 is similar; f3,f4,f5 are similar and/or the test should allow bfxil as
well as bfi.

> gcc.target/aarch64/insv_1.c

This test tests that various combinations with constant integers generate
good code.  For the first test, bfi1, we get

Trying 2, 7 -> 13:
2: r92:DI=r95:DI
  REG_DEAD r95:DI
7: zero_extract(r92:DI,0x8,0)=r93:DI
  REG_DEAD r93:DI
   13: x0:DI=r92:DI
  REG_DEAD r92:DI
Failed to match this instruction:
(set (reg/i:DI 0 x0)
(ior:DI (and:DI (reg:DI 95)
(const_int -256 [0xffffffffffffff00]))
(reg:DI 93)))
Successfully matched this instruction:
(set (reg/v:DI 92 [ aD.3347 ])
(and:DI (reg:DI 95)
(const_int -256 [0xffffffffffffff00])))
Successfully matched this instruction:
(set (reg/i:DI 0 x0)
(ior:DI (reg/v:DI 92 [ aD.3347 ])
(reg:DI 93)))
allowing combination of insns 2, 7 and 13

which is not what we want.  A peephole2, or a define_split, or a
define_insn_and_split would fix this.

bfi2 is similar.  movk, I think, is the same as well.  set0 and set1 are
best code already, I think.

> gcc.target/aarch64/lsl_asr_sbfiz.c

sbfiz32 (sbfiz64 is fine):
Trying 6 -> 7:
6: r94:SI=r95:SI<<0x1d
  REG_DEAD r95:SI
7: r93:SI=r94:SI>>0xa
  REG_DEAD r94:SI
Failed to match this instruction:
(set (reg:SI 93)
(ashift:SI (subreg:SI (sign_extract:DI (subreg:DI (reg:SI 95) 0)
(const_int 3 [0x3])
(const_int 0 [0])) 0)
(const_int 19 [0x13])))

Say what?  Everything was SI, where does that sign_extract:DI come from?
And why isn't it optimised back to SI?

(Please open a PR just for this one, if it isn't obviously a target thing
that causes it).

> gcc.target/aarch64/sve/tls_preserve_1.c

I get
foo:
.LFB0:
.cfi_startproc
stp x29, x30, [sp, -64]!
.cfi_def_cfa_offset 64
.cfi_offset 29, -64
.cfi_offset 30, -56
mrs x1, tpidr_el0
mov x29, sp
stp q0, q1, [sp, 16]
str