Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-01 Thread Jan Hubicka
> Hi.
> 
> I'm sending v3 of the patch where I changed:
> - function.cold sections are properly put into .text.unlikely and
>   not into a .text.sorted.XYZ section
> 
> I've just finished measurements and I still have the original speed up
> for tramp3d:
> Total runs: 10, before: 13.92, after: 13.82, cmp: 99.219%

Hi,
I have updated binutils to current head on the Firefox testing patch and
run FDO build with tp first run ordering and call chain clustering.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1313e6a4d74ebff702afa7594684beb83c01d95f&newProject=try&newRevision=1c2d53b10b042c15edbe7bd26e2740641840&framework=1

It seems there are no differences in performance. The two binaries can
be downloaded at

w/o patch:
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MK-7DC3FQcevZC_Nvlnq8Q/runs/0/artifacts/public/build/target.tar.bz2
with call chain clustering.
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UVh6iNILT-qb8sYM5vxVCQ/runs/0/artifacts/public/build/target.tar.bz2

Since Firefox is quite sensitive to code size I would expect to be able
to measure some benefits here.  Any idea what may have go wrong?
I checked that the binaries seems generally sane - out of 58MB text
segment there is 34MB cold section. It is possible that system ld is
used instead of provided one, but that would be weird.  I will try to
find way to double-check that updating binutils really updated them for
GCC.

Honza


Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-01 Thread Jan Hubicka
> > Hi.
> > 
> > I'm sending v3 of the patch where I changed:
> > - function.cold sections are properly put into .text.unlikely and
> >   not into a .text.sorted.XYZ section
> > 
> > I've just finished measurements and I still have the original speed up
> > for tramp3d:
> > Total runs: 10, before: 13.92, after: 13.82, cmp: 99.219%
> 
> Hi,
> I have updated binutils to current head on the Firefox testing patch and
> run FDO build with tp first run ordering and call chain clustering.
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1313e6a4d74ebff702afa7594684beb83c01d95f&newProject=try&newRevision=1c2d53b10b042c15edbe7bd26e2740641840&framework=1
> 
> It seems there are no differences in performance. The two binaries can
> be downloaded at
> 
> w/o patch:
> https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MK-7DC3FQcevZC_Nvlnq8Q/runs/0/artifacts/public/build/target.tar.bz2
> with call chain clustering.
> https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/UVh6iNILT-qb8sYM5vxVCQ/runs/0/artifacts/public/build/target.tar.bz2
> 
> Since Firefox is quite sensitive to code size I would expect to be able
> to measure some benefits here.  Any idea what may have go wrong?
> I checked that the binaries seems generally sane - out of 58MB text
> segment there is 34MB cold section. It is possible that system ld is
> used instead of provided one, but that would be weird.  I will try to
> find way to double-check that updating binutils really updated them for
> GCC.
Linker used is
GNU ld (GNU Binutils) 2.33.50.20191130

Does it have all necessary support in?  Do I need something like
-ffunction-sections?

Honza


Re: Fix crossmodule ipa-inline hint

2019-12-01 Thread Andreas Schwab
On Nov 30 2019, Jan Hubicka wrote:

>   * g++.dg/lto/inline-crossmodule-1.h: New testcase.
>   * g++.dg/lto/inline-crossmodule-1_0.C: New testcase.
>   * g++.dg/lto/inline-crossmodule-1_1.C: New testcase.

ERROR: (DejaGnu) proc "scan-wpa-ipa-times {Inlined ret1} 1 inlined" does not 
exist.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-12-01 Thread Jan Hubicka
> Hi!
> 
> xchg instruction is smaller, in some cases much smaller than 3 moves,
> (e.g. in the testcase 2 bytes vs. 8 bytes), and is not a performance
> disaster, but from Agner Fog tables and
> https://stackoverflow.com/questions/45766444/why-is-xchg-reg-reg-a-3-micro-op-instruction-on-modern-intel-architectures
> it doesn't seem to be something we'd want to use when optimizing for speed,
> at least not on Intel.
> 
> While we have *swap patterns, those are very unlikely to be triggered
> during combine, usually we have different pseudos in there and the actual
> need for swapping is only materialized during RA.
> 
> The following patch does it when optimizing the insn for size only.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2019-11-19  Jakub Jelinek  
> 
>   PR target/92549
>   * config/i386/i386.md (peephole2 for *swap): New peephole2.
> 
>   * gcc.target/i386/pr92549.c: New test.

It is very hard to get a testcase, unforutnately, but I got the
following (locally non-reproducible) failure while building firefox with
LTO+FDO:

[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -   1080 | }
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -|
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (insn  1389 1390 41 
(parallel [
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set (reg:SI 
24 xmm4 [orig:187 SR.3778 ] [187])
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 23 xmm3 [orig:104 
SR.3780 ] [104]))
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set (reg:SI 
23 xmm3 [orig:104 SR.3780 ] [104])
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 24 xmm4 [orig:187 
SR.3778 ] [187]))
[task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  ])
"/builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/WritingModes.h":1117:0
78 {*swapsi}
[task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -   (nil))
[task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -  during RTL pass: rnreg

I guess the problem is that there is no xchange in SSE instruction set,
so peephle needs to be more restrictive?

Honza


Re: Fix crossmodule ipa-inline hint

2019-12-01 Thread Jan Hubicka
> On Nov 30 2019, Jan Hubicka wrote:
> 
> > * g++.dg/lto/inline-crossmodule-1.h: New testcase.
> > * g++.dg/lto/inline-crossmodule-1_0.C: New testcase.
> > * g++.dg/lto/inline-crossmodule-1_1.C: New testcase.
> 
> ERROR: (DejaGnu) proc "scan-wpa-ipa-times {Inlined ret1} 1 inlined" does not 
> exist.

Uhh, should be scan-wpa-ipa-dump-times, I will test and commit the
obvious patch.

Thanks,
Honza
> 
> Andreas.
> 
> -- 
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Fix profile_count comparsions with gcov_type

2019-12-01 Thread Jan Hubicka
Hi,
this patch fixes one extra (and I hope last) place where we mix up
global and local porfile.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* profile-count.h (profile_count::operator<): Use IPA value for
comparsion.
(profile_count::operator>): Likewise.
(profile_count::operator<=): Likewise.
(profile_count::operator>=): Likewise.
* predict.c (maybe_hot_count_p): Do not convert to gcov_type.
Index: profile-count.h
===
--- profile-count.h (revision 278877)
+++ profile-count.h (working copy)
@@ -930,14 +930,14 @@ public:
 {
   gcc_checking_assert (ipa_p ());
   gcc_checking_assert (other >= 0);
-  return initialized_p () && m_val < (uint64_t) other;
+  return ipa ().initialized_p () && ipa ().m_val < (uint64_t) other;
 }
 
   bool operator> (const gcov_type other) const
 {
   gcc_checking_assert (ipa_p ());
   gcc_checking_assert (other >= 0);
-  return initialized_p () && m_val > (uint64_t) other;
+  return ipa ().initialized_p () && ipa ().m_val > (uint64_t) other;
 }
 
   bool operator<= (const profile_count &other) const
@@ -968,14 +968,14 @@ public:
 {
   gcc_checking_assert (ipa_p ());
   gcc_checking_assert (other >= 0);
-  return initialized_p () && m_val <= (uint64_t) other;
+  return ipa ().initialized_p () && ipa ().m_val <= (uint64_t) other;
 }
 
   bool operator>= (const gcov_type other) const
 {
   gcc_checking_assert (ipa_p ());
   gcc_checking_assert (other >= 0);
-  return initialized_p () && m_val >= (uint64_t) other;
+  return ipa ().initialized_p () && ipa ().m_val >= (uint64_t) other;
 }
 
   /* Return true when value is not zero and can be used for scaling. 
Index: predict.c
===
--- predict.c   (revision 278877)
+++ predict.c   (working copy)
@@ -184,7 +184,7 @@ maybe_hot_count_p (struct function *fun,
   /* Code executed at most once is not hot.  */
   if (count <= MAX (profile_info ? profile_info->runs : 1, 1))
 return false;
-  return (count.to_gcov_type () >= get_hot_bb_threshold ());
+  return (count >= get_hot_bb_threshold ());
 }
 
 /* Return true if basic block BB of function FUN can be CPU intensive


Fix jump function update in inliner

2019-12-01 Thread Jan Hubicka
Hi,
this patch fixes the problem with clearing bits and m_vr in inliner
update we discussed earlier.  I am not sure if
dete_type_change_from_memory_writes needs the clear?

Bootstrapped/regtested x86_64-linux, Martin, does it make sense?

Honza

* ipa-prop.c (ipa_set_jf_unknown): Do not clear bits and m_vr.
(detect_type_change_from_memory_writes): Clear it here.
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 278877)
+++ ipa-prop.c  (working copy)
@@ -512,8 +512,6 @@ static void
 ipa_set_jf_unknown (struct ipa_jump_func *jfunc)
 {
   jfunc->type = IPA_JF_UNKNOWN;
-  jfunc->bits = NULL;
-  jfunc->m_vr = NULL;
 }
 
 /* Set JFUNC to be a copy of another jmp (to be used by jump function
@@ -819,6 +817,8 @@ detect_type_change_from_memory_writes (i
 return false;
 
   ipa_set_jf_unknown (jfunc);
+  jfunc->bits = NULL;
+  jfunc->m_vr = NULL;
   return true;
 }
 


Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-12-01 Thread Jakub Jelinek
On Sun, Dec 01, 2019 at  03:43:37PM +0100, Jan Hubicka wrote:
> > PR target/92549
> > * config/i386/i386.md (peephole2 for *swap): New peephole2.
> > 
> > * gcc.target/i386/pr92549.c: New test.
> 
> It is very hard to get a testcase, unforutnately, but I got the
> following (locally non-reproducible) failure while building firefox with
> LTO+FDO:
> 
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -   1080 | }
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -|
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (insn  1389 1390 41 
> (parallel [
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set (reg:SI 
> 24 xmm4 [orig:187 SR.3778 ] [187])
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 23 xmm3 [orig:104 
> SR.3780 ] [104]))
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set (reg:SI 
> 23 xmm3 [orig:104 SR.3780 ] [104])
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 24 xmm4 [orig:187 
> SR.3778 ] [187]))
> [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  ])
> "/builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/WritingModes.h":1117:0
> 78 {*swapsi}
> [task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -   (nil))
> [task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -  during RTL pass: rnreg
> 
> I guess the problem is that there is no xchange in SSE instruction set,
> so peephle needs to be more restrictive?

I'll have a look tomorrow.

Jakub



Re: [PATCH][RFC] Add new ipa-reorder pass

2019-12-01 Thread Jan Hubicka
Hi,
I was playing with it a bit more and built with
-fno-profile-reorder-functions.  

Here is -fno-profile-reorder-functions compared to first run 
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3d537be0cb37458e7928f69a37efb2a6d6b85eae&newProject=try&newRevision=4543abfa08870391544b56d16dfcd530dac0dc30&framework=1
2.2% improvement on the page rendering is off noise but I would hope for
bit more.

Here is -fno-profile-reorder-functions compared to clustering
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3d537be0cb37458e7928f69a37efb2a6d6b85eae&newProject=try&newRevision=1c2d53b10b042c15edbe7bd26e2740641840&framework=1
I am not sure if it is a noise.

Either Fireox's talos is not good benchmark for code layout (which I
would be surprised since it is quite sensitive to code size issues) or
there are some problems.

In general I think the patch is useful and mostly mainline ready except
for detailes but it would be good to have some more evidence that it
works as expected on large binaries besides tramp3d build time.  There
are number of ways where things may go wrong ranging from misupdated
profiles, problems with function splitting, comdats and other issues.

Was you able to benchmark some other benefits?  I remember we discussed
collecting traces from valgrind, perhaps we could test that they are
looking good?

Honza


Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-12-01 Thread Jeff Law
On 11/26/19 6:44 PM, Feng Xue OS wrote:
> Hi, Richard,
> 
>   This patch is a not bugfix, while it is small. Martin and Honza are fine 
> with it.
> But now we are in stage 3, is it ok to commit?
Yes.  It was posted well in advance of the stage1->stage3 change.
Please commit it ASAP so the testers can pick it up.

Thanks,
jeff



Re: [C++ PATCH] c++/91353 - P1331R2: Allow trivial default init in constexpr contexts.

2019-12-01 Thread Marek Polacek
On Thu, Nov 28, 2019 at 11:29:20PM -0500, Jason Merrill wrote:
> Sounds like reduced_constant_expression_p needs to deal better with empty
> bases.

This got a bit complicated because it also needs to handle unions and
now we also need to heed vptr.  But the following seems to work.

(I'll skip the story about a bogus error I hit when building cmcstl2 and
the need to debug a ~90,000 LOC test, because creduce got stuck reducing
it.)

Bootstrapped/regtested on x86_64-linux, built Boost and cmcstl2.

Ok?

2019-12-01  Marek Polacek  
Jakub Jelinek  

PR c++/91353 - P1331R2: Allow trivial default init in constexpr 
contexts.
* c-cppbuiltin.c (c_cpp_builtins): Adjust the value of __cpp_constexpr.

* class.c (trivial_default_constructor_is_constexpr): Return true in
C++20.
* constexpr.c (cx_check_missing_mem_inits): Allow missing field
initializers in C++20.
(cxx_eval_call_expression): Don't clear CONSTRUCTOR_NO_CLEARING for
constexpr constructors in C++20.
(reduced_constant_expression_p): Don't set FIELD for union and array
types.  Adjust calls to next_initializable_field.
* cp-tree.h (next_initializable_field): Adjust declaration.
* decl.c (check_for_uninitialized_const_var): Permit trivial default
initialization in constexpr.
(next_initializable_field): Add a bool parameter.  Use it.
* method.c (walk_field_subobs): Still consider a constructor that
doesn't initialize all the members constexpr.

* g++.dg/cpp0x/constexpr-array6.C: Adjust dg-error.
* g++.dg/cpp0x/constexpr-ctor.C: Likewise.
* g++.dg/cpp0x/constexpr-diag3.C: Likewise.
* g++.dg/cpp0x/constexpr-diag4.C: Likewise.
* g++.dg/cpp0x/constexpr-ex3.C: Likewise.
* g++.dg/cpp0x/constexpr-template2.C: Likewise.
* g++.dg/cpp0x/constexpr-union2.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-mangle.C: Rip out a piece of code ...
* g++.dg/cpp0x/lambda/lambda-mangle6.C: ... and put it here.
* g++.dg/cpp0x/pr79118.C: Adjust dg-error.
* g++.dg/cpp1y/constexpr-83921-3.C: Likewise.
* g++.dg/cpp1y/constexpr-neg1.C: Likewise.
* g++.dg/cpp1z/constexpr-lambda12.C: Likewise.
* g++.dg/cpp1z/feat-cxx1z.C: Use -std=c++17.
* g++.dg/cpp2a/constexpr-init1.C: New test.
* g++.dg/cpp2a/constexpr-init2.C: New test.
* g++.dg/cpp2a/constexpr-init3.C: New test.
* g++.dg/cpp2a/constexpr-init4.C: New test.
* g++.dg/cpp2a/constexpr-init5.C: New test.
* g++.dg/cpp2a/constexpr-init6.C: New test.
* g++.dg/cpp2a/constexpr-init7.C: New test.
* g++.dg/cpp2a/constexpr-init8.C: New test.
* g++.dg/cpp2a/constexpr-init9.C: New test.
* g++.dg/cpp2a/constexpr-init10.C: New test.
* g++.dg/cpp2a/constexpr-init11.C: New test.
* g++.dg/cpp2a/constexpr-init12.C: New test.
* g++.dg/cpp2a/constexpr-try5.C: Adjust dg-error.
* g++.dg/cpp2a/feat-cxx2a.C: Test __cpp_constexpr.
* g++.dg/cpp2a/lambda-mangle.C: New test.
* g++.dg/debug/dwarf2/pr44641.C: Skip for c++2a.
* g++.dg/ext/stmtexpr21.C: Adjust dg-error.

diff --git gcc/c-family/c-cppbuiltin.c gcc/c-family/c-cppbuiltin.c
index 6491545bc3b..680087cd254 100644
--- gcc/c-family/c-cppbuiltin.c
+++ gcc/c-family/c-cppbuiltin.c
@@ -975,7 +975,8 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_fold_expressions=201603L");
  cpp_define (pfile, "__cpp_nontype_template_args=201411L");
  cpp_define (pfile, "__cpp_range_based_for=201603L");
- cpp_define (pfile, "__cpp_constexpr=201603L");
+ if (cxx_dialect <= cxx17)
+   cpp_define (pfile, "__cpp_constexpr=201603L");
  cpp_define (pfile, "__cpp_if_constexpr=201606L");
  cpp_define (pfile, "__cpp_capture_star_this=201603L");
  cpp_define (pfile, "__cpp_inline_variables=201606L");
@@ -997,6 +998,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_init_captures=201803L");
  cpp_define (pfile, "__cpp_generic_lambdas=201707L");
  cpp_define (pfile, "__cpp_designated_initializers=201707L");
+ cpp_define (pfile, "__cpp_constexpr=201907L");
  cpp_define (pfile, "__cpp_constexpr_in_decltype=201711L");
  cpp_define (pfile, "__cpp_conditional_explicit=201806L");
  cpp_define (pfile, "__cpp_consteval=201811L");
diff --git gcc/cp/class.c gcc/cp/class.c
index f36f75fa0db..d8bb44990b7 100644
--- gcc/cp/class.c
+++ gcc/cp/class.c
@@ -5288,8 +5288,14 @@ trivial_default_constructor_is_constexpr (tree t)
   /* A defaulted trivial default constructor is constexpr
  if there is nothing to initialize.  */
   gcc_assert (!TYPE_HAS_COMPLEX_DFLT (t));
-  /* A class with a vptr doesn't have a trivial default ctor.  */
-  return is_really_empty_class (t, /*ignore_vptr*/true);
+  /* A class with a vptr doesn't 

Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-01 Thread Eric Gallager
On 11/20/19, Richard Biener  wrote:
> On Tue, Nov 19, 2019 at 11:02 PM David Malcolm  wrote:
>>
>> > > The checker is implemented as a GCC plugin.
>> > >
>> > > The patch kit adds support for "in-tree" plugins i.e. GCC plugins
>> > > that
>> > > would live in the GCC source tree and be shipped as part of the GCC
>> > > tarball,
>> > > with a new:
>> > >   --enable-plugins=[LIST OF PLUGIN NAMES]
>> > > configure option, analogous to --enable-languages (the
>> > > Makefile/configure
>> > > machinery for handling in-tree GCC plugins is adapted from how we
>> > > support
>> > > frontends).
>> >
>> > I like that.  Implementing this as a plugin surely must help to
>> > either
>> > document the GCC plugin interface as powerful/mature for such a
>> > task.  Or
>> > make it so, if it isn't yet.  ;-)
>>
>> Our plugin "interface" as such is very broad.
>
> Just to sneak in here I don't like exposing our current plugin "non-API"
> more.  In fact I'd just build the analyzer into GCC with maybe an
> option to disable its build (in case it is very fat?).
>
> From what I read it seems the analyzer could do with a proper
> plugin API that just exposes introspection - and I really hope
> somebody finds the time to complete (or rewrite...) the
> proposed introspection API that ideally is even cross-compiler
> (proven by implementing said API ontop of both GCC and clang/llvm).
> That way the Analyzer would work with both GCC and clang [and golang
> and rustc...].

That might be a good idea for a long-term goal, but I just hope it
doesn't get in the way too much of the analyzer getting into GCC in
the short-term. The analyzer seems like it could do some really cool
analysis, and I'd like to use it sooner rather than later. Rewriting
the plugin API sounds like it could take a really long time...

>
> So it would be interesting if you could try to sketch the kind of API
> the Analyzer needs?  That is, merely the detail on which it inspects
> statements, the CFG and the callgraph.
>
> Richard.
>


Re: [PATCH 00/49] RFC: Add a static analysis framework to GCC

2019-12-01 Thread Eric Gallager
On 11/16/19, Thomas Schwinge  wrote:
> Hi David!
>
> On 2019-11-15T20:22:47-0500, David Malcolm  wrote:
>> This patch kit
>
> (I have not looked at the patches.)  ;-)
>
>> introduces a static analysis pass for GCC that can diagnose
>> various kinds of problems in C code at compile-time (e.g. double-free,
>> use-after-free, etc).
>
> Sounds great from the description!
>
>
> Would it make sense to add to the wiki page
>  a (high-level)
> comparison to other static analyzers (Coverity, cppcheck,
> clang-static-analyzer, others?), in terms of how they work, what their
> respective benefits are, what their design goals are, etc.  (Of course
> understanding that yours is much less mature at this point; talking about
> high-level design rather than current implementation status.)
>
> For example, why do we want that in GCC instead of an external tool -- in
> part covered in your Rationale.

There are a lot of bug reports open for requests for warnings that
this analyzer could solve. Users clearly want this in GCC, or else
they wouldn't keep making these requests.

> Can a compiler-side implementation
> benefit from having more information available than an external tool?
> GCC-side implementation is readily available (modulo GCC plugin
> installation?) vs. external ones need to be installed/set up first.
> GCC-side one only works with GCC-supported languages.  GCC-side one
> analyzes actual code being compiled -- thinking about preprocessor-level
> '#if' etc., which surely are problematic for external tools that are not
> actually replicating a real build.  And so on.  (If you don't want to
> spell out Coverity, cppcheck, clang-static-analyzer, etc., maybe just
> compare yours to external tools.)
>
> Just an idea, because I wondered about these things.
>
>
>> The analyzer runs as an IPA pass on the gimple SSA representation.
>> It associates state machines with data, with transitions at certain
>> statements and edges.  It finds "interesting" interprocedural paths
>> through the user's code, in which bogus state transitions happen.
>>
>> For example, given:
>>
>>free (ptr);
>>free (ptr);
>>
>> at the first call, "ptr" transitions to the "freed" state, and
>> at the second call the analyzer complains, since "ptr" is already in
>> the "freed" state (unless "ptr" is NULL, in which case it stays in
>> the NULL state for both calls).
>>
>> Specific state machines include:
>> - a checker for malloc/free, for detecting double-free, resource leaks,
>>   use-after-free, etc (sm-malloc.cc), and
>
> I can immediately see how this can be useful for a bunch of
> 'malloc'/'free'-like etc. OpenACC Runtime Library calls as well as source
> code directives.  ..., and this would've flagged existing code in the
> libgomp OpenACC tests, which recently has given me some grief. Short
> summary/examples:
>
> In addition to host-side 'malloc'/'free', there is device-side (separate
> memory space) 'acc_malloc'/'acc_free'.  Static checking example: don't
> mix up host-side and device-side pointers.  (Both are normal C/C++
> pointers.  Hmm, maybe such checking could easily be implemented even
> outside of your checker by annotating the respective function
> declarations with an attribute describing which in/out arguments are
> host-side vs. device-side pointers.)
>
> Then, there are functions to "map" host-side and device-side memory:
> 'acc_map_data'/'acc_unmap_data'.  Static checking example: you must not
> 'acc_free' memory spaces that are still mapped.
>
> Then, there are functions like 'acc_create' (or equivalent directives
> like '#pragma acc create') doing both 'acc_malloc', 'acc_map_data'
> (plus/depending on internal reference counting).  Static checking
> example: for a pointer returned by 'acc_create" etc., you must use
> 'acc_delete' etc. instead of 'acc_free', which first does
> 'acc_unmap_data' before interal 'acc_free' (..., and again all that
> depending on reference counting).  (Might be "interesting" to teach your
> checker about the reference counting -- if that is actually necessary;
> needs further thought.)
>
>
>> The checker is implemented as a GCC plugin.
>>
>> The patch kit adds support for "in-tree" plugins i.e. GCC plugins that
>> would live in the GCC source tree and be shipped as part of the GCC
>> tarball,
>> with a new:
>>   --enable-plugins=[LIST OF PLUGIN NAMES]
>> configure option, analogous to --enable-languages (the Makefile/configure
>> machinery for handling in-tree GCC plugins is adapted from how we support
>> frontends).
>
> I like that.  Implementing this as a plugin surely must help to either
> document the GCC plugin interface as powerful/mature for such a task.  Or
> make it so, if it isn't yet.  ;-)

Nick Clifton was bringing this up as a point in his talk on his
annobin plugin at Cauldron; this should make him happy.

>
>> The default is for no such plugins to be enabled, so the default would
>> be that the checker isn't built - you'

[PATCH v2, committed] Fix bugs relating to flexibly-sized objects in nios2 backend.

2019-12-01 Thread Sandra Loosemore
I've checked in this patch to fix PR target/92499 in a more restricted 
way than my initial patch, without breaking ABI compatibility.  This 
version of the patch is for the nios2 backend only although (as I noted 
in the issue) I'm pretty sure other backends have similar problems.


I've fixed the immediate problem reported by the GLIBC project by 
replacing ASM_OUTPUT_ALIGNED_LOCAL with ASM_OUTPUT_ALIGNED_DECL_LOCAL so 
that it can use the targetm.in_small_data_p hook instead of looking at 
the allocated size of the object.  That inconsistency was what was 
causing it to use gp-relative addressing on a flexibly-sized local 
object that it was placing in .bss instead of .sbss.  I've also tweaked 
nios2_in_small_data_p to avoid treating objects with flexible array 
members in small data except where it's necessary to maintain ABI 
compatibility, and future-proofed it against a potential ABI change by 
not using gp-relative addressing for extern references to these types. 
Note that the nios2 backend doesn't default to generating gp-relative 
addresses for external objects of any type (you have to ask for that 
with -mgpopt=global) so probably there is hardly any code that would be 
affected if we did change the ABI down the road.


I plan to backport this patch to the GCC 9 branch but probably not any 
older branches.


-Sandra
2019-12-01  Sandra Loosemore  

	Fix bugs relating to flexibly-sized objects in nios2 backend.

	PR target/92499

	gcc/c/
	* c-decl.c (flexible_array_type_p): Move to common code.

	gcc/
	* config/nios2/nios2.c (nios2_in_small_data_p): Do not consider
	objects of flexible types to be small if they have internal linkage
	or are declared extern.
	* config/nios2/nios2.h (ASM_OUTPUT_ALIGNED_LOCAL): Replace with...
	(ASM_OUTPUT_ALIGNED_DECL_LOCAL): ...this.  Use targetm.in_small_data_p
	instead of the size of the object initializer.
	* tree.c (flexible_array_type_p): Move from C front end, and
	generalize to handle fields in non-C structures.
	* tree.h (flexible_array_type_p): Declare.

	gcc/testsuite/
	* gcc.target/nios2/pr92499-1.c: New.
	* gcc.target/nios2/pr92499-2.c: New.
	* gcc.target/nios2/pr92499-3.c: New.
Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c	(revision 278890)
+++ gcc/c/c-decl.c	(working copy)
@@ -5709,39 +5709,6 @@ check_compound_literal_type (location_t
 		"defining a type in a compound literal is invalid in C++");
 }
 
-/* Determine whether TYPE is a structure with a flexible array member,
-   or a union containing such a structure (possibly recursively).  */
-
-static bool
-flexible_array_type_p (tree type)
-{
-  tree x;
-  switch (TREE_CODE (type))
-{
-case RECORD_TYPE:
-  x = TYPE_FIELDS (type);
-  if (x == NULL_TREE)
-	return false;
-  while (DECL_CHAIN (x) != NULL_TREE)
-	x = DECL_CHAIN (x);
-  if (TREE_CODE (TREE_TYPE (x)) == ARRAY_TYPE
-	  && TYPE_SIZE (TREE_TYPE (x)) == NULL_TREE
-	  && TYPE_DOMAIN (TREE_TYPE (x)) != NULL_TREE
-	  && TYPE_MAX_VALUE (TYPE_DOMAIN (TREE_TYPE (x))) == NULL_TREE)
-	return true;
-  return false;
-case UNION_TYPE:
-  for (x = TYPE_FIELDS (type); x != NULL_TREE; x = DECL_CHAIN (x))
-	{
-	  if (flexible_array_type_p (TREE_TYPE (x)))
-	return true;
-	}
-  return false;
-default:
-return false;
-  }
-}
-
 /* Performs sanity checks on the TYPE and WIDTH of the bit-field NAME,
replacing with appropriate values if they are invalid.  */
 
Index: gcc/config/nios2/nios2.c
===
--- gcc/config/nios2/nios2.c	(revision 278890)
+++ gcc/config/nios2/nios2.c	(working copy)
@@ -2373,6 +2373,22 @@ nios2_in_small_data_p (const_tree exp)
 	  if (nios2_small_section_name_p (section))
 	return true;
 	}
+  else if (flexible_array_type_p (TREE_TYPE (exp))
+	   && (!TREE_PUBLIC (exp) || DECL_EXTERNAL (exp)))
+	{
+	  /* We really should not consider any objects of any flexibly-sized
+	 type to be small data, but pre-GCC 10 did not test
+	 for this and just fell through to the next case.  Thus older
+	 code compiled with -mgpopt=global could contain GP-relative
+	 accesses to objects defined in this compilation unit with
+	 external linkage.  We retain the possible small-data treatment
+	 of such definitions for backward ABI compatibility, but
+	 no longer generate GP-relative accesses for external
+	 references (so that the ABI could be changed in the future
+	 with less potential impact), or objects with internal
+	 linkage.  */
+	  return false;
+	}
   else
 	{
 	  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (exp));
Index: gcc/config/nios2/nios2.h
===
--- gcc/config/nios2/nios2.h	(revision 278890)
+++ gcc/config/nios2/nios2.h	(working copy)
@@ -467,10 +467,10 @@ while (0)
the linker seems to want the alignment of data objects
to depend on their 

Re: [PATCH, GCC/testsuite] Fix dump-noaddr dumpbase

2019-12-01 Thread Andrew Pinski
On Tue, Dec 5, 2017 at 11:27 AM Mike Stump  wrote:
>
> On Dec 5, 2017, at 11:11 AM, Thomas Preudhomme 
>  wrote:
> >
> > On 05/12/17 17:54, Andrew Pinski wrote:
> >> On Tue, Dec 5, 2017 at 9:50 AM, Thomas Preudhomme
> >>  wrote:
> >>> Hi,
> >>>
> >>> dump-noaddr test FAILS when $tmpdir is not the same as the directory
> >>> where runtest is called from. Note that this does not happen when
> >>> running make check because tmpdir is set to srcdir.
> >>>
> >>> In that case, file mkdir will create the directory in the current
> >>> directory while GCC is invoked from tmpdir and hence -dumpbase look
> >>> for dump1 and dump2 relative to tmpdir. This patch forces dumpbase to
> >>> be relative to tmpdir which will work in all case.
> >>>
> >>> ChangeLog entry is as follows:
> >>>
> >>> *** gcc/testsuite/ChangeLog ***
> >>>
> >>> 2017-12-05  Thomas Preud'homme  
> >>>
> >>> * gcc.c-torture/unsorted/dump-noaddr.x (dump_compare): Set dump 
> >>> base
> >>> relative to tmpdir.
> >>>
> >>> Testing: Successfully ran unsorted.exp via make check and out of tree
> >>> testing using runtest from /test with tmpdir set in
> >>> /test/site.exp to .
> >>>
> >>> Is this ok for stage3?
> >> https://gcc.gnu.org/ml/gcc-patches/2012-06/msg01752.html
> >> I don't remember where this discussion went last time.
> >> Maybe this time there will be a resolution :).
> >
> > FWIW, I agree with Matt, creating the dump in tmpdir makes more sense. I 
> > think his patch can be simplified though because the compiler seems to be 
> > invoked from tmpdir so it can at least be omitted from the -dumpbase.
>
> Sounds reasonable.  I've added that on top of his patch and checked that in.  
> Let us know if it works or not.

I finally got around to testing this myself and I had to revert the
second patch of the set which removed $tmpdir from dumpbase option.
Maybe it is the way I run the testsuite out of tree; I have been
running this way since at least 2010 and the code to do the running
existed before I joined Cavium/Marvell.

This is how I run the testsuite (part of the Makefile):
gcc.sum g++.sum gdb.sum:
rm -f *.gcda
mkdir -p $(basename $@)
-$(RUNTEST) --tool $(basename $@) TMPDIR=`pwd`/$(basename $@)
$(DGFLAGS) \
  --srcdir $(SRC)/$(subst g++,gcc,$(basename $@))/testsuite \
  HOSTCC=gcc HOSTCFLAGS=

I don't "cd TMPDIR" before invoking runtest though so I wonder being
in the TMPDIR is always true because of that.

Thanks,
Andrew Pinski



Thanks,
Andrew Pinski


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-01 Thread Sandra Loosemore

Ping!

On 11/13/19 9:27 AM, Sandra Loosemore wrote:

This patch series lays some groundwork for the project to redo the
OpenACC "kernels" region support in GCC, described in Thomas
Schwinge's recent talk at the GNU Cauldron:

https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile&do=view&target=OpenACC+kernels-cauldron2019.pdf

Briefly, the larger goal is to make the compiler recognize "for" loops
that are candidates for parallelization, and treat them as if they
were annotated with "#pragma acc loop auto".  This is fairly
straightforward to do on the output of the Fortran and C++ front ends,
which both produce a high-level parse tree representation of the loop.
OTOH, the C front end currently lowers loops into a goto form very
early, making it hard both to recognize "for" loops and to
differentiate between (valid) structured and (invalid) unstructured
control flow in the body of the loop.

So, the immediate goal of this patch sequence is to make the C front
end produce output using the same high-level tree representation as
the C++ front end already emits: specifically, to produce FOR_STMT,
WHILE_STMT, and DO_STMT for loops, BREAK_STMT and CONTINUE_STMT
instead of gotos, and also SWITCH_STMT instead of SWITCH_EXPR since
that's mixed up in the handling for "break".  Besides this being
helpful to OpenACC implementation, there's also some argument to be
made that sharing more code between the respective C family front ends
is good from an engineering perspective, etc.

Part 1 of the patch set moves the definitions and support
(genericization, pretty-printers, dumpers, etc) for those tree nodes
out of the cp/ front end and into the common c-family/ code.  Part 2
hacks up the C front end to emit the now-shared data structures.

I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c
gcc.dg/tree-ssa/ssa-dce-3.c
gcc.dg/tree-ssa/ssa-dom-thread-7.c

I looked at these briefly and it seems like the problem here is that
the output of the C front end after gimplification is different than
it used to be, since I swapped in the C++ genericization code (it
lowers to a LOOP_EXPR instead of directly to goto form, then the
LOOP_EXPR gets lowered during gimplification).  TBH, I don't really
understand what optimizations these test cases are trying to test, and
whether the pattern-matching is too fragile, whether the form of the code
isn't something that actually tests the optimization, or whether the
optimization is just not working on this input.  So I'm not sure what
to do about those failures!  :-(  I could change the genericizer to
emit the same goto form that the C front end formerly emitted instead
of the C++-style output, but that might just be papering over bugs
elsewhere.  Any advice on how to proceed here is welcome.  If the
patches are OK otherwise, maybe just file bugzilla issues for these
regressions?

-Sandra


Sandra Loosemore (2):
   Move loop and switch tree data structures from cp/ to c-family/.
   Change C front end to emit structured loop and switch tree nodes.

  gcc/c-family/c-common.c |  24 +++
  gcc/c-family/c-common.def   |  24 +++
  gcc/c-family/c-common.h |  53 -
  gcc/c-family/c-dump.c   |  38 
  gcc/c-family/c-gimplify.c   | 414 
  gcc/c-family/c-pretty-print.c   |  92 +++-
  gcc/c/c-decl.c  |  18 +-
  gcc/c/c-lang.h  |   3 +-
  gcc/c/c-objc-common.h   |   2 +
  gcc/c/c-parser.c| 125 ++-
  gcc/c/c-tree.h  |  21 +-
  gcc/c/c-typeck.c| 227 ++--
  gcc/cp/cp-gimplify.c| 347 ++
  gcc/cp/cp-objcp-common.c|  13 +-
  gcc/cp/cp-tree.def  |  23 --
  gcc/cp/cp-tree.h|  40 
  gcc/cp/cxx-pretty-print.c   |  78 ---
  gcc/cp/dump.c   |  31 ---
  gcc/doc/generic.texi|  56 +++--
  gcc/objc/objc-act.c |   6 +-
  gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
  21 files changed, 859 insertions(+), 788 deletions(-)





Re: [PATCH] Support multi-versioning on self-recursive function (ipa/92133)

2019-12-01 Thread Feng Xue OS
Done.   -Thanks

Feng



From: Jeff Law 
Sent: Monday, December 2, 2019 4:33 AM
To: Feng Xue OS; Martin Jambor; Jan Hubicka; Richard Biener
Cc: luoxhu; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Support multi-versioning on self-recursive function 
(ipa/92133)

On 11/26/19 6:44 PM, Feng Xue OS wrote:
> Hi, Richard,
>
>   This patch is a not bugfix, while it is small. Martin and Honza are fine 
> with it.
> But now we are in stage 3, is it ok to commit?
Yes.  It was posted well in advance of the stage1->stage3 change.
Please commit it ASAP so the testers can pick it up.

Thanks,
jeff



Re: [PATCH] Use xchg for -Os (PR target/92549)

2019-12-01 Thread Uros Bizjak
On Sun, Dec 1, 2019 at 6:55 PM Jakub Jelinek  wrote:
>
> On Sun, Dec 01, 2019 at  03:43:37PM +0100, Jan Hubicka wrote:
> > > PR target/92549
> > > * config/i386/i386.md (peephole2 for *swap): New peephole2.
> > >
> > > * gcc.target/i386/pr92549.c: New test.
> >
> > It is very hard to get a testcase, unforutnately, but I got the
> > following (locally non-reproducible) failure while building firefox with
> > LTO+FDO:
> >
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -   1080 | }
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -|
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (insn  1389 1390 
> > 41 (parallel [
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set 
> > (reg:SI 24 xmm4 [orig:187 SR.3778 ] [187])
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 23 xmm3 
> > [orig:104 SR.3780 ] [104]))
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  (set 
> > (reg:SI 23 xmm3 [orig:104 SR.3780 ] [104])
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO - (reg:SI 24 xmm4 
> > [orig:187 SR.3778 ] [187]))
> > [task 2019-12-01T14:38:04.166Z] 14:38:04 INFO -  ])
> > "/builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/WritingModes.h":1117:0
> > 78 {*swapsi}
> > [task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -   (nil))
> > [task 2019-12-01T14:38:04.167Z] 14:38:04 INFO -  during RTL pass: rnreg
> >
> > I guess the problem is that there is no xchange in SSE instruction set,
> > so peephle needs to be more restrictive?
>
> I'll have a look tomorrow.

general_reg_operand should be used in the pattern.

Uros.