Re: Question about information from -fdump-rtl-sched2 on M1 Max

2024-04-30 Thread Iain Sandoe
Hi,

> On 30 Apr 2024, at 00:39, Andrew Pinski via Gcc  wrote:
> 
> On Mon, Apr 29, 2024 at 4:26 PM Lucier, Bradley J via Gcc
>  wrote:
>> 
>> The question: How to interpret scheduling info with the compiler listed 
>> below.
>> 
>> Specifically, a tight loop that was reported to be scheduled in 23 cycles 
>> (as I understand it) actually executes in a little over 2 cycles per loop, 
>> as I interpret two separate experiments.
>> 
>> Am I misinterpreting something here?
> 
> Yes, the scheduling model in use here is the cortex-a53 one,
> as evidenced by "cortex_a53_slot_" in the dump.
> Most aarch64 cores don't have a scheduling model associated with them,
> especially cores whose support has not been upstreamed
> directly from the company that produces them.

indeed the branches in use are not yet upstreamed .. but ...

> The default scheduling model is cortex-a53 anyways. And you didn't use
> -mtune= nor -mcpu=; only -march=native which just changes the arch
> features and not the tuning or scheduler model.

… 
1) 14.1 and 13.3 will have support for -mcpu=apple-m1,m2,m3

2) Those branches will also have hopefully better choices for the tuning and 
scheduling within the available models (I got some advice from Tamar, thanks!).

3) Andrew is correct, we have not really had much information from the vendor 
about the scheduling - although the latest data now does include some.  
Unfortunately this is a topic I’ve not yet got into so it’s going to take me a 
while and probably lots of advice to do something specific for Mx cores.

4) I did not think -mcpu=native was working in 13.2 … but ICBW (anyway, it is 
present in 14.1 and backported to (darwin) 13.3, 12.4 and 11.5).  You should 
not have long to wait for 14.1 ...
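
A sketch of the flag behaviour in question (command lines modelled on
Brad's; the gcc-14 name and the apple-m1 value assume the 14.1/13.3
support from point 1):

# -march=native: arch features only; tuning/scheduling stays at the
# default model (cortex-a53)
gcc-13 -O1 -march=native -c _num.c

# -mcpu= selects the arch features *and* the tuning/scheduling model
gcc-14 -O1 -mcpu=apple-m1 -c _num.c

# or keep -march=native and override just the tuning
gcc-13 -O1 -march=native -mtune=cortex-a76 -c _num.c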

thanks
Iain

> 
> Thanks,
> Andrew Pinski
> 
>> 
>> Thanks.
>> 
>> Brad
>> 
>> The compiler:
>> 
>> [MacBook-Pro:~/programs/gambit/gambit-feeley] lucier% gcc-13 -v
>> Using built-in specs.
>> COLLECT_GCC=gcc-13
>> COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc/13.2.0/bin/../libexec/gcc/aarch64-apple-darwin23/13/lto-wrapper
>> Target: aarch64-apple-darwin23
>> Configured with: ../configure --prefix=/opt/homebrew/opt/gcc 
>> --libdir=/opt/homebrew/opt/gcc/lib/gcc/current --disable-nls 
>> --enable-checking=release --with-gcc-major-version-only 
>> --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13 
>> --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr 
>> --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl 
>> --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 13.2.0' 
>> --with-bugurl=https://github.com/Homebrew/homebrew-core/issues 
>> --with-system-zlib --build=aarch64-apple-darwin23 
>> --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk 
>> --with-ld=/Library/Developer/CommandLineTools/usr/bin/ld-classic
>> Thread model: posix
>> Supported LTO compression algorithms: zlib zstd
>> gcc version 13.2.0 (Homebrew GCC 13.2.0)
>> 
>> (so perhaps not the standard gcc).
>> 
>> The command line (cut down a bit) is
>> 
>> gcc-13 -save-temps -fverbose-asm -fdump-rtl-sched2 -O1 
>> -fexpensive-optimizations -fno-gcse -Wno-unused -Wno-write-strings 
>> -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math 
>> -fno-math-errno -fschedule-insns2 -foptimize-sibling-calls 
>> -fomit-frame-pointer -fipa-ra -fmove-loop-invariants -march=native -fPIC 
>> -fno-common   -I"../include" -c -o _num.o -I. _num.c -D___LIBRARY
>> 
>> The scheduling report for the loop is
>> 
>> ;;   ==
>> ;;   -- basic block 10 from 39 to 70 -- after reload
>> ;;   ==
>> 
>> ;;0--> b  0: i  39 x4=x2+x7
>> :cortex_a53_slot_any
>> ;;0--> b  0: i  46 x1=zxn([sxn(x2)*0x4+x8])
>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>> ;;3--> b  0: i  45 x9=zxn([sxn(x4)*0x4+x3])
>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>> ;;7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9   
>> :(cortex_a53_slot_any+cortex_a53_imul)
>> ;;9--> b  0: i  48 x1=x1+x5
>> :cortex_a53_slot_any
>> ;;9--> b  0: i  53 x5=x12+x2   
>> :cortex_a53_slot_any
>> ;;   10--> b  0: i  50 [sxn(x4)*0x4+x3]=x1 
>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
>> ;;   10--> b  0: i  57 x4=x2+0x1   
>> :cortex_a53_slot_any
>> ;;   11--> b  0: i  67 x2=x2+0x2   
>> :cortex_a53_slot_any
>> ;;   12--> b  0: i  60 x9=zxn([sxn(x5)*0x4+x3])
>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>> ;;   13--> b  0: i  61 x4=zxn([sxn(x4)*0x4+x8])
>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>> ;;   17--> b  0: i  62 x4=z

Re: Question about information from -fdump-rtl-sched2 on M1 Max

2024-04-30 Thread Iain Sandoe



> On 30 Apr 2024, at 08:33, Iain Sandoe  wrote:
> 
> Hi,
> 
>> On 30 Apr 2024, at 00:39, Andrew Pinski via Gcc  wrote:
>> 
>> On Mon, Apr 29, 2024 at 4:26 PM Lucier, Bradley J via Gcc
>>  wrote:
>>> 
>>> The question: How to interpret scheduling info with the compiler listed 
>>> below.
>>> 
>>> Specifically, a tight loop that was reported to be scheduled in 23 cycles 
>>> (as I understand it) actually executes in a little over 2 cycles per loop, 
>>> as I interpret two separate experiments.
>>> 
>>> Am I misinterpreting something here?
>> 
>> Yes, the scheduling model in use here is the cortex-a53 one,
>> as evidenced by "cortex_a53_slot_" in the dump.
>> Most aarch64 cores don't have a scheduling model associated with them,
>> especially cores whose support has not been upstreamed
>> directly from the company that produces them.
> 
> indeed the branches in use are not yet upstreamed .. but ...
> 
>> The default scheduling model is cortex-a53 anyways. And you didn't use
>> -mtune= nor -mcpu=; only -march=native which just changes the arch
>> features and not the tuning or scheduler model.
> 
> … 
> 1) 14.1 and 13.3 will have support for -mcpu=apple-m1,m2,m3

I should have been more clear — the 14.1 and 13.3 darwin development branches 
will have this support (not yet upstream).

> 
> 2) Those branches will also have hopefully better choices for the tuning and 
> scheduling within the available models (I got some advice from Tamar, 
> thanks!).
> 
> 3) Andrew is correct, we have not really had much information from the vendor 
> about the scheduling - although the latest data now does include some.  
> Unfortunately this is a topic I’ve not yet got into so it’s going to take me 
> a while and probably lots of advice to do something specific for Mx cores.
> 
> 4) I did not think -mcpu=native was working in 13.2 … but ICBW (anyway, it 
> is present in 14.1 and backported to (darwin) 13.3, 12.4 and 11.5).  You 
> should not have long to wait for 14.1 ...
> 
> thanks
> Iain
> 
>> 
>> Thanks,
>> Andrew Pinski
>> 
>>> 
>>> Thanks.
>>> 
>>> Brad
>>> 
>>> The compiler:
>>> 
>>> [MacBook-Pro:~/programs/gambit/gambit-feeley] lucier% gcc-13 -v
>>> Using built-in specs.
>>> COLLECT_GCC=gcc-13
>>> COLLECT_LTO_WRAPPER=/opt/homebrew/Cellar/gcc/13.2.0/bin/../libexec/gcc/aarch64-apple-darwin23/13/lto-wrapper
>>> Target: aarch64-apple-darwin23
>>> Configured with: ../configure --prefix=/opt/homebrew/opt/gcc 
>>> --libdir=/opt/homebrew/opt/gcc/lib/gcc/current --disable-nls 
>>> --enable-checking=release --with-gcc-major-version-only 
>>> --enable-languages=c,c++,objc,obj-c++,fortran --program-suffix=-13 
>>> --with-gmp=/opt/homebrew/opt/gmp --with-mpfr=/opt/homebrew/opt/mpfr 
>>> --with-mpc=/opt/homebrew/opt/libmpc --with-isl=/opt/homebrew/opt/isl 
>>> --with-zstd=/opt/homebrew/opt/zstd --with-pkgversion='Homebrew GCC 13.2.0' 
>>> --with-bugurl=https://github.com/Homebrew/homebrew-core/issues 
>>> --with-system-zlib --build=aarch64-apple-darwin23 
>>> --with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX14.sdk 
>>> --with-ld=/Library/Developer/CommandLineTools/usr/bin/ld-classic
>>> Thread model: posix
>>> Supported LTO compression algorithms: zlib zstd
>>> gcc version 13.2.0 (Homebrew GCC 13.2.0)
>>> 
>>> (so perhaps not the standard gcc).
>>> 
>>> The command line (cut down a bit) is
>>> 
>>> gcc-13 -save-temps -fverbose-asm -fdump-rtl-sched2 -O1 
>>> -fexpensive-optimizations -fno-gcse -Wno-unused -Wno-write-strings 
>>> -Wdisabled-optimization -fwrapv -fno-strict-aliasing -fno-trapping-math 
>>> -fno-math-errno -fschedule-insns2 -foptimize-sibling-calls 
>>> -fomit-frame-pointer -fipa-ra -fmove-loop-invariants -march=native -fPIC 
>>> -fno-common   -I"../include" -c -o _num.o -I. _num.c -D___LIBRARY
>>> 
>>> The scheduling report for the loop is
>>> 
>>> ;;   ==
>>> ;;   -- basic block 10 from 39 to 70 -- after reload
>>> ;;   ==
>>> 
>>> ;;0--> b  0: i  39 x4=x2+x7
>>> :cortex_a53_slot_any
>>> ;;0--> b  0: i  46 x1=zxn([sxn(x2)*0x4+x8])
>>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>>> ;;3--> b  0: i  45 x9=zxn([sxn(x4)*0x4+x3])
>>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_load
>>> ;;7--> b  0: i  47 x1=zxn(x6)*zxn(x1)+x9   
>>> :(cortex_a53_slot_any+cortex_a53_imul)
>>> ;;9--> b  0: i  48 x1=x1+x5
>>> :cortex_a53_slot_any
>>> ;;9--> b  0: i  53 x5=x12+x2   
>>> :cortex_a53_slot_any
>>> ;;   10--> b  0: i  50 [sxn(x4)*0x4+x3]=x1 
>>> :(cortex_a53_slot_any+cortex_a53_ls_agen),cortex_a53_store
>>> ;;   10--> b  0: i  57 x4=x2+0x1   
>>> :cortex_a53_slot_any
>>> ;;   11--> b  0: i  67 x2=x2+0x2   
>>> :c

Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Richard Biener via Gcc
On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc  wrote:
>
> Hi folks!
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
>
> I don't know whether this is expected behavior, and if it is, how to
> mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> but that didn't help.  OTOH, using __attribute__((always_inline))
> helps a bit, but not entirely.  What does help is inlining
> irange::get_bitmask() entirely, but that seems like a big hammer.

You can use -Winline to see why we don't inline an inline declared
function.  I would guess the unit-growth limit kicks in?

Did you check a release checking compiler?  That might still
inline things.
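
A minimal sketch of the sort of reproducer -Winline applies to (file and
function names hypothetical; whether the warning actually fires depends
on the inliner limits in effect):

// winline-demo.cc: compile with  g++ -O2 -Winline -c winline-demo.cc
// If the inliner declines an inline-declared function, -Winline reports
// the reason (e.g. a growth limit being hit) as a warning.
inline int
lookup (const int *t, int n, int key)
{
  for (int i = 0; i < n; ++i)  // the loop may push the body over the limits
    if (t[i] == key)
      return i;
  return -1;
}

int
use (const int *t, int n, int a, int b)
{
  return lookup (t, n, a) + lookup (t, n, b);
}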

> The overall slowdown in compilation is 0.26%, because VRP is a
> relatively fast pass, but a measurable pass slowdown is something we'd
> like to avoid.
>
> What's the recommended approach here?
>
> For the curious, I am attaching before and after copies of
> value-range.s.  I am also attaching the two patches needed to
> reproduce the problem on mainline.  The first patch is merely setup.
> It is the second patch that exhibits the problem.  Notice there are no
> uses of prange yet.
>
> Thanks.
> Aldy


Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Aldy Hernandez via Gcc
On Tue, Apr 30, 2024 at 9:58 AM Richard Biener
 wrote:
>
> On Fri, Apr 26, 2024 at 11:45 AM Aldy Hernandez via Gcc  
> wrote:
> >
> > Hi folks!
> >
> > In implementing prange (pointer ranges), I have found a 1.74% slowdown
> > in VRP, even without any code path actually using the code.  I have
> > tracked this down to irange::get_bitmask() being compiled differently
> > with and without the bare bones patch.  With the patch,
> > irange::get_bitmask() has a lot of code inlined into it, particularly
> > get_bitmask_from_range() and consequently the wide_int_storage code.
> >
> > I don't know whether this is expected behavior, and if it is, how to
> > mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> > but that didn't help.  OTOH, using __attribute__((always_inline))
> > helps a bit, but not entirely.  What does help is inlining
> > irange::get_bitmask() entirely, but that seems like a big hammer.
>
> You can use -Winline to see why we don't inline an inline declared
> function.  I would guess the unit-growth limit kicks in?

Ah, will do.  Thanks.

>
> Did you check a release checking compiler?  That might still
> inline things.

Yes, we only measure performance with release builds.

Aldy



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Martin Jambor
Hi,

On Fri, Apr 26 2024, Aldy Hernandez via Gcc wrote:
> Hi folks!
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
>
> I don't know whether this is expected behavior, and if it is, how to
> mitigate it.  I have tried declaring get_bitmask_from_range() inline,
> but that didn't help.  OTOH, using __attribute__((always_inline))
> helps a bit, but not entirely.  What does help is inlining
> irange::get_bitmask() entirely, but that seems like a big hammer.
>
> The overall slowdown in compilation is 0.26%, because VRP is a
> relatively fast pass, but a measurable pass slowdown is something we'd
> like to avoid.
>
> What's the recommended approach here?

I'm afraid that the right approach (not sure if that also means the
recommended approach) is to figure out why inlining
irange::get_bitmask() helps, i.e. what unnecessary computations or
memory accesses it avoids or which other subsequent optimizations it
enables, etc.  Then we can have a look whether IPA could facilitate this
without inlining (or if eventually code shrinks to a reasonable size,
how to teach the inliner to predict this).

Martin


>
> For the curious, I am attaching before and after copies of
> value-range.s.  I am also attaching the two patches needed to
> reproduce the problem on mainline.  The first patch is merely setup.
> It is the second patch that exhibits the problem.  Notice there are no
> uses of prange yet.
>
> Thanks.
> Aldy
> From ee63833c5f56064ef47c2bb9debd485f77d00171 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Tue, 19 Mar 2024 18:04:55 +0100
> Subject: [PATCH] Move get_bitmask_from_range out of irange class.
>
> ---
>  gcc/value-range.cc | 52 +++---
>  gcc/value-range.h  |  1 -
>  2 files changed, 26 insertions(+), 27 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 70375f7abf9..0f81ce32615 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -31,6 +31,30 @@ along with GCC; see the file COPYING3.  If not see
>  #include "fold-const.h"
>  #include "gimple-range.h"
>  
> +// Return the bitmask inherent in a range.
> +
> +static irange_bitmask
> +get_bitmask_from_range (tree type,
> +                        const wide_int &min, const wide_int &max)
> +{
> +  unsigned prec = TYPE_PRECISION (type);
> +
> +  // All the bits of a singleton are known.
> +  if (min == max)
> +    {
> +      wide_int mask = wi::zero (prec);
> +      wide_int value = min;
> +      return irange_bitmask (value, mask);
> +    }
> +
> +  wide_int xorv = min ^ max;
> +
> +  if (xorv != 0)
> +    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
> +
> +  return irange_bitmask (wi::zero (prec), min | xorv);
> +}
> +
>  void
>  irange::accept (const vrange_visitor &v) const
>  {
> @@ -1832,31 +1856,6 @@ irange::invert ()
>  verify_range ();
>  }
>  
> -// Return the bitmask inherent in the range.
> -
> -irange_bitmask
> -irange::get_bitmask_from_range () const
> -{
> -  unsigned prec = TYPE_PRECISION (type ());
> -  wide_int min = lower_bound ();
> -  wide_int max = upper_bound ();
> -
> -  // All the bits of a singleton are known.
> -  if (min == max)
> -    {
> -      wide_int mask = wi::zero (prec);
> -      wide_int value = lower_bound ();
> -      return irange_bitmask (value, mask);
> -    }
> -
> -  wide_int xorv = min ^ max;
> -
> -  if (xorv != 0)
> -    xorv = wi::mask (prec - wi::clz (xorv), false, prec);
> -
> -  return irange_bitmask (wi::zero (prec), min | xorv);
> -}
> -
>  // Remove trailing ranges that this bitmask indicates can't exist.
>  
>  void
> @@ -1978,7 +1977,8 @@ irange::get_bitmask () const
>   // in the mask.
>   //
>   // See also the note in irange_bitmask::intersect.
> -  irange_bitmask bm = get_bitmask_from_range ();
> +  irange_bitmask bm
> +    = get_bitmask_from_range (type (), lower_bound (), upper_bound ());
>   if (!m_bitmask.unknown_p ())
>     bm.intersect (m_bitmask);
>   return bm;
> diff --git a/gcc/value-range.h b/gcc/value-range.h
> index 9531df56988..dc5b153a83e 100644
> --- a/gcc/value-range.h
> +++ b/gcc/value-range.h
> @@ -351,7 +351,6 @@ private:
>   bool varying_compatible_p () const;
>   bool intersect_bitmask (const irange &r);
>   bool union_bitmask (const irange &r);
> -  irange_bitmask get_bitmask_from_range () const;
>   bool set_range_from_bitmask ();
>
>   bool intersect (const wide_int& lb, const wide_int& ub);
> -- 
> 2.44.0
>
> From 03c70de43177a97ec5e9c243aafc798c1f37e6d8 Mon Sep 17 00:00:00 2001
> From: Aldy Hernandez 
> Date: Wed, 20 Mar 2024 06:25:52 +0100
> Subject: [PATCH] Implement minimum prange class exhibiting VRP sl
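
To make the moved helper concrete, a worked instance of
get_bitmask_from_range (8-bit precision, values chosen arbitrarily):

  min  = 0x12 (00010010), max = 0x17 (00010111)
  xorv = min ^ max = 0x05
  clz (xorv) = 5, so xorv becomes wi::mask (8 - 5, false, 8) = 0x07
  result: value = 0x00, mask = min | xorv = 0x17

That is, bits 3 and 5-7 are known to be zero for every value in
[0x12, 0x17]; the low three bits vary, and the always-one bit 4 is
conservatively folded into the unknown mask.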

How stable is the CFG and basic block IDs?

2024-04-30 Thread Jørgen Kvalsvik

Hi,

I am working on adding path coverage support to gcc/gcov and need to 
develop a good testing strategy.


So far I can report reasonably well on the uncovered paths, like so:

paths covered 6 of 17
path not covered: 2 8 3 4
path not covered: 2 8 3 5 6
path not covered: 2 8 3 5 7 10
...

where the numbers are the basic blocks of the CFG, which can be seen in 
the source listing with gcov -a, e.g.:


#:6:while (low <= high)
%:6-block 2
%:6-block 8

Some natural test would be to by hand determine the paths taken and 
compare with the gcov output, like lines/branches/conditions are tested 
today. Unlike the other coverages in gcov, paths work more like function 
summaries and cannot be reasonably shown in the source listing, but the 
basic block listing actually works quite alright.


The problem is testing. If gcc would re-number the basic blocks then 
tests comparing hard-coded test paths would break, even though the path 
coverage itself would be just fine (and presumably the change to the 
basic block indices), which would add an unreasonable maintenance 
burden. If anything, it feels very fragile to write tests against the 
basic block indices.


On the other hand, if it can be expected that the same code should 
always yield the same CFG, the same BBs, and the same BB indices then I 
would happily test against them. I suppose this makes the basic blocks 
basically a public interface, which granted feels odd.


If you have any good idea on how to test paths in a robust way please 
let me know.


Thanks,
Jørgen


Re: How to format code in GCC style?

2024-04-30 Thread Filip Kastl
On Tue 2024-04-30 13:52:16, Hanke Zhang via Gcc wrote:
> Hi
> I'm trying to format my code in GCC style.
> 
> But I don't know how to do it. I tried `indent` and
> `check_GNU_style.sh` in the `contrib` directory, but neither of them seems
> to work well.
> 
> So I wondered: are there any official tools for that?
> 
> Thanks
> Hanke Zhang

Hi Hanke,

Not sure if this will help you but this is what I use: While writing code in
GCC style, I use the vim options listed here:

https://gcc.gnu.org/wiki/FormattingCodeForGCC

Then, before I submit a patch I run it through check_GNU_style.sh.  This works
fine for me.
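
For reference, check_GNU_style.sh takes a patch file as its argument; a
typical run from the top of a GCC checkout looks like this (patch name
hypothetical):

  $ contrib/check_GNU_style.sh 0001-my-change.patch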

Cheers,
Filip Kastl


GCC 14.1 Release Candidate available from gcc.gnu.org

2024-04-30 Thread Jakub Jelinek via Gcc
The first release candidate for GCC 14.1 is available from

 https://gcc.gnu.org/pub/gcc/snapshots/14.1.0-RC-20240430/
 ftp://gcc.gnu.org/pub/gcc/snapshots/14.1.0-RC-20240430/

and shortly its mirrors.  It has been generated from git commit
r14-10154-g7a00c459cbb913a.

I have so far bootstrapped and tested the release candidate on
x86_64-linux.
Please test it and report any issues to bugzilla.

If all goes well, we'd like to release 14.1 on Tuesday, May 7th.



Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Richard Biener via Gcc
On Tue, Apr 30, 2024 at 11:45 AM Jørgen Kvalsvik  wrote:
>
> Hi,
>
> I am working on adding path coverage support to gcc/gcov and need to
> develop a good testing strategy.
>
> So far I can report reasonably well on the uncovered paths, like so:
>
> paths covered 6 of 17
> path not covered: 2 8 3 4
> path not covered: 2 8 3 5 6
> path not covered: 2 8 3 5 7 10
> ...
>
> where the numbers are the basic blocks of the CFG, which can be seen in
> the source listing with gcov -a, e.g.:
>
>  #:6:while (low <= high)
>  %:6-block 2
>  %:6-block 8
>
> Some natural test would be to by hand determine the paths taken and
> compare with the gcov output, like lines/branches/conditions are tested
> today. Unlike the other coverages in gcov, paths work more like function
> summaries and cannot be reasonably shown in the source listing, but the
> basic block listing actually works quite alright.
>
> The problem is testing. If gcc would re-number the basic blocks then
> tests comparing hard-coded test paths would break, even though the path
> coverage itself would be just fine (and presumably the change to the
> basic block indices), which would add an unreasonable maintenance
> burden. If anything, it feels very fragile to write tests against the
> basic block indices.

What is usually problematic is when early canonicalization changes the
direction of a branch, which affects the block IDs of the true/false
destinations (and their downstream blocks).

> On the other hand, if it can be expected that the same code should
> always yield the same CFG, the same BBs, and the same BB indices then I
> would happily test against them. I suppose this makes the basic blocks
> basically a public interface, which granted feels odd.
>
> If you have any good idea on how to test paths in a robust way please
> let me know.

Is there enough info to print a path like the following?

path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...

instead of printing (condition destination?) block IDs?

Richard.

>
> Thanks,
> Jørgen


Re: Sourceware mitigating and preventing the next xz-backdoor

2024-04-30 Thread Alejandro Colomar via Gcc
Hi Mark,

On Sun, Apr 21, 2024 at 10:40:14PM +0200, Alejandro Colomar wrote:

[...]

> Let's generate a v2 patch set, showing the range-diff against v1.  We
> need to check the commit IDs of the first set, which can be found in the
> mailing list archives, thanks to the trick we used.  The v1 range was
> 7ec952012^..892a12470.  So we just pass that range:
> 
>   $ git format-patch -o ./patches/ master..HEAD \
>   --range-diff=7ec952012^..892a12470 -v2 --cover-letter;
>   ./patches/v2--cover-letter.patch
>   
> ./patches/v2-0001-share-mk-build-fonts-unifont-Build-UnifontR-from-.patch
>   ./patches/v2-0002-share-mk-build-pdf-book-Use-Unifont.patch
>   
> ./patches/v2-0003-share-mk-build-fonts-unifont-Specify-space-width-.patch
> 
> The v2 cover letter shows the changes introduced since v1:
> 
>   $ tail -n20 ./patches/v2--cover-letter.patch 
>create mode 100644 share/mk/build/fonts/unifont/dit.mk
>create mode 100644 share/mk/build/fonts/unifont/pfa.mk
>create mode 100644 
> share/mk/configure/build-depends/fonts-unifont/unifont.otf.mk
> 
>   Range-diff against v1:
>   1:  7ec952012 = 1:  7ec952012 share/mk/: build-fonts-unifont: Build 
> UnifontR from unifont.otf
>   2:  d80376b08 = 2:  d80376b08 share/mk/: build-pdf-book: Use Unifont
>   3:  892a12470 ! 3:  bc7fa7d92 share/mk/: build-fonts-unifont: Specify 
> spacewidth in afmtodit(1)
>   @@ Metadata
>Author: Alejandro Colomar 
>
> ## Commit message ##
>   -share/mk/: build-fonts-unifont: Specify spacewidth in 
> afmtodit(1)
>   +share/mk/: build-fonts-unifont: Specify space width in 
> afmtodit(1)
>
>Link: 
> 
>Suggested-by: "G. Branden Robinson" 
>   -- 
>   2.43.0

I've added a recommendation in the Linux man-pages contributing
documentation that patches be sent with a range diff, and also that
patches be sent in PGP-signed mail (if the user has a PGP key).  It has
specific instructions like the above (but simplified).



Feel free to copy any of that documentation.

I also recommended specific mutt(1) settings:

set crypt_autosign = yes
set crypt_protected_headers_write = yes

And git-send-email(1) configuration for using with neomutt(1):

   [sendemail]
   sendmailcmd = neomutt -C -H - && true

For all the documentation for mail and patches, see these two files:



Have a lovely day!
Alex



Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jørgen Kvalsvik

On 30/04/2024 12:45, Richard Biener wrote:

On Tue, Apr 30, 2024 at 11:45 AM Jørgen Kvalsvik  wrote:


Hi,

I am working on adding path coverage support to gcc/gcov and need to
develop a good testing strategy.

So far I can report reasonably well on the uncovered paths, like so:

paths covered 6 of 17
path not covered: 2 8 3 4
path not covered: 2 8 3 5 6
path not covered: 2 8 3 5 7 10
...

where the numbers are the basic blocks of the CFG, which can be seen in
the source listing with gcov -a, e.g.:

  #:6:while (low <= high)
  %:6-block 2
  %:6-block 8

Some natural test would be to by hand determine the paths taken and
compare with the gcov output, like lines/branches/conditions are tested
today. Unlike the other coverages in gcov, paths work more like function
summaries and cannot be reasonably shown in the source listing, but the
basic block listing actually works quite alright.

The problem is testing. If gcc would re-number the basic blocks then
tests comparing hard-coded test paths would break, even though the path
coverage itself would be just fine (and presumably the change to the
basic block indices), which would add an unreasonable maintenance
burden. If anything, it feels very fragile to write tests against the
basic block indices.


What is usually problematic is when early canonicalization changes the
direction of a branch, which affects the block IDs of the true/false
destinations (and their downstream blocks).


I figured. Ok, that means I must figure out another strategy.




On the other hand, if it can be expected that the same code should
always yield the same CFG, the same BBs, and the same BB indices then I
would happily test against them. I suppose this makes the basic blocks
basically a public interface, which granted feels odd.

If you have any good idea on how to test paths in a robust way please
let me know.


Is there enough info to print a path like the following?

path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...

instead of printing (condition destination?) block IDs?


Yes! This is all pulled from the CFG and BB -> line mapping. I plan to 
implement this under different verbosity flags. We can even do one 
better - I have implemented this in my POC:


path not covered: 6 8 3 5 6
BB 6: 14:   high = mid - 1;
BB 8:  6:while (low <= high)
BB 3:  8:int mid = (low + high) >> 1;
BB 3:  9:long midVal = a[mid];
BB 3: 11:   if (midVal < key)
BB 5: 13:   else if (midVal > key)
BB 6: 14:   high = mid - 1;

That is, gcov will print the statements in the order they need to be 
executed in order to achieve coverage.


Thanks,
Jørgen



Richard.



Thanks,
Jørgen




Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jan Hubicka via Gcc
> > The problem is testing. If gcc would re-number the basic blocks then
> > tests comparing hard-coded test paths would break, even though the path
> > coverage itself would be just fine (and presumably the change to the
> > basic block indices), which would add an unreasonable maintenance
> > burden. If anything, it feels very fragile to write tests against the
> > basic block indices.
> 
> What is usually problematic is when early canonicalization changes the
> direction of a branch, which affects the block IDs of the true/false
> destinations (and their downstream blocks).

I think we can renumber BBs sequentially before profile generation (or
maybe we do already).  But indeed the CFG may change with time.
> 
> > On the other hand, if it can be expected that the same code should
> > always yield the same CFG, the same BBs, and the same BB indices then I
> > would happily test against them. I suppose this makes the basic blocks
> > basically a public interface, which granted feels odd.
> >
> > If you have any good idea on how to test paths in a robust way please
> > let me know.
> 
> Is there enough info to print a path like the following?
> 
> path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...
> 
> instead of printing (condition destination?) block IDs?

This was my first idea too.  There can be multiple BBs per line, but you
can print both the BB ID and the source file name, and ignore the BB ID for
testsuite purposes.

Honza
> 
> Richard.
> 
> >
> > Thanks,
> > Jørgen


Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jørgen Kvalsvik

On 30/04/2024 13:43, Jan Hubicka wrote:

The problem is testing. If gcc would re-number the basic blocks then
tests comparing hard-coded test paths would break, even though the path
coverage itself would be just fine (and presumably the change to the
basic block indices), which would add an unreasonable maintenance
burden. If anything, it feels very fragile to write tests against the
basic block indices.


What is usually problematic is when early canonicalization changes the
direction of a branch, which affects the block IDs of the true/false
destinations (and their downstream blocks).


I think we can renumber BBs sequentially before profile generation (or
maybe we do already).  But indeed the CFG may change with time.



On the other hand, if it can be expected that the same code should
always yield the same CFG, the same BBs, and the same BB indices then I
would happily test against them. I suppose this makes the basic blocks
basically a public interface, which granted feels odd.

If you have any good idea on how to test paths in a robust way please
let me know.


Is there enough info to print a path like the following?

path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...

instead of printing (condition destination?) block IDs?


This was my first idea too.  There can be multiple BBs per line, but you
can print both the BB ID and the source file name, and ignore the BB ID for
testsuite purposes.



That's a brilliant idea, simply testing against the source line mapping. 
I'll try it right away, fantastic!
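
For example, the earlier "path not covered: 2 8 3 4" could be printed
with the file:line of each block, so a test can match the stable
file:line part and prune the BB IDs (format and line numbers
illustrative only):

  path not covered: 2(t.c:6) 8(t.c:6) 3(t.c:8) 4(t.c:12)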


Thanks,
Jørgen


Honza


Richard.



Thanks,
Jørgen




Re: How stable is the CFG and basic block IDs?

2024-04-30 Thread Jørgen Kvalsvik

On 30/04/2024 13:43, Jan Hubicka wrote:

The problem is testing. If gcc would re-number the basic blocks then
tests comparing hard-coded test paths would break, even though the path
coverage itself would be just fine (and presumably the change to the
basic block indices), which would add an unreasonable maintenance
burden. If anything, it feels very fragile to write tests against the
basic block indices.


What is usually problematic is when early canonicalization changes the
direction of a branch, which affects the block IDs of the true/false
destinations (and their downstream blocks).


I think we can renumber BBs sequentially before profile generation (or
maybe we do already).  But indeed the CFG may change with time.



On the other hand, if it can be expected that the same code should
always yield the same CFG, the same BBs, and the same BB indices then I
would happily test against them. I suppose this makes the basic blocks
basically a public interface, which granted feels odd.

If you have any good idea on how to test paths in a robust way please
let me know.


Is there enough info to print a path like the following?

path not covered: t.c:2(true) t.c:4(false) t.c:11(true) ...

instead of printing (condition destination?) block IDs?


This was my first idea too.  There can be multiple BBs per line, but you
can print both the BB ID and the source file name, and ignore the BB ID for
testsuite purposes.



Actually, we don't have the true/false information, but maybe the CFG 
recording can be extended to include it, along with unconditional, 
is_fake, is_throw, etc.; it looks like there's room in the bitset. 
Finding the right edge is easy because we have both the source and 
destination.


I'll check whether storing the true/false flag is simple and submit a patch.


Honza


Richard.



Thanks,
Jørgen




GCC 13.2.1 Status Report (2024-04-30)

2024-04-30 Thread Jakub Jelinek via Gcc
Status
======

The gcc-13 branch is open for regression and documentation fixes.

It's time to do the annual release from the branch, GCC 13.3, and
the plan is to do a release candidate in two weeks, around May 14th,
following with the actual release around May 21st.

Please look through bugzilla and see which of your regression fixes
for GCC 14/15 are also applicable for the GCC 13 branch and do the
necessary backporting.  Note that releases from the older branches
will follow as well.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                2   +   2
P2              512   + 115
P3               96   -  30
P4              216   +   2
P5               24
--------        ---   -----------------------
Total P1-P3     725   +  87
Total           965   +  89


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2023-July/242150.html



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jason Merrill via Gcc
On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  wrote:
>
> In implementing prange (pointer ranges), I have found a 1.74% slowdown
> in VRP, even without any code path actually using the code.  I have
> tracked this down to irange::get_bitmask() being compiled differently
> with and without the bare bones patch.  With the patch,
> irange::get_bitmask() has a lot of code inlined into it, particularly
> get_bitmask_from_range() and consequently the wide_int_storage code.
...
> +static irange_bitmask
> +get_bitmask_from_range (tree type,
> + const wide_int &min, const wide_int &max)
...
> -irange_bitmask
> -irange::get_bitmask_from_range () const

My guess is that this is the relevant change: the old function has
external linkage, and is therefore interposable, which inhibits
inlining.  The new function has internal linkage, which allows
inlining.

Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

Jason



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Richard Biener via Gcc



> Am 30.04.2024 um 21:11 schrieb Jason Merrill via Gcc :
> 
> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  
> wrote:
>> 
>> In implementing prange (pointer ranges), I have found a 1.74% slowdown
>> in VRP, even without any code path actually using the code.  I have
>> tracked this down to irange::get_bitmask() being compiled differently
>> with and without the bare bones patch.  With the patch,
>> irange::get_bitmask() has a lot of code inlined into it, particularly
>> get_bitmask_from_range() and consequently the wide_int_storage code.
> ...
>> +static irange_bitmask
>> +get_bitmask_from_range (tree type,
>> + const wide_int &min, const wide_int &max)
> ...
>> -irange_bitmask
>> -irange::get_bitmask_from_range () const
> 
> My guess is that this is the relevant change: the old function has
> external linkage, and is therefore interposable, which inhibits
> inlining.  The new function has internal linkage, which allows
> inlining.
> 
> Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

I guess that’s a good idea, though it’s already implied when doing LTO 
bootstrap and building cc1 and friends?  (But not for libgccjit?)

Richard 

> 
> Jason
> 


Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jakub Jelinek via Gcc
On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote:
> On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  
> wrote:
> >
> > In implementing prange (pointer ranges), I have found a 1.74% slowdown
> > in VRP, even without any code path actually using the code.  I have
> > tracked this down to irange::get_bitmask() being compiled differently
> > with and without the bare bones patch.  With the patch,
> > irange::get_bitmask() has a lot of code inlined into it, particularly
> > get_bitmask_from_range() and consequently the wide_int_storage code.
> ...
> > +static irange_bitmask
> > +get_bitmask_from_range (tree type,
> > + const wide_int &min, const wide_int &max)
> ...
> > -irange_bitmask
> > -irange::get_bitmask_from_range () const
> 
> My guess is that this is the relevant change: the old function has
> external linkage, and is therefore interposable, which inhibits
> inlining.  The new function has internal linkage, which allows
> inlining.

Even when a function is exported, if it is not compiled with -fpic/-fPIC
and we know the function is defined in the current TU, it can't be
interposed.  Try:
int
foo (int x)
{
  return x + 1;
}

int
bar (int x, int y)
{
  return foo (x) + foo (y);
}
with -O2 -fpic -fno-semantic-interposition vs. -O2 -fpic vs. -O2 -fpie vs.
-O2.
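
(With plain -O2 -fpic the calls to foo cannot be inlined into bar, since
another definition of foo could legally be interposed at dynamic-link
time; the other three variants allow the inlining.)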

> Relatedly, I wonder if we want to build GCC with -fno-semantic-interposition?

It could be useful just for libgccjit.  And I'm not sure that libgccjit
users don't want to interpose something.

Jakub



Re: GCC 14.1 Release Candidate available from gcc.gnu.org

2024-04-30 Thread William Seurer via Gcc
I bootstrapped and tested on powerpc64 8 and 9 BE and 8, 9, and 10 LE and 
it looks good.


On 4/30/24 5:33 AM, Jakub Jelinek via Gcc wrote:

The first release candidate for GCC 14.1 is available from

  https://gcc.gnu.org/pub/gcc/snapshots/14.1.0-RC-20240430/
  ftp://gcc.gnu.org/pub/gcc/snapshots/14.1.0-RC-20240430/

and shortly its mirrors.  It has been generated from git commit
r14-10154-g7a00c459cbb913a.

I have so far bootstrapped and tested the release candidate on
x86_64-linux.
Please test it and report any issues to bugzilla.

If all goes well, we'd like to release 14.1 on Tuesday, May 7th.



Re: 1.76% performance loss in VRP due to inlining

2024-04-30 Thread Jason Merrill via Gcc

On 4/30/24 12:22, Jakub Jelinek wrote:

On Tue, Apr 30, 2024 at 03:09:51PM -0400, Jason Merrill via Gcc wrote:

On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc  wrote:


In implementing prange (pointer ranges), I have found a 1.74% slowdown
in VRP, even without any code path actually using the code.  I have
tracked this down to irange::get_bitmask() being compiled differently
with and without the bare bones patch.  With the patch,
irange::get_bitmask() has a lot of code inlined into it, particularly
get_bitmask_from_range() and consequently the wide_int_storage code.

...

+static irange_bitmask
+get_bitmask_from_range (tree type,
+ const wide_int &min, const wide_int &max)

...

-irange_bitmask
-irange::get_bitmask_from_range () const


My guess is that this is the relevant change: the old function has
external linkage, and is therefore interposable, which inhibits
inlining.  The new function has internal linkage, which allows
inlining.


Even when a function is exported, when not compiled with -fpic/-fPIC
if we know the function is defined in current TU, it can't be interposed,


Ah, I was misremembering the effect of the change.  Rather, it's that if 
we see that a function with internal linkage has only a single caller, 
we try harder to inline it.


Jason




Building libgccjit with -fno-semantic-interposition? ( was Re: 1.76% performance loss in VRP due to inlining)

2024-04-30 Thread David Malcolm via Gcc
On Tue, 2024-04-30 at 21:15 +0200, Richard Biener via Gcc wrote:
> 
> 
> > Am 30.04.2024 um 21:11 schrieb Jason Merrill via Gcc
> > :
> > 
> > On Fri, Apr 26, 2024 at 5:44 AM Aldy Hernandez via Gcc
> >  wrote:
> > > 
> > > In implementing prange (pointer ranges), I have found a 1.74%
> > > slowdown
> > > in VRP, even without any code path actually using the code.  I
> > > have
> > > tracked this down to irange::get_bitmask() being compiled
> > > differently
> > > with and without the bare bones patch.  With the patch,
> > > irange::get_bitmask() has a lot of code inlined into it,
> > > particularly
> > > get_bitmask_from_range() and consequently the wide_int_storage
> > > code.
> > ...
> > > +static irange_bitmask
> > > +get_bitmask_from_range (tree type,
> > > + const wide_int &min, const wide_int &max)
> > ...
> > > -irange_bitmask
> > > -irange::get_bitmask_from_range () const
> > 
> > My guess is that this is the relevant change: the old function has
> > external linkage, and is therefore interposable, which inhibits
> > inlining.  The new function has internal linkage, which allows
> > inlining.
> > 
> > Relatedly, I wonder if we want to build GCC with -fno-semantic-
> > interposition?
> 
> I guess that’s a good idea, though it’s already implied when doing
> LTO bootstrap and building cc1 and friends?  (But not for libgccjit?)

[CCing jit mailing list]

FWIW I've no idea if any libgccjit users are using semantic
interposition; I suspect the answer is "no one is using it".

Antoyo, Andrea [also CCed]: are either of you using semantic
interposition of symbols within libgccjit?

If not, we *might* get a slightly faster libgccjit by building it with
-fno-semantic-interposition.  Or maybe not...


Dave
 
> 
> Richard 
> 
> > 
> > Jason
> > 
> 



Re: How to format code in GCC style?

2024-04-30 Thread Hanke Zhang via Gcc
Hi Filip,

Thanks for your reply. But I'm not very familiar with Vim, actually, so
when I tried the options listed there, nothing happened. (VS Code is the
IDE I'm using for development.)

And I noticed that there is a `vimrc` in the `contrib` directory; I
tried it too, but sadly it didn't work for me.

Maybe I configured Vim the wrong way.

Thanks
Hanke Zhang

Filip Kastl  于2024年4月30日周二 18:00写道:
>
> On Tue 2024-04-30 13:52:16, Hanke Zhang via Gcc wrote:
> > Hi
> > I'm trying to format my code in GCC style.
> >
> > But I don't know how to do it. I tried `indent` and
> > `check_GNU_style.sh` in the `contrib` directory, but neither of them seems
> > to work well.
> >
> > So I wondered: are there any official tools for that?
> >
> > Thanks
> > Hanke Zhang
>
> Hi Hanke,
>
> Not sure if this will help you but this is what I use: While writing code in
> GCC style, I use the vim options listed here:
>
> https://gcc.gnu.org/wiki/FormattingCodeForGCC
>
> Then, before I submit a patch I run it through check_GNU_style.sh.  This works
> fine for me.
>
> Cheers,
> Filip Kastl