git gcc-verify question

2025-02-13 Thread Jerry D via Gcc

Does anyone know what this is about?

$ git gcc-verify
Checking 918fcaf0cbf833063c45805ef893cfa2c9ebc875: OK
Exception ignored in: 
Traceback (most recent call last):
  File "/usr/lib/python3.13/site-packages/git/cmd.py", line 563, in __del__
  File "/usr/lib/python3.13/site-packages/git/cmd.py", line 544, in _terminate
  File "/usr/lib64/python3.13/subprocess.py", line 2227, in terminate
ImportError: sys.meta_path is None, Python is likely shutting down

I am on Fedora 41 just updated.

Jerry


Current trunk fails to build gmp

2025-02-13 Thread Rainer Emrich

Since some time in November last year, trunk has failed to build gmp.
The last successful build was on 30 October last year.

The issue seems to be a failing configure test. From config.log:

Test compile: long long reliability test 1
configure:6585: /opt/devel/gnu/gcc/Linux/x86_64-pc-linux-gnu/Ubuntu_22.04/gcc-15.0.0-standard/bin/gcc -O2 -pedantic -fomit-frame-pointer -m64  conftest.c >&5

conftest.c: In function 'f':
conftest.c:12:48: error: too many arguments to function 'g'; expected 0, have 6
   12 | for(i=0;i<1;i++){if(e(got,got,9,d[i].n)==0)h();g(i,d[i].src,d[i].n,got,d[i].want,9);if(d[i].n)h();}}
      |                                                ^ ~
conftest.c:7:6: note: declared here
    7 | void g(){}
      |      ^
configure:6588: $? = 1
failed program was:
/* The following provokes a segfault in the compiler on powerpc-apple-darwin.
   Extracted from tests/mpn/t-iord_u.c.  Causes Apple's gcc 3.3 build 1640 and
   1666 to segfault with e.g., -O2 -mpowerpc64.  */

#if defined (__GNUC__) && ! defined (__cplusplus)
typedef unsigned long long t1;typedef t1*t2;
void g(){}
void h(){}
static __inline__ t1 e(t2 rp,t2 up,int n,t1 v0)
{t1 c,x,r;int i;if(v0){c=1;for(i=1;i[...]
void f(){static const struct{t1 n;t1 src[9];t1 want[9];}d[]={{1,{0},{1}},};t1 got[9];int i;
for(i=0;i<1;i++){if(e(got,got,9,d[i].n)==0)h();g(i,d[i].src,d[i].n,got,d[i].want,9);if(d[i].n)h();}}
#else
int dummy;
#endif

int main () { return 0; }
configure:7072: result: no, long long reliability test 1

Any comments?

I can open a PR if necessary.

Rainer



OpenPGP_0x917D882CE22A6AD2_and_old_rev.asc
Description: OpenPGP public key


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: 22% degradation seen in embench:matmult-int

2025-02-13 Thread Visda.Vokhshoori--- via Gcc

“the interchanged loop might for example no longer vectorize.”

The loops are not vectorized, which is OK, because this device doesn’t have
the support for it.
I just don’t think a single pass could make the code that much slower on
its own.

Loop interchange is supposed to swap the inner loop index with the outer
one to improve cache locality.  That should help because the data needed by
the next iteration will already be in the cache.

The benchmark source is linked below; the loop that gets interchanged is at
line 143.

Source: 
https://github.com/embench/embench-iot/blob/master/src/matmult-int/matmult-int.c#L143

This loop is where most of the time is spent. But it would have been good if I 
had access to h/w tracing to see if the interchanged loop reduces cache misses 
as well as to see what is causing it to run this much slower.

Thanks for your reply!

From: Richard Biener 
Date: Thursday, February 13, 2025 at 2:57 AM
To: Visda Vokhshoori - C51841 
Cc: gcc@gcc.gnu.org 
Subject: Re: 22% degradation seen in embench:matmult-int

On Wed, Feb 12, 2025 at 4:38 PM Visda.Vokhshoori--- via Gcc wrote:
>
> Embench is used for benchmarking on embedded devices.
> This one project matmult-int has a function Multiply.  It’s a matrix 
> multiplication for 20 x 20 matrix.
> The device is an ATSAME70Q21B, which is a Cortex-M7.
> The compiler is an Arm branch based on GCC version 13.
> We are compiling with -O3, which has the loop-interchange pass on by default.
>
> When we compile with -fno-loop-interchange we get all 22% back plus a 5%
> speed-up.
>
> When we do the loop interchange on the one loop nest that gets interchanged
> it is slightly (0.7%) faster.
>
> Has anyone else seen large degradation as a result of loop interchange?

I would suggest comparing the -fopt-info diagnostic output with and
without -fno-loop-interchange;
the interchanged loop might for example no longer vectorize.  Other
than that - no, loop interchange isn't applied very often and it has
a very conservative cost model.

Are you able to share a testcase?

Richard.

>
> Thanks


gcc-12-20250213 is now available

2025-02-13 Thread GCC Administrator via Gcc
Snapshot gcc-12-20250213 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20250213/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision c72f9c0a3ad8eefd0706957ba054c2c2f388d3d5

You'll find:

 gcc-12-20250213.tar.xz   Complete GCC

  SHA256=40a960056dada322b74c706ef762b2cecfdf168120b29862a5271190c21e8354
  SHA1=535427df282b985fe63846c911e798259655dd35

Diffs from 12-20250206 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Sourceware Open Valentine Office Friday, Feb 14, 16:00 UTC

2025-02-13 Thread Mark Wielaard
Friday Feb 14, 16:00 UTC
At #overseers on irc.libera.chat

To get the right time in your local timezone:
$ date -d "Fri Feb 14 16:00 UTC 2025"

Valentine's Day. Let's show our shared infrastructure some love!

- Got issues with the new process/service isolation
  and/or the DDoS protections? Please let us know!
- What is the status of the Forge experiment?
- Need help setting up secure development policies? Just ask!
- Patchwork workflow? Updating CI jobs, autoregen scripts?
  Documentation snapshots? Let's hack together!

Sourceware relies on cooperation among a broad diversity of core
toolchain and developer tool projects, hackers, organizations, ideas,
and communication styles. The monthly Sourceware Open Office meetings
are one way of coming together as a community to discuss our shared
development infrastructure. For other ways to participate see
https://sourceware.org/mission.html#organization






Re: Current trunk fails to build gmp

2025-02-13 Thread Sam James via Gcc
Rainer Emrich  writes:

> Since some time in November last year, trunk has failed to build gmp.
> The last successful build was on 30 October last year.
>
> The issue seems to be a failing configure test. From config.log:

Please see
https://gmplib.org/list-archives/gmp-bugs/2024-November/005550.html.

Building GMP with -std=gnu17 is a workaround.
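
For readers hitting the same configure failure: GCC 15 defaults to
-std=gnu23, and in C23 an empty parameter list such as `void g(){}`
declares a function that takes no arguments, which is why the
six-argument call in the test is rejected.  Under -std=gnu17 the same
definition leaves the parameters unspecified.  A minimal sketch of a
variant that is valid in both modes, assuming explicit parameter lists
are acceptable; the counter `g_calls` and the driver `run_check` are
illustrative names and not part of GMP's configure test:

```c
typedef unsigned long long t1;

/* Illustrative counter so a caller can observe that g() ran. */
static int g_calls;

/* Explicit prototype: accepted with -std=gnu17 and -std=gnu23 alike.
   With `void g(){}` the call in run_check() is an error in C23, where
   an empty parameter list means "no parameters". */
void g(int i, const t1 *src, t1 n, t1 *got, const t1 *want, int size)
{
  (void) i; (void) src; (void) n; (void) got; (void) want; (void) size;
  g_calls++;
}

/* Hypothetical driver mirroring the shape of the call in the failing
   test; returns how many times g() was called. */
int run_check(void)
{
  static const struct { t1 n; t1 src[9]; t1 want[9]; } d[] = {
    { 1, { 0 }, { 1 } },
  };
  t1 got[9] = { 0 };
  int i;

  g_calls = 0;
  for (i = 0; i < 1; i++)
    g(i, d[i].src, d[i].n, got, d[i].want, 9);
  return g_calls;
}
```

This only illustrates the language change; the linked gmp-bugs thread is
the place to look for how GMP itself handles it.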


Re: 22% degradation seen in embench:matmult-int

2025-02-13 Thread Richard Biener via Gcc
On Thu, Feb 13, 2025 at 9:30 PM  wrote:
>
> “the interchanged loop might for example no longer vectorize.”
>
> The loops are not vectorized, which is OK, because this device doesn’t have
> the support for it.
>
> I just don’t think a single pass could make the code that much slower on
> its own.
>
> Loop interchange is supposed to swap the inner loop index with the outer
> one to improve cache locality.  That should help because the data needed by
> the next iteration will already be in the cache.
>
> The benchmark source is linked below; the loop that gets interchanged is at
> line 143.
>
> Source: 
> https://github.com/embench/embench-iot/blob/master/src/matmult-int/matmult-int.c#L143

Looks like the classical matmul loop, similar to the one in SPEC CPU
bwaves.  We do apply interchange here and that looks reasonable to me.
Note interchange assumes a CPU uarch with caches and HW prefetching
where linear accesses are a lot more efficient than strided ones - that
might not hold at all for the Cortex-M7.  Without interchange the store
to Res[] can be moved out of the inner loop.
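
To illustrate the last point, here is a minimal sketch, assuming the same
20 x 20 `long` matrices, of the non-interchanged nest with the
accumulation kept in a local so that Res[] is stored once per element.
The function name `multiply_hoisted` and the local `sum` are
illustrative; this is the transformation written out by hand, not
compiler output.

```c
#define UPPERLIMIT 20
typedef long matrix[UPPERLIMIT][UPPERLIMIT];

/* Non-interchanged order (Outer, Inner, Index).  The running sum lives
   in a local, so the store to Res happens once per element, after the
   innermost loop, instead of on every Index iteration. */
void multiply_hoisted(matrix A, matrix B, matrix Res)
{
  int Outer, Inner, Index;

  for (Outer = 0; Outer < UPPERLIMIT; Outer++)
    for (Inner = 0; Inner < UPPERLIMIT; Inner++)
      {
        long sum = 0;

        for (Index = 0; Index < UPPERLIMIT; Index++)
          sum += A[Outer][Index] * B[Index][Inner];
        Res[Outer][Inner] = sum;
      }
}
```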

I've tried

#define UPPERLIMIT 20
typedef long matrix[UPPERLIMIT][UPPERLIMIT];
void
Multiply (matrix A, matrix B, long * __restrict Res)
{
  register int Outer, Inner, Index;

  for (Outer = 0; Outer < UPPERLIMIT; Outer++)
    for (Inner = 0; Inner < UPPERLIMIT; Inner++)
      {
        (*(matrix *)Res)[Outer][Inner] = 0;
        for (Index = 0; Index < UPPERLIMIT; Index++)
          (*(matrix *)Res)[Outer][Inner] += A[Outer][Index] * B[Index][Inner];
      }
}

and this is interchanged on x86_64 as well.  We are implementing a trick
for the zeroing which, when moved into innermost position, is done as
  for (Index = 0; Index < UPPERLIMIT; Index++)
    for (Inner = 0; Inner < UPPERLIMIT; Inner++)
      {
        tem = Index == 0 ? 0 : (*(matrix *)Res)[Outer][Inner];
        tem += A[Outer][Index] * B[Index][Inner];
        (*(matrix *)Res)[Outer][Inner] = tem;
      }

This conditional might kill performance for you.  The advantage is that this
loop can now be more efficiently vectorized.
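
A minimal sketch for checking that the two loop orders compute the same
result, assuming plain `matrix` arguments instead of the `__restrict`
pointer used above.  The names `multiply_original`,
`multiply_interchanged` and `orders_agree` are illustrative, and the
enclosing Outer loop in the interchanged version is filled in here as an
assumption, since the snippet above shows only the two inner loops.

```c
#define UPPERLIMIT 20
typedef long matrix[UPPERLIMIT][UPPERLIMIT];

/* Original order: zero the element, then accumulate over Index. */
void multiply_original(matrix A, matrix B, matrix Res)
{
  int Outer, Inner, Index;

  for (Outer = 0; Outer < UPPERLIMIT; Outer++)
    for (Inner = 0; Inner < UPPERLIMIT; Inner++)
      {
        Res[Outer][Inner] = 0;
        for (Index = 0; Index < UPPERLIMIT; Index++)
          Res[Outer][Inner] += A[Outer][Index] * B[Index][Inner];
      }
}

/* Interchanged order: Index now encloses Inner, so the zeroing turns
   into a conditional that fires on the first Index iteration. */
void multiply_interchanged(matrix A, matrix B, matrix Res)
{
  int Outer, Inner, Index;

  for (Outer = 0; Outer < UPPERLIMIT; Outer++)
    for (Index = 0; Index < UPPERLIMIT; Index++)
      for (Inner = 0; Inner < UPPERLIMIT; Inner++)
        {
          long tem = Index == 0 ? 0 : Res[Outer][Inner];

          tem += A[Outer][Index] * B[Index][Inner];
          Res[Outer][Inner] = tem;
        }
}

/* Returns 1 when both orders give identical results on a fixed input.
   The result matrices start out holding -1 to show that the conditional
   really replaces the explicit zeroing. */
int orders_agree(void)
{
  static matrix A, B, R1, R2;
  int i, j;

  for (i = 0; i < UPPERLIMIT; i++)
    for (j = 0; j < UPPERLIMIT; j++)
      {
        A[i][j] = i + j;
        B[i][j] = i - j;
        R1[i][j] = -1;
        R2[i][j] = -1;
      }

  multiply_original(A, B, R1);
  multiply_interchanged(A, B, R2);

  for (i = 0; i < UPPERLIMIT; i++)
    for (j = 0; j < UPPERLIMIT; j++)
      if (R1[i][j] != R2[i][j])
        return 0;
  return 1;
}
```

Timing the two functions on the target would be one way to see whether
the conditional or the extra loads and stores of Res[] account for the
slowdown; the sketch only establishes that the results match.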



>
> This loop is where most of the time is spent. But it would have been good if
> I had access to h/w tracing to see if the interchanged loop reduces cache
> misses as well as to see what is causing it to run this much slower.
>
> Thanks for your reply!
>
> From: Richard Biener 
> Date: Thursday, February 13, 2025 at 2:57 AM
> To: Visda Vokhshoori - C51841 
> Cc: gcc@gcc.gnu.org 
> Subject: Re: 22% degradation seen in embench:matmult-int
>
> On Wed, Feb 12, 2025 at 4:38 PM Visda.Vokhshoori--- via Gcc wrote:
> >
> > Embench is used for benchmarking on embedded devices.
> > This one project matmult-int has a function Multiply.  It’s a matrix 
> > multiplication for 20 x 20 matrix.
> > The device is an ATSAME70Q21B, which is a Cortex-M7.
> > The compiler is an Arm branch based on GCC version 13.
> > We are compiling with -O3, which has the loop-interchange pass on by default.
> >
> > When we compile with -fno-loop-interchange we get all 22% back plus a 5%
> > speed-up.
> >
> > When we do the loop interchange on the one loop nest that gets interchanged
> > it is slightly (0.7%) faster.
> >
> > Has anyone else seen large degradation as a result of loop interchange?
>
> I would suggest comparing the -fopt-info diagnostic output with and
> without -fno-loop-interchange;
> the interchanged loop might for example no longer vectorize.  Other
> than that - no, loop interchange isn't applied very often and it has
> a very conservative cost model.
>
> Are you able to share a testcase?
>
> Richard.
>
> >
> > Thanks