GSoC OpenACC

2025-04-04 Thread Carter Weidman via Gcc
Hello!

My name is Carter. I’m looking to become active in the GCC community. I would 
of course love to be funded through GSoC (and will most definitely be 
submitting a formal proposal) but will contribute regardless.  I'm 
interested in the OpenACC parts of the posted projects, so helpful resources 
to dig further would be much appreciated! If there is anything easy I can do in 
the meantime that would be awesome as well. 

Looking forward to learning from everyone!
Carter. 

Re: GSoC Interest in enhancing OpenACC support

2025-04-04 Thread Thomas Schwinge
Hi Pau!

On 2025-03-19T21:53:22-0400, Pau Sum via Gcc  wrote:
> Hey GCC Community,

Welcome to GCC!

> I am interested in contributing to the "Enhance OpenACC Support" project for 
> Google Summer of Code 2025.

Thanks for your interest!

I saw you also briefly discussed on GCC IRC.  I'm logged in, but usually
only able to follow the discussions very, very asynchronously.  :-/

You mentioned you "currently have a thinkpad with amd integrated gpu",
"AMD Ryzen 7 Pro 4750U", "Advanced Micro Devices, Inc. [AMD/ATI] Renoir
[Radeon RX Vega 6 (Ryzen 4000/5000 Mobile Series)]".  By now, have you
been able to figure out whether that one's supported with GCC/GCN
'-march=gfx90c', via building an offloading toolchain,
, and running some simple OpenACC or
OpenMP 'target' code?

(It really shouldn't be necessary, but... as others have remarked, for
stability reasons, when using the AMD GPU for code offloading, you may
have to disable other use of it -- like, graphical UI...)

Or, indeed, as an alternative to an AMD GPU, a system with any kind of
Nvidia GPU should work.

> In particular, I am excited about implementing acc_memcpy_device and the 
> #pragma acc cache directive.
>
> To prepare, I have started reviewing the OpenACC 2.6 specification and have 
> explored how acc_memcpy_to_device is implemented in GCC. Specifically, I have 
> examined libgomp/oacc-mem.c and libgomp/oacc-mem.h to understand memory 
> operations, as well as c-family/c-pragma.c, omp-expand.cc, and omp-low.cc to 
> see how OpenACC directives are parsed and processed. I have some experience 
> with OpenMP and a compilers course and those are proving to be helpful while 
> I research the requirements of the project.

Yes, that's a good start.

> I would appreciate any guidance on understanding the init, shutdown, and set 
> directives, as I have not yet explored them in depth.

Do you have specific questions in addition to the
"Notes on OpenACC 'init', 'shutdown', 'set' directives" already provided
on ?

> Additionally, I would like to get a better sense of the scope and challenges 
> of this project. From my research, it seems that implementing the cache 
> directive could be particularly complex. Are there other key technical 
> considerations or design constraints I should focus on?

The good thing about this project is that it can be scaled as we'd like:
it consists of several (mostly) independent parts, like those
you've mentioned above.  Plus, additional features as per the more recent
releases of the OpenACC specification, with varying complexity, which we
(mostly) can pick individually.  Of course, you'd not start with the most
complex one first.

> Also, how much technical detail should I provide in my GSoC proposal? Should 
> I already have an idea of how I should be implementing these issues?

That'll certainly be good, to demonstrate that you have an idea what
you're doing.  Of course, for the more complicated ones, like, indeed,
the 'cache' directive, that's maybe not possible to flesh out in full
detail, but you could, for example, discuss some ideas about how this
might be implemented, at a higher level, and/or which existing GCC
infrastructure might be used for it, etc.

For example, in your proposal, you could list a number of OpenACC
features that are not yet implemented, and your idea how (or, similar to
which existing feature) they might be implemented, and a rough guess
about the estimated complexity.

We strongly recommend that aspiring GSoC contributors submit any kind of
GCC code changes in advance of the GSoC selection process.  (Need not be
directly related to the task you'd like to work on for GSoC.)

> Thanks for any guidance!

Happy to look into any further questions you may have.


Grüße
 Thomas


Re: Does gcc have different inlining heuristics on different platforms?

2025-04-04 Thread Eric Botcazou via Gcc
> You can see what -fuse-linker-plugin says, what gcc/auto-host.h contains
> for HAVE_LTO_PLUGIN.  I don't know whether the BFD linker (or mold)
> supports linker plugins on windows.  I do know that libiberty simple-object
> does not support PE, that is, at _least_ (DWARF) debuginfo will be subpar.

Here's an excerpt of an LTO compilation with -Wl,--verbose=2 on native Windows:

C:/GNATPRO/25.1/bin/../lib/gcc/x86_64-w64-mingw32/13.3.1/../../../../x86_64-
w64-mingw32/bin/ld.exe: C:\tmp\ccCnskmU.o (symbol from plugin): symbol `fma' 
definition: DEF, visibility: DEFAULT, resolution: PREVAILING_DEF_IRONLY
C:/GNATPRO/25.1/bin/../lib/gcc/x86_64-w64-mingw32/13.3.1/../../../../x86_64-
w64-mingw32/bin/ld.exe: C:\tmp\ccCnskmU.o (symbol from plugin): symbol `main' 
definition: DEF, visibility: DEFAULT, resolution: PREVAILING_DEF
C:/GNATPRO/25.1/bin/../lib/gcc/x86_64-w64-mingw32/13.3.1/../../../../x86_64-
w64-mingw32/bin/ld.exe: C:\tmp\ccCnskmU.o (symbol from plugin): symbol 
`__fpclassify' definition: UNDEF, visibility: DEFAULT, resolution: 
RESOLVED_EXEC
attempt to open C:\tmp\ccbENa6D.ltrans0.ltrans.o succeeded
C:\tmp\ccbENa6D.ltrans0.ltrans.o

This is with GNU ld.

-- 
Eric Botcazou




gcc-13-20250404 is now available

2025-04-04 Thread GCC Administrator via Gcc
Snapshot gcc-13-20250404 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20250404/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision 7acd7d8d2a082e96ce91ef284cb55d28bfa00186

You'll find:

 gcc-13-20250404.tar.xz   Complete GCC

  SHA256=f455bca54326d4e1f0ad731867911335d4d7f227483a30eb7b21341cb7491143
  SHA1=ddede0366853dd4672dcaf583f867e190e84a8f6

Diffs from 13-20250328 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Memory access in GIMPLE

2025-04-04 Thread Richard Biener via Gcc
On Fri, Apr 4, 2025 at 3:37 AM Krister Walfridsson
 wrote:
>
> On Thu, 3 Apr 2025, Richard Biener wrote:
>
> > On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc
> >  wrote:
> >>
> >> I have more questions about GIMPLE memory semantics for smtgcc.
> >>
> >> As before, each section starts with a description of the semantics I've
> >> implemented (or plan to implement), followed by concrete questions if
> >> relevant. Let me know if the described semantics are incorrect or
> >> incomplete.
> >>
> >>
> >> Accessing memory
> >> 
> >> Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
> >> lhs and/or rhs is a memory reference expression (such as MEM_REF). When
> >> both lhs and rhs access memory, one of the following must hold --
> >> otherwise the access is UB:
> >>   1. There is no overlap between lhs and rhs
> >>   2. lhs and rhs represent the same address
> >>
> >> A memory access is also UB in the following cases:
> >>   * Any accessed byte is outside valid memory
> >>   * The pointer violates the alignment requirements
> >>   * The pointer provenance doesn't match the object
> >>   * The type is incorrect from a TBAA perspective
> >>   * It's a store to constant memory
> >
> > correct.
> >
> > Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.
> >
> >> smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
> >> mail. Provenance has its own issues, which I'll come back to in a separate
> >> mail.
> >>
> >>
> >> Checking memory access is within bounds
> >> ---
> >> A memory access may be represented by a chain of memory reference
> >> expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
> >> accessing a structure:
> >>
> >>struct s {
> >>  int x, y;
> >>};
> >>
> >> as:
> >>
> >>int foo (struct s * p)
> >>{
> >>  int _3;
> >>
> >>   :
> >>  _3 = p_1(D)->x;
> >>  return _3;
> >>}
> >>
> >> involves a MEM_REF for the whole object and a COMPONENT_REF to select the
> >> field. Conceptually, we load the entire structure and then pick out the
> >> element -- so all bytes of the structure must be in valid memory.
> >>
> >> We could also do the access as:
> >>
> >>int foo (struct s * p)
> >>{
> >>  int * q;
> >>  int _3;
> >>
> >>   :
> >>  q_2 = &p_1(D)->x;
> >>  _3 = *q_2;
> >>  return _3;
> >>}
> >>
> >> This calculates the address of the element, and then reads it as an
> >> integer, so only the four bytes of x must be in valid memory.
> >>
> >> In other words, the compiler is not allowed to optimize:
> >>q_2 = &p_1(D)->x;
> >>_3 = *q_2;
> >> to
> >>_3 = p_1(D)->x;
> >
> > Correct.  The reason that p_1(D)->x is considered accessing the whole
> > object is because of TBAA, so with -fno-strict-aliasing there is no UB
> > when the whole object isn't accessible (but the subsetted piece is).
>
> I'm still a bit confused... Assume we're using -fno-strict-aliasing and p
> points to a 4-byte buffer in the example above. Then:
>_3 = p_1(D)->x;
> is valid. So that means my paragraph starting with "In other words..."
> must be incorrect? That is, the compiler is allowed to optimize:
>q_2 = &p_1(D)->x;
>_3 = *q_2;
> to
>_3 = p_1(D)->x;
> since it's equally valid. But this optimization would be invalid for
> -fstrict-aliasing?

Yes.  Note we avoid this transform even with -fno-strict-aliasing.

> And a memory access like:
>_4 = p_1(D)->y;
> would be invalid (for both -fno-strict-aliasing and -fstrict-aliasing)?
> That is, the subsetted piece must be accessible?

Of course.

Note even with -fno-strict-aliasing pointer provenance rules apply,
so if you manage to lay out i and j after each other

  extern int i, j;

accessing j via a pointer to i is still UB.

Richard.

>
> /Krister


Re: [GSoC][Enhance OpenACC support] Uncertain about Cache directive functionality and device_type clause goal

2025-04-04 Thread Zhang lv via Gcc
Hi Thomas,

Thanks for your detailed response! I've updated the proposal based on the
feedback. Please kindly check it out. Thanks a lot!

Project Goals and Tasks

GCC currently only partially supports the features specified in OpenACC
2.6. This project aims to enhance GCC's OpenACC support in the following
areas:

1. OpenACC acc_memcpy_device Runtime API Routine

The acc_memcpy_device routine is currently missing in GCC's OpenACC runtime
implementation. According to the specification, this routine copies a
specified number of bytes from one device address (data_dev_src) to another
device address (data_dev_dest). Both addresses must reside in the current
device’s memory. There is also an asynchronous variant that performs the
data transfer on a specified async queue (async_arg). The routine must
handle error cases such as null pointers and invalid async argument values.
This function shares a similar implementation pattern with
acc_memcpy_to_device and acc_memcpy_from_device, which transfer data
between host and device memory.

Implementation will mainly involve modifying the following files:

   - libgomp/libgomp.map
   - libgomp/openacc.h
   - libgomp/openacc_lib.h
   - libgomp/openacc.f90
   - libgomp/oacc-mem.c
   - libgomp/libgomp.h
   - libgomp/target.c
   - libgomp/plugin/plugin-nvptx.c
   - libgomp/plugin/plugin-gcn.c

The existing functions such as memcpy_tofrom_device, gomp_copy_dev2host,
gomp_device_copy, and gomp_device_copy_async were primarily designed for
acc_memcpy_to_device and acc_memcpy_from_device, which handle host-device
transfers.

At the device-specific level, these functions delegate low-level memory
operations to libgomp plugins (e.g., libgomp/plugin/plugin-nvptx.c for
NVIDIA and libgomp/plugin/plugin-gcn.c for AMD), which are dynamically
loaded from shared object (.so) files and use target runtime APIs (like
CUDA or ROCm) to allocate memory, launch kernels, and manage transfers.

For acc_memcpy_device, which handles device-to-device transfers, we should
design a similar logic. Further investigation is needed to structure and
implement this functionality effectively.

2. Support for init, shutdown, and set Directives

These directives are currently unsupported at the front-end level in GCC,
even though their corresponding runtime APIs—acc_init, acc_shutdown,
acc_set_device_type, and their async queue variants—are implemented. The
goal here is to add parsing support in the front end to map these
directives to the appropriate built-in functions. In GCC, front ends map
OpenACC directives to BUILT_IN_GOACC_* entries defined in
gcc/omp-builtins.def, and the back end expands these into runtime API
calls. This task involves analyzing and extending front-end source files,
taking inspiration from the implementation of the wait directive. Relevant
files include:

   - gcc/c-family/c-omp.cc
   - gcc/c/c-parser.cc
   - gcc/cp/parser.cc
   - gcc/fortran/trans-openmp.cc
   - gcc/omp-builtins.def
   - gcc/omp-oacc-kernels-decompose.cc

3. Make the OpenACC cache Directive Actually Do Something

Currently, the cache directive in OpenACC is parsed at the front end, but
not used for any optimization purposes, such as data prefetching or moving
data to low-latency memory (e.g., L1/L2/L3 cache or GPU cache [5]). We can
leverage existing prefetch support in GCC, such as the middle-end aprefetch
pass, which analyzes nested loop structures and inserts __builtin_prefetch
calls in GIMPLE for the innermost loop. These are later expanded by the
backend expand pass to target-specific prefetch instructions based on the
ISA. On CPUs, __builtin_prefetch is typically mapped to prefetch
instructions in the CPU ISA. By enabling options like -fopenacc and setting
proper flags, we can also define a customized __builtin_prefetch_openacc
call and write new RTL templates to map prefetch calls to GPU-specific
instructions—if such instructions exist for the target architecture. This
will be discussed at the end of this section.

To be more specific about how we can leverage existing infrastructure: when
the OACC_CACHE directive is lowered to GIMPLE via gimplify_oacc_cache, we
can insert __builtin_prefetch_openacc calls at appropriate locations using
gimple_build_call and gsi_insert_before in the aprefetch pass located in
tree-ssa-loop-prefetch.cc. These built-ins will then be picked up by the
backend RTL expansion pass. However, to the best of my knowledge, there is
currently no backend support for expanding prefetch instructions for GPU
targets. Therefore, additional backend work is needed to implement the
necessary RTL patterns or instruction selection hooks for GPU architectures.

Aside from the aprefetch pass enabled by the -fprefetch-loop-arrays
option, GCC also provides several tuning parameters via --param, such as
prefetch-latency, simultaneous-prefetches, and sched-autopref-queue-depth [7],
to control how aggressive or conservative the prefetching should be. All of
these parameters infl

Re: message has lines too long for transport - Was: Fwd: Mail delivery failed: returning message to sender

2025-04-04 Thread Richard Earnshaw (lists) via Gcc
On 23/03/2025 20:26, Toon Moene wrote:
> I had the following message when sending test results to gcc-testresults 
> *starting* today (3 times):
> 
> Note that the message is generated by *my* exim4 "mail delivery software" 
> (Debian Testing) - it is not the *receiving* side that thinks the lines are 
> too long.
> 
> Is anyone else seeing this ?
> 
> 
>  Forwarded Message 
> Subject: Mail delivery failed: returning message to sender
> Date: Sun, 23 Mar 2025 18:18:28 +0100
> From: Mail Delivery System 
> To: t...@moene.org
> 
> This message was created automatically by mail delivery software.
> 
> A message that you sent could not be delivered to one or more of its
> recipients. This is a permanent error. The following address(es) failed:
> 
>   gcc-testresu...@gcc.gnu.org
>     message has lines too long for transport

https://datatracker.ietf.org/doc/html/rfc5322#section-2.1.1

So there's a 998 (sic) character limit on the length of any line.  I'm guessing 
your mail posting tool is not reformatting the message and trying to pass it 
straight through.  Your SMTP server then rejects it.

I guess your options are to request a 'flowed' encoding type or to reformat the 
message with a filter before passing it on to your SMTP server.

R.


Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-04 Thread Richard Biener via Gcc
On Fri, Apr 4, 2025 at 3:06 PM Robert Dubner  wrote:
>
> This program exhibits the behavior when compiled with -O2, -O3 and -Os
>
> PROGRAM-ID.  PROG.
> PROCEDURE DIVISION.
> MOVE 1 TO RETURN-CODE
> STOP RUN.

Hmm, the call to exit() is still in the program.

   0x0040114a <-358>:   movswl 0x250f(%rip),%edi#
0x403660 <__gg__data_return_code>
   0x00401151 <-351>:   mov%dx,(%rax)
=> 0x00401154 <-348>:   call   0x4010b0 

$edi is 0 here.  It looks like the store to __gg__data_return_code via
the move to (%rax) and the load from it got interchanged, possibly due
to aliasing issues.

It works when compiling with -O2 -fno-strict-aliasing, so apparently
your override
which you said was present does not work.  I can't find it, after all.

You want to implement the LANG_HOOKS_POST_OPTIONS hook and
do

flag_strict_aliasing = 0;

therein.
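A sketch of what that hook might look like in the COBOL front end (the function name and placement are illustrative, not the actual gcc/cobol source; the post_options hook signature is the standard one from langhooks.h):

```c
/* Hypothetical sketch: force -fno-strict-aliasing for all COBOL
   compilations, after option processing.  */
static bool
cobol_post_options (const char **pfilename ATTRIBUTE_UNUSED)
{
  flag_strict_aliasing = 0;
  /* Returning false means "continue compilation normally".  */
  return false;
}

#undef LANG_HOOKS_POST_OPTIONS
#define LANG_HOOKS_POST_OPTIONS cobol_post_options
```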

Richard.

>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Friday, April 4, 2025 03:02
> > To: Robert Dubner 
> > Cc: GCC Mailing List 
> > Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> > optimized away.
> >
> > On Fri, Apr 4, 2025 at 12:17 AM Robert Dubner  wrote:
> > >
> > > The COBOL compiler has this routine:
> > >
> > > void
> > > gg_exit(tree exit_code)
> > >   {
> > >   tree the_call =
> > >   build_call_expr_loc(location_from_lineno(),
> > >   builtin_decl_explicit (BUILT_IN_EXIT),
> > >   1,
> > >   exit_code);
> > >   gg_append_statement(the_call);
> > >   }
> > >
> > > I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> > > gg_exit() is optimized away, and the intended exit value is lost, and I
> > > end up with zero.
> > >
> > > By changing the routine to
> > >
> > > void
> > > gg_exit(tree exit_code)
> > >   {
> > >   tree args[1] = {exit_code};
> > >   tree function = gg_get_function_address(INT, "exit");
> > >   tree the_call = build_call_array_loc (location_from_lineno(),
> > > VOID,
> > > function,
> > > 1,
> > > args);
> > >   gg_append_statement(the_call);
> > >   }
> > >
> > > the call is not optimized away, and the generated executable behaves as
> > > expected.
> > >
> > > How do I prevent the call to gg_exit() from being optimized away?
> >
> > I don't see anything wrong here, so the issue must be elsewhere.
> > Do you have a COBOL testcase that shows the exit() being optimized?
> >
> > >
> > > Thanks!
> > >


Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-04 Thread Richard Biener via Gcc
On Fri, Apr 4, 2025 at 3:35 PM Richard Biener
 wrote:
>
> On Fri, Apr 4, 2025 at 3:06 PM Robert Dubner  wrote:
> >
> > This program exhibits the behavior when compiled with -O2, -O3 and -Os
> >
> > PROGRAM-ID.  PROG.
> > PROCEDURE DIVISION.
> > MOVE 1 TO RETURN-CODE
> > STOP RUN.
>
> Hmm, the call to exit() is still in the program.
>
>0x0040114a <-358>:   movswl 0x250f(%rip),%edi#
> 0x403660 <__gg__data_return_code>
>0x00401151 <-351>:   mov%dx,(%rax)
> => 0x00401154 <-348>:   call   0x4010b0 
>
> $edi is 0 here.  It looks like the store to __gg__data_return_code via
> the move to (%rax)
> and the load from it got interchanged due to aliasing issues possibly.
>
> It works when compiling with -O2 -fno-strict-aliasing, so apparently
> your override
> which you said was present does not work.  I can't find it, after all.
>
> You want to implement the LANG_HOOKS_POST_OPTIONS hook and
> do
>
> flag_strict_aliasing = 0;
>
> therein.

Oh, and the actual problem is of course

  *(unsigned short *) (__gg___11_return_code6.data + (unsigned long)
((sizetype) D.260 * 1)) = D.259;
  ..pssr_retval = (signed int) __gg__data_return_code;

where __gg__data_return_code is a global variable, but
__gg___11_return_code6 is something odd of type cblc_field_type_node,
and I nowhere see its .data field stored to (though maybe it is, via
pointer arithmetic again - who knows - it's not mentioned anywhere
else in the .original dump), so it's likely coming from libgcobol.
So is __gg__data_return_code, it seems.

OK, so you are storing a short but reading from an unsigned char
declaration.  Since the declaration __gg__data_return_code is just
1 byte, the 2-byte store cannot possibly alias it.

Richard.

> Richard.
>
> >
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Friday, April 4, 2025 03:02
> > > To: Robert Dubner 
> > > Cc: GCC Mailing List 
> > > Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> > > optimized away.
> > >
> > > On Fri, Apr 4, 2025 at 12:17 AM Robert Dubner  wrote:
> > > >
> > > > The COBOL compiler has this routine:
> > > >
> > > > void
> > > > gg_exit(tree exit_code)
> > > >   {
> > > >   tree the_call =
> > > >   build_call_expr_loc(location_from_lineno(),
> > > >   builtin_decl_explicit (BUILT_IN_EXIT),
> > > >   1,
> > > >   exit_code);
> > > >   gg_append_statement(the_call);
> > > >   }
> > > >
> > > > I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> > > > gg_exit() is optimized away, and the intended exit value is lost, and I
> > > > end up with zero.
> > > >
> > > > By changing the routine to
> > > >
> > > > void
> > > > gg_exit(tree exit_code)
> > > >   {
> > > >   tree args[1] = {exit_code};
> > > >   tree function = gg_get_function_address(INT, "exit");
> > > >   tree the_call = build_call_array_loc (location_from_lineno(),
> > > > VOID,
> > > > function,
> > > > 1,
> > > > args);
> > > >   gg_append_statement(the_call);
> > > >   }
> > > >
> > > > the call is not optimized away, and the generated executable behaves as
> > > > expected.
> > > >
> > > > How do I prevent the call to gg_exit() from being optimized away?
> > >
> > > I don't see anything wrong here, so the issue must be elsewhere.
> > > Do you have a COBOL testcase that shows the exit() being optimized?
> > >
> > > >
> > > > Thanks!
> > > >


RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-04 Thread Robert Dubner
Thank you.  I will implement that hook. And I'll see about that data 
definition.

I could try to explain how RETURN-CODE became __gg___11_return_code6, and 
why I defined it as unsigned char __gg__data_return_code[2] = {0,0};  But I 
have a rule that I try to follow, which is that when I am starting to bore 
myself, I stop talking.

Thank you very much for the information about the hook, and thanks for a 
little bit of insight into how to see what the compiler is doing.

Bob D.


> -Original Message-
> From: Richard Biener 
> Sent: Friday, April 4, 2025 09:50
> To: Robert Dubner 
> Cc: GCC Mailing List 
> Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> optimized away.
>
> On Fri, Apr 4, 2025 at 3:35 PM Richard Biener
>  wrote:
> >
> > On Fri, Apr 4, 2025 at 3:06 PM Robert Dubner  wrote:
> > >
> > > This program exhibits the behavior when compiled with -O2, -O3 and -Os
> > >
> > > PROGRAM-ID.  PROG.
> > > PROCEDURE DIVISION.
> > > MOVE 1 TO RETURN-CODE
> > > STOP RUN.
> >
> > Hmm, the call to exit() is still in the program.
> >
> >0x0040114a <-358>:   movswl 0x250f(%rip),%edi#
> > 0x403660 <__gg__data_return_code>
> >0x00401151 <-351>:   mov%dx,(%rax)
> > => 0x00401154 <-348>:   call   0x4010b0 
> >
> > $edi is 0 here.  It looks like the store to __gg__data_return_code via
> > the move to (%rax)
> > and the load from it got interchanged due to aliasing issues possibly.
> >
> > It works when compiling with -O2 -fno-strict-aliasing, so apparently
> > your override
> > which you said was present does not work.  I can't find it, after all.
> >
> > You want to implement the LANG_HOOKS_POST_OPTIONS hook and
> > do
> >
> > flag_strict_aliasing = 0;
> >
> > therein.
>
> Oh, and the actual problem is of course
>
>   *(unsigned short *) (__gg___11_return_code6.data + (unsigned long)
> ((sizetype) D.260 * 1)) = D.259;
>   ..pssr_retval = (signed int) __gg__data_return_code;
>
> where __gg__data_return_code is a global variable but
> __gg___11_return_code6 is something
> weird of type cblc_field_type_node and I nowhere see the .data field
> of it stored to (but it maybe is,
> via pointer arithmetic again - who knows - it's not mentioned anywhere
> else in the .original dump,
> so it's likely coming from libgcobol?  So is __gg__data_return_code it
> seems.
>
> OK, so you are storing a short but reading from a unsigned char
> declaration.  Since the
> declaration __gg__data_return_code is just 1 byte the 2-byte store
> cannot possibly alias it.
>
> Richard.
>
> > Richard.
> >
> > >
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Friday, April 4, 2025 03:02
> > > > To: Robert Dubner 
> > > > Cc: GCC Mailing List 
> > > > Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT),
> is
> > > > optimized away.
> > > >
> > > > On Fri, Apr 4, 2025 at 12:17 AM Robert Dubner 
> wrote:
> > > > >
> > > > > The COBOL compiler has this routine:
> > > > >
> > > > > void
> > > > > gg_exit(tree exit_code)
> > > > >   {
> > > > >   tree the_call =
> > > > >   build_call_expr_loc(location_from_lineno(),
> > > > >   builtin_decl_explicit (BUILT_IN_EXIT),
> > > > >   1,
> > > > >   exit_code);
> > > > >   gg_append_statement(the_call);
> > > > >   }
> > > > >
> > > > > I have found that when GCOBOL is used with -O2, -O3, or -Os, the
> call to
> > > > > gg_exit() is optimized away, and the intended exit value is lost,
> and I
> > > > > end up with zero.
> > > > >
> > > > > By changing the routine to
> > > > >
> > > > > void
> > > > > gg_exit(tree exit_code)
> > > > >   {
> > > > >   tree args[1] = {exit_code};
> > > > >   tree function = gg_get_function_address(INT, "exit");
> > > > >   tree the_call = build_call_array_loc (location_from_lineno(),
> > > > > VOID,
> > > > > function,
> > > > > 1,
> > > > > args);
> > > > >   gg_append_statement(the_call);
> > > > >   }
> > > > >
> > > > > the call is not optimized away, and the generated executable
> behaves as
> > > > > expected.
> > > > >
> > > > > How do I prevent the call to gg_exit() from being optimized away?
> > > >
> > > > I don't see anything wrong here, so the issue must be elsewhere.
> > > > Do you have a COBOL testcase that shows the exit() being optimized?
> > > >
> > > > >
> > > > > Thanks!
> > > > >


COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-04 Thread Robert Dubner
The COBOL compiler has this routine:

void
gg_exit(tree exit_code)
  {
  tree the_call = 
  build_call_expr_loc(location_from_lineno(),
  builtin_decl_explicit (BUILT_IN_EXIT),
  1,
  exit_code);
  gg_append_statement(the_call);
  }

I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
gg_exit() is optimized away, and the intended exit value is lost, and I
end up with zero.

By changing the routine to 

void
gg_exit(tree exit_code)
  {
  tree args[1] = {exit_code};
  tree function = gg_get_function_address(INT, "exit");
  tree the_call = build_call_array_loc (location_from_lineno(),
VOID,
function,
1,
args);
  gg_append_statement(the_call);
  }

the call is not optimized away, and the generated executable behaves as
expected.

How do I prevent the call to gg_exit() from being optimized away?

Thanks!



RE: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is optimized away.

2025-04-04 Thread Robert Dubner
This program exhibits the behavior when compiled with -O2, -O3 and -Os

PROGRAM-ID.  PROG.
PROCEDURE DIVISION.
MOVE 1 TO RETURN-CODE
STOP RUN.

> -Original Message-
> From: Richard Biener 
> Sent: Friday, April 4, 2025 03:02
> To: Robert Dubner 
> Cc: GCC Mailing List 
> Subject: Re: COBOL: Call to builtin_decl_explicit (BUILT_IN_EXIT), is
> optimized away.
>
> On Fri, Apr 4, 2025 at 12:17 AM Robert Dubner  wrote:
> >
> > The COBOL compiler has this routine:
> >
> > void
> > gg_exit(tree exit_code)
> >   {
> >   tree the_call =
> >   build_call_expr_loc(location_from_lineno(),
> >   builtin_decl_explicit (BUILT_IN_EXIT),
> >   1,
> >   exit_code);
> >   gg_append_statement(the_call);
> >   }
> >
> > I have found that when GCOBOL is used with -O2, -O3, or -Os, the call to
> > gg_exit() is optimized away, and the intended exit value is lost, and I
> > end up with zero.
> >
> > By changing the routine to
> >
> > void
> > gg_exit(tree exit_code)
> >   {
> >   tree args[1] = {exit_code};
> >   tree function = gg_get_function_address(INT, "exit");
> >   tree the_call = build_call_array_loc (location_from_lineno(),
> > VOID,
> > function,
> > 1,
> > args);
> >   gg_append_statement(the_call);
> >   }
> >
> > the call is not optimized away, and the generated executable behaves as
> > expected.
> >
> > How do I prevent the call to gg_exit() from being optimized away?
>
> I don't see anything wrong here, so the issue must be elsewhere.
> Do you have a COBOL testcase that shows the exit() being optimized?
>
> >
> > Thanks!
> >


Re: GSoC

2025-04-04 Thread Martin Jambor
Hi,

On Wed, Apr 02 2025, Leul Abiy via Gcc wrote:
> Dear Sir/Madam,
>
> I would like to work on the rust frontend for this summer.

We are delighted you found contributing to GCC interesting.

> I am trying to
> break down all the steps for the first project in the rust frontend. So far
> I plan on creating the visitor class as stated and then using it to deal
> with all the different types of errors such as dead code, and unused
> variables. Can you go into a little more depth on the project?

Please note that Rust GCC projects are a bit special in the sense that
they are often discussed primarily on Zulip of the GCC-Rust team:

https://gcc-rust.zulipchat.com/

So you may want to reach out to the GCC-Rust community primarily there.


> For example
> what is GCC internal infrastructure that is currently being used called? Is
> it GIMPLE (according to chatgpt)?

While the Rust front end eventually lowers its AST into GIMPLE
(possibly through the GENERIC IR, I'm not actually sure), I assume that most
of their projects work on their own AST representation.

>
> I would also like to maybe test this code with maybe an unused variable and
> put a breakpoint in the GIMPLE. This will help me see the workflow better.
> Please try to guide me into doing this if possible.

I have not double-checked, but I think that warnings about totally
unused variables are issued in the front ends and so not on GIMPLE (as
opposed to e.g. warnings about unused return values of functions which
have an attribute that says the value should really be used somehow).  So I
really suggest bringing this up in the Zulip forum.

Good luck!

Martin




Re: Does gcc have different inlining heuristics on different platforms?

2025-04-04 Thread Richard Biener via Gcc
On Mon, Mar 31, 2025 at 1:20 PM Julian Waters via Gcc  wrote:
>
> Hi all,
>
> I've been trying to chase down an issue that's been driving me insane
> for a while now. It has to do with the flatten attribute being
> combined with LTO. I've heard that flatten and LTO are a match made in
> hell (Someone else's words, not mine), but from what I observe,
> several methods marked as flatten on Linux compile to an acceptable
> size with ok amount of inlining, but on Windows however... The exact
> same methods marked as flatten have their callees inlined so
> aggressively that they reach sizes of 5MB per method! Something seems
> to be different between how inlining works on the 2 platforms, what
> are the differences (If any) between Linux and Windows when it comes
> to inlining, particularly involving the flatten attribute? Is there a
> list of differences that is easily accessible somewhere, or
> alternatively is there somewhere in the gcc source where the
> heuristics are defined that I can decipher?
>
> Here's one such example of the differences between Linux and Windows
> (Both were compiled with the same optimization settings, -O3 and
> -flto=auto):
>
> Linux:
> 010b12d0 6289 t
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
>
> Windows:
> 000296f9b0c0 00642d40 T
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int) [clone
> .constprop.0]
> 000295125480 00630080 T
> G1ParScanThreadState::trim_queue_to_threshold(unsigned int)
>
>
> Thanks in advance for the help, and for humouring my question

The main difference is that LTO on Linux can use the linker plugin
to derive information about how TUs are combined, while on Windows
we're using the "collect2 path", which is quite unmaintained and which
gives imprecise information.  This can already result in quite different
inlining.  You can simulate that on Linux with -fno-use-linker-plugin
(only for experimenting; don't use this unless necessary).

Richard.

>
> best regards,
> Julian


Re: [Draft] GSoC 2025 Proposal: Implementing Clang's -ftime-trace Feature in GCC

2025-04-04 Thread waffl3x via Gcc
On Thursday, April 3rd, 2025 at 10:45 PM, Andi Kleen  
wrote:

> 
> 
> On Fri, Apr 04, 2025 at 07:21:47AM +0300, Eldar Kusdavletov wrote:
> 
> > Thanks. I’ve submitted a more concrete version of the proposal — attaching 
> > it
> > here.
> > 
> > I’ve taken a brief look at Clang’s implementation, but the idea isn’t to 
> > follow
> > it exactly — rather, to provide a similar kind of trace-based visualization 
> > in
> > a way that fits naturally into GCC’s internals.
> > 
> > The plan is to build on top of the existing timevar infrastructure (as you
> > said), but shift focus toward a higher-level view: breaking down time spent 
> > on
> > a per-function basis, showing how different compiler passes nest and 
> > contribute
> > to the overall time, and exporting that as a Chrome trace (JSON).
> > 
> > Let me know if anything still feels too vague or needs clarification — 
> > happy to
> > adjust. Really appreciate your thoughts!
> 
> 
> Thanks for the write up. It looks a lot better.
> 
> For middle/backend optimization the pass manager already works on
> functions and measures the timevars this way. The information is just
> currently all aggregated.
> 
> It shouldn't be that hard to output this per function.
> 
> You may need some mapping from low level passes to higher level
> operations though (perhaps that is what you meant with
> aggregation layer)
> 
> I don't think it would be easy to go below functions at this level,
> it would require lots of changes in the individual passes, which
> would be a very large project by itself.
> 
> But then there is also slowness in the frontend, like declaration
> processing, macro and template expansion etc.  These would also be
> interesting to account for, but there is no pass manager at this
> level, so you would need to add a much smaller number of custom
> hooks.  This is easier because there isn't a large number of passes
> there.  At that level you can (and likely should) account below
> functions, down to individual statements and declarations.  You may
> also need to limit yourself to specific languages (e.g. C/C++ only).
> 
> I would separate these two cases in the project plan because
> they are quite different.
> 
> -Andi


Just to add my two cents here: I think this would be an excellent
addition, and I was surprised when I found in the past that GCC was
missing this feature.  It will also be really useful to be able to
compare the two compilers, even if it's not directly apples to apples.

Alex


Re: GSoC 2025: In-Memory Filesystem for GPU Offloading Tests

2025-04-04 Thread Thomas Schwinge
Hi Arijit, Andrew!

Arijit, welcome to GCC!

On 2025-03-11T03:26:44+0530, Arijit Kumar Das via Gcc  wrote:
> Thank you for the detailed response! This gives me a much clearer picture
> of how things work.
>
> Regarding the two possible approaches:
>
>- I personally find *Option A (self-contained in-memory FS)* more
>interesting, and I'd like to work on it first.

Sure, that's fine.  ..., and we could then still put Option B on top, in
case that just Option A should turn out to be too easy for you.  ;-)

>- However, if *Option B (RPC-based host FS access)* is the preferred
>approach for GSoC, I’d be happy to work on that as well.

> For now, I’ll begin setting up the toolchain and running simple OpenMP
> target kernels as suggested. Thanks again for your guidance!

Let us know if you need further help; we understand it's not trivial to
get this set up at first.

Do you have access to a system with an AMD GPU supported by GCC, or any
Nvidia GPU?

Just a few more comments in addition to Andrew's very useful remarks.
(Thank you, Andrew!)

> On Mon, 10 Mar, 2025, 10:55 pm Andrew Stubbs,  wrote:
>> On 10/03/2025 15:37, Arijit Kumar Das via Gcc wrote:
>> > I have carefully read the project description and understand that the goal
>> > is to modify *newlib* and the *run tools* to redirect system calls for file
>> > I/O operations to a virtual, volatile filesystem in host memory, as the GPU

Instead of "in host memory", it should be "in GPU memory" (for Option A).

>> > lacks its own filesystem. Please correct me if I’ve misunderstood any
>> > aspect.

>> > I have set up the GCC source tree and am currently browsing relevant files
>> > in the *gcc/testsuite* directory. However, I am unsure *where the run tools
>> > source files are located and how they interact with newlib system calls.*

>> Newlib isn't part of the GCC repo, so if you
>> can't find the files then that's probably why!

(I assume, by now you've found the newlib source code?)

>> The "run" tools are installed as part of the offload toolchain, albeit
>> hidden under the "libexec" directory because they're really only used
>> for testing. You can find the sources with the config/nvptx or
>> config/gcn backend files.

Actually, only for GCN: 'gcc/config/gcn/gcn-run.cc'.  For nvptx, it's
part of : 'nvptx-run.cc'.

>> Currently, system calls such as "open" simply return EACCESS
>> ("permission denied") so the stub implementations are fairly easy to
>> understand (e.g. newlib/libc/sys/amdgcn/open.c).

(I assume, by now you've found the corresponding nvptx code in newlib?)

>> The task would be to
>> insert new code there that actually does something.  You do not need to
>> modify the compiler itself.


Grüße
 Thomas


Re: [Draft] GSoC 2025 Proposal: Implementing Clang's -ftime-trace Feature in GCC

2025-04-04 Thread Andi Kleen
On Fri, Apr 04, 2025 at 07:21:47AM +0300, Eldar Kusdavletov wrote:
> Thanks. I’ve submitted a more concrete version of the proposal — attaching it
> here.
> 
> I’ve taken a brief look at Clang’s implementation, but the idea isn’t to 
> follow
> it exactly — rather, to provide a similar kind of trace-based visualization in
> a way that fits naturally into GCC’s internals.
> 
> The plan is to build on top of the existing timevar infrastructure (as you
> said), but shift focus toward a higher-level view: breaking down time spent on
> a per-function basis, showing how different compiler passes nest and 
> contribute
> to the overall time, and exporting that as a Chrome trace (JSON).
> 
> Let me know if anything still feels too vague or needs clarification — happy 
> to
> adjust. Really appreciate your thoughts!

Thanks for the write up. It looks a lot better.

For middle/backend optimization the pass manager already works on
functions and measures the timevars this way.  The information is just
currently all aggregated.

It shouldn't be that hard to output this per function.

You may need some mapping from low level passes to higher level
operations though (perhaps that is what you meant by the
"aggregation layer").

I don't think it would be easy to go below functions at this level,
it would require lots of changes in the individual passes, which
would be a very large project by itself.

But then there is also slowness in the frontend, like declaration
processing, macro and template expansion etc.  These would also be
interesting to account for, but there is no pass manager at this
level, so you would need to add a much smaller number of custom
hooks.  This is easier because there isn't a large number of passes
there.  At that level you can (and likely should) account below
functions, down to individual statements and declarations.  You may
also need to limit yourself to specific languages (e.g. C/C++ only).

I would separate these two cases in the project plan because
they are quite different.

-Andi



[GSoC] OpenACC bind() and device_type

2025-04-04 Thread Carter Weidman via Gcc
Here is my (potentially naive) understanding of how we might implement bind() 
and device_type (probably in a follow-up email) in GCC. These are the primary 
parts of the project that I am interested in. I would be open to exploring the 
cache directive as well, though I would like to start with a limited scope and 
get feedback on how long just the first two could potentially take.

The bind clause, part of the "routine directive" section of the OpenACC spec, 
defines a way to specify the name to use when calling a procedure from an 
accelerator device. This can be done in two ways: with an identifier, such as 
device_function, or with a string, such as "device_function". If an identifier 
is given, it is treated like any other function in your source code. If a 
string is passed, the compiler applies no naming rules or mangling, and the 
accelerator (or other device) targets exactly that name.

Implementation:
The bind clause will need to be parsed, represented in the GIMPLE IR, lowered 
so that calls map to the external device name, used to generate bindings for 
the device backends (nvptx/gcn), and then tested and validated using the 
existing GCC test infrastructure plus some tests of my own.

These are the relevant files for each part:

gcc/c-family/c-pragma.cc
gcc/c/c-parser.cc
gcc/cp/parser.cc

gcc/gimple.h
gcc/gimple.def

gcc/omp-low.cc

gcc/config/nvptx/ and/or gcc/config/gcn/

gcc/testsuite/c-c++-common/goacc/

Potential Runtime integration: 
libgomp/oacc/… (did not do much research on this yet so I may not know what I’m 
talking about)

I also have significant research prepared for implementing the device_type 
clause as well, but would like feedback on whether or not I'm in the correct 
ballpark with bind() first. I have further implementation details prepared but 
definitely need more time to get comfortable with the codebase and its 
intricacies. I believe I could spend an entire summer on just these two, but I 
don't yet have a clear enough vision of this project to make an accurate 
judgement.

Be harsh! 

Thank you,
Carter.




Re: GSoC 2025 Introduction & Interest in GCC Rust Front-End

2025-04-04 Thread Martin Jambor
Hello,

and sorry for a somewhat late reply.

On Fri, Mar 28 2025, Ansh Jaiswar via Gcc wrote:
> Dear GCC Developers,
>
> I am Ansh Jaiswar, a second-year Computer Science student interested in 
> compilers and systems programming. I have experience with C/C++ and basic 
> knowledge of Rust.
>
> I am applying for GSoC 2025 and am particularly interested in the
> "Rewrite Rust Lints" project under GCC.

We are delighted you found contributing to GCC interesting.

> I have gone through the gccrs GitHub repository and understand that
> this project involves implementing lints at the HIR level instead of
> relying on GCC’s existing infrastructure. I believe this will improve
> the flexibility and maintainability of Rust linting in GCC.
>
> To prepare, I am currently exploring the Rust-GCC frontend and its HIR
> structure. I would appreciate any pointers on how to get started,
> particularly regarding the existing linting mechanism and the visitor
> pattern approach.
>
> Additionally, I would like to know if there are any beginner-friendly
> tasks I could work on before GSoC officially starts.

Please note that Rust GCC projects are a bit special in the sense that
they are often discussed primarily on the GCC-Rust team's Zulip:

https://gcc-rust.zulipchat.com/

So you may want to reach out to the GCC-Rust community primarily there.

Good luck!

Martin