Re: Memory access in GIMPLE

2025-04-02 Thread Richard Biener via Gcc
On Thu, Apr 3, 2025 at 2:23 AM Krister Walfridsson via Gcc
 wrote:
>
> I have more questions about GIMPLE memory semantics for smtgcc.
>
> As before, each section starts with a description of the semantics I've
> implemented (or plan to implement), followed by concrete questions if
> relevant. Let me know if the described semantics are incorrect or
> incomplete.
>
>
> Accessing memory
> 
> Memory access in GIMPLE is done using GIMPLE_ASSIGN statements where the
> lhs and/or rhs is a memory reference expression (such as MEM_REF). When
> both lhs and rhs access memory, one of the following must hold --
> otherwise the access is UB:
>   1. There is no overlap between lhs and rhs
>   2. lhs and rhs represent the same address
>
> A memory access is also UB in the following cases:
>   * Any accessed byte is outside valid memory
>   * The pointer violates the alignment requirements
>   * The pointer provenance doesn't match the object
>   * The type is incorrect from a TBAA perspective
>   * It's a store to constant memory

correct.

Note that GIMPLE_CALL and GIMPLE_ASM can also access memory.
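
As a concrete illustration of the two-sided rule for aggregate assignments
(a hypothetical example; the overlap semantics are as stated above):

  struct buf { char bytes[16]; };

  void
  copy (struct buf *d, struct buf *s)
  {
    /* In GIMPLE this becomes an aggregate assignment *d = *s, which is
       defined only if *d and *s either do not overlap at all or have
       exactly the same address.  */
    *d = *s;
  }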

> smtgcc requires -fno-strict-aliasing for now, so I'll ignore TBAA in this
> mail. Provenance has its own issues, which I'll come back to in a separate
> mail.
>
>
> Checking memory access is within bounds
> ---
> A memory access may be represented by a chain of memory reference
> expressions such as MEM_REF, ARRAY_REF, COMPONENT_REF, etc. For example,
> accessing a structure:
>
>struct s {
>  int x, y;
>};
>
> as:
>
>int foo (struct s * p)
>{
>  int _3;
>
>  <bb 2> :
>  _3 = p_1(D)->x;
>  return _3;
>}
>
> involves a MEM_REF for the whole object and a COMPONENT_REF to select the
> field. Conceptually, we load the entire structure and then pick out the
> element -- so all bytes of the structure must be in valid memory.
>
> We could also do the access as:
>
>int foo (struct s * p)
>{
>  int * q;
>  int _3;
>
>  <bb 2> :
>  q_2 = &p_1(D)->x;
>  _3 = *q_2;
>  return _3;
>}
>
> This calculates the address of the element, and then reads it as an
> integer, so only the four bytes of x must be in valid memory.
>
> In other words, the compiler is not allowed to optimize:
>q_2 = &p_1(D)->x;
>_3 = *q_2;
> to
>_3 = p_1(D)->x;

Correct.  The reason that p_1(D)->x is considered accessing the whole
object is because of TBAA, so with -fno-strict-aliasing there is no UB
when the whole object isn't accessible (but the subsetted piece is).
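
As a minimal C-level illustration of a valid "subsetted piece" (a
hypothetical example: the allocation is sized so that only the x field is
in valid memory, mirroring the q_2 = &p_1(D)->x form above):

  #include <stdlib.h>

  struct s { int x, y; };

  int
  main (void)
  {
    /* Room for the 'x' member only; the bytes that would hold 'y'
       are outside valid memory.  */
    struct s *p = malloc (sizeof (int));
    int *q = &p->x;   /* address of the element ...                */
    *q = 1;           /* ... accessed as int: only 4 bytes needed  */
    int v = *q;
    free (p);
    return v;
  }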

> Question: Describing the first case as conceptually reading the whole
> structure makes sense for loads. But I assume the same requirement -- that
> the entire object must be in valid memory -- also applies for stores. Is
> that correct?

Yes.  Like for loads this is for the purpose of TBAA.

>
>
> Allowed out-of-bounds read?
> ---
> Compile the function below for x86_64 with "-O3 -march=x86-64-v2":
>
>int f(int *a)
>{
>  for (int i = 0; i < 100; i++)
>if (a[i])
>  return 1;
>  return 0;
>}
>
> The vectorizer transforms this into code that processes one scalar element
> at a time until the pointer is 16-byte aligned, then switches to vector
> loads.
>
> The original code is well-defined when called like this:
>
>int a[2] __attribute__((aligned(16))) = {1, 0};
>f(a);
>
> But the vectorized version ends up reading 8 bytes out of bounds.
>
> This out-of-bounds read is harmless in practice -- it stays within the
> same memory page, so the extra bytes are accessible. But it's invalid
> under the smtgcc memory model.
>
> Question: Is this considered a valid access in GIMPLE? If so, what are the
> rules for allowed out-of-bounds memory reads?

The "out-of-bound memory reads" thing is used for
 a) TBAA
 b) possibly range analysis for access indexes (the 'i' in a[i])

For TBAA, vector(T) is the same as T.  For index analysis, unless
the vector(T)[i'] access is fully outside of the contained object, it
should not be inferred that i' is an invalid index.

Validity at runtime is then determined by whether the difference can
be observed, which to my understanding means traps from unmapped
pages in the case of reads (and, of course, bogus values written to
unrelated memory for writes).  (Similarly, 'volatile' accesses are
considered to be exactly observable.)

So the "GIMPLE abstract machine" (to introduce that term...) would
consider aligned, partly out-of-bound accesses as valid, fully
out-of-bound accesses as invalid.  Complementary to that, TBAA
restrictions apply (which for vector vs. component accesses are waived).
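
A quick way to see why the partly out-of-bounds read above stays on a
mapped page (assuming 16-byte-aligned vector loads and a 4096-byte page
size, as on x86_64): since 16 divides 4096, an aligned 16-byte access can
never straddle a page boundary.  A small self-check:

  #include <assert.h>
  #include <stdint.h>

  int
  main (void)
  {
    /* For every 16-byte-aligned address a, the bytes [a, a+15] lie
       within a single 4096-byte page.  */
    for (uintptr_t a = 0; a < 4 * 4096; a += 16)
      assert (a / 4096 == (a + 15) / 4096);
    return 0;
  }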

>
>
> Alignment check
> ---
> Question: smtgcc currently gets the alignment requirements by calling
> get_object_alignment on the tree expression returned from
> gimple_assign_lhs (for stores) or gimple_assign_rhs1 (for loads). Is that
> the correct way to get the required alignment?

That's the cons
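
For reference, a minimal sketch of that lookup (gimple_assign_lhs,
gimple_assign_rhs1, and get_object_alignment are existing GCC internal
APIs; the wrapper itself is hypothetical and assumes a GIMPLE_ASSIGN
whose lhs or rhs1 is the memory reference):

  /* Alignment of the accessed object, in bits.  */
  static unsigned int
  access_alignment_in_bits (gimple *stmt, bool is_store)
  {
    tree ref = is_store ? gimple_assign_lhs (stmt)
                        : gimple_assign_rhs1 (stmt);
    return get_object_alignment (ref);
  }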

Re: GSoC: Application Proposal for Simple File System for Nvidia and AMD GPU Code Testing

2025-04-02 Thread Ambika Sharan via Gcc
>
> This expands on implementation details, compares different approaches
> (in-memory vs. RPC-based solutions), and identifies relevant GCC components
> to modify.
>
> 1. Overview of Implementation Approaches
>
> Currently, GCC’s GPU offloading test framework lacks support for file I/O,
> causing many test cases to fail. To solve this, two primary solutions are
> proposed:
>
>
>  1. In-Memory File System (Preferred Approach)
> - Implements a simple file system that operates entirely in volatile
>   memory.
> - Provides an interface for file operations (open(), read(), write(),
>   etc.).
> - Integrated directly within the libgomp OpenMP runtime for GPUs.
>
>  2. Remote Procedure Call (RPC) File System
> - Implements a host-GPU communication channel using RPC mechanisms.
> - Allows the GPU to request file operations on the host (like read(),
>   write()).
> - Requires a runtime service running on the host that interacts with
>   the GPU.
>
> Both approaches have trade-offs, which are discussed below.
>
> 2. Detailed Implementation Plan
>
>  Approach 1: In-Memory File System
>
> Concept:
>
> This approach mimics a file system within memory.
>
> Useful for temporary file operations required during GPU execution.
>
> Works best when file persistence isn’t required beyond test execution.
>
> Key Components:
>
> A global data structure (gpu_file_table[]) to store file descriptors and
> file contents.
>
> Functions for standard file operations:
>
> - gpu_fopen()  → Opens a file in memory.
> - gpu_fwrite() → Writes to the in-memory buffer.
> - gpu_fread()  → Reads file content.
> - gpu_fclose() → Closes a file and releases memory.
>
>
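To make the proposed in-memory layer concrete, here is a minimal sketch;
the names gpu_file_table[], gpu_fopen(), and gpu_fwrite() come from the
proposal, while the layout, capacities, and error handling below are
assumptions:

  #include <stddef.h>
  #include <string.h>

  #define GPU_MAX_FILES 16     /* assumed table capacity */
  #define GPU_MAX_SIZE  4096   /* assumed per-file buffer size */

  struct gpu_file
  {
    char name[64];             /* file name recorded at open time */
    char data[GPU_MAX_SIZE];   /* in-memory file contents */
    size_t size;               /* bytes written so far */
    size_t pos;                /* current read/write offset */
    int used;                  /* slot allocated?  */
  };

  static struct gpu_file gpu_file_table[GPU_MAX_FILES];

  /* Open a file in memory; returns a small-integer descriptor or -1.  */
  int
  gpu_fopen (const char *name)
  {
    for (int fd = 0; fd < GPU_MAX_FILES; fd++)
      if (!gpu_file_table[fd].used)
        {
          strncpy (gpu_file_table[fd].name, name,
                   sizeof gpu_file_table[fd].name - 1);
          gpu_file_table[fd].name[sizeof gpu_file_table[fd].name - 1] = 0;
          gpu_file_table[fd].size = gpu_file_table[fd].pos = 0;
          gpu_file_table[fd].used = 1;
          return fd;
        }
    return -1;  /* table full */
  }

  /* Write n bytes at the current offset; returns bytes written.  */
  size_t
  gpu_fwrite (int fd, const void *buf, size_t n)
  {
    struct gpu_file *f = &gpu_file_table[fd];
    if (n > GPU_MAX_SIZE - f->pos)
      n = GPU_MAX_SIZE - f->pos;
    memcpy (f->data + f->pos, buf, n);
    f->pos += n;
    if (f->pos > f->size)
      f->size = f->pos;
    return n;
  }
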
> Integration with GCC:
>
> - Modify libgomp to introduce these functions (libgomp/gpu_file_io.c).
> - Override existing file I/O calls in OpenMP test cases to use the
>   in-memory system.
>
>
> *Pros and Cons:*
>
> Pros:
>  1. Fast: No network latency.
>  2. Self-contained: Works independently of the host.
>  3. Minimal modifications: Only requires changes in libgomp.
>
> Cons:
>  1. Files disappear after execution.
>  2. Limited size: Memory constraints may prevent large file operations.
>
> Approach 2: RPC-Based File System
>
> Concept:
>
>  1. Uses an RPC server on the host to handle file requests from the GPU.
>  2. GPU communicates with the host via PCIe, shared memory, or
> socket-based RPC.
>  3. Suitable for real-world file interactions where data persistence is
> required.
>
> Key Components:
>
>  - Host-side daemon (gpu_fs_server)
> - Runs in the background and listens for GPU file access requests.
> - Handles operations like open(), read(), write(), close().
>  - GPU Runtime Library (gpu_fs_client)
> - GPU sends file requests to the host via RPC calls.
> - Waits for responses and returns data to the calling program.
>
> Integration with GCC:
>
> Modify libgomp/nvptx.c and libgomp/gcn.c to call RPC file system functions.
>
> Modify OpenMP test cases to use host files via RPC.
>
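As a sketch of what the host/GPU channel might carry (the endpoint names
gpu_fs_server and gpu_fs_client come from the proposal; the message layout
below is purely an assumption):

  #include <stddef.h>

  /* Operations the GPU-side client can request from the host daemon.  */
  enum gpu_fs_op { GPU_FS_OPEN, GPU_FS_READ, GPU_FS_WRITE, GPU_FS_CLOSE };

  /* One request, marshalled by gpu_fs_client, decoded by gpu_fs_server.  */
  struct gpu_fs_request
  {
    enum gpu_fs_op op;   /* requested file operation */
    int fd;              /* host-side descriptor (ignored for GPU_FS_OPEN) */
    size_t count;        /* byte count for read/write */
    char path[128];      /* pathname, used only for GPU_FS_OPEN */
  };

  /* The host's answer: a descriptor, a byte count, or -errno.  */
  struct gpu_fs_reply
  {
    long result;
    char data[4096];     /* payload returned for GPU_FS_READ */
  };
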
> *Pros and Cons:*
>
> Pros:
>  1. Persistent file storage: Files remain available after execution.
>  2. Supports larger files: Not limited by GPU memory.
>  3. Closer to real-world applications that access disk files.
>
> Cons:
>  1. Network/PCIe overhead: Slower than in-memory operations.
>  2. Additional complexity: Requires host-side service and GPU
> communication handling.
>  3. Synchronization challenges: Needs efficient queueing and handling
> of file requests.
>
> 3. Integration with GCC Test Harness
>
> Both approaches require modifications to GCC’s test framework (DejaGNU) to
> ensure GPU tests correctly handle file I/O.
>
> Key Files in GCC:
>
>  1. libgomp/testsuite/lib/libgomp.exp
> Contains test execution logic for OpenMP offloading.
> Needs modifications to redirect file I/O to the new file system.
>
>  2. libgomp/nvptx.c (Nvidia GPUs) & libgomp/gcn.c (AMD GPUs)
> Contains the GPU offloading runtime.
> Will be updated to intercept file system calls.
>
>  3. libgomp/libgomp.h (OpenMP Headers)
> Needs new function declarations for gpu_fopen(), gpu_fwrite(), etc.
>
>  4. libgomp/gpu_file_io.c
> New file to implement in-memory file operations.
>
> 4. Existing Functionality to Build Upon
>
>  - OpenMP Offloading in GCC
> - GCC already supports offloading OpenMP/OpenACC programs to GPUs.
> - This project extends its ability to handle file I/O in offloaded
>   execution.
>  - GCC’s DejaGNU Framework
> - Used for automated testing.
> - The new file system will integrate with DejaGNU to improve GPU test
>   coverage

Re: GSoC 2025 – Excited About GCC Go Escape Analysis & Seeking Guidance

2025-04-02 Thread Thomas Schwinge
Hi Astha, Ian!

On 2025-03-31T19:48:16+0530, Astha Pipania via Gcc  wrote:
> I hope you're doing well!

Astha, welcome to GCC!

> I'm incredibly excited about the "GCC Go Escape
> Analysis" project for GSoC 2025.

... which is listed on
.

Ian, can you comment on whether that's still a good project nowadays, and
whether you (I suppose?) would be available to mentor, should the
occasion arise?


Grüße
 Thomas


> I've spent time reviewing past
> contributions, including the 2024 work related to Go in GCC, and I’m eager
> to build on that progress in a meaningful way.
>
> After going through the mailing lists and discussions, I want to ensure my
> initial contributions align with the current priorities of the project.
> Could you please guide me on the type of bug fixes or improvements that
> would be most valuable at this stage? I want to focus on something
> impactful that contributes to the project’s long-term goals.
>
> Looking forward to your insights, and I truly appreciate your time and
> guidance!
>
> Best regards,
> Astha Pipania


Re: GSoC mandatory step error

2025-04-02 Thread Thomas Schwinge
Hi Leul!

On 2025-03-31T18:33:51-0400, Leul Abiy via Gcc  wrote:
> I just wanted to ask about the build of gcc. I know it is a required step
> before applying to GCC for the GSoC 2025. However, I get an error.

Welcome to GCC, and good luck for GSoC 2025!

In addition to what Martin already wrote:

> Here is the error: [...]

> ChatGPT doesn't seem to be able to fix it.

Eh.  ;'-)

> I used this command for configure: [...]
> I tried rerunning [...]
> it didn't work. Please let me know what I should do next.

Let me say that you've gained some bonus points already: you've provided
the exact error message, in plain text form (instead of a screenshot
image or similar, or not at all...), and you've provided the build steps
that lead to this error, and you've provided some comments on what you've
already tried to overcome the issue.  (Additionally, the exact sources
revision might be helpful.)

I didn't try, but with the information you provided, I should be able to
reproduce the problem locally.  A lot of newcomers fail to provide such
details, even if requested explicitly, so thank you for being more
helpful!  :-)

> I really want to
> work for gcc this summer.

So, here's your first task: fix this issue.  :-)

But first, update the sources, to see if the problem has been resolved
already.  If not, patch the GCC sources until it builds, and submit your
patch to ; see
.
CC Jakub Jelinek , who has recently been working on
'gcc/tree-tailcall.cc'.


Grüße
 Thomas


Re: GSoC mandatory step error

2025-04-02 Thread Jakub Jelinek via Gcc
On Tue, Apr 01, 2025 at 10:09:56AM +0200, Martin Jambor wrote:
> The simple fix is to initialize the variable to nullptr in the source,
> of course. :-)

It is a false positive.
  gimple *stmt;
...
  for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
{
  stmt = gsi_stmt (gsi);
...
}
...
  if (gsi_end_p (gsi))
{
...
  return;
}
...
  if ((stmt_could_throw_p (cfun, stmt)
   && !stmt_can_throw_external (cfun, stmt)) || EDGE_COUNT (bb->succs) > 1)
So, it is possible that stmt isn't initialized if there are no statements in
that bb at all, but in that case it recurses on the predecessors and
returns, so when stmt is used later in the function, it must be initialized.
If gsi_end_p is for some reason neither inlined nor IPA-analyzed, the
compiler might not know that, if gsi doesn't change, two gsi_end_p
calls will always give the same answer.
I agree
-  gimple *stmt;
+  gimple *stmt = NULL;
doesn't hurt though.
And I think this potential uninit false positive has been there for quite
a while, since r10-44-gd407e7f53b4a9c3f6.

Jakub
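
A minimal standalone shape of the false positive Jakub describes, with a
hypothetical opaque end_p() standing in for an uninlined gsi_end_p:

  extern int end_p (int i);       /* opaque to the optimizer */
  extern int interesting (int i);

  int
  f (int n)
  {
    int stmt;                     /* deliberately uninitialized */
    int i;
    for (i = n; !end_p (i); i--)
      {
        stmt = i;                 /* set on every iteration */
        if (interesting (i))
          break;
      }
    if (end_p (i))                /* true iff the loop never hit 'break' */
      return 0;
    /* Reached only after 'break', so stmt is initialized -- but unless
       the compiler knows the two end_p calls agree, it may warn.  */
    return stmt;
  }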



Re: GSoC: Application Proposal for Simple File System for Nvidia and AMD GPU Code Testing

2025-04-02 Thread Thomas Schwinge
Hi Ambika!

Welcome to GCC!

On 2025-03-29T15:26:18-0500, Ambika Sharan via Gcc  wrote:
>  Simple File System for Nvidia and AMD GPU Code Generation Testing

Thanks for your interest, and initial work on this project idea.

Please add more detail: ideas how you think you'd implement the
respective functionality, pros and cons of different approaches, relevant
files in GCC and/or elsewhere, and existing functionality to build upon.

If you have specific questions, we'll be happy to look into these.


Grüße
 Thomas


> 2. Project Description and Goals
>
>  - This project aims to enhance the GCC testing framework for
>GPU-targeted code generation by developing a simple "in-memory" file
>system or an RPC mechanism for devices to access host files. These
>features are necessary to support file operations in GPU test cases
>that currently fail due to the lack of I/O support.
>  - Key Deliverables:
> - Implement a volatile, in-memory file system (initially empty or
>   with preloaded files).
> - Extend test harness functionality to handle additional test case
>   files.
> - Investigate and, if feasible, implement an RPC mechanism to enable
>   device-host file interactions.
> - Test and document the changes for seamless integration into the
>   GCC framework.
>
> 3. Why This Project is Worthwhile
>
>  - Testing infrastructure is critical for ensuring the robustness of
>GPU-targeted code generation, especially in HPC workloads where GPUs
>dominate. By addressing this gap in GCC, the project enhances its
>utility for developers using Nvidia and AMD GPUs in high-performance
>computing.
>  - With the growing adoption of OpenACC/OpenMP offloading and the rise
>of GPU-based workflows, this improvement ensures GCC remains a
>competitive and reliable compiler for modern heterogeneous hardware.
>
> 4. Introduction and Skills
>
>  - Who I Am:
>My name is Ambika Sharan, and I am a Computer Science and Data
>Science student at the University of Wisconsin-Madison, graduating in
>December 2025. My primary interests lie in high-performance computing
>(HPC), GPU programming, and compiler optimization.
>  - Relevant Skills:
> - Programming Languages: Proficient in C, C++, Python, and Bash.
> - GPU Programming: Experience with CUDA and OpenMP.
> - Linux Systems: Advanced knowledge of Linux-based systems and
>   compiling software from source.
> - Research Experience: Conducted AI/ML research with Wisconsin
>   Science and Computing Emerging Research Stars (WISCERS).
> - Internship Experience: At AMD, contributed to AI/ML initiatives
>   involving performance engineering and benchmarking.
>  - Why I’m a Good Fit:
>My background in HPC, coupled with my experience in low-level GPU
>programming and Linux-based workflows, equips me to address the
>technical challenges of this project. My ability to work
>collaboratively with diverse teams and adapt quickly to new tools
>and environments ensures I can contribute effectively to the GCC
>community.
>
> 5. Preparation and Research
>
>  - Current Progress:
> - Engaged with the GCC community on mailing lists and IRC to clarify
>   technical details.
> - Reviewed the paper "A Modern Compiler Infrastructure for OpenMP"
>   and relevant GCC documentation.
> - Built and tested GCC on my local machine, gaining familiarity with
>   its codebase and test harness.
>  - Next Steps:
> - Continue engaging with the community for feedback on this
>   application.
> - Finalize the design for the in-memory file system or RPC mechanism,
>   ensuring alignment with GCC’s existing architecture.


GSoC

2025-04-02 Thread Leul Abiy via Gcc
Dear Sir/Madam,

I would like to work on the Rust frontend this summer. I am trying to
break down all the steps for the first project in the Rust frontend. So
far I plan on creating the visitor class as stated, and then using it to
deal with the different types of errors, such as dead code and unused
variables. Can you go into a little more depth on the project? For
example, what is the GCC internal infrastructure that is currently being
used called? Is it GIMPLE (according to ChatGPT)?

I would also like to test this code with, say, an unused variable, and
put a breakpoint in the GIMPLE. This will help me see the workflow
better. Please guide me in doing this if possible.

Best Wishes,
Leul Shiferaw


Re: GSOC interest in Extend the static analysis pass, [OpenACC]

2025-04-02 Thread Thomas Schwinge
Hi Kaaden!

On 2025-03-10T22:20:21+, Kaaden Ruman via Gcc  wrote:
> Hello, my name is Kaaden and I am a student at the University of Alberta in 
> Canada.

Welcome to GCC!

> I am interested in pursuing the "Extend the static analysis pass" idea as a 
> medium size project. 

I'm sorry that I didn't pay full attention; I've recently removed that
Analyzer/OpenACC idea from the list of project ideas: I already have a
student working on this, outside of GSoC.

> Also, I noticed that this link in the project idea description is dead: 
> https://mid.mail-archive.com/875yp9pyyx.fsf@euler.schwinge.homeip.net

Thanks for reporting.  I'll fix that up
()
in case we ever put this idea back in the list.

Would there perhaps be another GCC project that you might consider for
GSoC?  Just one example:

> I have previous experience in compilers (and working in large codebases) from 
> interning at IBM where I worked on their COBOL compiler

So, as you've maybe heard, GCC recently got a COBOL front end
contributed.  Now, I don't know if there are currently any projects
suitable for GSoC, and if the GCC/COBOL guys would be available for
mentoring, but we could reach out to them if you'd like.


Grüße
 Thomas


> and binary optimiser (mostly backend). I have also taken a compiler course 
> with a project of an LLVM based compiler (mostly frontend). 
>
> I also have taken and enjoyed a GPU programming course so extending checks 
> for OpenACC caught my eye. In that course we mainly worked with CUDA but 
> discussed other tools such as OpenACC as well, so I have some familiarity.
>
> Thanks,
> Kaaden


Re: [GSoC][Enhance OpenACC support] Uncertain about Cache directive functionality and device_type clause goal

2025-04-02 Thread Thomas Schwinge
Hi Chenlu!

On 2025-03-27T22:05:02+1100, Zhang lv via Gcc  wrote:
>  Hi here,

Welcome to GCC!

> I found digging into OpenACC meaningful. It's a late start for a GSoC
> proposal

..., but not yet too late!  :-)

> and any suggestions from the community are appreciated! Feel free
> to comment on any part of it. To save time for readers, I've outlined my
> key understandings here—expansion and polishing are still needed.
>
> I'm not sure whether my understanding of each task is correct, especially
> for the cache directive and device_type clause. Here is my current
> understanding:
>
> *3. Make the OpenACC **cache** Directive Actually Do Something*
>
> Currently, the cache directive in OpenACC is parsed at the front end, but
> not used for any optimization purposes, such as data prefetching or moving
> data to low-latency memory (e.g., L1/L2/L3 cache or GPU cache [5]).
>
> *TODO:* My current understanding is that after the OACC_CACHE directive is
> lowered to GIMPLE via gimplify_oacc_cache, a new tree-SSA optimization pass
> could be added. This pass might be similar to the existing aprefetch pass,
> or OpenACC prefetch logic could be integrated into it.

..., or even just synthesizing specific (GIMPLE?) constructs, so that the
existing pass can do its work -- in case that the pass' existing support
maps to (at least specific instances of) the functionality provided by
the respective offloading back ends.  That'll need more research; I don't
know yet, either.

> The goal may be emitting prefetch instructions by inserting suitable
> built-in functions and relying on the backend to map them to runtime API
> calls or device-specific instructions through RTL templates.
>
> However, several questions remain:
>
>- Since OpenACC supports both accelerators (e.g., GPUs) and multicore
>CPUs, should we handle both cases?
>   - For CPUs, we can refer to each vendor's ISA (e.g., x86_64, Arm) to
>   decide which prefetch instructions to generate.

In general that's right, but note that currently GCC does not yet
implement OpenACC support for multicore CPU, but only for AMD and Nvidia
GPUs, or single-threaded host-fallback execution (with 'if(false)', or if
no device is available).  But then, in addition to the code offloading
devices, data prefetching etc. conceptually also applies to the host code
generation.  (..., but that need not be the focus of the work, at this
time.)

>   - For GPUs, are we expected to use prefetch instructions from the GPU
>   ISA, or should we manually use runtime API routines like
>   acc_memcpy_device to manage data?

You're right that there may also be cases where we may call specific
functions of the respective GPU driver/library, to mark up memory regions
in specific ways, etc., but for OpenACC 'cache', my first idea would've
been to use specific individual GPU instructions, or instruction
variants/flags for existing instructions, as applicable.

>- Additional considerations include choosing a suitable prefetch
>distance, which may differ by device type or architecture.

Right.

Understand that at a specific point in the compilation pipeline, the
offloading code stream gets split off of the host code stream.  Until
then, we'll have to keep a generic (GIMPLE) representation, and
afterwards, we know if we're host vs. AMD GPU vs. Nvidia GPU, and which
specific architecture we're compiling for, and can then map to specific
IR representations, and finally specific GPU instructions etc.

> *5. OpenACC **device_type** Clause*
>
> *TODO:* Is the device_type clause designed to allow users to manually
> specify the target platform in source code, rather than via compiler
> options like -foffload=amdgcn-amdhsa="-march=gfx900", or compiler
> building options like --target=nvptx-none?

Yet different; see the OpenACC specification.  High-level: if you have a
toolchain targeting several offload devices, it allows for tuning OpenACC
processing *differently* per 'device_type'.  For example (untested, and
not very clever like this):

#pragma acc parallel loop device_type(nvidia) num_gangs(100) \
  device_type(radeon) num_gangs(50)

Nvidia GPU offloading compilation/execution will use 'num_gangs(100)',
AMD GPU offloading compilation/execution will use 'num_gangs(50)'.

> My understanding of the other tasks is as follows:
>
>
> *1. OpenACC **acc_memcpy_device** Runtime API Routine*
>
> The acc_memcpy_device routine is currently missing in GCC's OpenACC runtime
> implementation. According to the specification, this routine copies a
> specified number of bytes from one device address (data_dev_src) to another
> device address (data_dev_dest). Both addresses must reside in the current
> device’s memory. There is also an asynchronous variant that performs the
> data transfer on a specified async queue (async_arg). The routine must
> handle error cases such as null pointers and invalid async argument values.
> This function shares a similar implementation pattern with
> acc_memcpy_to_de
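
For reference, the routine described above would have roughly these
prototypes (a sketch derived from the description in this message; not
yet present in GCC's libgomp):

  #include <stddef.h>

  /* Copy 'bytes' bytes between two addresses in the current device's
     memory.  */
  void acc_memcpy_device (void *data_dev_dest, void *data_dev_src,
                          size_t bytes);

  /* Asynchronous variant: performs the transfer on async queue
     'async_arg'.  */
  void acc_memcpy_device_async (void *data_dev_dest, void *data_dev_src,
                                size_t bytes, int async_arg);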

Re: GSoC 2025 – Excited About GCC Go Escape Analysis & Seeking Guidance

2025-04-02 Thread Ian Lance Taylor via Gcc
On Wed, Apr 2, 2025 at 6:35 AM Thomas Schwinge  wrote:
>
> On 2025-03-31T19:48:16+0530, Astha Pipania via Gcc  wrote:
> > I hope you're doing well!
>
> Astha, welcome to GCC!
>
> > I'm incredibly excited about the "GCC Go Escape
> > Analysis" project for GSoC 2025.
>
> ... which is listed on
> .
>
> Ian, can you comment on whether that's still a good project nowadays, and
> whether you (I suppose?) would be available to mentor, should the
> occasion arise?

I'm not sure where that project came from. Perhaps I added it years
ago and no longer remember it.

Unfortunately it is not a good project today, as the Go frontend does
now implement escape analysis. It was added by Cherry Mui back in
2017.

Ian


Re: GSoC OpenACC

2025-04-02 Thread Thomas Schwinge
Hi Carter!

On 2025-03-30T22:00:41+, Carter Weidman via Gcc  wrote:
> My name is Carter. I’m looking to become active in the GCC community. 

Welcome to GCC!

> I would of course love to be funded through GSoC (and will most definitely be 
> submitting a formal proposal) but will contribute regardless of this.

That's great to hear, and good luck for GSoC!

> I’m interested in the OpenACC parts of the posted projects, so helpful
> resources to dig further would be much appreciated! If there is anything easy 
> I can do in the meantime that would be awesome as well.

Well, you should be working on writing up a convincing GSoC proposal for
a specific project that you'd like to work on.  :-)


Grüße
 Thomas