[RFC] Summary of libgomp failures for offloading to nvptx from AArch64

2024-07-25 Thread Prathamesh Kulkarni via Gcc
Hi,
I am working on enabling offloading to nvptx from an AArch64 host. As mentioned
on the wiki (https://gcc.gnu.org/wiki/Offloading#Running_.27make_check.27),
I ran make check-target-libgomp on an AArch64 host (with no GPU), with the
following results:

=== libgomp Summary ===

# of expected passes            14568
# of unexpected failures        1023
# of expected failures          309
# of untested testcases         54
# of unresolved testcases       992
# of unsupported tests          644

It seems the majority of the tests fail due to the following 4 issues:

* Compiling a minimal test-case:

int main()
{
  int x;
  #pragma omp target map (to: x)
  {
    x = 0;
  }
  return x;
}

Compiling with -fopenmp -foffload=nvptx-none results in the following issues:

(1) Differing values of NUM_POLY_INT_COEFFS between host and accelerator, which
results in the following ICE:

0x1a6e0a7 pp_quoted_string
        ../../gcc/gcc/pretty-print.cc:2277
0x1a6ffb3 pp_format(pretty_printer*, text_info*, urlifier const*)
        ../../gcc/gcc/pretty-print.cc:1634
0x1a4a3f3 diagnostic_context::report_diagnostic(diagnostic_info*)
        ../../gcc/gcc/diagnostic.cc:1612
0x1a4a727 diagnostic_impl
        ../../gcc/gcc/diagnostic.cc:1775
0x1a4e20b fatal_error(unsigned int, char const*, ...)
        ../../gcc/gcc/diagnostic.cc:2218
0xb3088f lto_input_mode_table(lto_file_decl_data*)
        ../../gcc/gcc/lto-streamer-in.cc:2121
0x6f5cdf lto_file_finalize
        ../../gcc/gcc/lto/lto-common.cc:2285
0x6f5cdf lto_create_files_from_ids
        ../../gcc/gcc/lto/lto-common.cc:2309
0x6f5cdf lto_file_read
        ../../gcc/gcc/lto/lto-common.cc:2364
0x6f5cdf read_cgraph_and_symbols(unsigned int, char const**)
        ../../gcc/gcc/lto/lto-common.cc:2812
0x6cfb93 lto_main()
        ../../gcc/gcc/lto/lto.cc:658

This is already tracked in https://gcc.gnu.org/PR96265 (and related PRs).

Streaming out mode_table:
mode = SI, mclass = 2, size = 4, prec = 32
mode = DI, mclass = 2, size = 8, prec = 64

Streaming in mode_table (in lto_input_mode_table):
mclass = 2, size = 4, prec = 0
(and then calculates the correct mode value by iterating over all modes of 
mclass starting from narrowest mode)

The issue is that the value for prec is not streamed in correctly for
SImode, as seen above. While streaming out from the AArch64 host, it is 32,
but while streaming in for nvptx, it is 0. This happens because
NUM_POLY_INT_COEFFS differs between the AArch64 and nvptx backends.

Since NUM_POLY_INT_COEFFS is 2 for aarch64, the streamed-out values for size
and precision would be <4, 0> and <32, 0> respectively (streamed out in
bp_pack_poly_value). Both zeros come from coeffs[1] of size and prec. While
streaming in, however, NUM_POLY_INT_COEFFS is 1 for nvptx, so it reads one
word per value and incorrectly treats <4, 0> as size and precision
respectively, which is why precision gets streamed in as 0, and thus it
encounters the above ICE.
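
To make the mismatch concrete, here is a minimal standalone sketch of the
effect. It is not GCC code: Stream, ModeInfo, pack_mode and unpack_mode are
hypothetical names, and the "bitstream" is simplified to a flat vector of
words. It only assumes what is described above, namely that the writer emits
one word per coefficient and the reader consumes as many words per value as
its own NUM_POLY_INT_COEFFS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the LTO bitstream: a flat sequence of words.
using Stream = std::vector<std::uint64_t>;

struct ModeInfo
{
  std::uint64_t size;
  std::uint64_t prec;
};

// Writer side: each value is emitted as `coeffs` words.  For a fixed-size
// mode only coeffs[0] is nonzero, so the remaining words are zero.
void
pack_mode (Stream &s, ModeInfo m, unsigned coeffs)
{
  s.push_back (m.size);
  for (unsigned i = 1; i < coeffs; ++i)
    s.push_back (0);
  s.push_back (m.prec);
  for (unsigned i = 1; i < coeffs; ++i)
    s.push_back (0);
}

// Reader side: consumes `coeffs` words per value and keeps coeffs[0].
// If `coeffs` differs from the writer's count, the reads go out of step.
ModeInfo
unpack_mode (const Stream &s, std::size_t &pos, unsigned coeffs)
{
  assert (coeffs > 0 && pos + 2 * coeffs <= s.size ());
  ModeInfo m;
  m.size = s[pos];
  pos += coeffs;
  m.prec = s[pos];
  pos += coeffs;
  return m;
}
```

Packing SImode (size 4, prec 32) with 2 coefficients yields the words
4, 0, 32, 0. Reading that back with 1 coefficient per value returns size 4 and
prec 0, which matches the streamed-in values shown above; reading it back
with 2 coefficients returns prec 32.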

Supporting non-VLA code with offloading:

In the general case, it's hard to support offloading for arbitrary poly_ints 
when NUM_POLY_INT_COEFFS differs for host and accelerator.
For example, it's not possible to represent a degree-2 poly_int like 4 + 4x 
(as-is) on an accelerator with NUM_POLY_INT_COEFFS == 1.

However, IIUC, we can support offloading for a restricted set of poly_ints
whose degree <= accel's NUM_POLY_INT_COEFFS, since they can be represented on
the accelerator? For a hypothetical example, if host NUM_POLY_INT_COEFFS == 3
and accel NUM_POLY_INT_COEFFS == 2, then I suppose we could represent a
degree 2 poly_int on the accelerator, but not a degree 3 poly_int like
3+4x+5x^2?

Based on that, I have come up with the following approach in the attached
"quick-and-dirty" patch (p-163-2.diff):
Stream out the host's NUM_POLY_INT_COEFFS, and while streaming in during lto1,
compare it with the accelerator's NUM_POLY_INT_COEFFS as follows:

Stream in host_num_poly_int_coeffs;
/* NUM_POLY_INT_COEFFS represents accelerator's value here.  */
if (host_num_poly_int_coeffs == NUM_POLY_INT_COEFFS)
  {
    /* Both are equal, proceed to unpacking NUM_POLY_INT_COEFFS words from
       bitstream.  */
  }
else if (host_num_poly_int_coeffs < NUM_POLY_INT_COEFFS)
  {
    /* Unpack host_num_poly_int_coeffs words and zero out remaining higher
       coeffs (similar to zero-extension).  */
  }
else
  {
    /* Unpack host_num_poly_int_coeffs words and ensure that degree of
       streamed-out poly_int <= NUM_POLY_INT_COEFFS.  */
  }

For example, with host NUM_POLY_INT_COEFFS == 2 and accel NUM_POLY_INT_COEFFS 
== 1, this will allow streaming of "degree-1" poly_ints
like 4+0x (which will degenerate to constant 4), but give an error for 
streaming degree-2 poly_int like 4+4x.
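
The three cases above can be sketched as a single standalone routine. This is
only an illustration under stated assumptions, not the code in the patch:
Words and unpack_poly are hypothetical names, the bitstream is modelled as a
vector of words, and "degree" is used in the sense of this mail (the number of
coefficients needed to represent the value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a sequence of streamed words / coefficients.
using Words = std::vector<std::int64_t>;

// Read one poly_int that was written with `host_coeffs` coefficients and
// store it in `out` with `accel_coeffs` coefficients.  Returns false if the
// stream is too short or the value is not representable on the accelerator
// (a nonzero coefficient at index >= accel_coeffs).
bool
unpack_poly (const Words &stream, std::size_t &pos,
             unsigned host_coeffs, unsigned accel_coeffs, Words &out)
{
  assert (host_coeffs > 0 && accel_coeffs > 0);
  if (pos > stream.size () || stream.size () - pos < host_coeffs)
    return false;

  // Start from all zeros: this covers host_coeffs < accel_coeffs, where the
  // missing higher coefficients are zero-filled.
  out.assign (accel_coeffs, 0);
  bool representable = true;
  for (unsigned i = 0; i < host_coeffs; ++i)
    {
      std::int64_t c = stream[pos + i];
      if (i < accel_coeffs)
        out[i] = c;
      else if (c != 0)
        representable = false;  // Higher coefficient cannot be dropped.
    }

  // Always consume the full host-sized value so later reads stay in step.
  pos += host_coeffs;
  return representable;
}
```

With host_coeffs == 2 and accel_coeffs == 1, the words 4, 0 unpack to the
constant 4, while the words 4, 4 are rejected. With host_coeffs == 1 and
accel_coeffs == 2, the word 4 unpacks to 4, 0. A real implementation would
presumably report the rejected case through the usual diagnostic machinery
rather than a boolean.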

Following this approach, I am assuming we can support AArch64/nvptx offloading
for non-VLA code, since poly_ints used for representing various
artefacts like mode_size, mode_precision, vector length etc. will be degree-1
poly_ints for scalar variables and fixed-length vectors
(and thus degenerate to constants). With the

Re: GCC 14.2 Release Candidate available from gcc.gnu.org

2024-07-25 Thread William Seurer via Gcc

On 7/23/24 7:50 AM, Jakub Jelinek via Gcc wrote:

The first release candidate for GCC 14.2 is available from

  https://gcc.gnu.org/pub/gcc/snapshots/14.2.0-RC-20240723/
  ftp://gcc.gnu.org/pub/gcc/snapshots/14.2.0-RC-20240723/

and shortly its mirrors.  It has been generated from git commit
r14-10504-ga544898f6dd6a16.

I have so far bootstrapped and tested the release candidate on
x86_64-linux.
Please test it and report any issues to bugzilla.

If all goes well, we'd like to release 14.2 on Tuesday, Jul 30th.



I bootstrapped and tested the RC on powerpc64 BE and LE on power 8, 9, 
and 10 and all was nominal.




gcc-12-20240725 is now available

2024-07-25 Thread GCC Administrator via Gcc
Snapshot gcc-12-20240725 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/12-20240725/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 12 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-12 revision 7149e22fe92581f54bf89855c996ea83bdf773ef

You'll find:

 gcc-12-20240725.tar.xz   Complete GCC

  SHA256=dee44b260c739ca8a39201f81f1f9390bc1142fc2642e9963222fe9d54861d10
  SHA1=f09c5195ff71ef55f770096d84bbf3011ef38d4f

Diffs from 12-20240718 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-12
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.