Hi!

On 2024-02-19T11:52:55+0100, Richard Biener <rguent...@suse.de> wrote:
> On Mon, 19 Feb 2024, Thomas Schwinge wrote:
>> On 2024-02-16T14:53:04+0100, I wrote:
>> > On 2024-02-16T12:41:06+0000, Andrew Stubbs <a...@baylibre.com> wrote:
>> >> On 16/02/2024 12:26, Richard Biener wrote:
>> >>> On Fri, 16 Feb 2024, Andrew Stubbs wrote:
>> >>>> On 16/02/2024 10:17, Richard Biener wrote:
>> >>>>> On Fri, 16 Feb 2024, Thomas Schwinge wrote:
>> >>>>>> On 2023-10-20T12:51:03+0100, Andrew Stubbs <a...@codesourcery.com> wrote:
>> >>>>>>> I've committed this patch
>> >>>>>>
>> >>>>>> ... as commit c7ec7bd1c6590cf4eed267feab490288e0b8d691
>> >>>>>> "amdgcn: add -march=gfx1030 EXPERIMENTAL", which the later
>> >>>>>> RDNA3/gfx1100 support builds on top of, and that's what I'm
>> >>>>>> currently working on getting proper GCC/GCN target (not
>> >>>>>> offloading) results for.
>> >>>>>>
>> >>>>>> Now looking at 'gcc.dg/vect/bb-slp-cond-1.c', which is reasonably
>> >>>>>> simple, and hopefully representative for other SLP execution test
>> >>>>>> FAILs (regressions compared to my earlier non-gfx1100 testing).
>> >>>>>>
>> >>>>>>     $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/
>> >>>>>>     source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c
>> >>>>>>     --sysroot=install/amdgcn-amdhsa -ftree-vectorize
>> >>>>>>     -fno-tree-loop-distribute-patterns -fno-vect-cost-model
>> >>>>>>     -fno-common -O2 -fdump-tree-slp-details
>> >>>>>>     -fdump-tree-vect-details -isystem
>> >>>>>>     build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem
>> >>>>>>     source-gcc/newlib/libc/include
>> >>>>>>     -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/
>> >>>>>>     -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -wrapper
>> >>>>>>     setarch,--addr-no-randomize -fdump-tree-all-all
>> >>>>>>     -fdump-ipa-all-all -fdump-rtl-all-all -save-temps
>> >>>>>>     -march=gfx1100
>> >>>>>>
>> >>>>>> The '-march=gfx1030' 'a-bb-slp-cond-1.s' is identical (apart from
>> >>>>>> 'TARGET_PACKED_WORK_ITEMS' in 'gcn_target_asm_function_prologue'),
>> >>>>>> so I suppose will also exhibit the same failure mode, once again?
>> >>>>>>
>> >>>>>> Compared to '-march=gfx90a', the differences begin in
>> >>>>>> 'a-bb-slp-cond-1.c.266r.expand' (only!), down to 'a-bb-slp-cond-1.s'.
>> >>>>>>
>> >>>>>> Changed like:
>> >>>>>>
>> >>>>>>     @@ -38,10 +38,10 @@ int main ()
>> >>>>>>      #pragma GCC novector
>> >>>>>>        for (i = 1; i < N; i++)
>> >>>>>>          if (a[i] != i%4 + 1)
>> >>>>>>     -      abort ();
>> >>>>>>     +      __builtin_printf("%d %d != %d\n", i, a[i], i%4 + 1);
>> >>>>>>
>> >>>>>>        if (a[0] != 5)
>> >>>>>>     -    abort ();
>> >>>>>>     +    __builtin_printf("%d %d != %d\n", 0, a[0], 5);
>> >>>>>>
>> >>>>>> ..., we see:
>> >>>>>>
>> >>>>>>     $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
>> >>>>>>     40 5 != 1
>> >>>>>>     41 6 != 2
>> >>>>>>     42 7 != 3
>> >>>>>>     43 8 != 4
>> >>>>>>     44 5 != 1
>> >>>>>>     45 6 != 2
>> >>>>>>     46 7 != 3
>> >>>>>>     47 8 != 4
>> >>>>>>
>> >>>>>> '40..47' are the 'i = 10..11' in 'foo', and the expectation is
>> >>>>>> 'a[i * stride + 0..3] != 0'.
>> >>>>>> So, either some earlier iteration has scribbled zero values over
>> >>>>>> these (vector lane masking issue, perhaps?), or some other code
>> >>>>>> generation issue?
>> >
>> >>>> [...], I must be doing something different because vect/bb-slp-cond-1.c
>> >>>> passes for me, on gfx1100.
>> >
>> > That's strange. I've looked at your log file (looks good), and used your
>> > toolchain to compile, and your 'gcn-run' to invoke, and still do get:
>> >
>> >     $ flock /tmp/gcn.lock ~/gcn-run ~/bb-slp-cond-1.exe
>> >     GCN Kernel Aborted
>> >     Kernel aborted
>> >
>> > Andrew, later on, please try what happens when you put an unconditional
>> > 'abort' call into a test case?
>>
>> Andrew, any luck with that yet?
>>
>> Richard, are you able to reproduce the 'gcc.dg/vect/bb-slp-cond-1.c'
>> execution test failure mentioned above (manual compilation and
>> 'gcn-run')?
>
> No, when manually compiling/running the testcase it works fine for me.

I've updated my GCC master branch sources, but it still fails for me:

    $ build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c --sysroot=install/amdgcn-amdhsa -isystem build-gcc/amdgcn-amdhsa/gfx1100/newlib/targ-include -isystem source-gcc/newlib/libc/include -Bbuild-gcc/amdgcn-amdhsa/gfx1100/newlib/ -Lbuild-gcc/amdgcn-amdhsa/gfx1100/newlib -march=gfx1100 -ftree-vectorize -fno-tree-loop-distribute-patterns -fno-vect-cost-model -fno-common -O2 -save-temps
    $ flock /tmp/gcn.lock build-gcc/gcc/gcn-run a.out
    GCN Kernel Aborted
    Kernel aborted

Strange.

In 'bb-slp-cond-1.tar.xz' I'm attaching the files I've built. Could you
please compare those to yours and try 'gcn-run gfx1030/a.out'?


Grüße
 Thomas


> Didn't yet get to try the .exp files
>
> Richard.
>
>>
>> Grüße
>>  Thomas
>>
>>
>> >>> I didn't try to run it - when doing make check-gcc fails to using
>> >>> gcn-run for test invocation
>> >
>> > Note, that for such individual test cases, invoking the compiler and then
>> > 'gcn-run' manually would seem easiest?
>> >
>> >>> what's the trick to make it do that?
>> >
>> > I tell you've probably not done much "embedded" or simulator testing of
>> > GCC targets? ;-P
>> >
>> >> There's a config file for nvptx here:
>> >> https://github.com/SourceryTools/nvptx-tools/blob/master/nvptx-none-run.exp
>> >
>> > Yes, and I have pending some updates to that one, to be finished once
>> > I've generally got my testing set up again, to a sufficient degree...
>> >
>> >> You can probably make the obvious adjustments. I think Thomas has a GCN
>> >> version with a few more features.
>> >
>> > Right. I'm attaching my current 'amdgcn-amdhsa-run.exp'.
>> >
>> > I'm aware that the 'set_board_info gcc,[...] [...]' may be obsolete/wrong
>> > (as Andrew also noted privately) -- likewise, at least in part, for
>> > GCC/nvptx, which is where I copied all that from. (Will revise later;
>> > not relevant for this discussion, here.)
>> >
>> > Similar to what I've recently added to libgomp, there is 'flock'ing here,
>> > so that you may use 'make -j[...] check' for (partial) parallelism, but
>> > still all execution testing runs serialized. I found this to greatly
>> > help denoise the test results. (Not ideal, of course, but improving that
>> > is for later, too.)
>> >
>> > You may want to disable the 'HSA_STATUS_ERROR_OUT_OF_RESOURCES' thing if
>> > that doesn't work like that in your case. (I've no idea what
>> > 'amdgpu_gpu_recover' would do if the GPU is also used for display.) But
>> > this, again, greatly helps denoise test results, at least for the one
>> > system I'm currently testing on.
>> >
>> > I intend to publish proper documentation of all this, later on -- happy
>> > to answer any questions in the mean time.
>> >
>> > If you don't already have a common directory for DejaGnu board files, put
>> > 'amdgcn-amdhsa-run.exp' into '~/tmp/amdgcn-amdhsa/', for example, and add
>> > a 'dejagnu.exp' file next to it:
>> >
>> >     lappend boards_dir ~/tmp/amdgcn-amdhsa
>> >
>> > Prepare:
>> >
>> >     $ DEJAGNU=$HOME/tmp/amdgcn-amdhsa/dejagnu.exp
>> >     $ export DEJAGNU
>> >     $ AMDGCN_AMDHSA_RUN=[...]/build-gcc/gcc/gcn-run
>> >     $ export AMDGCN_AMDHSA_RUN
>> >     $ # If necessary:
>> >     $ AMDGCN_AMDHSA_LD_LIBRARY_PATH=/opt/rocm/lib
>> >     $ LD_LIBRARY_PATH=$AMDGCN_AMDHSA_LD_LIBRARY_PATH${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}
>> >     $ export LD_LIBRARY_PATH
>> >
>> > ..., and then run:
>> >
>> >     $ make -j8 check-gcc-c RUNTESTFLAGS='--target_board=amdgcn-amdhsa-run/-march=gfx1030 vect.exp'
>> >
>> > Oh, and I saw that on <https://gcc.gnu.org/wiki/Offloading>, Tobias has
>> > recently put into a new "Using the GPU as stand-alone system" section
>> > some similar information. (..., but this should, in my opinion, be on a
>> > different page, as it's explicitly *not* about what we understand as
>> > offloading.)
>> >
>> >> I usually use the CodeSourcery magic stack of scripts for testing
>> >> installed toolchains on remote devices, so I'm not too familiar with
>> >> using Dejagnu directly.
>> >
>> > Tsk... ;'-|
>> >
>> >
>> > Grüße
>> >  Thomas
>> >
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
bb-slp-cond-1.tar.xz
Description: application/xz
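
For anyone wanting to repeat the manual comparison, here is a rough sketch of how the compile and 'gcn-run' steps from this message might be wrapped for both '-march' values. It only prints the commands (a dry run), so nothing is built or executed on the GPU. Assumptions that are not from the thread: the helper names 'compile_cmd' and 'run_cmd', the 'GCN_LOCK' and 'GCN_RUN' variables, the '-o <march>/a.out' output layout, and that a 'gfx1030' newlib directory exists next to the 'gfx1100' one.

```shell
#!/bin/sh
# Dry-run sketch: print the compile and run commands quoted in this thread
# for each -march value. Paths are the relative ones used in the message.
# GCN_LOCK, GCN_RUN, compile_cmd and run_cmd are hypothetical names.

GCN_LOCK=${GCN_LOCK:-/tmp/gcn.lock}
GCN_RUN=${GCN_RUN:-build-gcc/gcc/gcn-run}

# Print the xgcc invocation for one -march value.
# Assumption: the per-arch newlib directory follows the gfx1100 pattern,
# and '-o <march>/a.out' is added here to keep the two builds apart.
compile_cmd() {
    march=$1
    newlib=build-gcc/amdgcn-amdhsa/$march/newlib
    set -- build-gcc/gcc/xgcc -Bbuild-gcc/gcc/ \
        source-gcc/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c \
        --sysroot=install/amdgcn-amdhsa \
        -isystem "$newlib/targ-include" \
        -isystem source-gcc/newlib/libc/include \
        "-B$newlib/" "-L$newlib" "-march=$march" \
        -ftree-vectorize -fno-tree-loop-distribute-patterns \
        -fno-vect-cost-model -fno-common -O2 -save-temps \
        -o "$march/a.out"
    printf '%s\n' "$*"
}

# Print the serialized execution command: 'flock' makes sure only one
# test uses the GPU at a time, as in the commands quoted above.
run_cmd() {
    printf '%s\n' "flock $GCN_LOCK $GCN_RUN $1"
}

for arch in gfx1100 gfx1030; do
    printf '%s\n' "mkdir -p $arch"
    compile_cmd "$arch"
    run_cmd "$arch/a.out"
done
```

Piping the output into 'sh' from the directory that contains 'build-gcc', 'source-gcc' and 'install' would run the steps for real; whether that matches how the attached tarball was produced is a guess.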