https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85231
Bug ID: 85231
Summary: [og7, openacc, nvptx] Too much shared memory claimed for long vector length
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: vries at gcc dot gnu.org
Target Milestone: ---
On the og7 branch, with the current state of the vector-length patches applied,
for vector-length-128-1.c we generate:
...
.shared .align 8 .u8 __oacc_bcast[64];
...
cvta.shared.u64 %r55,__oacc_bcast;
st.u64 [%r55],%r36;
st.u64 [%r55+8],%r37;
st.u64 [%r55+16],%r38;
st.u64 [%r55+24],%r39;
...
cvta.shared.u64 %r54,__oacc_bcast;
ld.u64 %r36,[%r54];
ld.u64 %r37,[%r54+8];
ld.u64 %r38,[%r54+16];
ld.u64 %r39,[%r54+24];
...
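It seems we claim twice as much shared memory (64 bytes) as we need (32 bytes):
only four 8-byte values are stored and loaded, at offsets 0 through 24.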
This patch (already applicable on og7 branch) fixes that:
...
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index ba8d3bec1d7..3cf110cd1ed 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4058,7 +4058,8 @@ nvptx_shared_propagate (bool pre_p, bool is_call, basic_block block,
       emit_insn_after (init, insn);
       unsigned int psize = ROUND_UP (data.offset, oacc_bcast_align);
-      unsigned int pnum = (nvptx_mach_vector_length () > PTX_WARP_SIZE
+      unsigned int pnum = ((nvptx_mach_vector_length () > PTX_WARP_SIZE
+                            && nvptx_mach_max_workers () > 1)
                            ? nvptx_mach_max_workers () + 1
                            : 1);
...