https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105421
Bug ID: 105421
Summary: GCN offloading, raised '-mgang-private-size': 'HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION'
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: openacc
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tschwinge at gcc dot gnu.org
CC: ams at gcc dot gnu.org, jules at gcc dot gnu.org
Target Milestone: ---
Target: GCN

As discussed in
<http://mid.mail-archive.com/87fslzspgo.fsf@dem-tschwing-1.ger.mentorg.com>,
after commit r12-8252-gb2202431910e30d8505c94d1cb9341cac7080d10
"fortran: Fix up gfc_trans_oacc_construct [PR104717]", we need commit
r12-8311-g2a570f11a2fecf23998d7fe1d5cabad62cfe5cec
"Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717]",
however:

> That only works with the default GCN multilib '-march=fiji', testing
> on gfx803 amdfury2 system.  For all of '-march=gfx900' (amdnano2),
> '-march=gfx906' (amd_ryzen3), '-march=gfx908' (amd-instinct1), I get:
>
>     libgomp: GCN fatal error: Asynchronous queue error
>     Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent
>     attempted to access memory beyond the largest legal address.
>
> ..., and I still get that if lowering the allocation to the minimum,
> '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'.
>
> This is a really simple OpenACC 'parallel' construct:
>
>     !$acc parallel
>     write (0, '("The answer is ", I2)') var
>     !$acc end parallel
>
> ..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so
> I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in
> fact) is not a problem?