https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105246
Bug ID: 105246
Summary: [amdgcn] Use library call for SQRT with -ffast-math + provide additional option to use single-precision opcode
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Keywords: documentation, wrong-code
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: ams at gcc dot gnu.org
Target Milestone: ---
Target: amdgcn-amdhsa

AMD GCN hardware has opcodes which take double-precision variables as input/output but internally only perform a single-precision operation.

Currently this affects only "sqrt", which with -funsafe-math-optimizations (implied by -Ofast / -ffast-math) uses AMDGCN's "v_sqrt". Namely, gcc/config/gcn/gcn-valu.md has:

  (define_insn "sqrt<mode>2<exec>"
  ...
    "flag_unsafe_math_optimizations"
    "v_sqrt%i0\t%0, %1"

Thus: while "v_sqrt" works on double-precision variables, it only calculates with 23 bits (as with float32) instead of the 52 bits that float64 provides for the fractional part of the floating-point number.

PROBLEM: In many cases, this loss of precision by a factor of roughly 100,000,000 (about 10⁸; the 29 missing fraction bits correspond to 2²⁹) is very unexpected and too much for code which requires double precision. An error of around 4 ULP is expected, not one of around 10⁸ ULP!

In particular: -Ofast or -ffast-math is commonly recommended in order to permit several optimizations, so a precision loss of this size is unexpected.

In terms of testsuites, OvO's sqrt examples are affected, requiring a much higher OVO_TOL_ULP to pass (→ https://github.com/TApplencourt/OvO ). But the issue really came up when discussing with HPC code users.

EXPECTED:
- By default, with -ffast-math, do the double-precision operation by a library call.
- Provide some (GCN-specific) -m... flag to do those calculations in single precision.

For instance something like:

  -mpermit-reduced-precision
    Use hardware intrinsics instead of library calls even if they provide a much reduced precision.
    Example: use v_sqrt with double-precision variables even though the hardware only provides single-precision results for the fractional part of the floating-point variable.
    (Default: disabled)