https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104353
Bug ID: 104353
Summary: ppc64le: Apparent reliance on undefined behavior of xvcvdpsxws
Product: gcc
Version: 11.2.0
URL: https://github.com/numpy/numpy/issues/20964#issuecomment-1027865665
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: ckk at kvr dot at
Target Milestone: ---

Created attachment 52331
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52331&action=edit
Minimal test case for reproduction

I ran into a strange numpy error on ppc64le that only occurred inside a ppc64le QEMU instance. In short, casting an array of i doubles (each 1.0) to ints (each expected to be 1) worked as expected on native hardware, but produced the following bogus results when running inside a VM:

i = 1: 1
i = 2: 1 1
i = 3: 1 1 1
i = 4: 0 0 0 0
i = 5: 0 0 0 0 1
i = 6: 0 0 0 0 1 1
i = 7: 0 0 0 0 1 1 1
i = 8: 0 0 0 0 0 0 0 0
i = 9: 0 0 0 0 0 0 0 0 1
...

Guided by the numpy folks, a SIMD issue was suspected, and I managed to create a minimal test case (attached here) that reproduces the problem. It only occurs with -O3.

I then filed an issue with QEMU, where it was quickly rejected. This led to further analysis by the numpy folks, who discovered that GCC is apparently relying on undefined behavior of the xvcvdpsxws instruction. This happened to work on native hardware because the hardware happens to exhibit the behavior GCC expects.

I'm only summarizing here; there is a great detailed analysis, and a much better test case, on the GitHub issue linked in the URL field, as I'd prefer not to reproduce the author's work here.
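For readers without access to the attachment, the kind of loop described above can be sketched roughly as follows. This is not the attached test case: the function names cast_doubles and count_bad_casts, the MAX_N bound, and the noinline attribute are assumptions made for illustration, and whether this particular loop is vectorized into xvcvdpsxws depends on the GCC version and flags.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 16 /* hypothetical upper bound, chosen only for this sketch */

/* Convert n doubles to ints. A simple loop like this is a candidate for
 * auto-vectorization at -O3; noinline (an assumption, not taken from the
 * report) keeps the compiler from folding the conversion away at the
 * call site. */
__attribute__((noinline))
void cast_doubles(const double *src, int *dst, size_t n)
{
    for (size_t k = 0; k < n; k++)
        dst[k] = (int)src[k];
}

/* Fill n doubles with 1.0, convert them, and return how many of the
 * resulting ints are not 1. A correct build returns 0 for every n. */
size_t count_bad_casts(size_t n)
{
    double src[MAX_N];
    int dst[MAX_N];
    size_t bad = 0;

    assert(n <= MAX_N);

    for (size_t k = 0; k < n; k++) {
        src[k] = 1.0;
        dst[k] = -1; /* sentinel, so untouched slots are visible */
    }

    cast_doubles(src, dst, n);

    for (size_t k = 0; k < n; k++)
        if (dst[k] != 1)
            bad++;

    return bad;
}
```

Calling count_bad_casts(i) for i = 1 through 9 from a small main, built with -O3, corresponds to the rows of the table above; a nonzero return would match the rows containing zeros. The attachment and the GitHub issue remain the authoritative reproducers.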