On Tue, Nov 14, 2023 at 5:13 AM John Paul Adrian Glaubitz
<glaub...@physik.fu-berlin.de> wrote:
>
> Hi Jeffrey!
>
> On Tue, 2023-11-14 at 00:50 -0500, Jeffrey Walton wrote:
> > On SPARC, 64-bit words can be loaded and saved through one of two
> > instructions. The first version is optimized, the second version is
> > not. The optimized version is faster, but the 64-bit words have to be
> > aligned to 8-byte boundaries. I.e., naturally aligned.
> >
> > If you are performing unaligned loads of 64-bit words, then you have
> > to specify the compiler option -xmemalign=4i. -xmemalign=4i will
> > generate the inefficient load, but it will avoid the SIGBUS.
> >
> > When using the default toolchain settings, -xmemalign=8s is used,
> > which causes the toolchain to use the optimized loads. I think that is
> > what is generating the UBsan finding "runtime error: member access
> > within misaligned address ... which requires 8 byte alignment."
> >
> > Also see "3.4.151 –xmemalign[=<a><b>]",
> > <https://docs.oracle.com/cd/E37069_01/html/E37076/aevkc.html>, in the
> > Solaris manual.
> >
> > > [1]
> > > https://www.gnu.org/software/libc/manual/html_node/Obstacks-Data-Alignment.html
>
> This is completely new to me and really interesting, thanks for the heads-up!
>
> However, I think this particular flag is not available in GCC or LLVM, is it?

Yeah, you're right. I hit that bug using the SunCC compiler; see
<https://github.com/weidai11/cryptopp/issues/691>.

I believe GCC effectively uses -xmemalign=4. I'm not sure if it is 4i,
4s or 4f. I don't think it's possible to change that in GCC, but I may
be wrong.

According to the 691 bug, the 'fast load' uses the ldx instruction
('load extended word'). It requires the 64-bit word to be aligned on
an 8-byte boundary. I don't recall what the inefficient load uses.
Maybe two ldsw instructions ('load signed word')?

Jeff
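
P.S. Since GCC and LLVM don't have -xmemalign, the usual portable
workaround I know of is to copy the bytes with memcpy instead of
dereferencing a cast pointer. Below is a minimal sketch, assuming a
plain C99 toolchain. The helper names are made up for illustration, and
I have not checked what instruction sequence GCC actually emits for it
on SPARC:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (the name is illustrative, not from any library):
 * read a 64-bit value from an address that may not be 8-byte aligned.
 * memcpy takes a void pointer and makes no alignment assumption about
 * it, so the compiler cannot rely on the address being 8-byte aligned
 * the way it can for a dereferenced uint64_t pointer. */
static uint64_t load_u64_unaligned(const void *src)
{
    uint64_t v;
    memcpy(&v, src, sizeof v);
    return v;
}

/* Hypothetical counterpart for stores. */
static void store_u64_unaligned(void *dst, uint64_t v)
{
    memcpy(dst, &v, sizeof v);
}

/* Round-trip a value through a deliberately odd offset in a byte
 * buffer; buf + 1 is not 8-byte aligned whenever buf itself is. */
static void demo_unaligned_roundtrip(void)
{
    unsigned char buf[16] = {0};
    const uint64_t v = UINT64_C(0x0102030405060708);

    store_u64_unaligned(buf + 1, v);
    assert(load_u64_unaligned(buf + 1) == v);
}
```

The cast-and-dereference form, *(const uint64_t *)(buf + 1), is the
pattern that UBsan reports as a misaligned access.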