On 2025-01-16 21:25, Jeffrey Walton wrote:
On Fri, Jan 17, 2025 at 12:07 AM Bruno Haible via Gnulib discussion
list <bug-gnulib@gnu.org> wrote:
Yes, the undefined behaviour really starts here, in line 35:
const __m128i *data = buf;
'buf' was not aligned, 'const __m128i *' is 16-byte aligned.
Disassemble the code around that line. See which asm instruction is
being used for the load. I suspect MOVDQA (aligned) is being used
instead of MOVDQU (unaligned).
The compiler is entitled to do that. Bruno's right, the behavior is
undefined once the code assigns the unaligned pointer to an __m128i *
variable; see C23 §6.3.2.3 ¶7. Since behavior is undefined, the compiler
can do whatever it likes.
I installed the attached patch to work around the immediate issue of the
undefined behavior. This skips the pclmul speedup if the buffer is not
properly aligned. If that is a significant performance issue in
Gnulib-using code, I hope Sam or somebody can come up with a
higher-performance fix.From 59146234323977ae8815a6b8aa8c5a81298b19bb Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Thu, 16 Jan 2025 22:58:55 -0800
Subject: [PATCH] crc-x86_64: fix unaligned access
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Problem reported by Bruno Haible in:
https://lists.gnu.org/r/bug-gnulib/2025-01/msg00142.html
* lib/crc.c (crc32_update_no_xor): Don’t pass unaligned buffer to
crc32_update_no_xor_pclmul. No doubt there is a higher
performance fix, perhaps involving advancing byte-by-byte along
the buffer until we get to an aligned boundary, but at least this
should fix the alignment bug.
---
ChangeLog | 11 +++++++++++
lib/crc.c | 8 ++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 48fe21b6aa..8a77ff50a8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2025-01-16 Paul Eggert <egg...@cs.ucla.edu>
+
+ crc-x86_64: fix unaligned access
+ Problem reported by Bruno Haible in:
+ https://lists.gnu.org/r/bug-gnulib/2025-01/msg00142.html
+ * lib/crc.c (crc32_update_no_xor): Don’t pass unaligned buffer to
+ crc32_update_no_xor_pclmul. No doubt there is a higher
+ performance fix, perhaps involving advancing byte-by-byte along
+ the buffer until we get to an aligned boundary, but at least this
+ should fix the alignment bug.
+
2025-01-16 Bruno Haible <br...@clisp.org>
getopt-posix: Fix compilation error in C++ mode (regression 2024-09-21).
diff --git a/lib/crc.c b/lib/crc.c
index 51c72019d8..57abf27cc3 100644
--- a/lib/crc.c
+++ b/lib/crc.c
@@ -123,8 +123,8 @@ crc32_update_no_xor (uint32_t crc, const char *buf, size_t len)
pclmul_checked = true;
}
- if (pclmul_enabled && len >= 16)
- return crc32_update_no_xor_pclmul(crc, buf, len);
+ if (pclmul_enabled && len >= 16 && (intptr_t) buf % 16 == 0)
+ return crc32_update_no_xor_pclmul (crc, buf, len);
#endif
slice_alignment = (len & (-8));
@@ -205,8 +205,8 @@ crc32_update_no_xor (uint32_t crc, const char *buf, size_t len)
pclmul_checked = true;
}
- if (pclmul_enabled && len >= 16)
- return crc32_update_no_xor_pclmul(crc, buf, len);
+ if (pclmul_enabled && len >= 16 && (intptr_t) buf % 16 == 0)
+ return crc32_update_no_xor_pclmul (crc, buf, len);
#endif
for (n = 0; n < len; n++)
--
2.45.2