On 2025-01-17 02:24, Lasse Collin wrote:
Try using __m128i_u* instead of __m128i*. The former has aligned(1)
attribute. It's not available in old GCC versions though.

Thanks for the suggestion. I installed the attached two patches to use the unaligned type, and to check for compiler support. This should be better than the quick hack I installed yesterday.

With the code variants I tried last year, the penalty from unaligned
reads was very small.

Good, that means we should be done here.
From c89cd7c1b57ebbc1436959c8bff1297215e9d08c Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 17 Jan 2025 10:27:55 -0800
Subject: [PATCH 1/2] crc-x86_64: better fix for unaligned access
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Avoid undefined behavior in a way that doesn’t require
the input buffer to be aligned.
From a suggestion by Lasse Collin in:
https://lists.gnu.org/r/bug-gnulib/2025-01/msg00148.html
* lib/crc-x86_64-pclmul.c (crc32_update_no_xor_pclmul):
Since the const void * pointer ‘buf’ might not be aligned,
assign it to const __m128i_u * instead of to const __m128i *.
* lib/crc.c (crc32_update_no_xor):
Remove recently-addeda check for buffer alignment.
---
 ChangeLog               | 13 +++++++++++++
 lib/crc-x86_64-pclmul.c |  2 +-
 lib/crc.c               |  4 ++--
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index f80d7579c0..3164c1c587 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,16 @@
+2025-01-17  Paul Eggert  <egg...@cs.ucla.edu>
+
+	crc-x86_64: better fix for unaligned access
+	Avoid undefined behavior in a way that doesn’t require
+	the input buffer to be aligned.
+	From a suggestion by Lasse Collin in:
+	https://lists.gnu.org/r/bug-gnulib/2025-01/msg00148.html
+	* lib/crc-x86_64-pclmul.c (crc32_update_no_xor_pclmul):
+	Since the const void * pointer ‘buf’ might not be aligned,
+	assign it to const __m128i_u * instead of to const __m128i *.
+	* lib/crc.c (crc32_update_no_xor):
+	Remove recently-addeda check for buffer alignment.
+
 2025-01-17  Pádraig Brady  <p...@draigbrady.com>
 
 	Avoid -Wformat=security failures with --disable-nls
diff --git a/lib/crc-x86_64-pclmul.c b/lib/crc-x86_64-pclmul.c
index 48f5c7a84c..1650843394 100644
--- a/lib/crc-x86_64-pclmul.c
+++ b/lib/crc-x86_64-pclmul.c
@@ -32,7 +32,7 @@ __attribute__ ((__target__ ("pclmul,avx")))
 uint32_t
 crc32_update_no_xor_pclmul (uint32_t crc, const void *buf, size_t len)
 {
-  const __m128i *data = buf;
+  const __m128i_u *data = buf;
   __m128i *datarw;
   size_t bytes_remaining = len;
   __m128i in256[4] = { 0 };
diff --git a/lib/crc.c b/lib/crc.c
index 57abf27cc3..319a23715b 100644
--- a/lib/crc.c
+++ b/lib/crc.c
@@ -123,7 +123,7 @@ crc32_update_no_xor (uint32_t crc, const char *buf, size_t len)
       pclmul_checked = true;
     }
 
-  if (pclmul_enabled && len >= 16 && (intptr_t) buf % 16 == 0)
+  if (pclmul_enabled && len >= 16)
     return crc32_update_no_xor_pclmul (crc, buf, len);
 #endif
 
@@ -205,7 +205,7 @@ crc32_update_no_xor (uint32_t crc, const char *buf, size_t len)
       pclmul_checked = true;
     }
 
-  if (pclmul_enabled && len >= 16 && (intptr_t) buf % 16 == 0)
+  if (pclmul_enabled && len >= 16)
     return crc32_update_no_xor_pclmul (crc, buf, len);
 #endif
 
-- 
2.45.2

From 83f323021530b11679bf2d029c9e4c43cf64808f Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Fri, 17 Jan 2025 10:39:38 -0800
Subject: [PATCH 2/2] crc-x86_64: port to old GCC compilers
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* m4/crc-x86_64.m4 (gl_CRC_X86_64_PCLMUL):
Check that the compiler supports __m128i_u, too,
since we’re using the type now.  Issue reported in
the same message from Lasse Collin.
---
 ChangeLog        | 6 ++++++
 m4/crc-x86_64.m4 | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/ChangeLog b/ChangeLog
index 3164c1c587..5796c80ae0 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,11 @@
 2025-01-17  Paul Eggert  <egg...@cs.ucla.edu>
 
+	crc-x86_64: port to old GCC compilers
+	* m4/crc-x86_64.m4 (gl_CRC_X86_64_PCLMUL):
+	Check that the compiler supports __m128i_u, too,
+	since we’re using the type now.  Issue reported in
+	the same message from Lasse Collin.
+
 	crc-x86_64: better fix for unaligned access
 	Avoid undefined behavior in a way that doesn’t require
 	the input buffer to be aligned.
diff --git a/m4/crc-x86_64.m4 b/m4/crc-x86_64.m4
index 41b7b0fe05..b705d8c80c 100644
--- a/m4/crc-x86_64.m4
+++ b/m4/crc-x86_64.m4
@@ -23,6 +23,8 @@ AC_DEFUN([gl_CRC_X86_64_PCLMUL],
             __m128i a, b;
             a = _mm_clmulepi64_si128 (a, b, 0x00);
             a = _mm_shuffle_epi8 (a, b);
+            static __m128i_u u;
+            b = _mm_loadu_si128 (&u);
             return __builtin_cpu_supports ("pclmul");
           }
         ]])
-- 
2.45.2

Reply via email to