Re: [PATCH] crc: New optimised slice-by-8 implementation

Sam Russell Sat, 26 Oct 2024 14:52:19 -0700

Hi team,

Sorry for the multiple bumps, but this patch offers a 77% reduction in
compute time for CRC an a 35% reduction in gzip time, if this patch is not
compliant then can some please let me know and I am happy to do whatever it
take to make it compliant so we can all benefit from the speedups. If it is
compliant then can someone please merge it.


Cheers
Sam

On Thu, 24 Oct 2024 at 11:56, Sam Russell <sam.h.russ...@gmail.com> wrote:

> Sorry for the bump, confirming that I've tested on s390x locally (WSL
> amd64 with docker-qemu running s390x)
>
> root@3d86c908e4f5:/gnulib# ./gnulib-tool --with-tests --test crc
> ...
> ## ----------------------------------------------------------------------
> ##
> ## ---------------------------- Gnulib tests ----------------------------
> ##
> ## Please report test failures in this directory to <bug-gnulib@gnu.org>.
> ##
> ## ----------------------------------------------------------------------
> ##
> make  check-TESTS
> make[4]: Entering directory '/gnulib/testdir4782/build/gltests'
> make[5]: Entering directory '/gnulib/testdir4782/build/gltests'
> PASS: test-alignasof
> PASS: test-assert
> PASS: test-byteswap
> PASS: test-crc
> PASS: test-endian
> PASS: test-getrusage
> PASS: test-gettimeofday
> PASS: test-intprops
> PASS: test-inttypes
> PASS: test-limits-h
> PASS: test-stdbool
> PASS: test-stddef
> PASS: test-stdint
> PASS: test-stdlib
> PASS: test-sys_resource
> PASS: test-sys_time
> PASS: test-sys_types
> PASS: test-init.sh
> PASS: test-time-h
> PASS: test-time
> PASS: test-unistd
> PASS: test-verify
> PASS: test-verify.sh
> PASS: test-wchar
>
> ============================================================================
> Testsuite summary for dummy 0
>
> ============================================================================
> # TOTAL: 24
> # PASS:  24
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  0
> # XPASS: 0
> # ERROR: 0
>
> ============================================================================
> ...
> root@3d86c908e4f5:/gnulib# uname -m
> s390x
>
> Dockerfile:
> FROM multiarch/debian-debootstrap:s390x-bullseye-slim
>
> RUN apt update
> RUN apt install -y vim build-essential m4 automake autoconf git
> RUN apt install -y python3
>
> Is there anything outstanding to do here? For downstream apps (e.g. gzip)
> they would just need to set GL_CRC_SLICE_BY_8 to enable the new feature,
> it's currently disabled by default.
>
> On Tue, 22 Oct 2024 at 22:38, Sam Russell <sam.h.russ...@gmail.com> wrote:
>
>> Tests pass with GL_CRC_SLICE_BY_8 set
>>
>>
>>
>> On Wed, 16 Oct 2024 at 13:52, Simon Josefsson <si...@josefsson.org>
>> wrote:
>>
>>> Sam Russell <sam.h.russ...@gmail.com> writes:
>>>
>>> >> People using the crc module who want to disable the faster/larger
>>> CRC32
>>> >> implementation can add 'gl_crc_slice_by_8=yes' to their configure.ac
>>> >> before invoking gnulib.
>>> >
>>> > I've been running tests with `./gnulib-tool --with-tests --test crc`
>>> so I'm
>>> > interested in a way of enabling this flag from gnulib-tool
>>>
>>> You could add a module like 'crc-slice-by-8' that depend on 'crc' with a
>>> 'configure.ac-early' statement that does 'gl_crc_slice_by_8=yes'.
>>>
>>> Maybe there are cleaner/simpler solutions.
>>>
>>> >> If code is copied then licensing terms has to be clear.  RFCs have bad
>>> >> licensing terms, so maybe we cannot use this code at all.
>>> >
>>> > The existing lib/crc.c uses code directly taken from RFC 1952 and is
>>> > referenced as such, I sent off my copyright paperwork and mentioned
>>> this to
>>> > them and am awaiting a reply.
>>>
>>> Nice catch, we should re-evaluate what code we can use and how.  The
>>> snippet that is copied seems fairly small.
>>>
>>> >> That memcpy looks weird, does it really do what you want it to do on
>>> all
>>> >> architectures?  Would using something from endian.h be cleaner?
>>> >
>>> > The purpose is to work around alignment issues. The signature
>>> > for crc32_update_no_xor() takes a char* and casting up to a uint64_t is
>>> > undefined by C. Using memcpy lets the compiler and runtime be smart and
>>> > insulates us from nonaligned memory accesses. I am open to other ideas
>>> > around this (we can step forward byte by byte until we reach an aligned
>>> > pointer and then do 8-byte blocks from there, for example, but then we
>>> need
>>> > to detect alignment and do the upcasting ourselves)
>>> >
>>> > You are correct that this breaks under big endian though, thanks for
>>> the
>>> > tip for endian.h. I've added a le64toh() call and this should make it
>>> work
>>> > correctly on big endian systems (how do we go about testing these btw?
>>> I
>>> > don't have one on me).
>>>
>>> The GCC compile farm has plenty of different architectures, so if you
>>> write some low-level code that may have endian issues, testing for that
>>> is a good idea.  You can test it on your own machine too, using QEMU.
>>>
>>> /Simon
>>>
>>

0001-crc-New-optimised-slice-by-8-implementation.patch
Description: Binary data

Re: [PATCH] crc: New optimised slice-by-8 implementation

Reply via email to