Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-21 Thread Bruno Haible via Gnulib discussion list
Hi Sam, Sam Russell wrote on 2024-10-15: > I emailed > ass...@fsf.org and asked to kick off the copyright assignment process. The FSF has received and processed your copyright assignment for Gnulib. You can now start sending proposed patches for review for real. I guess, the first patch should b

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-17 Thread Sam Russell
> I have one use-case (although not published so less > important) where the table size increase is a problem, and would prefer > to use the current smaller crc.c (or even better, a smaller one that > generate the tables on the fly, I forgot that it is possible). There's another approach for table

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-17 Thread Bruno Haible via Gnulib discussion list
Simon Josefsson wrote: > If we are splitting this module up into several, maybe it even make more > sense to have a 'crc-slice-by-8' module for that particular portable > optimization. I have one use-case (although not published so less > important) where the table size increase is a problem, and

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Simon Josefsson via Gnulib discussion list
Bruno Haible via Gnulib discussion list writes: > Simon Josefsson wrote: >> I think in this example, I think it makes >> sense for gnulib to provide a optimized CRC function that may contain >> architecture-specific optimizations. The reason seems to be that while >> there are numerous different

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Bruno Haible via Gnulib discussion list
Simon Josefsson wrote: > I think in this example, I think it makes > sense for gnulib to provide a optimized CRC function that may contain > architecture-specific optimizations. The reason seems to be that while > there are numerous different optimized implementations around, few seems > to be arr

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Simon Josefsson via Gnulib discussion list
Kristoffer Brånemyr writes: > Which CRC version did you plan to put into gnulib by the way? There > are many different polynomials in use. CRC32 and CRC32C for instance > use different polynomials. I don't remember which polynomial the > routine in cksum used, but I think it was CRC32, but I seem

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Kristoffer Brånemyr
Den tisdag 15 oktober 2024 kl. 13:05:53 CEST, Pádraig Brady skrev: >On 15/10/2024 08:07, Simon Josefsson wrote: > > The coreutils code is GPL and the crc module in gnulib is LGPL.  I'm > > using the gnulib crc module in some LGPL projects.  Is it wortwhile to > > keep the optimization GPL?  I w

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Kristoffer Brånemyr
Den onsdag 16 oktober 2024 kl. 08:43:03 CEST, Simon Josefsson skrev: > Pádraig Brady writes: > > > >  In general gnulib focuses on portable routines, so for example leaves > > platform specific crypto optimizations to libcrypto, only providing fallback > > cross platform implementations wher

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-16 Thread Sam Russell
> I seem to remember there was something special with the polynomial that forced you byteswap the data. That's correct, coreutils uses the normal polynomial, gnulib uses the bit-reversed polynomial, and gzip uses gnulib directly as the RFC 1952 specifies the bit-reversed polynomial. Both use the s

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Simon Josefsson via Gnulib discussion list
Pádraig Brady writes: > In general gnulib focuses on portable routines, so for example leaves > platform specific crypto optimizations to libcrypto, only providing fallback > cross platform implementations where needed. OpenSSL libcrypto doesn't provide CRC, does it? I agree with what you say

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Pádraig Brady
On 15/10/2024 16:50, Jeffrey Walton wrote: On Tue, Oct 15, 2024 at 10:59 AM Pádraig Brady wrote: On 15/10/2024 12:58, Sam Russell wrote: I'm happy with the slice-by-8 code I have

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Jeffrey Walton
On Tue, Oct 15, 2024 at 10:59 AM Pádraig Brady wrote: > > On 15/10/2024 12:58, Sam Russell wrote: > > I'm happy with the slice-by-8 code I have > > > > but the > > c

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Sam Russell
The math nerd in me really likes pclmul but agree it adds a lot of complexity for effectively x64-only features. I'll focus on getting slice-by-8 ready in this case anyway, I am waiting for a reply from the GNU copyright team and then am planning to just get this one thing all the way to the end.

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Pádraig Brady
On 15/10/2024 12:58, Sam Russell wrote: I'm happy with the slice-by-8 code I have > but the  cksum_pclmul implementation is quite detailed so it would be useful if we

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Sam Russell
I'm happy with the slice-by-8 code I have < https://github.com/samrussell/gnulib/blob/slice_by_8/lib/crc.c> but the cksum_pclmul implementation is quite detailed so it would be useful if we could relicense that for gnulib. There's a minor optimisation around dealing with a non-round number of byte

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Pádraig Brady
On 15/10/2024 08:07, Simon Josefsson wrote: The coreutils code is GPL and the crc module in gnulib is LGPL. I'm using the gnulib crc module in some LGPL projects. Is it wortwhile to keep the optimization GPL? I would prefer a LGPL crc module that is performant, with #ifdef-varianted support fo

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Sam Russell
Sounds good, I've just sent a patch with a test to cover alignment issues we might run into in the optimised implementation, and I emailed ass...@fsf.org and asked to kick off the copyright assignment process. Anything else I should be starting now to make things run more smoothly? On Tue, 15 Oct

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Simon Josefsson via Gnulib discussion list
I suggest to focus on the immediate use-case and see if we can complete that, to not get overwhelmed with the variety of ideas. Just getting your bigger table speedup into a LGPL'd crc.c in gnulib, and to get gzip to use that module, seems like a good improvement. You should start on the copyrigh

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Sam Russell
IANAL but it appears the source (paper, not code) for both implementations in coreutils is free, I'm happy to write from scratch based off the papers: Slice-by-8: "Novel Table Lookup-Based Algorithms for High-Performance CRC Generation" PCLMUL: "Fast CRC Computation for Numeric Polynomials Using P

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-15 Thread Simon Josefsson via Gnulib discussion list
The coreutils code is GPL and the crc module in gnulib is LGPL. I'm using the gnulib crc module in some LGPL projects. Is it wortwhile to keep the optimization GPL? I would prefer a LGPL crc module that is performant, with #ifdef-varianted support for 1) no tables at all, 2) small table like tod

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
> Can't you do the CRC byte-by-byte until you reach an address that is properly aligned? > See lib/memchr.c for an example. That would be perfect, I'll look over lib/memchr.c. My background is in network engineering and reverse engineering so I am in my element with how everything looks on the

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Jeffrey Walton
On Mon, Oct 14, 2024 at 5:11 PM Sam Russell wrote: > > One issue I've noticed is that the crc functions take a char* and while this > is ok in other implementations, gnulib appears to enforce strict alignment > (presumably for portability purposes). > > Because of this, we might need to introduc

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Collin Funk
Sam Russell writes: > One issue I've noticed is that the crc functions take a char* and while > this is ok in other implementations, gnulib appears to enforce strict > alignment (presumably for portability purposes). > > Because of this, we might need to introduce a new function that takes an > u

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
One issue I've noticed is that the crc functions take a char* and while this is ok in other implementations, gnulib appears to enforce strict alignment (presumably for portability purposes). Because of this, we might need to introduce a new function that takes an unsigned long long* (and then _m12

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
> Thanks! This looks nice, but please add code that generated the tables > which is important for reproduction. Of course. > I am worried about the size increase with the new tables, what do you > think about making the new approach optional with some #ifdef, which may > be off by default to pre

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Jeffrey Walton
On Mon, Oct 14, 2024 at 2:26 PM Simon Josefsson via Gnulib discussion list wrote: > > Sam Russell writes: > > > I've noticed that GZIP trails behind zlib in performance and part of this > > is down to the fact that zlib is using a more efficient CRC32 > > implementation. I've written an implement

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Simon Josefsson via Gnulib discussion list
Sam Russell writes: > I've noticed that GZIP trails behind zlib in performance and part of this > is down to the fact that zlib is using a more efficient CRC32 > implementation. I've written an implementation of this for gnulib based off > the Intel paper at > https://static.aminer.org/pdf/PDF/00

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
That sounds good. It looks like there some subtle differences anyway, the gzip version does everything bit reversed, and while the intel paper has constants for that there are some logic things that would have to change (take hiword then loword instead of loword then hiword etc) On Mon, Oct 14, 20

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Pádraig Brady
On 14/10/2024 15:53, Sam Russell wrote: > For reference, coreutils' cksum uses slice by 8 unconditionally since: > https://github.com/coreutils/coreutils/commit/a7533917e perfect, we could just copy this across then? is there a reason

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Bruno Haible via Gnulib discussion list
Sam Russell wrote: > I built from HEAD, named it gzip_vanilla, rebuilt with my CRC code and > named it gzip_8_slice. Input file is a 115MB file gzipped (default > settings) to 61M Thanks; that's a comparison from which one can draw conclusions. > sam@DESKTOP-R64B0KJ:~/gziptest$ time ../code/gzip/

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
I built from HEAD, named it gzip_vanilla, rebuilt with my CRC code and named it gzip_8_slice. Input file is a 115MB file gzipped (default settings) to 61M sam@DESKTOP-R64B0KJ:~/gziptest$ time ../code/gzip/gzip_8_slice -k -d -c large_file.gz > /dev/null real0m0.307s user0m0.298s sys 0m

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Bruno Haible via Gnulib discussion list
Sam Russell wrote: > As for your question on speed, I noticed between zstd (which uses zlib as a > backend) and gzip there seems to be an improvement of maybe 30-40% for > decompressing a 100MB file (part of this is due to multithreading though), If you compare two scenarios which differ in 4 aspe

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Jim Meyering
On Mon, Oct 14, 2024 at 6:53 AM Bruno Haible via Gnulib discussion list wrote: > Hi Sam, > > Thanks for the contribution offer! > > > I've noticed that GZIP trails behind zlib in performance and part of this > > is down to the fact that zlib is using a more efficient CRC32 > > implementation. > >

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
Hi Bruno, Presumably you've read Pádraig's comment in the other thread that I mistakenly created, there are two interesting things from this: - coreutils is already GNU so no copyright review required, although the code appears to be inside the cksum utility so it's not in a position to be includ

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Sam Russell
> For reference, coreutils' cksum uses slice by 8 unconditionally since: > https://github.com/coreutils/coreutils/commit/a7533917e perfect, we could just copy this across then? is there a reason gnulib wouldn't just include coreutils as a dependency? > Note we don't use it on systems with pclmul,

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Bruno Haible via Gnulib discussion list
Hi Sam, Thanks for the contribution offer! > I've noticed that GZIP trails behind zlib in performance and part of this > is down to the fact that zlib is using a more efficient CRC32 > implementation. How much of a speedup do you obtain in gzip overall (not in CRC32 alone) for large files, throu

Re: Adding slice-by-4 and slice-by-8 to CRC32

2024-10-14 Thread Pádraig Brady
On 14/10/2024 11:16, Sam Russell wrote: Hi, First off, this is my first GNU contribution so I have *no idea* what I'm doing, feedback is appreciated. I've noticed that GZIP trails behind zlib in performance and part of this is down to the fact that zlib is using a more efficient CRC32 impleme