IIRC we test values from 0-16 bytes in the unit test and the pclmul
implementation doesn't get hit for smaller values?
On Sat, Jan 18, 2025, 13:16 Simon Josefsson wrote:
> Thanks for discussion, Lasse. I suppose the current situation is
> difficult to change due to mainly licensing...
>
> Lasse
Worth noting I've seen faults at avx2/512 for reads that aren't correctly
aligned, unless I've made a mistake somewhere along the way
please read "we could not do" as "we could avoid doing"
We discussed this previously, we decided since AVX1 supports unaligned
accesses we could not do an alignment check at the start of the function,
but as you've discovered, this memcpy issue creates undefined behaviour.
Most performant would probably be an alignment check at the start and then
manua
this is a shame, the pragma option seemed super clean if clang worked with
it. ad hoc library seems like a close second given we can't change CFLAGS
without changing the whole repo
On Sat, Dec 21, 2024, 23:14 Bruno Haible wrote:
> The gnulib CIs also exercise compilation with clang, and it faile
> Thanks - I have pushed this now.
Thanks :)
> My point was that it would be better to use the same mechanism the .c
> implementation uses in the ./configure check, to make sure it tests for
> what will be used. That is, drop the CFLAGS changes and add:
I agree from a testability perspective, I
> End sentences with '.'
done
> Same here.
done
> That filename comment is now wrong.
fixed
> Is that true? If this new original code, shouldn't it say 2024? If
> this isn't new original code (does it come from coreutils), it make
> sense to add a comment saying where it is coming from, lik
Thanks for the quick reply Bruno, updated patch attached
On Mon, 16 Dec 2024 at 17:25, Bruno Haible wrote:
> Sam Russell wrote:
> > crc-x86_64 is added as its own package with minimal bindings in crc.c to
> > support, flagged off by the GL_CRC_PCLMUL flag which is only set if the
:40, Sam Russell wrote:
> Sounds good, thanks for explaining. I'll get going on the autoconf stuff
> then.
>
> On Fri, Dec 13, 2024, 20:03 Jim Meyering wrote:
>
>> That's not the way it usually works.
>> Typically, the author and/or person most familiar with th
you don't want to list yourself and no one else steps up, just mark
> it as "all".
>
> On Fri, Dec 13, 2024 at 10:52 AM Sam Russell
> wrote:
> >
> > Jim, would you be willing to be the maintainer for this in gnulib? If so
> then I can fix the autoconf this wee
it.
>
> On Fri, Dec 13, 2024 at 9:45 AM Sam Russell
> wrote:
> >
> > I should also add that I'm planning long term to try and get the major
> open source tools (e.g. gzip) using the most efficient algorithms for
> CRC32, so there will be more patches coming in f
these?
On Fri, Dec 13, 2024, 17:52 Sam Russell wrote:
> > I'd prefer of the crc PCLMUL feature is "opt-in" from a package
> > maintainer point of view.
>
> fine by me, this just requires a change to the build script then? the
> binding in crc.c has #ifdefs ar
> I'd prefer of the crc PCLMUL feature is "opt-in" from a package
> maintainer point of view.
fine by me, this just requires a change to the build script then? the
binding in crc.c has #ifdefs around it already
> So this would be removed.
ok, so we just need crc pclmul to be explicitly flagged o
Hi guys,
Just bumping this, is this a change we would like to include? Are there any
issues with the patch that you'd like me to fix?
Cheers
Sam
On Tue, Nov 26, 2024, 23:01 Sam Russell wrote:
> > It's your choice: 3 compilation units for x86_64, or 1 compilation unit
>
Hi,
I've been looking into other CRC32 implementations for architectures that
don't have accelerated polynomial multiplication (PCLMUL/VMULL for
x86-SSE/AVX or ARM-NEON), and there are improvements we can make, but they
make tradeoffs for memory (both for lookup tables and for working set). The
"b
hoose in the end.
Final patch for now with alignment removed based on Jeff's feedback
that _mm_loadu_si128() accepts unaligned data. bench-crc, test-crc and gzip
all happy.
Am I correct in thinking that we are just waiting for Simon's input now?
On Tue, 26 Nov 2024 at 22:53, Bruno H
de.
> I believe the way to zero a __m128i is using _mm_setzero_si128(). It
Perfect, it's outside the main loop so I'm not worried but this does look
more correct.
On Tue, 26 Nov 2024 at 22:47, Jeffrey Walton wrote:
> On Tue, Nov 26, 2024 at 4:27 PM Sam Russell
> wrote:
> >
as well as the alignment check above.
test-crc and bench-crc are fine, gzip builds with this gnulib and
decompresses my test file with no hash error
On Tue, 26 Nov 2024 at 20:29, Sam Russell wrote:
> > Cool. But it even gets better: one can use these target options on a per-
> > functi
> Cool. But it even gets better: one can use these target options on a per-
> function basis, via __attribute__. See
>
https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/x86-Function-Attributes.html#index-target_0028_0022avx_0022_0029-function-attribute_002c-x86
>
https://gcc.gnu.org/onlinedocs/gcc-14
> * I would suggest to rename the main source file from crc-pclmul.c to
> crc-x86_64.c.
> Rationale: So that immediately clear that the code is specific to the
> x86_64 CPUs. Not everyone is an assembly language hacker, and even
some
> assembly language hackers (like me) don't know ab
85% time reduction on AMD Ryzen 5 5600:
$ ./gltests/bench-crc 100
real 1.740296
user 1.740
sys0.000
$ ../bench-crc-pclmul 100
real 0.248324
user 0.248
sys0.000
This translates to a 13% time reduction for gzip:
$ time ./gzip_sliceby8 -k -d -c large_file.gz > /dev/null
re
gzip just took my patch to initialise MAINTAINERCLEANFILES, just built from
HEAD with gnulib pulled from HEAD, everything looking good
make check
Testsuite summary for gzip 1.13.36-46e9-dirty
especially reluctant to add the existing cksum crc variant to
gnulib,
> since I don't see anything but cksum itself using that.
I need to research a little further, but bzip2 uses the same implementation
as cksum afaik
On Thu, 31 Oct 2024 at 18:33, Pádraig Brady wrote:
> On 31/10/2024 16
I can extend coreutils cksum if you like, what are your thoughts on adding
some parameters to the commandline? Even if it's just an extra flag for
--rfc1952 or --gzip to change all the parameters just between these two
variants?
On Thu, Oct 31, 2024, 17:41 Sam Russell wrote:
> These
These are all solvable:
Polynomial: technically the same but one could argue it's the reversed
version (we reverse the polynomial so we can read LSB first). We now
generate the tables at compile time so this could be a parameter to table
generation
Initial/final value: easy enough to configure at
> I believe this should be fixed through the second patch of mine.
Yes, thanks
> I think you should add 'MAINTAINERCLEANFILES =' to gzip instead.
Adding to lib/Makefile.am on line 24 fixes this
Patch attached to mark functions in crc.c and crc-generate-tables.c as
static, GZIP now builds
$ ti
Thanks for merging Simon.
I've fixed a couple of issues with building gzip, marking the functions as
static fixes the -Werror=missing-declarations error, and
the MAINTAINERCLEANFILES variable isn't set when building gzip so there's
an error when using += with it. There's another issue with the way
ere's anything outstanding please let me know and I'm happy to add
whatever is missing.
On Mon, 28 Oct 2024 at 22:06, Sam Russell wrote:
> Is there anything else we're missing on this patch? From what I can tell
> the configure flag is just the standard behaviour of AC_ARG_ENABLE
23:21 Simon Josefsson wrote:
> Thanks - I am happy with this version, or even without the user-visible
> logic too (not sure if this is something users should ever tune). My
> question about selection logic was more for maintainers rather than
> end-users.
>
> /Simon
>
> 27
I had a play with autoconf (inspired heavily by m4/debug.m4), can confirm
the behaviour works as follows:
# build with slice-by-8 (default enabled)
./configure
make
gltests/bench-crc 100
real 1.714850
user 1.715
sys0.000
# build without slice-by-8
./configure --disable-crc-slice-by-8
m
thanks, moved the xors to the front of the lines
On Sun, Oct 27, 2024, 13:41 Bruno Haible wrote:
> Sam Russell wrote:
> > > Also, in GNU coding style, when line breaking is needed within
> > expressions,
> > > we do the line break before the operator, not after t
Done.
> I'll then deal with the cross-compiling situation. Will need to experiment
> a bit, since that's the first time, in Gnulib, that we have a built file
> generated by a C program (as opposed to a pre-installed tool).
Glad to be on the bleeding edge :)
On Sun, 27 Oct 2024
> You can fix most of these by simply running 'indent' on your *.h and *.c
Thanks, this is super helpful. I'm happy to follow the coding style but a
linter would make it easier for me to track down these whitespace issues.
> Regarding the Automake stuff, please try the attached patch, on top of
y
. If it is
compliant then can someone please merge it.
Cheers
Sam
On Thu, 24 Oct 2024 at 11:56, Sam Russell wrote:
> Sorry for the bump, confirming that I've tested on s390x locally (WSL
> amd64 with docker-qemu running s390x)
>
> root@3d86c908e4f5:/gnulib# ./gnulib-tool --with
w feature,
it's currently disabled by default.
On Tue, 22 Oct 2024 at 22:38, Sam Russell wrote:
> Tests pass with GL_CRC_SLICE_BY_8 set
>
>
>
> On Wed, 16 Oct 2024 at 13:52, Simon Josefsson wrote:
>
>> Sam Russell writes:
>>
>> >>
il if we read more than requested. We also have 0-length reads in the mix
which is a no-op and returns 0x as a result.
On Thu, 24 Oct 2024 at 23:08, Bruno Haible wrote:
> Sam Russell wrote:
> > I think we are massively overengineering this. We shouldn't need to
> > fu
Tests pass with GL_CRC_SLICE_BY_8 set
On Wed, 16 Oct 2024 at 13:52, Simon Josefsson wrote:
> Sam Russell writes:
>
> >> People using the crc module who want to disable the faster/larger CRC32
> >> implementation can add 'gl_crc_slice_by_8=yes' to their
Thanks, my mistake on the dropped CC
On Tue, 22 Oct 2024 at 22:24, Simon Josefsson wrote:
> Thanks, pushed in your name -- next time please attach git-format-patch
> output that includes the commit message, and keep cc to the list.
>
> /Simon
>
> Sam Russell writes:
&g
> I'd opt for reducing this set of expected values to 32 values,
> by reintroducing the memcpy() trick from your previous patch:
Attached is a patch with this implementation
On Tue, 22 Oct 2024 at 14:34, Sam Russell wrote:
> I think we are massively overengineering this. We sho
I think we are massively overengineering this. We shouldn't need to
future-proof tests for changes that *might* be added in the future, and
having 32x32 hashes in a table at the start of the test is not a big deal
(it took less than a second to generate and it'll take the same amount of
time if we
ntations to match the current implementation
On Tue, 22 Oct 2024 at 12:06, Bruno Haible wrote:
> Sam Russell wrote:
> > is there anymore feedback on this?
>
> + char plaintext[] = "Gnulib crc testsGnulib crc tests"
> + "Gnulib crc testsGnuli
,
endianness)
Cheers
Sam
On Tue, Oct 15, 2024, 10:44 Simon Josefsson wrote:
> Thanks this looks better. Let's keep it around for more feedback/review
> until your copyright papers arrive.
>
> /Simon
>
> Sam Russell writes:
>
> >> Or repeat "Gnulib c
> I have one use-case (although not published so less
> important) where the table size increase is a problem, and would prefer
> to use the current smaller crc.c (or even better, a smaller one that
> generate the tables on the fly, I forgot that it is possible).
There's another approach for table
Thank you for including explicit usage cmdline :)
Results with current slice_by_8 code
GL_CRC_SLICE_BY_8 not set
$ gltests/bench-crc 100
real 7.502692
user 7.503
sys0.000
$ gltests/bench-crc 100
real 7.481411
user 7.481
sys0.000
$ gltests/bench-crc 100
real 7.525393
alignment and do the upcasting ourselves)
You are correct that this breaks under big endian though, thanks for the
tip for endian.h. I've added a le64toh() call and this should make it work
correctly on big endian systems (how do we go about testing these btw? I
don't have one o
> I seem to remember there was something special with the polynomial that
forced you byteswap the data.
That's correct, coreutils uses the normal polynomial, gnulib uses the
bit-reversed polynomial, and gzip uses gnulib directly as the RFC 1952
specifies the bit-reversed polynomial. Both use the s
This is my implementation of the slice-by-8 algorithm for CRC32 generation.
I've added a flag CRC_ENABLE_SLICE_BY_8, I'd appreciate if someone can give
me a hint on how to set this up in the makefile config. I get the
impression that we want this to be on by default or for the major
architectures,
e end.
On Tue, 15 Oct 2024 at 16:59, Pádraig Brady wrote:
> On 15/10/2024 12:58, Sam Russell wrote:
> > I'm happy with the slice-by-8 code I have <
> https://github.com/samrussell/gnulib/blob/slice_by_8/lib/crc.c <
> https://github.com/samrussell/gnulib/blob/slice_by_8/lib
I'm happy with the slice-by-8 code I have <
https://github.com/samrussell/gnulib/blob/slice_by_8/lib/crc.c> but
the cksum_pclmul implementation is quite detailed so it would be useful if
we could relicense that for gnulib.
There's a minor optimisation around dealing with a non-round number of
byte
ld get away with a single off-by-1 test but here we are testing
everything from 0-15 bytes of misalignment; a 49-byte misalignment should
create the same problem as a 1-byte misalignment.
On Tue, 15 Oct 2024 at 09:57, Simon Josefsson wrote:
> Sam Russell writes:
>
> > + char p
t module, seems like a good improvement. You should start on
> the copyright assignment paper process first.
>
> /Simon
>
> Sam Russell writes:
>
> > IANAL but it appears the source (paper, not code) for both
> implementations
> > in coreutils is free, I'm happy
Patch file attached.
As far as I can tell, the main concern with the new optimisations is that
we're moving from processing byte-by-byte to processing in blocks of up to
128 bytes, and these will be read in either 8 or 16 byte chunks. This new
test calls the inner CRC update function with data ali
gt; Otherwise another approach is to keep a simple LGPL crc module and
> performant crc-gpl module that uses the coreutils code.
>
> /Simon
>
> Sam Russell writes:
>
> > That sounds good. It looks like there some subtle differences anyway, the
> > gzip version does ev
ing looks on the wire, but I am trying my best to do things by the
book here for portability. If there's a kosher way of telling when we're
aligned then that would be the holy grail here.
On Mon, 14 Oct 2024 at 23:25, Collin Funk wrote:
> Sam Russell writes:
>
> >
it
though... It would make sense to also offer a second function to allow
callers to dump in a block of guaranteed-aligned memory.
On Mon, Oct 14, 2024, 18:30 Bruno Haible wrote:
> Sam Russell wrote:
> > I built from HEAD, named it gzip_vanilla, rebuilt with my CRC code and
> > nam
en source projects in the last 2 days.
On Mon, 14 Oct 2024 at 20:26, Simon Josefsson wrote:
> Sam Russell writes:
>
> > I've noticed that GZIP trails behind zlib in performance and part of this
> > is down to the fact that zlib is using a more efficient CRC32
&
, 2024, 19:05 Pádraig Brady wrote:
> On 14/10/2024 15:53, Sam Russell wrote:
> > > For reference, coreutils' cksum uses slice by 8 unconditionally since:
> > > https://github.com/coreutils/coreutils/commit/a7533917e <
> https://github.com/coreutils/coreutils/commi
user0m0.478s
sys 0m0.010s
Looks to be about a ~35% reduction in time
On Mon, 14 Oct 2024 at 17:44, Bruno Haible wrote:
> Sam Russell wrote:
> > As for your question on speed, I noticed between zstd (which uses zlib
> as a
> > backend) and gzip there seems to be an improvemen
Hi Bruno,
Presumably you've read Pádraig's comment in the other thread that I
mistakenly created, there are two interesting things from this:
- coreutils is already GNU so no copyright review required, although the
code appears to be inside the cksum utility so it's not in a position to be
includ
4 at 15:50, Pádraig Brady wrote:
> On 14/10/2024 11:16, Sam Russell wrote:
> > Hi,
> >
> > First off, this is my first GNU contribution so I have *no idea* what
> I'm doing, feedback is appreciated.
> >
> > I've noticed that GZIP trails behind zlib i
Hi,
First off, this is my first GNU contribution so I have *no idea* what I'm
doing, feedback is appreciated.
I've noticed that GZIP trails behind zlib in performance and part of this
is down to the fact that zlib is using a more efficient CRC32
implementation. I've written an implementation of t
Hi,
First off, this is my first GNU contribution so I have *no idea* what I'm
doing, feedback is appreciated.
I've noticed that GZIP trails behind zlib in performance and part of this
is down to the fact that zlib is using a more efficient CRC32
implementation. I've written an implementation of t
62 matches
Mail list logo