On Thu, Sep 16, 2021 at 5:25 PM Chris Kennelly via Libc-alpha <libc-al...@sourceware.org> wrote:
> On Thu, Sep 16, 2021 at 5:50 PM enh <e...@google.com> wrote:
>
> > plus testing for _equality_ can (as mentioned earlier) have slightly
> > different properties from the three-way comparator behavior of
> > bcmp()/memcmp().
> >
> > llvm-libc's implementation only returns the boolean, though.
>
> The mem* functions are extremely sensitive to instruction cache effects, so
> having 3 unique implementations (__memcmpeq, bcmp, memcmp) that do similar,
> but subtly different things can be a hidden performance cost--one that is
> hard to demonstrate with a microbenchmark. Our experience developing
> optimized mem* routines ended up showing better performance in actual
> applications when we accepted seemingly worse microbenchmark performance by
> optimizing for code footprint instead (more extensive notes for mem* in
> general
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/4f7c3da72d557ed418828823a8e59942859d677f.pdf>
> and memcmp specifically (section 4.4)
> <https://storage.googleapis.com/pub-tools-public-publication-data/pdf/e52f61fd2c51e8962305120548581efacbc06ffc.pdf>).

Regarding the code bloat found in memcmp in the paper, I think that is
pretty exclusive to the sse4 implementation:
https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-sse4.S;h=b82adcd5fab5b60a0327819f6041a689a276916a;hb=HEAD

And I think there is a fair argument to not include a __memcmpeq() based on
that implementation.

The older versions:
sse2: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/memcmp.S;h=870e15c5a080162b336b13bac24cf7afbac6874b;hb=HEAD
avx2: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S;h=2621ec907aedb781fcf0444e831c801f18fa68ba;hb=HEAD
evex: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/x86_64/multiarch/memcmp-evex-movbe.S;h=654dc7ac8ccb9445b2c7107a7cf2d9f6ce4b1010;hb=HEAD

Have a much more reasonable code size footprint.
Also the __memcmpeq() code will itself have a smaller code size footprint
than memcmp(). With the implementations from my patch, the code size shrinks
as follows:
sse2: -66
avx2: -436
avx2: -500

> The alternative would be to alias (as the NOTES suggest as a possible
> implementation), but I think that raises James' question of why not just
> use bcmp? Dependencies on non-boolean implementations of bcmp seem
> rare--namely, I haven't actually seen one.
>
> > On Thu, Sep 16, 2021 at 2:43 PM Joseph Myers <jos...@codesourcery.com>
> > wrote:
> >
> >> On Thu, 16 Sep 2021, James Y Knight wrote:
> >>
> >> > Wouldn't it be far simpler to just un-deprecate bcmp?
> >>
> >> The aim is to have something to which calls can be generated in all
> >> standards modes. bcmp has never been part of ISO C; there's nothing to
> >> undeprecate there.
> >
> >> --
> >> Joseph S. Myers
> >> jos...@codesourcery.com
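
For readers following the equality-only versus three-way distinction in this
thread, here is a minimal C sketch of the two contracts. The names
memeq_sketch and memcmp_sketch are hypothetical and are not the glibc
symbols; the byte loops are only an illustration and say nothing about how
the assembly implementations linked above are written. The point is that an
equality-only routine never has to locate or order the first mismatching
byte, which is one plausible reason it can get by with less code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical equality-only compare: returns 0 when the first n bytes
   match, and some unspecified nonzero value otherwise. */
static int memeq_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    unsigned char diff = 0;

    /* Accumulate differences; no need to know where or in which
       direction the buffers differ. */
    for (size_t i = 0; i < n; i++)
        diff |= (unsigned char)(p[i] ^ q[i]);
    return diff;
}

/* Hypothetical three-way compare: the sign of the result reflects the
   first differing byte, compared as unsigned char. */
static int memcmp_sketch(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;

    for (size_t i = 0; i < n; i++)
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    return 0;
}

/* Small self-check of both contracts against the standard memcmp(). */
static void check_contracts(void)
{
    const char x[] = "abcd", y[] = "abcz", z[] = "abcd";

    assert(memeq_sketch(x, z, 4) == 0);   /* equal buffers */
    assert(memeq_sketch(x, y, 4) != 0);   /* only "nonzero" is promised */
    assert(memcmp_sketch(x, y, 4) < 0);   /* 'd' < 'z' */
    assert(memcmp_sketch(y, x, 4) > 0);
    /* The zero/nonzero answer agrees with memcmp(). */
    assert((memcmp(x, y, 4) == 0) == (memeq_sketch(x, y, 4) == 0));
}
```

A caller that only tests the result against zero, as in
`if (memeq_sketch(a, b, n) == 0)`, gets the same answer from either
function, which is why aliasing the equality-only symbol to a three-way
implementation is always valid while the reverse is not.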