Hi,

Moving this thread to [email protected], since this code comes from there.

Герман Семёнов <[email protected]> writes:

> Hello everyone,
>
> I'm newbie to sending patches by patch, but I'm still very used to modern git
> hosting. Patch changes are simple in gnulib, using pahole tool
> from Red Hat (https://linux.die.net/man/1/pahole), I found that 'rofile'
> structure in memory takes 72 bytes, which does not fit into 64 byte cpu
> cacheline and consumes more processor cycles. If you have benchmarks tied to
> 'rofile', there may be a decent increase. How to run the tests?
>
> From 3a5c97e9eda30c451d4bf9afb2c15e2acf282de6 Mon Sep 17 00:00:00 2001
> From: Herman Semenoff <[email protected]>
> Date: Wed, 24 Sep 2025 23:32:27 +0300
> Subject: [PATCH] lib: align struct rofile to 64 bytes (1 cpu cacheline)
>
> References:
>
> https://wr.informatik.uni-hamburg.de/_media/teaching/wintersemester_2013_2014/epc-14-haase-svenhendrik-alignmentinc-presentation.pdf
> https://hpc.rz.rptu.de/Tutorials/AVX/alignment.shtml
> https://en.wikipedia.org/wiki/Data_structure_alignment
> https://stackoverflow.com/a/20882083
>
> https://zijishi.xyz/post/optimization-technique/learning-to-use-data-alignment/
> ---
>  lib/stackvma.c | 2 +-
>  lib/vma-iter.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/stackvma.c b/lib/stackvma.c
> index 95bb80db7c..72cc1b89b8 100644
> --- a/lib/stackvma.c
> +++ b/lib/stackvma.c
> @@ -142,13 +142,13 @@ struct rofile
>    size_t position;
>    size_t filled;
>    int eof_seen;
> +  char stack_allocated_buffer[STACK_ALLOCATED_BUFFER_SIZE];
>    /* These fields deal with allocation of the buffer.  */
>    char *buffer;
>    char *auxmap;
>    size_t auxmap_length;
>    uintptr_t auxmap_start;
>    uintptr_t auxmap_end;
> -  char stack_allocated_buffer[STACK_ALLOCATED_BUFFER_SIZE];
>  };
>
>  /* Open a read-only file stream.  */
> diff --git a/lib/vma-iter.c b/lib/vma-iter.c
> index 009835f60c..f6732ffb5a 100644
> --- a/lib/vma-iter.c
> +++ b/lib/vma-iter.c
> @@ -164,13 +164,13 @@ struct rofile
>    size_t position;
>    size_t filled;
>    int eof_seen;
> +  char stack_allocated_buffer[STACK_ALLOCATED_BUFFER_SIZE];
>    /* These fields deal with allocation of the buffer.  */
>    char *buffer;
>    char *auxmap;
>    size_t auxmap_length;
>    unsigned long auxmap_start;
>    unsigned long auxmap_end;
> -  char stack_allocated_buffer[STACK_ALLOCATED_BUFFER_SIZE];
>  };
>
>  /* Open a read-only file stream.  */

Thanks for the patch, but I agree with what Bruno said on the GitHub
thread [1].

The vma_iter functions can do a lot of parsing if a program has many
shared libraries or if the program maps many files into memory. For
example, my current Emacs process has 2700 lines in /proc/self/maps.
But I don't expect a program to repeatedly iterate over the virtual
memory areas such that it creates a performance issue.

Collin

[1] https://github.com/coreutils/gnulib/pull/21
