I applied the getcontext lite patch with minor changes (mark the symbol hidden and leave the namespace alone).
Still working on the other two.

 -Arun

On Fri, Apr 1, 2011 at 2:57 AM, Lassi Tuura <[email protected]> wrote:
> Hi,
>
>> I also applied two of your patches. 01-performance-optimisations.patch
>> is the only remaining patch in my inbox.
>
> Thanks!
>
> I refreshed the performance optimisation patch to reflect the current git
> tree; attached at the end.
>
> There was also the trace-specific getcontext patch in another thread
> ("Another optimisation for x86-64 fast trace"). I've attached it here again,
> for completeness.
>
> In the mean time I've experimented with yet another change to the fast trace,
> this time replacing the two-level hash table with a single-level one. This
> doesn't use mempool but instead grabs a slab of memory using GET_MEMORY (=
> mmap). Apart from the app-provided memory allocation hooks it's a bit more
> like what Jason Evans submitted.
>
> This last one yields a fairly consistent ~5% improvement. I'm seeing this
> taking our application (387M stack walks to profile memory allocations) to
> 50-51 clock cycles per stack level (was ~53), or ~1320 cycles per walk (was
> ~1400). The gain is largely from avoiding the double indirect memory access
> in the hash code.
>
> I hope the last improvement is still acceptable even with using less of the
> mempool stuff.
>
> I experimented with a number of other changes, none of which yield consistent
> measurable improvement. I think with these changes the code is starting to be
> at the limit of what can be achieved - or at least what I am capable of :-)
> Per stack level time now mostly goes into a single hash table access plus
> reading the saved register values off the stack; everything else contributes
> little to the stack walk time (under heavy walking).
>
> Regards,
> Lassi
>

_______________________________________________
Libunwind-devel mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/libunwind-devel
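
For readers following the thread, here is a rough sketch of the idea described in the quoted message: a single-level, open-addressed hash table living in one mmap'd slab, so a lookup touches the slot array directly instead of chasing a second pointer. This is not the code from the patch. All names (frame_entry_t, frame_cache_t, cache_init, cache_get, hash_ip) are made up for illustration, the hash function and linear probing are assumptions, and mmap is called directly where the patch is said to use GET_MEMORY.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical types; not libunwind's actual data structures. */
typedef struct {
  uintptr_t ip;    /* key: instruction pointer; 0 marks an empty slot */
  uint32_t  info;  /* placeholder for cached per-frame unwind info */
} frame_entry_t;

typedef struct {
  frame_entry_t *slots;  /* one contiguous slab, no per-bucket pointers */
  size_t         mask;   /* capacity - 1; capacity is a power of two */
} frame_cache_t;

/* Grab the whole table as one anonymous mapping (stand-in for GET_MEMORY). */
static int cache_init(frame_cache_t *c, size_t capacity)
{
  assert(capacity != 0 && (capacity & (capacity - 1)) == 0);
  void *mem = mmap(NULL, capacity * sizeof(frame_entry_t),
                   PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED)
    return -1;
  c->slots = mem;  /* anonymous mappings are zero-filled: all slots empty */
  c->mask = capacity - 1;
  return 0;
}

/* Assumed multiplicative hash; any reasonable mixing function would do. */
static size_t hash_ip(uintptr_t ip, size_t mask)
{
  uint64_t h = (uint64_t)ip * 0x9e3779b97f4a7c15ull;
  return (size_t)(h >> 32) & mask;
}

/* Find the entry for a nonzero ip, claiming an empty slot on a miss.
   Returns NULL if the table is full. Only one array is dereferenced. */
static frame_entry_t *cache_get(frame_cache_t *c, uintptr_t ip)
{
  size_t i = hash_ip(ip, c->mask);
  for (size_t n = 0; n <= c->mask; ++n, i = (i + 1) & c->mask) {
    frame_entry_t *e = &c->slots[i];
    if (e->ip == ip)
      return e;            /* hit */
    if (e->ip == 0) {
      e->ip = ip;          /* miss: claim the first empty slot probed */
      return e;
    }
  }
  return NULL;
}
```

A two-level table would first load a bucket pointer and then the entry behind it; here the probe sequence stays within the single slab, which is the kind of indirection the message says was removed. The sketch leaves out resizing, eviction and thread safety.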
