original-brownbear commented on PR #13816: URL: https://github.com/apache/lucene/pull/13816#issuecomment-2365187862
> profilers often lie about this

True, but here I'd say whatever number the profiler outputs is only part of the cost of hard synchronizing on a global like we do here (we also have the indirect cost from cache/memory effects). Plus, what the profiler isn't wrong about is that threads actually go into waiting in this spot when benchmarking systems with large core/thread counts.

That said, if I can't convince you of the CAS fix, how about adjusting the math like I did here and moving the `toByteArray` call out of the `synchronized` block? That has pretty much the same effect, because the code in the `synchronized` block becomes fast enough to remove the thread-waiting in my testing.
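To make the two options concrete, here is a minimal sketch of the general pattern, not the code from this PR. The class `IdSketch`, the 128-bit mask, and both method names are hypothetical and chosen only for illustration. It assumes a global `BigInteger` counter and shows (1) keeping the lock but converting to bytes after leaving it, which is safe because `BigInteger` is immutable, and (2) a lock-free variant using compare-and-set via `AtomicReference.getAndUpdate`. The "adjusting the math" part of the suggestion is not reproduced here.

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; names and the 128-bit width are assumptions.
final class IdSketch {
  private static final BigInteger MASK_128 =
      BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

  // Variant 1: keep the lock, but hold it only while swapping the reference.
  private static final Object LOCK = new Object();
  private static BigInteger nextId = BigInteger.ZERO; // guarded by LOCK

  static byte[] nextIdLocked() {
    BigInteger current;
    synchronized (LOCK) {
      current = nextId;
      nextId = current.add(BigInteger.ONE).and(MASK_128);
    }
    // BigInteger is immutable, so the conversion is safe outside the lock.
    return current.toByteArray();
  }

  // Variant 2: no lock; getAndUpdate retries a compare-and-set on contention.
  private static final AtomicReference<BigInteger> NEXT =
      new AtomicReference<>(BigInteger.ZERO);

  static byte[] nextIdCas() {
    BigInteger current =
        NEXT.getAndUpdate(v -> v.add(BigInteger.ONE).and(MASK_128));
    return current.toByteArray();
  }
}
```

In both variants the byte-array allocation happens outside the contended region. The difference is that variant 1 can still park threads under heavy contention, while variant 2 retries the update instead of blocking.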