mcimadamore commented on PR #912: URL: https://github.com/apache/lucene/pull/912#issuecomment-1324996036
> Code similar to this: https://github.com/openjdk/jdk/blob/37848a9ca2ab3021e7b3b2e112bab4631fbe1d99/src/java.base/share/classes/java/nio/X-Buffer.java.template#L929
>
> The if statement uses a simple copy loop to read the values if the length is too short.
>
> We can do the same in our code by calling super to read into the array.
>
> It would be worth a try.

@uschindler AFAIK, the limit in the buffer implementation is very low (it is set to 6 bytes). This means that just copying two int values (8 bytes) already exceeds the limit and falls back to a bulk copy. So, unless you have a lot of copy operations with 0-1 elements, I doubt that this is really the culprit.

A more scientific benchmark below:

```
Benchmark                      (ELEM_SIZE)  Mode  Cnt   Score   Error  Units
SegmentCopy.buffer_copy                  5  avgt   30   3.952 ± 0.030  ns/op
SegmentCopy.buffer_copy                 10  avgt   30   3.838 ± 0.053  ns/op
SegmentCopy.buffer_copy                 50  avgt   30   5.069 ± 0.054  ns/op
SegmentCopy.buffer_copy                100  avgt   30   8.157 ± 0.182  ns/op
SegmentCopy.buffer_copy                500  avgt   30  21.651 ± 3.558  ns/op
SegmentCopy.buffer_copy               1000  avgt   30  52.112 ± 9.626  ns/op
SegmentCopy.segment_copy                 5  avgt   30   6.002 ± 0.070  ns/op
SegmentCopy.segment_copy                10  avgt   30   5.649 ± 0.062  ns/op
SegmentCopy.segment_copy                50  avgt   30   7.307 ± 0.420  ns/op
SegmentCopy.segment_copy               100  avgt   30   8.894 ± 0.054  ns/op
SegmentCopy.segment_copy               500  avgt   30  18.369 ± 0.449  ns/op
SegmentCopy.segment_copy              1000  avgt   30  53.127 ± 9.788  ns/op
SegmentCopy.segment_copy_loop            5  avgt   30   7.369 ± 0.086  ns/op
SegmentCopy.segment_copy_loop           10  avgt   30   8.651 ± 0.106  ns/op
SegmentCopy.segment_copy_loop           50  avgt   30  11.562 ± 0.133  ns/op
SegmentCopy.segment_copy_loop          100  avgt   30  15.543 ± 0.315  ns/op
SegmentCopy.segment_copy_loop          500  avgt   30  35.992 ± 0.608  ns/op
SegmentCopy.segment_copy_loop         1000  avgt   30  65.584 ± 1.539  ns/op
```

The performance is comparable, but at small sizes the segment version seems to be worse off. This has nothing to do with the fact that we use a bulk copy (in fact, as demonstrated in the `segment_copy_loop` benchmark, not using a bulk copy is slower for all sizes considered - which is consistent with what I'm seeing in the ByteBuffer code). The segment copy code seems to have a fixed 2 ns cost over the buffer variant. More investigation is required to understand where the difference comes from.
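For reference, the pattern being referred to (read element by element below a small threshold, do a single bulk copy above it) looks roughly like this when expressed against `MemorySegment`. This is an illustrative sketch, not the actual JDK or Lucene code: the `COPY_THRESHOLD` constant and the `readLongs` helper are hypothetical names, and the snippet assumes a suitably aligned offset and the `java.lang.foreign` API (preview in JDK 19/20):

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

class CopySketch {
    // Hypothetical threshold, mirroring the spirit of the small-copy cutoff in the
    // JDK buffer code; the exact value and its units are an assumption here.
    static final int COPY_THRESHOLD = 6;

    // Read `len` longs from `src`, starting at byte offset `srcOffset`, into `dst`.
    // Assumes `srcOffset` is aligned for long access and native byte order.
    static void readLongs(MemorySegment src, long srcOffset, long[] dst, int dstOff, int len) {
        if (len <= COPY_THRESHOLD) {
            // small copies: plain element-wise loop
            for (int i = 0; i < len; i++) {
                dst[dstOff + i] = src.get(ValueLayout.JAVA_LONG, srcOffset + (long) i * Long.BYTES);
            }
        } else {
            // larger copies: a single bulk copy into the array
            MemorySegment.copy(src, ValueLayout.JAVA_LONG, srcOffset, dst, dstOff, len);
        }
    }
}
```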
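For context, the three benchmark variants are roughly the following. This is a sketch rather than the exact benchmark source; the setup code, the field names, and the use of `MemorySegment.ofBuffer` over a direct buffer are assumptions made for illustration:

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.LongBuffer;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
public class SegmentCopy {

    @Param({"5", "10", "50", "100", "500", "1000"})
    int ELEM_SIZE;

    long[] dst;
    LongBuffer longs;      // long view over a direct byte buffer
    MemorySegment segment; // segment view over the same memory

    @Setup
    public void setup() {
        dst = new long[ELEM_SIZE];
        ByteBuffer buffer = ByteBuffer.allocateDirect(ELEM_SIZE * Long.BYTES)
                                      .order(ByteOrder.nativeOrder());
        longs = buffer.asLongBuffer();
        segment = MemorySegment.ofBuffer(buffer);
    }

    @Benchmark
    public long[] buffer_copy() {
        // absolute bulk get on the LongBuffer view
        longs.get(0, dst, 0, ELEM_SIZE);
        return dst;
    }

    @Benchmark
    public long[] segment_copy() {
        // bulk copy from the segment into the long[] array
        MemorySegment.copy(segment, ValueLayout.JAVA_LONG, 0, dst, 0, ELEM_SIZE);
        return dst;
    }

    @Benchmark
    public long[] segment_copy_loop() {
        // element-wise copy, forcing the non-bulk path for every size
        for (int i = 0; i < ELEM_SIZE; i++) {
            dst[i] = segment.get(ValueLayout.JAVA_LONG, (long) i * Long.BYTES);
        }
        return dst;
    }
}
```

The point of `segment_copy_loop` is simply to force the element-wise path at every size, so the bulk-vs-loop difference can be separated from the segment-vs-buffer difference.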