[ https://issues.apache.org/jira/browse/LUCENE-9795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287988#comment-17287988 ]
Robert Muir commented on LUCENE-9795:
-------------------------------------

OK, I think I can explain the checkindex stuff. When profiling unit tests, I see this stack as the top CPU user:

{noformat}
java.nio.ByteBuffer#get()
  at java.nio.DirectByteBuffer#get()
  at org.apache.lucene.store.ByteBufferGuard#getBytes()
  at org.apache.lucene.store.ByteBufferIndexInput#readBytes()
  at org.apache.lucene.store.MockIndexInputWrapper#readBytes()
  at org.apache.lucene.util.compress.LZ4#decompress()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#decompressBlock()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#next()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict#seekExact()
  at org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$BaseSortedDocValues#lookupOrd()
  at org.apache.lucene.index.SortedDocValues#binaryValue()
  at org.apache.lucene.index.CheckIndex#checkBinaryDocValues()
{noformat}

I don't think checkindex should test retrieving every SORTED doc's bytes as if it were BINARY. It actually looks like a leftover to me. I will upload a simple patch.

The grouping stuff should maybe be a separate issue; I suspect the grouping logic may be inefficiently doing something similar (reading tons of term bytes instead of using ordinals or something).
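To make the cost difference concrete, here is a toy Java model. The class `ToySortedField` and its methods are hypothetical and are not Lucene's `CheckIndex` or `Lucene80DocValuesProducer`. Its terms dictionary is split into fixed-size blocks, and it assumes that every random-access `lookupOrd()` pays for one block decompression, loosely mirroring the `lookupOrd()` → `seekExact()` → `next()` → `decompressBlock()` frames in the stack above. It contrasts fetching every document's bytes, as a BINARY-style check does, with checking ordinals by range and scanning the terms dictionary once in ord order.

```java
// Toy model, NOT Lucene's API: a SORTED field whose sorted, unique terms live in
// fixed-size blocks. Assumption: every random-access lookupOrd() pays for one
// block decompression, while a sequential scan decompresses each block once.
class ToySortedField {
  static final int BLOCK_SIZE = 64; // hypothetical number of terms per block

  final String[] terms;  // sorted, unique terms; a term's ordinal is its index
  final int[] docOrds;   // ordinal of each document's value (one value per doc)
  long blockDecompressions = 0;

  ToySortedField(String[] terms, int[] docOrds) {
    this.terms = terms;
    this.docOrds = docOrds;
  }

  // Random-access ord -> bytes; charged one block decompression per call.
  String lookupOrd(int ord) {
    blockDecompressions++;
    return terms[ord];
  }

  // Expensive: treat the field as BINARY and fetch every document's bytes.
  long checkAsBinary() {
    long totalChars = 0;
    for (int ord : docOrds) {
      totalChars += lookupOrd(ord).length();
    }
    return totalChars;
  }

  // Cheaper: validate each document's ordinal by range only, then walk the
  // terms dictionary once in ord order and verify it is strictly increasing.
  void checkViaOrdinals() {
    for (int doc = 0; doc < docOrds.length; doc++) {
      int ord = docOrds[doc];
      if (ord < 0 || ord >= terms.length) {
        throw new IllegalStateException("doc " + doc + " has out-of-range ord " + ord);
      }
    }
    for (int ord = 0; ord < terms.length; ord++) {
      if (ord % BLOCK_SIZE == 0) {
        blockDecompressions++; // entering a new block during the sequential scan
      }
      if (ord > 0 && terms[ord - 1].compareTo(terms[ord]) >= 0) {
        throw new IllegalStateException("terms out of order at ord " + ord);
      }
    }
  }
}
```

In this model, with 1,000 terms and 100,000 documents, `checkAsBinary()` records 100,000 block decompressions while `checkViaOrdinals()` records 16 (one per 64-term block). The numbers only illustrate how the cost scales with documents versus terms; they are not benchmark results.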
> investigate large checkindex/grouping regression in nightly benchmarks
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-9795
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9795
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: Screen_Shot_2021-02-21_at_09.17.53.png, Screen_Shot_2021-02-21_at_09.30.30.png
>
>
> In the nightly benchmark, checkindex times increased more than 4x on the 2/16 datapoint.
> Looking at the commits on 2/15, the most obvious thing to look into is docvalues terms dict compression: LUCENE-9663
> Will try to pinpoint it more. My concern is some perf bug such as every single term causing decompression of the whole block repeatedly (missing seek-within-block opto?)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
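The suspected "missing seek-within-block opto" can be illustrated with another toy model. `BlockedTermsReader` is hypothetical and is not Lucene's `TermsDict`. The reader remembers which block it last decompressed and, when the shortcut is enabled, skips re-decompressing when the next lookup lands in that same block. For simplicity it indexes directly into the decompressed block. A real terms dictionary that reads terms forward with `next()` (as the `seekExact()` → `next()` frames in the stack trace suggest) would still have to scan within the block to reach the target ord.

```java
import java.util.Arrays;

// Toy model, NOT Lucene's TermsDict: sorted terms grouped into fixed-size blocks.
// A block must be "decompressed" (here: copied) before any of its terms is read.
class BlockedTermsReader {
  private final String[][] blocks;         // stand-in for the compressed blocks
  private final int blockSize;
  private final boolean reuseCurrentBlock; // hypothetical seek-within-block shortcut
  private int currentBlock = -1;           // which block `decompressed` holds
  private String[] decompressed = new String[0];
  int decompressions = 0;

  BlockedTermsReader(String[] sortedTerms, int blockSize, boolean reuseCurrentBlock) {
    this.blockSize = blockSize;
    this.reuseCurrentBlock = reuseCurrentBlock;
    int numBlocks = (sortedTerms.length + blockSize - 1) / blockSize;
    this.blocks = new String[numBlocks][];
    for (int b = 0; b < numBlocks; b++) {
      int from = b * blockSize;
      int to = Math.min(from + blockSize, sortedTerms.length);
      blocks[b] = Arrays.copyOfRange(sortedTerms, from, to);
    }
  }

  private void decompressBlock(int block) {
    decompressions++;
    decompressed = blocks[block].clone(); // placeholder for real LZ4 decompression
    currentBlock = block;
  }

  String lookupOrd(int ord) {
    int block = ord / blockSize;
    // Without the shortcut, every lookup decompresses its block again, even when
    // the previous lookup already decompressed that same block.
    if (!reuseCurrentBlock || block != currentBlock) {
      decompressBlock(block);
    }
    return decompressed[ord % blockSize];
  }
}
```

In this model, looking up ords 0 through 999 in order with a block size of 64 costs 1,000 decompressions without the shortcut and 16 with it. Random-order lookups that keep jumping between blocks would gain little from this shortcut.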