[ https://issues.apache.org/jira/browse/LUCENE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390663#comment-17390663 ]
Adrien Grand commented on LUCENE-10033:
---------------------------------------

I've tried to push some more optimizations by reusing ForUtil for low numbers of bits per value and by only subtracting the min value when it would likely save space. Here are the new results for the sorting tasks:

{noformat}
Task                     QPS baseline  StdDev   QPS patch  StdDev  Pct diff             p-value
TermDTSort                     111.01  (2.2%)       60.47  (2.0%)    -45.5% ( -48% - -42%)  0.000
HighTermDayOfYearSort           68.24  (2.8%)       58.41  (4.2%)    -14.4% ( -20% -  -7%)  0.000
HighTermMonthSort               42.39  (1.9%)       43.67  (6.4%)      3.0% (  -5% -  11%)  0.042
{noformat}

And the CPU profiles:

{noformat}
main:

PERCENT  CPU SAMPLES  STACK
7.96%    1490         org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
7.34%    1374         org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1#collect()
7.10%    1328         org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
6.74%    1262         org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#thresholdCheck()
5.58%    1044         org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
4.88%    914          org.apache.lucene.store.ByteBufferGuard#ensureValid()
4.68%    876          jdk.internal.misc.Unsafe#convEndian()
3.13%    586          org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator$2#advance()
2.90%    542          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.53%    473          org.apache.lucene.search.FieldComparator$TermOrdValComparator#compareBottom()
2.49%    466          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$18#advanceExact()
2.33%    436          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.39%    261          org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
1.36%    254          java.nio.Buffer#checkIndex()
1.27%    238          org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
1.24%    232          org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readByte()
1.12%    209          java.nio.Buffer#scope()
0.98%    184          org.apache.lucene.codecs.lucene90.ForUtil#expand8To32()
0.98%    184          org.apache.lucene.search.ConjunctionDISI#doNext()
0.93%    174          sun.nio.ch.FileDispatcherImpl#force0()
0.90%    169          org.apache.lucene.search.FieldComparator$TermOrdValComparator#getOrdForDoc()
0.89%    167          org.apache.lucene.search.DocIdSetIterator$2#advance()
0.78%    146          java.util.zip.Inflater#inflateBytesBytes()
0.77%    144          java.lang.Integer#compare()
0.67%    125          org.apache.lucene.codecs.lucene90.PForUtil#expand32()
0.62%    116          jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal()
0.61%    114          org.apache.lucene.store.ByteBufferIndexInput#readLongs()
0.59%    110          org.apache.lucene.search.ConjunctionDISI#nextDoc()
0.54%    101          org.apache.lucene.search.ScoreMode#isExhaustive()
0.49%    92           java.nio.DirectByteBuffer#ix()

patch:

PERCENT  CPU SAMPLES  STACK
6.62%    1401         org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
6.37%    1349         org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#thresholdCheck()
5.15%    1090         org.apache.lucene.codecs.lucene90.DocValuesEncoder#mul()
3.98%    843          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$1#advance()
3.75%    795          org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator$2#advance()
3.58%    758          org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1#collect()
2.89%    612          org.apache.lucene.codecs.lucene90.ForUtil#expand8()
2.60%    550          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$8#ordValue()
2.59%    548          org.apache.lucene.store.ByteBufferIndexInput#readLongs()
2.43%    515          org.apache.lucene.search.FieldComparator$TermOrdValComparator#getOrdForDoc()
2.42%    512          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$2#longValue()
2.40%    509          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$8#advanceExact()
2.27%    480          org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.08%    441          org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.89%    400          org.apache.lucene.codecs.lucene90.ForUtil#expand16()
1.83%    387          org.apache.lucene.search.ConjunctionDISI#doNext()
1.72%    365          org.apache.lucene.codecs.lucene90.DocValuesForUtil#expand32()
1.63%    345          org.apache.lucene.codecs.lucene90.DocValuesEncoder#add()
1.57%    332          org.apache.lucene.codecs.lucene90.DocValuesEncoder#decode()
1.47%    312          org.apache.lucene.codecs.lucene90.ForUtil#decode9()
1.46%    309          org.apache.lucene.codecs.lucene90.ForUtil#shiftLongs()
1.35%    286          org.apache.lucene.store.DataInput#readVLong()
1.33%    282          java.nio.Buffer#position()
1.31%    278          org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.97%    205          org.apache.lucene.search.comparators.IntComparator$IntLeafComparator#getValueForDoc()
0.92%    194          org.apache.lucene.search.FieldComparator$TermOrdValComparator#compareBottom()
0.87%    184          org.apache.lucene.store.DataInput#readVInt()
0.85%    181          java.util.zip.Inflater#inflateBytesBytes()
0.83%    175          org.apache.lucene.codecs.lucene90.ForUtil#decode()
0.81%    172          sun.nio.ch.FileDispatcherImpl#force0()
{noformat}

{{DocValuesEncoder#mul()}} showing up is due to TermDTSort, since GCD compression gets applied to the time field. TermDTSort is indeed the task that takes the greatest hit in this run.

I like this hybrid idea! Maybe another idea (which is hybrid too!) would be to do part of the decoding for the entire block and part of the decoding on a per-value basis. E.g. in the case of GCD compression, maybe we could do the bit unpacking for the entire block but only apply the multiplicative factor when fetching a single value.
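To make the per-value idea concrete, here is a minimal sketch (the class and method names are made up for illustration; this is not the actual Lucene90 codec code): the bit unpacking happens once for a whole 128-value block, while the GCD factor and min offset are only applied to the values that actually get fetched.

```java
// Sketch of hybrid GCD decoding: bulk bit-unpacking per block, lazy
// per-value application of the multiplicative factor and offset.
// HybridGcdBlockReader is a hypothetical name, not a Lucene class.
class HybridGcdBlockReader {
    static final int BLOCK_SIZE = 128;

    private final long[] unpacked = new long[BLOCK_SIZE];
    private final long gcd; // multiplicative factor shared by the block
    private final long min; // offset subtracted at encode time

    HybridGcdBlockReader(long[] packedDeltas, long gcd, long min) {
        // Stand-in for a ForUtil-style SIMD decode of the whole block;
        // here we just copy the already-unpacked deltas.
        System.arraycopy(packedDeltas, 0, unpacked, 0, packedDeltas.length);
        this.gcd = gcd;
        this.min = min;
    }

    // Per-value work is one multiply and one add, so a consumer that
    // fetches only a few values per block skips most of the arithmetic.
    long longValue(int indexInBlock) {
        return min + unpacked[indexInBlock] * gcd;
    }
}
```

The point of the split is that the bulk unpacking stays auto-vectorizable, while sparse access patterns (like a sort comparator that advances past most docs) avoid paying 128 multiplies per block.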
> Encode doc values in smaller blocks of values, like postings
> ------------------------------------------------------------
>
>                 Key: LUCENE-10033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow-up to the discussion on this thread:
> https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.
> Our current approach for doc values uses large blocks of 16k values where
> values can be decompressed independently, using DirectWriter/DirectReader.
> This is a bit inefficient in some cases: e.g. a single outlier can grow the
> number of bits per value for the entire block, we can't easily use run-length
> compression, etc. Plus, it encourages using a different sub-class for every
> compression technique, which puts pressure on the JVM.
> We'd like to move to an approach that is more similar to postings, with
> smaller blocks (e.g. 128 values) whose values all get decompressed at once
> (using SIMD instructions), with skip data within blocks in order to
> efficiently skip to arbitrary doc IDs (or maybe still use jump tables as
> today's doc values do, and as discussed here for postings:
> https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
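The outlier argument in the issue description can be illustrated with a small back-of-the-envelope calculation (the numbers below are assumed for illustration, not measurements from the issue): bit-packing a block costs blockSize × bitsRequired(max value in block) bits, so a single wide value inflates the whole block it lives in.

```java
// Illustrative cost model for fixed-width bit-packing, showing why one
// outlier is cheaper with small blocks. OutlierCost is a hypothetical
// helper, not part of Lucene.
final class OutlierCost {
    // Minimum bits needed to represent a non-negative value.
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // Total bits to pack `values` in fixed blocks of `blockSize`, where
    // every value in a block uses the widest value's bit count.
    static long packedBits(long[] values, int blockSize) {
        long total = 0;
        for (int start = 0; start < values.length; start += blockSize) {
            int end = Math.min(start + blockSize, values.length);
            long max = 0;
            for (int i = start; i < end; i++) {
                max = Math.max(max, values[i]);
            }
            total += (long) (end - start) * bitsRequired(max);
        }
        return total;
    }
}
```

For example, with 16,384 values that each fit in 4 bits except one 20-bit outlier, a single 16k block costs 16,384 × 20 = 327,680 bits, while 128-value blocks cost 128 × 20 + 16,256 × 4 = 67,584 bits: only the block containing the outlier pays the wider encoding.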