[ 
https://issues.apache.org/jira/browse/LUCENE-10033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390663#comment-17390663
 ] 

Adrien Grand commented on LUCENE-10033:
---------------------------------------

I've tried to push some more optimizations by reusing ForUtil for low numbers 
of bits per value and only subtracting the min value when it would likely save 
space. Here are the new results for the sorting tasks:

{noformat}
                    Task  QPS baseline  StdDev  QPS patch  StdDev                 Pct diff  p-value
              TermDTSort        111.01  (2.2%)      60.47  (2.0%)   -45.5% ( -48% -  -42%)    0.000
   HighTermDayOfYearSort         68.24  (2.8%)      58.41  (4.2%)   -14.4% ( -20% -   -7%)    0.000
       HighTermMonthSort         42.39  (1.9%)      43.67  (6.4%)     3.0% (  -5% -   11%)    0.042
{noformat}

And the CPU profiles:

{noformat}
main:
PERCENT       CPU SAMPLES   STACK
7.96%         1490          org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$4#longValue()
7.34%         1374          org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1#collect()
7.10%         1328          org.apache.lucene.util.packed.DirectReader$DirectPackedReader4#get()
6.74%         1262          org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#thresholdCheck()
5.58%         1044          org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
4.88%         914           org.apache.lucene.store.ByteBufferGuard#ensureValid()
4.68%         876           jdk.internal.misc.Unsafe#convEndian()
3.13%         586           org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator$2#advance()
2.90%         542           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.53%         473           org.apache.lucene.search.FieldComparator$TermOrdValComparator#compareBottom()
2.49%         466           org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$18#advanceExact()
2.33%         436           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.39%         261           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
1.36%         254           java.nio.Buffer#checkIndex()
1.27%         238           org.apache.lucene.util.packed.DirectReader$DirectPackedReader12#get()
1.24%         232           org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl#readByte()
1.12%         209           java.nio.Buffer#scope()
0.98%         184           org.apache.lucene.codecs.lucene90.ForUtil#expand8To32()
0.98%         184           org.apache.lucene.search.ConjunctionDISI#doNext()
0.93%         174           sun.nio.ch.FileDispatcherImpl#force0()
0.90%         169           org.apache.lucene.search.FieldComparator$TermOrdValComparator#getOrdForDoc()
0.89%         167           org.apache.lucene.search.DocIdSetIterator$2#advance()
0.78%         146           java.util.zip.Inflater#inflateBytesBytes()
0.77%         144           java.lang.Integer#compare()
0.67%         125           org.apache.lucene.codecs.lucene90.PForUtil#expand32()
0.62%         116           jdk.internal.misc.ScopedMemoryAccess#getShortUnalignedInternal()
0.61%         114           org.apache.lucene.store.ByteBufferIndexInput#readLongs()
0.59%         110           org.apache.lucene.search.ConjunctionDISI#nextDoc()
0.54%         101           org.apache.lucene.search.ScoreMode#isExhaustive()
0.49%         92            java.nio.DirectByteBuffer#ix()

patch:
PERCENT       CPU SAMPLES   STACK
6.62%         1401          org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#countHit()
6.37%         1349          org.apache.lucene.search.TopFieldCollector$TopFieldLeafCollector#thresholdCheck()
5.15%         1090          org.apache.lucene.codecs.lucene90.DocValuesEncoder#mul()
3.98%         843           org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$1#advance()
3.75%         795           org.apache.lucene.search.comparators.NumericComparator$NumericLeafComparator$2#advance()
3.58%         758           org.apache.lucene.search.TopFieldCollector$SimpleFieldCollector$1#collect()
2.89%         612           org.apache.lucene.codecs.lucene90.ForUtil#expand8()
2.60%         550           org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$8#ordValue()
2.59%         548           org.apache.lucene.store.ByteBufferIndexInput#readLongs()
2.43%         515           org.apache.lucene.search.FieldComparator$TermOrdValComparator#getOrdForDoc()
2.42%         512           org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$2#longValue()
2.40%         509           org.apache.lucene.codecs.lucene90.Lucene90DocValuesProducer$8#advanceExact()
2.27%         480           org.apache.lucene.search.Weight$DefaultBulkScorer#scoreAll()
2.08%         441           org.apache.lucene.codecs.lucene90.Lucene90PostingsReader$BlockDocsEnum#nextDoc()
1.89%         400           org.apache.lucene.codecs.lucene90.ForUtil#expand16()
1.83%         387           org.apache.lucene.search.ConjunctionDISI#doNext()
1.72%         365           org.apache.lucene.codecs.lucene90.DocValuesForUtil#expand32()
1.63%         345           org.apache.lucene.codecs.lucene90.DocValuesEncoder#add()
1.57%         332           org.apache.lucene.codecs.lucene90.DocValuesEncoder#decode()
1.47%         312           org.apache.lucene.codecs.lucene90.ForUtil#decode9()
1.46%         309           org.apache.lucene.codecs.lucene90.ForUtil#shiftLongs()
1.35%         286           org.apache.lucene.store.DataInput#readVLong()
1.33%         282           java.nio.Buffer#position()
1.31%         278           org.apache.lucene.codecs.lucene90.PForUtil#innerPrefixSum32()
0.97%         205           org.apache.lucene.search.comparators.IntComparator$IntLeafComparator#getValueForDoc()
0.92%         194           org.apache.lucene.search.FieldComparator$TermOrdValComparator#compareBottom()
0.87%         184           org.apache.lucene.store.DataInput#readVInt()
0.85%         181           java.util.zip.Inflater#inflateBytesBytes()
0.83%         175           org.apache.lucene.codecs.lucene90.ForUtil#decode()
0.81%         172           sun.nio.ch.FileDispatcherImpl#force0()
{noformat}

The calls to {{DocValuesEncoder#mul()}} come from TermDTSort, since GCD 
compression gets applied to the time field. TermDTSort is indeed the task that 
takes the biggest hit in this run.

I like this hybrid idea! Maybe another idea (which is hybrid too!) would 
consist of doing part of the decoding for the entire block and part of the 
decoding on a per-value basis. E.g. in the case of GCD compression, maybe we 
could do the bit unpacking for the entire block, but only apply the 
multiplicative factor when fetching a single value.
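
To make that last idea concrete, here is a rough sketch of what "unpack the whole block, multiply lazily" could look like. It is only an illustration, not what the patch does: the class name LazyGcdBlock, its methods, and the simple little-endian bit layout are all hypothetical and unrelated to the actual ForUtil/DocValuesEncoder formats. It assumes 1 to 63 bits per value and that each stored quotient fits in that many bits.

```java
/** Hypothetical sketch of block-wise unpacking with per-value GCD decoding. */
final class LazyGcdBlock {
  static final int BLOCK_SIZE = 128;

  // Unpacked quotients (value - base) / gcd for the current block.
  private final long[] quotients = new long[BLOCK_SIZE];
  private long base;
  private long gcd = 1;

  /** Test helper: packs values using bitsPerValue bits each (assumed layout). */
  static long[] pack(long[] values, int bitsPerValue) {
    long[] packed = new long[(values.length * bitsPerValue + 63) / 64];
    for (int i = 0; i < values.length; i++) {
      int bitPos = i * bitsPerValue;
      int word = bitPos >>> 6;
      int shift = bitPos & 63;
      packed[word] |= values[i] << shift;
      if (shift + bitsPerValue > 64) {
        // The value straddles two longs: spill the high bits into the next one.
        packed[word + 1] |= values[i] >>> (64 - shift);
      }
    }
    return packed;
  }

  /** Bulk step: unpacks the whole block, but does NOT apply gcd or base yet. */
  void load(long[] packed, int bitsPerValue, long base, long gcd) {
    long mask = (1L << bitsPerValue) - 1;
    for (int i = 0; i < BLOCK_SIZE; i++) {
      int bitPos = i * bitsPerValue;
      int word = bitPos >>> 6;
      int shift = bitPos & 63;
      long v = packed[word] >>> shift;
      if (shift + bitsPerValue > 64) {
        v |= packed[word + 1] << (64 - shift);
      }
      quotients[i] = v & mask;
    }
    this.base = base;
    this.gcd = gcd;
  }

  /** Per-value step: the multiplication only happens for values actually read. */
  long get(int index) {
    return base + gcd * quotients[index];
  }
}
```

With something like this, a sort that only reads a few values per block would pay for one multiplication per fetched value instead of 128 per block, while still getting the bulk bit unpacking. Whether that actually wins back the TermDTSort regression would need to be benchmarked.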

> Encode doc values in smaller blocks of values, like postings
> ------------------------------------------------------------
>
>                 Key: LUCENE-10033
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10033
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> This is a follow-up to the discussion on this thread: 
> https://lists.apache.org/thread.html/r7b757074d5f02874ce3a295b0007dff486bc10d08fb0b5e5a4ba72c5%40%3Cdev.lucene.apache.org%3E.
> Our current approach for doc values uses large blocks of 16k values where 
> values can be decompressed independently, using DirectWriter/DirectReader. 
> This is a bit inefficient in some cases, e.g. a single outlier can grow the 
> number of bits per value for the entire block, we can't easily use run-length 
> compression, etc. Plus, it encourages using a different sub-class for every 
> compression technique, which puts pressure on the JVM.
> We'd like to move to an approach that would be more similar to postings with 
> smaller blocks (e.g. 128 values) whose values get all decompressed at once 
> (using SIMD instructions), with skip data within blocks in order to 
> efficiently skip to arbitrary doc IDs (or maybe still use jump tables as 
> today's doc values, and as discussed here for postings: 
> https://lists.apache.org/thread.html/r7c3cb7ab143fd4ecbc05c04064d10ef9fb50c5b4d6479b0f35732677%40%3Cdev.lucene.apache.org%3E).



