[jira] [Commented] (LUCENE-9613) Create blocks for ords when it helps in Lucene80DocValuesFormat

Michael McCandless (Jira) Wed, 25 Aug 2021 04:30:05 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404360#comment-17404360 ]


Michael McCandless commented on LUCENE-9613:
--------------------------------------------

I am not certain, but this change was likely responsible for a drop in SSDV 
(SortedSetDocValues) facets performance on 06/24.  Thank you to [~rcmuir] for 
noticing this.

See the [Faceting > All Months (doc values)|https://home.apache.org/~mikemccand/lucenebench/BrowseMonthSSDVFacets.html]
and [Faceting > All dayOfYear (doc values)|https://home.apache.org/~mikemccand/lucenebench/BrowseDayOfYearSSDVFacets.html]
charts.

These tasks mostly measure SSDV decode time, since they iterate all docs, 
decoding the ordinals for one field and incrementing a counter per ordinal.
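To make the shape of that hot loop concrete, here is a minimal sketch. It is not the benchmark code and it does not use Lucene's API: `count_facets` and the in-memory `doc_ords` list are hypothetical stand-ins for iterating a sorted-set doc values field, where each inner list plays the role of the ordinals decoded for one document.

```python
def count_facets(doc_ords, value_count):
    """Count how many documents carry each ordinal.

    doc_ords: one list of ordinals per document (hypothetical stand-in
              for the ordinals decoded from doc values for that doc).
    value_count: number of distinct values, so ordinals are in
                 range(value_count).
    """
    counts = [0] * value_count
    for ords in doc_ords:        # iterate all docs
        for ord_ in ords:        # "decode" each ordinal of the field
            counts[ord_] += 1    # increment that ordinal's counter
    return counts


if __name__ == "__main__":
    docs = [[0, 2], [1], [], [0, 1, 2], [2]]
    print(count_facets(docs, 3))  # prints [2, 2, 3]
```

Apart from the counter increment, the loop does nothing but produce ordinals, which is why any extra per-ordinal decode cost would show up directly in these charts.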

I don't think we should revert – this change is a good gain in compression.  
But maybe we can somehow optimize the block decoding, or at least be aware 
that this change costs some additional CPU for decoding the blocks of ordinals.

Here were all the changes in that time period:
{noformat}
commit db26215f156d956143e29f1ce43f90c30cd8a107
Author: Mike McCandless <mikemcc...@apache.org>
Date:   Wed Jun 23 16:32:43 2021 -0400


    LUCENE-9902: move CHANGES entry to 8.10.0


commit 48ff29c8f358f4dc4fad48997b8ebfde5d2e5751
Author: Patrick Zhai <zh...@users.noreply.github.com>
Date:   Wed Jun 23 10:07:22 2021 -0700


    LUCENE-9983: Stop sorting determinize powersets unnecessarily (#163)


    * LUCENE-9983: Stop sorting determinize powersets unnecessarily


commit 1d5d4589606e5acbc1f7f6059c8f76965f472435
Author: Adrien Grand <jpou...@gmail.com>
Date:   Wed Jun 23 15:37:50 2021 +0200


    LUCENE-9613: Encode ordinals like numerics. (#186)


    This helps simplify the code, and also adds some optimizations to ordinals
    like better compression for long runs of equal values or fields that are
    used in index sorts.


commit 495bf6730fcf01937ea45cfa5b14f88352af07f1
Author: Michael Gibney <mich...@michaelgibney.net>
Date:   Wed Jun 23 07:53:30 2021 -0400


    For stability of DisjunctionIntervalsSource.toString(), sort subSources (#193)


    Iterators over subSources of DisjunctionIntervalsSource may
    return elements in indeterminate order, requiring special handling
    to make toString() output stable across equivalent instances
{noformat}
The only other big change was LUCENE-9983, but it touches a part of the code 
base that is unrelated to these benchmark tasks.

 

> Create blocks for ords when it helps in Lucene80DocValuesFormat
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9613
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9613
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently for sorted(-set) values, we always write ords using 
> log2(valueCount) bits per entry. However in several cases like when the field 
> is used in the index sort, or if one value is _very_ common, splitting into 
> blocks like we do for numerics would help.
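
As a rough illustration of why blocks can help, the sketch below compares the two layouts by size only. It is an estimate under stated assumptions, not the Lucene80DocValuesFormat encoding: `flat_bits`, `blocked_bits`, the block size of 1024 and the 72-bit per-block header are all hypothetical choices made for the example. Each block is assumed to store its minimum ordinal and then encode every entry as a delta from that minimum, using only as many bits as the largest delta in the block needs.

```python
def flat_bits(ords, value_count):
    """Size in bits when every ord uses ceil(log2(value_count)) bits."""
    bits_per_ord = (value_count - 1).bit_length()
    return len(ords) * bits_per_ord


def blocked_bits(ords, block_size, header_bits=72):
    """Size in bits when ords are split into blocks.

    Each block pays a fixed header (an assumed 72 bits here, for the
    block minimum plus the bit width) and then stores ord - min for
    each entry with just enough bits for the largest delta.
    """
    total = 0
    for start in range(0, len(ords), block_size):
        block = ords[start:start + block_size]
        lo, hi = min(block), max(block)
        total += header_bits + len(block) * (hi - lo).bit_length()
    return total


if __name__ == "__main__":
    # A field used in the index sort: ords arrive in sorted order,
    # here 1024 distinct values with 64 docs each.
    sorted_ords = [v for v in range(1024) for _ in range(64)]
    print(flat_bits(sorted_ords, 1024))     # prints 655360
    print(blocked_bits(sorted_ords, 1024))  # prints 266752
```

With sorted ordinals each block spans only 16 distinct values, so 4 bits per entry suffice instead of 10. A block whose entries are all equal needs 0 bits per entry, which covers the case of one very common value. The price is that a reader has to find the block and add the minimum back for every ordinal it decodes.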



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
