[ https://issues.apache.org/jira/browse/LUCENE-9613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404360#comment-17404360 ]
Michael McCandless commented on LUCENE-9613:
--------------------------------------------

I am not certain, but this change was likely responsible for a drop in SSDV facets performance on 06/24. Thank you to [~rcmuir] for noticing this.

See the [Faceting > All Months (doc values)|https://home.apache.org/~mikemccand/lucenebench/BrowseMonthSSDVFacets.html] and [Faceting > All dayOfYear (doc values)|https://home.apache.org/~mikemccand/lucenebench/BrowseDayOfYearSSDVFacets.html] charts. These tasks mostly measure SSDV decode time, since they iterate all docs, decoding ordinals for one field and incrementing counters.

I don't think we should revert; this change is a good gain in compression. But maybe we can somehow optimize the block decoding, or at least be aware that this costs some additional CPU for decoding the blocks of ordinals.

Here were all the changes in that time period:

{noformat}
commit db26215f156d956143e29f1ce43f90c30cd8a107
Author: Mike McCandless <mikemcc...@apache.org>
Date:   Wed Jun 23 16:32:43 2021 -0400

    LUCENE-9902: move CHANGES entry to 8.10.0

commit 48ff29c8f358f4dc4fad48997b8ebfde5d2e5751
Author: Patrick Zhai <zh...@users.noreply.github.com>
Date:   Wed Jun 23 10:07:22 2021 -0700

    LUCENE-9983: Stop sorting determinize powersets unnecessarily (#163)

    * LUCENE-9983: Stop sorting determinize powersets unnecessarily

commit 1d5d4589606e5acbc1f7f6059c8f76965f472435
Author: Adrien Grand <jpou...@gmail.com>
Date:   Wed Jun 23 15:37:50 2021 +0200

    LUCENE-9613: Encode ordinals like numerics. (#186)

    This helps simplify the code, and also adds some optimizations to ordinals like
    better compression for long runs of equal values or fields that are used in
    index sorts.
commit 495bf6730fcf01937ea45cfa5b14f88352af07f1
Author: Michael Gibney <mich...@michaelgibney.net>
Date:   Wed Jun 23 07:53:30 2021 -0400

    For stability of DisjunctionIntervalsSource.toString(), sort subSources (#193)

    Iterators over subSources of DisjunctionIntervalsSource may return elements in
    indeterminate order, requiring special handling to make toString() output
    stable across equivalent instances
{noformat}

The only other big change was LUCENE-9983, but that is in a part of the code base unrelated to these benchmark tasks.

> Create blocks for ords when it helps in Lucene80DocValuesFormat
> ---------------------------------------------------------------
>
>                 Key: LUCENE-9613
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9613
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently for sorted(-set) values, we always write ords using
> log2(valueCount) bits per entry. However in several cases, like when the field
> is used in the index sort, or if one value is _very_ common, splitting into
> blocks like we do for numerics would help.
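
To make the trade-off in the issue description concrete, here is a rough sketch in Python of the size arithmetic only. It is not Lucene's implementation: the function names, the block size of 128, and the synthetic field (1024 distinct values, 64 docs per value, index sorted on that field) are all assumptions chosen for illustration, and per-block header overhead is ignored.

```python
def bits_required(max_value: int) -> int:
    """Bits needed to represent any integer in [0, max_value]."""
    return max_value.bit_length()


def fixed_width_bits(ords, value_count):
    """Total bits when every ordinal uses one width for the whole field,
    i.e. roughly log2(valueCount) bits per entry."""
    return len(ords) * bits_required(value_count - 1)


def blocked_bits(ords, block_size):
    """Total bits when each block stores (ord - block_min) at its own width.

    Simplification: the per-block header (min value, bit width) is not counted.
    """
    total = 0
    for start in range(0, len(ords), block_size):
        block = ords[start:start + block_size]
        total += len(block) * bits_required(max(block) - min(block))
    return total


# Hypothetical field: 1024 distinct values, 64 docs per value, and the index
# is sorted by this field, so equal ordinals form long runs.
value_count = 1024
docs_per_value = 64
sorted_ords = [o for o in range(value_count) for _ in range(docs_per_value)]

fixed = fixed_width_bits(sorted_ords, value_count)
blocked = blocked_bits(sorted_ords, block_size=128)
```

With these made-up numbers, the fixed-width layout needs 10 bits per ordinal, while each 128-doc block spans only two adjacent ordinals and needs 1 bit per entry. The sketch counts bits only; it does not model the extra per-block work at decode time, which is the CPU cost suspected in the comment above.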