[jira] [Commented] (LUCENE-9378) Configurable compression for BinaryDocValues

Michael McCandless (Jira) Fri, 22 May 2020 04:52:30 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17113960#comment-17113960
 ]


Michael McCandless commented on LUCENE-9378:
--------------------------------------------

Thanks for running the luceneutil benchmarks, Michael Sokolov!
{quote}Interestingly, BrowseDateTaxoFacets shows a big improvement! But 
otherwise we see a pretty significant degradation in performance.
{quote}
That is fascinating, because faceting uses BINARY DV to hold all ordinals.  I 
wonder whether the BINARY DV compression somehow makes faceting faster!?  Could 
you try running the tasks with the normal relevance sort to see the impact on 
{{BrowseDateTaxoFacets}}?  That way we can separate "sorting by BINARY 
compressed" from "faceting on BINARY compressed".

Robert Muir also suggested this idea: have we verified that the block 
decompression happens only once per block when we {{.advance}} to multiple 
(increasing) docids within the same block?  The performance hits in the results 
above are large enough that I wonder whether we are accidentally decompressing 
on every {{.advance}} rather than once per block.
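
To make the "once per block" expectation concrete, here is a minimal sketch of 
the caching behavior in question.  This is not the actual Lucene reader: the 
class {{CachedBlockReader}}, the {{decompressCount}} counter, the block size of 
32 and the stand-in "decompressor" are all hypothetical, chosen only to 
illustrate the check.

```java
import java.util.function.IntFunction;

/**
 * Hypothetical sketch, not Lucene's reader: values are grouped into fixed-size
 * blocks, and the most recently decompressed block is cached so that advancing
 * to increasing docids inside the same block decompresses only once.
 */
class CachedBlockReader {
    private final int docsPerBlock;
    // Stand-in for real block decompression: returns all values of one block.
    private final IntFunction<byte[][]> decompressor;
    private int cachedBlockId = -1;   // no block cached yet
    private byte[][] cachedBlock;
    int decompressCount = 0;          // instrumentation for the check below

    CachedBlockReader(int docsPerBlock, IntFunction<byte[][]> decompressor) {
        this.docsPerBlock = docsPerBlock;
        this.decompressor = decompressor;
    }

    /** Returns the value for docId, decompressing only when the block changes. */
    byte[] advance(int docId) {
        int blockId = docId / docsPerBlock;
        if (blockId != cachedBlockId) {
            cachedBlock = decompressor.apply(blockId);
            cachedBlockId = blockId;
            decompressCount++;
        }
        return cachedBlock[docId % docsPerBlock];
    }

    public static void main(String[] args) {
        // Fake "compressed" data: each doc's value is a single byte equal to its docid.
        CachedBlockReader reader = new CachedBlockReader(32, blockId -> {
            byte[][] block = new byte[32][];
            for (int i = 0; i < 32; i++) {
                block[i] = new byte[] {(byte) (blockId * 32 + i)};
            }
            return block;
        });
        // Visit docs 0, 3, 6, ..., 93: they span blocks 0, 1 and 2.
        for (int doc = 0; doc < 96; doc += 3) {
            reader.advance(doc);
        }
        System.out.println("decompressions: " + reader.decompressCount); // prints "decompressions: 3"
    }
}
```

If a similar counter in the real reader came out close to the number of 
{{.advance}} calls rather than the number of blocks touched, that would point 
to the accidental per-call decompression described above.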

Also, I wonder why the original benchmarks on the issue didn't uncover similar 
performance changes.

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, since it can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introduce a *mode* in Lucene84DocValuesFormat (COMPRESSED 
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses 
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values 
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
