[ 
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114010#comment-17114010
 ] 

Adrien Grand commented on LUCENE-9378:
--------------------------------------

The code does suggest that decompression happens only once per block. 
I'm not very familiar with the facets tasks; do they consume all docs by any 
chance? A side effect of bulk-decoding multiple values at once is that 
selective queries get slower because they likely decompress values that they 
don't need, but queries that match most documents, like MatchAllDocsQuery, 
might get faster.
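
To make that trade-off concrete, here is a toy cost model in plain Java. It is 
not Lucene code: the class name, the block size of 32 docs and the "every 
100th doc" query are assumptions chosen for illustration. It only counts how 
many values get decoded when a whole block is decompressed on first access.

```java
import java.util.function.IntPredicate;

/** Toy cost model, not Lucene code: counts values decoded when whole blocks are decompressed. */
final class BlockDecodeCost {
    // Assumed block size for illustration; the real value is a codec implementation detail.
    static final int DOCS_PER_BLOCK = 32;

    /**
     * Visits matching docs in increasing order and decompresses a block only
     * when the current doc falls outside the block already held in memory.
     * Returns {valuesRequested, valuesDecoded}.
     */
    static long[] simulate(int maxDoc, IntPredicate matches) {
        long requested = 0;
        long decoded = 0;
        int currentBlock = -1; // no block decompressed yet
        for (int doc = 0; doc < maxDoc; doc++) {
            if (!matches.test(doc)) {
                continue;
            }
            requested++;
            int block = doc / DOCS_PER_BLOCK;
            if (block != currentBlock) {
                // Decompressing a block decodes every value in it, wanted or not.
                int blockStart = block * DOCS_PER_BLOCK;
                decoded += Math.min(DOCS_PER_BLOCK, maxDoc - blockStart);
                currentBlock = block;
            }
        }
        return new long[] {requested, decoded};
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        long[] all = simulate(maxDoc, doc -> true);
        long[] sparse = simulate(maxDoc, doc -> doc % 100 == 0);
        System.out.printf("match-all:    requested=%d decoded=%d%n", all[0], all[1]);
        System.out.printf("1%% selective: requested=%d decoded=%d%n", sparse[0], sparse[1]);
    }
}
```

Under these assumptions the match-all case decodes exactly the values it 
asks for, while the selective case hits a different block for every match and 
so decodes 32 values for each one it needs.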

Another factor that probably plays a role here is how compressible the data is. 
The compression logic we're using is fast when data is barely compressible and 
gets slower if the data is highly compressible. So depending on how 
compressible the data is, performance results could differ widely. 
Maybe we should update the disk usage tool 
(https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/DiskUsage.70.java)
to work with the Lucene84 and Lucene86 codecs to get a clearer picture of 
the storage savings on a per-field basis.
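
As a rough way to see how much compressibility can vary between fields, the 
sketch below measures a compressed/raw size ratio for two synthetic inputs. 
It uses java.util.zip.Deflater purely as a stand-in, which is not the 
compressor the codec uses, and the class name and sample data are made up for 
illustration. It says nothing about decode speed; it only shows the kind of 
per-field ratio a disk usage report could surface.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

/** Illustrative probe: DEFLATE is a stand-in here, not the codec's compressor. */
final class CompressibilityProbe {

    /** Returns the compressed size of the input in bytes. */
    static int compressedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] buffer = new byte[4096];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(buffer); // only the byte count matters here
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    /** Compressed size divided by raw size; values near 1.0 mean barely compressible. */
    static double ratio(byte[] input) {
        return (double) compressedSize(input) / input.length;
    }

    /** Synthetic "barely compressible" field: pseudo-random bytes. */
    static byte[] randomBytes(int length) {
        byte[] bytes = new byte[length];
        new Random(42).nextBytes(bytes);
        return bytes;
    }

    /** Synthetic "highly compressible" field: the same short value repeated. */
    static byte[] repetitiveBytes(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("category/electronics/phones;");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.printf("random field ratio:     %.3f%n", ratio(randomBytes(1 << 16)));
        System.out.printf("repetitive field ratio: %.3f%n", ratio(repetitiveBytes(2000)));
    }
}
```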

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress 
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This 
> caused a ~30% reduction in our red-line QPS (throughput). 
> We think users should be given some way to opt in to this compression 
> feature instead of it always being enabled, which can have a substantial 
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one 
> possible approach: introducing a *mode* in Lucene84DocValuesFormat 
> (COMPRESSED and UNCOMPRESSED) and allowing users to create a custom Codec 
> that subclasses the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes, 
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values 
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
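
The shape of the mode-based approach described in the issue could look roughly 
like the following. This is a self-contained sketch of the pattern only: every 
type here (ModeSketch, SketchDocValuesFormat, SketchDefaultCodec, 
UncompressedCodec) is hypothetical and stands in for the corresponding Lucene 
class; none of it is Lucene's actual API.

```java
import java.util.Objects;

/** Hypothetical sketch of the proposed API shape; none of these types are Lucene classes. */
final class ModeSketch {

    /** The two modes proposed in the issue. */
    enum Mode { COMPRESSED, UNCOMPRESSED }

    /** Stand-in for a doc values format whose binary encoding depends on a mode. */
    static class SketchDocValuesFormat {
        private final Mode mode;

        SketchDocValuesFormat(Mode mode) {
            this.mode = Objects.requireNonNull(mode, "mode");
        }

        Mode mode() {
            return mode;
        }

        /** Whether binary doc values would be written in compressed blocks. */
        boolean compressesBinary() {
            return mode == Mode.COMPRESSED;
        }
    }

    /** Stand-in for the default codec, which (as of the change discussed) always compresses. */
    static class SketchDefaultCodec {
        SketchDocValuesFormat docValuesFormat() {
            return new SketchDocValuesFormat(Mode.COMPRESSED);
        }
    }

    /** A user-defined codec that opts out by overriding which format is used. */
    static class UncompressedCodec extends SketchDefaultCodec {
        @Override
        SketchDocValuesFormat docValuesFormat() {
            return new SketchDocValuesFormat(Mode.UNCOMPRESSED);
        }
    }

    public static void main(String[] args) {
        System.out.println(new SketchDefaultCodec().docValuesFormat().mode());
        System.out.println(new UncompressedCodec().docValuesFormat().mode());
    }
}
```

The point of the design is that the default stays unchanged, and only users 
who plug in their own codec subclass pay the extra disk cost in exchange for 
cheaper reads.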



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

