[
https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198570#comment-17198570
]
Michael McCandless commented on LUCENE-9378:
--------------------------------------------
{quote}I'm not sure what the path forward is.
I'm a bit unhappy of abandoning completely the idea of
leveraging redundancy we might see across values.
{quote}
Can't we leave the (great!) compression on by default, maybe
with shared dictionaries across blocks (later), but then allow a backwards
compatible option for expert users passing arguments to the default doc values
format ctor at segment write time? Can we find a way to do that, so that we
[do not really have to support two different doc values
formats|https://github.com/apache/lucene-solr/pull/1543#issuecomment-669927391]?
> Configurable compression for BinaryDocValues
> --------------------------------------------
>
> Key: LUCENE-9378
> URL: https://issues.apache.org/jira/browse/LUCENE-9378
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Viral Gandhi
> Priority: Minor
> Attachments: hotspots-v76x.png, hotspots-v76x.png, hotspots-v76x.png,
> hotspots-v76x.png, hotspots-v76x.png, hotspots-v77x.png, hotspots-v77x.png,
> hotspots-v77x.png, hotspots-v77x.png, image-2020-06-12-22-17-30-339.png,
> image-2020-06-12-22-17-53-961.png, image-2020-06-12-22-18-24-527.png,
> image-2020-06-12-22-18-48-919.png, snapshot-v77x.nps, snapshot-v77x.nps,
> snapshot-v77x.nps, snapshots-v76x.nps, snapshots-v76x.nps, snapshots-v76x.nps
>
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> Lucene 8.5.1 includes a change to always [compress
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This
> caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression
> feature instead of it always being enabled, since it can have a substantial
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
> possible approach: introduce a *mode* in Lucene80DocValuesFormat (COMPRESSED
> and UNCOMPRESSED) and allow users to create a custom Codec that subclasses
> the default Codec and picks the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values
> query-time performance: [https://github.com/mikemccand/luceneutil/issues/61]
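
The shape of the proposal in the issue description (a mode chosen in the format's constructor, selected by subclassing the default codec) might look roughly like the sketch below. None of these classes are Lucene classes: SketchDocValuesFormat, SketchDefaultCodec and UncompressedDocValuesCodec are hypothetical stand-ins, and java.util.zip.Deflater is used only as a standard-library placeholder for whatever compression the real format applies.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical stand-in for a doc values format; NOT a Lucene class. */
final class SketchDocValuesFormat {
    /** Mirrors the COMPRESSED / UNCOMPRESSED modes proposed in the issue. */
    enum Mode { COMPRESSED, UNCOMPRESSED }

    private final Mode mode;

    /** The mode is fixed at construction, i.e. at segment write time. */
    SketchDocValuesFormat(Mode mode) { this.mode = mode; }

    Mode mode() { return mode; }

    /** Write path: compress only in COMPRESSED mode. */
    byte[] encode(byte[] value) {
        if (mode == Mode.UNCOMPRESSED) return value.clone();
        Deflater deflater = new Deflater(); // placeholder compressor (assumption)
        deflater.setInput(value);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Read path: the per-lookup decompression that UNCOMPRESSED skips. */
    byte[] decode(byte[] stored) {
        if (mode == Mode.UNCOMPRESSED) return stored.clone();
        Inflater inflater = new Inflater();
        inflater.setInput(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) break; // truncated input
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt stored value", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

/** Hypothetical default codec: compression stays on by default. */
class SketchDefaultCodec {
    SketchDocValuesFormat docValuesFormat() {
        return new SketchDocValuesFormat(SketchDocValuesFormat.Mode.COMPRESSED);
    }
}

/** Expert users opt out by subclassing the default and picking the mode. */
class UncompressedDocValuesCodec extends SketchDefaultCodec {
    @Override
    SketchDocValuesFormat docValuesFormat() {
        return new SketchDocValuesFormat(SketchDocValuesFormat.Mode.UNCOMPRESSED);
    }
}
```

In this sketch only one format class exists, and the mode is a constructor argument. That is the property asked for in the comment above: an opt-out for expert users without maintaining two separate doc values formats.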