[ https://issues.apache.org/jira/browse/LUCENE-9378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17114010#comment-17114010 ]
Adrien Grand commented on LUCENE-9378:
--------------------------------------

The code suggests that block decompression does indeed happen only once per block. I'm not very familiar with the facets tasks; do they consume all docs by any chance? A side effect of bulk-decoding multiple values at once is that selective queries get slower, because they likely decompress values that they don't need, while queries that match most documents, like MatchAllDocsQuery, might get faster.

Another factor that probably plays a role here is how compressible the data is. The compression logic we're using is fast when data is barely compressible and gets slower when the data is highly compressible. So depending on how compressible the data is, performance results could differ widely.

Maybe we should update the disk usage tool (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/DiskUsage.70.java) to work with the Lucene84 and Lucene86 codecs to get a clearer picture of the storage savings on a per-field basis.

> Configurable compression for BinaryDocValues
> --------------------------------------------
>
>                 Key: LUCENE-9378
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9378
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Viral Gandhi
>            Priority: Minor
>
> Lucene 8.5.1 includes a change to always [compress
> BinaryDocValues|https://issues.apache.org/jira/browse/LUCENE-9211]. This
> caused a ~30% reduction in our red-line QPS (throughput).
> We think users should be given some way to opt in to this compression
> feature instead of it always being enabled, which can have a substantial
> query-time cost, as we saw during our upgrade. [~mikemccand] suggested one
> possible approach: introducing a *mode* in Lucene84DocValuesFormat (COMPRESSED
> and UNCOMPRESSED) and allowing users to create a custom Codec subclassing the
> default Codec and pick the format they want.
> The idea is similar to Lucene50StoredFieldsFormat, which has two modes,
> Mode.BEST_SPEED and Mode.BEST_COMPRESSION.
> Here is a related issue for adding a benchmark covering BINARY doc values
> query-time performance - [https://github.com/mikemccand/luceneutil/issues/61]
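
As a rough illustration of the trade-off described in the comment above (bulk decoding penalizes selective queries but costs little for queries that visit most documents), here is a toy Java model. The class name, the 32-value block size, and the single-block cache are assumptions made for illustration only; this is not Lucene code, and it does not model the compression itself, only how many values get decoded per value actually used.

```java
import java.util.stream.IntStream;

/** Toy model: values live in fixed-size blocks, and a block can only be decoded as a whole. */
final class BlockDecodeSketch {
    // Hypothetical block size, chosen for illustration.
    static final int BLOCK_SIZE = 32;

    /**
     * Counts block decodes when visiting doc IDs in increasing order,
     * assuming the most recently decoded block stays cached.
     */
    static int countBlockDecodes(int[] sortedDocIds) {
        int decodes = 0;
        int currentBlock = -1;
        for (int docId : sortedDocIds) {
            int block = docId / BLOCK_SIZE;
            if (block != currentBlock) {
                currentBlock = block; // a new block must be decoded in full
                decodes++;
            }
        }
        return decodes;
    }

    /** Number of values decoded per value actually consumed by the query. */
    static double decodedPerUsed(int[] sortedDocIds) {
        return (double) countBlockDecodes(sortedDocIds) * BLOCK_SIZE / sortedDocIds.length;
    }

    public static void main(String[] args) {
        // A query matching every document: each block is decoded once and fully used.
        int[] matchAll = IntStream.range(0, 1024).toArray();
        // A selective query hitting one document per block: every block is decoded for a single value.
        int[] selective = IntStream.range(0, 32).map(i -> i * BLOCK_SIZE).toArray();

        System.out.println(decodedPerUsed(matchAll));  // prints 1.0
        System.out.println(decodedPerUsed(selective)); // prints 32.0
    }
}
```

In this model the match-all case decodes exactly what it consumes, while the selective case decodes 32 values for each one it reads. How that ratio translates into query latency would still depend on the per-block decompression cost, which the comment notes varies with how compressible the data is.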