gandhi-viral commented on pull request #1543: URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-642889243
Unfortunately, red-line QPS (throughput) in our internal benchmarking is still suffering (-49%) with the latest PR.

We were able to isolate one particular field, a metadata field averaging ~90 bytes, which is causing most of our regression. After disabling compression on that field, we are at -8% red-line QPS compared to using Lucene 8.4 BDVs.

Looking further into the access pattern for that field, we see that `num_access / num_blocks_decompressed = 1.51`, so we are decompressing a whole block for every ~1.5 hits.

By temporarily using `BINARY_LENGTH_COMPRESSION_THRESHOLD = 10000` to effectively disable the LZ4 compression, we are at -2% red-line QPS, which we could live with.

Could we add an option to the `Lucene80DocValuesConsumer` constructor to disable compression for BinaryDocValues, or to control the 32-byte threshold? We could keep this compression enabled by default, since it is clearly helpful in many cases according to the `luceneutil` benchmarks, while letting expert users create a custom Codec to control it.

Thank you @jpountz for your help.
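To make the request concrete, here is a minimal sketch of the kind of knob being asked for. It is not Lucene code: the class `BinaryCompressionPolicySketch` and its methods are hypothetical names, and it assumes the threshold is compared against a value length in bytes, with compression applied only above it. It also shows the accesses-per-decompressed-block ratio quoted above.

```java
/**
 * Hypothetical sketch only: none of these names exist in Lucene.
 * Assumes compression applies when a value length (in bytes) exceeds
 * a configurable threshold, and that it can be switched off entirely.
 */
final class BinaryCompressionPolicySketch {
    /** Mirrors the 32-byte threshold mentioned in the comment. */
    static final int DEFAULT_THRESHOLD = 32;

    private final boolean compressionEnabled;
    private final int lengthThreshold;

    BinaryCompressionPolicySketch(boolean compressionEnabled, int lengthThreshold) {
        if (lengthThreshold < 0) {
            throw new IllegalArgumentException("lengthThreshold must be >= 0");
        }
        this.compressionEnabled = compressionEnabled;
        this.lengthThreshold = lengthThreshold;
    }

    /** Returns true if a value of the given length would be compressed. */
    boolean shouldCompress(int valueLengthBytes) {
        return compressionEnabled && valueLengthBytes > lengthThreshold;
    }

    /** Average number of accesses served per decompressed block. */
    static double accessesPerDecompressedBlock(long numAccess, long numBlocksDecompressed) {
        if (numBlocksDecompressed <= 0) {
            throw new IllegalArgumentException("numBlocksDecompressed must be > 0");
        }
        return (double) numAccess / numBlocksDecompressed;
    }

    public static void main(String[] args) {
        BinaryCompressionPolicySketch defaults =
            new BinaryCompressionPolicySketch(true, DEFAULT_THRESHOLD);
        BinaryCompressionPolicySketch raised =
            new BinaryCompressionPolicySketch(true, 10000);

        // A ~90 byte value is compressed under the default threshold,
        // but not once the threshold is raised to 10000.
        System.out.println(defaults.shouldCompress(90));
        System.out.println(raised.shouldCompress(90));
    }
}
```

A ratio close to 1 from `accessesPerDecompressedBlock` means almost every access pays for a full block decompression, which is the situation described for the ~90 byte field.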