jpountz commented on pull request #1543:
URL: https://github.com/apache/lucene-solr/pull/1543#issuecomment-638908545
Hi @msokolov, I'm looking into improving the encoding of lengths, which is the next bottleneck for binary doc values.

We are using binary doc values to run regex queries on a high-cardinality field. An ngram index helps find good candidates, and binary doc values are then used for verification. Field values are typically files, URLs, etc., which can have significant redundancy.

I'm OK with making compression less aggressive, though I think it would be a pity to disable it entirely and never take advantage of redundancy. You mentioned slowdowns, but this actually depends on the query, e.g. I'm seeing an almost 2x speedup when sorting a `MatchAllDocsQuery` on `wikimedium10m`. Let me take a shot at making compression less aggressive/slow?
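For context on the candidate-then-verify pattern mentioned above, here is a minimal sketch of what the verification side can look like against the public Lucene API. This is not code from this PR: the field name, candidate query, and helper method are hypothetical, and a real implementation would likely live inside a `Query`/`TwoPhaseIterator` rather than a loop over leaves.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.util.BytesRef;

public class CandidateVerifyExample {

  /**
   * Collect docs whose binary doc value matches the regex, restricted to the
   * candidates produced by an ngram query. Hypothetical helper for illustration.
   */
  static List<Integer> verifyCandidates(IndexSearcher searcher, Query ngramCandidates,
                                        String dvField, Pattern regex) throws IOException {
    List<Integer> hits = new ArrayList<>();
    Weight weight = searcher.createWeight(
        searcher.rewrite(ngramCandidates), ScoreMode.COMPLETE_NO_SCORES, 1f);
    for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
      Scorer scorer = weight.scorer(ctx);
      if (scorer == null) {
        continue; // no candidates in this segment
      }
      BinaryDocValues dv = DocValues.getBinary(ctx.reader(), dvField);
      DocIdSetIterator candidates = scorer.iterator();
      for (int doc = candidates.nextDoc();
           doc != DocIdSetIterator.NO_MORE_DOCS;
           doc = candidates.nextDoc()) {
        // Reading the value is where the doc-values compression scheme shows up
        // in query latency: each lookup has to locate and decode the stored bytes.
        if (dv.advanceExact(doc)) {
          BytesRef value = dv.binaryValue();
          if (regex.matcher(value.utf8ToString()).find()) {
            hits.add(ctx.docBase + doc);
          }
        }
      }
    }
    return hits;
  }
}
```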