[ 
https://issues.apache.org/jira/browse/LUCENE-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302450#comment-17302450
 ] 

Michael McCandless commented on LUCENE-9843:
--------------------------------------------

{quote}There is a more obvious one to fix immediately: {{SORTED}}. Why is the 
codec option available on {{SORTED}} terms dictionary? The option is not 
necessary: it does not impact the speed of per-document ordinals. And the term 
dictionary (for lookupOrd) is block-compressed, prefix coded, etc regardless of 
what you supply. So let's please remove the option there.
{quote}
+1, I agree use cases should not be relying on super fast ord lookup, so 
hardwired compression is the right choice here.

 
{quote}For the {{BINARY}}, I personally think it is wrong to compress by 
default, in the default codec. The user wants a per-document byte[] (with their 
custom encoding), we should make it fast and just plumb it through. It's like a 
catch-all type when no other type (numeric, string, etc) is truly suitable. 
Sure, maybe some users are putting "yuge" stuff in there, where compression 
might not hurt their speed and save some disk: we could supply a different 
codec in the {{codecs/}} package for such users. But I don't think it makes 
sense at all to support in the default codec with backwards compatibility.
{quote}
Yeah, +1.

> Remove compression option on doc values
> ---------------------------------------
>
>                 Key: LUCENE-9843
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9843
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Options on file formats add complexity and put a big tax on 
> backward-compatibility testing. I'm the one who introduced it in LUCENE-9378, but 
> I would now like to think about what we can do to remove this option.
> For the record, compression was initially introduced because some binary 
> fields have so much redundancy that it's wasteful not to compress them at 
> all. But unfortunately, this slowed down some search workloads and we decided 
> to introduce this option as a way to let users choose the trade-off they want.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

