[ 
https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333044#comment-17333044
 ] 

Adrien Grand commented on LUCENE-8069:
--------------------------------------

bq. I guess people wanting these benefits today without any changes to Lucene 
could simply add a norm-like field (e.g. sum of raw char lengths of all 
tokenized fields) and then configure Lucene to sort on that. Would that work?

One thing that occurred to me recently is that we could make indexing faster if 
we actually used the norm instead of requiring users to index some form of proxy 
for the length normalization factor: because Lucene encodes norms as bytes, 
norms are low-cardinality fields, which in turn gives us more options to make 
indexing faster when sorting is enabled via something like LUCENE-9935 (stored 
fields merging is currently a major bottleneck when doing bulk indexing with 
index sorting enabled).
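
To illustrate why low cardinality helps, here is a minimal sketch in plain Java 
(no Lucene dependency). It assumes one norm byte per document and that the byte, 
read as unsigned, grows with field length. The class and method names are 
hypothetical, and this is not what LUCENE-9935 implements; it only shows that 
with at most 256 distinct keys a sort order can be computed with a counting 
sort in O(n + 256) instead of a comparison sort.

```java
import java.util.Arrays;

public class NormBucketSort {
    /**
     * Returns doc IDs ordered by ascending norm byte (read as unsigned).
     * Documents with equal norms keep their original doc ID order.
     */
    static int[] sortDocsByNorm(byte[] norms) {
        // counts[v + 1] first holds the number of docs whose norm is v.
        int[] counts = new int[257];
        for (byte norm : norms) {
            counts[(norm & 0xFF) + 1]++;
        }
        // Prefix sums: counts[v] becomes the start offset of bucket v.
        for (int v = 0; v < 256; v++) {
            counts[v + 1] += counts[v];
        }
        // Place each doc in its bucket; scanning docs in order keeps ties stable.
        int[] sorted = new int[norms.length];
        for (int doc = 0; doc < norms.length; doc++) {
            sorted[counts[norms[doc] & 0xFF]++] = doc;
        }
        return sorted;
    }

    public static void main(String[] args) {
        byte[] norms = {12, 3, 12, 7, 3};
        // prints [1, 4, 3, 0, 2]
        System.out.println(Arrays.toString(sortDocsByNorm(norms)));
    }
}
```

Because the number of buckets is fixed and small, the same idea also allows 
grouping documents per norm value up front, which a high-cardinality sort key 
such as a raw character count would not.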

> Allow index sorting by field length
> -----------------------------------
>
>                 Key: LUCENE-8069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8069
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by 
> field length would mean we would be likely to collect the best matches first. 
> Depending on the similarity implementation, this might even allow early 
> termination of the collection of top documents on term queries.
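
The claim that shorter documents tend to score higher can be checked with a 
small sketch. It uses the textbook BM25 term score rather than any particular 
Lucene similarity, and the class name, parameter values (k1 = 1.2, b = 0.75) 
and document lengths are assumptions chosen for illustration.

```java
public class Bm25LengthDemo {
    /** Textbook BM25 score of one term in one document. */
    static double bm25(double idf, double tf, double docLen,
                       double avgDocLen, double k1, double b) {
        // Length normalization: 1 for an average-length doc, larger for longer docs.
        double lengthNorm = 1 - b + b * docLen / avgDocLen;
        return idf * tf * (k1 + 1) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double idf = 2.0, tf = 1.0, avgLen = 100.0, k1 = 1.2, b = 0.75;
        // Same term frequency, increasing document length: the score goes down.
        for (int len : new int[] {10, 100, 1000}) {
            System.out.printf("len=%d score=%.3f%n",
                    len, bm25(idf, tf, len, avgLen, k1, b));
        }
    }
}
```

For a fixed term frequency the score never increases with document length, so 
in an index sorted by ascending length the later documents can only match 
earlier scores through a higher term frequency. Whether that is enough to stop 
collecting early depends on the similarity, as the description says.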



