[ https://issues.apache.org/jira/browse/LUCENE-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17333044#comment-17333044 ]
Adrien Grand commented on LUCENE-8069:
--------------------------------------

bq. I guess people wanting these benefits today without any changes to Lucene could simply add a norm-like field (e.g. sum of raw char lengths of all tokenized fields) and then configure Lucene to sort on that. Would that work?

One thing that occurred to me recently is that we could make indexing faster if we actually used the norm instead of requiring users to index some form of proxy for the length normalization factor. Because Lucene encodes norms on bytes, norms are low-cardinality fields, which in turn gives us more options to make indexing faster when sorting is enabled via something like LUCENE-9935 (stored fields merging is currently a major bottleneck when doing bulk indexing with index sorting enabled).

> Allow index sorting by field length
> -----------------------------------
>
>                 Key: LUCENE-8069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8069
>             Project: Lucene - Core
>          Issue Type: Wish
>            Reporter: Adrien Grand
>            Priority: Minor
>
> Short documents are more likely to get higher scores, so sorting an index by
> field length would mean we would be likely to collect best matches first.
> Depending on the similarity implementation, this might even allow to early
> terminate collection of top documents on term queries.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
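The workaround floated in the quoted `bq.` comment (index a norm-like field, such as the sum of raw char lengths of all tokenized fields, and sort the index on it) can be sketched in plain Java without Lucene. This is a minimal, hypothetical illustration: `LengthSortSketch`, `Doc`, `lengthProxy`, `toyByteBucket` and `sortByLength` are made-up names, not Lucene APIs. The log2 bucketing only stands in for the point that byte-encoded norms can take at most 256 distinct values. It is not Lucene's actual norm encoding. In a real index the proxy would be stored as a numeric doc-values field and the sort configured through `IndexWriterConfig#setIndexSort`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LengthSortSketch {

    /** Hypothetical toy document: an id plus field name -> raw text (not a Lucene type). */
    static final class Doc {
        final String id;
        final Map<String, String> fields;

        Doc(String id, Map<String, String> fields) {
            this.id = id;
            this.fields = fields;
        }
    }

    /** Norm-like proxy: sum of the raw char lengths of all tokenized fields. */
    static long lengthProxy(Map<String, String> fields, Set<String> tokenizedFields) {
        long total = 0;
        for (String name : tokenizedFields) {
            String value = fields.get(name);
            if (value != null) {
                total += value.length();
            }
        }
        return total;
    }

    /**
     * Toy log2 bucketing of a length into a small code (floor(log2(x)) + 1, or 0 for 0).
     * Assumption: illustrates low cardinality only; NOT Lucene's norm encoding.
     */
    static int toyByteBucket(long length) {
        return 64 - Long.numberOfLeadingZeros(length);
    }

    /** Orders document ids shortest-first, mimicking an ascending index sort on the proxy. */
    static List<String> sortByLength(List<Doc> docs, Set<String> tokenizedFields) {
        List<Doc> sorted = new ArrayList<>(docs);
        // ArrayList.sort is stable, so ties keep their original (insertion) order.
        sorted.sort(Comparator.comparingLong((Doc d) -> lengthProxy(d.fields, tokenizedFields)));
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<String> tokenized = Set.of("title", "body");
        List<Doc> docs = List.of(
                new Doc("a", Map.of("title", "Index sorting", "body", "A much longer body of text")),
                new Doc("b", Map.of("title", "Norms", "body", "Short")),
                new Doc("c", Map.of("title", "Sort", "body", "Medium body")));
        // Proxies: a = 39, b = 10, c = 15
        System.out.println(sortByLength(docs, tokenized)); // prints [b, c, a]
    }
}
```

Shortest documents come first, which is the order the issue description argues is most likely to surface the best matches early. The bucketing helper shows why a byte-sized norm is attractive for sorting: many distinct lengths collapse into a few codes (10 and 15 both map to 4), so long runs of documents share the same sort key.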