Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

via GitHub Wed, 02 Apr 2025 19:32:16 -0700


rmuir commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695


   I think the history is just that this norm can contain an arbitrary value, 
which previously was encoded suboptimally into a single byte. There was a 
ValueSource that assumed it was a single byte, so that was changed to work only 
with TFIDF for backwards compatibility.
   
   Elsewhere, the norm was extended and generalized to be an opaque 64-bit 
value. Depending on the Similarity's index-time `computeNorm()` 
implementation, it might not even be possible to decode it to a float.
   
   But the default encoding was also fixed to be practical, by @jpountz, whilst 
still using a single byte. So in practice all the built-in Similarities use the 
same encoding and can work with this: it just won't work if you extend 
Similarity to do something else.
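   
   To illustrate the failure mode, here is a minimal, self-contained sketch. 
It does not use Lucene at all: `defaultStyleNorm`, `customNorm` and 
`decodeAssumingDefault` are hypothetical names, and the one-byte encoding is a 
simplified stand-in, not the actual default encoding. It only shows that a 
decoder written for one encoding returns a meaningless number when the norm 
was written by a different `computeNorm()`.

```java
public class NormDecodeSketch {

    // Hypothetical stand-in for a single-byte length encoding.
    // NOT Lucene's real encoding: it just clamps the length to 0..255.
    static long defaultStyleNorm(int fieldLength) {
        return Math.min(fieldLength, 255);
    }

    // Hypothetical custom similarity: the norm is an opaque 64-bit value,
    // so nothing stops an implementation from storing a scrambled value.
    static long customNorm(int fieldLength) {
        return fieldLength * 0x9E3779B97F4A7C15L;
    }

    // Decoder that silently assumes the default-style encoding above.
    static int decodeAssumingDefault(long norm) {
        return (int) (norm & 0xFF);
    }

    public static void main(String[] args) {
        int length = 42;
        // Round-trips, because encoder and decoder agree.
        System.out.println(decodeAssumingDefault(defaultStyleNorm(length)));
        // Does not round-trip: the decoder's assumption is violated.
        System.out.println(decodeAssumingDefault(customNorm(length)));
    }
}
```

   The decoder has no way to detect the mismatch from the value alone, which is 
why the restriction has to be stated in the documentation.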
   
   Any confusion can be solved with documentation:
   * it should be clear that this only works if your similarity uses the 
default implementation of `computeNorm()`
   * I don't think PositionLength is a good name: the norm is not that (see 
discountOverlaps as an example).
   
   Also, I would ask whether we really need this `EMPTY` instance: it would be 
good to keep polymorphism under wraps.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

