dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2784143694
Would it make sense in this PR to add a `Similarity.decodeNorm(long norm)`
returning an int of the field position length? It feels like the right thing
to add.
--
This is an automated message from the Apache Git Service.
bruno-roustant commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777888670
Why not numTerms() instead of positionLength()?
Inside Similarity.computeNorm(), the value is named numTerms.
dsmiley opened a new pull request, #14433:
URL: https://github.com/apache/lucene/pull/14433
### Description
Introduces
`org.apache.lucene.queries.function.IndexReaderFunctions#positionLength`
Javadocs:
> Creates a value source that returns the position length (number of term
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780732429
`fieldLength` works for me. I'd like `fieldPositionLength` more as it
characterizes the basis of the length (it's not characters). BTW some other
methods on this class don't have "fiel
jpountz commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780644329
What about calling it just "field length", since this is the length as
computed for the purpose of length normalization?
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2778653535
I'd expect a hypothetical `IndexReaderFunctions.numTerms(field)` to return
the number of terms in the index for that field. That's not even close to what
we want! "Length" should be a
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777428910
Thanks for the historical context!
I can definitely add more docs; I started with the bare minimum. Definitely
need to emphasize a dependency on the default `computeNorm` formula!
rmuir commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695
I think the history is just that this norm can contain an arbitrary value,
which was previously encoded suboptimally into a single byte. There was a
ValueSource that assumed it was a single byte