Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-07 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2784143694 Would it make sense in this PR to add a `Similarity.decodeNorm(long norm)` returning an int of the field position length? It feels like the right thing to add.

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
bruno-roustant commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777888670 Why not numTerms() instead of positionLength()? Inside Similarity.computeNorm(), the value is named numTerms. -- This is an automated message from the Apache Git Service. To r

[PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
dsmiley opened a new pull request, #14433: URL: https://github.com/apache/lucene/pull/14433 ### Description Introduces `org.apache.lucene.queries.function.IndexReaderFunctions#positionLength` Javadocs: > Creates a value source that returns the position length (number of term

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780732429 `fieldLength` works for me. I'd like `fieldPositionLength` more as it characterizes the basis of the length (it's not characters). BTW some other methods on this class don't have "fiel

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
jpountz commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780644329 What about calling it just "field length", since this is the length as computed for the purpose of length normalization?

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-04 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2778653535 I'd expect a hypothetical `IndexReaderFunctions.numTerms(field)` to return the number of terms in the index for that field. That's not even close to what we want! "Length" should be a

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-03 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777428910 Thanks for the historical context! I can definitely add more docs; I started with the bare minimum. Definitely need to emphasize a dependency on the default `computeNorm` formula!

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-02 Thread via GitHub
rmuir commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695 I think the history is just that this norm can contain arbitrary value, which before was a suboptimal encoding into a single byte. There was a ValueSource that assumed it was a single byte