Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-26 Thread via GitHub
github-actions[bot] commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2832817823 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-11 Thread via GitHub
rmuir commented on code in PR #14433: URL: https://github.com/apache/lucene/pull/14433#discussion_r2040160001 ## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ## @@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) { return Smal

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-10 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2795033321 The need is to incorporate a field's position length in a composable/flexible relevance formula. A LongValues is the way to do that. I understand a Lucene user could write a custom Sim

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-10 Thread via GitHub
rmuir commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2792476458 I don't have any suggestion, I don't see the need for users to try to reimplement Similarity with valuesources. -- This is an automated message from the Apache Git Service. To respond to

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
dsmiley commented on code in PR #14433: URL: https://github.com/apache/lucene/pull/14433#discussion_r2034358885 ## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ## @@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) { return Sm

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
rmuir commented on code in PR #14433: URL: https://github.com/apache/lucene/pull/14433#discussion_r2034192030 ## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ## @@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) { return Smal

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2787942020 What name would you suggest then, Rob? There's something to be said for choosing a name that's correct for the vast majority of cases, even if hypothetically a Similarity might do some

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
rmuir commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2787922548 I think there is a high-level problem here, as I stated originally: the norm is not necessarily a position length. For example, it may be based on `FieldInvertState.getMaxTermFrequency()` or `Field

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
rmuir commented on code in PR #14433: URL: https://github.com/apache/lucene/pull/14433#discussion_r2034196400 ## lucene/queries/src/java/org/apache/lucene/queries/function/IndexReaderFunctions.java: ## @@ -301,6 +304,17 @@ public static DoubleValuesSource docCount(String field)

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-08 Thread via GitHub
rmuir commented on code in PR #14433: URL: https://github.com/apache/lucene/pull/14433#discussion_r2034195959 ## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ## @@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) { return Smal

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-07 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2784143694 Would it make sense in this PR to add a `Similarity.decodeNorm(long norm)` returning an int of the field position length? It feels like the right thing to add.

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
bruno-roustant commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777888670 Why not numTerms() instead of positionLength()? Inside Similarity.computeNorm(), the value is named numTerms.

[PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
dsmiley opened a new pull request, #14433: URL: https://github.com/apache/lucene/pull/14433 ### Description Introduces `org.apache.lucene.queries.function.IndexReaderFunctions#positionLength` Javadocs: > Creates a value source that returns the position length (number of term

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780732429 `fieldLength` works for me. I'd like `fieldPositionLength` more as it characterizes the basis of the length (it's not characters). BTW some other methods on this class don't have "fiel

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-05 Thread via GitHub
jpountz commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780644329 What about calling it just "field length", since this is the length as computed for the purpose of length normalization?

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-04 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2778653535 I'd expect a hypothetical `IndexReaderFunctions.numTerms(field)` to return the number of terms in the index for that field. That's not even close to what we want! "Length" should be a

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-03 Thread via GitHub
dsmiley commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777428910 Thanks for the historical context! I can definitely add more docs; I started with the bare minimum. Definitely need to emphasize a dependency on the default `computeNorm` formula!

Re: [PR] New IndexReaderFunctions.positionLength from the norm [lucene]

2025-04-02 Thread via GitHub
rmuir commented on PR #14433: URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695 I think the history is just that this norm can contain an arbitrary value, which before was a suboptimal encoding into a single byte. There was a ValueSource that assumed it was a single byte