github-actions[bot] commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2832817823
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
rmuir commented on code in PR #14433:
URL: https://github.com/apache/lucene/pull/14433#discussion_r2040160001
##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) {
return Smal
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2795033321
The need is to incorporate a field's position length in a
composable/flexible relevance formula. A LongValues is the way to do that. I
understand a Lucene user could write a custom Sim
rmuir commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2792476458
I don't have any suggestion; I don't see the need for users to try to
reimplement Similarity with valuesources.
--
This is an automated message from the Apache Git Service.
To respond to
dsmiley commented on code in PR #14433:
URL: https://github.com/apache/lucene/pull/14433#discussion_r2034358885
##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) {
return Sm
rmuir commented on code in PR #14433:
URL: https://github.com/apache/lucene/pull/14433#discussion_r2034192030
##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) {
return Smal
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2787942020
What name would you suggest then, Rob?
There's something to be said for choosing a name that's correct for the vast
majority of cases, even if hypothetically a Similarity might do some
rmuir commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2787922548
I think there is a high-level problem here, as I stated originally, that
norm is not any position length. For example it may be based on
`FieldInvertState.getMaxTermFrequency()` or
`Field
rmuir commented on code in PR #14433:
URL: https://github.com/apache/lucene/pull/14433#discussion_r2034196400
##
lucene/queries/src/java/org/apache/lucene/queries/function/IndexReaderFunctions.java:
##
@@ -301,6 +304,17 @@ public static DoubleValuesSource docCount(String field)
rmuir commented on code in PR #14433:
URL: https://github.com/apache/lucene/pull/14433#discussion_r2034195959
##
lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java:
##
@@ -161,6 +162,17 @@ public long computeNorm(FieldInvertState state) {
return Smal
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2784143694
Would it make sense in this PR to add a `Similarity.decodeNorm(long norm)`
returning an int of the field position length? It feels like the right thing
to add.
bruno-roustant commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777888670
Why not numTerms() instead of positionLength()?
Inside Similarity.computeNorm(), the value is named numTerms.
dsmiley opened a new pull request, #14433:
URL: https://github.com/apache/lucene/pull/14433
### Description
Introduces
`org.apache.lucene.queries.function.IndexReaderFunctions#positionLength`
Javadocs:
> Creates a value source that returns the position length (number of term
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780732429
`fieldLength` works for me. I'd like `fieldPositionLength` more as it
characterizes the basis of the length (it's not characters). BTW some other
methods on this class don't have "fiel
jpountz commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2780644329
What about calling it just "field length", since this is the length as
computed for the purpose of length normalization?
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2778653535
I'd expect a hypothetical `IndexReaderFunctions.numTerms(field)` to return
the number of terms in the index for that field. That's not even close to what
we want! "Length" should be a
dsmiley commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2777428910
Thanks for the historical context!
I can definitely add more docs; I started with the bare minimum. Definitely
need to emphasize a dependency on the default `computeNorm` formula!
rmuir commented on PR #14433:
URL: https://github.com/apache/lucene/pull/14433#issuecomment-2774215695
I think the history is just that this norm can contain an arbitrary value,
which before was a suboptimal encoding into a single byte. There was a
ValueSource that assumed it was a single byte