Re: [PR] Replace Map with IntObjectHashMap for KnnVectorsReader [lucene]

2024-09-12 Thread via GitHub
bugmakerr commented on code in PR #13763: URL: https://github.com/apache/lucene/pull/13763#discussion_r1756317165 ## lucene/core/src/java/org/apache/lucene/codecs/perfield/PerFieldKnnVectorsFormat.java: ## @@ -239,51 +245,69 @@ public FieldsReader(final SegmentReadState read

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756305014 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [I] Can `gradle tidy` reflow text properly when it inserts newlines? [lucene]

2024-09-12 Thread via GitHub
dweiss commented on issue #13766: URL: https://github.com/apache/lucene/issues/13766#issuecomment-2345552152 So I just double checked and block-level comments are treated as preformatted within code. This: ``` /* * fba sdifhopi hweifh swehf doish dsopih ewfpoih efwspoihb we

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756363078 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756378484 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756385824 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756385824 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756429132 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
msfroh commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756429132 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or

Re: [PR] similarities: provide default computeNorm implementation; remove remaining discountOverlaps setters; [lucene]

2024-09-12 Thread via GitHub
cpoerschke commented on code in PR #13757: URL: https://github.com/apache/lucene/pull/13757#discussion_r1756534635 ## lucene/core/src/java/org/apache/lucene/search/similarities/DFRSimilarity.java: ## @@ -94,6 +94,26 @@ public class DFRSimilarity extends SimilarityBase { */

Re: [PR] similarities: provide default computeNorm implementation; remove remaining discountOverlaps setters; [lucene]

2024-09-12 Thread via GitHub
cpoerschke commented on code in PR #13757: URL: https://github.com/apache/lucene/pull/13757#discussion_r1756540884 ## lucene/core/src/java/org/apache/lucene/search/similarities/Similarity.java: ## @@ -111,7 +133,17 @@ protected Similarity() {} * @param state current processi

Re: [I] Can we decrease the overhead of skipping? [lucene]

2024-09-12 Thread via GitHub
jpountz commented on issue #13106: URL: https://github.com/apache/lucene/issues/13106#issuecomment-2345810711 Fixed by https://github.com/apache/lucene/pull/13585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Can we decrease the overhead of skipping? [lucene]

2024-09-12 Thread via GitHub
jpountz closed issue #13106: Can we decrease the overhead of skipping? URL: https://github.com/apache/lucene/issues/13106 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
javanna opened a new pull request, #13773: URL: https://github.com/apache/lucene/pull/13773 The order in which documents are processed is not a guarantee, hence we may return a different set of documents when early terminating the search. Those are not incorrect results though. I opt

Re: [PR] Remove usage of IndexSearcher#search(Query, Collector) from join package [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13747: URL: https://github.com/apache/lucene/pull/13747#discussion_r1756577220 ## lucene/join/src/java/org/apache/lucene/search/join/MergeableCollector.java: ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] PayloadScoreQuery javadoc update w.r.t. SpanQuery use [lucene]

2024-09-12 Thread via GitHub
cpoerschke merged PR #13731: URL: https://github.com/apache/lucene/pull/13731 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

Re: [PR] Address length used to copy array in FacetsCollector to not be out of bounds [lucene]

2024-09-12 Thread via GitHub
javanna merged PR #13774: URL: https://github.com/apache/lucene/pull/13774 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

[PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna opened a new pull request, #13775: URL: https://github.com/apache/lucene/pull/13775 There's two tests where we use 250_000 as number of collected hits, but we only ever index max 2000 docs. That makes use create a priority queue of size 250_000 for each segment partition which cause

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13775: URL: https://github.com/apache/lucene/pull/13775#discussion_r1756658422 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/LargeNumHitsTopDocsCollector.java: ## @@ -137,7 +137,7 @@ protected void populateResults(ScoreDoc[] results

Re: [I] Can we remove `compress` option for quantized KNN vector indexing? [lucene]

2024-09-12 Thread via GitHub
mikemccand commented on issue #13768: URL: https://github.com/apache/lucene/issues/13768#issuecomment-2346033320 Well, I ran [`knnPerfTest.py`](https://github.com/mikemccand/luceneutil/blob/f4a07ed8de36c47aacb6033a3709e236bc42aca4/src/python/knnPerfTest.py) on my Linux dev box (x86-64 Rapto

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13773: URL: https://github.com/apache/lucene/pull/13773#issuecomment-2346043431 > Those are not incorrect results though. I think that these are incorrect results. Inter-segment concurrency has a mechanism so that if the top-N-th hit has score S and doc ID D,

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13775: URL: https://github.com/apache/lucene/pull/13775#issuecomment-2346049667 A property of this collector is that it's supposed to allocate RAM in order of the actual number of collected hits rather than the max number of hits to retrieve. So this change defeats

Re: [I] Can we remove `compress` option for quantized KNN vector indexing? [lucene]

2024-09-12 Thread via GitHub
mikemccand commented on issue #13768: URL: https://github.com/apache/lucene/issues/13768#issuecomment-2346091188 OK I disabled Panama (via temporary code change in `VectorizationProvider.java` -- we don't accept sysprops to disable this anymore right?): ``` recall latency (ms)

Re: [PR] Integrate merge-time index reordering with the intra-merge executor. [lucene]

2024-09-12 Thread via GitHub
jpountz merged PR #13289: URL: https://github.com/apache/lucene/pull/13289 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Migrate more classes to records. [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on code in PR #13772: URL: https://github.com/apache/lucene/pull/13772#discussion_r1756801465 ## lucene/core/src/java/org/apache/lucene/util/TermAndVector.java: ## @@ -24,37 +24,24 @@ * * @lucene.experimental */ -public class TermAndVector { - - pri

Re: [I] Can we remove `compress` option for quantized KNN vector indexing? [lucene]

2024-09-12 Thread via GitHub
ChrisHegarty commented on issue #13768: URL: https://github.com/apache/lucene/issues/13768#issuecomment-2346214427 > But I don't think we should block removing compress option due to non-SIMD results? I agree. At this point we're just comparing the scalar and SIMD implementation of t

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on code in PR #12868: URL: https://github.com/apache/lucene/pull/12868#discussion_r1756832910 ## lucene/core/src/java/org/apache/lucene/util/StringHelper.java: ## @@ -209,6 +209,156 @@ public static int murmurhash3_x86_32(BytesRef bytes, int seed) {

Re: [PR] Try using Murmurhash 3 for bloom filters [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on code in PR #12868: URL: https://github.com/apache/lucene/pull/12868#discussion_r1756836548 ## lucene/codecs/src/java/org/apache/lucene/codecs/bloom/FuzzySet.java: ## @@ -150,9 +150,10 @@ private FuzzySet(FixedBitSet filter, int bloomSize, int hashCount

Re: [PR] similarities: provide default computeNorm implementation; remove remaining discountOverlaps setters; [lucene]

2024-09-12 Thread via GitHub
rmuir commented on PR #13757: URL: https://github.com/apache/lucene/pull/13757#issuecomment-2346252096 I took a stab at improving the docs in https://github.com/apache/lucene/pull/13757/commits/47d4fa02f3c52fffc5aac9328a1de9c0e640ff27 -- This is an automated message from the Apache Git Se

Re: [PR] Migrate more classes to records. [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13772: URL: https://github.com/apache/lucene/pull/13772#discussion_r1756843565 ## lucene/core/src/java/org/apache/lucene/util/TermAndVector.java: ## @@ -24,37 +24,24 @@ * * @lucene.experimental */ -public class TermAndVector { - - private

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13775: URL: https://github.com/apache/lucene/pull/13775#issuecomment-2346263866 Thanks for clarifying! But then is it a feature that it goes out of memory now? Especially given that it's a side by side comparison, and what causes the OOM is the ordinary top d

Re: [PR] Simplify FST return [lucene]

2024-09-12 Thread via GitHub
jpountz merged PR #13771: URL: https://github.com/apache/lucene/pull/13771 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13775: URL: https://github.com/apache/lucene/pull/13775#issuecomment-2346271119 > Or maybe it makes sense to lower the threshold only for the top docs collector part, +1 to that, let's pass n=reader.numDocs()? -- This is an automated message from the Apache

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13773: URL: https://github.com/apache/lucene/pull/13773#issuecomment-2346282034 Happy that you commented, thanks! Can you help me better understand why these are incorrect results? ``` Error Message: java.lang.AssertionError: Hit 0 docnumbers don't

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13775: URL: https://github.com/apache/lucene/pull/13775#issuecomment-2346289887 > +1 to that, let's pass n=reader.numDocs()? Sure, I made that change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] Introduce ProfilerCollectorManager [lucene]

2024-09-12 Thread via GitHub
javanna merged PR #13746: URL: https://github.com/apache/lucene/pull/13746 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13773: URL: https://github.com/apache/lucene/pull/13773#issuecomment-2346309641 TopScoreDocCollectorManager is supposed to return the top-N results that have the best score, tie-broken by doc ID. So (docID=0, score=1) compares better than (docID=155, score=1) and sh

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13775: URL: https://github.com/apache/lucene/pull/13775#discussion_r1756898252 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/LargeNumHitsTopDocsCollector.java: ## @@ -137,7 +137,7 @@ protected void populateResults(ScoreDoc[] results

Re: [PR] Further reduce the search concurrency overhead. [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13606: URL: https://github.com/apache/lucene/pull/13606#discussion_r1756897923 ## lucene/core/src/java/org/apache/lucene/search/HitsThresholdChecker.java: ## @@ -24,6 +24,10 @@ abstract class HitsThresholdChecker { /** Implementation of HitsTh

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
javanna closed pull request #13773: Fix TestPrefixRandom failures when intra-segment concurrency is enabled URL: https://github.com/apache/lucene/pull/13773 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13773: URL: https://github.com/apache/lucene/pull/13773#issuecomment-2346327496 Understood, thanks for the explanation, I did not realize doc ids played a role, I thought it was an incorrect assumption and it worked with inter-segment concurrency only by coincidence

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13775: URL: https://github.com/apache/lucene/pull/13775#discussion_r1756915409 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/LargeNumHitsTopDocsCollector.java: ## @@ -137,7 +137,7 @@ protected void populateResults(ScoreDoc[] results

Re: [PR] Further reduce the search concurrency overhead. [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13606: URL: https://github.com/apache/lucene/pull/13606#discussion_r1756914836 ## lucene/core/src/java/org/apache/lucene/search/HitsThresholdChecker.java: ## @@ -24,6 +24,10 @@ abstract class HitsThresholdChecker { /** Implementation of HitsTh

Re: [PR] Fix TestShapeDocValues.testLatLonPolygonBBox [lucene]

2024-09-12 Thread via GitHub
msokolov commented on PR #13743: URL: https://github.com/apache/lucene/pull/13743#issuecomment-2346349295 Should we have an epsilon here rather than exact comparison with 0? Usually that's the way of things with floating point checks - we would check |a-b| < epsilon -- This is an automat

Re: [PR] Lower number of hits in TestLargeNumHitsTopDocsCollector [lucene]

2024-09-12 Thread via GitHub
javanna merged PR #13775: URL: https://github.com/apache/lucene/pull/13775 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] similarities: provide default computeNorm implementation; remove remaining discountOverlaps setters; [lucene]

2024-09-12 Thread via GitHub
cpoerschke commented on PR #13757: URL: https://github.com/apache/lucene/pull/13757#issuecomment-2346451914 > I took a stab at improving the docs in [47d4fa0](https://github.com/apache/lucene/commit/47d4fa02f3c52fffc5aac9328a1de9c0e640ff27) Thanks! -- This is an automated message f

Re: [PR] Further reduce the search concurrency overhead. [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13606: URL: https://github.com/apache/lucene/pull/13606#discussion_r1757003986 ## lucene/core/src/java/org/apache/lucene/search/HitsThresholdChecker.java: ## @@ -24,6 +24,10 @@ abstract class HitsThresholdChecker { /** Implementation of HitsTh

[PR] Make HnswLock and LockedRow final [lucene]

2024-09-12 Thread via GitHub
ChrisHegarty opened a new pull request, #13776: URL: https://github.com/apache/lucene/pull/13776 Trivial commit that makes HnswLock final and LockedRow a record. This is general clean up and helps the JIT a little when reasoning about these types - which show quite a bit in indexing and sea

Re: [PR] Fix TestShapeDocValues.testLatLonPolygonBBox [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on PR #13743: URL: https://github.com/apache/lucene/pull/13743#issuecomment-2346624909 Yes, usually we do that but in this case there is a catch. This test ONLY fails when the generated points are non collinear(as expected) but the `area == 0` ([checked in this condi

Re: [PR] Fix TestShapeDocValues.testLatLonPolygonBBox [lucene]

2024-09-12 Thread via GitHub
stefanvodita commented on code in PR #13743: URL: https://github.com/apache/lucene/pull/13743#discussion_r1757126653 ## lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java: ## @@ -255,4 +257,9 @@ private List getTessellation(XYPolygon p) { } return

Re: [PR] Add unit-of-least-precision float comparison [lucene]

2024-09-12 Thread via GitHub
stefanvodita commented on PR #13723: URL: https://github.com/apache/lucene/pull/13723#issuecomment-2346657253 Thank you for the feedback! I've added a comparison method for doubles and a test. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Fix TestPrefixRandom failures when intra-segment concurrency is enabled [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13773: URL: https://github.com/apache/lucene/pull/13773#issuecomment-2346773435 I opened #13777 . It needs some proper review because I am not entirely sure what side effects it may have. There is currently no direct link between leaf collector and scorers, to pass

Re: [PR] Fix TestShapeDocValues.testLatLonPolygonBBox [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on code in PR #13743: URL: https://github.com/apache/lucene/pull/13743#discussion_r1757244198 ## lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java: ## @@ -255,4 +257,9 @@ private List getTessellation(XYPolygon p) { } return

Re: [PR] Fix TestShapeDocValues.testLatLonPolygonBBox [lucene]

2024-09-12 Thread via GitHub
shubhamvishu commented on code in PR #13743: URL: https://github.com/apache/lucene/pull/13743#discussion_r1757244198 ## lucene/core/src/test/org/apache/lucene/document/TestShapeDocValues.java: ## @@ -255,4 +257,9 @@ private List getTessellation(XYPolygon p) { } return

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13777: URL: https://github.com/apache/lucene/pull/13777#issuecomment-2346813650 I am not getting many test failures, which is surprising, because I think that this is entirely the wrong fix, in that it defeats the early termination purpose of top score docs collect

Re: [I] Can we remove `compress` option for quantized KNN vector indexing? [lucene]

2024-09-12 Thread via GitHub
benwtrent commented on issue #13768: URL: https://github.com/apache/lucene/issues/13768#issuecomment-2346877986 If we are ok with the perf hit on non-panama, I am cool with it :). It will definitely simplify the code. -- This is an automated message from the Apache Git Service. To respond

Re: [PR] Add unit-of-least-precision float comparison [lucene]

2024-09-12 Thread via GitHub
uschindler commented on PR #13723: URL: https://github.com/apache/lucene/pull/13723#issuecomment-2346992406 I don't like the last commit because it changes from a assert-like method to a boolean returning method. Could we not keep the previous method signature and still add a test?

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-09-12 Thread via GitHub
vigyasharma commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2346995734 > Is "default run" from this PR? No. "default run" is knn search where each embedding is a separate document with no relationship between them. I'm still wiring things up to se

[PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov opened a new pull request, #13779: URL: https://github.com/apache/lucene/pull/13779 addresses https://github.com/apache/lucene/issues/13778 Key things in this PR: 1. Introduces abstract `KnnVectorValues` from which `ByteVectorValues` and `FloatVectorValues` derive;

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2024-09-12 Thread via GitHub
vigyasharma commented on code in PR #13525: URL: https://github.com/apache/lucene/pull/13525#discussion_r1757395276 ## lucene/core/src/java/org/apache/lucene/index/MultiVectorSimilarityFunction.java: ## @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2347012727 another concern I have is how this would impact ongoing work to enable multiple vectors per doc/field. There would almost certainly be conflicts with that PR on the surface, but I hope

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13777: URL: https://github.com/apache/lucene/pull/13777#discussion_r1757465545 ## lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java: ## @@ -254,10 +254,9 @@ protected void updateMinCompetitiveScore(Scorable scorer) throws I

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13777: URL: https://github.com/apache/lucene/pull/13777#issuecomment-2347079637 > I think that this is entirely the wrong fix Why is it the wrong fix? This looks correct to me: we want to consider docs with equal or greater scores if the doc ID is less than th

[PR] Remove IndexSearcher#search(List, Weight, Collector) [lucene]

2024-09-12 Thread via GitHub
javanna opened a new pull request, #13780: URL: https://github.com/apache/lucene/pull/13780 With the introduction of intra-segment concurrency, we have introduced a new protected search(LeafReaderContextPartition, Weight, Collector) method. The previous variant that accepts a list of leaf r

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13777: URL: https://github.com/apache/lucene/pull/13777#discussion_r1757481284 ## lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java: ## @@ -254,10 +254,9 @@ protected void updateMinCompetitiveScore(Scorable scorer) throws I

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13777: URL: https://github.com/apache/lucene/pull/13777#issuecomment-2347094763 > Why is it the wrong fix? Happy to be proven wrong, especially when I believe that my fix is wrong. ahah I was under the impression that we may no longer early terminate if

Re: [PR] Migrate more classes to records. [lucene]

2024-09-12 Thread via GitHub
jpountz merged PR #13772: URL: https://github.com/apache/lucene/pull/13772 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Make HnswLock and LockedRow final [lucene]

2024-09-12 Thread via GitHub
ChrisHegarty merged PR #13776: URL: https://github.com/apache/lucene/pull/13776 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucen

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
benwtrent commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2347102278 > but I hope this could actually simplify things That is my intuition as well. -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13777: URL: https://github.com/apache/lucene/pull/13777#issuecomment-2347103042 > A test failure on TestConstantScoreScorer seems to show that pretty clearly. Ok, that was caused by using the docBase when already included in the doc id. Phew. This looks to be

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on code in PR #13777: URL: https://github.com/apache/lucene/pull/13777#discussion_r1757494555 ## lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollector.java: ## @@ -232,7 +232,7 @@ protected void updateGlobalMinCompetitiveScore(Scorable scorer) thr

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna commented on PR #13777: URL: https://github.com/apache/lucene/pull/13777#issuecomment-2347172996 This should be good to go now :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757529511 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2347207782 Am guessing correctly that you're targeting 10.0 for this change? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757568397 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on PR #13779: URL: https://github.com/apache/lucene/pull/13779#issuecomment-2347232511 Thanks for the quick review! I will get started on addressing. As for timeline for this change, it would definitely be convenient to get in to 10.0 release. I think you had said 9/22 wo

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757605144 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757605492 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757605699 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Provide a custom hash implementation in HnswLock [lucene]

2024-09-12 Thread via GitHub
msokolov commented on PR #13751: URL: https://github.com/apache/lucene/pull/13751#issuecomment-2347296049 thank you @ChrisHegarty ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific co

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
msokolov commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1757672091 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add unit-of-least-precision float comparison [lucene]

2024-09-12 Thread via GitHub
stefanvodita commented on PR #13723: URL: https://github.com/apache/lucene/pull/13723#issuecomment-2347389307 I changed it away from an assertion because I liked this more. It makes it so you can assert on floats *not* being equal or use their equality in a condition, without making an asse

[PR] Use Arrays.compareUnsigned instead of iterating compare in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 opened a new pull request, #13781: URL: https://github.com/apache/lucene/pull/13781 ### Description This change similars to https://github.com/apache/lucene/pull/13252. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

Re: [PR] Use Arrays.compareUnsigned instead of iterating compare in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 closed pull request #13781: Use Arrays.compareUnsigned instead of iterating compare in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. URL: https://github.com/apache/lucene/pull/13781 -- This is an automated message from the Apache Git Service. To respond to the message, please l

[PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 opened a new pull request, #13782: URL: https://github.com/apache/lucene/pull/13782 ### Description Description This change similars to https://github.com/apache/lucene/pull/13252. -- This is an automated message from the Apache Git Service. To respond to the message

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2347950684 @uschindler Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13782: URL: https://github.com/apache/lucene/pull/13782#discussion_r1758190884 ## lucene/CHANGES.txt: ## @@ -292,6 +292,8 @@ Build API Changes - +* GITHUB#13782: Replace handwritten loops compare with Arrays.compareUnsigne

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1758191856 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1758192953 ## lucene/core/src/java/org/apache/lucene/index/KnnVectorValues.java: ## @@ -0,0 +1,281 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Add prefetching support to term vectors. [lucene]

2024-09-12 Thread via GitHub
jpountz merged PR #13758: URL: https://github.com/apache/lucene/pull/13758 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] First-class random access API for KnnVectorValues [lucene]

2024-09-12 Thread via GitHub
jpountz commented on code in PR #13779: URL: https://github.com/apache/lucene/pull/13779#discussion_r1758220291 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -361,33 +385,46 @@ private MergedByteVectorValues(List subs, MergeState mergeS

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 commented on code in PR #13782: URL: https://github.com/apache/lucene/pull/13782#discussion_r1758230120 ## lucene/CHANGES.txt: ## @@ -292,6 +292,8 @@ Build API Changes - +* GITHUB#13782: Replace handwritten loops compare with Arrays.compareUnsign

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
vsop-479 commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2348109781 @jpountz I think this check failure relats to https://github.com/apache/lucene/commit/5045d3c67b18d23c534a32cee1d95827b0b7c482 . -- This is an automated message from the Apache G

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-09-12 Thread via GitHub
jpountz commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2348146820 Indeed it likely does, can you merge main back into your branch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] Replace docBase with actual docId in MaxScoreAccumulator [lucene]

2024-09-12 Thread via GitHub
javanna merged PR #13777: URL: https://github.com/apache/lucene/pull/13777 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa