Re: [I] Test failure in TestBlockMaxConjunction.testRandom. [lucene]

2024-05-20 Thread via GitHub
vsop-479 commented on issue #13396: URL: https://github.com/apache/lucene/issues/13396#issuecomment-2121851897 @jpountz Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[I] Test failure in TestBlockMaxConjunction.testRandom. [lucene]

2024-05-20 Thread via GitHub
vsop-479 opened a new issue, #13396: URL: https://github.com/apache/lucene/issues/13396 ### Description Test failure in `TestBlockMaxConjunction.testRandom`. ### Gradle command to reproduce gradlew test --tests TestBlockMaxConjunction.testRandom -Dtests.seed=991AA33DE604

[PR] Delete all live docs when query matched a whole segment. [lucene]

2024-05-20 Thread via GitHub
vsop-479 opened a new pull request, #13395: URL: https://github.com/apache/lucene/pull/13395 ### Description We can delete all live docs when a query matches a whole segment, in `FrozenBufferedUpdates.applyQueryDeletes`. I have implemented it for `PointRangeQuery`, and also tryi

Re: [PR] Fix IndexOutOfBoundsException thrown in DefaultPassageFormatter by unordered matches [lucene]

2024-05-20 Thread via GitHub
github-actions[bot] commented on PR #13315: URL: https://github.com/apache/lucene/pull/13315#issuecomment-2121472564 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-20 Thread via GitHub
navneet1v commented on PR #13394: URL: https://github.com/apache/lucene/pull/13394#issuecomment-2121465615 cc: @benwtrent , @msokolov

[PR] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-20 Thread via GitHub
navneet1v opened a new pull request, #13394: URL: https://github.com/apache/lucene/pull/13394 ### Description Add support for reloading the SPI for KnnVectorsFormat class Ref: https://github.com/apache/lucene/issues/13393

Re: [I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-20 Thread via GitHub
navneet1v commented on issue #13393: URL: https://github.com/apache/lucene/issues/13393#issuecomment-2121465417 I have raised a PR for the fix: https://github.com/apache/lucene/pull/13394

[I] Add support for reloading the SPI for KnnVectorsFormat class [lucene]

2024-05-20 Thread via GitHub
navneet1v opened a new issue, #13393: URL: https://github.com/apache/lucene/issues/13393 ### Description Lucene uses SPI to get instances of various classes such as Codec, KnnVectorsFormat, etc. Currently the Codec class provides a way to reload the SPIs by provid

Re: [I] Significant drop in recall for int8 scalar quantization using maximum_inner_product [lucene]

2024-05-20 Thread via GitHub
naveentatikonda commented on issue #13350: URL: https://github.com/apache/lucene/issues/13350#issuecomment-2121286374 @benwtrent To be on the same page, can you please confirm if you have used 7 bits or 8 bits in your experiment that you ran above with cohere dataset using InnerProduct to g

Re: [PR] Replace Map by primitive LongObjectHashMap. [lucene]

2024-05-20 Thread via GitHub
dweiss commented on code in PR #13392: URL: https://github.com/apache/lucene/pull/13392#discussion_r1607243701 ## lucene/join/src/java/module-info.java: ## @@ -16,8 +16,10 @@ */ /** Index-time and Query-time joins for normalized content */ +@SuppressWarnings({"requires-auto

[PR] Replace Map by primitive LongObjectHashMap. [lucene]

2024-05-20 Thread via GitHub
bruno-roustant opened a new pull request, #13392: URL: https://github.com/apache/lucene/pull/13392 No functional changes, only replacements by primitive maps. Adds LongObjectHashMap and LongIntHashMap to the org.apache.lucene.util.hppc package, with some refactoring. Adds a depe

Re: [I] qweight.matches(LeafReaderContext ctx, int doc) can be prohibitively slow for large TermInSet queries [lucene]

2024-05-20 Thread via GitHub
dweiss commented on issue #13391: URL: https://github.com/apache/lucene/issues/13391#issuecomment-2120947487 Perhaps this wasn't clear - the important bit here is the use of TermInSetQuery (the query parser substitutes this type of query for large boolean expressions to prevent max-boolean-c

[I] qweight.matches(LeafReaderContext ctx, int doc) can be prohibitively slow for large TermInSet queries [lucene]

2024-05-20 Thread via GitHub
dweiss opened a new issue, #13391: URL: https://github.com/apache/lucene/issues/13391 ### Description I stumbled across this one in a real-life application, where matches-API based highlighting of a query like this: field:(a OR b OR c OR d OR ...) took very long to compl

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120877177 > > The method is only intended to be called by Lucene code. > > Then it should not be a public API. > > Agree with comment by @romseygeek that since DisjunctionIntervalsS

Re: [PR] Convert more classes to record classes [lucene]

2024-05-20 Thread via GitHub
shubhamvishu commented on PR #13328: URL: https://github.com/apache/lucene/pull/13328#issuecomment-2120851082 Hi @uschindler , sorry I was on vacation until last week, so this PR stalled. I'll take a look at the comments today or tomorrow.

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120850127 >The method is only intended to be called by Lucene code. Then it should not be a public API.

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120718603 >It's internal and not for public use? It's [marked public](https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/intervals/Intervals.java#L51

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120709541 Why should anybody call that method. It's internal and not for public use? Those types of changes are common in Lucene. We change behavior of methods unless it's documented to be

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120691885 My main concern was the public API `Intervals#analyzedText`. Clients of Lucene were getting a non-null iterator until now and will receive a null iterator after this change.

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120667481 > * Lot of [interval functions](https://github.com/apache/lucene/tree/main/lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/nodes/intervalfn) ( around 21 us

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120650022 Returning `null` is always OK here - if you look at ConjunctionIntervalsSource or DisjunctionIntervalsSource, the sub-sources can always return `null` to indicate that there are no in

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120639358 - Lot of [interval functions](https://github.com/apache/lucene/tree/main/lucene/queryparser/src/java/org/apache/lucene/queryparser/flexible/standard/nodes/intervalfn) ( around 21 usage )

[PR] Refactoring ShingleFilter: Final Fields, Builder Pattern, and Constructor Update [lucene]

2024-05-20 Thread via GitHub
iamsanjay opened a new pull request, #13390: URL: https://github.com/apache/lucene/pull/13390 ### Description #13112 Made fields final and removed their respective setters. Introduced the Builder pattern to facilitate the construction of ShingleFilter instances. All previous

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
uschindler commented on code in PR #13389: URL: https://github.com/apache/lucene/pull/13389#discussion_r1606861463 ## lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalBuilder.java: ## @@ -67,15 +67,15 @@ */ final class IntervalBuilder { static IntervalsS

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13389: URL: https://github.com/apache/lucene/pull/13389#issuecomment-2120600483 Just some tidy missing. 😜

Re: [PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
expani commented on code in PR #13389: URL: https://github.com/apache/lucene/pull/13389#discussion_r1606859997 ## lucene/queries/src/java/org/apache/lucene/queries/intervals/IntervalBuilder.java: ## @@ -67,15 +67,15 @@ */ final class IntervalBuilder { static IntervalsSourc

Re: [I] Add method to `Intervals#noMatch(String reason)` to `Intervals` class [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on issue #13388: URL: https://github.com/apache/lucene/issues/13388#issuecomment-2120568422 I opened #13389

[PR] Add Intervals.noIntervals() method [lucene]

2024-05-20 Thread via GitHub
romseygeek opened a new pull request, #13389: URL: https://github.com/apache/lucene/pull/13389 (no comment)

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120514592 This problem is often seen with patent search. I opened https://github.com/apache/lucene/issues/13388 There are more issues, but making that constant publicly available is wrong

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120497218 See #13388

[I] Add method to `Intervals#noMatch(String reason)` to `Intervals` class [lucene]

2024-05-20 Thread via GitHub
uschindler opened a new issue, #13388: URL: https://github.com/apache/lucene/issues/13388 ### Description Followup of #13385: While reviewing #13385 I stumbled on the same issue I had several times while building an intervals query parser for patent search (a Solr plugin for a

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120480997 OK. Just for brevity: How does the one made public here differ from the already existing class (except that one returns null iterator while the other one returns an empty iterator)?

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120479350 Let's open a new issue and we can discuss a bit more about when you need such a thing and where it should live.

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
uschindler commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120476676 Should I open a new issue, or should we revert this one and give a better solution?

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
romseygeek merged PR #13385: URL: https://github.com/apache/lucene/pull/13385

Re: [I] Inconsistency Vector Search Cosine Similarity [lucene]

2024-05-20 Thread via GitHub
msokolov closed issue #13386: Inconsistency Vector Search Cosine Similarity URL: https://github.com/apache/lucene/issues/13386

Re: [I] Inconsistency Vector Search Cosine Similarity [lucene]

2024-05-20 Thread via GitHub
msokolov commented on issue #13386: URL: https://github.com/apache/lucene/issues/13386#issuecomment-2120370226 This is to be expected from *approximate* KNN search. If you want to get a sense of the accuracy you need to look at a larger number of results in aggregate rather than a single ex

Re: [I] Multi range traversal for numeric range aggregations [lucene]

2024-05-20 Thread via GitHub
stefanvodita commented on issue #13335: URL: https://github.com/apache/lucene/issues/13335#issuecomment-2120336275 It's true that we have this two-step process for aggregations (incl. counts) and that it's not always the most efficient solution. +1 to try out this optimisation, sounds pro

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
expani commented on code in PR #13385: URL: https://github.com/apache/lucene/pull/13385#discussion_r1606657816 ## lucene/CHANGES.txt: ## @@ -362,6 +362,8 @@ Other * GITHUB#13077: Add public getter for SynonymQuery#field (Andrey Bozhko) +* GITHUB#13385: Make NO_INTERVALS sou

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120275366 Added @romseygeek

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on code in PR #13385: URL: https://github.com/apache/lucene/pull/13385#discussion_r1606623194 ## lucene/CHANGES.txt: ## @@ -362,6 +362,8 @@ Other * GITHUB#13077: Add public getter for SynonymQuery#field (Andrey Bozhko) +* GITHUB#13385: Make NO_INTERVALS

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2120241068 @jainankitk @romseygeek Requesting an approval.

[I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-05-20 Thread via GitHub
RS146BIJAY opened a new issue, #13387: URL: https://github.com/apache/lucene/issues/13387 ### Description ## Issue Today, Lucene internally creates multiple DocumentWriterPerThread (DWPT) instances per shard to facilitate concurrent indexing across different ingestion threads.

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2119987080 We can backport this to 9x so add the entry under the latest 9x version.

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
expani commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2119945093 @romseygeek Should I add only under `Lucene 10.0.0` or other versions as well ?

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
romseygeek commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2119913543 LGTM. Can you add an entry to CHANGES.txt?

Re: [PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
jainankitk commented on PR #13385: URL: https://github.com/apache/lucene/pull/13385#issuecomment-2119856228 Makes sense for IntervalBuilder clients to use the existing IntervalsSource implementation. Also, safe to make public given the variable is static final. Approved!

[PR] Making NO_INTERVALS to be used by clients of Lucene [lucene]

2024-05-20 Thread via GitHub
expani opened a new pull request, #13385: URL: https://github.com/apache/lucene/pull/13385 ### Description A [bug](https://github.com/opensearch-project/OpenSearch/issues/13616) was reported in OpenSearch which caused Interval Queries containing sub-query in rules to fail with `Sub-itera

Re: [PR] Disjunction as CompetitiveIterator for numeric dynamic pruning [lucene]

2024-05-20 Thread via GitHub
gf2121 merged PR #13221: URL: https://github.com/apache/lucene/pull/13221