[PR] Avoid unnecessary evaluations and skipping documents [lucene]

2025-02-26 Thread via GitHub
hanbj opened a new pull request, #14301: URL: https://github.com/apache/lucene/pull/14301 ### Description While reading the code related to the sorting optimization in NumericComparator, I noticed that replacing the fixed value 0x1f with 'currentSkipInterval-1' can improve the performance of merging pos
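The proposed change relies on a standard bitmask identity: for a power-of-two interval n, `x & (n - 1)` equals `x % n`. A mask of `currentSkipInterval - 1` therefore follows the adaptive interval, while a hard-coded `0x1f` always samples every 32nd call. The following is a standalone sketch under that assumption (power-of-two intervals only); `SkipSampler`, `shouldRun` and `fixedMask` are hypothetical names, not code from NumericComparator.

```java
/**
 * Hypothetical sketch of bitmask-based sampling. Not Lucene's NumericComparator code;
 * it only shows why a mask of (interval - 1) tracks the interval while a fixed mask
 * such as 0x1f always samples every 32nd call.
 */
public class SkipSampler {
  private int counter = 0;

  /** Returns true once every {@code interval} calls; interval must be a power of two. */
  public boolean shouldRun(int interval) {
    if (interval <= 0 || Integer.bitCount(interval) != 1) {
      throw new IllegalArgumentException("interval must be a power of two: " + interval);
    }
    counter++;
    // For a power of two, (counter & (interval - 1)) == counter % interval.
    return (counter & (interval - 1)) == 0;
  }

  /** The same check with a fixed mask, equivalent to a constant interval of 32. */
  public static boolean fixedMask(int counter) {
    return (counter & 0x1f) == 0;
  }

  public static void main(String[] args) {
    SkipSampler sampler = new SkipSampler();
    int adaptiveRuns = 0;
    for (int i = 0; i < 256; i++) {
      if (sampler.shouldRun(128)) {
        adaptiveRuns++;
      }
    }
    int fixedRuns = 0;
    for (int c = 1; c <= 256; c++) {
      if (fixedMask(c)) {
        fixedRuns++;
      }
    }
    System.out.println("adaptive mask fired " + adaptiveRuns + " times");
    System.out.println("fixed 0x1f mask fired " + fixedRuns + " times");
  }
}
```

Over 256 calls with an interval of 128, the adaptive mask fires twice, whereas the fixed mask fires eight times regardless of the interval.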

Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-26 Thread via GitHub
gf2121 commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1972872831 ## lucene/core/src/java/org/apache/lucene/search/TermQuery.java: ## @@ -165,6 +165,17 @@ public Scorer get(long leadCost) throws IOException { } }

Re: [I] Improve documentation for org.apache.lucene.search Sort class [lucene]

2025-02-26 Thread via GitHub
msokolov commented on issue #14295: URL: https://github.com/apache/lucene/issues/14295#issuecomment-2686570238 I took a look at those docs and found a broken link from "[Lucene Wiki IR references](http://wiki.apache.org/lucene-java/InformationRetrieval)" on the oal.search package javadocs p

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-02-26 Thread via GitHub
sgup432 commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2686285414 >Why do you find it sad? It has more to do with the timing, as I was personally looking into improving the query cache performance. 😁 I think it's still pretty useful if done wel

Re: [PR] Disable the query cache by default. [lucene]

2025-02-26 Thread via GitHub
sgup432 commented on PR #14187: URL: https://github.com/apache/lucene/pull/14187#issuecomment-2686276505 >I agree with the judgment, but maybe this just indicates we need to improve the cache! Agree here that we need to improve the cache itself. I had opened an issue [here](https://g

Re: [PR] Add support for querying multiple fields to QueryBuilder. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on PR #14262: URL: https://github.com/apache/lucene/pull/14262#issuecomment-2686269558 After discussing with @rmuir on options when fields don't have the same analyzer, index options, etc. I'm leaning towards only supporting BM25F for now. There is simply no good way of sc
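In its simplest form, BM25F folds several fields into one pseudo-field by summing weighted term frequencies and weighted field lengths, then applies the BM25 saturation function once. The following is a minimal, self-contained sketch of that formula with assumed parameters (k1 = 1.2, b = 0.75 in the usage); `Bm25fSketch` and `FieldStats` are hypothetical and this is not Lucene's implementation (Lucene's sandbox `CombinedFieldQuery` is the existing BM25F-style query). It uses the classic form with a (k1 + 1) numerator factor, which only rescales scores.

```java
import java.util.List;

/**
 * Simplified BM25F sketch: per-field term frequencies and lengths are combined with
 * field weights into one pseudo-field before applying BM25 saturation.
 * Illustrative only; not Lucene's CombinedFieldQuery or BM25Similarity code.
 */
public class Bm25fSketch {
  /** Statistics of one field for one document (hypothetical record). */
  public record FieldStats(float weight, int termFreq, int fieldLength, double avgFieldLength) {}

  private final double k1;
  private final double b;

  public Bm25fSketch(double k1, double b) {
    this.k1 = k1;
    this.b = b;
  }

  /** IDF in the form ln(1 + (N - n + 0.5) / (n + 0.5)). */
  public static double idf(long docCount, long docFreq) {
    return Math.log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
  }

  public double score(double idf, List<FieldStats> fields) {
    double tf = 0, length = 0, avgLength = 0;
    for (FieldStats f : fields) {
      // Weighted sums turn several fields into a single pseudo-field.
      tf += f.weight() * f.termFreq();
      length += f.weight() * f.fieldLength();
      avgLength += f.weight() * f.avgFieldLength();
    }
    if (tf == 0) {
      return 0; // the term does not occur in any of the fields
    }
    double norm = k1 * (1 - b + b * length / avgLength);
    return idf * tf * (k1 + 1) / (tf + norm);
  }
}
```

With a title weight of 2, a single title occurrence contributes a pseudo term frequency of 2, so it outscores a single body occurrence of the same term when the lengths are otherwise equal.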

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-02-26 Thread via GitHub
jpountz commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2686261007 Why do you find it sad? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-02-26 Thread via GitHub
sgup432 commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2686255847 >OK. Would you like to open a PR? Sure. >For reference, I have been separately looking into reducing the importance of the cache for good query performance and plan on

Re: [I] Allow skip_factor to be set dynamically within QueryCache [lucene]

2025-02-26 Thread via GitHub
jpountz commented on issue #14183: URL: https://github.com/apache/lucene/issues/14183#issuecomment-2686219104 OK. Would you like to open a PR? For reference, I have been separately looking into reducing the importance of the cache for good query performance and plan on making it disab
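Assuming, as the issue title suggests, that the skip factor is currently fixed when the cache is constructed, one way to make it dynamic is to keep it in a volatile field that searcher threads read and to validate every update. The following is a hypothetical sketch (`DynamicSkipPolicy`, `setSkipCacheFactor` and `shouldSkipCaching` are not Lucene APIs); its skip rule, which skips caching when the clause cost divided by the factor exceeds the lead cost, is only loosely modeled on `LRUQueryCache`'s `skipCacheFactor` and should be checked against the actual source.

```java
/**
 * Hypothetical sketch of a caching policy whose skip factor can be changed at runtime.
 * Not Lucene's LRUQueryCache API; the skip rule is only loosely modeled on it.
 */
public class DynamicSkipPolicy {
  // volatile so that searcher threads see updates made from another thread
  private volatile float skipCacheFactor;

  public DynamicSkipPolicy(float skipCacheFactor) {
    setSkipCacheFactor(skipCacheFactor);
  }

  /** Hypothetical validation: this sketch rejects factors below 1 (and NaN). */
  public void setSkipCacheFactor(float skipCacheFactor) {
    if (!(skipCacheFactor >= 1)) {
      throw new IllegalArgumentException("skipCacheFactor must be >= 1, got " + skipCacheFactor);
    }
    this.skipCacheFactor = skipCacheFactor;
  }

  public float getSkipCacheFactor() {
    return skipCacheFactor;
  }

  /**
   * Returns true if caching a clause of the given cost looks too expensive relative to
   * the cost of the leading iterator, in which case the clause would run uncached.
   */
  public boolean shouldSkipCaching(long clauseCost, long leadCost) {
    return clauseCost / skipCacheFactor > leadCost;
  }
}
```

Raising the factor at runtime makes the policy more willing to cache costly clauses: with a factor of 10 a clause of cost 1000 led by a cost-50 iterator is skipped, while with a factor of 250 it would be cached.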

Re: [I] Improve documentation for org.apache.lucene.search Sort class [lucene]

2025-02-26 Thread via GitHub
jpountz commented on issue #14295: URL: https://github.com/apache/lucene/issues/14295#issuecomment-2686229228 Have you checked out the documentation of the [`oal.search`](https://lucene.apache.org/core/10_1_0/core/org/apache/lucene/search/package-summary.html) and [`oal.search.similarities

Re: [PR] Bump floor segment size to 16MB. [lucene]

2025-02-26 Thread via GitHub
jainankitk commented on PR #14189: URL: https://github.com/apache/lucene/pull/14189#issuecomment-2686186026 Any reason this PR is not merged and got marked as stale?

Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]

2025-02-26 Thread via GitHub
jpountz commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2686199835 > I personally find it not so user-friendly to require users to define their own codec only for loading purposes, when there is no effect on how stuff is written. Is that a common thing

Re: [PR] Bump floor segment size to 16MB. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on PR #14189: URL: https://github.com/apache/lucene/pull/14189#issuecomment-2686190876 None other than me being a bit anxious about side-effects, e.g. this floor segment size also affects the behavior of merge-on-full-flush. But it shouldn't be a big deal. I'll merge short

Re: [PR] Bump floor segment size to 16MB. [lucene]

2025-02-26 Thread via GitHub
jainankitk commented on PR #14189: URL: https://github.com/apache/lucene/pull/14189#issuecomment-2686184062 > For reference, this is roughly a 10x increase of the floor segment size, so given that `TieredMergePolicy` defaults to 10 segments per tier, indexes should have about 10 fewer segme
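The "about 10 fewer segments" estimate can be checked with a simplified model of tiered merging: segments smaller than the floor count as floor-sized, and each tier holds up to `segmentsPerTier` segments that are `segmentsPerTier` times larger than those of the previous tier. Raising the floor by about 10x then removes roughly one tier, i.e. about `segmentsPerTier` segments. This is a rough model loosely following the idea of TieredMergePolicy's segment budget, not its actual code; the 10 GB index, 5 GB maximum segment size, and 2 MB previous floor are assumptions. The real knob is `TieredMergePolicy#setFloorSegmentMB`.

```java
/**
 * Simplified model of a tiered segment budget. Segments below the floor count as
 * floor-sized, and each tier is segmentsPerTier times larger than the previous one.
 * Loosely inspired by TieredMergePolicy; not Lucene's actual code.
 */
public class SegmentBudget {
  public static int allowedSegments(
      double indexMB, double floorMB, double maxSegmentMB, int segmentsPerTier) {
    double levelSize = floorMB;
    double remainingMB = indexMB;
    int allowed = 0;
    while (true) {
      double segCountLevel = remainingMB / levelSize;
      // The last tier gets whatever is left, rounded up to whole segments.
      if (segCountLevel < segmentsPerTier || levelSize >= maxSegmentMB) {
        allowed += (int) Math.ceil(segCountLevel);
        return allowed;
      }
      // A full tier: segmentsPerTier segments of the current level size.
      allowed += segmentsPerTier;
      remainingMB -= segmentsPerTier * levelSize;
      levelSize = Math.min(maxSegmentMB, levelSize * segmentsPerTier);
    }
  }

  public static void main(String[] args) {
    // Assumed: 10 GB index, 5 GB max merged segment, 10 segments per tier.
    System.out.println("floor 2MB:  " + allowedSegments(10240, 2, 5120, 10));
    System.out.println("floor 16MB: " + allowedSegments(10240, 16, 5120, 10));
  }
}
```

Under these assumptions the model allows 35 segments with a 2 MB floor and 26 with a 16 MB floor, so an 8x increase yields about 9 fewer segments, consistent with the rough estimate in the comment.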

Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #13470: URL: https://github.com/apache/lucene/pull/13470#discussion_r1972397524 ## lucene/core/src/java/org/apache/lucene/search/TopDocs.java: ## @@ -350,4 +354,89 @@ private static TopDocs mergeAux( return new TopFieldDocs(totalHits, hits,

Re: [PR] Reciprocal Rank Fusion (RRF) in TopDocs [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #13470: URL: https://github.com/apache/lucene/pull/13470#discussion_r1972392963 ## lucene/core/src/java/org/apache/lucene/search/TopDocs.java: ## @@ -350,4 +354,89 @@ private static TopDocs mergeAux( return new TopFieldDocs(totalHits, hits,
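Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in, with ranks starting at 1; k = 60 is a commonly used constant. The following standalone sketch fuses plain int doc ids and ignores shard indices and original scores; `RrfSketch` and `fuse` are hypothetical and do not reflect the method signature proposed in the PR. It assumes each input list contains a given document at most once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Standalone Reciprocal Rank Fusion sketch over ranked lists of doc ids.
 * Not the API proposed in the PR; doc ids are plain ints and shard indices are ignored.
 */
public class RrfSketch {
  /** A fused hit: document id plus its RRF score. */
  public record FusedHit(int doc, double score) {}

  /** score(d) = sum over lists of 1 / (k + rank(d)), ranks starting at 1. */
  public static List<FusedHit> fuse(List<int[]> rankings, int k, int topN) {
    Map<Integer, Double> scores = new HashMap<>();
    for (int[] ranking : rankings) {
      for (int i = 0; i < ranking.length; i++) {
        int rank = i + 1;
        // Documents missing from a list simply get no contribution from it.
        scores.merge(ranking[i], 1.0 / (k + rank), Double::sum);
      }
    }
    List<FusedHit> hits = new ArrayList<>();
    for (Map.Entry<Integer, Double> e : scores.entrySet()) {
      hits.add(new FusedHit(e.getKey(), e.getValue()));
    }
    // Highest score first; ties broken by smaller doc id for determinism.
    hits.sort((a, b) -> {
      int cmp = Double.compare(b.score(), a.score());
      return cmp != 0 ? cmp : Integer.compare(a.doc(), b.doc());
    });
    return new ArrayList<>(hits.subList(0, Math.min(topN, hits.size())));
  }
}
```

For example, fusing the rankings [1, 2, 3] and [3, 1, 4] with k = 60 ranks doc 1 first (1/61 + 1/62), then doc 3 (1/63 + 1/61), then doc 2 and doc 4.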

Re: [PR] Remove BitDocIdSet#bits method [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14297: URL: https://github.com/apache/lucene/pull/14297

[PR] Support JDK 24 in Panama Vectorization Provider [lucene]

2025-02-26 Thread via GitHub
ChrisHegarty opened a new pull request, #14300: URL: https://github.com/apache/lucene/pull/14300 This commit updates the Vectorization Provider to support JDK 24. The API has not changed, so the changes minimally bump the major JDK check and enable the incubating API during testing.
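One common way to express a "major JDK check" is a guard on `Runtime.version().feature()`, which returns the running JDK's feature release number. The following minimal sketch uses hypothetical bounds (21 to 24) and names; it is not the VectorizationProvider source.

```java
/**
 * Hypothetical sketch of gating an incubating-API code path on the JDK feature release.
 * Bounds and names are illustrative, not Lucene's VectorizationProvider code.
 */
public class JdkGate {
  // Hypothetical supported range; supporting a new JDK means bumping the upper bound.
  public static final int MIN_FEATURE = 21;
  public static final int MAX_FEATURE = 24;

  public static boolean isSupported(int feature) {
    return feature >= MIN_FEATURE && feature <= MAX_FEATURE;
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    if (isSupported(feature)) {
      System.out.println("JDK " + feature + ": vectorized implementation allowed");
    } else {
      System.out.println("JDK " + feature + ": falling back to the scalar implementation");
    }
  }
}
```

Independently of such a check, the incubating Vector API is only available when the JVM is started with `--add-modules jdk.incubator.vector`.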

Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]

2025-02-26 Thread via GitHub
javanna commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2685272175 I think that we do need to have a way to truly make the switch, at least for testing purposes; otherwise we only test one approach, which is error-prone. That's the original issue that

Re: [PR] Replace usage of DocIdBitSet#bits in QueryBitSetProducer [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14298: URL: https://github.com/apache/lucene/pull/14298

Re: [PR] Fix TestSysoutLimits occasionally failing. [lucene]

2025-02-26 Thread via GitHub
dweiss merged PR #14296: URL: https://github.com/apache/lucene/pull/14296

Re: [PR] Address completion fields testing gap and truly allow loading FST off heap [lucene]

2025-02-26 Thread via GitHub
jpountz commented on PR #14270: URL: https://github.com/apache/lucene/pull/14270#issuecomment-2685149108 Do we need to let users configure whether to load their completion fields on-heap or off-heap? Our inverted indexes, doc values, etc. don't give the option - if users want to load data o

Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1971643383 ## lucene/core/src/java/org/apache/lucene/document/SortedSetDocValuesRangeQuery.java: ## @@ -158,16 +157,15 @@ public Scorer get(long leadCost) throws IOException {

Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on PR #14293: URL: https://github.com/apache/lucene/pull/14293#issuecomment-2685119226 > Maybe we can replace all ScorerSuppliers returning ConstantScoreScorer with ConstantScoreScorerSupplier in follow up? Yes, that would be great. I started looking into it, but the

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
rmuir merged PR #14278: URL: https://github.com/apache/lucene/pull/14278

Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1971632861 ## lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java: ## @@ -341,11 +341,10 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
rmuir commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2684994611 Thanks @renatoh !

Re: [PR] Remove BitDocIdSet#bits method [lucene]

2025-02-26 Thread via GitHub
javanna commented on code in PR #14297: URL: https://github.com/apache/lucene/pull/14297#discussion_r1971607574 ## lucene/CHANGES.txt: ## @@ -22,6 +22,8 @@ API Changes * GITHUB#14291: Remove IOException from ScorerSupplier#setTopLevelScoringClause signature (Luca Cavanna)

Re: [PR] Mark DocIdBitSet#bits deprecated also in subclasses [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14299: URL: https://github.com/apache/lucene/pull/14299

Re: [PR] Fix optimization to help inline calls to live docs. [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #14294: URL: https://github.com/apache/lucene/pull/14294#discussion_r1971606728 ## lucene/core/src/java/org/apache/lucene/search/ScorerUtil.java: ## @@ -113,22 +116,21 @@ static Scorable likelyTermScorer(Scorable scorable) { /** * Optimiz

Re: [PR] Remove BitDocIdSet#bits method [lucene]

2025-02-26 Thread via GitHub
jpountz commented on code in PR #14297: URL: https://github.com/apache/lucene/pull/14297#discussion_r1971598840 ## lucene/CHANGES.txt: ## @@ -22,6 +22,8 @@ API Changes * GITHUB#14291: Remove IOException from ScorerSupplier#setTopLevelScoringClause signature (Luca Cavanna)

Re: [PR] Binary vector format for flat and hnsw vectors [lucene]

2025-02-26 Thread via GitHub
lpld commented on PR #14078: URL: https://github.com/apache/lucene/pull/14078#issuecomment-2684703383 @benwtrent Thanks for your reply!

Re: [PR] Fix TestSysoutLimits occasionally failing. [lucene]

2025-02-26 Thread via GitHub
dweiss commented on PR #14296: URL: https://github.com/apache/lucene/pull/14296#issuecomment-2684657084 Thanks, Uwe. I don't know, to be honest. This is an internal infrastructure test... maybe it won't trigger or launch MMapDirectory. I'll just merge this one and monitor builds?

Re: [PR] Use DenseConjunctionBulkScorer for single queries sometimes. [lucene]

2025-02-26 Thread via GitHub
gf2121 commented on code in PR #14293: URL: https://github.com/apache/lucene/pull/14293#discussion_r1971065012 ## lucene/core/src/java/org/apache/lucene/document/SortedSetDocValuesRangeQuery.java: ## @@ -158,16 +157,15 @@ public Scorer get(long leadCost) throws IOException {

Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-02-26 Thread via GitHub
javanna commented on code in PR #14275: URL: https://github.com/apache/lucene/pull/14275#discussion_r1971311446 ## lucene/core/src/java/org/apache/lucene/util/NamedSPILoader.java: ## @@ -64,18 +64,24 @@ public NamedSPILoader(Class clazz, ClassLoader classloader) { */ pub

[PR] Mark DocIdBitSet#bits deprecated also in subclasses [lucene]

2025-02-26 Thread via GitHub
javanna opened a new pull request, #14299: URL: https://github.com/apache/lucene/pull/14299 This is the second part of #14292, which marks the method deprecated not only in the base class but also in all of its subclasses.

[PR] Remove BitDocIdSet#bits method [lucene]

2025-02-26 Thread via GitHub
javanna opened a new pull request, #14297: URL: https://github.com/apache/lucene/pull/14297 We removed DocIdSet#bits in #14290. This commit removes the last usage of the former bits method from QueryBitSetProducer. A BitSet can be used directly instead of a DocIdSet, introducing a sentinel

[PR] Replace usage of DocIdBitSet#bits in QueryBitSetProducer [lucene]

2025-02-26 Thread via GitHub
javanna opened a new pull request, #14298: URL: https://github.com/apache/lucene/pull/14298 Same change as #14297, without the removal of the method, which is already deprecated in 10x.

Re: [PR] Fix TestSysoutLimits occasionally failing. [lucene]

2025-02-26 Thread via GitHub
uschindler commented on PR #14296: URL: https://github.com/apache/lucene/pull/14296#issuecomment-2684436056 Could the same problem not also appear with MMapDirectory when it prints warnings on startup? For MMapDirectory, loading the class would be enough. Maybe it is not an issue here

Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-02-26 Thread via GitHub
ChrisHegarty commented on code in PR #14275: URL: https://github.com/apache/lucene/pull/14275#discussion_r1971253531 ## lucene/core/src/java/org/apache/lucene/util/NamedSPILoader.java: ## @@ -64,18 +64,24 @@ public NamedSPILoader(Class clazz, ClassLoader classloader) { */

Re: [PR] Remove IOException from ScorerSupplier#setTopLevelScoringClause signature [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14291: URL: https://github.com/apache/lucene/pull/14291

Re: [PR] Support load per-iteration replacement of NamedSPI [lucene]

2025-02-26 Thread via GitHub
javanna commented on code in PR #14275: URL: https://github.com/apache/lucene/pull/14275#discussion_r1971187231 ## lucene/core/src/java/org/apache/lucene/util/NamedSPILoader.java: ## @@ -64,18 +64,24 @@ public NamedSPILoader(Class clazz, ClassLoader classloader) { */ pub

Re: [PR] Deprecate the redundant DocIdSet#bits method [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14292: URL: https://github.com/apache/lucene/pull/14292

Re: [PR] Remove bits method from DocIdSet [lucene]

2025-02-26 Thread via GitHub
javanna merged PR #14290: URL: https://github.com/apache/lucene/pull/14290

Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]

2025-02-26 Thread via GitHub
dweiss commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2684281563 Thanks to INFRA-26434 Lucene now has an s3 bucket we can publish those data/test resources on. I'll try to collect these resources, upload them and make the necessary build changes s

Re: [PR] Enhance DictionaryCompoundWordTokenFilter [lucene]

2025-02-26 Thread via GitHub
renatoh commented on PR #14278: URL: https://github.com/apache/lucene/pull/14278#issuecomment-2684233234 > Yes, I'm just suggesting to split it. We can add this new parameter here, backport to minor release 10.2.0, no breaking changes. Separately we can default it to `true` for 11.0?

Re: [PR] Fix TestSysoutLimits occasionally failing. [lucene]

2025-02-26 Thread via GitHub
dweiss commented on PR #14296: URL: https://github.com/apache/lucene/pull/14296#issuecomment-2684211994 Failing repro line, before the fix:
```
./gradlew test --tests TestSysoutsLimits -Dtests.seed=945AAB8255E7F992
```

[PR] Fix TestSysoutLimits occasionally failing. [lucene]

2025-02-26 Thread via GitHub
dweiss opened a new pull request, #14296: URL: https://github.com/apache/lucene/pull/14296 VectorizationProvider may print warnings, which clashes with the test's assumptions. I simply forced any static initializations to occur before the test starts.
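Static initialization can be forced early with `Class.forName(name, true, loader)`, which runs a class's static initializer if it has not run yet (a class literal such as `Foo.class` on its own does not). The sketch uses a hypothetical `NoisyProvider` standing in for a class that prints a warning on first use, as the PR describes for VectorizationProvider; it is not the code from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

/**
 * Sketch of forcing a class's static initializer to run before a test starts
 * capturing System.out. NoisyProvider is hypothetical.
 */
public class ForceInitSketch {
  public static class NoisyProvider {
    static {
      // Printed once, when the class is initialized.
      System.out.println("WARNING: falling back to a slower implementation");
    }

    public static int value() {
      return 42;
    }
  }

  /** Runs static initialization now; forName with initialize=true guarantees it. */
  public static void forceInit(Class<?> clazz) throws ClassNotFoundException {
    Class.forName(clazz.getName(), true, clazz.getClassLoader());
  }

  /** Captures everything printed to System.out while the action runs. */
  public static String captureStdout(Runnable action) {
    PrintStream original = System.out;
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    System.setOut(new PrintStream(buffer, true));
    try {
      action.run();
    } finally {
      System.setOut(original);
    }
    return buffer.toString();
  }

  public static void main(String[] args) throws Exception {
    forceInit(NoisyProvider.class); // the warning is printed here, outside the capture
    String captured = captureStdout(() -> NoisyProvider.value());
    System.out.println("captured during test: '" + captured + "'");
  }
}
```

Because the warning is emitted during `forceInit`, the output captured afterwards is empty; without that call, the first access inside `captureStdout` would trigger initialization and the warning would end up in the captured buffer.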