Re: [PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]
jpountz commented on code in PR #13587: URL: https://github.com/apache/lucene/pull/13587#discussion_r1726465328 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -101,12 +101,18 @@ public Weight createWeight( .rewrite(new ConstantScoreQuery(childQuery)) .createWeight(searcher, weightScoreMode, 0f); } else { - // if the score is needed we force the collection mode to COMPLETE because the child query - // cannot skip - // non-competitive documents. + // if the score is needed and the score mode is not max, we force the collection mode to + // COMPLETE because the + // child query cannot skip non-competitive documents. + // weightScoreMode.needsScores() will always be true here, but keep the check to make the + // logic clearer. Review Comment: I know it's a pre-existing issue, but can you fix how lines are broken? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[PR] Support JDK 23 in Panama Vectorization Provider [lucene]
ChrisHegarty opened a new pull request, #13678: URL: https://github.com/apache/lucene/pull/13678 This commit updates the Vectorization Provider to support JDK 23. The API has not changed so the changes minimally bump the major JDK check, and enable the incubating API during testing.
Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]
iverase commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1726673384 ## lucene/core/src/test/org/apache/lucene/search/TestDocValuesQueries.java: ## @@ -42,34 +45,100 @@ public class TestDocValuesQueries extends LuceneTestCase { + private Codec getCodec() { +// small interval size to test with many intervals +return TestUtil.alwaysDocValuesFormat(new Lucene90DocValuesFormat(random().nextInt(4, 16))); + } + public void testDuelPointRangeSortedNumericRangeQuery() throws IOException { -doTestDuelPointRangeNumericRangeQuery(true, 1); +doTestDuelPointRangeNumericRangeQuery(true, 1, false); + } + + public void testDuelPointRangeSortedNumericRangeWithSlipperQuery() throws IOException { +doTestDuelPointRangeNumericRangeQuery(true, 1, true); } public void testDuelPointRangeMultivaluedSortedNumericRangeQuery() throws IOException { -doTestDuelPointRangeNumericRangeQuery(true, 3); +doTestDuelPointRangeNumericRangeQuery(true, 3, false); + } + + public void testDuelPointRangeMultivaluedSortedNumericRangeWithSkipperQuery() throws IOException { +doTestDuelPointRangeNumericRangeQuery(true, 3, true); } public void testDuelPointRangeNumericRangeQuery() throws IOException { -doTestDuelPointRangeNumericRangeQuery(false, 1); +doTestDuelPointRangeNumericRangeQuery(false, 1, false); } - private void doTestDuelPointRangeNumericRangeQuery(boolean sortedNumeric, int maxValuesPerDoc) - throws IOException { + public void testDuelPointRangeNumericRangeWithSkipperQuery() throws IOException { +doTestDuelPointRangeNumericRangeQuery(false, 1, true); + } + + public void testDuelPointNumericSortedWithSkipperRangeQuery() throws IOException { +Directory dir = newDirectory(); +IndexWriterConfig config = new IndexWriterConfig(); +config.setIndexSort(new Sort(new SortField("dv", SortField.Type.LONG, random().nextBoolean()))); +RandomIndexWriter iw = new RandomIndexWriter(random(), dir, config); +config.setCodec(getCodec()); Review Comment: sure!
Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]
iverase commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1726674931 ## lucene/core/src/java/org/apache/lucene/index/DocValuesSkipper.java: ## @@ -98,4 +98,29 @@ public abstract class DocValuesSkipper { /** Return the global number of documents with a value for the field. */ public abstract int docCount(); + + /** + * Advance this skipper so that all levels intersects the range given by {@code minValue} and + * {@code maxValue}. If there are no intersecting levels, the skipper is exhausted. + * + * NOTE: The behavior is undefined if this method is called and {@link #advance(int)} + * has not been called yet. + */ + public final void advance(long minValue, long maxValue) throws IOException { +while (true) { Review Comment: I have changed the API so that this method can also be called on unpositioned skippers.
Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]
iverase commented on code in PR #13592: URL: https://github.com/apache/lucene/pull/13592#discussion_r1726675935 ## lucene/core/src/java/org/apache/lucene/index/DocValuesSkipper.java: ## @@ -98,4 +98,29 @@ public abstract class DocValuesSkipper { /** Return the global number of documents with a value for the field. */ public abstract int docCount(); + + /** + * Advance this skipper so that all levels intersects the range given by {@code minValue} and + * {@code maxValue}. If there are no intersecting levels, the skipper is exhausted. + * + * NOTE: The behavior is undefined if this method is called and {@link #advance(int)} + * has not been called yet. + */ + public final void advance(long minValue, long maxValue) throws IOException { +while (true) { + if (minDocID(0) == DocIdSetIterator.NO_MORE_DOCS + || (minValue(0) <= maxValue && maxValue(0) >= minValue)) { +break; + } else { +int maxDocID = maxDocID(0); Review Comment: I added some comments, which hopefully make it clearer.
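The loop quoted above stops as soon as the current block's value range intersects the query range. As a minimal sketch of that intersection test only, assuming a single skip level stored as two parallel arrays (the class name `SkipperRangeSketch`, the method `advanceToRange`, and the array layout are hypothetical and are not part of Lucene's `DocValuesSkipper` API):

```java
/** Hypothetical, simplified stand-in for one level of a doc-values skipper. */
final class SkipperRangeSketch {

  private SkipperRangeSketch() {}

  /**
   * Returns the index of the first block at or after {@code from} whose value range
   * [blockMin[i], blockMax[i]] intersects [minValue, maxValue], or -1 if no such block
   * exists (the "exhausted" case).
   */
  static int advanceToRange(
      long[] blockMin, long[] blockMax, int from, long minValue, long maxValue) {
    for (int block = from; block < blockMin.length; block++) {
      // Two closed intervals intersect iff each one starts before the other ends.
      // This mirrors the condition in the quoted diff:
      //   minValue(0) <= maxValue && maxValue(0) >= minValue
      if (blockMin[block] <= maxValue && blockMax[block] >= minValue) {
        return block;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    long[] mins = {0, 10, 20};
    long[] maxs = {9, 19, 29};
    // Blocks are visited in order; the first one overlapping [12, 15] is block 1.
    System.out.println(advanceToRange(mins, maxs, 0, 12, 15));
  }
}
```

The real method works on doc ID ranges and several skip levels; the sketch only isolates the overlap condition that decides when to stop advancing.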
Re: [PR] Modernize list get first element [lucene]
mrhbj closed pull request #13677: Modernize list get first element URL: https://github.com/apache/lucene/pull/13677
[PR] remove abandoned code [lucene]
mrhbj opened a new pull request, #13679: URL: https://github.com/apache/lucene/pull/13679 ### Description
Re: [PR] remove abandoned code [lucene]
mrhbj closed pull request #13679: remove abandoned code URL: https://github.com/apache/lucene/pull/13679
Re: [PR] Take advantage of the doc value skipper when it is primary sort [lucene]
iverase merged PR #13592: URL: https://github.com/apache/lucene/pull/13592
Re: [PR] Support JDK 23 in Panama Vectorization Provider [lucene]
ChrisHegarty merged PR #13678: URL: https://github.com/apache/lucene/pull/13678
Re: [PR] Support JDK 23 in Panama Vectorization Provider [lucene]
uschindler commented on PR #13678: URL: https://github.com/apache/lucene/pull/13678#issuecomment-2304691487 Backport also looks fine.
[I] ComplexPhraseQueryParser parses wrongly when special characters found [lucene]
rajanvt-downstreem opened a new issue, #13680: URL: https://github.com/apache/lucene/issues/13680 ### Description \"media$kits weekend\"~10 gets translated to "(media kits) weekend" with Slop:10. This causes the query to tag a document that has only media and weekend, or only kits and weekend, whereas the intended search is to match "media kits weekend"~10. This happens with any special character. ### Version and environment details Lucene.NET 4.8
Re: [PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]
Mikep86 commented on code in PR #13587: URL: https://github.com/apache/lucene/pull/13587#discussion_r1727146876 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -101,12 +101,18 @@ public Weight createWeight( .rewrite(new ConstantScoreQuery(childQuery)) .createWeight(searcher, weightScoreMode, 0f); } else { - // if the score is needed we force the collection mode to COMPLETE because the child query - // cannot skip - // non-competitive documents. + // if the score is needed and the score mode is not max, we force the collection mode to + // COMPLETE because the + // child query cannot skip non-competitive documents. + // weightScoreMode.needsScores() will always be true here, but keep the check to make the + // logic clearer. Review Comment: I tried fixing this manually, but then the spotless checks fail. How do you suggest I fix this in a way that doesn't break those checks?
Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2304800950 Thanks @rmuir and @ChrisHegarty. I've downloaded all my content from `home.apache.org` (Lucene benchmark source corpora, line file docs, large vector file, etc.), so we won't lose any benchy stuff once the box goes poof. I need to find a new home for the nightly benchmarks logs/charts, currently at https://home.apache.org/~mikemccand/lucenebench ... when each nightly benchy finishes it copies up the results (all charts with a new data point) using sftp via Python. I'm leaning towards a [simple GitHub pages site](https://docs.github.com/en/pages) (thank you @msokolov for the idea), though it has a limit of 1 GB and the benchy reports are now ~1.7 GB: ~13 years of detailed nightly benchy reports adds up! I can probably work around that. For the larger stuff (corpora) I'll mull some more. I have plenty of storage in my personal Google drive account, so I can just start there.
Re: [PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]
jpountz commented on code in PR #13587: URL: https://github.com/apache/lucene/pull/13587#discussion_r1727411649 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -101,12 +101,18 @@ public Weight createWeight( .rewrite(new ConstantScoreQuery(childQuery)) .createWeight(searcher, weightScoreMode, 0f); } else { - // if the score is needed we force the collection mode to COMPLETE because the child query - // cannot skip - // non-competitive documents. + // if the score is needed and the score mode is not max, we force the collection mode to + // COMPLETE because the + // child query cannot skip non-competitive documents. + // weightScoreMode.needsScores() will always be true here, but keep the check to make the + // logic clearer. Review Comment: Usually I remove the unintended line breaks and then run `gradlew tidy`.
Re: [PR] Aggregate files from the same segment into a single Arena [lucene]
uschindler commented on PR #13570: URL: https://github.com/apache/lucene/pull/13570#issuecomment-2305145865 @ChrisHegarty the PR is listed in the 9.x section of changes. I could work on a backport (possibly tomorrow). I just want to make sure you haven't started. I am not yet sure how to handle Java 20 and Java 19. I may possibly leave those as they are.
Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305153561 > I'm leaning towards a [simple GitHub pages site](https://docs.github.com/en/pages) (thank you @msokolov for the idea) I enabled pages for the `luceneutil` repo and pushed a copy of the current nightly benchy reports: https://mikemccand.github.io/luceneutil/index.html. Looks like it basically works, yay! It should be simple to fix the nightly benchy script to publish updates via `git add/commit/push` instead of the current `sftp mikemcc...@home.apache.org`. I'll do that next...
Re: [I] Try applying bipartite graph reordering to KNN graph node ids [lucene]
jpountz commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2305157956 > Does this choose a single bit-width for a group of postings? No, each posting can still have a different byte width, but it does the decoding in a way that doesn't have unpredictable conditionals.
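One common way to decode values of differing byte widths without a data-dependent branch is to read a fixed-size word unconditionally and mask off the bytes that belong to the next value. The following is a rough sketch of that general technique, not the code discussed in the issue; the class name `BranchlessDecodeSketch`, the separate `widths` array, and the requirement of three padding bytes at the end of the input are all assumptions made for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch of branch-free decoding of values stored with 1 to 4 bytes each. */
final class BranchlessDecodeSketch {

  private BranchlessDecodeSketch() {}

  // MASKS[n] keeps the low n bytes of a 32-bit word.
  private static final int[] MASKS = {0, 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF};

  /**
   * Decodes one int per entry of {@code widths}. Each width is the number of bytes (1 to 4)
   * used by the corresponding little-endian value in {@code data}. The caller must leave at
   * least 3 padding bytes after the last value so that every 4-byte read stays in bounds.
   */
  static int[] decode(byte[] data, int[] widths) {
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    int[] out = new int[widths.length];
    int pos = 0;
    for (int i = 0; i < widths.length; i++) {
      int width = widths[i];
      // Always read 4 bytes, then mask: the only branch left is the loop bound,
      // so no conditional depends on the per-value width.
      out[i] = buf.getInt(pos) & MASKS[width];
      pos += width;
    }
    return out;
  }

  public static void main(String[] args) {
    // Values 5 (1 byte), 0x1234 (2 bytes), 0x010203 (3 bytes), then 3 padding bytes.
    byte[] data = {5, 0x34, 0x12, 0x03, 0x02, 0x01, 0, 0, 0};
    for (int v : decode(data, new int[] {1, 2, 3})) {
      System.out.println(Integer.toHexString(v));
    }
  }
}
```

The trade-off is a few wasted padding bytes and slightly over-wide reads in exchange for a loop body the CPU can predict perfectly.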
Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]
mikemccand commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305157923 A nice side effect of this is that the long running (13+ years now!) nightly reports will be backed up via git/GitHub and no longer single sourced on my home box, yay. And if ever some exotic bug shows up in the publishing, we will have the full `git` history showing each nightly benchy update going forwards to help debug.
Re: [PR] Use Max WAND optimizations with ToParentBlockJoinQuery when using ScoreMode.Max [lucene]
Mikep86 commented on code in PR #13587: URL: https://github.com/apache/lucene/pull/13587#discussion_r1727552189 ## lucene/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java: ## @@ -101,12 +101,18 @@ public Weight createWeight( .rewrite(new ConstantScoreQuery(childQuery)) .createWeight(searcher, weightScoreMode, 0f); } else { - // if the score is needed we force the collection mode to COMPLETE because the child query - // cannot skip - // non-competitive documents. + // if the score is needed and the score mode is not max, we force the collection mode to + // COMPLETE because the + // child query cannot skip non-competitive documents. + // weightScoreMode.needsScores() will always be true here, but keep the check to make the + // logic clearer. Review Comment: Got it, thank you :)
Re: [PR] Override single byte writes to OutputStreamIndexOutput to remove locking [lucene]
msfroh commented on PR #13543: URL: https://github.com/apache/lucene/pull/13543#issuecomment-2305315023 > this change was merged: did you observe any change in the flame graphs? Yes! We were benchmarking on Lucene 9.11 and saw the time spent in `growIfNeeded` on JDK21 (and about a 7% slowdown in indexing versus JDK17). I noticed this change in my local clone of main, cherry-picked it onto the 9.11 branch, and subsequent JDK21 runs caught up with JDK17. It helped a lot -- not for the biased locking deprecation (which went into JDK15), but because it bypassed these extra comparisons that were added to support virtual threads.
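The general idea of overriding single-byte writes can be sketched as a `BufferedOutputStream` subclass whose `write(int)` touches the protected `buf` and `count` fields directly instead of going through the superclass method. This is an illustration under stated assumptions, not the code from the PR: the class name `UnlockedBufferedOutputStream` and the `roundTrip` helper are hypothetical, and the sketch is only safe when a single thread writes to the stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch: a buffered stream whose single-byte write skips the superclass path. */
final class UnlockedBufferedOutputStream extends BufferedOutputStream {

  UnlockedBufferedOutputStream(OutputStream out, int bufferSize) {
    super(out, bufferSize);
  }

  @Override
  public void write(int b) throws IOException {
    // Not thread-safe by design: no lock is taken and no superclass bookkeeping runs.
    if (count >= buf.length) {
      // Buffer is full: hand the buffered bytes to the wrapped stream and start over.
      out.write(buf, 0, count);
      count = 0;
    }
    buf[count++] = (byte) b;
  }

  /** Usage helper for illustration: writes the input one byte at a time and returns the sink. */
  static byte[] roundTrip(byte[] input, int bufferSize) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (UnlockedBufferedOutputStream stream =
        new UnlockedBufferedOutputStream(sink, bufferSize)) {
      for (byte value : input) {
        stream.write(value);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }
}
```

Bulk writes and `flush()` still go through the inherited methods, which operate on the same `buf` and `count` fields, so the fast path and the regular path stay consistent.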
Re: [PR] Aggregate files from the same segment into a single Arena [lucene]
ChrisHegarty commented on PR #13570: URL: https://github.com/apache/lucene/pull/13570#issuecomment-2305406563 > @ChrisHegarty the PR is listed in the 9.x section of changes. I could work on a backport (possibly tomorrow). I just want to make sure you haven't started. > > I am not yet sure how to handle Java 20 and Java 19. I may possibly leave those as they are. Yeah, I had the same thought too - to just do this for JDK 21+. I’ve not started, but it is on my todo list. I’ll raise a PR for it tomorrow, and if you have time maybe you could help review.
Re: [I] remove refs to people.apache.org/home.apache.org in build [lucene]
msokolov commented on issue #13647: URL: https://github.com/apache/lucene/issues/13647#issuecomment-2305436221 Nice! glad it worked. FYI: I clicked on a few random links and found a 404 https://mikemccand.github.io/luceneutil/analyzers.html although this page does seem to exist on the current site
[PR] optimize code,change the 'return' position [lucene]
mrhbj opened a new pull request, #13681: URL: https://github.com/apache/lucene/pull/13681 ### Description
Re: [PR] optimize code,change the 'return' position [lucene]
mrhbj closed pull request #13681: optimize code,change the 'return' position URL: https://github.com/apache/lucene/pull/13681