Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]
jpountz commented on issue #13554: URL: https://github.com/apache/lucene/issues/13554#issuecomment-2219711472 The idea makes sense to me, but I worry that it wouldn't look good API-wise. I also imagine that the gains would be lower than in #13199, since `Weight#scorerSupplier` is called once per segment, while the comparators used to estimate the point count multiple times per segment.
Re: [I] Merge on Commit: No merges if new data is flushed (but not committed) [lucene]
jpountz commented on issue #13537: URL: https://github.com/apache/lucene/issues/13537#issuecomment-2219745791 What version are you using? We fixed a similar problem in version 9.9; I wonder whether the problem you are reporting is the same one or a new one: https://github.com/apache/lucene/pull/12549.
Re: [PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]
vsop-479 commented on code in PR #13556: URL: https://github.com/apache/lucene/pull/13556#discussion_r1671812467

## lucene/core/src/test/org/apache/lucene/index/TestTermsEnum.java: ##
@@ -998,6 +999,43 @@ public void testCommonPrefixTerms() throws Exception {
 d.close();
 }
+
+ public void testPKLookupWithUpdate() throws Exception {

Review Comment: Thanks @jpountz, I moved the test case.
Re: [PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]
jpountz merged PR #13556: URL: https://github.com/apache/lucene/pull/13556
Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]
jpountz commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-2219848841 Can you check if this helps with luceneutil on wikimedium10m or wikibigall?
Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]
vsop-479 commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-2219865957

> Can you check if this helps with luceneutil on wikimedium10m or wikibigall?

Sure, I will do that soon.
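[Editor's note] For context while the benchmarks run: the change under discussion hoists the live-docs null check out of the per-document loop. A simplified sketch of that idea (abridged and illustrative, not the exact code in #13557):

```java
// Simplified sketch: instead of testing `liveDocs == null` once per collected document,
// split the loop once up front. Names follow Lucene's Weight#scoreAll but are abridged.
static void scoreAll(LeafCollector collector, DocIdSetIterator iterator, Bits liveDocs)
    throws IOException {
  if (liveDocs == null) {
    // no deleted docs in this segment: collect everything without the per-doc check
    for (int doc = iterator.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
      collector.collect(doc);
    }
  } else {
    for (int doc = iterator.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = iterator.nextDoc()) {
      if (liveDocs.get(doc)) {
        collector.collect(doc);
      }
    }
  }
}
```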
Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]
ChrisHegarty merged PR #13535: URL: https://github.com/apache/lucene/pull/13535
Re: [PR] QueryRescorer: Use original order by default for same-score items rather than sorting by docId [lucene]
Willdotwhite commented on PR #13510: URL: https://github.com/apache/lucene/pull/13510#issuecomment-2219974712 Morning @andywebb1975 and @jpountz - I'm a bit late to the discussion, but I'm interested in getting involved! I'd be happy to look into this with Andy if the rewrite is the way to go. Is there any chance this fix could go in as-is as a quick patch for now, Adrien, with the rewrite following in a later PR, or is that not the way things are done here? 😀
[PR] Fix testAddDocumentOnDiskFull to handle IllegalStateException from IndexWriter#close [lucene]
easyice opened a new pull request, #13558: URL: https://github.com/apache/lucene/pull/13558 This issue is similar to https://github.com/apache/lucene/issues/11755, but it occurs in `IndexWriter#close`, and it also reproduces about half of the time.
```
java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit
  at __randomizedtesting.SeedInfo.seed([EAA58E75C71B5109:66BDB60A3A8E87CD]:0)
  at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5577)
  at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3781)
  at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:4122)
  at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1331)
  at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1369)
  at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:113)
  at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.base/java.lang.reflect.Method.invoke(Method.java:580)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
  at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
  at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
  at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
  at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
  at java.bas
```
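[Editor's note] For readers skimming the digest, the shape of the proposed test fix is roughly the following sketch; the helper name is made up here and the actual change in #13558 may differ:

```java
// Hypothetical test helper (illustrative only, not taken from the PR): close the writer
// while tolerating the "unrecoverable error; cannot commit" state a disk-full run can leave.
static void closeIgnoringTragicState(org.apache.lucene.index.IndexWriter writer)
    throws java.io.IOException {
  try {
    writer.close();
  } catch (IllegalStateException expected) {
    // the writer hit an unrecoverable error (simulated disk full) and cannot commit on close
  }
}
```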
[PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
epotyom opened a new pull request, #13559: URL: https://github.com/apache/lucene/pull/13559 In `SparseFixedBitSet#firstDoc`, instead of iterating through the entire `indices` array until a non-zero value is found, keep track of the max updated index. Use case where it improves performance:

1. `SparseFixedBitSet` is created with a high enough length, e.g. max doc in a segment.
2. `#nextSetBit` is called (in a loop) on a bit set that is still being built, i.e. some of the next bits are `#set`, but the rest of the bit set is still empty.
3. The moment there are no further set bits, the `#nextSetBit` call to `#firstDoc` iterates through the rest of the `indices` array.

In my case, we use `SparseFixedBitSet` to track and iterate children hits found in `ToParentBlockJoinQuery`. Iterating through empty `indices` elements becomes expensive when we do it for each parent docID. Luceneutil performance test results might not be great though - so maybe there is a better way to achieve a similar effect?
```
python3 src/python/localrun.py -source wikimediumall ...
Task    QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff    p-value
BrowseDateTaxoFacets    1.65 (9.1%)    1.59 (0.5%)    -3.6% ( -12% - 6%)    0.081
BrowseDayOfYearTaxoFacets    1.67 (9.2%)    1.61 (0.6%)    -3.5% ( -12% - 6%)    0.088
MedTermDayTaxoFacets    9.40 (6.2%)    9.19 (5.0%)    -2.2% ( -12% - 9%)    0.206
BrowseRandomLabelTaxoFacets    1.29 (4.6%)    1.27 (1.0%)    -1.9% ( -7% - 3%)    0.070
Prefix3    543.93 (5.7%)    535.20 (4.6%)    -1.6% ( -11% - 9%)    0.326
AndHighLow    780.30 (3.9%)    771.77 (4.1%)    -1.1% ( -8% - 7%)    0.383
AndHighMed    199.79 (2.3%)    197.77 (3.0%)    -1.0% ( -6% - 4%)    0.233
MedSloppyPhrase    61.79 (4.1%)    61.24 (4.1%)    -0.9% ( -8% - 7%)    0.488
AndHighHigh    84.66 (6.6%)    83.92 (7.6%)    -0.9% ( -14% - 14%)    0.699
PKLookup    143.72 (1.9%)    142.48 (2.1%)    -0.9% ( -4% - 3%)    0.171
Fuzzy1    56.85 (1.5%)    56.36 (2.0%)    -0.8% ( -4% - 2%)    0.122
BrowseDateSSDVFacets    0.43 (16.7%)    0.43 (16.1%)    -0.8% ( -28% - 38%)    0.873
Wildcard    159.45 (2.7%)    158.30 (4.0%)    -0.7% ( -7% - 6%)    0.505
Fuzzy2    56.79 (1.2%)    56.38 (1.8%)    -0.7% ( -3% - 2%)    0.139
HighPhrase    20.07 (4.4%)    19.94 (5.8%)    -0.6% ( -10% - 9%)    0.701
MedSpanNear    15.66 (1.8%)    15.60 (2.2%)    -0.4% ( -4% - 3%)    0.537
OrNotHighMed    211.86 (3.1%)    211.03 (2.7%)    -0.4% ( -5% - 5%)    0.670
HighTermTitleBDVSort    16.31 (2.8%)    16.25 (2.7%)    -0.4% ( -5% - 5%)    0.661
MedPhrase    154.39 (2.7%)    154.01 (3.3%)    -0.2% ( -6% - 5%)    0.800
OrHighMed    184.54 (2.5%)    184.21 (2.0%)    -0.2% ( -4% - 4%)    0.797
LowPhrase    72.18 (3.6%)    72.06 (4.3%)    -0.2% ( -7% - 8%)    0.893
OrHighNotHigh    229.39 (4.4%)    229.05 (4.5%)    -0.1% ( -8% - 9%)    0.915
LowSloppyPhrase    98.92 (1.5%)    98.84 (2.1%)    -0.1% ( -3% - 3%)    0.897
LowSpanNear    53.22 (1.0%)    53.21 (0.8%)    -0.0% ( -1% - 1%)    0.932
Respell    34.18 (1.9%)    34.18 (2.4%)    -0.0% ( -4% - 4%)    0.986
HighSpanNear    5.05 (3.1%)    5.06 (3.0%)    0.1% ( -5% - 6%)    0.929
AndHighMedDayTaxoFacets    16.78 (1.6%)    16.79 (1.7%)    0.1% ( -3% - 3%)    0.850
OrHighLow    381.10 (3.4%)    381.52 (3.0%)    0.1% ( -6% - 6%)    0.914
HighSloppyPhrase    12.20 (3.4%)    12.22 (4.1%)    0.1% ( -7% - 7%)    0.902
HighTermMonthSort    1059.29 (4.7%)    1061.27 (4.8%)    0.2% ( -8% - 10%)    0.901
AndHighHighDayTaxoFacets    13.53 (1.6%)    13.56 (1.7%)    0.2% ( -3% - 3%)    0.703
OrNotHighLow    664.93 (3.1%)    666.40 (3.6%)    0.2% ( -6% - 7%)    0.835
MedTerm    330.39 (8.8%)    331.13 (6.3%)    0.
```
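[Editor's note] For intuition, here is a simplified, self-contained sketch of the "track the max updated index" idea; it is illustrative only and not SparseFixedBitSet's actual implementation (which groups bits into 4096-bit blocks behind an `indices` long[]):

```java
// Simplified illustration: a sparse bit set that remembers the highest block it has ever
// written, so a "find next set bit" scan can stop early instead of walking every empty block.
final class TrackedSparseBits {
  private final long[] blocks;      // one long per 64-bit block; zero means empty
  private int maxUpdatedBlock = -1; // highest block index that was ever set

  TrackedSparseBits(int numBits) {
    this.blocks = new long[(numBits + 63) >>> 6];
  }

  void set(int bit) {
    int block = bit >>> 6;
    blocks[block] |= 1L << bit;     // Java shifts longs by (bit & 63), i.e. the in-block offset
    if (block > maxUpdatedBlock) {
      maxUpdatedBlock = block;
    }
  }

  /** Returns the first set bit at or after {@code fromBit}, or -1 if there is none. */
  int nextSetBit(int fromBit) {
    int block = fromBit >>> 6;
    if (block > maxUpdatedBlock) {
      return -1;                    // everything past maxUpdatedBlock is known to be empty
    }
    long word = blocks[block] >>> fromBit; // drop bits below the in-block offset
    if (word != 0) {
      return fromBit + Long.numberOfTrailingZeros(word);
    }
    // only scan up to the highest block that was ever written, not blocks.length
    for (int i = block + 1; i <= maxUpdatedBlock; i++) {
      if (blocks[i] != 0) {
        return (i << 6) + Long.numberOfTrailingZeros(blocks[i]);
      }
    }
    return -1;
  }
}
```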
Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]
mikemccand commented on PR #13488: URL: https://github.com/apache/lucene/pull/13488#issuecomment-2220286531 Thanks @slow-J -- I just backported to 9.12 as well. I had to resolve a few conflicts, maybe have a peek and see if I did it correctly?
Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]
slow-J commented on PR #13488: URL: https://github.com/apache/lucene/pull/13488#issuecomment-2220297663 Thanks @mikemccand, I think the diff for 9.12 looks good.
Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]
uschindler commented on code in PR #13560: URL: https://github.com/apache/lucene/pull/13560#discussion_r1672135733

## lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java: ##
@@ -141,4 +146,57 @@ public void testWithRandom() throws Exception {
 }
 }
 }
+
+ // Opens the input with IOContext.READONCE to ensure slice and clone are appropriately confined
+ public void testConfined() throws Exception {
+assumeTrue("Only testable with memory segments", Runtime.version().feature() >= 19);

Review Comment: Use `isMemorySegmentImpl()`, see https://github.com/apache/lucene/blob/9e5d278cd6e6f17f09dea8d05295008c855c971a/lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java#L67
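[Editor's note] In other words, the suggestion is to reuse the test class's existing helper instead of probing the runtime version directly; a rough sketch of the resulting guard (the elided body is not part of the suggestion):

```java
// Sketch of the suggested guard in TestMMapDirectory#testConfined: rely on the existing
// isMemorySegmentImpl() helper rather than checking Runtime.version().feature() >= 19.
public void testConfined() throws Exception {
  assumeTrue("Only testable with memory segments", isMemorySegmentImpl());
  // ... open an input with IOContext.READONCE and verify clones/slices are confined ...
}
```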
Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]
shubhamvishu commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-222032 Makes sense @benwtrent! For `BufferedUpdatesStream`, since it's on the index side, should we check indexing time rather than the regular luceneutil QPS benchmarks?
Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]
ChrisHegarty commented on code in PR #13560: URL: https://github.com/apache/lucene/pull/13560#discussion_r1672214121

## lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java: ##
@@ -141,4 +146,57 @@ public void testWithRandom() throws Exception {
 }
 }
 }
+
+ // Opens the input with IOContext.READONCE to ensure slice and clone are appropriately confined
+ public void testConfined() throws Exception {
+assumeTrue("Only testable with memory segments", Runtime.version().feature() >= 19);

Review Comment: ah ha! that is what I was looking for. Thanks!
Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]
kkewwei commented on issue #13554: URL: https://github.com/apache/lucene/issues/13554#issuecomment-2220507498 @jpountz, thank you for the reply.
Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]
kkewwei commented on issue #13554: URL: https://github.com/apache/lucene/issues/13554#issuecomment-2220514317 @jpountz, thank you for the reply. I will run a benchmark if it's useful.
Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]
jpountz merged PR #13359: URL: https://github.com/apache/lucene/pull/13359
Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]
ChrisHegarty merged PR #13560: URL: https://github.com/apache/lucene/pull/13560
Re: [PR] Reduce heap usage for knn index writers [lucene]
benwtrent merged PR #13538: URL: https://github.com/apache/lucene/pull/13538
Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]
benwtrent commented on PR #13546: URL: https://github.com/apache/lucene/pull/13546#issuecomment-2220694253 @shubhamvishu I do not know offhand which benchmarks should be done.
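[Editor's note] As background on the change being benchmarked: `LongAdder` typically beats `AtomicLong` when many threads increment a counter that is only read occasionally, which is the access pattern of a global hit counter. A minimal sketch of the pattern (illustrative names, not the actual HitsThresholdChecker code):

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of a hit counter suited to many concurrent writers and rare reads.
final class HitCounter {
  private final LongAdder hits = new LongAdder();
  private final long threshold;

  HitCounter(long threshold) {
    this.threshold = threshold;
  }

  void incrementHitCount() {
    hits.increment(); // cheap under contention: per-thread cells instead of one CAS'd field
  }

  boolean thresholdReached() {
    return hits.sum() >= threshold; // sum() is the comparatively expensive read
  }
}
```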
Re: [PR] WIP: draft of intra segment concurrency [lucene]
shubhamvishu commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1672505812

## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List leaves) {
 /** Static method to segregate LeafReaderContexts amongst multiple slices */
 public static LeafSlice[] slices(
 List leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List sortedLeaves = new ArrayList<>(leaves);
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
-final List> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List group = null;
+final List> groupedLeafPartitions = new ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
 if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment: We are changing the meaning of slices from a group of segments to a partition of a segment, if I understand correctly. I'm wondering if it's fine that we are changing its definition in this PR. But if it is really creating confusion, maybe renaming to `totalNumLeafPartitions` (or something better, as naming is hard) could do the job?
Re: [PR] WIP: draft of intra segment concurrency [lucene]
shubhamvishu commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1672505812

## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List leaves) {
 /** Static method to segregate LeafReaderContexts amongst multiple slices */
 public static LeafSlice[] slices(
 List leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List sortedLeaves = new ArrayList<>(leaves);
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
-final List> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List group = null;
+final List> groupedLeafPartitions = new ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
 if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment: We are changing the meaning of slices from a group of segments to a partition of a segment, if I understand correctly. I think it's OK to change its definition in this PR (we are not adding anything that conflicts, just changing its meaning). But if it is really creating confusion, maybe renaming to `totalNumLeafPartitions` (or something better, as naming is hard) could do the job?
Re: [PR] Minor cleanup in some Facet tests [lucene]
stefanvodita merged PR #13489: URL: https://github.com/apache/lucene/pull/13489
Re: [PR] Nrt snapshot 9x [lucene]
benwtrent closed pull request #13533: Nrt snapshot 9x URL: https://github.com/apache/lucene/pull/13533
Re: [PR] Nrt snapshot 9x [lucene]
benwtrent commented on PR #13533: URL: https://github.com/apache/lucene/pull/13533#issuecomment-2220988485 @dianjifzm I went ahead and closed this PR. I am guessing this is a forward-port of the other PR, which also has no description. Do you mind adding some context directly in the PR for what this code change is supposed to do?
Re: [I] NRT add configurable commitData for Custom security verification [lucene]
benwtrent commented on issue #13044: URL: https://github.com/apache/lucene/issues/13044#issuecomment-2220993446 I see you have opened a PR to add this with very little context and use case. Do you mind further describing what you are trying to achieve and why?
Re: [PR] Minor cleanup in some Facet tests [lucene]
stefanvodita commented on PR #13489: URL: https://github.com/apache/lucene/pull/13489#issuecomment-2221008091 I went ahead and merged since this PR had been pending for a few weeks. Thank you @slow-J for your contribution and @mikemccand for reviewing!
Re: [PR] WIP: draft of intra segment concurrency [lucene]
javanna commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1672757642

## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List leaves) {
 /** Static method to segregate LeafReaderContexts amongst multiple slices */
 public static LeafSlice[] slices(
 List leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List sortedLeaves = new ArrayList<>(leaves);
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc(;
-final List> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List group = null;
+final List> groupedLeafPartitions = new ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
 if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment: Thanks for the feedback! I'd suggest postponing detailed reviews around how slices are generated, and naming, for now. I don't plan on addressing these details at the moment; I'd rather focus on functionality and high-level API design (where to expose the range of ids within the different Lucene APIs, what problems there could be with the current approach, and finding better solutions for the hack I came up with).
Re: [I] WordBreakSpellChecker.generateBreakUpSuggestions() should do breadth first search [lucene]
hossman closed issue #12100: WordBreakSpellChecker.generateBreakUpSuggestions() should do breadth first search URL: https://github.com/apache/lucene/issues/12100
Re: [PR] Feature/scalar quantized off heap scoring [lucene]
benwtrent commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221176325 Ok, I double checked, and indeed, half-byte is way slower when reading directly from memory segments instead of reading on heap. [memsegment_vs_baseline.zip](https://github.com/user-attachments/files/16167433/memsegment_vs_baseline.zip) The flamegraphs are wildly different. So much more time is being spent reading from the memory segment and then comparing the vectors.

candidate (this PR): https://github.com/apache/lucene/assets/4357155/afa47bdd-3f53-4a27-8891-5e84ab32c0ed

baseline: https://github.com/apache/lucene/assets/4357155/733913ca-1b2e-4e98-b8c4-18bf50787cca
Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]
uschindler commented on code in PR #13555: URL: https://github.com/apache/lucene/pull/13555#discussion_r1672768505 ## lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java: ## @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.store; + +import java.lang.foreign.AddressLayout; +import java.lang.foreign.Arena; +import java.lang.foreign.MemoryLayout; +import java.lang.foreign.MemorySegment; +import java.lang.foreign.ValueLayout; +import java.nio.file.Path; +import java.util.concurrent.ConcurrentHashMap; +import java.util.concurrent.atomic.AtomicInteger; +import org.apache.lucene.index.IndexFileNames; + +@SuppressWarnings("preview") +final class GroupedArena implements Arena { + + private final String scopeId; + + private final ConcurrentHashMap arenas; + + private final Arena backing; + + private final AtomicInteger refCt; + + static Arena get(Path p, ConcurrentHashMap arenas) { +String filename = p.getFileName().toString(); +String segmentName = IndexFileNames.parseSegmentName(filename); +if (filename.length() == segmentName.length()) { + // no segment found; return a 1-off Arena + return Arena.ofShared(); +} +String scopeId = p.getParent().resolve(segmentName).toString(); +Arena ret; +do { + boolean[] computed = new boolean[1]; + final GroupedArena template = + arenas.computeIfAbsent( + scopeId, + (s) -> { +computed[0] = true; +return new GroupedArena(s, arenas); + }); + if (computed[0]) { +return template; + } + ret = template.cloneIfActive(); +} while (ret == null); // TODO: will this ever actually loop? +return ret; + } + + GroupedArena(String scopeId, ConcurrentHashMap arenas) { +this.scopeId = scopeId; +this.arenas = arenas; +this.backing = Arena.ofShared(); +this.refCt = new AtomicInteger(1); + } + + private GroupedArena(GroupedArena template) { +this.scopeId = template.scopeId; +this.arenas = template.arenas; +this.backing = template.backing; +this.refCt = template.refCt; + } + + private GroupedArena cloneIfActive() { +if (refCt.getAndIncrement() > 0) { + // the usual (always?) case + return new GroupedArena(this); +} else { + // TODO: this should never happen? + return null; +} + } + + @Override + public void close() { +int ct = refCt.decrementAndGet(); +if (ct == 0) { + arenas.remove(scopeId); + if (refCt.get() == 0) { +// TODO: this should always be the case? But if it's not, it should be a benign +// race condition. Whatever caller incremented `refCt` will close it, and if +// anyone tries to open a new arena with the same `scopeId` that we removed +// above, they'll simply create a new Arena, and we're no worse off than we +// would have been if every Arena was created as a one-off. 
+backing.close();
+ }
+} else {
+ assert ct > 0 : "refCt should never be negative; found " + ct;
+}
+ }
+
+ @Override

Review Comment: I am not so happy that we need to implement all those methods. Let's keep the default ones. Maybe let the required ones throw UOE, because we never use the arena to allocate memory.

## lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java: ##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific langua
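[Editor's note] To illustrate the first review comment above (keep `Arena`'s default methods and let the unavoidable ones fail, since this wrapper never allocates), a possible shape, not taken from the PR:

```java
// Sketch: GroupedArena only manages ref-counting/lifetime and never hands out memory,
// so the allocation entry point inherited from SegmentAllocator can simply refuse.
@Override
public MemorySegment allocate(long byteSize, long byteAlignment) {
  throw new UnsupportedOperationException("this arena does not allocate memory");
}
```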
Re: [I] HnwsGraph creates disconnected components [lucene]
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-2221207888 I'd like to take a stab at the "second pass" idea for patching up disconnected graph components. As a first step I think we ought to add state to the `HnswGraphBuilder` in order to clearly indicate that all nodes have been added and we are now engaged in finalizing the graph. My plan is to add a `getCompletedHnswGraph()` method that can be called when flushing, leaving the existing `getHnswGraph()` method that allows callers to observe the graph during construction.
Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]
msokolov commented on PR #13559: URL: https://github.com/apache/lucene/pull/13559#issuecomment-2221245332 I wonder if `DocIdSetBuilder` would help?
Re: [PR] Feature/scalar quantized off heap scoring [lucene]
benwtrent commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221250315 @ChrisHegarty have you seen a significant performance regression on MemorySegments & JDK22? Doing some testing, I updated my performance testing for this PR to use JDK22 and now it is WAY slower, more than 2x slower, even for full-byte. For int7, this branch is marginally faster (20%) with JDK21, but basically 2x slower on JDK22. I wonder if our off-heap scoring for `byte` vectors also suffers on JDK22. The quantized scorer for `int7` is just using those same methods.
[PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]
msokolov opened a new pull request, #13561: URL: https://github.com/apache/lucene/pull/13561 See https://github.com/apache/lucene/issues/12627 for context
Re: [PR] Feature/scalar quantized off heap scoring [lucene]
benwtrent commented on PR #13497: URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221274152 To verify it wasn't some weird artifact in my code, I slightly changed it so that my execution path always reads the vectors on-heap and then wraps them in a MemorySegment. Now JDK22 performs the same as JDK21 & the current baseline. It's weird to me that reading from a memory segment onto ByteVector objects would be 2x slower on JDK22 than 21. Regardless, it's already much slower for the int4 case on both JDK 21 & 22.
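[Editor's note] For reference, the experiment described above ("read the vectors on heap and then wrap them") might look roughly like this sketch; `dims` and `ordinal` are assumed names and the offset math is simplified, so this is not the PR's code:

```java
import java.lang.foreign.MemorySegment;
import org.apache.lucene.store.IndexInput;

// Hypothetical sketch of the "read on heap, then wrap" experiment: bytes are read through
// IndexInput into a byte[] and only then exposed as a MemorySegment, so MemorySegment-based
// scoring code keeps its signature while the actual read happens on heap.
static MemorySegment readVectorOnHeap(IndexInput input, int ordinal, int dims)
    throws java.io.IOException {
  byte[] scratch = new byte[dims];
  input.seek((long) ordinal * dims); // assumes dense, fixed-length vectors starting at offset 0
  input.readBytes(scratch, 0, dims); // on-heap read
  return MemorySegment.ofArray(scratch);
}
```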
Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]
benwtrent commented on code in PR #13561: URL: https://github.com/apache/lucene/pull/13561#discussion_r1672837709

## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
 });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment: Why are we freezing here instead of within `getCompletedGraph`, like it's done for the single-threaded builder?
Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]
msokolov commented on code in PR #13561: URL: https://github.com/apache/lucene/pull/13561#discussion_r1672846447

## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
 });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment: Yeah ... I guess this allows a little more strictness that we can't have in the other case, because this builder *only* allows building via the `build()` method whereas the other one can accept individual nodes. But perhaps consistency is better and we should move this `frozen = true` into `getCompletedGraph`.
Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]
benwtrent commented on code in PR #13561: URL: https://github.com/apache/lucene/pull/13561#discussion_r1672957009

## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ##
@@ -156,14 +157,20 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
 infoStream.message(HNSW_COMPONENT, "build graph from " + maxOrd + " vectors");
 }
 addVectors(maxOrd);
-return hnsw;

Review Comment: I think checking for `frozen` at the start of build is wise and prevents info stream writing when we are already frozen.

## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java: ##
@@ -156,14 +157,20 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
 infoStream.message(HNSW_COMPONENT, "build graph from " + maxOrd + " vectors");
 }
 addVectors(maxOrd);
-return hnsw;
+return getCompletedGraph();
 }
 @Override
 public void setInfoStream(InfoStream infoStream) {
 this.infoStream = infoStream;
 }
+
+ @Override
+ public OnHeapHnswGraph getCompletedGraph() {
+frozen = true;
+return getGraph();
+ }
+
 @Override
 public OnHeapHnswGraph getGraph() {
 return hnsw;

Review Comment: We should check for frozen directly in `addVectors` to prevent infoStream writing if there is some subclass calling this method erroneously.
Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]
msokolov commented on code in PR #13561: URL: https://github.com/apache/lucene/pull/13561#discussion_r1673111785

## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
 });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment: I added `frozen=true` to `getCompletedGraph` so it has the same semantics as `HnswGraphBuilder`.
Re: [PR] Add a `targetSearchConcurrency` parameter to `LogMergePolicy`. [lucene]
github-actions[bot] commented on PR #13517: URL: https://github.com/apache/lucene/pull/13517#issuecomment-2221754825 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]
vsop-479 commented on PR #13557: URL: https://github.com/apache/lucene/pull/13557#issuecomment-2221949176 @jpountz I measured it with luceneutil on wikimedium10m:
```
Task    QPS baseline (StdDev)    QPS my_modified_version (StdDev)    Pct diff    p-value
HighTermMonthSort    4422.96 (6.3%)    4272.29 (5.8%)    -3.4% ( -14% - 9%)    0.076
BrowseDayOfYearTaxoFacets    38.19 (28.4%)    37.06 (33.2%)    -3.0% ( -50% - 81%)    0.762
BrowseDateTaxoFacets    37.85 (28.4%)    36.78 (33.4%)    -2.8% ( -50% - 82%)    0.773
BrowseRandomLabelSSDVFacets    22.71 (9.6%)    22.27 (7.9%)    -1.9% ( -17% - 17%)    0.484
OrNotHighLow    2643.42 (4.9%)    2596.34 (5.3%)    -1.8% ( -11% - 8%)    0.271
BrowseDayOfYearSSDVFacets    27.46 (9.6%)    27.22 (4.4%)    -0.9% ( -13% - 14%)    0.706
IntNRQ    81.34 (9.3%)    80.63 (10.3%)    -0.9% ( -18% - 20%)    0.777
OrHighNotHigh    562.10 (3.2%)    558.03 (3.4%)    -0.7% ( -7% - 6%)    0.488
HighSloppyPhrase    16.47 (2.6%)    16.36 (2.9%)    -0.7% ( -6% - 4%)    0.448
HighTermTitleSort    138.98 (5.4%)    138.26 (5.3%)    -0.5% ( -10% - 10%)    0.761
OrNotHighMed    594.27 (3.2%)    591.52 (3.1%)    -0.5% ( -6% - 6%)    0.646
LowPhrase    104.35 (3.3%)    103.94 (3.0%)    -0.4% ( -6% - 6%)    0.695
AndHighLow    1913.21 (2.8%)    1906.77 (3.0%)    -0.3% ( -5% - 5%)    0.715
OrHighHigh    80.80 (2.1%)    80.58 (3.1%)    -0.3% ( -5% - 5%)    0.748
AndHighMedDayTaxoFacets    164.44 (2.2%)    164.00 (1.8%)    -0.3% ( -4% - 3%)    0.672
PKLookup    361.45 (2.7%)    360.51 (2.5%)    -0.3% ( -5% - 5%)    0.749
AndHighMed    164.72 (1.3%)    164.31 (2.6%)    -0.2% ( -4% - 3%)    0.703
AndHighHigh    115.25 (1.4%)    115.04 (2.7%)    -0.2% ( -4% - 3%)    0.787
MedSloppyPhrase    53.74 (2.0%)    53.66 (1.9%)    -0.1% ( -3% - 3%)    0.812
MedPhrase    219.79 (2.3%)    219.48 (3.0%)    -0.1% ( -5% - 5%)    0.868
OrHighNotLow    783.90 (5.0%)    783.58 (4.8%)    -0.0% ( -9% - 10%)    0.978
Fuzzy2    34.75 (2.1%)    34.76 (1.6%)    0.0% ( -3% - 3%)    0.960
HighPhrase    176.79 (3.3%)    176.90 (4.1%)    0.1% ( -7% - 7%)    0.956
Respell    146.62 (2.6%)    146.83 (2.2%)    0.1% ( -4% - 5%)    0.846
LowSloppyPhrase    144.71 (1.6%)    144.93 (1.6%)    0.1% ( -2% - 3%)    0.766
OrNotHighHigh    673.32 (3.7%)    674.49 (2.8%)    0.2% ( -6% - 6%)    0.868
OrHighMed    329.23 (2.7%)    330.13 (2.9%)    0.3% ( -5% - 6%)    0.758
OrHighLow    766.63 (2.8%)    768.92 (3.7%)    0.3% ( -6% - 7%)    0.775
MedTermDayTaxoFacets    106.35 (2.2%)    106.74 (3.0%)    0.4% ( -4% - 5%)    0.660
OrHighMedDayTaxoFacets    24.88 (4.9%)    24.97 (5.7%)    0.4% ( -9% - 11%)    0.825
Fuzzy1    166.68 (2.3%)    167.39 (2.1%)    0.4% ( -3% - 4%)    0.540
MedSpanNear    15.57 (1.6%)    15.64 (1.6%)    0.4% ( -2% - 3%)    0.374
OrHighNotMed    723.31 (3.9%)    726.73 (4.2%)    0.5% ( -7% - 8%)    0.715
HighSpanNear    11.64 (2.0%)    11.70 (1.4%)    0.5% ( -2% - 3%)    0.366
LowIntervalsOrdered    38.55 (2.3%)    38.75 (3.6%)    0.5% ( -5% - 6%)    0.598
Wildcard    358.09 (4.1%)    359.94 (2.7%)    0.5% ( -6% - 7%)    0.637
Prefix3    558.81 (2.2%)    562.08 (2.2%)    0.6% ( -3% - 5%)    0.397
LowTerm    1284.45 (4.0%)    1292.67 (4.8%)    0.6% ( -7% - 9%)    0.648
HighTermTitleBDVSort    15.89 (6.3%)    15.99 (5.3%)    0.7% ( -10% - 13%)    0.716
LowSpanNear    131.68 (1.8%)    132.65 (2.1%)    0.7% ( -3% - 4%)    0.234
MedIntervalsOrdered    147.08 (5.7%)    148.26 (7.1%)    0.8% ( -11% - 14%)    0
```
Re: [PR] Make LRUQueryCache respect Accountable queries on eviction and consisten… [lucene]
jaebongim commented on PR #12614: URL: https://github.com/apache/lucene/pull/12614#issuecomment-013014 @gtroitskiy @romseygeek Is the bug fixed in Elasticsearch 8.12?