Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]

2024-07-10 Thread via GitHub


jpountz commented on issue #13554:
URL: https://github.com/apache/lucene/issues/13554#issuecomment-2219711472

   The idea makes sense to me, but I worry that it wouldn't look good API-wise. 
I also imagine that the gains would be lower than in #13199 since 
`Weight#scorerSupplier` is called one time per segment while comparators used 
to estimate the point count multiple times per segment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Merge on Commit: No merges if new data is flushed (but not committed) [lucene]

2024-07-10 Thread via GitHub


jpountz commented on issue #13537:
URL: https://github.com/apache/lucene/issues/13537#issuecomment-2219745791

   What version are you using? We fixed a similar problem in version 9.9, I 
wonder if the problem that you are reporting is the same one or a new one: 
https://github.com/apache/lucene/pull/12549.





Re: [PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]

2024-07-10 Thread via GitHub


vsop-479 commented on code in PR #13556:
URL: https://github.com/apache/lucene/pull/13556#discussion_r1671812467


##
lucene/core/src/test/org/apache/lucene/index/TestTermsEnum.java:
##
@@ -998,6 +999,43 @@ public void testCommonPrefixTerms() throws Exception {
 d.close();
   }
 
+  public void testPKLookupWithUpdate() throws Exception {

Review Comment:
   Thanks @jpountz, I moved the test case.






Re: [PR] Lookup next when current doc is deleted in PerThreadPKLookup.lookup [lucene]

2024-07-10 Thread via GitHub


jpountz merged PR #13556:
URL: https://github.com/apache/lucene/pull/13556





Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-10 Thread via GitHub


jpountz commented on PR #13557:
URL: https://github.com/apache/lucene/pull/13557#issuecomment-2219848841

   Can you check if this helps with luceneutil on wikimedium10m or wikibigall?





Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-10 Thread via GitHub


vsop-479 commented on PR #13557:
URL: https://github.com/apache/lucene/pull/13557#issuecomment-2219865957

   > Can you check if this helps with luceneutil on wikimedium10m or wikibigall?
   
   Sure, I will do that soon.





Re: [PR] Use a confined Arena for IOContext.READONCE [lucene]

2024-07-10 Thread via GitHub


ChrisHegarty merged PR #13535:
URL: https://github.com/apache/lucene/pull/13535





Re: [PR] QueryRescorer: Use original order by default for same-score items rather than sorting by docId [lucene]

2024-07-10 Thread via GitHub


Willdotwhite commented on PR #13510:
URL: https://github.com/apache/lucene/pull/13510#issuecomment-2219974712

   Morning @andywebb1975 and @jpountz - I'm a bit late to the discussion, but 
I'm interested to be involved!
   
   I'd be happy to look into this with Andy if the rewrite is the way to go. 
Adrien, is there any chance this fix could go in as-is as a quick patch for now, 
with the rewrite following in a later PR, or is that not the way things are done 
here? 😀





[PR] Fix testAddDocumentOnDiskFull to handle IllegalStateException from IndexWriter#close [lucene]

2024-07-10 Thread via GitHub


easyice opened a new pull request, #13558:
URL: https://github.com/apache/lucene/pull/13558

   This issue is similar to https://github.com/apache/lucene/issues/11755, but 
it occurs in `IndexWriter#close`. It also reproduces about half of the time.
   
   
   
   ```
   java.lang.IllegalStateException: this writer hit an unrecoverable error; cannot commit
   at __randomizedtesting.SeedInfo.seed([EAA58E75C71B5109:66BDB60A3A8E87CD]:0)
   at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:5577)
   at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3781)
   at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:4122)
   at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1331)
   at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1369)
   at org.apache.lucene.index.TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull(TestIndexWriterOnDiskFull.java:113)
   at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
   at java.base/java.lang.reflect.Method.invoke(Method.java:580)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
   at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
   at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
   at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
   at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
   at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
   at java.bas

[PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]

2024-07-10 Thread via GitHub


epotyom opened a new pull request, #13559:
URL: https://github.com/apache/lucene/pull/13559

   In SparseFixedBitSet.firstDoc, instead of iterating through the entire 
indices array until a non-zero value is found, keep track of the max updated index.
   
   Use case where it improves performance:
   1. `SparseFixedBitSet` is created with a high enough length, e.g. max doc in a 
segment.
   2. `#nextSetBit` is called (in a loop) on a bit set that is still being 
built, i.e. some of the next bits are `#set`, but the rest of the bit set is 
still empty.
   3. Once there are no further set bits, the `#nextSetBit` call to 
`#firstDoc` iterates through the rest of the `indices` array.
   
   In my case, we use SparseFixedBitSet to track and iterate children hits 
found in `ToParentBlockJoinQuery`. Iterating through empty `indices` elements 
becomes expensive when we do it for each parent docID.
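
To make the idea concrete, here is a minimal sketch. It is not the actual patch: it assumes a flat `long[]` instead of the two-level layout of `SparseFixedBitSet`, and the names `TrackedBitSet`, `maxSetWord`, and `wordsScanned` are made up for illustration. It only shows that remembering the highest word ever written lets `nextSetBit` stop early instead of scanning the empty tail.

```java
/** Simplified stand-in for a bit set that is still being built; NOT Lucene's SparseFixedBitSet. */
final class TrackedBitSet {
  static final int NO_MORE_BITS = Integer.MAX_VALUE;

  private final long[] words;   // one long per 64 bits (assumption: flat layout)
  private int maxSetWord = -1;  // highest word index that has ever received a bit
  int wordsScanned = 0;         // instrumentation for this example only

  TrackedBitSet(int length) {
    words = new long[(length + 63) >>> 6];
  }

  void set(int i) {
    int w = i >>> 6;
    words[w] |= 1L << i;        // Java uses only the low 6 bits of the shift distance
    if (w > maxSetWord) {
      maxSetWord = w;           // remember how far the set has been populated
    }
  }

  /** Returns the first set bit at or after {@code from}, or NO_MORE_BITS. */
  int nextSetBit(int from) {
    int w = from >>> 6;
    if (w > maxSetWord) {
      return NO_MORE_BITS;      // nothing was ever set this far out: no scan at all
    }
    long word = words[w] & (-1L << from); // drop bits below 'from' in the first word
    while (true) {
      wordsScanned++;
      if (word != 0) {
        return (w << 6) + Long.numberOfTrailingZeros(word);
      }
      if (++w > maxSetWord) {
        return NO_MORE_BITS;    // stop at the tracked bound, not at words.length
      }
      word = words[w];
    }
  }

  public static void main(String[] args) {
    TrackedBitSet bs = new TrackedBitSet(1_000_000);
    bs.set(3);
    bs.set(70);
    System.out.println(bs.nextSetBit(0));
    System.out.println(bs.nextSetBit(4));
    System.out.println(bs.nextSetBit(71) == NO_MORE_BITS);
    System.out.println("words scanned: " + bs.wordsScanned);
  }
}
```

The sketch has no `clear`, so the tracked bound only ever grows; a real implementation that supports clearing bits would have to treat the bound as an upper limit rather than an exact value.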
   
   Lucene util performance test results might not be great though, so maybe 
there is a better way to achieve a similar effect?
   
   ```
   python3 src/python/localrun.py -source wikimediumall
   ...
   
   Task                         QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff                  p-value
   BrowseDateTaxoFacets                 1.65   (9.1%)                     1.59   (0.5%)     -3.6% ( -12% -    6%)  0.081
   BrowseDayOfYearTaxoFacets            1.67   (9.2%)                     1.61   (0.6%)     -3.5% ( -12% -    6%)  0.088
   MedTermDayTaxoFacets                 9.40   (6.2%)                     9.19   (5.0%)     -2.2% ( -12% -    9%)  0.206
   BrowseRandomLabelTaxoFacets          1.29   (4.6%)                     1.27   (1.0%)     -1.9% (  -7% -    3%)  0.070
   Prefix3                            543.93   (5.7%)                   535.20   (4.6%)     -1.6% ( -11% -    9%)  0.326
   AndHighLow                         780.30   (3.9%)                   771.77   (4.1%)     -1.1% (  -8% -    7%)  0.383
   AndHighMed                         199.79   (2.3%)                   197.77   (3.0%)     -1.0% (  -6% -    4%)  0.233
   MedSloppyPhrase                     61.79   (4.1%)                    61.24   (4.1%)     -0.9% (  -8% -    7%)  0.488
   AndHighHigh                         84.66   (6.6%)                    83.92   (7.6%)     -0.9% ( -14% -   14%)  0.699
   PKLookup                           143.72   (1.9%)                   142.48   (2.1%)     -0.9% (  -4% -    3%)  0.171
   Fuzzy1                              56.85   (1.5%)                    56.36   (2.0%)     -0.8% (  -4% -    2%)  0.122
   BrowseDateSSDVFacets                 0.43  (16.7%)                     0.43  (16.1%)     -0.8% ( -28% -   38%)  0.873
   Wildcard                           159.45   (2.7%)                   158.30   (4.0%)     -0.7% (  -7% -    6%)  0.505
   Fuzzy2                              56.79   (1.2%)                    56.38   (1.8%)     -0.7% (  -3% -    2%)  0.139
   HighPhrase                          20.07   (4.4%)                    19.94   (5.8%)     -0.6% ( -10% -    9%)  0.701
   MedSpanNear                         15.66   (1.8%)                    15.60   (2.2%)     -0.4% (  -4% -    3%)  0.537
   OrNotHighMed                       211.86   (3.1%)                   211.03   (2.7%)     -0.4% (  -5% -    5%)  0.670
   HighTermTitleBDVSort                16.31   (2.8%)                    16.25   (2.7%)     -0.4% (  -5% -    5%)  0.661
   MedPhrase                          154.39   (2.7%)                   154.01   (3.3%)     -0.2% (  -6% -    5%)  0.800
   OrHighMed                          184.54   (2.5%)                   184.21   (2.0%)     -0.2% (  -4% -    4%)  0.797
   LowPhrase                           72.18   (3.6%)                    72.06   (4.3%)     -0.2% (  -7% -    8%)  0.893
   OrHighNotHigh                      229.39   (4.4%)                   229.05   (4.5%)     -0.1% (  -8% -    9%)  0.915
   LowSloppyPhrase                     98.92   (1.5%)                    98.84   (2.1%)     -0.1% (  -3% -    3%)  0.897
   LowSpanNear                         53.22   (1.0%)                    53.21   (0.8%)     -0.0% (  -1% -    1%)  0.932
   Respell                             34.18   (1.9%)                    34.18   (2.4%)     -0.0% (  -4% -    4%)  0.986
   HighSpanNear                         5.05   (3.1%)                     5.06   (3.0%)      0.1% (  -5% -    6%)  0.929
   AndHighMedDayTaxoFacets             16.78   (1.6%)                    16.79   (1.7%)      0.1% (  -3% -    3%)  0.850
   OrHighLow                          381.10   (3.4%)                   381.52   (3.0%)      0.1% (  -6% -    6%)  0.914
   HighSloppyPhrase                    12.20   (3.4%)                    12.22   (4.1%)      0.1% (  -7% -    7%)  0.902
   HighTermMonthSort                 1059.29   (4.7%)                  1061.27   (4.8%)      0.2% (  -8% -   10%)  0.901
   AndHighHighDayTaxoFacets            13.53   (1.6%)                    13.56   (1.7%)      0.2% (  -3% -    3%)  0.703
   OrNotHighLow                       664.93   (3.1%)                   666.40   (3.6%)      0.2% (  -6% -    7%)  0.835
   MedTerm                            330.39   (8.8%)                   331.13   (6.3%)      0.

Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]

2024-07-10 Thread via GitHub


mikemccand commented on PR #13488:
URL: https://github.com/apache/lucene/pull/13488#issuecomment-2220286531

   Thanks @slow-J -- I just backported to 9.12 as well.  I had to resolve a few 
conflicts, maybe have a peek and see if I did it correctly?





Re: [PR] GITHUB#13175: Stop double-checking priority queue inserts in some FacetCount classes [lucene]

2024-07-10 Thread via GitHub


slow-J commented on PR #13488:
URL: https://github.com/apache/lucene/pull/13488#issuecomment-2220297663

   Thanks @mikemccand, I think the diff for 9.12 looks good.





Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]

2024-07-10 Thread via GitHub


uschindler commented on code in PR #13560:
URL: https://github.com/apache/lucene/pull/13560#discussion_r1672135733


##
lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java:
##
@@ -141,4 +146,57 @@ public void testWithRandom() throws Exception {
   }
 }
   }
+
+  // Opens the input with IOContext.READONCE to ensure slice and clone are appropriately confined
+  public void testConfined() throws Exception {
+assumeTrue("Only testable with memory segments", Runtime.version().feature() >= 19);

Review Comment:
   Use `isMemorySegmentImpl()`, see 
https://github.com/apache/lucene/blob/9e5d278cd6e6f17f09dea8d05295008c855c971a/lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java#L67
   
   






Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-10 Thread via GitHub


shubhamvishu commented on PR #13546:
URL: https://github.com/apache/lucene/pull/13546#issuecomment-222032

   Makes sense @benwtrent! For `BufferedUpdatesStream`, since it's on the 
indexing side, should we check the indexing time rather than the QPS from the 
regular luceneutil benchmarks?





Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]

2024-07-10 Thread via GitHub


ChrisHegarty commented on code in PR #13560:
URL: https://github.com/apache/lucene/pull/13560#discussion_r1672214121


##
lucene/core/src/test/org/apache/lucene/store/TestMMapDirectory.java:
##
@@ -141,4 +146,57 @@ public void testWithRandom() throws Exception {
   }
 }
   }
+
+  // Opens the input with IOContext.READONCE to ensure slice and clone are appropriately confined
+  public void testConfined() throws Exception {
+assumeTrue("Only testable with memory segments", Runtime.version().feature() >= 19);

Review Comment:
   ah ha! that is what I was looking for. Thanks!






Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]

2024-07-10 Thread via GitHub


kkewwei commented on issue #13554:
URL: https://github.com/apache/lucene/issues/13554#issuecomment-2220507498

   @jpountz, thank you for the reply.
   
   





Re: [I] Pruning of estimating the point value count since BooleanScorerSupplier [lucene]

2024-07-10 Thread via GitHub


kkewwei commented on issue #13554:
URL: https://github.com/apache/lucene/issues/13554#issuecomment-2220514317

   @jpountz, thank you for the reply.
   
   I will run a benchmark if it's useful.
   
   





Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-07-10 Thread via GitHub


jpountz merged PR #13359:
URL: https://github.com/apache/lucene/pull/13359





Re: [PR] [9.x] Use a confined Arena for IOContext.READONCE (#13535) [lucene]

2024-07-10 Thread via GitHub


ChrisHegarty merged PR #13560:
URL: https://github.com/apache/lucene/pull/13560





Re: [PR] Reduce heap usage for knn index writers [lucene]

2024-07-10 Thread via GitHub


benwtrent merged PR #13538:
URL: https://github.com/apache/lucene/pull/13538





Re: [PR] Replace AtomicLong with LongAdder in HitsThresholdChecker [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on PR #13546:
URL: https://github.com/apache/lucene/pull/13546#issuecomment-2220694253

   @shubhamvishu I do not know offhand which benchmarks should be run.





Re: [PR] WIP: draft of intra segment concurrency [lucene]

2024-07-10 Thread via GitHub


shubhamvishu commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1672505812


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
   List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
 
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
 
-final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List<LeafReaderContext> group = null;
+final List> groupedLeafPartitions = new ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
   if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment:
   If I understand correctly, we are changing the meaning of slices from a group 
of segments to a partition of a segment. I wonder whether it is fine to change 
that definition in this PR. If it really creates confusion, maybe renaming to 
`totalNumLeafPartitions` (or something better, as naming is hard) could do the 
job?






Re: [PR] WIP: draft of intra segment concurrency [lucene]

2024-07-10 Thread via GitHub


shubhamvishu commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1672505812


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
   List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
 
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
 
-final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List<LeafReaderContext> group = null;
+final List> groupedLeafPartitions = new ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
   if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment:
   If I understand correctly, we are changing the meaning of slices from a group 
of segments to a partition of a segment. I think it's OK to change that 
definition in this PR (we are not adding anything conflicting, just changing its 
meaning). If it really creates confusion, maybe renaming to 
`totalNumLeafPartitions` (or something better, as naming is hard) could do the 
job?
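
For readers following the naming discussion, here is a rough sketch of what "splitting a segment into partitions of equal size" could look like. It is an illustration under stated assumptions, not the code from the PR: `SegmentPartitioner`, `DocRange`, and `partition` are hypothetical names, the partition boundaries are a guess since the diff above is cut off before they are computed, and the local `ceilDiv` helper stands in for `Math.ceilDiv` (Java 18+) for positive inputs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits one segment's doc ID space into near-equal ranges. */
final class SegmentPartitioner {

  /** Hypothetical value type: the half-open doc ID range [minDocId, maxDocId). */
  static final class DocRange {
    final int minDocId;
    final int maxDocId;

    DocRange(int minDocId, int maxDocId) {
      this.minDocId = minDocId;
      this.maxDocId = maxDocId;
    }
  }

  /** Ceiling division for positive ints; same result as Math.ceilDiv on Java 18+. */
  static int ceilDiv(int a, int b) {
    return (a - 1) / b + 1;
  }

  /** Assumes maxDoc > 0 and maxDocsPerSlice > 0. */
  static List<DocRange> partition(int maxDoc, int maxDocsPerSlice) {
    // Smallest number of partitions such that none exceeds maxDocsPerSlice.
    int numPartitions = ceilDiv(maxDoc, maxDocsPerSlice);
    // Spread the docs evenly instead of filling every partition to the max.
    int size = ceilDiv(maxDoc, numPartitions);
    List<DocRange> ranges = new ArrayList<>(numPartitions);
    for (int start = 0; start < maxDoc; start += size) {
      ranges.add(new DocRange(start, Math.min(start + size, maxDoc)));
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (DocRange r : partition(10, 4)) {
      System.out.println("[" + r.minDocId + ", " + r.maxDocId + ")");
    }
  }
}
```

With this reading, a "slice" can hold either several whole small segments or a single range of one large segment, which is the shift in meaning the comment above is pointing at.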






Re: [PR] Minor cleanup in some Facet tests [lucene]

2024-07-10 Thread via GitHub


stefanvodita merged PR #13489:
URL: https://github.com/apache/lucene/pull/13489





Re: [PR] Nrt snapshot 9x [lucene]

2024-07-10 Thread via GitHub


benwtrent closed pull request #13533: Nrt snapshot 9x
URL: https://github.com/apache/lucene/pull/13533





Re: [PR] Nrt snapshot 9x [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on PR #13533:
URL: https://github.com/apache/lucene/pull/13533#issuecomment-2220988485

   @dianjifzm I went ahead and closed this PR. I am guessing this is a port 
forward of the other PR which also has no description. 
   
   Do you mind adding some context directly in the PR etc. for what this code 
change is supposed to do?





Re: [I] NRT add configurable commitData for Custom security verification [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on issue #13044:
URL: https://github.com/apache/lucene/issues/13044#issuecomment-2220993446

   I see you have opened a PR to add this with very little context and use 
case. Do you mind further describing what you are trying to achieve and why?





Re: [PR] Minor cleanup in some Facet tests [lucene]

2024-07-10 Thread via GitHub


stefanvodita commented on PR #13489:
URL: https://github.com/apache/lucene/pull/13489#issuecomment-2221008091

   I went ahead and merged since this PR had been pending for a few weeks. 
Thank you @slow-J for your contribution and @mikemccand for reviewing!





Re: [PR] WIP: draft of intra segment concurrency [lucene]

2024-07-10 Thread via GitHub


javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1672757642


##
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List 
leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
   List leaves, int maxDocsPerSlice, int 
maxSegmentsPerSlice) {
+
+// TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+// It must be reverted before merging.
+maxDocsPerSlice = 1;
+maxSegmentsPerSlice = 1;
+// end hack
+
 // Make a copy so we can sort:
 List sortedLeaves = new ArrayList<>(leaves);
 
 // Sort by maxDoc, descending:
-Collections.sort(
-sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
 
-final List> groupedLeaves = new ArrayList<>();
-long docSum = 0;
-List group = null;
+final List> groupedLeafPartitions = new 
ArrayList<>();
+int currentSliceNumDocs = 0;
+List group = null;
 for (LeafReaderContext ctx : sortedLeaves) {
   if (ctx.reader().maxDoc() > maxDocsPerSlice) {
 assert group == null;
-groupedLeaves.add(Collections.singletonList(ctx));
+// if the segment does not fit in a single slice, we split it in multiple partitions of
+// equal size
+int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment:
   Thanks for the feedback! I'd suggest postponing detailed reviews of how 
slices are generated, and of naming, for now. I don't plan on addressing these 
details at the moment; I'd rather focus on functionality and high-level API 
design: where we expose the range of ids within the different Lucene APIs, 
what problems there could be with the current approach, and what better 
solutions exist for the hack I came up with.
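
   For readers following the diff above, here is a minimal sketch of how one 
segment's doc id space could be split into near-equal partitions once it 
exceeds `maxDocsPerSlice`. It is not code from the PR: `SegmentPartitionSketch`, 
`DocRange` and `partition` are hypothetical names, and spreading the remainder 
over the first partitions is an assumption about what "equal size" means here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not code from the PR. */
final class SegmentPartitionSketch {

  /** Hypothetical half-open doc id range [minDocId, maxDocId) within one segment. */
  record DocRange(int minDocId, int maxDocId) {}

  /** Splits [0, maxDoc) into ceil(maxDoc / maxDocsPerSlice) ranges whose sizes differ by at most 1. */
  static List<DocRange> partition(int maxDoc, int maxDocsPerSlice) {
    if (maxDoc <= 0 || maxDocsPerSlice <= 0) {
      throw new IllegalArgumentException("maxDoc and maxDocsPerSlice must be positive");
    }
    // Ceiling division, done in long arithmetic to avoid int overflow.
    int numPartitions = (int) (((long) maxDoc + maxDocsPerSlice - 1) / maxDocsPerSlice);
    int baseSize = maxDoc / numPartitions;
    int remainder = maxDoc % numPartitions;
    List<DocRange> ranges = new ArrayList<>(numPartitions);
    int start = 0;
    for (int i = 0; i < numPartitions; i++) {
      // The first `remainder` partitions take one extra document.
      int size = baseSize + (i < remainder ? 1 : 0);
      ranges.add(new DocRange(start, start + size));
      start += size;
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (DocRange range : partition(10, 4)) {
      System.out.println(range);
    }
  }
}
```

   Half-open ranges keep adjacent partitions from overlapping, so each 
document belongs to exactly one partition.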






Re: [I] WordBreakSpellChecker.generateBreakUpSuggestions() should do breadth first search [lucene]

2024-07-10 Thread via GitHub


hossman closed issue #12100: WordBreakSpellChecker.generateBreakUpSuggestions() 
should do breadth first search
URL: https://github.com/apache/lucene/issues/12100





Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on PR #13497:
URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221176325

   Ok, I double checked, and indeed, half-byte is way slower when reading 
directly from memory segments instead of reading on heap. 
   
[memsegment_vs_baseline.zip](https://github.com/user-attachments/files/16167433/memsegment_vs_baseline.zip)
   
   The flamegraphs are wildly different. So much more time is being spent 
reading from the memory segment and then comparing the vectors.
   
   candidate (this PR): 
   https://github.com/apache/lucene/assets/4357155/afa47bdd-3f53-4a27-8891-5e84ab32c0ed
   
   baseline:
   
   https://github.com/apache/lucene/assets/4357155/733913ca-1b2e-4e98-b8c4-18bf50787cca
   





Re: [PR] Group memory arenas by segment to reduce costly `Arena.close()` [lucene]

2024-07-10 Thread via GitHub


uschindler commented on code in PR #13555:
URL: https://github.com/apache/lucene/pull/13555#discussion_r1672768505


##
lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java:
##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.lang.foreign.AddressLayout;
+import java.lang.foreign.Arena;
+import java.lang.foreign.MemoryLayout;
+import java.lang.foreign.MemorySegment;
+import java.lang.foreign.ValueLayout;
+import java.nio.file.Path;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.atomic.AtomicInteger;
+import org.apache.lucene.index.IndexFileNames;
+
+@SuppressWarnings("preview")
+final class GroupedArena implements Arena {
+
+  private final String scopeId;
+
+  private final ConcurrentHashMap<String, GroupedArena> arenas;
+
+  private final Arena backing;
+
+  private final AtomicInteger refCt;
+
+  static Arena get(Path p, ConcurrentHashMap<String, GroupedArena> arenas) {
+String filename = p.getFileName().toString();
+String segmentName = IndexFileNames.parseSegmentName(filename);
+if (filename.length() == segmentName.length()) {
+  // no segment found; return a 1-off Arena
+  return Arena.ofShared();
+}
+String scopeId = p.getParent().resolve(segmentName).toString();
+Arena ret;
+do {
+  boolean[] computed = new boolean[1];
+  final GroupedArena template =
+  arenas.computeIfAbsent(
+  scopeId,
+  (s) -> {
+computed[0] = true;
+return new GroupedArena(s, arenas);
+  });
+  if (computed[0]) {
+return template;
+  }
+  ret = template.cloneIfActive();
+} while (ret == null); // TODO: will this ever actually loop?
+return ret;
+  }
+
+  GroupedArena(String scopeId, ConcurrentHashMap<String, GroupedArena> arenas) {
+this.scopeId = scopeId;
+this.arenas = arenas;
+this.backing = Arena.ofShared();
+this.refCt = new AtomicInteger(1);
+  }
+
+  private GroupedArena(GroupedArena template) {
+this.scopeId = template.scopeId;
+this.arenas = template.arenas;
+this.backing = template.backing;
+this.refCt = template.refCt;
+  }
+
+  private GroupedArena cloneIfActive() {
+if (refCt.getAndIncrement() > 0) {
+  // the usual (always?) case
+  return new GroupedArena(this);
+} else {
+  // TODO: this should never happen?
+  return null;
+}
+  }
+
+  @Override
+  public void close() {
+int ct = refCt.decrementAndGet();
+if (ct == 0) {
+  arenas.remove(scopeId);
+  if (refCt.get() == 0) {
+// TODO: this should always be the case? But if it's not, it should be a benign
+//  race condition. Whatever caller incremented `refCt` will close it, and if
+//  anyone tries to open a new arena with the same `scopeId` that we removed
+//  above, they'll simply create a new Arena, and we're no worse off than we
+//  would have been if every Arena was created as a one-off.
+backing.close();
+  }
+} else {
+  assert ct > 0 : "refCt should never be negative; found " + ct;
+}
+  }
+
+  @Override

Review Comment:
   I am not so happy that we need to implement all those methods. Let's keep 
the default ones.
   Maybe let the required ones throw UOE, because we never use the arena to 
allocate memory.
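
   As a side note on the reference counting in the diff above, the following 
is a minimal sketch of a shared, ref-counted handle written against plain 
`AutoCloseable` rather than `java.lang.foreign.Arena`. It is not the PR's 
implementation: `RefCountedResourceSketch`, `tryAcquire` and `onRelease` are 
hypothetical names, and it deliberately swaps the PR's `getAndIncrement` for a 
compare-and-set loop so that a count that has already reached zero is never 
incremented again.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: a ref-counted handle that releases a shared resource on the last close. */
final class RefCountedResourceSketch implements AutoCloseable {

  // Shared by every handle cloned from the same root.
  private final AtomicInteger refCount;
  // Hypothetical release hook; in the PR this role is played by closing the backing Arena.
  private final Runnable onRelease;

  RefCountedResourceSketch(Runnable onRelease) {
    this.refCount = new AtomicInteger(1);
    this.onRelease = onRelease;
  }

  private RefCountedResourceSketch(RefCountedResourceSketch other) {
    this.refCount = other.refCount;
    this.onRelease = other.onRelease;
  }

  /** Returns a new handle, or null if the shared resource has already been released. */
  RefCountedResourceSketch tryAcquire() {
    while (true) {
      int current = refCount.get();
      if (current <= 0) {
        return null; // already released; the caller must create a fresh resource
      }
      if (refCount.compareAndSet(current, current + 1)) {
        return new RefCountedResourceSketch(this);
      }
    }
  }

  @Override
  public void close() {
    int remaining = refCount.decrementAndGet();
    if (remaining == 0) {
      onRelease.run(); // last handle closed: release the shared resource exactly once
    } else if (remaining < 0) {
      throw new IllegalStateException("closed more often than acquired");
    }
  }
}
```

   With the compare-and-set loop, `tryAcquire` can only succeed while the 
count is positive, which sidesteps the "will this ever actually loop?" race 
at the cost of a retry under contention.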



##
lucene/core/src/java21/org/apache/lucene/store/GroupedArena.java:
##
@@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific langua

Re: [I] HnwsGraph creates disconnected components [lucene]

2024-07-10 Thread via GitHub


msokolov commented on issue #12627:
URL: https://github.com/apache/lucene/issues/12627#issuecomment-2221207888

   I'd like to take a stab at the "second pass" idea for patching up 
disconnected graph components. As a first step I think we ought to add state to 
the `HnswGraphBuilder` in order to clearly indicate that all nodes have been 
added and we are now engaged in finalizing the graph. My plan is to add a 
`getCompletedHnswGraph()` method that can be called when flushing, leaving the 
existing `getHnswGraph()` method that allows callers to observe the graph 
during construction.
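
   To make the proposed state concrete, here is a minimal sketch of a builder 
that records a completed state. It is an illustration under assumptions, not 
the planned Lucene code: `FreezableBuilderSketch` and `addNode` are 
hypothetical, a list of node ids stands in for the graph, and throwing 
`IllegalStateException` on additions after completion is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a builder that distinguishes "under construction" from "completed". */
final class FreezableBuilderSketch {

  // Stand-in for the graph; a real builder would hold an HNSW graph here.
  private final List<Integer> nodes = new ArrayList<>();
  private boolean frozen;

  /** Adds a node; rejected once the graph has been completed (assumed policy). */
  void addNode(int node) {
    if (frozen) {
      throw new IllegalStateException("graph is completed; no more nodes can be added");
    }
    nodes.add(node);
  }

  /** Lets callers observe the graph while it is still being built. */
  List<Integer> getGraph() {
    return Collections.unmodifiableList(nodes);
  }

  /** Marks construction as finished and returns the final graph. */
  List<Integer> getCompletedGraph() {
    frozen = true;
    return getGraph();
  }

  public static void main(String[] args) {
    FreezableBuilderSketch builder = new FreezableBuilderSketch();
    builder.addNode(0);
    builder.addNode(1);
    System.out.println(builder.getCompletedGraph());
  }
}
```

   Keeping both accessors means a finalizing pass, such as patching up 
disconnected components, has one well-defined place to run: inside the 
completed-graph accessor, before the graph is handed to the flush code.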





Re: [PR] SparseFixedBitSet#firstDoc: reduce number of `indices` iterations for a bit set that is not fully built yet. [lucene]

2024-07-10 Thread via GitHub


msokolov commented on PR #13559:
URL: https://github.com/apache/lucene/pull/13559#issuecomment-2221245332

   I wonder if `DocIdSetBuilder` would help?





Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on PR #13497:
URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221250315

   @ChrisHegarty have you seen a significant performance regression on 
MemorySegments & JDK22?
   
   Doing some testing, I updated my performance testing for this PR to use 
JDK22 and now it is WAY slower, more than 2x slower, even for full-byte.
   
   For int7, this branch is marginally faster (20%) with JDK21, but basically 
2x slower on JDK22.
   
   
   I wonder if our off-heap scoring for `byte` vectors also suffers on JDK22. 
The quantized scorer for `int7` is just using those same methods.





[PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-10 Thread via GitHub


msokolov opened a new pull request, #13561:
URL: https://github.com/apache/lucene/pull/13561

   See https://github.com/apache/lucene/issues/12627 for context





Re: [PR] Feature/scalar quantized off heap scoring [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on PR #13497:
URL: https://github.com/apache/lucene/pull/13497#issuecomment-2221274152

   To verify it wasn't some weird artifact in my code, I slightly changed it so 
that my execution path always reads the vectors on-heap and then wraps them in 
a `MemorySegment`. Now JDK22 performs the same as JDK21 & the current baseline.
   
   It's weird to me that reading from a memory segment into `ByteVector` objects 
would be 2x slower on JDK22 than on JDK21. 
   
   Regardless, it's already much slower for the int4 case on both JDK21 & 
JDK22. 





Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on code in PR #13561:
URL: https://github.com/apache/lucene/pull/13561#discussion_r1672837709


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
   });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment:
   Why are we freezing here instead of within `getCompletedGraph`, like it's 
done for the single-threaded builder?






Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-10 Thread via GitHub


msokolov commented on code in PR #13561:
URL: https://github.com/apache/lucene/pull/13561#discussion_r1672846447


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
   });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment:
   Yeah ... I guess this allows a little more strictness that we can't have in 
the other case, because this builder *only* allows building via the `build()` 
method, whereas the other one can accept individual nodes. But perhaps 
consistency is better and we should move this `frozen = true` into 
`getCompletedGraph`.






Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-10 Thread via GitHub


benwtrent commented on code in PR #13561:
URL: https://github.com/apache/lucene/pull/13561#discussion_r1672957009


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -156,14 +157,20 @@ public OnHeapHnswGraph build(int maxOrd) throws 
IOException {
   infoStream.message(HNSW_COMPONENT, "build graph from " + maxOrd + " vectors");
 }
 addVectors(maxOrd);
-return hnsw;

Review Comment:
   I think checking for `frozen` at the start of build is wise and prevents 
info stream writing when we are already frozen.



##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraphBuilder.java:
##
@@ -156,14 +157,20 @@ public OnHeapHnswGraph build(int maxOrd) throws 
IOException {
   infoStream.message(HNSW_COMPONENT, "build graph from " + maxOrd + " vectors");
 }
 addVectors(maxOrd);
-return hnsw;
+return getCompletedGraph();
   }
 
   @Override
   public void setInfoStream(InfoStream infoStream) {
 this.infoStream = infoStream;
   }
 
+  @Override
+  public OnHeapHnswGraph getCompletedGraph() {
+frozen = true;
+return getGraph();
+  }
+
   @Override
   public OnHeapHnswGraph getGraph() {
 return hnsw;

Review Comment:
   We should check for frozen directly in `addVectors` to prevent infoStream 
writing if there is some sub-class calling this method erroneously






Re: [PR] Add HnswGraphBuilder.getCompletedGraph() and record completed state [lucene]

2024-07-10 Thread via GitHub


msokolov commented on code in PR #13561:
URL: https://github.com/apache/lucene/pull/13561#discussion_r1673111785


##
lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java:
##
@@ -84,7 +88,8 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException {
   });
 }
 taskExecutor.invokeAll(futures);
-return workers[0].getGraph();
+frozen = true;
+return workers[0].getCompletedGraph();

Review Comment:
   I added `frozen=true` to `getCompletedGraph` so it has the same semantics as 
`HnswGraphBuilder`






Re: [PR] Add a `targetSearchConcurrency` parameter to `LogMergePolicy`. [lucene]

2024-07-10 Thread via GitHub


github-actions[bot] commented on PR #13517:
URL: https://github.com/apache/lucene/pull/13517#issuecomment-2221754825

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!





Re: [PR] Check whether liveDoc is null out of loop in Weight.scoreAll [lucene]

2024-07-10 Thread via GitHub


vsop-479 commented on PR #13557:
URL: https://github.com/apache/lucene/pull/13557#issuecomment-2221949176

   @jpountz 
   
   I measured it with luceneutil on wikimedium10m:
   
   
                          Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff                  p-value
             HighTermMonthSort       4422.96  (6.3%)                  4272.29  (5.8%)     -3.4% ( -14% -    9%)  0.076
     BrowseDayOfYearTaxoFacets         38.19 (28.4%)                    37.06 (33.2%)     -3.0% ( -50% -   81%)  0.762
          BrowseDateTaxoFacets         37.85 (28.4%)                    36.78 (33.4%)     -2.8% ( -50% -   82%)  0.773
   BrowseRandomLabelSSDVFacets         22.71  (9.6%)                    22.27  (7.9%)     -1.9% ( -17% -   17%)  0.484
                  OrNotHighLow       2643.42  (4.9%)                  2596.34  (5.3%)     -1.8% ( -11% -    8%)  0.271
     BrowseDayOfYearSSDVFacets         27.46  (9.6%)                    27.22  (4.4%)     -0.9% ( -13% -   14%)  0.706
                        IntNRQ         81.34  (9.3%)                    80.63 (10.3%)     -0.9% ( -18% -   20%)  0.777
                 OrHighNotHigh        562.10  (3.2%)                   558.03  (3.4%)     -0.7% (  -7% -    6%)  0.488
              HighSloppyPhrase         16.47  (2.6%)                    16.36  (2.9%)     -0.7% (  -6% -    4%)  0.448
             HighTermTitleSort        138.98  (5.4%)                   138.26  (5.3%)     -0.5% ( -10% -   10%)  0.761
                  OrNotHighMed        594.27  (3.2%)                   591.52  (3.1%)     -0.5% (  -6% -    6%)  0.646
                     LowPhrase        104.35  (3.3%)                   103.94  (3.0%)     -0.4% (  -6% -    6%)  0.695
                    AndHighLow       1913.21  (2.8%)                  1906.77  (3.0%)     -0.3% (  -5% -    5%)  0.715
                    OrHighHigh         80.80  (2.1%)                    80.58  (3.1%)     -0.3% (  -5% -    5%)  0.748
       AndHighMedDayTaxoFacets        164.44  (2.2%)                   164.00  (1.8%)     -0.3% (  -4% -    3%)  0.672
                      PKLookup        361.45  (2.7%)                   360.51  (2.5%)     -0.3% (  -5% -    5%)  0.749
                    AndHighMed        164.72  (1.3%)                   164.31  (2.6%)     -0.2% (  -4% -    3%)  0.703
                   AndHighHigh        115.25  (1.4%)                   115.04  (2.7%)     -0.2% (  -4% -    3%)  0.787
               MedSloppyPhrase         53.74  (2.0%)                    53.66  (1.9%)     -0.1% (  -3% -    3%)  0.812
                     MedPhrase        219.79  (2.3%)                   219.48  (3.0%)     -0.1% (  -5% -    5%)  0.868
                  OrHighNotLow        783.90  (5.0%)                   783.58  (4.8%)     -0.0% (  -9% -   10%)  0.978
                        Fuzzy2         34.75  (2.1%)                    34.76  (1.6%)      0.0% (  -3% -    3%)  0.960
                    HighPhrase        176.79  (3.3%)                   176.90  (4.1%)      0.1% (  -7% -    7%)  0.956
                       Respell        146.62  (2.6%)                   146.83  (2.2%)      0.1% (  -4% -    5%)  0.846
               LowSloppyPhrase        144.71  (1.6%)                   144.93  (1.6%)      0.1% (  -2% -    3%)  0.766
                 OrNotHighHigh        673.32  (3.7%)                   674.49  (2.8%)      0.2% (  -6% -    6%)  0.868
                     OrHighMed        329.23  (2.7%)                   330.13  (2.9%)      0.3% (  -5% -    6%)  0.758
                     OrHighLow        766.63  (2.8%)                   768.92  (3.7%)      0.3% (  -6% -    7%)  0.775
          MedTermDayTaxoFacets        106.35  (2.2%)                   106.74  (3.0%)      0.4% (  -4% -    5%)  0.660
        OrHighMedDayTaxoFacets         24.88  (4.9%)                    24.97  (5.7%)      0.4% (  -9% -   11%)  0.825
                        Fuzzy1        166.68  (2.3%)                   167.39  (2.1%)      0.4% (  -3% -    4%)  0.540
                   MedSpanNear         15.57  (1.6%)                    15.64  (1.6%)      0.4% (  -2% -    3%)  0.374
                  OrHighNotMed        723.31  (3.9%)                   726.73  (4.2%)      0.5% (  -7% -    8%)  0.715
                  HighSpanNear         11.64  (2.0%)                    11.70  (1.4%)      0.5% (  -2% -    3%)  0.366
           LowIntervalsOrdered         38.55  (2.3%)                    38.75  (3.6%)      0.5% (  -5% -    6%)  0.598
                      Wildcard        358.09  (4.1%)                   359.94  (2.7%)      0.5% (  -6% -    7%)  0.637
                       Prefix3        558.81  (2.2%)                   562.08  (2.2%)      0.6% (  -3% -    5%)  0.397
                       LowTerm       1284.45  (4.0%)                  1292.67  (4.8%)      0.6% (  -7% -    9%)  0.648
          HighTermTitleBDVSort         15.89  (6.3%)                    15.99  (5.3%)      0.7% ( -10% -   13%)  0.716
                   LowSpanNear        131.68  (1.8%)                   132.65  (2.1%)      0.7% (  -3% -    4%)  0.234
           MedIntervalsOrdered        147.08  (5.7%)                   148.26  (7.1%)      0.8% ( -11% -   14%)  0

Re: [PR] Make LRUQueryCache respect Accountable queries on eviction and consisten… [lucene]

2024-07-10 Thread via GitHub


jaebongim commented on PR #12614:
URL: https://github.com/apache/lucene/pull/12614#issuecomment-013014

   @gtroitskiy @romseygeek 
   Is this bug fixed in Elasticsearch 8.12? 
   

