Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-29 Thread via GitHub


gf2121 commented on PR #12699:
URL: https://github.com/apache/lucene/pull/12699#issuecomment-1831412787

   Hi @mikemccand @jpountz !
   
   The [newest nightly 
benchmark](https://home.apache.org/~mikemccand/lucenebench/2023.11.28.18.03.59.html)
 shows we get back some speed for 
[PKLookUp](https://home.apache.org/~mikemccand/lucenebench/PKLookup.html), but 
still not as good as the performance before the merge of  
https://github.com/apache/lucene/pull/12631.
   
   This result is not quite same as what i saw locally, but reasonable for me 
as we do more `Outputs#read` anyway. 
   
   Since we are [preparing for the release of 
9.9.0](https://lists.apache.org/thread/6bxf0fo1ypblxn2hbdfyg38y3hc8lo9g). I'm 
not sure if it is an acceptable trade-off now. Should we backport this or 
revert https://github.com/apache/lucene/pull/12631?  I'd appreciate your 
suggestions :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Multi-value Support for KnnVectorField [lucene]

2023-11-29 Thread via GitHub


alessandrobenedetti commented on issue #12313:
URL: https://github.com/apache/lucene/issues/12313#issuecomment-1831534483

   Hi @david-sitsky, the multi-valued vectors in Lucene's contribution is now 
paused for lack of fundings.
   I'll resume it from my side if and when I get some sponsors :)
   
   The nested documents approach on the other hand has been released with 
Lucene 9.8! You read the various considerations that apply in the thread!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty commented on issue #12180:
URL: https://github.com/apache/lucene/issues/12180#issuecomment-1831540117

   Hi,  
   
   I'd like to ask a clarifying question as part of the _9.9.0_ release manager 
duties, since this currently-unresolved issue is associated with _Milestone 
9.9.0_.
   
   It's clear that progress has been made on this issue, with the addition of 
the new `TaxonomyReader::getBulkOrdinals` method. Is more work planned, or 
should this issue be closed?  It's fine either way, I just want to ensure that 
we are clear about the targeted release, if any.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] configuration items of the alg file are adapted to the 9.0 branch [LUCENE-10100] [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty commented on issue #11138:
URL: https://github.com/apache/lucene/issues/11138#issuecomment-1831542918

   Hi,
   
   I'd like to ask a clarifying question as part of the _9.9.0_ release manager 
duties, since this currently-unresolved issue is associated with Milestone 
9.9.0.
   
   Is more work planned, or should this issue be closed? It's fine either way, 
I just want to ensure that we are clear about the targeted release, if any.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs [lucene]

2023-11-29 Thread via GitHub


Shradha26 commented on issue #12558:
URL: https://github.com/apache/lucene/issues/12558#issuecomment-1831628034

   Hey @gsmiller, thanks for trying this out again - were you able to reproduce 
this in 9_8 again? I'd found the bug in 9_7 and I don't think it was an issue 
in 9_8.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] configuration items of the alg file are adapted to the 9.0 branch [LUCENE-10100] [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on issue #11138:
URL: https://github.com/apache/lucene/issues/11138#issuecomment-1831658096

   Ahh thanks @ChrisHegarty -- no more work for this issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] configuration items of the alg file are adapted to the 9.0 branch [LUCENE-10100] [lucene]

2023-11-29 Thread via GitHub


mikemccand closed issue #11138: configuration items of the alg file are adapted 
to the 9.0 branch [LUCENE-10100]
URL: https://github.com/apache/lucene/issues/11138


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty opened a new pull request, #12850:
URL: https://github.com/apache/lucene/pull/12850

   This commit fixes the intermittently failing TestSortedSetFieldSource.
   
   The test assertions depend on doc order which may be affected by merging. 
The fix is to trivially avoid merging for the very small index, with just two 
docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on issue #12180:
URL: https://github.com/apache/lucene/issues/12180#issuecomment-1831660442

   Thanks @ChrisHegarty -- this one is done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty commented on PR #12850:
URL: https://github.com/apache/lucene/pull/12850#issuecomment-1831661212

   The test previously has been seen to fail with:
   
   ```
   org.junit.ComparisonFailure: expected: but was:
 at __randomizedtesting.SeedInfo.seed([45D4C56DF2092811:7D67E193D5FAFCC0]:0)
 at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:117)
 at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:146)
 at 
org.apache.lucene.queries.function.TestSortedSetFieldSource.testSimple(TestSortedSetFieldSource.java:59)
 ...
   ```
   
   After the change the test has not failed in several hundreds of thousands of 
runs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Add Facets#getSpecificValues (bulk) and bulk path -> ordinal lookup for taxonomy faceting [lucene]

2023-11-29 Thread via GitHub


mikemccand closed issue #12180: Add Facets#getSpecificValues (bulk) and bulk 
path -> ordinal lookup for taxonomy faceting
URL: https://github.com/apache/lucene/issues/12180


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on PR #12699:
URL: https://github.com/apache/lucene/pull/12699#issuecomment-1831665131

   I don't think we should revert #12631 -- that one was a nice disk space 
improvement, and the `PKLookup` performance is quite noisy -- going up or down 
by just as much when we upgrade JDKs or switch to Panama MMap impl.  Thanks 
@gf2121.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on PR #12847:
URL: https://github.com/apache/lucene/pull/12847#issuecomment-1831667475

   > The test failed with just `Error: The operation was canceled.` but I can't 
tell why it happened. The same PR in my local branch works: 
[dungba88#20](https://github.com/dungba88/lucene/pull/20)
   
   Weird.  I restarted the failed job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] [9.x] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty opened a new pull request, #12851:
URL: https://github.com/apache/lucene/pull/12851

   Backport of:
* Fix intermittently failing TestSortedSetFieldSource #12850
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1408286029


##
lucene/CHANGES.txt:
##
@@ -176,7 +178,8 @@ API Changes
 * GITHUB#12799: Make TaskExecutor constructor public and use TaskExecutor for 
concurrent
   HNSW graph build. (Shubham Chaudhary)
 
-*
+* GITHUB#12758, GITHUB#12803: Remove FST constructor with DataInput for 
metadata. Please

Review Comment:
   Ahh are you adding the missing 9.9 CHANGES.txt entries for recent FST 
changes here?  Great!



##
lucene/CHANGES.txt:
##
@@ -246,6 +249,8 @@ Improvements
   minimal FST.  Inspired by this Rust FST implemention:
   https://blog.burntsushi.net/transducers (Mike McCandless)
 
+* GITHUB#12738: NodeHash now stores the FST nodes data instead of just node 
addresses (Anh Dung Bui)

Review Comment:
   And here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-29 Thread via GitHub


gf2121 commented on PR #12699:
URL: https://github.com/apache/lucene/pull/12699#issuecomment-1831673998

   Thanks @mikemccand for feedback! I'll backport this to 9.9.0 then.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1409117086


##
lucene/CHANGES.txt:
##
@@ -176,7 +178,8 @@ API Changes
 * GITHUB#12799: Make TaskExecutor constructor public and use TaskExecutor for 
concurrent
   HNSW graph build. (Shubham Chaudhary)
 
-*
+* GITHUB#12758, GITHUB#12803: Remove FST constructor with DataInput for 
metadata. Please

Review Comment:
   Yeah, I'm wondering if I should split it in 2 PR, or this change itself can 
be backported to 9.9



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1409134920


##
lucene/CHANGES.txt:
##
@@ -176,7 +178,8 @@ API Changes
 * GITHUB#12799: Make TaskExecutor constructor public and use TaskExecutor for 
concurrent
   HNSW graph build. (Shubham Chaudhary)
 
-*
+* GITHUB#12758, GITHUB#12803: Remove FST constructor with DataInput for 
metadata. Please

Review Comment:
   No worries -- I can merge in this PR and then backport everything -- the 
`fstSizeInBytes` is important for 9.9.0 as well.
   
   Thanks for fixing the CHANGES!  Please check if we missed anything else if 
you can!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


mikemccand merged PR #12847:
URL: https://github.com/apache/lucene/pull/12847


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on PR #12850:
URL: https://github.com/apache/lucene/pull/12850#issuecomment-1831732238

   > The test assertions depend on doc order which may be affected by merging.
   
   Hmm, I don't fully understand the test, but did the sort order truly rely on 
`docid` tie-breaking?  If so then this is the right fix.
   
   I just want to make sure we are not hiding a bug that e.g. causes this 
sorter to incorrectly sort / compare documents across segments that were not in 
fact ties by the sort criteria.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Report the time it took for building the FST in Test2BFST [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on code in PR #12847:
URL: https://github.com/apache/lucene/pull/12847#discussion_r1409158917


##
lucene/CHANGES.txt:
##
@@ -176,7 +178,8 @@ API Changes
 * GITHUB#12799: Make TaskExecutor constructor public and use TaskExecutor for 
concurrent
   HNSW graph build. (Shubham Chaudhary)
 
-*
+* GITHUB#12758, GITHUB#12803: Remove FST constructor with DataInput for 
metadata. Please

Review Comment:
   Thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Do we suboptimally call ByteBuffersDataOutput.toDataInput()? [lucene]

2023-11-29 Thread via GitHub


dungba88 opened a new issue, #12852:
URL: https://github.com/apache/lucene/issues/12852

   ### Description
   
   See this comment: 
https://github.com/apache/lucene/pull/12624#issuecomment-1829633268
   
   > I wonder: are there other places in Lucene that might fall prey to this 
performance trap (calling toDataInput frequently while continuing to append 
bytes to the DataOutput)? FreqProxTermsWriter seems to use this for each term 
to re-read the postings (maybe for the static sort use case).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1409177467


##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
 return copy;
   }
 
+  /**
+   * Returns an array whose size is at least {@code minLength}, generally 
over-allocating
+   * exponentially, but never allocating more than {@code maxLength} elements.
+   */
+  public static int[] growInRange(int[] array, int minLength, int maxLength) {
+if (array.length >= minLength) {
+  return array;
+}
+if (minLength > maxLength) {
+  throw new IllegalArgumentException(
+  "requested minimum array length "
+  + minLength
+  + " is larger than requested maximum array length "
+  + maxLength);
+}
+
+int potentialLength = oversize(minLength, Integer.BYTES);
+if (potentialLength > maxLength) {

Review Comment:
   I think `growExact(array, Math.min(potentialLength, maxLength))` would be 
clearer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1409183001


##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
 return copy;
   }
 
+  /**
+   * Returns an array whose size is at least {@code minLength}, generally 
over-allocating
+   * exponentially, but never allocating more than {@code maxLength} elements.
+   */
+  public static int[] growInRange(int[] array, int minLength, int maxLength) {
+if (array.length >= minLength) {
+  return array;
+}
+if (minLength > maxLength) {
+  throw new IllegalArgumentException(
+  "requested minimum array length "
+  + minLength
+  + " is larger than requested maximum array length "
+  + maxLength);
+}
+
+int potentialLength = oversize(minLength, Integer.BYTES);
+if (potentialLength > maxLength) {
+  return growExact(array, maxLength);
+}
+return growExact(array, potentialLength);
+  }
+
   /**
* Returns an array whose size is at least {@code minSize}, generally 
over-allocating
* exponentially

Review Comment:
   Would it make sense if we delegate the existing `grow(int[], int)` to the 
new one to avoid having double code path? Something like `return grow(array, 
minSize, Integer.MAX_VALUE)` would work I guess.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1409183001


##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
 return copy;
   }
 
+  /**
+   * Returns an array whose size is at least {@code minLength}, generally 
over-allocating
+   * exponentially, but never allocating more than {@code maxLength} elements.
+   */
+  public static int[] growInRange(int[] array, int minLength, int maxLength) {
+if (array.length >= minLength) {
+  return array;
+}
+if (minLength > maxLength) {
+  throw new IllegalArgumentException(
+  "requested minimum array length "
+  + minLength
+  + " is larger than requested maximum array length "
+  + maxLength);
+}
+
+int potentialLength = oversize(minLength, Integer.BYTES);
+if (potentialLength > maxLength) {
+  return growExact(array, maxLength);
+}
+return growExact(array, potentialLength);
+  }
+
   /**
* Returns an array whose size is at least {@code minSize}, generally 
over-allocating
* exponentially

Review Comment:
   Would it make sense if we delegate the existing `grow(int[], int)` to the 
new one to avoid having double code path? Something like `return 
growInRange(array, minSize, Integer.MAX_VALUE)` would work I guess.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty commented on PR #12850:
URL: https://github.com/apache/lucene/pull/12850#issuecomment-1831806426

   My understanding of this part of the test is that it asserts the sort order 
of the multi-valued string values in the `value` field. The changes here do not 
affect this assertion, the sort order is still checked.
   
   There are just two docs in the index. The first doc has a single value for 
the `value` field, `baz`. The second doc has two values for the `value` field, 
`foo` and `bar`.  The test expects to find `baz` and `bar` (since `bar` is less 
than `foo`), for the `values` field in doc1 and doc2 respectively.  
   
The test assertions are based on doc order in the index.  Does this address 
your concern? Or have I missed the point?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


benwtrent commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1831865170

   I ran knnPerfTest in Lucene util against this PR. 100k cohere vectors. 
Flushing was depending on memory usage (Do we check the ram usage of 
onHeapgraph during indexing to determine when to flush?)
   
   main branch:
   ```
   recall   latency nDocfanout  maxConn beamWidth index
   0.912 0.77   10  0   16  100   48602
   ```
   
   This branch:
   ```
   recall   latency nDocfanout  maxConn beamWidth index ms
   0.912 0.75   10  0   16  100   165929
   ```
   
   Indexing is about 3.5x slower with this change. I don't know if its due to 
the OnHeapGraph memory estimation being slow or the node resizing. I am gonna 
run a profiler to see whats up.
   
   Good news is that search latency and recall are unchanged. Forced merging 
time seems about the same as well :).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


benwtrent commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1831889329

   Yeah, `ramBytesUsed` changes here are adding significant overhead to 
indexing. Here is a cpu profile.
   
   
[pr12844-768-10-wall.jfr.zip](https://github.com/apache/lucene/files/13500964/pr12844-768-10-wall.jfr.zip)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-29 Thread via GitHub


dungba88 commented on PR #12624:
URL: https://github.com/apache/lucene/pull/12624#issuecomment-1832050665

   I re-ran the Test2BFST with the new change, it looks much better
   
   ```
 1> TEST: now verify [fst size=4621076364; nodeCount=2252341486; 
arcCount=2264078585]
 1> 0...: took 0 seconds
 1> 100...: took 27 seconds
 1> 200...: took 54 seconds
 1> 300...: took 82 seconds
 1> 400...: took 109 seconds
 1> 500...: took 137 seconds
 1> 600...: took 165 seconds
 1> 700...: took 192 seconds
 1> 800...: took 219 seconds
 1> 900...: took 247 seconds
 1> 1000...: took 275 seconds
 1> 1100...: took 302 seconds
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-11-29 Thread via GitHub


jpountz commented on PR #12699:
URL: https://github.com/apache/lucene/pull/12699#issuecomment-1832082403

   +1 to not reverting #12631. The fact that our terms index used to not share 
prefixes properly was a bug, I'm glad it's fixed even though it may make 
performance slightly slower because more nodes have non-empty outputs to 
contribute.
   
   FWIW it looks like your change did not only help PKLookup, but possibly also 
[`CountTerm`](https://home.apache.org/~mikemccand/lucenebench/CountTerm.html), 
[`Respell`](https://home.apache.org/~mikemccand/lucenebench/Respell.html), 
[`Fuzzy1`](https://home.apache.org/~mikemccand/lucenebench/Fuzzy1.html) and 
[`Fuzzy2`](https://home.apache.org/~mikemccand/lucenebench/Fuzzy2.html).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

2023-11-29 Thread via GitHub


jpountz commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1832112426

   > Sorry, I am still confused about this. If we clear an unEmpty treeset, how 
to deal with its competitive entries?
   
   This looks similar to the arraycopy that you perform on the `maxFreqs` 
array, ignoring existing values? We would just need to update the contract of 
this method to say something like `Copy the provided accumulator into this 
accumulator, ignoring its existing state. This is effectively the same as 
calling clear() then addAll().`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


stefanvodita commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1832333168

   Thank you for running the benchmarks @benwtrent, you were quicker than I 
was. I was worried the modified `ramBytesUsed` would be slow. Maybe we can come 
up with a smarter way to do the estimation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


stefanvodita commented on code in PR #12844:
URL: https://github.com/apache/lucene/pull/12844#discussion_r1409603364


##
lucene/core/src/java/org/apache/lucene/util/ArrayUtil.java:
##
@@ -330,6 +330,29 @@ public static int[] growExact(int[] array, int newLength) {
 return copy;
   }
 
+  /**
+   * Returns an array whose size is at least {@code minLength}, generally 
over-allocating
+   * exponentially, but never allocating more than {@code maxLength} elements.
+   */
+  public static int[] growInRange(int[] array, int minLength, int maxLength) {
+if (array.length >= minLength) {
+  return array;
+}
+if (minLength > maxLength) {
+  throw new IllegalArgumentException(
+  "requested minimum array length "
+  + minLength
+  + " is larger than requested maximum array length "
+  + maxLength);
+}

Review Comment:
   Initially, I had placed the validation first. I changed it to cover cases 
where INITIAL_CAPACITY >= minSize > maxSize. In these cases, we've already 
allocated beyond maxSize, so we might as well use that memory. Do you think 
that ends up being more confusing than validating first?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


ChrisHegarty commented on PR #12850:
URL: https://github.com/apache/lucene/pull/12850#issuecomment-1832391520

   > This new failure looks like it's due to the fact that 
MockRandomMergePolicy now randomly reverses the doc ID order on merge?
   
   That's it exactly. I confirmed this through local instrumentation of 
MockRandomMergePolicy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-11-29 Thread via GitHub


gsmiller opened a new pull request, #12853:
URL: https://github.com/apache/lucene/pull/12853

   In GH#12558, @Shradha26 discovered that faceting implementations are 
sometimes getting empty MatchingDoc lists, resulting in undesired behavior. We 
should consider it a bug for MatchingDocs to not be populated, even if there 
are no collected hits. After chasing the issue a bit, it appears that 
drill-sideways can miss calling `#finish` on the FacetsCollectors it manages if 
it end up not scoring any docs. This PR addresses it.
   
   I made a couple related changes in this PR to get at fixing the bug:
   1. Added `DrillSideways#createDrillSidewaysFacetsCollectorManager` as a 
protected method that users can extend to create specific facet collecting 
implementations. This seems reasonable to me as we already have a similar hook 
for the drill-down collector. I found this hook useful for testing. If we don't 
want to expose this API surface, I can adjust the testing approach, but I think 
it's reasonable to expose personally.
   2. Changed the `DrillSidewaysScorer` to accept `LeafCollectors` instead of 
`FacetCollectors`. This seems like a better division of responsibilities 
anyway, and is simpler, but it also lets `DrillSidewaysQuery` manage the 
creation of the leaf collectors when setting up the `BulkScorer`. This is 
necessary if `DrillSidewaysQuery` is going to call `#finish` in no-scoring 
cases since `#finish` relies on a leaf scorer having been created already from 
the collector.
   3. I also added a TODO to go remove 
`DrillSideways#createDrillDownFacetsCollector` since it is no longer used and 
is a trappy hook for users. I'll tackle this in a separate PR though since we 
need a deprecation back port and it's not really related to this (just 
something I came across).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Ensure #finish is called on all drill-sideways FacetCollectors even when no hits are scored [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on code in PR #12853:
URL: https://github.com/apache/lucene/pull/12853#discussion_r1409693152


##
lucene/CHANGES.txt:
##
@@ -112,6 +112,9 @@ Bug Fixes
 
 * GITHUB#12220: Hunspell: disallow hidden title-case entries from compound 
middle/end
 
+* GITHUB#12558: Ensure #finish is called on all drill-sideways 
FacetsCollectors even when no hits are scored.

Review Comment:
   Put this under 10.0 for now. After 9.9, if we add a section for 9.10 I will 
back port it there (no reason not to, we just don't have the section setup). 
Also, if 9.9 re-spins I will back port to 9.9, but we shouldn't block that 
release for this fix IMO.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IntTaxonomyFacets chooses dense values array when FacetsCollector has no MatchingDocs [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on issue #12558:
URL: https://github.com/apache/lucene/issues/12558#issuecomment-1832460996

   I opened a PR here that fixes the root cause of this bug: 
https://github.com/apache/lucene/pull/12853


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] IntTaxonomyFacets test case [lucene]

2023-11-29 Thread via GitHub


gsmiller closed pull request #12563: IntTaxonomyFacets test case
URL: https://github.com/apache/lucene/pull/12563


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] IntTaxonomyFacets test case [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on PR #12563:
URL: https://github.com/apache/lucene/pull/12563#issuecomment-1832462625

   Closing in favor of #12853 that fixes the underlying root cause and adds 
testing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


zhaih commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1832571805

   @benwtrent Thanks for running the benchmark. I looked at the profile and I 
think we call `ramBytesUsed` after every document is indexed to control the 
flush 
[here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/DocumentsWriterFlushControl.java#L206).
   
   @stefanvodita  I can think of two options here:
   1. just use maxSize as before for estimation, it's not too accurate but at 
least is a good upper bound
   2. Carefully account the extra memory used when we add new nodes to the 
graph and accumulate it by an `AtomicLong`, including the cost for the node 
itself as well as the cost of resizing the existing neighbor's NeighborArray. 
Which I think is doable (in both single thread and multi thread situation) but 
a bit tricky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Mark DrillSideways#createDrillDownFacetsCollector as @Deprecated [lucene]

2023-11-29 Thread via GitHub


gsmiller opened a new pull request, #12854:
URL: https://github.com/apache/lucene/pull/12854

   This extension hook is only referenced by the also-deprecated 
`DrillSideways#search(DrillDownQuery, Collector)`, so let's mark it deprecated 
as well and remove it on main.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Remove DrillSideways#createDrillDownFacetsCollector in favor of the manager-based hook [lucene]

2023-11-29 Thread via GitHub


gsmiller opened a new pull request, #12855:
URL: https://github.com/apache/lucene/pull/12855

   This extension hook is no longer used on the main branch, so let's remove 
it. Opened a corresponding PR to mark this deprecated on 9x (#12854)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] [Minor] Fix the only use of java.lang.String#toLowerCase() with no Locale [lucene]

2023-11-29 Thread via GitHub


slow-J opened a new pull request, #12856:
URL: https://github.com/apache/lucene/pull/12856

   I was looking at this old issue: 
https://github.com/apache/lucene/issues/4390 `Flexible Queryparser has some 
Locale violation with lowercasing`, which was already fixed with the work done 
in [GH#5271](https://github.com/apache/lucene/issues/5271) But I came across 
this single use of `.toLowerCase()` with no Locale in Checksum.java.
   
   TIL: forbidden apis doesn't run against the `package 
org.apache.lucene.gradle`.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduce growInRange to reduce array overallocation [lucene]

2023-11-29 Thread via GitHub


stefanvodita commented on PR #12844:
URL: https://github.com/apache/lucene/pull/12844#issuecomment-1832732598

   Thanks for the suggestions @zhaih! I have to think about option 2 a bit more.
   
   If we change `ramBytesUsed` back, performance recovers (Mostly? I'm not sure 
how noisy this benchmark is).
   
   Benchmark config:
   ```
   dim = 100
   doc_vectors = '%s/data/enwiki-20120502-lines-1k-100d.vec' % 
constants.BASE_DIR
   query_vectors = '%s/util/tasks/vector-task-minilm.vec' % constants.BASE_DIR
   ```
   
   main:
   ```
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 25511.00
post-filter
   0.5420.23   10  0   64  250 100 43620   1.00
post-filter
   0.5120.27   20  0   64  250 100 108235  1.00
post-filter
   ```
   
   This PR:
   ```
   recall  latency nDocfanout  maxConn beamWidth   visited index ms
   0.7200.16   1   0   64  250 100 50281.00
post-filter
   0.5420.23   10  0   64  250 100 315369  1.00
post-filter
   0.5120.28   20  0   64  250 100 1578461 1.00
post-filter
   ```
   
   This PR with the old memory estimation:
   ```
   recall   latency nDocfanout  maxConn beamWidth   visited index ms
   0.720 0.16   1   0   64  250 100 24971.00
post-filter
   0.542 0.23   10  0   64  250 100 43886   1.00
post-filter
   0.512 0.28   20  0   64  250 100 117152  1.00
post-filter
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Reproducible TestDrillSideways failure [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on issue #12418:
URL: https://github.com/apache/lucene/issues/12418#issuecomment-1832886094

   I think I know what's going on here, and I believe #12853 fixes it. In both 
seeds mentioned here, I've noticed that the drill-sideways query short-circuits 
on at least one segment, knowing it has no results. Specifically, a null 
BulkScorer gets returned from DrillSidewaysQuery. The problem with this is that 
the "sideways" facet collectors get left hanging, and nothing calls #finish on 
them.
   
   I reproduced with both seeds on 9x and main, and verified the above fix 
works in all cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Is it correct for facets to assume positive aggregation values? [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on issue #12585:
URL: https://github.com/apache/lucene/issues/12585#issuecomment-1832904037

   I think I'd be curious if there are many real-world use-cases out there that 
have a non-positive association between each document and facet label? I would 
guess it's pretty uncommon. I actually can't really contrive an example either. 
@stefanvodita have you encountered real-world use-cases that would benefit from 
this? Apologies if you describe it somewhere and I'm missing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Removing TermInSetQuery array ctor [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on PR #12837:
URL: https://github.com/apache/lucene/pull/12837#issuecomment-1832910639

   +1 to this simplification. Let's also get rid of `public 
TermInSetQuery(RewriteMethod rewriteMethod, String field, BytesRef... terms)`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Add a new static method for KeywordField#newSetQuery to support collections parameter [lucene]

2023-11-29 Thread via GitHub


gsmiller commented on issue #12243:
URL: https://github.com/apache/lucene/issues/12243#issuecomment-1832911784

   +1 to shrinking the API surface of the `TermInSetQuery` ctors and getting 
rid of the varargs flavors since they're pure syntactic sugar. Looks like 
#12837 is the related PR, so left a small comment over there. Thanks @slow-J !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix intermittently failing TestSortedSetFieldSource [lucene]

2023-11-29 Thread via GitHub


mikemccand commented on PR #12850:
URL: https://github.com/apache/lucene/pull/12850#issuecomment-1833004513

   Thanks for the detailed explanation @ChrisHegarty and @jpountz -- fix looks 
good!  Sneaky random merge policies...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Copy collected acc(maxFreqs) into empty acc, rather than merge them. [lucene]

2023-11-29 Thread via GitHub


vsop-479 commented on PR #12846:
URL: https://github.com/apache/lucene/pull/12846#issuecomment-1833216581

   @jpountz I applied copy for all level's empty acc, Please take a look when 
you get a chance.
   As so far I just performed copy on empty acc that cleared by ``clear``. Do 
you mean we could skip the ``clear`` (mainly ``Arrays.fill(maxFreqs, 0)``) 
after ``writeSkipData``, and override the stale acc in next writing directly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org