[PR] Move synonym map off-heap for SynonymGraphFilter [lucene]

2024-01-30 Thread via GitHub


msfroh opened a new pull request, #13054:
URL: https://github.com/apache/lucene/pull/13054

   ### Description
   This stores the synonym map's FST and word lookup off-heap in a separate, 
configurable directory. 
   
   The initial implementation is rough, but the unit tests pass with this 
change randomly enabled.
   
   Obvious things that need work are:
   1. I tried to do something like a codec, but not really a codec for the 
synonym map files. For a solution that could evolve over time, we should 
probably at least write something to the metadata file saying what format was 
used.
   2. Right now it makes no effort to detect changes to the synonym files. I 
would suggest that SynonymGraphFilterFactory rebuild the directory if a 
checksum of the input files doesn't match a value recorded in the metadata file.
   3. I don't think I like the random seeks in `OffHeapBytesRefHashLike`, but I 
don't see an alternative (besides moving it on-heap). Given that the original 
issue was only about moving the FST off-heap, maybe we can keep the word 
dictionary on-heap.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] `SynonymGraphFilter` should read FSTs off-heap? [lucene]

2024-01-30 Thread via GitHub


msfroh commented on issue #13005:
URL: https://github.com/apache/lucene/issues/13005#issuecomment-1916284724

   I have a (rough) PR to address this: 
https://github.com/apache/lucene/pull/13054.
   
   I also moved the output word lookup off-heap, but it requires a random seek 
(within a hopefully MMapped file) before every lookup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on PR #13052:
URL: https://github.com/apache/lucene/pull/13052#issuecomment-1916306380

   I've added a comment explaining the `forEach`.
   
   [Opening a bug](https://bugs.openjdk.org/) in OpenJDK requires "OpenJDK 
Author" status which I do not have.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on code in PR #13039:
URL: https://github.com/apache/lucene/pull/13039#discussion_r1470768641


##
lucene/core/src/java/org/apache/lucene/analysis/tokenattributes/PayloadAttributeImpl.java:
##
@@ -62,8 +62,7 @@ public boolean equals(Object other) {
   return true;
 }
 
-if (other instanceof PayloadAttribute) {
-  PayloadAttributeImpl o = (PayloadAttributeImpl) other;
+if (other instanceof PayloadAttributeImpl o) {

Review Comment:
   Thank you for linking to the pattern-matching PR.
   I've removed the pattern variable part of this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13052:
URL: https://github.com/apache/lucene/pull/13052#issuecomment-1916340120

   > I've added a comment explaining the `forEach`.
   > 
   > [Opening a bug](https://bugs.openjdk.org/) in OpenJDK requires "OpenJDK 
Author" status which I do not have.
   
   I can do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


jpountz commented on code in PR #13052:
URL: https://github.com/apache/lucene/pull/13052#discussion_r1470784089


##
lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java:
##
@@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges(
   // the resulting set contains all segments that are left over
   // and will be merged to one additional segment:
   for (final OneMerge om : spec.merges) {
-oldSegments.keySet().removeAll(om.segments);
+// om.segments.forEach(::remove) is used here instead of 
oldSegments.keySet().removeAll()
+// for performance reasons; om.segments size is always greater than 
oldSegments size, and

Review Comment:
   This sounds incorrect? `om.segments` is a subset of `oldSegments.keySet()` 
since the merge policy is only allowed to merge segments that are in 
`oldSegments.keySet()`. So the problem you are describing only occurs when 
`oldSegments.size() == om.segments.size()`, ie. the merge policy merges all 
mergeable segments in one go, which is probably a common case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub


sabi0 opened a new pull request, #13055:
URL: https://github.com/apache/lucene/pull/13055

   EnumSet.of() returns a mutable Set that should not be used for static final 
constants.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on PR #13051:
URL: https://github.com/apache/lucene/pull/13051#issuecomment-1916363126

   > static final constants should be unmodifiable sets
   > Could you open an issue about this?
   
   #13055


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on code in PR #13052:
URL: https://github.com/apache/lucene/pull/13052#discussion_r1470850110


##
lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java:
##
@@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges(
   // the resulting set contains all segments that are left over
   // and will be merged to one additional segment:
   for (final OneMerge om : spec.merges) {
-oldSegments.keySet().removeAll(om.segments);
+// om.segments.forEach(::remove) is used here instead of 
oldSegments.keySet().removeAll()
+// for performance reasons; om.segments size is always greater than 
oldSegments size, and

Review Comment:
   Fixed.
   Thank you for explaining this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on code in PR #13052:
URL: https://github.com/apache/lucene/pull/13052#discussion_r1470850110


##
lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java:
##
@@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges(
   // the resulting set contains all segments that are left over
   // and will be merged to one additional segment:
   for (final OneMerge om : spec.merges) {
-oldSegments.keySet().removeAll(om.segments);
+// om.segments.forEach(::remove) is used here instead of 
oldSegments.keySet().removeAll()
+// for performance reasons; om.segments size is always greater than 
oldSegments size, and

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


s1monw commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1916447395

   > We should PnP this!
   
   What on earth means PnP? Mike, check out this search: 
https://www.google.com/search?q=pnp+acronym   wikipedia FTW


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-30 Thread via GitHub


jfreden commented on PR #13036:
URL: https://github.com/apache/lucene/pull/13036#issuecomment-1916440041

   I added code to only apply the optimization `if 
count(term-with-less-docs)/count(term-with-more-docs) < 0.1` and it yielded a 
way better result. Will investigate the term cache idea too since there is 
still a slowdown of `CountHigHigh`.
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
CountOrHighHigh   43.48  (3.1%)   42.94  
(3.1%)   -1.2% (  -7% -5%) 0.201
CountAndHighMed  117.33  (1.6%)  116.68  
(2.2%)   -0.6% (  -4% -3%) 0.357
   CountAndHighHigh   13.40  (1.4%)   13.34  
(1.6%)   -0.5% (  -3% -2%) 0.305
 CountOrHighMed   44.02  (2.4%)   85.10  
(4.3%)   93.3% (  84% -  102%) 0.000
   ```
   ```
   TaskQPS baseline  StdDevQPS 
my_modified_version  StdDevPct diff p-value
   HighIntervalsOrdered2.45 (15.8%)2.37 
(16.8%)   -3.3% ( -30% -   34%) 0.526
   BrowseDateSSDVFacets1.77  (9.7%)1.71 
(13.2%)   -3.1% ( -23% -   21%) 0.394
 IntNRQ   26.64  (9.5%)   25.91 
(13.2%)   -2.8% ( -23% -   22%) 0.447
  HighTermMonthSort 4247.44  (5.4%) 4174.43  
(6.8%)   -1.7% ( -13% -   11%) 0.379
MedIntervalsOrdered4.40  (9.0%)4.34  
(9.3%)   -1.5% ( -18% -   18%) 0.597
   OrNotHighLow 1013.74  (2.8%) 1000.22  
(4.0%)   -1.3% (  -7% -5%) 0.223
CountOrHighHigh   43.48  (3.1%)   42.94  
(3.1%)   -1.2% (  -7% -5%) 0.201
 OrHighHigh   45.23  (7.6%)   44.71  
(7.1%)   -1.1% ( -14% -   14%) 0.624
LowIntervalsOrdered   75.80  (6.3%)   74.94  
(7.0%)   -1.1% ( -13% -   12%) 0.591
  OrHighLow  490.67  (4.2%)  485.46  
(3.7%)   -1.1% (  -8% -7%) 0.394
LowTerm  782.27  (3.7%)  774.48  
(3.3%)   -1.0% (  -7% -6%) 0.366
   OrNotHighMed  327.78  (2.4%)  325.10  
(2.6%)   -0.8% (  -5% -4%) 0.295
 OrHighMedDayTaxoFacets7.49  (4.0%)7.44  
(4.3%)   -0.7% (  -8% -7%) 0.574
 AndHighLow 1052.61  (4.4%) 1044.91  
(4.7%)   -0.7% (  -9% -8%) 0.610
AndHighMedDayTaxoFacets   31.62  (1.4%)   31.39  
(2.3%)   -0.7% (  -4% -3%) 0.245
MedSloppyPhrase4.55  (2.2%)4.52  
(3.4%)   -0.6% (  -6% -5%) 0.494
   HighSloppyPhrase7.84  (2.6%)7.80  
(3.2%)   -0.6% (  -6% -5%) 0.535
CountAndHighMed  117.33  (1.6%)  116.68  
(2.2%)   -0.6% (  -4% -3%) 0.357
  HighTermDayOfYearSort  365.06  (3.2%)  363.03  
(3.3%)   -0.6% (  -6% -6%) 0.590
   Wildcard   35.27  (2.2%)   35.08  
(2.6%)   -0.6% (  -5% -4%) 0.470
 AndHighMed  149.03  (5.1%)  148.24  
(4.7%)   -0.5% (  -9% -9%) 0.736
MedSpanNear4.61  (2.7%)4.58  
(3.3%)   -0.5% (  -6% -5%) 0.588
  MedPhrase   28.98  (4.8%)   28.83  
(5.1%)   -0.5% (  -9% -9%) 0.743
   CountAndHighHigh   13.40  (1.4%)   13.34  
(1.6%)   -0.5% (  -3% -2%) 0.305
 HighPhrase8.65  (4.5%)8.61  
(5.4%)   -0.5% (  -9% -9%) 0.765
LowSloppyPhrase   29.81  (2.4%)   29.70  
(2.8%)   -0.4% (  -5% -4%) 0.639
  OrNotHighHigh  294.82  (3.6%)  293.71  
(2.9%)   -0.4% (  -6% -6%) 0.719
MedTerm  719.95  (6.3%)  717.27  
(5.2%)   -0.4% ( -11% -   11%) 0.839
Respell   70.19  (2.4%)   69.94  
(2.1%)   -0.4% (  -4% -4%) 0.607
   PKLookup  288.07  (2.7%)  287.18  
(3.4%)   -0.3% (  -6% -5%) 0.748
Prefix3  411.30  (1.9%)  410.09  
(2.5%)   -0.3% (  -4% -4%) 0.675
  OrHighMed  226.84  (5.1%)  226.17  
(4.0%)   -0.3% (  -8% -9%) 0.842
  LowPhrase   48.24  (4.1%)   48.15  
(4.3%)   -0.2% (  -8% -8%) 0.889
  OrHighNotHigh  464.84  (3.1%)  464.02  
(3.5%)   -0.2% (  -6% -6%) 0.866
 

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


uschindler commented on code in PR #13052:
URL: https://github.com/apache/lucene/pull/13052#discussion_r1470981973


##
lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java:
##
@@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges(
   // the resulting set contains all segments that are left over
   // and will be merged to one additional segment:
   for (final OneMerge om : spec.merges) {
-oldSegments.keySet().removeAll(om.segments);
+// om.segments.forEach(::remove) is used here instead of 
oldSegments.keySet().removeAll()
+// for performance reasons; om.segments size is always greater than 
oldSegments size, and

Review Comment:
   New comment looks much better. To conclude, because I was also on wrong trip 
yesterday late evening:
   
   The code `oldSegments.keySet().removeAll(om.segments);` behaves like the 
following according to the docs:
   
   > This implementation determines which is the smaller of this set and the 
specified collection, by invoking the size method on each. If this set has 
fewer elements, then the implementation iterates over this set, checking each 
element returned by the iterator in turn to see if it is contained in the 
specified collection. If it is so contained, it is removed from this set with 
the iterator's remove method. If the specified collection has fewer elements, 
then the implementation iterates over the specified collection, removing from 
this set each element returned by the iterator, using this set's remove method.
   
   Translated to the example: If `oldSegments.keySet()` set has fewer elements, 
then the implementation iterates over `oldSegments.keySet()` set, checking each 
element returned by the iterator in turn to see if it is contained in 
`om.segments`. If it is so contained, it is removed from this set with the 
iterator's remove method. **If `om.segments` has fewer elements, then the 
implementation iterates over `om.segments`, removing from this set each element 
returned by the iterator, using this set's remove method.**
   
   So we would like to see the second (bold) part to be executed.
   
   Unfortunetely the spec only talks about "fewer", so we have no idea what 
happens on the boundary! Let's check here: 
https://github.com/openjdk/jdk/blob/fd8adf308357355bd33916ad80e2328c35434e5a/src/java.base/share/classes/java/util/AbstractSet.java#L166-L182
   
   On the boundary, unfortunately it executes the slow code (contains in list, 
lower in the above code part).
   
   So the problem only exists if both collections have equal size. Nevertheless 
I will open a bug report. It should only prefer the shorter collection if both 
are sets.
   
   Changing this would be fine in later JDK releases, because the spec of `Set` 
does not specify how the method should behave. The code has more problems, 
because the symmetry of `contains()` for both sides is not always guaranteed 
(e.g. for TreeSets with case insensitive order).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

2024-01-30 Thread via GitHub


uschindler merged PR #13052:
URL: https://github.com/apache/lucene/pull/13052


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13055:
URL: https://github.com/apache/lucene/pull/13055#issuecomment-1916562601

   Please add a changes.txt, as this is theoretically a backwards incompatible 
change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub


uschindler merged PR #13039:
URL: https://github.com/apache/lucene/pull/13039


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13039:
URL: https://github.com/apache/lucene/pull/13039#issuecomment-1916576302

   I added the changes entry to the existing one for you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Align instanceof check with type cast [lucene]

2024-01-30 Thread via GitHub


sabi0 commented on PR #13039:
URL: https://github.com/apache/lucene/pull/13039#issuecomment-1916580962

   Thank you. I will make sure the new PRs have them out of the box.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub


uschindler merged PR #13055:
URL: https://github.com/apache/lucene/pull/13055


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Make static final Set immutable [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13055:
URL: https://github.com/apache/lucene/pull/13055#issuecomment-1916591556

   I backported it. Like the previous PR this caused a conflict due to 
different stop tags, but this was easy to solve.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1471018363


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java:
##
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.Name;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.SegmentReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.Version;
+import org.junit.After;
+import org.junit.Before;
+
+public abstract class BackwardsCompatibilityTestBase extends LuceneTestCase {
+
+  protected final Version version;
+  private static final Version LATEST_PREVIOUS_MAJOR = 
getLatestPreviousMajorVersion();
+  protected final String indexPattern;
+  protected static final Set BINARY_SUPPORTED_VERSIONS;
+
+  static {
+String[] oldVersions =
+new String[] {
+  "8.0.0", "8.0.0", "8.1.0", "8.1.0", "8.1.1", "8.1.1", "8.2.0", 
"8.2.0", "8.3.0", "8.3.0",
+  "8.3.1", "8.3.1", "8.4.0", "8.4.0", "8.4.1", "8.4.1", "8.5.0", 
"8.5.0", "8.5.1", "8.5.1",
+  "8.5.2", "8.5.2", "8.6.0", "8.6.0", "8.6.1", "8.6.1", "8.6.2", 
"8.6.2", "8.6.3", "8.6.3",
+  "8.7.0", "8.7.0", "8.8.0", "8.8.0", "8.8.1", "8.8.1", "8.8.2", 
"8.8.2", "8.9.0", "8.9.0",
+  "8.10.0", "8.10.0", "8.10.1", "8.10.1", "8.11.0", "8.11.0", 
"8.11.1", "8.11.1", "8.11.2",
+  "8.11.2", "9.0.0", "9.1.0", "9.2.0", "9.3.0", "9.4.0", "9.4.1", 
"9.4.2", "9.5.0", "9.6.0",
+  "9.7.0", "9.8.0", "9.9.0", "9.9.1", "9.9.2", "9.10.0", "10.0.0",
+};
+
+Set binaryVersions = new HashSet<>();
+for (String version : oldVersions) {
+  try {
+Version v = Version.parse(version);
+assertTrue("Unsupported binary version: " + v, v.major >= 
Version.MIN_SUPPORTED_MAJOR - 1);
+binaryVersions.add(v);
+  } catch (ParseException ex) {
+throw new RuntimeException(ex);
+  }
+}
+List allCurrentVersions = getAllCurrentVersions();
+for (Version version : allCurrentVersions) {
+  // make sure we never miss a version.
+  assertTrue("Version: " + version + " missing", 
binaryVersions.remove(version));
+}
+BINARY_SUPPORTED_VERSIONS = binaryVersions;
+  }
+
+  /**
+   * This is a base constructor for parameterized BWC tests. The constructor 
arguments are provided
+   * by {@link com.carrotsearch.randomizedtesting.RandomizedRunner} during 
test execution. A {@link
+   * com.carrotsearch.randomizedtesting.annotations.ParametersFactory} 
specified in a subclass
+   * provides a list lists of arguments for the tests and RandomizedRunner 
will execute the test for
+   * each of the argument list.
+   *
+   * @param version the version this test should run for
+   * @param indexPattern an index pattern in order to open an index of see 
{@link
+   * #createPattern(String, String)}
+   */
+  protected BackwardsCompatibilityTestBase(
+  @Name("version") Version version, @Name("pattern") String indexPattern) {
+this.version = version;
+this.indexPattern = indexPattern;
+  }
+
+  Directory directory;
+
+  @Override
+  @Before
+  public void setUp() throws Exception {
+super.setUp();
+assertNull(
+"Index name " + version + " should not exist found",
+TestAncientIndicesCompatibility.class.getResourceAsStream(
+indexName(LATEST_PREVIOUS_MAJOR)))

Re: [PR] Update int array growth calls [lucene]

2024-01-30 Thread via GitHub


stefanvodita commented on code in PR #12947:
URL: https://github.com/apache/lucene/pull/12947#discussion_r1471158692


##
lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/TaskSequence.java:
##


Review Comment:
   My reasoning must have been way off the day I wrote this. Thank you for 
noticing the mistake! I've reverted the changes to this file.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Reproducible failure in TestGeo3DPoint.testRandomBig [lucene]

2024-01-30 Thread via GitHub


easyice opened a new issue, #13056:
URL: https://github.com/apache/lucene/issues/13056

   ### Description
   
   ```
   
   org.apache.lucene.spatial3d.TestGeo3DPoint > testRandomBig FAILED
   java.lang.AssertionError: FAIL: id=23785 should not have matched but did
 shape=GeoStandardPath: {planetmodel=PlanetModel.WGS84, 
width=2.5638182999069305E-5(0.0014689596802307306), 
points={[[lat=2.4457272005608357E-47, lon=0.0([X=1.0011188539901221, Y=0.0, 
Z=2.4484636121979335E-47])], [lat=-1.0718146506637272, 
lon=0.017453291479645996([X=0.47775442364187093, Y=0.008339233987253964, 
Z=-0.876439379159])], [lat=1.2798499685240037, 
lon=-0.4737449978411123([X=0.2547649811064312, Y=-0.13061449414092574, 
Z=0.9560925461700265])]]}}
 bounds=XYZBounds: [xmin=0.25474324433710577 xmax=1.0011188549901222 
ymin=-0.1802138209379591 ymax=0.008364853863677697 zmin=-0.8767899152769958 
zmax=0.9561195469506941]
 world bounds=( minX=-1.0011188539901221 maxX=1.0011188539901221 
minY=-1.0011188539901221 maxY=1.0011188539901221 minZ=-0.997762292019756 
maxZ=0.997762292019756
 quantized point=[X=1.001118853604479, Y=-1.9836761499747446E-5, 
Z=2.3309121299774915E-10] within shape? false within bounds? true
 unquantized point=[lat=2.4457272005608357E-47, 
lon=-1.9814467377547217E-5([X=1.001118853793596, Y=-1.983663687213671E-5, 
Z=2.4484636121979335E-47])] within shape? true within bounds? true
 docID=23785 deleted?=false
 query=PointInGeo3DShapeQuery: field=point: Shape: GeoStandardPath: 
{planetmodel=PlanetModel.WGS84, 
width=2.5638182999069305E-5(0.0014689596802307306), 
points={[[lat=2.4457272005608357E-47, lon=0.0([X=1.0011188539901221, Y=0.0, 
Z=2.4484636121979335E-47])], [lat=-1.0718146506637272, 
lon=0.017453291479645996([X=0.47775442364187093, Y=0.008339233987253964, 
Z=-0.876439379159])], [lat=1.2798499685240037, 
lon=-0.4737449978411123([X=0.2547649811064312, Y=-0.13061449414092574, 
Z=0.9560925461700265])]]}}
 explanation:
   target is in leaf _0(10.0.0):C145628/1423:[diagnostics={os=Mac OS X, 
java.vendor=Oracle Corporation, java.runtime.version=17.0.7+8-LTS-224, 
timestamp=1706584548568, source=flush, lucene.version=10.0.0, os.version=13.6, 
os.arch=x86_64}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]:delGen=1
 :id=df3rpl1vhr0vzenewjvq2zj8r of full reader 
StandardDirectoryReader(segments:3:nrt 
_0(10.0.0):C145628/1423:[diagnostics={os=Mac OS X, java.vendor=Oracle 
Corporation, java.runtime.version=17.0.7+8-LTS-224, timestamp=1706584548568, 
source=flush, lucene.version=10.0.0, os.version=13.6, 
os.arch=x86_64}]:[attributes={Lucene90StoredFieldsFormat.mode=BEST_SPEED}]:delGen=1
 :id=df3rpl1vhr0vzenewjvq2zj8r)
   full BKD path to target doc:
 Cell(x=-1.0011188539901221 TO 1.0011188539901221 
y=-1.0011188539901221 TO 1.0011188539901221 z=-0.9977622923536127 TO 
0.9977622923536126); Shape relationship = OVERLAPS; Quantized point within cell 
= true; Unquantized point within cell = true
 Cell(x=0.047844370442552187 TO 1.0011188539901221 
y=-1.0011188539901221 TO 1.0011188539901221 z=-0.9977622923536127 TO 
0.9977622923536126); Shape relationship = OVERLAPS; Quantized point within cell 
= true; Unquantized point within cell = true
 Cell(x=0.047844370442552187 TO 1.0011188539901221 
y=-1.0011188539901221 TO 4.661824259954982E-10 z=-0.9977622923536127 TO 
0.9977622923536126); Shape relationship = OVERLAPS; Quantized point within cell 
= true; Unquantized point within cell = true
 Cell(x=0.047844370442552187 TO 1.0011188539901221 
y=-1.0011188539901221 TO 4.661824259954982E-10 z=-0.9977622923536127 TO 
4.661824259954982E-10); Shape relationship = OVERLAPS; Quantized point within 
cell = true; Unquantized point within cell = true
 Cell(x=0.047844370442552187 TO 1.0011188539901221 
y=-0.08227179995049738 TO 4.661824259954982E-10 z=-0.9977622923536127 TO 
4.661824259954982E-10); Shape relationship = OVERLAPS; Quantized point within 
cell = true; Unquantized point within cell = true
 Cell(x=0.047844370442552187 TO 1.0011188539901221 
y=-0.08227179995049738 TO 4.661824259954982E-10 z=-0.5035355968293294 TO 
4.661824259954982E-10); Shape relationship = OVERLAPS; Quantized point within 
cell = true; Unquantized point within cell = true
 Cell(x=1.0011187391566934 TO 1.0011188539901221 
y=-0.08227179995049738 TO 4.661824259954982E-10 z=-0.5035355968293294 TO 
4.661824259954982E-10); Shape relationship = OVERLAPS; Quantized point within 
cell = true; Unquantized point within cell = true
 Cell(x=1.0011187391566934 TO 1.0011188539901221 
y=-0.08227179995049738 TO 4.661824259954982E-10 z=-0.5035355968293294 TO 
4.661824259954982E-10); Shape relationship = OVERLAPS; Quantized point within 
cell = true; Unquantized point within cell = true
 Cell(x=1.0011187391566934 TO 1.0011188539901221 
y=-0.08227179995049738 TO 4.661824259954982

Re: [PR] Fix too many open files Exception for TestConcurrentMergeScheduler [lucene]

2024-01-30 Thread via GitHub


easyice commented on PR #13035:
URL: https://github.com/apache/lucene/pull/13035#issuecomment-1917236900

   
   Pushed a new fix for reproducible test failure 
`TestIndexWriterThreadsToSegments.testManyThreadsClose`:
   
   ```
   ./gradlew test --tests TestIndexWriterThreadsToSegments.testManyThreadsClose 
-Dtests.seed=DFAE4EC5F20E1CD4 -Dtests.nightly=true -Dtests.locale=zgh-MA 
-Dtests.timezone=SystemV/EST5EDT -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   ```
   
   ```
   java.nio.file.FileSystemException: 
/Users//tests-tmp/lucene.index.TestIndexWriterThreadsToSegments_DFAE4EC5F20E1CD4-001/index-NIOFSDirectory-001/_hk_Lucene90FieldsIndexfile_pointers_ua.tmp:
 Too many open files
  > at 
org.apache.lucene.tests.mockfile.HandleLimitFS.onOpen(HandleLimitFS.java:67)
  > at 
org.apache.lucene.tests.mockfile.HandleTrackingFS.callOpenHook(HandleTrackingFS.java:82)
  > at 
org.apache.lucene.tests.mockfile.HandleTrackingFS.newOutputStream(HandleTrackingFS.java:163)
  > at 
org.apache.lucene.tests.mockfile.FilterFileSystemProvider.newOutputStream(FilterFileSystemProvider.java:200)
  > at 
java.base/java.nio.file.Files.newOutputStream(Files.java:228)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] Reproducible failure in TestParentBlockJoinFloatKnnVectorQuery.testScoringWithMultipleChildren [lucene]

2024-01-30 Thread via GitHub


easyice opened a new issue, #13057:
URL: https://github.com/apache/lucene/issues/13057

   ### Description
   
   ```
   org.apache.lucene.search.join.TestParentBlockJoinFloatKnnVectorQuery > 
testScoringWithMultipleChildren FAILED
   java.lang.AssertionError: expected:<1.0> but was:<0.019607843831181526>
   at 
__randomizedtesting.SeedInfo.seed([8F867B6F7689D69C:A19AB97A658D176E]:0)
   at junit@4.13.1/org.junit.Assert.fail(Assert.java:89)
   at junit@4.13.1/org.junit.Assert.failNotEquals(Assert.java:835)
   at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:555)
   at junit@4.13.1/org.junit.Assert.assertEquals(Assert.java:685)
   at 
org.apache.lucene.search.join.ParentBlockJoinKnnVectorQueryTestCase.assertScorerResults(ParentBlockJoinKnnVectorQueryTestCase.java:341)
   at 
org.apache.lucene.search.join.ParentBlockJoinKnnVectorQueryTestCase.testScoringWithMultipleChildren(ParentBlockJoinKnnVectorQueryTestCase.java:210)
   at 
org.apache.lucene.search.join.TestParentBlockJoinFloatKnnVectorQuery.testScoringWithMultipleChildren(TestParentBlockJoinFloatKnnVectorQuery.java:37)
   at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ```
   
   
   ### Gradle command to reproduce
   
   ./gradlew test --tests 
TestParentBlockJoinFloatKnnVectorQuery.testScoringWithMultipleChildren 
-Dtests.seed=8F867B6F7689D69C -Dtests.nightly=true -Dtests.locale=en-GI 
-Dtests.timezone=Africa/Porto-Novo -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


mikemccand commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917387757

   > > We should PnP this!
   > 
   > What on earth means PnP? Mike, check out this search: 
https://www.google.com/search?q=pnp+acronym wikipedia FTW
   
   PnP = progress not perfection!  Not plug and play, not transistors, heh.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917488764

   > > > We should PnP this!
   > > 
   > > 
   > > What on earth means PnP? Mike, check out this search: 
https://www.google.com/search?q=pnp+acronym wikipedia FTW
   > 
   > PnP = progress not perfection! Not plug and play, not transistors, heh.
   
   We all need some drugs sometimes, especially the policeman. 💉💊💊💊


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


s1monw commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917508359

   @uschindler I need to hear you speaking german to tell, you know that ;)  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on PR #13046:
URL: https://github.com/apache/lucene/pull/13046#issuecomment-1917572859

   There is no formatted() with locale.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1471695464


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.Name;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.SegmentReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.Version;
+import org.junit.After;
+import org.junit.Before;
+
+public abstract class BackwardsCompatibilityTestBase extends LuceneTestCase {
+
+  protected final Version version;
+  private static final Version LATEST_PREVIOUS_MAJOR = 
getLatestPreviousMajorVersion();
+  protected final String indexPattern;
+  protected static final Set BINARY_SUPPORTED_VERSIONS;
+
+  static {
+String[] oldVersions =
+new String[] {
+  "8.0.0", "8.0.0", "8.1.0", "8.1.0", "8.1.1", "8.1.1", "8.2.0", 
"8.2.0", "8.3.0", "8.3.0",
+  "8.3.1", "8.3.1", "8.4.0", "8.4.0", "8.4.1", "8.4.1", "8.5.0", 
"8.5.0", "8.5.1", "8.5.1",
+  "8.5.2", "8.5.2", "8.6.0", "8.6.0", "8.6.1", "8.6.1", "8.6.2", 
"8.6.2", "8.6.3", "8.6.3",
+  "8.7.0", "8.7.0", "8.8.0", "8.8.0", "8.8.1", "8.8.1", "8.8.2", 
"8.8.2", "8.9.0", "8.9.0",
+  "8.10.0", "8.10.0", "8.10.1", "8.10.1", "8.11.0", "8.11.0", 
"8.11.1", "8.11.1", "8.11.2",
+  "8.11.2", "9.0.0", "9.1.0", "9.2.0", "9.3.0", "9.4.0", "9.4.1", 
"9.4.2", "9.5.0", "9.6.0",
+  "9.7.0", "9.8.0", "9.9.0", "9.9.1", "9.9.2", "9.10.0", "10.0.0",
+};
+
+Set binaryVersions = new HashSet<>();
+for (String version : oldVersions) {
+  try {
+Version v = Version.parse(version);
+assertTrue("Unsupported binary version: " + v, v.major >= 
Version.MIN_SUPPORTED_MAJOR - 1);
+binaryVersions.add(v);
+  } catch (ParseException ex) {
+throw new RuntimeException(ex);
+  }
+}
+List allCurrentVersions = getAllCurrentVersions();
+for (Version version : allCurrentVersions) {
+  // make sure we never miss a version.
+  assertTrue("Version: " + version + " missing", 
binaryVersions.remove(version));
+}
+BINARY_SUPPORTED_VERSIONS = binaryVersions;
+  }
+
+  /**
+   * This is a base constructor for parameterized BWC tests. The constructor 
arguments are provided
+   * by {@link com.carrotsearch.randomizedtesting.RandomizedRunner} during 
test execution. A {@link
+   * com.carrotsearch.randomizedtesting.annotations.ParametersFactory} 
specified in a subclass
+   * provides a list lists of arguments for the tests and RandomizedRunner 
will execute the test for
+   * each of the argument list.
+   *
+   * @param version the version this test should run for
+   * @param indexPattern an index pattern in order to open an index of see 
{@link
+   * #createPattern(String, String)}
+   */
+  protected BackwardsCompatibilityTestBase(
+  @Name("version") Version version, @Name("pattern") String indexPattern) {
+this.version = version;
+this.indexPattern = indexPattern;
+  }
+
+  Directory directory;
+
+  @Override
+  @Before
+  public void setUp() throws Exception {
+super.setUp();
+assertNull(
+"Index name " + version + " should not exist found",
+TestAncientIndicesCompatibility.class.getResourceAsStream(
+indexNam

Re: [PR] Optimize counts on two clause term disjunctions [lucene]

2024-01-30 Thread via GitHub


jpountz commented on PR #13036:
URL: https://github.com/apache/lucene/pull/13036#issuecomment-1917595110

   Thanks @jfreden, the heuristic looks sensible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1471716431


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.Name;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.SegmentReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.Version;
+import org.junit.After;
+import org.junit.Before;
+
+public abstract class BackwardsCompatibilityTestBase extends LuceneTestCase {
+
+  protected final Version version;
+  private static final Version LATEST_PREVIOUS_MAJOR = 
getLatestPreviousMajorVersion();
+  protected final String indexPattern;
+  protected static final Set BINARY_SUPPORTED_VERSIONS;
+
+  static {
+String[] oldVersions =
+new String[] {
+  "8.0.0", "8.0.0", "8.1.0", "8.1.0", "8.1.1", "8.1.1", "8.2.0", 
"8.2.0", "8.3.0", "8.3.0",
+  "8.3.1", "8.3.1", "8.4.0", "8.4.0", "8.4.1", "8.4.1", "8.5.0", 
"8.5.0", "8.5.1", "8.5.1",
+  "8.5.2", "8.5.2", "8.6.0", "8.6.0", "8.6.1", "8.6.1", "8.6.2", 
"8.6.2", "8.6.3", "8.6.3",
+  "8.7.0", "8.7.0", "8.8.0", "8.8.0", "8.8.1", "8.8.1", "8.8.2", 
"8.8.2", "8.9.0", "8.9.0",
+  "8.10.0", "8.10.0", "8.10.1", "8.10.1", "8.11.0", "8.11.0", 
"8.11.1", "8.11.1", "8.11.2",
+  "8.11.2", "9.0.0", "9.1.0", "9.2.0", "9.3.0", "9.4.0", "9.4.1", 
"9.4.2", "9.5.0", "9.6.0",
+  "9.7.0", "9.8.0", "9.9.0", "9.9.1", "9.9.2", "9.10.0", "10.0.0",
+};
+
+Set binaryVersions = new HashSet<>();
+for (String version : oldVersions) {
+  try {
+Version v = Version.parse(version);
+assertTrue("Unsupported binary version: " + v, v.major >= 
Version.MIN_SUPPORTED_MAJOR - 1);
+binaryVersions.add(v);
+  } catch (ParseException ex) {
+throw new RuntimeException(ex);
+  }
+}
+List allCurrentVersions = getAllCurrentVersions();
+for (Version version : allCurrentVersions) {
+  // make sure we never miss a version.
+  assertTrue("Version: " + version + " missing", 
binaryVersions.remove(version));
+}
+BINARY_SUPPORTED_VERSIONS = binaryVersions;
+  }
+
+  /**
+   * This is a base constructor for parameterized BWC tests. The constructor 
arguments are provided
+   * by {@link com.carrotsearch.randomizedtesting.RandomizedRunner} during 
test execution. A {@link
+   * com.carrotsearch.randomizedtesting.annotations.ParametersFactory} 
specified in a subclass
+   * provides a list lists of arguments for the tests and RandomizedRunner 
will execute the test for
+   * each of the argument list.
+   *
+   * @param version the version this test should run for
+   * @param indexPattern an index pattern in order to open an index of see 
{@link
+   * #createPattern(String, String)}
+   */
+  protected BackwardsCompatibilityTestBase(
+  @Name("version") Version version, @Name("pattern") String indexPattern) {
+this.version = version;
+this.indexPattern = indexPattern;
+  }
+
+  Directory directory;
+
+  @Override
+  @Before
+  public void setUp() throws Exception {
+super.setUp();
+assertNull(
+"Index name " + version + " should not exist found",
+TestAncientIndicesCompatibility.class.getResourceAsStream(
+indexNam

Re: [I] Advance to first position of 1 in BitSet before iterating the lead in BitSetConjunctionDISI? [lucene]

2024-01-30 Thread via GitHub


jpountz commented on issue #13024:
URL: https://github.com/apache/lucene/issues/13024#issuecomment-1917610302

   I'm not sure I like this idea, which feels quite arbitrary: why would there 
be a big gap of matching doc IDs towards the start of the doc ID space, and not 
in the middle or towards the end?
   
   This `BitSetConjunctionDISI` was mostly added to mimic the way 
`FilteredQuery` used to work (before we replaced filters with queries and 
`FilteredQuery` with `FILTER` clauses on `BooleanQuery`) to avoid performance 
regressions. Maybe we can improve the heuristic regarding when to use it, which 
could in-turn help the case you're looking into? For instance we currently 
evaluate filters via `Bits#get` rather than `BitSet#nextSetBit` when their cost 
is greater than the minimum cost across all required clauses. In a similar 
context, `IndexOrDocValuesQuery` only uses doc values to run queries when their 
cost is 8x greater than the lead cost, maybe it would be a better threshold for 
evaluating filters via `Bits#get` as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Modernize BWC testing with parameterized tests [lucene]

2024-01-30 Thread via GitHub


uschindler commented on code in PR #13046:
URL: https://github.com/apache/lucene/pull/13046#discussion_r1471723409


##
lucene/backward-codecs/src/test/org/apache/lucene/backward_index/BackwardsCompatibilityTestBase.java:
##
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.backward_index;
+
+import com.carrotsearch.randomizedtesting.annotations.Name;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.Field;
+import java.lang.reflect.Modifier;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.text.ParseException;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Locale;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.lucene.codecs.Codec;
+import org.apache.lucene.index.DirectoryReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.SegmentReader;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.tests.util.LuceneTestCase;
+import org.apache.lucene.tests.util.TestUtil;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.IOUtils;
+import org.apache.lucene.util.Version;
+import org.junit.After;
+import org.junit.Before;
+
+public abstract class BackwardsCompatibilityTestBase extends LuceneTestCase {
+
+  protected final Version version;
+  private static final Version LATEST_PREVIOUS_MAJOR = 
getLatestPreviousMajorVersion();
+  protected final String indexPattern;
+  protected static final Set BINARY_SUPPORTED_VERSIONS;
+
+  static {
+String[] oldVersions =
+new String[] {
+  "8.0.0", "8.0.0", "8.1.0", "8.1.0", "8.1.1", "8.1.1", "8.2.0", 
"8.2.0", "8.3.0", "8.3.0",
+  "8.3.1", "8.3.1", "8.4.0", "8.4.0", "8.4.1", "8.4.1", "8.5.0", 
"8.5.0", "8.5.1", "8.5.1",
+  "8.5.2", "8.5.2", "8.6.0", "8.6.0", "8.6.1", "8.6.1", "8.6.2", 
"8.6.2", "8.6.3", "8.6.3",
+  "8.7.0", "8.7.0", "8.8.0", "8.8.0", "8.8.1", "8.8.1", "8.8.2", 
"8.8.2", "8.9.0", "8.9.0",
+  "8.10.0", "8.10.0", "8.10.1", "8.10.1", "8.11.0", "8.11.0", 
"8.11.1", "8.11.1", "8.11.2",
+  "8.11.2", "9.0.0", "9.1.0", "9.2.0", "9.3.0", "9.4.0", "9.4.1", 
"9.4.2", "9.5.0", "9.6.0",
+  "9.7.0", "9.8.0", "9.9.0", "9.9.1", "9.9.2", "9.10.0", "10.0.0",
+};
+
+Set binaryVersions = new HashSet<>();
+for (String version : oldVersions) {
+  try {
+Version v = Version.parse(version);
+assertTrue("Unsupported binary version: " + v, v.major >= 
Version.MIN_SUPPORTED_MAJOR - 1);
+binaryVersions.add(v);
+  } catch (ParseException ex) {
+throw new RuntimeException(ex);
+  }
+}
+List allCurrentVersions = getAllCurrentVersions();
+for (Version version : allCurrentVersions) {
+  // make sure we never miss a version.
+  assertTrue("Version: " + version + " missing", 
binaryVersions.remove(version));
+}
+BINARY_SUPPORTED_VERSIONS = binaryVersions;
+  }
+
+  /**
+   * This is a base constructor for parameterized BWC tests. The constructor 
arguments are provided
+   * by {@link com.carrotsearch.randomizedtesting.RandomizedRunner} during 
test execution. A {@link
+   * com.carrotsearch.randomizedtesting.annotations.ParametersFactory} 
specified in a subclass
+   * provides a list lists of arguments for the tests and RandomizedRunner 
will execute the test for
+   * each of the argument list.
+   *
+   * @param version the version this test should run for
+   * @param indexPattern an index pattern in order to open an index of see 
{@link
+   * #createPattern(String, String)}
+   */
+  protected BackwardsCompatibilityTestBase(
+  @Name("version") Version version, @Name("pattern") String indexPattern) {
+this.version = version;
+this.indexPattern = indexPattern;
+  }
+
+  Directory directory;
+
+  @Override
+  @Before
+  public void setUp() throws Exception {
+super.setUp();
+assertNull(
+"Index name " + version + " should not exist found",
+TestAncientIndicesCompatibility.class.getResourceAsStream(
+indexNam

Re: [PR] clean up smoketester GPG leaks [lucene]

2024-01-30 Thread via GitHub


janhoy commented on PR #12882:
URL: https://github.com/apache/lucene/pull/12882#issuecomment-1917641774

   @hurutoriya Did you do any testing on the changes? If not, we need to QA 
that the script is not broken by this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org