[PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-11 Thread via GitHub


msfroh opened a new pull request, #13987:
URL: https://github.com/apache/lucene/pull/13987

   ### Description
   
   I noticed that ScoreCachingWrappingScorer never updates lastDoc, so it's 
always -1. Technically, that's probably fine, since the scorer still ends up 
returning the same score for multiple score() calls between collect calls, but 
I think updating lastDoc is the intended logic. (In particular, if the same doc 
was somehow collected multiple times, then the score would get recalculated.)
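
   The idea of a score cache keyed on the current document can be sketched as 
follows. This is a minimal illustration, not Lucene's actual implementation: 
the class `CachedScoreSketch`, its fields, and the `computations` counter are 
hypothetical names chosen for the example, and the real scorer may track cache 
validity differently.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical stand-in for a score-caching wrapper; not Lucene's class. */
final class CachedScoreSketch {
  private final IntToDoubleFunction scoreFn; // the expensive scoring function being wrapped
  private int curDoc = -1;                   // doc most recently passed to collect()
  private int lastDoc = -1;                  // doc whose score is currently cached
  private double cachedScore;
  int computations = 0;                      // how often the wrapped function was invoked

  CachedScoreSketch(IntToDoubleFunction scoreFn) {
    this.scoreFn = scoreFn;
  }

  void collect(int doc) {
    curDoc = doc;
  }

  double score() {
    if (curDoc != lastDoc) {
      cachedScore = scoreFn.applyAsDouble(curDoc);
      computations++;
      lastDoc = curDoc; // remember which doc the cached score belongs to
    }
    return cachedScore;
  }
}
```

   In this sketch, repeated score() calls for the same doc invoke the wrapped 
function once, and collecting the same doc twice in a row does not recompute 
the score, because the cache is keyed on the doc ID.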


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Pruning of estimating the point value count in BooleanScorerSupplier [lucene]

2024-11-11 Thread via GitHub


kkewwei commented on PR #13988:
URL: https://github.com/apache/lucene/pull/13988#issuecomment-2469378470

   @jpountz please have a look when you are free. I will add additional tests 
if it makes sense.





[PR] Pruning of estimating the point value count in BooleanScorerSupplier [lucene]

2024-11-11 Thread via GitHub


kkewwei opened a new pull request, #13988:
URL: https://github.com/apache/lucene/pull/13988

   ### Description
   This PR aims to speed up the cost computation in `BooleanScorerSupplier` by 
using the `leadCost`, as in #13199.
   
   Lucene benchmark: `python3 src/python/localrun.py wikimedium10m`
   Hardware used: linux ecs.t2-c1m2dev.8xlarge | 32 cores | 64G
   
   
   ```
   Report after iter 19:
   Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev  Pct diff          p-value
   Wildcard                      204.70   (4.1%)    195.95   (4.6%)   -4.3% (-12% -   4%)  0.002
   range                        3028.29   (9.7%)   2917.73  (10.3%)   -3.7% (-21% -  18%)  0.249
   AndHighLow                    433.07   (3.7%)    422.23   (4.6%)   -2.5% (-10% -   6%)  0.058
   TermDTSort                     84.40   (7.9%)     82.49   (6.2%)   -2.3% (-15% -  12%)  0.312
   Prefix3                        76.79   (3.7%)     75.54   (5.1%)   -1.6% (-10% -   7%)  0.245
   HighPhrase                     46.03   (4.0%)     45.52   (5.8%)   -1.1% (-10% -   9%)  0.487
   MedPhrase                      18.85   (4.6%)     18.66   (4.9%)   -1.0% (-10% -   8%)  0.490
   HighTermTitleSort              98.46   (4.6%)     97.70   (3.2%)   -0.8% ( -8% -   7%)  0.537
   HighTermDayOfYearSort         239.08   (6.8%)    237.24   (6.0%)   -0.8% (-12% -  12%)  0.703
   PKLookup                      131.53   (3.9%)    130.56   (4.6%)   -0.7% ( -8% -   8%)  0.581
   LowPhrase                      21.51   (5.4%)     21.36   (4.8%)   -0.7% (-10% -  10%)  0.682
   BrowseDayOfYearSSDVFacets      14.12  (13.0%)     14.03  (12.4%)   -0.6% (-22% -  28%)  0.882
   MedTermDayTaxoFacets           35.01   (3.4%)     34.81   (2.8%)   -0.6% ( -6% -   5%)  0.571
   MedSloppyPhrase                21.86   (3.0%)     21.75   (3.6%)   -0.5% ( -6% -   6%)  0.609
   AndHighMed                    117.34   (4.0%)    116.78   (4.1%)   -0.5% ( -8% -   7%)  0.710
   HighSloppyPhrase               22.99   (3.3%)     22.90   (3.8%)   -0.4% ( -7% -   6%)  0.712
   BrowseRandomLabelSSDVFacets     8.84   (4.5%)      8.81   (4.0%)   -0.4% ( -8% -   8%)  0.790
   HighIntervalsOrdered            7.43   (4.4%)      7.40   (4.1%)   -0.3% ( -8% -   8%)  0.814
   AndHighHigh                    48.15   (4.6%)     48.02   (4.6%)   -0.3% ( -9% -   9%)  0.848
   MedSpanNear                    94.70   (2.9%)     94.49   (3.1%)   -0.2% ( -6% -   6%)  0.821
   OrHighMed                      71.20   (7.8%)     71.10   (6.3%)   -0.1% (-13% -  15%)  0.949
   BrowseMonthSSDVFacets          14.53   (5.2%)     14.55   (4.8%)    0.1% ( -9% -  10%)  0.937
   HighSpanNear                    1.92   (1.8%)      1.93   (1.6%)    0.2% ( -3% -   3%)  0.752
   AndHighMedDayTaxoFacets        32.00   (2.3%)     32.06   (2.7%)    0.2% ( -4% -   5%)  0.816
   LowSpanNear                     6.24   (2.1%)      6.26   (2.2%)    0.2% ( -4% -   4%)  0.776
   AndHighHighDayTaxoFacets        7.97   (2.8%)      7.99   (4.1%)    0.2% ( -6% -   7%)  0.840
   BrowseDateSSDVFacets            2.46  (20.7%)      2.46  (22.5%)    0.2% (-35% -  54%)  0.974
   OrHighMedDayTaxoFacets          9.09   (2.6%)      9.11   (4.0%)    0.3% ( -6% -   7%)  0.770
   HighTermTitleBDVSort           10.86   (6.7%)     10.90   (4.9%)    0.3% (-10% -  12%)  0.857
   Fuzzy1                         35.48   (2.6%)     35.63   (3.3%)    0.4% ( -5% -   6%)  0.659
   LowIntervalsOrdered            63.75   (3.4%)     64.05   (3.4%)    0.5% ( -6% -   7%)  0.669
   MedIntervalsOrdered            24.79   (6.0%)     24.92   (5.8%)    0.5% (-10% -  13%)  0.777
   LowSloppyPhrase               133.33   (6.1%)    134.05   (4.0%)    0.5% ( -9% -  11%)  0.739
   Respell                        41.42   (3.5%)     41.70   (3.3%)    0.7% ( -5% -   7%)  0.540
   IntNRQ                         44.62  (28.9%)     44.97  (27.1%)    0.8% (-42% -  79%)  0.929
   OrHighHigh                     30.04   (7.4%)     30.30   (7.8%)    0.9% (-13% -  17%)  0.716
   HighTermMonthSort            1217.65   (7.2%)   1231.77   (7.5%)    1.2% (-12% -  17%)  0.617
   OrHighLow                     438.87   (3.6%)    444.22   (3.7%)    1.2% ( -5% -   8%)  0.290
   LowTerm                       411.15   (6.4%)    416.33   (5.4%)    1.3% ( -9% -
   ```

Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]

2024-11-11 Thread via GitHub


ChrisHegarty commented on code in PR #13986:
URL: https://github.com/apache/lucene/pull/13986#discussion_r1836935485


##
gradle/testing/defaults-tests.gradle:
##
@@ -128,7 +128,13 @@ allprojects {
   jvmArgs '--add-modules', 'jdk.management'
 
   // Enable the vector incubator module on supported Java versions:
-  if (rootProject.vectorIncubatorJavaVersions.contains(rootProject.runtimeJavaVersion)) {
+  def v = JavaVersion.VERSION_1_1
+  def prop = providers.systemProperty("org.apache.lucene.vectorization.upperJavaFeatureVersion")

Review Comment:
   thanks @dweiss, that's a bit cleaner. Done.






Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]

2024-11-11 Thread via GitHub


shubhamvishu commented on code in PR #13981:
URL: https://github.com/apache/lucene/pull/13981#discussion_r1836169057


##
lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java:
##
@@ -40,8 +41,14 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer
* returned.
*/
   public static Optional create(
-  VectorSimilarityFunction type, IndexInput input, KnnVectorValues values, byte[] queryVector) {
+  VectorSimilarityFunction type,
+  RandomAccessInput slice,
+  KnnVectorValues values,
+  byte[] queryVector) {
 assert values instanceof ByteVectorValues;
+if (!(slice instanceof IndexInput input)) {

Review Comment:
   Nit: `input` is not used
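
   The nit concerns a pattern-matching `instanceof` whose binding variable is 
never read. A minimal sketch of the two forms, using hypothetical stand-in 
types (`RandomAccessLike`, `IndexLike`, `Other`) rather than Lucene's classes:

```java
final class InstanceofSketch {
  interface RandomAccessLike {} // hypothetical stand-in for RandomAccessInput

  static final class IndexLike implements RandomAccessLike {} // stand-in for IndexInput

  static final class Other implements RandomAccessLike {}

  // Type check only: no binding variable is declared, so nothing is left unused.
  static boolean isIndexLike(RandomAccessLike slice) {
    return slice instanceof IndexLike;
  }

  // Pattern form: worthwhile only when the bound variable is actually read.
  static String describe(RandomAccessLike slice) {
    if (!(slice instanceof IndexLike input)) {
      return "unsupported";
    }
    // 'input' is in scope here because the negated test returned early.
    return "index input: " + input.getClass().getSimpleName();
  }
}
```

   When only the type check matters, the plain `instanceof` form avoids 
declaring a variable that is never used.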






Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]

2024-11-11 Thread via GitHub


shubhamvishu commented on code in PR #13981:
URL: https://github.com/apache/lucene/pull/13981#discussion_r1836169621


##
lucene/core/src/java/org/apache/lucene/store/RandomAccessInput.java:
##
@@ -77,4 +85,6 @@ default void readBytes(long pos, byte[] bytes, int offset, int length) throws IO
* @see IndexInput#prefetch
*/
   default void prefetch(long offset, long length) throws IOException {}
+
+  Object clone();

Review Comment:
   Change this to `RandomAccessInput clone();` so you don't have to cast in all 
places?
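
   The suggestion relies on covariant return types: an interface may declare 
`clone()` with a narrower return type than `Object`, so callers need no cast. 
A minimal sketch, assuming hypothetical stand-in types (`Input`, `ArrayInput`) 
rather than Lucene's own:

```java
final class CovariantCloneSketch {
  /** Hypothetical stand-in for an interface such as RandomAccessInput. */
  interface Input {
    long length();

    Input clone(); // narrower than Object: callers receive an Input directly
  }

  static final class ArrayInput implements Input {
    private final byte[] data;

    ArrayInput(byte[] data) {
      this.data = data;
    }

    @Override
    public long length() {
      return data.length;
    }

    // Implementations may narrow the return type further.
    @Override
    public ArrayInput clone() {
      return new ArrayInput(data.clone());
    }
  }

  static long clonedLength(Input in) {
    Input copy = in.clone(); // no (Input) cast required
    return copy.length();
  }
}
```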






Re: [PR] Tessellator: Improve logic when two holes share the same vertex with the polygon [lucene]

2024-11-11 Thread via GitHub


iverase merged PR #13980:
URL: https://github.com/apache/lucene/pull/13980





Re: [I] Unable to Tessellate shape for a valid Polygon according to GDAL/OGR and PostGIS [lucene]

2024-11-11 Thread via GitHub


iverase closed issue #13841: Unable to Tessellate shape for a valid Polygon 
according to GDAL/OGR and PostGIS
URL: https://github.com/apache/lucene/issues/13841





[PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]

2024-11-11 Thread via GitHub


ChrisHegarty opened a new pull request, #13986:
URL: https://github.com/apache/lucene/pull/13986

   This commit allows easier verification of the Panama Vectorization provider 
with newer Java versions.
   
   The upper bound Java version of the Vectorization provider is hardcoded to 
the version that has been tested and is known to work. This is a bit inflexible 
when experimenting with and verifying newer JDK versions. This change proposes 
adding a new system property that sets the upper bound of the range of 
supported Java versions.
   
   With this change and the accompanying small Gradle change, one can 
verify newer JDKs as follows:
   
   ```
   CI=true; RUNTIME_JAVA_HOME=/Users/chegar/binaries/jdk-24.jdk-ea-b23/Contents/Home
   ./gradlew :lucene:core:test -Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24
   ```
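
   How a system property can override a hardcoded upper bound might look 
roughly like this. It is a sketch under stated assumptions, not the code in 
the PR: the property name comes from the description above, while the class 
`UpperBoundSketch` and the bounds `MIN_FEATURE` and `DEFAULT_MAX_FEATURE` are 
hypothetical values picked for illustration.

```java
final class UpperBoundSketch {
  // Property name taken from the PR description.
  static final String PROP = "org.apache.lucene.vectorization.upperJavaFeatureVersion";

  // Hypothetical bounds, chosen only for this example.
  static final int MIN_FEATURE = 21;
  static final int DEFAULT_MAX_FEATURE = 23;

  /** Returns the property value if it parses as an integer, else the default. */
  static int upperBound() {
    return Integer.getInteger(PROP, DEFAULT_MAX_FEATURE);
  }

  /** True if the given Java feature version lies within the supported range. */
  static boolean isSupported(int feature) {
    return feature >= MIN_FEATURE && feature <= upperBound();
  }
}
```

   A caller would typically pass `Runtime.version().feature()` to 
`isSupported`. With the property unset, the sketch accepts only the default 
range; running with `-D...upperJavaFeatureVersion=24` widens it to include 24.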





Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]

2024-11-11 Thread via GitHub


dweiss commented on code in PR #13986:
URL: https://github.com/apache/lucene/pull/13986#discussion_r1836911525


##
gradle/testing/defaults-tests.gradle:
##
@@ -128,7 +128,13 @@ allprojects {
   jvmArgs '--add-modules', 'jdk.management'
 
   // Enable the vector incubator module on supported Java versions:
-  if (rootProject.vectorIncubatorJavaVersions.contains(rootProject.runtimeJavaVersion)) {
+  def v = JavaVersion.VERSION_1_1
+  def prop = providers.systemProperty("org.apache.lucene.vectorization.upperJavaFeatureVersion")

Review Comment:
   It'd probably be more consistent to use the propertyOrDefault "function" 
that we defined globally to allow passing such properties via -P (Gradle's 
project properties) or -D (system properties). You can provide the default as 
the second argument; look at any existing call of that function.






Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-11 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836913408


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
 }
   }
 
+  private Lucene99HnswVectorsReader(
+  Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   > maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   Actually figured out a way to do this






Re: [I] KnnFloatVectorQuery#toString should show the filter [lucene]

2024-11-11 Thread via GitHub


viswanathk commented on issue #13983:
URL: https://github.com/apache/lucene/issues/13983#issuecomment-2469602539

   Seems like a good first issue. I can contribute this, @jpountz.

