[PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]
msfroh opened a new pull request, #13987:
URL: https://github.com/apache/lucene/pull/13987

### Description

I noticed that ScoreCachingWrappingScorer never updates lastDoc, so it's always -1.

Technically, it's probably fine, since it still ends up returning the same score for multiple score() calls between collect calls, but I think this is the intended logic. (In particular, if the same doc was somehow collected multiple times, then the score would get recalculated.)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
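For readers unfamiliar with the caching idea discussed in this PR, here is a minimal, self-contained sketch. It is not Lucene's code: `SimpleScorable`, `CountingScorable`, `CachingScorer`, and `setCurrentDoc` are hypothetical stand-ins invented for illustration. Only the `lastDoc`/`curDoc` idea comes from the description above.

```java
// Hypothetical stand-in for a scorer; this is NOT Lucene's Scorable API.
interface SimpleScorable {
  float score();
}

// Counts how often the (assumed expensive) score computation actually runs.
class CountingScorable implements SimpleScorable {
  int calls = 0;

  @Override
  public float score() {
    calls++;
    return 1.5f;
  }
}

// Caches the score of the doc being collected and recomputes only when the doc changes.
class CachingScorer implements SimpleScorable {
  private final SimpleScorable in;
  private int lastDoc = -1; // doc the cached score was computed for
  private int curDoc = -1;  // doc currently being collected
  private float curScore;

  CachingScorer(SimpleScorable in) {
    this.in = in;
  }

  // Assumed to be called by a (hypothetical) collector before it scores a doc.
  void setCurrentDoc(int doc) {
    curDoc = doc;
  }

  @Override
  public float score() {
    if (lastDoc != curDoc) {
      curScore = in.score();
      lastDoc = curDoc; // the kind of update the PR description says was missing
    }
    return curScore;
  }
}

class ScoreCachingDemo {
  public static void main(String[] args) {
    CountingScorable inner = new CountingScorable();
    CachingScorer cached = new CachingScorer(inner);
    cached.setCurrentDoc(3);
    cached.score();
    cached.score(); // same doc: served from the cache
    cached.setCurrentDoc(7);
    cached.score(); // new doc: recomputed
    System.out.println("inner score() calls: " + inner.calls);
  }
}
```

In this sketch, repeated `score()` calls for the same doc hit the wrapped scorer once, and collecting the same doc id again also reuses the cached value because `lastDoc` tracks it.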
Re: [PR] Pruning of estimating the point value count in BooleanScorerSupplier [lucene]
kkewwei commented on PR #13988:
URL: https://github.com/apache/lucene/pull/13988#issuecomment-2469378470

@jpountz please have a look when you are free. I will add additional tests if it makes sense.
[PR] Pruning of estimating the point value count in BooleanScorerSupplier [lucene]
kkewwei opened a new pull request, #13988:
URL: https://github.com/apache/lucene/pull/13988

### Description

This PR aims to speed up cost computation in `BooleanScorerSupplier` by using the `leadCost`, as in #13199.

Lucene benchmark: `python3 src/python/localrun.py wikimedium10m`

Hardware used: linux ecs.t2-c1m2dev.8xlarge | 32 cores | 64G

```
Report after iter 19:
Task                        QPS baseline   StdDev  QPS my_modified_version   StdDev   Pct diff                p-value
Wildcard                      204.70   (4.1%)   195.95   (4.6%)   -4.3% ( -12% -    4%)  0.002
range                        3028.29   (9.7%)  2917.73  (10.3%)   -3.7% ( -21% -   18%)  0.249
AndHighLow                    433.07   (3.7%)   422.23   (4.6%)   -2.5% ( -10% -    6%)  0.058
TermDTSort                     84.40   (7.9%)    82.49   (6.2%)   -2.3% ( -15% -   12%)  0.312
Prefix3                        76.79   (3.7%)    75.54   (5.1%)   -1.6% ( -10% -    7%)  0.245
HighPhrase                     46.03   (4.0%)    45.52   (5.8%)   -1.1% ( -10% -    9%)  0.487
MedPhrase                      18.85   (4.6%)    18.66   (4.9%)   -1.0% ( -10% -    8%)  0.490
HighTermTitleSort              98.46   (4.6%)    97.70   (3.2%)   -0.8% (  -8% -    7%)  0.537
HighTermDayOfYearSort         239.08   (6.8%)   237.24   (6.0%)   -0.8% ( -12% -   12%)  0.703
PKLookup                      131.53   (3.9%)   130.56   (4.6%)   -0.7% (  -8% -    8%)  0.581
LowPhrase                      21.51   (5.4%)    21.36   (4.8%)   -0.7% ( -10% -   10%)  0.682
BrowseDayOfYearSSDVFacets      14.12  (13.0%)    14.03  (12.4%)   -0.6% ( -22% -   28%)  0.882
MedTermDayTaxoFacets           35.01   (3.4%)    34.81   (2.8%)   -0.6% (  -6% -    5%)  0.571
MedSloppyPhrase                21.86   (3.0%)    21.75   (3.6%)   -0.5% (  -6% -    6%)  0.609
AndHighMed                    117.34   (4.0%)   116.78   (4.1%)   -0.5% (  -8% -    7%)  0.710
HighSloppyPhrase               22.99   (3.3%)    22.90   (3.8%)   -0.4% (  -7% -    6%)  0.712
BrowseRandomLabelSSDVFacets     8.84   (4.5%)     8.81   (4.0%)   -0.4% (  -8% -    8%)  0.790
HighIntervalsOrdered            7.43   (4.4%)     7.40   (4.1%)   -0.3% (  -8% -    8%)  0.814
AndHighHigh                    48.15   (4.6%)    48.02   (4.6%)   -0.3% (  -9% -    9%)  0.848
MedSpanNear                    94.70   (2.9%)    94.49   (3.1%)   -0.2% (  -6% -    6%)  0.821
OrHighMed                      71.20   (7.8%)    71.10   (6.3%)   -0.1% ( -13% -   15%)  0.949
BrowseMonthSSDVFacets          14.53   (5.2%)    14.55   (4.8%)    0.1% (  -9% -   10%)  0.937
HighSpanNear                    1.92   (1.8%)     1.93   (1.6%)    0.2% (  -3% -    3%)  0.752
AndHighMedDayTaxoFacets        32.00   (2.3%)    32.06   (2.7%)    0.2% (  -4% -    5%)  0.816
LowSpanNear                     6.24   (2.1%)     6.26   (2.2%)    0.2% (  -4% -    4%)  0.776
AndHighHighDayTaxoFacets        7.97   (2.8%)     7.99   (4.1%)    0.2% (  -6% -    7%)  0.840
BrowseDateSSDVFacets            2.46  (20.7%)     2.46  (22.5%)    0.2% ( -35% -   54%)  0.974
OrHighMedDayTaxoFacets          9.09   (2.6%)     9.11   (4.0%)    0.3% (  -6% -    7%)  0.770
HighTermTitleBDVSort           10.86   (6.7%)    10.90   (4.9%)    0.3% ( -10% -   12%)  0.857
Fuzzy1                         35.48   (2.6%)    35.63   (3.3%)    0.4% (  -5% -    6%)  0.659
LowIntervalsOrdered            63.75   (3.4%)    64.05   (3.4%)    0.5% (  -6% -    7%)  0.669
MedIntervalsOrdered            24.79   (6.0%)    24.92   (5.8%)    0.5% ( -10% -   13%)  0.777
LowSloppyPhrase               133.33   (6.1%)   134.05   (4.0%)    0.5% (  -9% -   11%)  0.739
Respell                        41.42   (3.5%)    41.70   (3.3%)    0.7% (  -5% -    7%)  0.540
IntNRQ                         44.62  (28.9%)    44.97  (27.1%)    0.8% ( -42% -   79%)  0.929
OrHighHigh                     30.04   (7.4%)    30.30   (7.8%)    0.9% ( -13% -   17%)  0.716
HighTermMonthSort            1217.65   (7.2%)  1231.77   (7.5%)    1.2% ( -12% -   17%)  0.617
OrHighLow                     438.87   (3.6%)   444.22   (3.7%)    1.2% (  -5% -    8%)  0.290
LowTerm                       411.15   (6.4%)   416.33   (5.4%)    1.3% (  -9% -
Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]
ChrisHegarty commented on code in PR #13986:
URL: https://github.com/apache/lucene/pull/13986#discussion_r1836935485

## gradle/testing/defaults-tests.gradle: ##
@@ -128,7 +128,13 @@ allprojects {
         jvmArgs '--add-modules', 'jdk.management'

         // Enable the vector incubator module on supported Java versions:
-        if (rootProject.vectorIncubatorJavaVersions.contains(rootProject.runtimeJavaVersion)) {
+        def v = JavaVersion.VERSION_1_1
+        def prop = providers.systemProperty("org.apache.lucene.vectorization.upperJavaFeatureVersion")

Review Comment:
thanks @dweiss, that's a bit cleaner. Done.
Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]
shubhamvishu commented on code in PR #13981:
URL: https://github.com/apache/lucene/pull/13981#discussion_r1836169057

## lucene/core/src/java21/org/apache/lucene/internal/vectorization/Lucene99MemorySegmentByteVectorScorer.java: ##
@@ -40,8 +41,14 @@ abstract sealed class Lucene99MemorySegmentByteVectorScorer
    * returned.
    */
   public static Optional create(
-      VectorSimilarityFunction type, IndexInput input, KnnVectorValues values, byte[] queryVector) {
+      VectorSimilarityFunction type,
+      RandomAccessInput slice,
+      KnnVectorValues values,
+      byte[] queryVector) {
     assert values instanceof ByteVectorValues;
+    if (!(slice instanceof IndexInput input)) {

Review Comment:
Nit: `input` is not used
Re: [PR] [DRAFT] Change vector input from IndexInput to RandomAccessInput [lucene]
shubhamvishu commented on code in PR #13981:
URL: https://github.com/apache/lucene/pull/13981#discussion_r1836169621

## lucene/core/src/java/org/apache/lucene/store/RandomAccessInput.java: ##
@@ -77,4 +85,6 @@ default void readBytes(long pos, byte[] bytes, int offset, int length) throws IO
    * @see IndexInput#prefetch
    */
   default void prefetch(long offset, long length) throws IOException {}
+
+  Object clone();

Review Comment:
Change this to `RandomAccessInput clone();` so you don't have to cast in all places?
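The suggestion above relies on Java's covariant return types: a `clone()` declared on an interface may return the interface type itself, and an implementation may narrow it further. A minimal sketch follows, using hypothetical names (`RandomInput`, `ArrayInput`) that are not Lucene classes and only illustrate the language feature.

```java
// Hypothetical interface standing in for a random-access input; not a Lucene type.
interface RandomInput {
  byte readByte(long pos);

  // Declared with the interface type, so callers need no cast after cloning.
  RandomInput clone();
}

// Simple array-backed implementation used only for this illustration.
final class ArrayInput implements RandomInput {
  private final byte[] data;

  ArrayInput(byte[] data) {
    this.data = data;
  }

  @Override
  public byte readByte(long pos) {
    return data[(int) pos];
  }

  // Covariant override: narrows the return type from RandomInput to ArrayInput.
  @Override
  public ArrayInput clone() {
    // Shares the backing array, as a cheap read-only view would.
    return new ArrayInput(data);
  }
}

class CovariantCloneDemo {
  public static void main(String[] args) {
    RandomInput in = new ArrayInput(new byte[] {10, 20, 30});
    RandomInput copy = in.clone(); // no (RandomInput) cast required
    System.out.println(copy.readByte(1));
  }
}
```

With an `Object clone()` declaration, the assignment in `main` would need an explicit cast at every call site, which is the inconvenience the review comment points at.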
Re: [PR] Tessellator: Improve logic when two holes share the same vertex with the polygon [lucene]
iverase merged PR #13980:
URL: https://github.com/apache/lucene/pull/13980
Re: [I] Unable to Tessellate shape for a valid Polygon according to GDAL/OGR and PostGIS [lucene]
iverase closed issue #13841: Unable to Tessellate shape for a valid Polygon according to GDAL/OGR and PostGIS
URL: https://github.com/apache/lucene/issues/13841
[PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]
ChrisHegarty opened a new pull request, #13986:
URL: https://github.com/apache/lucene/pull/13986

This commit allows easier verification of the Panama Vectorization provider with newer Java versions.

The upper bound Java version of the Vectorization provider is hardcoded to the version that has been tested and is known to work. This is a bit inflexible when experimenting with and verifying newer JDK versions. This change proposes adding a new system property that sets the upper bound of the range of supported Java versions.

With this change, and the accompanying small gradle change, one can verify newer JDKs as follows:

```
CI=true; RUNTIME_JAVA_HOME=/Users/chegar/binaries/jdk-24.jdk-ea-b23/Contents/Home ./gradlew :lucene:core:test -Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24
```
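As a rough illustration of how a system property can raise the upper bound of a supported version range, here is a hedged sketch. The property name is taken from the PR description; the class name, the `ASSUMED_MIN` and `ASSUMED_DEFAULT_MAX` values, and the check itself are assumptions made for this example and do not reflect Lucene's actual implementation.

```java
// Sketch only: checks whether a Java feature version falls inside a supported
// range whose upper bound can be overridden by a system property.
class VectorizationVersionCheck {
  // Property name as given in the PR description.
  static final String UPPER_PROP =
      "org.apache.lucene.vectorization.upperJavaFeatureVersion";

  // Hypothetical bounds for illustration; NOT Lucene's real values.
  static final int ASSUMED_MIN = 21;
  static final int ASSUMED_DEFAULT_MAX = 23;

  static int upperBound() {
    // Integer.getInteger falls back to the default when the property is
    // unset or cannot be parsed as an integer.
    return Integer.getInteger(UPPER_PROP, ASSUMED_DEFAULT_MAX);
  }

  static boolean isSupported(int featureVersion) {
    return featureVersion >= ASSUMED_MIN && featureVersion <= upperBound();
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    System.out.println("Java " + feature + " supported: " + isSupported(feature));
  }
}
```

Running this with `-Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24` would widen the accepted range in the sketch without changing the hardcoded default.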
Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]
dweiss commented on code in PR #13986:
URL: https://github.com/apache/lucene/pull/13986#discussion_r1836911525

## gradle/testing/defaults-tests.gradle: ##
@@ -128,7 +128,13 @@ allprojects {
         jvmArgs '--add-modules', 'jdk.management'

         // Enable the vector incubator module on supported Java versions:
-        if (rootProject.vectorIncubatorJavaVersions.contains(rootProject.runtimeJavaVersion)) {
+        def v = JavaVersion.VERSION_1_1
+        def prop = providers.systemProperty("org.apache.lucene.vectorization.upperJavaFeatureVersion")

Review Comment:
It'd probably be more consistent to use the propertyOrDefault "function" that we defined globally to allow passing such properties via -P (gradle's project properties) or -D (system properties). You can provide the default as the second argument - look at any existing call of that function.
Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]
shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836913408

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
     }
   }

+  private Lucene99HnswVectorsReader(
+      Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+    assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
> maybe we don't even need a cast if we make getMergeInstance() return a FlatVectorsReader

Actually figured out a way to do this
Re: [I] KnnFloatVectorQuery#toString should show the filter [lucene]
viswanathk commented on issue #13983:
URL: https://github.com/apache/lucene/issues/13983#issuecomment-2469602539

Seems like a good first issue - I can contribute this @jpountz.