[GitHub] [lucene] jimczi commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334758274 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,851 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [lucene] jimczi commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334746185 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,851 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [lucene] jimczi commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334738549 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,851 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [lucene] gsmiller commented on pull request #12560: Defer #advanceExact on expression dependencies until their values are needed

2023-09-22 Thread via GitHub
gsmiller commented on PR #12560: URL: https://github.com/apache/lucene/pull/12560#issuecomment-1731814078 Circling back on this: For Amazon's Product Search engine, we make fairly heavy use of these expression implementations. I pulled this change into our Lucene fork early (currently on 9.
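The PR title describes deferring `#advanceExact` on expression dependencies until their values are needed. Below is a minimal sketch of that idea. It uses a hypothetical `DocValueSource` interface, which is not Lucene's `DoubleValues` API, and the helper names are also hypothetical. Each dependency records its target doc cheaply and positions its delegate only on the first read, so a conditional expression advances only the branch it actually evaluates.

```java
/** Hypothetical sketch: defer positioning a dependency until its value is read. */
final class LazyAdvanceSketch {
  /** Hypothetical per-document value source; not Lucene's DoubleValues API. */
  interface DocValueSource {
    boolean advanceExact(int doc); // position on doc; false if the doc has no value
    double value();                // value for the current doc
  }

  /** Dense in-memory source used only for illustration. */
  static final class ArraySource implements DocValueSource {
    private final double[] values;
    private int doc = -1;

    ArraySource(double... values) { this.values = values; }

    @Override public boolean advanceExact(int target) {
      doc = target;
      return target < values.length;
    }

    @Override public double value() { return values[doc]; }
  }

  /** Records the target doc cheaply; calls advanceExact only on the first read for that doc. */
  static final class LazyDependency {
    private final DocValueSource in;
    private int targetDoc = -1;
    private int advancedDoc = -1;
    private boolean exists;
    int realAdvances; // advanceExact calls actually made on the delegate

    LazyDependency(DocValueSource in) { this.in = in; }

    void setDoc(int doc) { targetDoc = doc; }

    double valueOr(double missing) {
      if (advancedDoc != targetDoc) { // first read for this doc: position the delegate now
        exists = in.advanceExact(targetDoc);
        advancedDoc = targetDoc;
        realAdvances++;
      }
      return exists ? in.value() : missing;
    }
  }

  /** Evaluates "flag > 0 ? a : b"; only the branch that is taken gets advanced. */
  static double evaluate(LazyDependency flag, LazyDependency a, LazyDependency b, int doc) {
    flag.setDoc(doc);
    a.setDoc(doc);
    b.setDoc(doc);
    return flag.valueOr(0) > 0 ? a.valueOr(0) : b.valueOr(0);
  }
}
```

With `flag = [1, 0, 1, 0]`, evaluating four documents advances `flag` four times. Each branch is advanced only twice.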

[GitHub] [lucene] easyice commented on pull request #12557: Improve refresh speed with softdelete enable

2023-09-22 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1731767546 Update: when we call `softUpdateDocument` for a segment that already has some deleted docs, it will iterate over all the deleted docs using `ReadersAndUpdates#MergedDocValues#onDiskDocValu

[GitHub] [lucene] benwtrent commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334508429 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] rmuir commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
rmuir commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334477512 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

[GitHub] [lucene] tveasey commented on pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
tveasey commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731530040 > @tveasey helped me do some empirical analysis here and can provide some numbers. So the rationale is quite simple as Ben said. If you change the upper and lower quantiles very l
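Below is a rough sketch of quantile-bounded int8 scalar quantization. It sorts all sampled components and takes the lower and upper bounds at a two-sided `confidence` quantile. Each component is then clamped into that range and mapped linearly onto `[0, 127]`. The class name, the `confidence` parameter, and the `[0, 127]` target range are assumptions made for illustration. This is not the `ScalarQuantizer` code under review.

```java
import java.util.Arrays;

/** Hypothetical int8 scalar quantizer with quantile-derived bounds (illustration only). */
final class QuantileQuantizerSketch {
  final float lower;
  final float upper;

  QuantileQuantizerSketch(float lower, float upper) {
    if (upper < lower) throw new IllegalArgumentException("upper < lower");
    this.lower = lower;
    this.upper = upper;
  }

  /** Bounds at the two-sided quantile: confidence 0.9 drops roughly 5% from each tail. */
  static QuantileQuantizerSketch fromSample(float[][] vectors, float confidence) {
    int n = 0;
    for (float[] v : vectors) n += v.length;
    float[] all = new float[n];
    int i = 0;
    for (float[] v : vectors) {
      for (float x : v) all[i++] = x;
    }
    Arrays.sort(all);
    float tail = (1f - confidence) / 2f;
    int lo = (int) Math.floor(tail * (n - 1));
    int hi = (int) Math.ceil((1f - tail) * (n - 1));
    return new QuantileQuantizerSketch(all[lo], all[hi]);
  }

  /** Clamps each component to [lower, upper] and maps it linearly onto [0, 127]. */
  byte[] quantize(float[] v) {
    float scale = upper > lower ? 127f / (upper - lower) : 0f;
    byte[] out = new byte[v.length];
    for (int j = 0; j < v.length; j++) {
      float clamped = Math.min(upper, Math.max(lower, v[j]));
      out[j] = (byte) Math.round((clamped - lower) * scale);
    }
    return out;
  }

  /** Inverse mapping; for values inside [lower, upper] the error is about half a step. */
  float[] dequantize(byte[] q) {
    float step = (upper - lower) / 127f;
    float[] out = new float[q.length];
    for (int j = 0; j < q.length; j++) out[j] = lower + q[j] * step;
    return out;
  }
}
```

Narrowing the quantile range shrinks the step `(upper - lower) / 127`. That lowers the error for in-range values but clips more outliers.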

[GitHub] [lucene] uschindler commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
uschindler commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334450128 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] uschindler commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
uschindler commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334448931 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,209 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [lucene] benwtrent commented on pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1731463225 > Do we know why search is faster? Is it mostly because working on the quantized vectors requires a lower memory bandwi[d]th? Search is faster in two regards: - PanamaVec
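The question above is about memory bandwidth. Part of the answer is that an int8 component occupies 1 byte, whereas a `float` occupies 4 bytes. The sketch below computes an int8 dot product with a plain scalar loop, not the Panama Vector API. It also includes a hypothetical helper that recovers an approximate float dot product. That helper assumes each component was encoded as `x[i] ≈ lower + alpha * q[i]`, and it follows from expanding `Σ (l + α q_i)(l + α r_i) = α² Σ q_i r_i + α l (Σ q_i + Σ r_i) + d l²`. The names are illustrative and not Lucene's.

```java
/** Hypothetical helpers for scoring int8-quantized vectors (plain scalar loops). */
final class Int8DotSketch {
  /** int8 dot product; an int accumulator cannot overflow for fewer than 131,072 dimensions. */
  static int dotProduct(byte[] a, byte[] b) {
    if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
    int sum = 0;
    for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
    return sum;
  }

  static int sum(byte[] q) {
    int s = 0;
    for (byte v : q) s += v;
    return s;
  }

  /** Approximates the float dot product assuming x[i] ≈ lower + alpha * q[i] for both vectors. */
  static float approxFloatDot(byte[] qx, byte[] qy, float lower, float alpha) {
    int d = qx.length;
    return alpha * alpha * dotProduct(qx, qy)
        + alpha * lower * (sum(qx) + sum(qy))
        + d * lower * lower;
  }
}
```

The correction terms `sum(qx)` and `sum(qy)` depend on only one vector each. A real implementation could precompute them once per vector instead of per comparison.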

[GitHub] [lucene] benwtrent commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334388920 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,851 @@ +/* + * Licensed to the Apache Software Fou

[GitHub] [lucene] jpountz commented on a diff in pull request #12582: Add new int8 scalar quantization to HNSW codec

2023-09-22 Thread via GitHub
jpountz commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1334309792 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/QuantizedVectorsWriter.java: ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

[GitHub] [lucene] rmuir commented on a diff in pull request #12583: Fix hidden range embedded in UAX29URLEmail grammar

2023-09-22 Thread via GitHub
rmuir commented on code in PR #12583: URL: https://github.com/apache/lucene/pull/12583#discussion_r1334328967 ## lucene/analysis/common/src/test/org/apache/lucene/analysis/email/TestUAX29URLEmailAnalyzer.java: ## @@ -433,9 +433,9 @@ public void testMailtoSchemeEmails() throws Ex

[GitHub] [lucene] jpountz commented on pull request #12526: Speed up disjunctions by computing estimations of the score of the k-th top hit up-front.

2023-09-22 Thread via GitHub
jpountz commented on PR #12526: URL: https://github.com/apache/lucene/pull/12526#issuecomment-1731162359 > Maybe we should add OrHighVeryLow to nightly benchy too? @mikemccand I started looking into this, but my enwiki (`enwiki-20120502-lines-with-random-label.txt`) seems to have sli

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334077045 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -317,6 +329,18 @@ default void visit(DocIdSetIterator iterator, byte[] packedValue) throws IOE

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334030455 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -281,6 +297,12 @@ public interface PointTree extends Cloneable { * @lucene.experimental

[GitHub] [lucene] iverase commented on a diff in pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
iverase commented on code in PR #12528: URL: https://github.com/apache/lucene/pull/12528#discussion_r1334028608 ## lucene/core/src/java/org/apache/lucene/index/PointValues.java: ## @@ -228,6 +228,22 @@ public enum Relation { CELL_CROSSES_QUERY }; + /** Math states for

[GitHub] [lucene] vsop-479 commented on pull request #12528: Early terminate visit BKD leaf when current value greater than upper point in sorted dim.

2023-09-22 Thread via GitHub
vsop-479 commented on PR #12528: URL: https://github.com/apache/lucene/pull/12528#issuecomment-1730932421 @iverase I replaced int values with static variables. Please take a look. Actually, I used an enum to define the match states in the previous version, but it degraded the performance a little.
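The PR title describes stopping a BKD leaf visit once a value exceeds the query's upper bound in the sorted dimension. Below is a minimal sketch of that early termination. It assumes a leaf whose values are sorted in ascending order. Following the comment, it declares the match states as `static final int` constants rather than an enum, but the constant names are hypothetical. This is not the PR's actual `PointValues` change.

```java
import java.util.List;

/** Hypothetical sketch of early termination while visiting a leaf sorted by value. */
final class SortedLeafVisitSketch {
  // Hypothetical match states, declared as int constants rather than an enum.
  static final int MATCH = 0;
  static final int NO_MATCH_CONTINUE = 1;
  static final int NO_MATCH_STOP = 2;

  static int matchState(int value, int lower, int upper) {
    if (value > upper) return NO_MATCH_STOP; // ascending order: no later value can match
    if (value < lower) return NO_MATCH_CONTINUE;
    return MATCH;
  }

  /**
   * Adds docs whose value lies in [lower, upper] to matches and returns how many entries were
   * examined before the visit stopped.
   */
  static int visit(int[] sortedValues, int[] docIds, int lower, int upper, List<Integer> matches) {
    int examined = 0;
    for (int i = 0; i < sortedValues.length; i++) {
      examined++;
      int state = matchState(sortedValues[i], lower, upper);
      if (state == NO_MATCH_STOP) break; // early termination
      if (state == MATCH) matches.add(docIds[i]);
    }
    return examined;
  }
}
```

Take values `{1, 3, 5, 7, 9, 11}` with range `[3, 7]`. The visit stops at `9` after examining five entries and never reaches `11`.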

[GitHub] [lucene] javanna commented on pull request #12183: Make TermStates#build concurrent

2023-09-22 Thread via GitHub
javanna commented on PR #12183: URL: https://github.com/apache/lucene/pull/12183#issuecomment-1730930146 Great to see this merged, thanks @shubhamvishu for all the work as well as patience as we were figuring out a way forward!