Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
gf2121 merged PR #12604: URL: https://github.com/apache/lucene/pull/12604 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apac

Re: [I] FST#Compiler allocates too much memory [lucene]

2023-10-03 Thread via GitHub
gf2121 closed issue #12598: FST#Compiler allocates too much memory URL: https://github.com/apache/lucene/issues/12598

[I] Write VLong in opposite order for better outputs sharing in the FST [lucene]

2023-10-03 Thread via GitHub
gf2121 opened a new issue, #12620: URL: https://github.com/apache/lucene/issues/12620 ### Description > We also should really explore the TODO above to write vLong in opposite byte order -- this might save quite a bit of storage in the FST since outputs would share more prefixes. Aga
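
A minimal sketch of the idea quoted in this issue, assuming "opposite byte order" means emitting the most significant 7-bit group first instead of the least significant one. The class and method names (`VLongOrderSketch`, `encodeLsbFirst`, `encodeMsbFirst`, `commonPrefix`) are hypothetical and are not Lucene APIs; the point is only to show why nearby values might share more leading bytes under the reversed order.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Lucene's DataOutput implementation.
final class VLongOrderSketch {

  // Conventional variable-length encoding: low 7 bits first,
  // high bit set on every byte except the last.
  static byte[] encodeLsbFirst(long v) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((v & ~0x7FL) != 0) {
      out.write((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.write((int) v);
    return out.toByteArray();
  }

  // Assumed "opposite order": the same 7-bit groups, most significant first.
  static byte[] encodeMsbFirst(long v) {
    int groups = 1;
    for (long rest = v >>> 7; rest != 0; rest >>>= 7) {
      groups++;
    }
    byte[] out = new byte[groups];
    for (int i = 0; i < groups; i++) {
      int shift = 7 * (groups - 1 - i);
      int bits = (int) ((v >>> shift) & 0x7F);
      // Continuation bit on every byte except the last.
      out[i] = (byte) (i < groups - 1 ? (bits | 0x80) : bits);
    }
    return out;
  }

  // Number of leading bytes two encodings have in common.
  static int commonPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }
}
```

For two adjacent values such as 1000000 and 1000001, the low-bits-first encodings differ in their first byte, while the high-bits-first encodings differ only in their last byte. Whether this translates into real FST savings depends on the actual output distribution, which this sketch does not measure.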

[I] Make FST BytesStore grow smoothly [lucene]

2023-10-03 Thread via GitHub
gf2121 opened a new issue, #12619: URL: https://github.com/apache/lucene/issues/12619 ### Description > Too bad we don't have a writer that uses tiny (like 8 bytes) block at first, but doubles size for each new block (16 bytes, 32 bytes next, etc.). Then we would naturally use log(si
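
The growth scheme described in the quote can be sketched as follows. This is an illustration under stated assumptions, not the BytesStore implementation: `DoublingBlockStore` and its methods are hypothetical names, the first block is 8 bytes as in the quote, and each new block is twice the size of the previous one, so the number of blocks grows logarithmically with the bytes written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only byte store; not a Lucene class.
final class DoublingBlockStore {
  private static final int FIRST_BLOCK_SIZE = 8;

  private final List<byte[]> blocks = new ArrayList<>();
  private byte[] current; // last allocated block, null before the first write
  private int upto;       // next write position inside current
  private long size;      // total bytes written

  void writeByte(byte b) {
    if (current == null || upto == current.length) {
      // Allocate 8 bytes first, then double for each new block.
      // (Sketch only: int overflow for very large stores is not handled.)
      int nextSize = current == null ? FIRST_BLOCK_SIZE : current.length * 2;
      current = new byte[nextSize];
      blocks.add(current);
      upto = 0;
    }
    current[upto++] = b;
    size++;
  }

  byte readByte(long pos) {
    if (pos < 0 || pos >= size) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    // Block k starts at offset 8 * (2^k - 1), so k = floor(log2(pos / 8 + 1)).
    long q = pos / FIRST_BLOCK_SIZE + 1;
    int k = 63 - Long.numberOfLeadingZeros(q);
    long start = (long) FIRST_BLOCK_SIZE * ((1L << k) - 1);
    return blocks.get(k)[(int) (pos - start)];
  }

  long size() {
    return size;
  }

  int blockCount() {
    return blocks.size();
  }
}
```

Writing 100 bytes into this store allocates blocks of 8, 16, 32 and 64 bytes, so a small FST never pays for a large fixed block up front. Random reads stay constant-time because the block index can be computed from the position.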

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
tveasey commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1745806230 I saw this go by since I’m mentioned on the PR. It seems like Java can’t lay out byte vectors properly: https://stackoverflow.com/questions/14531235/in-java-is-it-more-efficient-to-use-byte-o

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1745753817 I was doing some performance testing and was getting weird results. Quantization search and indexing build were marginally better or exactly the same. Attached a zip of async-p

Re: [I] Stop sorting determinize powersets unnecessarily [LUCENE-9983] [lucene]

2023-10-03 Thread via GitHub
zhaih closed issue #11022: Stop sorting determinize powersets unnecessarily [LUCENE-9983] URL: https://github.com/apache/lucene/issues/11022

Re: [PR] avoid-circular-jar-checks [lucene]

2023-10-03 Thread via GitHub
risdenk merged PR #12618: URL: https://github.com/apache/lucene/pull/12618

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on code in PR #12604: URL: https://github.com/apache/lucene/pull/12604#discussion_r1344628522 ## lucene/CHANGES.txt: ## @@ -163,6 +163,9 @@ Optimizations * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by descending score. (Adr

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344560786 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[PR] avoid-circular-jar-checks [lucene]

2023-10-03 Thread via GitHub
risdenk opened a new pull request, #12618: URL: https://github.com/apache/lucene/pull/12618 ### Description jar-checks.gradle can go into an infinite loop if there are dependencies that could be circular. In Solr, grpc-utils has a compile dependency on grpc-core and grpc-core has a r

Re: [PR] Minor refactor for HNSW graph merging logic [lucene]

2023-10-03 Thread via GitHub
benwtrent merged PR #12616: URL: https://github.com/apache/lucene/pull/12616

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on PR #12523: URL: https://github.com/apache/lucene/pull/12523#issuecomment-1745484064 This looks great thanks @quux00 ! Could you add an entry to the lucene/CHANGES.txt file under Lucene 9.9 please?

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344514851 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344513453 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1745389156 @dungba88 raised a good point -- FST construction also needs to read prior bytes it wrote even as it is appending new bytes to the end of the file. Lucene's IndexInput/Outp

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
gf2121 commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1745354257 Thanks for all the reviews and suggestions here! > @mikemccand maybe we can tradeoff here between segments we write the first time ie through IW and segments we write caused by a merge?

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1344356526 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +265,132 @@ protected LeafSlice[] slices(List leaves) { return slice

[I] Easy access TermStates's needStats flag [lucene]

2023-10-03 Thread via GitHub
yugushihuang opened a new issue, #12617: URL: https://github.com/apache/lucene/issues/12617 ### Description When we build TermStates, we pass a flag, needStats, to determine whether to front-load all term statistics. However, we do not have easy access to know if this flag has be

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
quux00 commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1344264569 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slice

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
iverase commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1745118236 I am just trying to have an abstraction that can replace the BytesRef output for binary doc values with something that does not impose the internal representation of the bytes like Bytes

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1745049853 Copying another comment from #10520: > Maybe we could first allow FSTCompiler to specify its own DataOutput even when building the tree on-the-fly, instead of always relyin

Re: [PR] Make LRUQueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek merged PR #12614: URL: https://github.com/apache/lucene/pull/12614

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1745017914 > Maybe we could first allow FSTCompiler to specify its own DataOutput even when building the tree on-the-fly, instead of always relying on BytesStore? And we are free to choose

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1744981658 In general, to me it is still questionable whether we really need a bulk random access byte[] reader. I partly agree with this, but if somebody asks for float[] or long[] bulk reads with

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on PR #12600: URL: https://github.com/apache/lucene/pull/12600#issuecomment-1744979771 I changed https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/ (the Linux MMAP job) to use your branch. Please wait a bit until the checker is happy, as it tries all different Java ver

Re: [PR] Add readBytes method to RandomAccessInput [lucene]

2023-10-03 Thread via GitHub
uschindler commented on code in PR #12600: URL: https://github.com/apache/lucene/pull/12600#discussion_r1344093638 ## lucene/core/src/java19/org/apache/lucene/store/MemorySegmentIndexInput.java: ## @@ -168,6 +168,28 @@ private void readBytesBoundary(byte[] b, int offset, int le

[PR] Minor refactor for HNSW graph merging logic [lucene]

2023-10-03 Thread via GitHub
benwtrent opened a new pull request, #12616: URL: https://github.com/apache/lucene/pull/12616 This is a minor refactor of HNSW graph merging logic. Instead of directly checking the KnnVectorReader version, this commit adjusts the logic to see if a specific interface is satisfied for r

Re: [I] `FSTCompiler.Builder` should have an option to stream the FST bytes directly to Directory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12543: URL: https://github.com/apache/lucene/issues/12543#issuecomment-1744945237 Copying the comment from #10520 that's really about this issue: > I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1744944312 > I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one can decide whether on-heap or off-heap or on-heap without blocks is b

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344049288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344048515 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344026176 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1344010395 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343976622 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343965288 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [I] Improve FST performance created by Builder [LUCENE-9481] [lucene]

2023-10-03 Thread via GitHub
dungba88 commented on issue #10520: URL: https://github.com/apache/lucene/issues/10520#issuecomment-1744786263 I'm planning to refactor the BytesStore into an interface that can be chosen from the FST builder. And one can decide whether on-heap or off-heap or on-heap without blocks is best

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-03 Thread via GitHub
jimczi commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1343896112 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1170 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

Re: [I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-03 Thread via GitHub
benwtrent commented on issue #12615: URL: https://github.com/apache/lucene/issues/12615#issuecomment-1744763221 I do think Lucene's read-only segment based architecture lends itself to support quantization (required for DiskANN). It would be an interesting experiment to see how index

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
gtroitskiy commented on PR #12614: URL: https://github.com/apache/lucene/pull/12614#issuecomment-1744720246 Thanks for reviewing! I ran tidy and made some refactoring

[I] Should we explore DiskANN for aKNN vector search? [lucene]

2023-10-03 Thread via GitHub
mikemccand opened a new issue, #12615: URL: https://github.com/apache/lucene/issues/12615 ### Description I came across this compelling sounding [JVector project](https://foojay.io/today/jvector-1-0/) which looks to have awesome QPS performance. It uses [DiskANN](https://www.

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
s1monw commented on PR #12604: URL: https://github.com/apache/lucene/pull/12604#issuecomment-1744672419 @mikemccand maybe we can tradeoff here between segments we write the first time ie through IW and segments we write caused by a merge? it might mitigate your concerns.

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343870579 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +265,132 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] Reduce FST block size for BlockTreeTermsWriter [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on code in PR #12604: URL: https://github.com/apache/lucene/pull/12604#discussion_r1343852045 ## lucene/CHANGES.txt: ## @@ -163,6 +163,8 @@ Optimizations * GITHUB#12382: Faster top-level conjunctions on term queries when sorting by descending score. (Adr

Re: [I] FST#Compiler allocates too much memory [lucene]

2023-10-03 Thread via GitHub
mikemccand commented on issue #12598: URL: https://github.com/apache/lucene/issues/12598#issuecomment-1744644275 Thanks @gf2121 -- this is a great discovery (and thank you https://blunders.io for the [awesome integrated profiling in Lucene's nightly benchmarks](https://blunders.io/posts/luc

Re: [PR] Run top-level conjunctions of term queries with a specialized BulkScorer. [lucene]

2023-10-03 Thread via GitHub
jpountz commented on PR #12382: URL: https://github.com/apache/lucene/pull/12382#issuecomment-174462 I just pushed a fix: https://github.com/apache/lucene/commit/3f81f2f315745f86de3b516d53bf02fde61015a3.

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek commented on PR #12614: URL: https://github.com/apache/lucene/pull/12614#issuecomment-1744588907 Can you run `./gradlew tidy` at the root of the project to make sure the formatting is all correct?

Re: [PR] Make QueryCache respect Accountable queries on eviction and consisten… [lucene]

2023-10-03 Thread via GitHub
romseygeek commented on code in PR #12614: URL: https://github.com/apache/lucene/pull/12614#discussion_r1343791876 ## lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java: ## @@ -385,7 +385,9 @@ public void clearQuery(Query query) { private void onEviction(Query

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343776898 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] TaskExecutor waits for all tasks to complete before returning [lucene]

2023-10-03 Thread via GitHub
javanna commented on code in PR #12523: URL: https://github.com/apache/lucene/pull/12523#discussion_r1343749320 ## lucene/core/src/test/org/apache/lucene/search/TestIndexSearcher.java: ## @@ -267,7 +266,133 @@ protected LeafSlice[] slices(List leaves) { return slic

Re: [PR] Create a task executor when executor is not provided [lucene]

2023-10-03 Thread via GitHub
javanna merged PR #12606: URL: https://github.com/apache/lucene/pull/12606