Re: [PR] Speedup integer functions for 128-bit neon vectors [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on PR #12632: URL: https://github.com/apache/lucene/pull/12632#issuecomment-1770155713 > @gf2121 i think we could diagnose it further with https://github.com/travisdowns/avx-turbo Thanks @rmuir for profile guide! Sorry for the delay. It took me some time to app

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-18 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1364843168 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

Re: [I] ArrayIndexOutOfBoundsException when writing the FSTStore-backed FST with different DataOutput for meta [lucene]

2023-10-18 Thread via GitHub
dweiss commented on issue #12697: URL: https://github.com/apache/lucene/issues/12697#issuecomment-1770065868 Thank you for cleaning up these classes - it's a big improvement! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1364900678 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe }

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-10-18 Thread via GitHub
gf2121 closed pull request #12661: Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum URL: https://github.com/apache/lucene/pull/12661

Re: [PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1769976958 The direction has changed and the conversation is too long to track, so I raised https://github.com/apache/lucene/pull/12699 to summarize it.

[PR] Optimize outputs accumulating for SegmentTermsEnum and IntersectTermsEnum [lucene]

2023-10-18 Thread via GitHub
gf2121 opened a new pull request, #12699: URL: https://github.com/apache/lucene/pull/12699 ## Description The previous discussion is too long to track, so I opened a new PR to make a summary here. More details are available in https://github.com/apache/lucene/pull/12661. After merging of http

[PR] Fix index out of bounds when writing FST to different metaOut (#12697) [lucene]

2023-10-18 Thread via GitHub
dungba88 opened a new pull request, #12698: URL: https://github.com/apache/lucene/pull/12698 ### Description Fix for #12697. This will move the writing of numBytes from `OnHeapFSTStore.writeTo()` to `FST.save()`. `OnHeapFSTStore.size()` will also be modified to return only the

[I] ArrayIndexOutOfBoundsException when writing the FSTStore-backed FST with different DataOutput for meta [lucene]

2023-10-18 Thread via GitHub
dungba88 opened a new issue, #12697: URL: https://github.com/apache/lucene/issues/12697 ### Description After writing the FSTStore-backed FST to DataOutput, and specifying a different DataOutput for meta, if we try to read from these (using the FST public ctor) we will get the follow

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769299867 Here is the gist of my benchmark: https://gist.github.com/kaivalnp/79808017ed7666214540213d1e2a21cf I'm calculating the baseline / individual results as "count of vectors above t

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769235109 Thanks for running this @benwtrent! I just had a couple of questions: 1. What was your baseline in the test? If the baseline / goal is to "get the K-Nearest Neighbors", then th

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769218762 @kaivalnp I see the issue with my test, you are specifically testing "post-filtering" on the top values, not just getting the top10 k. I understand my issue. Could you post your

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1769123216 OK, I tried testing with KnnGraphTester. I indexed 100_000 normalized Cohere vectors (768 dims). With regular knn, recall@10: ``` recall latency nDocfa

Re: [I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-18 Thread via GitHub
gsmiller commented on issue #12696: URL: https://github.com/apache/lucene/issues/12696#issuecomment-1769095158 These results are really interesting! As another option, I wonder if it's worth thinking about this problem as a new codec (sandbox module to start?) that biases towards query spee

[I] Adding option to codec to disable patching in Lucene's PFOR encoding [lucene]

2023-10-18 Thread via GitHub
slow-J opened a new issue, #12696: URL: https://github.com/apache/lucene/issues/12696 ### Description Background: In https://github.com/Tony-X/search-benchmark-game we were comparing performance of Tantivy and Lucene. "One difference between Lucene and Tantivy is Lucene uses the "pat

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768958493 Sorry for the confusion, I tried renaming the branch from `radius-based-vector-search` to `similarity-based-vector-search` and the PR closed automatically. I guess I'm stuck with this b

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp closed pull request #12679: Add support for similarity-based vector searches URL: https://github.com/apache/lucene/pull/12679

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1364204027 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,24 +63,23 @@ public abstract class MultiLevelSkipListWriter { /** for e

Re: [PR] Optimize computing number of levels in MultiLevelSkipListWriter#bufferSkip [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on code in PR #12653: URL: https://github.com/apache/lucene/pull/12653#discussion_r1364203049 ## lucene/core/src/java/org/apache/lucene/codecs/MultiLevelSkipListWriter.java: ## @@ -63,6 +63,9 @@ public abstract class MultiLevelSkipListWriter { /** for eve

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12582: URL: https://github.com/apache/lucene/pull/12582#issuecomment-1768925721 @uschindler I switched it around. Lucene95Codec is back; I moved the previous HNSW format into backwards_codec and switched Lucene95Codec to use the Lucene99Hnsw format.

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1364177301 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
jmazanec15 commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362956122 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,317 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768862106 > the Collector is full by flagging "incomplete" (I think this is possible) once a threshold is reached Do you mean that we return incomplete results? Instead, maybe we can

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1364130745 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe

Re: [PR] Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1768776163 OK I think this is ready -- I removed/downgraded all `nocommit`s, added CHANGES entry, rebased to latest `main`. Tests and precommit passed for me (at least once). I set the d

Re: [PR] Record if block API has been used in SegmentInfo [lucene]

2023-10-18 Thread via GitHub
s1monw commented on PR #12685: URL: https://github.com/apache/lucene/pull/12685#issuecomment-1768768579 @jpountz I pushed new commits, wanna take a new look

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768737424 The results in https://github.com/apache/lucene/pull/12679#issuecomment-1766995337 are astounding! I will try to replicate them with Lucene Util. The numbers seem almost too goo

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1768734871 > Something like a lazy-loading iterator, where we perform vector comparisons and determine whether a doc matches on #advance? I think @kaivalnp the thing to do would be to say

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on PR #12625: URL: https://github.com/apache/lucene/pull/12625#issuecomment-1768710355 > What do you think of making a class like `ByteSlicePool` to separte concerns from other `TermsHashPerField` functionality? This sounds compelling to me. The byte slicing/inte

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-18 Thread via GitHub
shubhamvishu closed pull request #12671: Move private static classes or functions out of DoubleValuesSource URL: https://github.com/apache/lucene/pull/12671

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-18 Thread via GitHub
shubhamvishu commented on PR #12671: URL: https://github.com/apache/lucene/pull/12671#issuecomment-1768664812 Thanks @gsmiller and @msokolov for sharing your views. I agree with your points and at this point this refactoring doesn't seem to provide much value, so I think we should just leav

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1364013877 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
msokolov commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1364007944 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [PR] Specialize `BlockImpactsDocsEnum#nextDoc()`. [lucene]

2023-10-18 Thread via GitHub
jpountz commented on PR #12670: URL: https://github.com/apache/lucene/pull/12670#issuecomment-1768596280 This yielded a noticeable speedup on [`OrHighHigh`](http://people.apache.org/~mikemccand/lucenebench/OrHighHigh.html) and [`OrHighMed`](http://people.apache.org/~mikemccand/lucenebench/

Re: [PR] chore: update the Javadoc example in Analyzer [lucene]

2023-10-18 Thread via GitHub
msokolov merged PR #12693: URL: https://github.com/apache/lucene/pull/12693

Re: [I] normalize() override provided in Simple example in Analyzer class doc is missing String fieldName parameter [lucene]

2023-10-18 Thread via GitHub
msokolov closed issue #12666: normalize() override provided in Simple example in Analyzer class doc is missing String fieldName parameter URL: https://github.com/apache/lucene/issues/12666

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1363939444 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,316 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[PR] [DRAFT] Add unsigned byte vector operations for uint8 quantization [lucene]

2023-10-18 Thread via GitHub
benwtrent opened a new pull request, #12694: URL: https://github.com/apache/lucene/pull/12694 {DRAFT} To follow once https://github.com/apache/lucene/pull/12582 is finalized and merged: an investigation into whether unsigned vector operations should be added. Quantizing within `[0-255]`
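
For readers unfamiliar with the idea of quantizing within `[0-255]`, the following is a minimal sketch of what unsigned byte handling can look like in Java, which has no unsigned byte type. It is not taken from the PR: the class `UnsignedByteSketch`, its method names, and the linear min/max mapping are all assumptions made purely for illustration.

```java
// Hypothetical illustration only; not the code from PR #12694.
final class UnsignedByteSketch {
  private UnsignedByteSketch() {}

  /** Maps value from [min, max] onto an integer level in [0, 255], stored in a Java byte. */
  static byte quantize(float value, float min, float max) {
    float clamped = Math.max(min, Math.min(max, value));
    int level = Math.round((clamped - min) / (max - min) * 255f);
    // Levels 128..255 wrap to negative values because Java bytes are signed.
    return (byte) level;
  }

  /** Reads a stored byte back as an unsigned level in [0, 255]. */
  static int toUnsigned(byte b) {
    return b & 0xFF;
  }

  /** Dot product that treats both byte arrays as unsigned values. */
  static int dotProduct(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += toUnsigned(a[i]) * toUnsigned(b[i]);
    }
    return sum;
  }
}
```

The `& 0xFF` mask is the important part: without it, a stored level of 255 would be read back as -1 and distort any distance computation.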

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1363924060 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe }

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1363921374 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on code in PR #12692: URL: https://github.com/apache/lucene/pull/12692#discussion_r1363842558 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -1081,22 +1085,30 @@ public Arc findTargetArc(int labelToMatch, Arc follow, Arc arc, BytesRe

Re: [PR] Fix SynonymQuery equals implementation [lucene]

2023-10-18 Thread via GitHub
javanna commented on PR #12260: URL: https://github.com/apache/lucene/pull/12260#issuecomment-1768403778 > I'm able to cherrypick this fix into branch_9_4, but I'm not sure if there'll be release 9.4.2 ever. Indeed, there won't be.

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1363823134 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -17,79 +17,177 @@ package org.apache.lucene.util.fst; import java.io.IOException; -impor

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-18 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1768373972 OK, I switched the accounting to approximate RAM usage of the `NodeHash`, which is more intuitive for users. It behaves monotonically / smoothly: as you give more RAM for the suffixes

[PR] chore: update the Javadoc example in Analyzer [lucene]

2023-10-18 Thread via GitHub
scampi opened a new pull request, #12693: URL: https://github.com/apache/lucene/pull/12693 Close #12666

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1363802913 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
msokolov commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1363796956 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more +

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-10-18 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1768325379 I'm thinking of something like this: first build a fully-connected graph (no link removal, no enforcement of maxconns). Then apply a global pruning algorithm of some sort that atte

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-10-18 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1768315395 Do you have any results on how connectivity varies with maxconn? I found [this article](https://qdrant.tech/articles/filtrable-hnsw/) that talks about connectivity when filtering; anyway
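
As a rough illustration of how connectivity could be measured, the sketch below counts the nodes that cannot be reached from an entry point by following directed neighbor links. It does not use Lucene's `HnswGraph` API: the class `ReachabilitySketch`, the method `countUnreachable`, and the plain `int[][]` adjacency representation are assumptions chosen to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; the graph is a plain adjacency array, not a Lucene HnswGraph.
final class ReachabilitySketch {
  private ReachabilitySketch() {}

  /**
   * Returns how many nodes cannot be reached from entryPoint.
   * neighbors[i] lists the nodes that node i links to (directed links).
   */
  static int countUnreachable(int[][] neighbors, int entryPoint) {
    boolean[] visited = new boolean[neighbors.length];
    Deque<Integer> queue = new ArrayDeque<>();
    visited[entryPoint] = true;
    queue.add(entryPoint);
    int reached = 1;
    // Breadth-first traversal over the outgoing links.
    while (!queue.isEmpty()) {
      int node = queue.poll();
      for (int next : neighbors[node]) {
        if (!visited[next]) {
          visited[next] = true;
          reached++;
          queue.add(next);
        }
      }
    }
    return neighbors.length - reached;
  }
}
```

Running such a count on graphs built with different maxconn values would be one way to gather the kind of results asked about here.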

Re: [PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on PR #12692: URL: https://github.com/apache/lucene/pull/12692#issuecomment-1768310698 Result on wikimediumall (nothing changed obviously) : ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-18 Thread via GitHub
msokolov commented on PR #12671: URL: https://github.com/apache/lucene/pull/12671#issuecomment-1768298204 I think I share Greg's reluctance to make this big-looking change. Sorry, I think I was the instigator with the comment you referenced above. I confess I have always been confused that

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-18 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1363725046 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,316 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1768169254 Thanks @easyice! Since we expose APIs like `softUpdateDocument(Term term, Iterable doc, Field... softDeletes)`, we cannot guarantee that all soft delete fields have the same valu

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-18 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1363627307 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-18 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1363626228 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] Improve refresh speed with softdelete enable [lucene]

2023-10-18 Thread via GitHub
easyice commented on PR #12557: URL: https://github.com/apache/lucene/pull/12557#issuecomment-1768092984 I got a great suggestion from @gf2121: we can determine whether the doc values have a single value from `ReadersAndUpdates#onDiskDocValues` and `ReadersAndUpdates#updatesToApply`; this can be avoi

[PR] Avoid object construct when linear search [lucene]

2023-10-18 Thread via GitHub
gf2121 opened a new pull request, #12692: URL: https://github.com/apache/lucene/pull/12692 ### Description This PR resolves a TODO left in `FST`: we construct some useless objects during linear search. It also tries to avoid `Outputs#read` as we have more outp

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-18 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1363535420 ## lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java: ## @@ -113,6 +113,7 @@ public void testEmptyChildFilter() throws Exception { final Direc

Re: [PR] Sometimes intersect the essential clause and the best non-essential clause. [lucene]

2023-10-18 Thread via GitHub
jpountz commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-1768020392 Updated luceneutil results using wikibigall: ``` TaskQPS baseline StdDevQPS my_modified_version StdDevPct diff p-value

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-18 Thread via GitHub
jpountz commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1363513095 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-18 Thread via GitHub
dungba88 commented on code in PR #12691: URL: https://github.com/apache/lucene/pull/12691#discussion_r1363511529 ## lucene/core/src/java/org/apache/lucene/util/fst/FST.java: ## @@ -552,14 +527,11 @@ public void save(DataOutput metaOut, DataOutput out) throws IOException {

[PR] Consolidate the FSTStore and BytesStore in FST (#12543) [lucene]

2023-10-18 Thread via GitHub
dungba88 opened a new pull request, #12691: URL: https://github.com/apache/lucene/pull/12691 ### Description This change will make the transition to off-heap FST writing easier. The idea is to only build the FST after the end of the process (when calling `compile()`). - Conso

Re: [PR] Sometimes intersect the essential clause and the best non-essential clause. [lucene]

2023-10-18 Thread via GitHub
jpountz commented on PR #12589: URL: https://github.com/apache/lucene/pull/12589#issuecomment-1767952830 I moved the optimization as part of the partitioning logic so that it's easier to test. It's ready for review.

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-18 Thread via GitHub
gf2121 commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1363431399 ## lucene/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java: ## @@ -113,6 +113,7 @@ public void testEmptyChildFilter() throws Exception { final Direct