Re: [PR] read MSB VLong in new way [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765808689 Hi @jpountz , Thanks a lot for the suggestion! > another option could be to encode the number of supplementary bytes using unary coding (like UTF8). This is a great idea that
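The unary-coding suggestion quoted above can be sketched roughly as follows. This is only an illustration and not code from the PR or from Lucene: the class `UnaryPrefixVLong` and its methods are hypothetical, and the big-endian byte order is an assumption. The first byte carries as many leading 1-bits as there are supplementary bytes, much like a UTF-8 lead byte, so a reader knows the total length after reading one byte.

```java
// Hypothetical sketch; names and layout are assumptions, not Lucene's API.
final class UnaryPrefixVLong {

  // Number of supplementary bytes (0..8) needed after the first byte.
  static int extraBytes(long v) {
    int bits = Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    if (bits > 56) {
      return 8; // first byte is all ones, payload is a full 8 bytes
    }
    return (bits - 1) / 7; // n extra bytes give 7 + 7 * n payload bits
  }

  static byte[] encode(long v) {
    int n = extraBytes(v);
    byte[] out = new byte[n + 1];
    // Supplementary bytes hold the low 8 * n bits, most significant first.
    for (int i = 0; i < n; i++) {
      out[n - i] = (byte) (v >>> (8 * i));
    }
    // First byte: n one-bits, then a zero bit and the top payload bits.
    int prefix = (0xFF << (8 - n)) & 0xFF;
    int high = n < 8 ? (int) (v >>> (8 * n)) : 0;
    out[0] = (byte) (prefix | high);
    return out;
  }

  static long decode(byte[] in) {
    int first = in[0] & 0xFF;
    // Count the leading one-bits of the first byte.
    int n = Integer.numberOfLeadingZeros(~first & 0xFF) - 24;
    long v = n < 8 ? first & (0xFF >>> (n + 1)) : 0;
    for (int i = 1; i <= n; i++) {
      v = (v << 8) | (in[i] & 0xFF);
    }
    return v;
  }

  public static void main(String[] args) {
    for (long v : new long[] {0L, 127L, 128L, 1L << 56, -1L}) {
      byte[] encoded = encode(v);
      System.out.println(v + " -> " + encoded.length + " byte(s), decoded " + decode(encoded));
    }
  }
}
```

In this layout the length is known from the first byte alone, whereas a continuation-bit VLong has to inspect every byte to find the end.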

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12682: URL: https://github.com/apache/lucene/pull/12682#discussion_r1361646368 ## lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java: ## @@ -266,7 +265,7 @@ public float score() throws IOException { score += optScorer.score
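As a rough illustration of why the accumulator type in the PR title matters, here is a minimal sketch. It is not the PR's code: `ScoreSumSketch` and its methods are hypothetical names, and the sample scores are chosen only to make the rounding visible.

```java
// Hypothetical sketch; not taken from ReqOptSumScorer or any Lucene class.
final class ScoreSumSketch {

  // Accumulate in a float: every addition rounds to float precision.
  static float sumInFloat(float[] scores) {
    float sum = 0f;
    for (float s : scores) {
      sum += s;
    }
    return sum;
  }

  // Accumulate in a double and round to float once at the end.
  static float sumInDouble(float[] scores) {
    double sum = 0d;
    for (float s : scores) {
      sum += s;
    }
    return (float) sum;
  }

  public static void main(String[] args) {
    float[] scores = {16777216f, 1f, 1f}; // 2^24 followed by two small scores
    System.out.println(sumInFloat(scores));
    System.out.println(sumInDouble(scores));
  }
}
```

With this input the float accumulator loses both small scores, because 2^24 + 1 is not representable as a float, while the double accumulator keeps them and rounds only once at the end.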

Re: [PR] read MSB VLong in new way [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765890646 Oh, your explanation makes sense, and I agree with you that a more efficient encoding would be unlikely to counterbalance the fact that more arcs need to be read per output. So this loo

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
shubhamvishu commented on code in PR #12682: URL: https://github.com/apache/lucene/pull/12682#discussion_r1361736510 ## lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java: ## @@ -266,7 +265,7 @@ public float score() throws IOException { score += optScorer.

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
shubhamvishu commented on code in PR #12682: URL: https://github.com/apache/lucene/pull/12682#discussion_r1361737240 ## lucene/core/src/java/org/apache/lucene/search/similarities/TFIDFSimilarity.java: ## @@ -504,9 +504,9 @@ public TFIDFScorer(float boost, Explanation idf, float[

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12682: URL: https://github.com/apache/lucene/pull/12682#discussion_r1361739665 ## lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java: ## @@ -266,7 +265,7 @@ public float score() throws IOException { score += optScorer.score

Re: [PR] read MSB VLong in new way [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1765964640 > I wonder if extending the Outputs class directly would help, instead of storing data in an opaque byte[]? Yes, the reuse is exactly what `Outputs` wants to do! (see this [todo](

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1765975335 If I read correctly, this query ends up calling `LeafReader#searchNearestNeighbors` with k=Integer.MAX_VALUE, which will not only run in O(maxDoc) time but also use O(maxDoc) memory. I d

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
shubhamvishu commented on PR #12682: URL: https://github.com/apache/lucene/pull/12682#issuecomment-1765985160 Thanks @jpountz for the review! I have addressed the comments in the new revision. -- This is an automated message from the Apache Git Service. To respond to the message, please l

Re: [PR] Scorer should sum up scores into a double [lucene]

2023-10-17 Thread via GitHub
shubhamvishu commented on code in PR #12682: URL: https://github.com/apache/lucene/pull/12682#discussion_r1361773569 ## lucene/core/src/java/org/apache/lucene/search/ReqOptSumScorer.java: ## @@ -266,7 +265,7 @@ public float score() throws IOException { score += optScorer.

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361783707 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -5144,20 +5145,71 @@ public int length() { } mergeReaders.add(wrappedReader);

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361793823 ## lucene/core/src/java/org/apache/lucene/index/SortingCodecReader.java: ## @@ -468,7 +468,11 @@ public void checkIntegrity() throws IOException { @Override

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361798124 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -0,0 +1,998 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361799042 ## lucene/core/src/java/org/apache/lucene/index/SlowCompositeCodecReaderWrapper.java: ## @@ -0,0 +1,998 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361802385 ## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ## @@ -5144,20 +5145,71 @@ public int length() { } mergeReaders.add(wrappedReader);

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-17 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1361812236 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -17,50 +17,80 @@ package org.apache.lucene.util.fst; import java.io.IOException; -import

Re: [PR] Use radix sort to speed up the sorting of terms in TermInSetQuery [lucene]

2023-10-17 Thread via GitHub
gf2121 merged PR #12587: URL: https://github.com/apache/lucene/pull/12587

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-17 Thread via GitHub
mikemccand commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1361814551 ## lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java: ## @@ -99,31 +87,23 @@ public class FSTCompiler { * tuning and tweaking, see {@link Builder

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-17 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766041651 > > With the PR, you unfortunately cannot easily say "give me a minimal FST at all costs", like you can with main today. You'd have to keep trying larger and larger NodeHash sizes unt

Re: [PR] Optimize outputs accumulating as MSB VLong outputs sharing more output prefix [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766048357 Hi @mikemccand, it would be great if you could take a look too :)

Re: [PR] Use MergeSorter in StableStringSorter [lucene]

2023-10-17 Thread via GitHub
gf2121 merged PR #12652: URL: https://github.com/apache/lucene/pull/12652

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-17 Thread via GitHub
mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1766082689 Thanks for the suggestions @dungba88! I took the approach you suggested, with a few more pushed commits just now. Despite the increase in `nocommit`s I think this is actually close!

Re: [PR] Remove over-counting of deleted terms [lucene]

2023-10-17 Thread via GitHub
gf2121 merged PR #12586: URL: https://github.com/apache/lucene/pull/12586

Re: [PR] Speed up TestIndexOrDocValuesQuery. [lucene]

2023-10-17 Thread via GitHub
jpountz merged PR #12672: URL: https://github.com/apache/lucene/pull/12672

Re: [PR] Specialize `BlockImpactsDocsEnum#nextDoc()`. [lucene]

2023-10-17 Thread via GitHub
jpountz merged PR #12670: URL: https://github.com/apache/lucene/pull/12670

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362016333 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,316 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Fix lazy decoding of frequencies in `BlockImpactsDocsEnum`. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12668: URL: https://github.com/apache/lucene/pull/12668#issuecomment-1766342540 Even though the speedup is less pronounced than in the above luceneutil run, there seems to be an actual speedup in nightly benchmarks for boolean queries. E.g. the last 3 data points of

Re: [I] analysis-stempel incorrect tokens generation for numbers [LUCENE-10290] [lucene]

2023-10-17 Thread via GitHub
tomsquest commented on issue #11326: URL: https://github.com/apache/lucene/issues/11326#issuecomment-1766389365 We ran into this issue too, and not only for numbers. In fact, tokens ending in `1` will be stemmed! ``` GET _analyze { "tokenizer": "standard", "filt

Re: [PR] Reduce collection operations when minShouldMatch == 0. [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12602: URL: https://github.com/apache/lucene/pull/12602#issuecomment-1766428044 I would be surprised if this change yielded a noticeable speedup. Does it?

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
mayya-sharipova commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362208743 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1149 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [BROKEN, for reference only] concurrent hnsw [lucene]

2023-10-17 Thread via GitHub
msokolov commented on code in PR #12683: URL: https://github.com/apache/lucene/pull/12683#discussion_r1362245604 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswGraph.java: ## @@ -59,11 +60,26 @@ protected HnswGraph() {} * * @param level level of the graph *

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766741617 > If I read correctly, this query ends up calling LeafReader#searchNearestNeighbors with k=Integer.MAX_VALUE No, we're calling the [new API](https://github.com/apache/lucene/blob

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362432970 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,316 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362437535 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,316 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1362456237 ## lucene/core/src/java/org/apache/lucene/util/hnsw/InitializedHnswGraphBuilder.java: ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12657: URL: https://github.com/apache/lucene/pull/12657#discussion_r1362457390 ## lucene/core/src/java/org/apache/lucene/util/hnsw/IncrementalHnswGraphMerger.java: ## @@ -0,0 +1,197 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
jpountz commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766795903 Thanks for explaining, I had overlooked how the `Integer.MAX_VALUE` was used indeed. I'm still interested in figuring out if we can have stronger guarantees on the worst-case memory usag

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362472568 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362475149 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362474206 ## lucene/core/src/java/org/apache/lucene/search/AbstractRnnVectorQuery.java: ## @@ -0,0 +1,105 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on code in PR #12679: URL: https://github.com/apache/lucene/pull/12679#discussion_r1362476143 ## lucene/core/src/java/org/apache/lucene/search/RnnCollector.java: ## @@ -0,0 +1,71 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + *

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766834111 Thanks for the review @shubhamvishu! Addressed some of the comments above > Is it right to call it a radius-based search here? I think of it as finding all results within a

Re: [PR] Fix SynonymQuery equals implementation [lucene]

2023-10-17 Thread via GitHub
mingshl commented on PR #12260: URL: https://github.com/apache/lucene/pull/12260#issuecomment-1766881156 Thank you! @mkhludnev

Re: [PR] Extract the hnsw graph merging from being part of the vector writer [lucene]

2023-10-17 Thread via GitHub
benwtrent merged PR #12657: URL: https://github.com/apache/lucene/pull/12657

Re: [PR] Optimize outputs accumulating as MSB VLong outputs sharing more output prefix [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1766958252 One idea: maybe we do not really need to combine all these `BytesRef`s into a single `BytesRef`; we can just build a `DataInput` over these `BytesRef`s to read. Luckily, o

[PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-17 Thread via GitHub
javanna opened a new pull request, #12689: URL: https://github.com/apache/lucene/pull/12689 When operations are parallelized, like query rewrite, or search, or createWeight, one of the tasks may throw an exception. In that case we wait for all tasks to be completed before re-throwing th

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766983182 > I think of it as finding all results within a high-dimensional circle / sphere / equivalent, dot-product, cosine, etc. don't really follow that same idea as you point out. I w

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-17 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1362620375 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-17 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621063 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -64,64 +67,124 @@ public final class TaskExecutor { * @param the return type of the task

Re: [PR] TaskExecutor to cancel all tasks on exception [lucene]

2023-10-17 Thread via GitHub
javanna commented on code in PR #12689: URL: https://github.com/apache/lucene/pull/12689#discussion_r1362621950 ## lucene/core/src/test/org/apache/lucene/search/TestTaskExecutor.java: ## @@ -43,7 +47,8 @@ public class TestTaskExecutor extends LuceneTestCase { public static vo

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1766995337 ### Benchmarks Using the vector file from https://home.apache.org/~sokolov/enwiki-20120502-lines-1k-100d.vec (enwiki dataset, unit vectors, 100 dimensions) The setup was 1

Re: [PR] Add support for radius-based vector searches [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1767022898 > stronger guarantees on the worst-case memory usage Totally agreed @jpountz! It is very easy to go wrong in the new API, especially if the user passes a low threshold (high radius

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362661760 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsWriter.java: ## @@ -0,0 +1,1149 @@ +/* + * Licensed to the Apache Software Foundation (A

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362664506 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362665321 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99ScalarQuantizedVectorsWriter.java: ## @@ -0,0 +1,782 @@ +/* + * Licensed to the Apache Software Fou

Re: [I] [DISCUSS] Should there be a threshold-based vector search API? [lucene]

2023-10-17 Thread via GitHub
kaivalnp commented on issue #12579: URL: https://github.com/apache/lucene/issues/12579#issuecomment-1767112899 > one other thing to think about is https://weaviate.io/blog/weaviate-1-20-release#autocut Interesting! They [seem to](https://github.com/weaviate/weaviate/blob/c382dcbe6ff0

Re: [PR] Add new int8 scalar quantization to HNSW codec [lucene]

2023-10-17 Thread via GitHub
benwtrent commented on code in PR #12582: URL: https://github.com/apache/lucene/pull/12582#discussion_r1362725464 ## lucene/core/src/java/org/apache/lucene/util/ScalarQuantizer.java: ## @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Move private static classes or functions out of DoubleValuesSource [lucene]

2023-10-17 Thread via GitHub
gsmiller commented on PR #12671: URL: https://github.com/apache/lucene/pull/12671#issuecomment-1767291109 Thanks for your further thoughts @shubhamvishu. Getting more opinions is always good, and like I said, I don't feel strongly enough about this change to block moving forward with it or

[PR] Remove direct dependency of NodeHash to FST [lucene]

2023-10-17 Thread via GitHub
dungba88 opened a new pull request, #12690: URL: https://github.com/apache/lucene/pull/12690 ### Description Follow-up of https://github.com/apache/lucene/pull/12646. NodeHash still depends on both FSTCompiler and FST. With the current method signature, one can create the NodeHash wi

Re: [PR] [WIP] first cut at bounding the NodeHash size during FST compilation [lucene]

2023-10-17 Thread via GitHub
dungba88 commented on code in PR #12633: URL: https://github.com/apache/lucene/pull/12633#discussion_r1363098628 ## lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java: ## @@ -17,79 +17,177 @@ package org.apache.lucene.util.fst; import java.io.IOException; -import

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-10-17 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1767662289 I was able to run tests on the wiki dataset using the luceneutil package. The [results show](https://github.com/mikemccand/luceneutil/pull/236) that even with a single segment

Re: [PR] Refactor ByteBlockPool so it is just a "shift/mask big array" [lucene]

2023-10-17 Thread via GitHub
iverase merged PR #12625: URL: https://github.com/apache/lucene/pull/12625

Re: [PR] Optimize outputs accumulating as MSB VLong outputs sharing more output prefix [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on PR #12661: URL: https://github.com/apache/lucene/pull/12661#issuecomment-1767756956 > So this looks like a hard search/space trade-off: we either get fast reads or good compression but we can't get both? IMO theoretically yes. We ignored some potential optimization

Re: [PR] Optimize outputs accumulating as MSB VLong outputs sharing more output prefix [lucene]

2023-10-17 Thread via GitHub
gf2121 commented on code in PR #12661: URL: https://github.com/apache/lucene/pull/12661#discussion_r1363317643 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java: ## @@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException {