Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-15 Thread via GitHub
jpountz commented on PR #12810: URL: https://github.com/apache/lucene/pull/12810#issuecomment-1812099057 For reference, starting postings and skip lists at -1 changes file formats, so I'm keen on getting this change into 9.9 since we had to change the file format anyway because of the move fr

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-15 Thread via GitHub
jpountz commented on code in PR #12806: URL: https://github.com/apache/lucene/pull/12806#discussion_r1394090752 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -110,6 +110,12 @@ private TopDocs getLeafResults(LeafReaderContext ctx, Weight fil

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-15 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1812408858 @nitirajrathore @msokolov I had an idea around this, and it will cost an extra 4 bytes per node on each layer it's a member of (maybe we only need this on the bottom layer...) W

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-15 Thread via GitHub
benwtrent commented on PR #12806: URL: https://github.com/apache/lucene/pull/12806#issuecomment-1812421692 > The idea makes sense to me, what is less clear to me is whether this logic belongs to the Query or to the vector reader: should searchNearestNeighbors implicitly do a linear scan whe

Re: [PR] Minor change to IndexOrDocValuesQuery#toString [lucene]

2023-11-15 Thread via GitHub
mikemccand commented on PR #12791: URL: https://github.com/apache/lucene/pull/12791#issuecomment-1812469386 Nightly sparse (NYC taxis) benchy was a bit unhappy with this change because it (weirdly) relies on `Query.toString` (I tried to fix the benchy [here](https://github.com/mikemccand/lu

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-15 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1812559330 Is the problem primarily to do with single isolated nodes or do we also see disconnected subgraphs containing multiple nodes? I think this idea would prevent the isolated nodes, bu

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-15 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1812605864 @msokolov good point. It seems to me we would fully disconnect a sub-graph only if it's very clustered. Is there a way to detect this in the diversity selection?

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-15 Thread via GitHub
jpountz commented on code in PR #12806: URL: https://github.com/apache/lucene/pull/12806#discussion_r1394256922 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ## @@ -238,11 +238,23 @@ public void search(String field, float[] target, Kn

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-15 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1812647124 My memory of the way this diversity criterion has evolved is kind of hazy, but I believe in the very first implementation we would not impose any diversity check until the neighbor

Re: [PR] Simplify advancing on postings/impacts enums [lucene]

2023-11-15 Thread via GitHub
msokolov commented on PR #12810: URL: https://github.com/apache/lucene/pull/12810#issuecomment-1812678173 This sounds reasonable to me, and the code does seem simpler, but I'm not able to give a thorough review. +1 to rationalize / simplify even if it doesn't show significant performance imp

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-15 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1812901631 @nitirajrathore could you add something to [KnnGraphTester](https://github.com/mikemccand/luceneutil/blob/master/src/main/KnnGraphTester.java) that is a test for connectedness?
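
A connectedness test of the kind requested above could look roughly like the following. This is a minimal sketch, not KnnGraphTester code: it assumes one graph layer is available as a plain adjacency list (`neighbors.get(i)` holds the outgoing neighbor ids of node `i`) and treats "connected" as "every node is reachable from a chosen entry node". The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical helper; not part of Lucene or luceneutil.
class GraphConnectivitySketch {

    /** Counts the nodes reachable from entryPoint by following outgoing links (BFS). */
    static int countReachable(List<int[]> neighbors, int entryPoint) {
        boolean[] visited = new boolean[neighbors.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        visited[entryPoint] = true;
        queue.add(entryPoint);
        int reached = 0;
        while (!queue.isEmpty()) {
            int node = queue.poll();
            reached++;
            for (int nbr : neighbors.get(node)) {
                if (!visited[nbr]) {
                    visited[nbr] = true; // mark on enqueue so each node is queued once
                    queue.add(nbr);
                }
            }
        }
        return reached;
    }

    /** True when every node of the layer can be reached from entryPoint. */
    static boolean isFullyReachable(List<int[]> neighbors, int entryPoint) {
        return neighbors.isEmpty()
            || countReachable(neighbors, entryPoint) == neighbors.size();
    }

    public static void main(String[] args) {
        // Node 2 has no incoming link, so it cannot be reached from node 0.
        List<int[]> layer = List.of(new int[] {1}, new int[] {0}, new int[] {});
        System.out.println("reachable: " + countReachable(layer, 0) + " of " + layer.size());
    }
}
```

Reporting the reachable count rather than only a boolean makes it possible to tell single isolated nodes apart from larger disconnected sub-graphs.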

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-15 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812941899 > You still need to score the vectors to realize that they are in the iteration set or not Right, I meant that we need not score all *other* vectors to determine if the vector it

Re: [PR] Add support for similarity-based vector searches [lucene]

2023-11-15 Thread via GitHub
kaivalnp commented on PR #12679: URL: https://github.com/apache/lucene/pull/12679#issuecomment-1812956627 > could you test on cohere with Max-inner product? Thanks, the gist was really helpful and gave some files including normalized and un-normalized vectors. I assume that since you

Re: [PR] Utilize exact kNN search when gathering k > numVectors in a segment [lucene]

2023-11-15 Thread via GitHub
benwtrent merged PR #12806: URL: https://github.com/apache/lucene/pull/12806 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Improve vector search speed by using FixedBitSet [lucene]

2023-11-15 Thread via GitHub
jpountz commented on PR #12789: URL: https://github.com/apache/lucene/pull/12789#issuecomment-1813030726 ++ This feels similar to `IndexOrDocValuesQuery`: we probably can't guess the absolute best threshold, but we can probably figure out something that is right more often than wrong. Hopef

Re: [I] USearch integration and potential Vector Search performance improvements [lucene]

2023-11-15 Thread via GitHub
chadbrewbaker commented on issue #12502: URL: https://github.com/apache/lucene/issues/12502#issuecomment-1813112623 > Yes: > > * no external libraries for Lucene Core > * no native code Put it in an "examples" directory to show how to extend Lucene with JNI. If you have a $1

[PR] Simple rename of unreleased quantization parameter [lucene]

2023-11-15 Thread via GitHub
benwtrent opened a new pull request, #12811: URL: https://github.com/apache/lucene/pull/12811 The `quantile` parameter is actually a `confidence_interval`. This is a simple rename of this parameter for the HNSW scalar quantized format.

Re: [PR] Simple rename of unreleased quantization parameter [lucene]

2023-11-15 Thread via GitHub
benwtrent merged PR #12811: URL: https://github.com/apache/lucene/pull/12811

[PR] Introduce workflow for stale PRs [lucene]

2023-11-15 Thread via GitHub
stefanvodita opened a new pull request, #12813: URL: https://github.com/apache/lucene/pull/12813 PRs get stale and we miss out on good contributions. This workflow will mark PRs that are becoming stale. Addresses #12796

Re: [I] Port PR management bot from Apache Beam [lucene]

2023-11-15 Thread via GitHub
stefanvodita commented on issue #12796: URL: https://github.com/apache/lucene/issues/12796#issuecomment-1813265420 +1 to starting super simple. I tried to hack a workflow for marking stale PRs (#12813). Fortunately, GitHub provides good [support](https://github.com/actions/stale) for this t

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-15 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1813343695 @uschindler Ah I needed to re-sync my forked repo 😅

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395069004 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393547261 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -21,12 +21,13 @@ import java.util.List; import org.apache.lucene.store.DataInput; import

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-15 Thread via GitHub
MarcusSorealheis commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1813833061 Looks good now.

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1813856323 Seems like this PR is getting long, so I spawned 2 PRs out of it: - https://github.com/apache/lucene/pull/12814: Simplify `BytesStore` operations (which was changed to GrowableByteArra

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-15 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1393462969 ## lucene/core/src/java/org/apache/lucene/util/fst/OnHeapFSTStore.java: ## @@ -64,22 +66,13 @@ public FSTStore init(DataInput in, long numBytes) throws IOException {

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395069004 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1814024776 @jpountz I think we can close this now?

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814046690 > What if we added an "incoming connection" count for every node? & > I think this idea would prevent the isolated nodes, but not fix the other case. I w

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1814055888 > @shubhamvishu can we close this one? Any other things to try? Sure @mikemccand! Maybe we could just try a rehash value between 2/3 and 3/4 as you mentioned earlier (how abo

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
javanna commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395446929 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,21 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-16 Thread via GitHub
javanna commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1814140254 Thanks for reviving this PR @zacharymorn ! the changes look good to me, having top score doc and top field collector managers sounds like a natural next step, and removes code duplication. I

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395505763 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,21 @@ public OnHeapHnswGraph build(int maxOrd) throws IOExce

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12799: URL: https://github.com/apache/lucene/pull/12799#issuecomment-1814218891 @javanna I have added the CHANGES entry and addressed the comment. It seems the precommit fails on the `:lucene:documentation:markdownToHtml` task, which looks unrelated? Not sure.

Re: [I] Can/should `KnnByte/FloatVectorQuery` carry some human-meaningful opaque `toString` fragment? [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #12487: URL: https://github.com/apache/lucene/issues/12487#issuecomment-1814317117 I think that we could simply add a `resourceDescription` field to the `AbstractKnnVectorQuery` and modify the toString in the implementations so that the output would look something

[PR] Remove delayed seek optimization. [lucene]

2023-11-16 Thread via GitHub
jpountz opened a new pull request, #12815: URL: https://github.com/apache/lucene/pull/12815 I think that this optimization was introduced because `advanceShallow` may advance skip lists and then never decode a block of postings. But actually `IndexInput#seek` is cheap, including on `NIOFSDi

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395689207 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -53,7 +53,7 @@ public final class TaskExecutor { private final Executor executor; -

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395692713 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -53,7 +53,7 @@ public final class TaskExecutor { private final Executor executor;

Re: [PR] Remove delayed seek optimization. [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12815: URL: https://github.com/apache/lucene/pull/12815#issuecomment-1814493098 Here are results on `wikibigall`, none of the p-values seem significant: ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

Re: [I] AnalyzingSuggester exception because of length restriction: java.lang.IllegalArgumentException: len must be <= 32767; got 38751 [LUCENE-6012] [lucene]

2023-11-16 Thread via GitHub
sitepark-veltrup commented on issue #7074: URL: https://github.com/apache/lucene/issues/7074#issuecomment-1814499494 We use Solr to search for pages in a website. We index the content of the website and also the content of PDF documents into a field `content` Based on this field we would

Re: [PR] Minor change to IndexOrDocValuesQuery#toString [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12791: URL: https://github.com/apache/lucene/pull/12791#issuecomment-1814515539 > I assume this is because this is a DoubleDocValuesField which encodes the double using NumericUtils.doubleToSortableLong @mikemccand Is it possible to fix `NumericUtils.doub

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814530858 > meaning that we can get same recall for a smaller max-conn value now. I ran some tests with max-conn = 16 and max-conn = 8 and it seems like with [my proposal](htt

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814561267 @nitirajrathore very interesting findings. This makes me wonder if the heuristic should take a middle ground and instead of keeping all pruned connections, keep half.

[PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
slow-J opened a new pull request, #12816: URL: https://github.com/apache/lucene/pull/12816 We use this only in KnnByte/FloatVectorQuery toString method so the benchmarker can disambiguate between different KnnFloatVectorQuery/KnnByteVectorQuery queries. Closes #12487

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1814699360 I'd rather not touch these queries, and instead introduce a brand new query that rewrites to a `Knn(Byte|Float)VectorQuery` and may add a description string. Something like `HumanReadabl

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1814712957 I much prefer @jpountz's idea. This additional field is purely for debugging purposes. A `DebugQuery` or `HumanReadableQuery` does seem like a good idea.

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12800: URL: https://github.com/apache/lucene/pull/12800#issuecomment-1814761757 I like the idea, but this seems to come with greater heap requirements as well?

[PR] Add KeywordField and StringValueFacetCounts example [lucene]

2023-11-16 Thread via GitHub
stefanvodita opened a new pull request, #12817: URL: https://github.com/apache/lucene/pull/12817 We don't have a demo for faceting using `KeywordField`, `SortedDocValuesField`, or `StringValueFacetCounts`. This PR adds one, mostly inspired by `SimpleSortedSetFacetsExample`.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1814848844 Ok so I ran the test with rehash value of 17/24 which is between 2/3 and 3/4. Here are the results: | Golden ratio Bit mixing | Rehash ratio (2/3) | Rehash ratio (17/24) |

[PR] Fix off-by-one error in SimpleSortedSetFacetsExample [lucene]

2023-11-16 Thread via GitHub
stefanvodita opened a new pull request, #12818: URL: https://github.com/apache/lucene/pull/12818 We're only printing results for the `Author` dimension instead of printing `Publish Year` too.

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
javanna commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1396067925 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,17 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException

[PR] Log number of visited nodes in knn query [lucene]

2023-11-16 Thread via GitHub
mayya-sharipova opened a new pull request, #12819: URL: https://github.com/apache/lucene/pull/12819 The number of visited nodes during graph exploration is an important metric for a knn query that is lost when the query is rewritten. This allows optionally accessing it before the query is rewr

[PR] Re-use information from graph traversal during exact search [lucene]

2023-11-16 Thread via GitHub
kaivalnp opened a new pull request, #12820: URL: https://github.com/apache/lucene/pull/12820 ### Description In KNN queries with a pre-filter, we first perform an approximate graph search and then [fallback](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/l

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815014722 Thanks for the suggestion @jpountz! I'll add a `HumanReadableQuery` and revert the current changes. I think it would be quite similar to the `AssertingQuery`.

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815175078 Created a `HumanReadableQuery` which wraps a Query and only changes the .toString() behaviour, please let me know if I misunderstood any part of the suggestion.
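
The wrap-and-override-toString idea described above can be illustrated with a plain decorator. This is only a sketch under stated assumptions and not the code from the PR: `BaseQuery`, `TermLike`, and `Described` are hypothetical stand-ins (the real Lucene `Query` class has a larger contract), and the output format is invented for illustration.

```java
// Hypothetical stand-in types; this does not use or reproduce the Lucene API.
class HumanReadableSketch {

    /** Minimal query stand-in with a field-aware toString. */
    abstract static class BaseQuery {
        abstract String toString(String field);

        @Override
        public String toString() {
            return toString("");
        }
    }

    /** A trivial leaf query used only to have something to wrap. */
    static final class TermLike extends BaseQuery {
        private final String field;
        private final String text;

        TermLike(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        String toString(String defaultField) {
            // Omit the field prefix when it matches the default field.
            return field.equals(defaultField) ? text : field + ":" + text;
        }
    }

    /** Decorator: delegates to the wrapped query and only prefixes a description. */
    static final class Described extends BaseQuery {
        private final BaseQuery in;
        private final String description;

        Described(BaseQuery in, String description) {
            this.in = in;
            this.description = description;
        }

        BaseQuery unwrap() {
            return in;
        }

        @Override
        String toString(String field) {
            return "Described(" + description + "): " + in.toString(field);
        }
    }

    public static void main(String[] args) {
        BaseQuery q = new Described(new TermLike("body", "lucene"), "benchmark task A");
        System.out.println(q);
    }
}
```

Keeping the description in a separate wrapper leaves the wrapped query class untouched, which matches the preference voiced in the thread for not modifying the existing queries.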

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396236967 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396239102 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396241261 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-16 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1815203589 ## Experiments - Available processors: 10; thread pool size: 16 - luceneutil tool Search: - **baseline**: Lucene main branch - **candidate1**: only global queue

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815216336 Should we move it in `lucene/misc` rather than `lucene/core`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
jpountz commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396259219 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
dweiss commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1815225881 The fact that hash key remixing doesn't improve the situation is not necessarily a sign that it's somehow wrong - it means hash keys are distributed evenly already (which is good). Remixing ad

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1815358559 This is an interesting idea. Ideally we would figure out up-front whether it's best to use the graph or not, but I can also imagine that we can't always make the right decision there, so

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815511866 > Should we move it in `lucene/misc` rather than `lucene/core`? Yes, that sounds like a better place for it.

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396510611 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [I] Failure in TestXYPointQueries [LUCENE-9859] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10898: URL: https://github.com/apache/lucene/issues/10898#issuecomment-1815535373 This is no longer a problem as it was fixed by https://github.com/apache/lucene/pull/537.

Re: [I] Repeated code in Polygon [LUCENE-9757] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10796: URL: https://github.com/apache/lucene/issues/10796#issuecomment-1815540459 No longer an issue, this was fixed by https://github.com/apache/lucene/pull/11812.

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-16 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-181613 Thanks @javanna for the feedback! > One thing that I wonder is whether we are ok already deprecating search(Query, Collector) given that we have a lot of usages still within Lucen

Re: [I] Deprecated method copyChars is used in example [LUCENE-9052] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10094: URL: https://github.com/apache/lucene/issues/10094#issuecomment-1815566763 No longer an issue, was fixed by https://github.com/apache/lucene/pull/249.

Re: [I] Improve bytes copy in NodeHash [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on issue #12760: URL: https://github.com/apache/lucene/issues/12760#issuecomment-1815613094 The last TODO can be resolved with #12624

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on PR #12715: URL: https://github.com/apache/lucene/pull/12715#issuecomment-1815622585 Thanks @cavorite, I have incorporated this change into #12624. Removing the constructor would also be great as it means there is less that needs to be backward compatible :)

Re: [I] Fix Field.java documentation to refer to new IntField/FloatField/LongField/DoubleField [lucene]

2023-11-16 Thread via GitHub
Harshitha-g-06 commented on issue #12125: URL: https://github.com/apache/lucene/issues/12125#issuecomment-1815650208 @rmuir Hi, may I work on this task?

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-16 Thread via GitHub
SreehariG73 commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1815676094 Hello, I am planning to work on this issue. Can this issue be assigned to me please?

[PR] Fix Field.java documentation to refer to new IntField/FloatField/LongField/DoubleField #12125 [lucene]

2023-11-16 Thread via GitHub
SreehariG73 opened a new pull request, #12821: URL: https://github.com/apache/lucene/pull/12821 ### Description Replaced references to IntPoint, LongPoint, FloatPoint, and DoublePoint with the easier-to-use field subclasses IntField, LongField, FloatField, and DoubleField.

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1815705050 yes, this is a promising avenue to explore! One note of caution: we should avoid drawing strong inferences from a single dataset. I'm especially wary of GloVe because I've noticed

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu closed pull request #12716: Improve hash mixing in FST's double-barrel LRU hash URL: https://github.com/apache/lucene/pull/12716

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1815794510 > I'm surprised linear probing doesn't yield an improvement. Perhaps it's not a significant factor because of other load? Hard to say. Anyway, no need to make things more complicate

[I] Remove the FST constructors with DataInput for metadata [lucene]

2023-11-16 Thread via GitHub
dungba88 opened a new issue, #12822: URL: https://github.com/apache/lucene/issues/12822 ### Description After https://github.com/apache/lucene/pull/12758, we streamlined the FST constructors and they eventually call the constructor with `FSTMetadata`. For the old constructors with `D

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392197630 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-16 Thread via GitHub
ChristophKaser commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1815890261 @mikemccand Thank you for looking at the patch! However, it is a bit hard to refresh this PR - after all, the HTTP servlet-based replication mechanism has been removed from Lucene in P

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-16 Thread via GitHub
ChristophKaser closed pull request #124: LUCENE-9951: Add InfoStream to ReplicationService URL: https://github.com/apache/lucene/pull/124

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-17 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1396959558 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,17 @@ public OnHeapHnswGraph build(int maxOrd) throws IOExce

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-17 Thread via GitHub
mikemccand commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1816528624 OK thank you for bringing closure @ChristophKaser.

[PR] Dry up DirectReader implementations [lucene]

2023-11-17 Thread via GitHub
original-brownbear opened a new pull request, #12823: URL: https://github.com/apache/lucene/pull/12823 This can be written in a much drier way that shouldn't come at any performance cost as far as I can see.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-17 Thread via GitHub
mikemccand commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1816539511 Thanks @shubhamvishu and @dweiss and @bruno-roustant. Hashing is fun and hard :)

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816600417 @uschindler hey, thanks for the approval! Read the contributing guidelines, but not entirely sure how to get permissions to merge this PR.

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
uschindler commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816614768 You can't do it. Please add a Changes entry under the 9.9 section, commit it to the branch, and I will merge and backport your PR. I am just away from my computer at the moment, s

Re: [PR] LUCENE-10241: Updating OpenNLP to 1.9.4. [lucene]

2023-11-17 Thread via GitHub
cpoerschke merged PR #448: URL: https://github.com/apache/lucene/pull/448

Re: [I] Update OpenNLP to 1.9.4 [LUCENE-10241] [lucene]

2023-11-17 Thread via GitHub
cpoerschke closed issue #11277: Update OpenNLP to 1.9.4 [LUCENE-10241] URL: https://github.com/apache/lucene/issues/11277

Re: [I] Update OpenNLP to 1.9.4 [LUCENE-10241] [lucene]

2023-11-17 Thread via GitHub
cpoerschke commented on issue #11277: URL: https://github.com/apache/lucene/issues/11277#issuecomment-1816701100 #448 is the merged `main` branch pull request and https://github.com/apache/lucene/commit/b8094d49aaf5e5cb5182c0307e25eafa2d332dda is the `branch_9x` commit. Thanks @jzont

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-11-17 Thread via GitHub
kaivalnp commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1816720340 Thanks @jpountz! I realised something from your comment: My current implementation has a flaw, because it cannot handle the [`OrdinalTranslatedKnnCollector`](https://github.com/ka

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
MarcusSorealheis commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816799469 @Shibi-bala It's here: https://github.com/apache/lucene/blob/c228e4bb66ca73c8150d8eaebe2bb999bcc6c9b1/lucene/CHANGES.txt#L147 You need to include your user and the

Re: [PR] Fix segmentInfos replace doesn't set userData [lucene]

2023-11-17 Thread via GitHub
Shibi-bala commented on PR #12626: URL: https://github.com/apache/lucene/pull/12626#issuecomment-1816818775 Made the changes. Thanks @uschindler @MarcusSorealheis @msfroh 😁
