Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-16 Thread via GitHub
ChristophKaser closed pull request #124: LUCENE-9951: Add InfoStream to ReplicationService URL: https://github.com/apache/lucene/pull/124 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] LUCENE-9951: Add InfoStream to ReplicationService [lucene]

2023-11-16 Thread via GitHub
ChristophKaser commented on PR #124: URL: https://github.com/apache/lucene/pull/124#issuecomment-1815890261 @mikemccand Thank you for looking at the patch! However it is a bit hard to refresh this PR - after all, the http servlet based replication mechanism has been removed from lucene in P

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395333947 ## lucene/core/src/java/org/apache/lucene/util/fst/GrowableByteArrayDataOutput.java: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) und

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1392197630 ## lucene/core/src/java/org/apache/lucene/util/fst/ByteBuffersFSTReader.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one o

[I] Remove the FST constructors with DataInput for metadata [lucene]

2023-11-16 Thread via GitHub
dungba88 opened a new issue, #12822: URL: https://github.com/apache/lucene/issues/12822 ### Description After https://github.com/apache/lucene/pull/12758, we streamlined the FST constructors and they eventually call the constructor with `FSTMetadata`. For the old constructors with `D

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1815794510 > I'm surprised linear probing doesn't yield an improvement. Perhaps it's not a significant factor because of other load? Hard to say. Anyway, no need to make things more complicate

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu closed pull request #12716: Improve hash mixing in FST's double-barrel LRU hash URL: https://github.com/apache/lucene/pull/12716

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
msokolov commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1815705050 yes, this is a promising avenue to explore! One note of caution: we should avoid drawing strong inferences from a single dataset. I'm especially wary of GloVe because I've noticed

[PR] Fix Field.java documentation to refer to new IntField/FloatField/LongField/DoubleField #12125 [lucene]

2023-11-16 Thread via GitHub
SreehariG73 opened a new pull request, #12821: URL: https://github.com/apache/lucene/pull/12821 ### Description Replaced IntPoint, LongPoint, FloatPoint, and DoublePoint with the easier-to-use field subclasses IntField, LongField, FloatField, and DoubleField.

Re: [I] Take advantage of bloom filter when delete terms [lucene]

2023-11-16 Thread via GitHub
SreehariG73 commented on issue #12725: URL: https://github.com/apache/lucene/issues/12725#issuecomment-1815676094 Hello, I am planning to work on this issue. Can this issue be assigned to me please?

Re: [I] Fix Field.java documentation to refer to new IntField/FloatField/LongField/DoubleField [lucene]

2023-11-16 Thread via GitHub
Harshitha-g-06 commented on issue #12125: URL: https://github.com/apache/lucene/issues/12125#issuecomment-1815650208 @rmuir Hi, may I work on this task?

Re: [PR] Deprecated public constructor of FSTCompiler in favor of the Builder. [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on PR #12715: URL: https://github.com/apache/lucene/pull/12715#issuecomment-1815622585 Thanks @cavorite, I have incorporated this change into #12624. Removing the constructor would also be great, as it means there is less that needs to stay backward compatible :)

Re: [I] Improve bytes copy in NodeHash [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on issue #12760: URL: https://github.com/apache/lucene/issues/12760#issuecomment-1815613094 The last TODO can be resolved with #12624

Re: [I] Deprecated method copyChars is used in example [LUCENE-9052] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10094: URL: https://github.com/apache/lucene/issues/10094#issuecomment-1815566763 No longer an issue, was fixed by https://github.com/apache/lucene/pull/249.

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-16 Thread via GitHub
zacharymorn commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-181613 Thanks @javanna for the feedback! > One thing that I wonder is whether we are ok already deprecating search(Query, Collector) given that we have a lot of usages still within Lucen

Re: [I] Repeated code in Polygon [LUCENE-9757] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10796: URL: https://github.com/apache/lucene/issues/10796#issuecomment-1815540459 No longer an issue, this was fixed by https://github.com/apache/lucene/pull/11812.

Re: [I] Failure in TestXYPointQueries [LUCENE-9859] [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #10898: URL: https://github.com/apache/lucene/issues/10898#issuecomment-1815535373 This is no longer a problem as it was fixed by https://github.com/apache/lucene/pull/537.

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396510611 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815511866 > Should we move it in `lucene/misc` rather than `lucene/core`? Yes, that sounds like a better place for it.

Re: [PR] Re-use information from graph traversal during exact search [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12820: URL: https://github.com/apache/lucene/pull/12820#issuecomment-1815358559 This is an interesting idea. Ideally we would figure out up-front whether it's best to use the graph or not, but I can also imagine that we can't always make the right decision there, so

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
dweiss commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1815225881 The fact hash key remixing doesn't improve the situation is not necessarily a sign that it's somehow wrong - it means hash keys are distributed evenly already (which is good). Remixing ad

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
jpountz commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396259219 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815216336 Should we move it in `lucene/misc` rather than `lucene/core`?

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-16 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1815203589 ## Experiments - Available processors: 10; thread pool size: 16 - luceneutil tool Search: - **baseline**: Lucene main branch - **candidate1**: only global queue

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396241261 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396239102 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12816: URL: https://github.com/apache/lucene/pull/12816#discussion_r1396236967 ## lucene/core/src/java/org/apache/lucene/search/HumanReadableQuery.java: ## @@ -0,0 +1,89 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or m

Re: [PR] Adding HumanReadableQuery with a descrition param, used for debugging print output [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815175078 Created a `HumanReadableQuery` which wraps a Query and only changes the .toString() behaviour, please let me know if I misunderstood any part of the suggestion.

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
slow-J commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1815014722 Thanks for the suggestion @jpountz! I'll add a `HumanReadableQuery` and revert the current changes. I think it would be quite similar to the `AssertingQuery`.

[PR] Re-use information from graph traversal during exact search [lucene]

2023-11-16 Thread via GitHub
kaivalnp opened a new pull request, #12820: URL: https://github.com/apache/lucene/pull/12820 ### Description In KNN queries with a pre-filter, we first perform an approximate graph search and then [fallback](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/l

[PR] Log number of visited nodes in knn query [lucene]

2023-11-16 Thread via GitHub
mayya-sharipova opened a new pull request, #12819: URL: https://github.com/apache/lucene/pull/12819 The number of visited nodes during graph exploration is an important metric for a knn query, which is lost when the query is rewritten. This allows optionally accessing it before the query is rewr

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
javanna commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1396067925 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,17 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException

[PR] Fix off-by-one error in SimpleSortedSetFacetsExample [lucene]

2023-11-16 Thread via GitHub
stefanvodita opened a new pull request, #12818: URL: https://github.com/apache/lucene/pull/12818 We're only printing results for the `Author` dimension instead of printing `Publish Year` too.

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1814848844 Ok so I ran the test with rehash value of 17/24 which is between 2/3 and 3/4. Here are the results: | Golden ratio Bit mixing | Rehash ratio (2/3) | Rehash ratio (17/24) |

[PR] Add KeywordField and StringValueFacetCounts example [lucene]

2023-11-16 Thread via GitHub
stefanvodita opened a new pull request, #12817: URL: https://github.com/apache/lucene/pull/12817 We don't have a demo for faceting using `KeywordField`, `SortedDocValuesField`, or `StringValueFacetCounts`. This PR adds one, mostly inspired by `SimpleSortedSetFacetsExample`.

Re: [PR] Generalize LSBRadixSorter and use it in SortingPostingsEnum [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12800: URL: https://github.com/apache/lucene/pull/12800#issuecomment-1814761757 I like the idea, but this seems to come with greater heap requirements as well?

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1814712957 I much prefer @jpountz's idea. This additional field is purely for debugging purposes. A `DebugQuery` or `HumanReadableQuery` does seem like a good idea.

Re: [PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12816: URL: https://github.com/apache/lucene/pull/12816#issuecomment-1814699360 I'd rather like not to touch these queries, and introduce a brand new query that rewrites to a `Knn(Byte|Float)VectorQuery` and may add a description string. Something like `HumanReadabl

[PR] Adding optional queryDescription String to AbstractKnnVectorQuery [lucene]

2023-11-16 Thread via GitHub
slow-J opened a new pull request, #12816: URL: https://github.com/apache/lucene/pull/12816 We use this only in KnnByte/FloatVectorQuery toString method so the benchmarker can disambiguate between different KnnFloatVectorQuery/KnnByteVectorQuery queries. Closes #12487

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814561267 @nitirajrathore very interesting findings. This makes me wonder if the heuristic should take a middle ground and instead of keeping all pruned connections, keep half.

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814530858 > meaning that we can get same recall for a smaller max-conn value now. I ran some tests with max-conn = 16 and max-conn = 8 and it seems like with [my proposal](htt

Re: [PR] Minor change to IndexOrDocValuesQuery#toString [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12791: URL: https://github.com/apache/lucene/pull/12791#issuecomment-1814515539 > I assume this is because this is a DoubleDocValuesField which encodes the double using NumericUtils.doubleToSortableLong @mikemccand Is it possible to fix `NumericUtils.doub

Re: [I] AnalyzingSuggester exception because of length restriction: java.lang.IllegalArgumentException: len must be <= 32767; got 38751 [LUCENE-6012] [lucene]

2023-11-16 Thread via GitHub
sitepark-veltrup commented on issue #7074: URL: https://github.com/apache/lucene/issues/7074#issuecomment-1814499494 We use Solr to search for pages in a website. We index the content of the website and also the content of PDF documents into a field `content`. Based on this field we would

Re: [PR] Remove delayed seek optimization. [lucene]

2023-11-16 Thread via GitHub
jpountz commented on PR #12815: URL: https://github.com/apache/lucene/pull/12815#issuecomment-1814493098 Here are results on `wikibigall`, none of the p-values seem significant: ``` Task QPS baseline StdDev QPS my_modified_version StdDev

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395692713 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -53,7 +53,7 @@ public final class TaskExecutor { private final Executor executor;

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
benwtrent commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395689207 ## lucene/core/src/java/org/apache/lucene/search/TaskExecutor.java: ## @@ -53,7 +53,7 @@ public final class TaskExecutor { private final Executor executor; -

[PR] Remove delayed seek optimization. [lucene]

2023-11-16 Thread via GitHub
jpountz opened a new pull request, #12815: URL: https://github.com/apache/lucene/pull/12815 I think that this optimization was introduced because `advanceShallow` may advance skip lists and then never decode a block of postings. But actually `IndexInput#seek` is cheap, including on `NIOFSDi

Re: [I] Can/should `KnnByte/FloatVectorQuery` carry some human-meaningful opaque `toString` fragment? [lucene]

2023-11-16 Thread via GitHub
slow-J commented on issue #12487: URL: https://github.com/apache/lucene/issues/12487#issuecomment-1814317117 I think that we could simply add a `resourceDescription` field to the `AbstractKnnVectorQuery` and modify the toString in the implementations so that the output would look something

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12799: URL: https://github.com/apache/lucene/pull/12799#issuecomment-1814218891 @javanna I have added the CHANGES entry and addressed the comment. It seems the precommit fails on the `:lucene:documentation:markdownToHtml` task, which looks unrelated? Not sure.

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395505763 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,21 @@ public OnHeapHnswGraph build(int maxOrd) throws IOExce

Re: [PR] LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager [lucene]

2023-11-16 Thread via GitHub
javanna commented on PR #240: URL: https://github.com/apache/lucene/pull/240#issuecomment-1814140254 Thanks for reviving this PR @zacharymorn ! the changes look good to me, having top score doc and top field collector managers sounds like a natural next step, and removes code duplication. I

Re: [PR] Make TaskExecutor cx public and use TaskExecutor for concurrent HNSW graph build [lucene]

2023-11-16 Thread via GitHub
javanna commented on code in PR #12799: URL: https://github.com/apache/lucene/pull/12799#discussion_r1395446929 ## lucene/core/src/java/org/apache/lucene/util/hnsw/HnswConcurrentMergeBuilder.java: ## @@ -77,42 +75,21 @@ public OnHeapHnswGraph build(int maxOrd) throws IOException

Re: [PR] Improve hash mixing in FST's double-barrel LRU hash [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on PR #12716: URL: https://github.com/apache/lucene/pull/12716#issuecomment-1814055888 > @shubhamvishu can we close this one? Any other things to try? Sure @mikemccand! Maybe we could just try a rehash value between 2/3 and 3/4 as you mentioned earlier (how abo

Re: [I] HnwsGraph creates disconnected components [lucene]

2023-11-16 Thread via GitHub
nitirajrathore commented on issue #12627: URL: https://github.com/apache/lucene/issues/12627#issuecomment-1814046690 > What if we added an "incoming connection" count for every node? & > I think this idea would prevent the isolated nodes, but not fix the other case. I w

Re: [I] MultiSimilarity.MultiSimScorer should sum up scores into a double [lucene]

2023-11-16 Thread via GitHub
shubhamvishu commented on issue #12675: URL: https://github.com/apache/lucene/issues/12675#issuecomment-1814024776 @jpountz I think we can close this now?

Re: [PR] Allow FST builder to use different writer (#12543) [lucene]

2023-11-16 Thread via GitHub
dungba88 commented on code in PR #12624: URL: https://github.com/apache/lucene/pull/12624#discussion_r1395069004 ## lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java: ## @@ -337,11 +349,23 @@ public long size() { return getPosition(); } + /** Similar to