[GitHub] [lucene] benwtrent commented on pull request #12434: Add ParentJoin KNN support

2023-07-12 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1632431376 @alessandrobenedetti I took some of your ideas on deduplicating vector IDs based on some other id for this PR. If this work continues, I think some of it can transfer to the native mul

[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-12 Thread via GitHub
benwtrent commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1632450075 Thank you for the in-depth information, @searchivarius. Eagerly awaiting your results, @jmazanec15 :) -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] easyice opened a new pull request, #12435: Remove sort for uniqueValues in NumericDocValues

2023-07-12 Thread via GitHub
easyice opened a new pull request, #12435: URL: https://github.com/apache/lucene/pull/12435 ### Description In table compression, we only need a mapping from value -> ord. As long as we can get the value back via its ord when reading, the order of the values does not matter.
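
A minimal sketch of the idea described in the PR above, not Lucene's actual implementation: each distinct value gets an ordinal in first-seen order, and the reader recovers the value by indexing a table with that ordinal, so no sorting is required. The class name `TableCompressionSketch` and its methods are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these names are not part of Lucene.
public class TableCompressionSketch {

    // Assign each distinct value an ordinal in first-seen order (no sorting),
    // then build the ord -> value table used on the read side.
    public static long[] buildTable(long[] values, Map<Long, Integer> valueToOrd) {
        for (long v : values) {
            valueToOrd.putIfAbsent(v, valueToOrd.size());
        }
        long[] table = new long[valueToOrd.size()];
        for (Map.Entry<Long, Integer> e : valueToOrd.entrySet()) {
            table[e.getValue()] = e.getKey();
        }
        return table;
    }

    // Write side: replace every value with its ordinal.
    public static int[] encode(long[] values, Map<Long, Integer> valueToOrd) {
        int[] ords = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            ords[i] = valueToOrd.get(values[i]);
        }
        return ords;
    }

    // Read side: look each ordinal up in the table to recover the value.
    public static long[] decode(int[] ords, long[] table) {
        long[] values = new long[ords.length];
        for (int i = 0; i < ords.length; i++) {
            values[i] = table[ords[i]];
        }
        return values;
    }

    public static void main(String[] args) {
        long[] values = {30L, 10L, 30L, 20L, 10L};
        Map<Long, Integer> valueToOrd = new LinkedHashMap<>();
        long[] table = buildTable(values, valueToOrd);
        int[] ords = encode(values, valueToOrd);
        // The round trip succeeds even though the table is not sorted.
        System.out.println(Arrays.equals(decode(ords, table), values));
    }
}
```

The table here holds the values in insertion order rather than sorted order; the round trip only depends on the write and read sides agreeing on the same value <-> ord assignment.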

[GitHub] [lucene] jpountz commented on pull request #12405: Skip docs with Docvalues in NumericLeafComparator

2023-07-12 Thread via GitHub
jpountz commented on PR #12405: URL: https://github.com/apache/lucene/pull/12405#issuecomment-1632507845 Thanks for adding the enum. In my view, we now need the two following changes: - `isMissingValueCompetitive()` should return false if the missing value is equal to the bottom value a

[GitHub] [lucene] benwtrent commented on pull request #12434: Add ParentJoin KNN support

2023-07-12 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1632517795 > would it be enough or is there more? I will dig a bit more on making this cleaner. My biggest performance concerns are around keeping track of the heap-index -> ID and

[GitHub] [lucene] benwtrent commented on pull request #12434: Add ParentJoin KNN support

2023-07-12 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1633057341 @jpountz I took another shot at the KnnResults interface. I restricted the abstract and `@Override` methods to narrow the API. Additionally, I disconnected it from the queue, but it st

[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor

2023-07-12 Thread via GitHub
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1633133797 @romseygeek @dweiss @uschindler I'd love to get feedback from you on the subject @jpountz @benwtrent I saw that Elasticsearch does have the option of percolating nested qu

[GitHub] [lucene] benwtrent commented on a diff in pull request #12421: Concurrent hnsw graph and builder, take two

2023-07-12 Thread via GitHub
benwtrent commented on code in PR #12421: URL: https://github.com/apache/lucene/pull/12421#discussion_r1261778701 ## lucene/core/src/java/org/apache/lucene/util/hnsw/ConcurrentHnswGraphBuilder.java: ## @@ -0,0 +1,465 @@ +/* + * Licensed to the Apache Software Foundation (ASF) un

[GitHub] [lucene] mayya-sharipova opened a new pull request, #12436: Move max vector dims limit to Codec

2023-07-12 Thread via GitHub
mayya-sharipova opened a new pull request, #12436: URL: https://github.com/apache/lucene/pull/12436 Move vector max dimension limits enforcement into the default Codec's KnnVectorsFormat implementation. This allows different implementations of knn search algorithms to define their own lim

[GitHub] [lucene] shubhamvishu commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-12 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1633489589 > This is a situation where we really cannot sort on behalf of the caller, so it might be a bit confusing/trappy to sort some flavors of this method but not others? Maybe it's best

[GitHub] [lucene] shubhamvishu commented on pull request #12427: StringsToAutomaton#build to take List as parameter instead of Collection

2023-07-12 Thread via GitHub
shubhamvishu commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1633502769 On the same note, since both methods expect `Iterable` or `Iterators`, why do we even need 2 separate methods here that do exactly the same thing, i.e. iterating over the