[GitHub] [lucene] stefanvodita commented on pull request #12454: Clean up ordinal map in default SSDV reader state

2023-07-29 Thread via GitHub
stefanvodita commented on PR #12454: URL: https://github.com/apache/lucene/pull/12454#issuecomment-1656692746 Thanks Greg! I’ve kept the hash map and did some clean-up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[GitHub] [lucene] stefanvodita commented on a diff in pull request #12454: Clean up ordinal map in default SSDV reader state

2023-07-29 Thread via GitHub
stefanvodita commented on code in PR #12454: URL: https://github.com/apache/lucene/pull/12454#discussion_r1278280338 ## lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java: ## @@ -233,13 +239,13 @@ private int createOneFlatFacetDimSt

[GitHub] [lucene] stefanvodita commented on a diff in pull request #12354: Fix docFreq in score calculation after rewrite of boolean query consisting of blended query and boosted term query

2023-07-29 Thread via GitHub
stefanvodita commented on code in PR #12354: URL: https://github.com/apache/lucene/pull/12354#discussion_r1278283127 ## lucene/core/src/java/org/apache/lucene/search/TermQuery.java: ## @@ -264,11 +264,25 @@ public TermStates getTermStates() { /** Returns true iff other is equ

[GitHub] [lucene] tang-hi opened a new pull request, #12470: fix error convert from utf32 to utf8

2023-07-29 Thread via GitHub
tang-hi opened a new pull request, #12470: URL: https://github.com/apache/lucene/pull/12470 ### Description fix error convert from utf32 to utf8 ISSUE #12458

[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor

2023-07-29 Thread via GitHub
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1656712504 @romseygeek @dweiss @uschindler @dsmiley @gsmiller I'd love to get feedback from you on the subject

[GitHub] [lucene] tang-hi closed pull request #12470: fix error convert from utf32 to utf8

2023-07-29 Thread via GitHub
tang-hi closed pull request #12470: fix error convert from utf32 to utf8 URL: https://github.com/apache/lucene/pull/12470

[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-29 Thread via GitHub
benwtrent commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1656718447 OK, here (unless we have done these incorrectly), is the final Cohere test (IMO). This is mixing English and Japanese embeddings (just in case the cohere model encodes info for la

[GitHub] [lucene] benwtrent opened a new issue, #12471: TestLucene60FieldInfosFormat.testRandom test failure

2023-07-29 Thread via GitHub
benwtrent opened a new issue, #12471: URL: https://github.com/apache/lucene/issues/12471 ### Description This has failed many times; I haven't dug into why yet. ``` org.apache.lucene.backward_codecs.lucene60.TestLucene60FieldInfosFormat > testRandom FAILED java.la

[GitHub] [lucene] benwtrent commented on issue #12471: TestLucene60FieldInfosFormat.testRandom test failure

2023-07-29 Thread via GitHub
benwtrent commented on issue #12471: URL: https://github.com/apache/lucene/issues/12471#issuecomment-1656720282 Verified this is caused by: https://github.com/apache/lucene/commit/119635ad808c38d6878c5897bcf16a3b97523d4d

[GitHub] [lucene] tang-hi opened a new pull request, #12472: Fix UTF32toUTF8 will produce invalid transition

2023-07-29 Thread via GitHub
tang-hi opened a new pull request, #12472: URL: https://github.com/apache/lucene/pull/12472 FIX ISSUE #12458

[GitHub] [lucene] tang-hi commented on issue #12458: UTF32toUTF8 can create automata that produce/accept invalid unicode

2023-07-29 Thread via GitHub
tang-hi commented on issue #12458: URL: https://github.com/apache/lucene/issues/12458#issuecomment-1656725544 I have discovered the bug. ```Java if (endUTF8.numBits(upto) == 5) { // special case -- avoid created unused edges (endUTF8 // doesn't accept certain byt

[GitHub] [lucene] donnerpeter merged pull request #12468: hunspell: check for aff file wellformedness more strictly

2023-07-29 Thread via GitHub
donnerpeter merged PR #12468: URL: https://github.com/apache/lucene/pull/12468

[GitHub] [lucene] benwtrent opened a new pull request, #12473: Fix randomly failing field info format tests

2023-07-29 Thread via GitHub
benwtrent opened a new pull request, #12473: URL: https://github.com/apache/lucene/pull/12473 The crux of the issue is that we attempt to generate a vector with at least size 1, but the EMPTY KnnVectorsField type has an upper limit of `0` (and thus is not a valid upper limit for random int)

[GitHub] [lucene] msokolov commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-29 Thread via GitHub
msokolov commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1656797446 Q: just want to make sure I understand what the transformation is that you are testing here. Is it that you are taking non-unit vectors and making them into unit vectors by dividin

[GitHub] [lucene] msokolov commented on a diff in pull request #12415: Optimize disjunction counts.

2023-07-29 Thread via GitHub
msokolov commented on code in PR #12415: URL: https://github.com/apache/lucene/pull/12415#discussion_r1278351078 ## lucene/core/src/java/org/apache/lucene/search/CheckedIntConsumer.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] msokolov commented on issue #12463: Learned sorting algorithm for Lucene

2023-07-29 Thread via GitHub
msokolov commented on issue #12463: URL: https://github.com/apache/lucene/issues/12463#issuecomment-1656810622 https://github.com/anikristo/LearnedSort/ is GPLv3 licensed

[GitHub] [lucene] jpountz commented on pull request #12473: Fix randomly failing field info format tests

2023-07-29 Thread via GitHub
jpountz commented on PR #12473: URL: https://github.com/apache/lucene/pull/12473#issuecomment-1656838152 With this change, would it be possible that the field gets added to field infos permanently even though vectors may never be actually indexed? Even if it is not possible, I like the fact th

[GitHub] [lucene] jpountz merged pull request #12457: Improve MaxScoreBulkScorer partitioning logic.

2023-07-29 Thread via GitHub
jpountz merged PR #12457: URL: https://github.com/apache/lucene/pull/12457

[GitHub] [lucene] jpountz commented on a diff in pull request #12415: Optimize disjunction counts.

2023-07-29 Thread via GitHub
jpountz commented on code in PR #12415: URL: https://github.com/apache/lucene/pull/12415#discussion_r1278368477 ## lucene/core/src/java/org/apache/lucene/search/CheckedIntConsumer.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

[GitHub] [lucene] stefanvodita opened a new issue, #12474: Should more Facets implementations be concurrent?

2023-07-29 Thread via GitHub
stefanvodita opened a new issue, #12474: URL: https://github.com/apache/lucene/issues/12474 ### Description Faceting can happen in parallel for each segment, but most faceting implementations don't take advantage of this. I'm wondering if more faceting implementations should be

[GitHub] [lucene] benwtrent commented on pull request #12434: Add ParentJoin KNN support

2023-07-29 Thread via GitHub
benwtrent commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1656874923 Thanks for digging in @msokolov! > I'd like to have a clearer sense of the problem you're solving. This PR solves a similar, but different problem to: https://github.com/a

[GitHub] [lucene] searchivarius commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-07-29 Thread via GitHub
searchivarius commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1656883408 @msokolov the transformation also preserves the inner product up to a query-specific constant. >what is the meaning of the length of the vectors - where does it come fro

[GitHub] [lucene] msokolov commented on a diff in pull request #12415: Optimize disjunction counts.

2023-07-29 Thread via GitHub
msokolov commented on code in PR #12415: URL: https://github.com/apache/lucene/pull/12415#discussion_r1278450910 ## lucene/core/src/java/org/apache/lucene/search/CheckedIntConsumer.java: ## @@ -0,0 +1,31 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mo

[GitHub] [lucene] msokolov commented on pull request #12434: Add ParentJoin KNN support

2023-07-29 Thread via GitHub
msokolov commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1656953197 > The main issue is that it won't return the correct number of parent documents when the user requests the top-k parents based on their children vectors. If there are multiple children

[GitHub] [lucene] jpountz commented on pull request #12434: Add ParentJoin KNN support

2023-07-29 Thread via GitHub
jpountz commented on PR #12434: URL: https://github.com/apache/lucene/pull/12434#issuecomment-1657050056 I agree that there is similarity in that in both cases it boils down to whether or not you can accept having fewer than `k` hits. However the degradation is brutal with filtering as you e