Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent merged PR #13641: URL: https://github.com/apache/lucene/pull/13641 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on code in PR #13641: URL: https://github.com/apache/lucene/pull/13641#discussion_r1755503855 ## lucene/codecs/src/test/org/apache/lucene/codecs/simpletext/TestSimpleTextKnnVectorsFormat.java: ## @@ -41,4 +41,14 @@ public void testRandomBytes() throws Excepti

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
jpountz commented on code in PR #13641: URL: https://github.com/apache/lucene/pull/13641#discussion_r1755048948 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsWriter.java: ## @@ -212,14 +213,34 @@ private static void validateFieldEncoding(FieldInfo fieldInfo, Vector

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2344221805 @bugmakerr I don't want to make that change as the PerFieldDocValuesFormat & PerFieldPostingsFormat field readers will return `null` if the field is missing. I want to at least try

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
bugmakerr commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2344078213 @benwtrent should we do the same check for `PerFieldKnnVectorsFormat.FieldsReader`? If the target reader is null, throw an IAE and ensure that we do not return null. -- This is

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on code in PR #13641: URL: https://github.com/apache/lucene/pull/13641#discussion_r1754984085 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -230,6 +236,120 @@ public void testIllegalDimChangeTwoWriter

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2344006469 OK, I pushed up a fix for @msokolov's suggestion and your reminder @jpountz . I added a protection around our access to the KnnVectorReaders during merge. Additionally, I kept my new t

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2343701781 > Fine with me, I will work on fixing the merging logic then unless you plan on working on it. Ah dang, yeah, I completely spaced on that since this PR is so old. I will

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
jpountz commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2343692465 Fine with me, I will work on fixing the merging logic then unless you plan on working on it. -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2343677555 Unless there are rejections, I am going to move forward with merging this. Throwing an IAE seems way better than throwing an NPE. If we want to change this behavior to be more l

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
jpountz commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2343648325 > Separately, it looks like we call the KnnVectorsReader with the merged field info, which is not legal, we should call it with the field info of the segment. Apologies, I was wro

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
benwtrent commented on code in PR #13641: URL: https://github.com/apache/lucene/pull/13641#discussion_r1754379198 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -230,6 +236,120 @@ public void testIllegalDimChangeTwoWriter

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-09-11 Thread via GitHub
jpountz commented on code in PR #13641: URL: https://github.com/apache/lucene/pull/13641#discussion_r1753859763 ## lucene/test-framework/src/java/org/apache/lucene/tests/index/BaseKnnVectorsFormatTestCase.java: ## @@ -230,6 +236,120 @@ public void testIllegalDimChangeTwoWriters(

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-24 Thread via GitHub
github-actions[bot] commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2308587265 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-10 Thread via GitHub
benwtrent commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282231690 @msokolov maybe? Looking at the code, this has never worked this way. Field entry existence was just assumed to be handled by the perfield codec. I tried digging into the

Re: [PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-10 Thread via GitHub
msokolov commented on PR #13641: URL: https://github.com/apache/lucene/pull/13641#issuecomment-2282193063 > It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names. I don't understand this - what's inappropriate about it? I guess

[PR] Unify how missing field entries are handle in knn formats [lucene]

2024-08-09 Thread via GitHub
benwtrent opened a new pull request, #13641: URL: https://github.com/apache/lucene/pull/13641 It is possible to inappropriately use the knn formats and attempt to merge segments with mismatched field names. None of the formats actually check for `null`, they just all assume that the