Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-08 Thread via GitHub
benchaplin commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2578611721 Thanks @mikemccand and @msokolov! And oh, forgot about the CHANGES - here's an entry (I listed under 10.2.0) https://github.com/apache/lucene/pull/14120. -- This is an automated mes

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-08 Thread via GitHub
msokolov commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2578359028 FYI @benchaplin this is merged now, but then I realized there is no CHANGES entry -- would you like to add one? After that, we can probably also backport to `branch_10x` using cherry-pi

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-08 Thread via GitHub
mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2578347646 > Let me know if you're able to test this revision! Yay, I just tested again and I see HNSW output, and the nightly benchy index passed all the new tests here! Thanks @benchapl

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-08 Thread via GitHub
msokolov merged PR #13984: URL: https://github.com/apache/lucene/pull/13984 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.ap

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
benchaplin commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2573974895 @mikemccand I tried for a bit to recreate the scenario you're describing but wasn't able to. I added the defensive suggestion anyway - I'll keep trying to reproduce. Let me know if yo

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1904663757 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2785,191 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
msokolov commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2573971976 Ooh, I was wondering about this `PerFieldKnnVectorsFormat` case - thanks for testing @mikemccand -- This is an automated message from the Apache Git Service. To respond to the me

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2573785203 Thank you for persisting on this important change @benchaplin! I applied this PR to my local Lucene clone and ran `CheckIndex` on the vector index created by [last night's night

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
benwtrent commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1904112483 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2785,191 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2025-01-06 Thread via GitHub
msokolov commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1904101920 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2785,191 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-12-31 Thread via GitHub
benchaplin commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2566774509 I've added @msokolov's node-in-range check + neighbor-on-same-level check, and @tteofili's connectivity check. I chose not to throw an exception for disconnectedness, as I've found th

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-12-02 Thread via GitHub
github-actions[bot] commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2513254955 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-13 Thread via GitHub
msokolov commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2474129035 > Doesn't (neighbors are in order) and (neighbors are not repeated) verify uniqueness? duh, yes :) > Also, why wouldn't we assert neighbors are on this leve

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-13 Thread via GitHub
benchaplin commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2474165424 Gotcha, thanks for the comments @msokolov! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-13 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1840492668 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2781,106 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-13 Thread via GitHub
msokolov commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1840202966 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2781,106 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838568093 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838509594 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable erro

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470310487 Actually, `CheckIndex` does have some coverage for vectors and KNN graph (it confirms it can enumerate all vectors, and also runs some searches on it if it's not just flat vectors (`c

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470306515 > this might also check for graph connectivity, see #12627 +1 -- this is a tricky thing about these HNSW graphs (that they are not necessarily born connected but rather must be

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
mikemccand commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1837944575 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable erro

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub
tteofili commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2469922060 this might also check for graph connectivity, see #12627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-08 Thread via GitHub
benchaplin opened a new pull request, #13984: URL: https://github.com/apache/lucene/pull/13984 ### Description This is a first pass on #13724 - I've added the following checks: - verify neighbor list is in order - verify no duplicate neighbors The code was adapted from [