benwtrent commented on issue #14214:
URL: https://github.com/apache/lucene/issues/14214#issuecomment-2654755040

   To verify the "fewDistinct" slowness, here is how connecting components 
behaves in this adverse case:
   
   ```
    1> HNSW 1 [2025-02-12T20:14:45.641640Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: connect 36597 max component size=2543 min component_size=1 avg component size=1.06945924529333 components on level=0 notFullyConnected=39137 numNodesOnLevel=39139
    1> HNSW 1 [2025-02-12T20:15:00.406824Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: vectorOps: max=18056 min=4 std=5352.391988634614 avg=9055.092
    1> HNSW 1 [2025-02-12T20:15:00.408499Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: connect 2294 max component size=189 min component size=1 avg component size=1.0819529206625982 components on level=1 notFullyConnected=2480 numNodesOnLevel=2482
    1> HNSW 1 [2025-02-12T20:15:00.515135Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: vectorOps: max=1677 min=11 std=517.4268668130792 avg=871.3201
    1> HNSW 1 [2025-02-12T20:15:00.515518Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: connect 117 max component size=35 min component size=1 avg component size=1.2905982905982907 components on level=2 notFullyConnected=150 numNodesOnLevel=151
    1> HNSW 1 [2025-02-12T20:15:00.516252Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: vectorOps: max=116 min=26 std=42.01213116516541 avg=56.77587
    1> HNSW 1 [2025-02-12T20:15:00.516475Z; TEST-TestKnnFloatVectorQuery.testFewDistinctVectors-seed#[29F3C0E3B32A8A0D]]: connect 1 max component size=8 min component size=8 avg component size=8.0 components on level=3 notFullyConnected=8 numNodesOnLevel=8
   
   ```
   
   So, out of 39139 vectors on level 0, we end up with 36597 components to 
connect (almost every node is its own component). Connecting each component 
then takes about 9k vector ops on average (the worst case was 18056). 
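
As a rough back-of-the-envelope check on these numbers, here is a small Python sketch (not Lucene code). The figures are copied from the log above. The estimated total assumes that the logged `vectorOps` average is a mean per component connected, which is an assumption about what the log reports, not something the log states. The function and field names are made up for illustration.

```python
# Figures copied from the log: (level, components, nodes on level, avg vectorOps).
# Level 3 is left out because no vectorOps line was logged for it.
LEVELS = [
    (0, 36597, 39139, 9055.092),
    (1, 2294, 2482, 871.3201),
    (2, 117, 151, 56.77587),
]


def summarize(levels):
    """Derive per-level fragmentation and a rough total-cost estimate."""
    rows = []
    for level, components, nodes, avg_ops in levels:
        rows.append({
            "level": level,
            # Matches the "avg component size" value printed in the log.
            "avg_component_size": nodes / components,
            # Share of nodes that start their own component.
            "component_fraction": components / nodes,
            # ASSUMPTION: avg vectorOps is a mean per component connected.
            "est_total_ops": components * avg_ops,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(LEVELS):
        print(
            f"level={row['level']} "
            f"avg_component_size={row['avg_component_size']:.4f} "
            f"component_fraction={row['component_fraction']:.1%} "
            f"est_total_ops={row['est_total_ops']:.3e}"
        )
```

Under that assumption, level 0 dominates: its estimate is on the order of hundreds of millions of vector ops, roughly two orders of magnitude more than level 1.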
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

