jtibshirani opened a new pull request, #11867:
URL: https://github.com/apache/lucene/pull/11867

   This is a rough draft of a large-scale test for kNN vectors.
   
   It runs against a large dataset of kNN vectors to check for issues that only show up when
   segments are very large, such as overflow. The dataset is based on the StackOverflow
   track from Elasticsearch's Rally benchmarks
   (https://github.com/elastic/rally-tracks/tree/master/so_vector).
   I tried developing a test using random vectors, but HNSW can become quite slow
   and ineffective when the data has no structure.
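
To illustrate the kind of overflow such a test is meant to catch, here is a minimal standalone sketch. It is not taken from the Lucene code base: the class and method names are hypothetical, and the dimension of 96 and the ordinal of 6,000,000 are assumed values chosen only so that the byte offset exceeds `Integer.MAX_VALUE`.

```java
// Hypothetical sketch, not Lucene code: shows how a byte offset into a
// flat file of float vectors can wrap when computed in 32-bit arithmetic.
public class VectorOffsetOverflowSketch {

  // Offset of vector `ordinal` computed with int arithmetic.
  // The multiplication silently wraps once the result exceeds Integer.MAX_VALUE.
  static int offsetInt(int ordinal, int dimension) {
    return ordinal * dimension * Float.BYTES;
  }

  // Same computation widened to long before multiplying, so it does not wrap.
  static long offsetLong(int ordinal, int dimension) {
    return (long) ordinal * dimension * Float.BYTES;
  }

  public static void main(String[] args) {
    int dimension = 96;      // assumed vector dimension, for illustration only
    int ordinal = 6_000_000; // assumed ordinal deep inside a very large segment

    System.out.println("int offset:  " + offsetInt(ordinal, dimension));
    System.out.println("long offset: " + offsetLong(ordinal, dimension));
  }
}
```

With small segments both methods agree, which is why a bug of this shape would only surface in a test that builds a very large segment.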
    
   Steps to run the test:
   1. Download the dataset: `wget https://rally-tracks.elastic.co/so_vector/documents.bin`
   2. Move the dataset to the resources folder: `mv documents.bin lucene/core/src/resources/`
   3. Start the test: `./gradlew test --tests TestManyKnnVectors.testLargeSegment -Dtests.monster=true -Dtests.verbose=true -Dorg.gradle.jvmargs="-Xms2g -Xmx2g" --max-workers=1`
   
   Relates to #11863.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

