jtibshirani opened a new pull request, #11867: URL: https://github.com/apache/lucene/pull/11867
This is a rough draft of a large-scale test for kNN vectors. It indexes a large dataset of kNN vectors to catch issues that only show up when segments are very large, such as overflow. The dataset is based on the StackOverflow track from Elasticsearch's Rally benchmarks: https://github.com/elastic/rally-tracks/tree/master/so_vector.

I tried writing a test with random vectors, but HNSW can become quite slow and ineffective when the data has no structure.

Steps to run the test:

1. Download the dataset: `wget https://rally-tracks.elastic.co/so_vector/documents.bin`
2. Move the dataset to the resources folder: `mv documents.bin lucene/core/src/resources/`
3. Start the test: `./gradlew test --tests TestManyKnnVectors.testLargeSegment -Dtests.monster=true -Dtests.verbose=true -Dorg.gradle.jvmargs="-Xms2g -Xmx2g" --max-workers=1`

Relates to #11863.
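To show why overflow only surfaces in very large segments, here is a minimal, hypothetical sketch. It is not Lucene's code. It assumes vectors are stored back to back as float32 values, and that a reader computes a vector's byte offset from its ordinal. The class name, the methods, and the 768-dimension value are illustrative assumptions; none of them is taken from the dataset or from Lucene. With `int` arithmetic, the offset silently wraps negative once the ordinal is large enough. Promoting to `long` before multiplying avoids this.

```java
public class VectorOffsetOverflow {
  // Hypothetical vector dimension, chosen only for illustration.
  static final int DIM = 768;

  // Buggy: the whole product is computed in 32-bit int arithmetic,
  // so it wraps around once ord * DIM * 4 exceeds Integer.MAX_VALUE.
  static int intOffset(int ord) {
    return ord * DIM * Float.BYTES;
  }

  // Fixed: casting to long first makes every multiplication 64-bit.
  static long longOffset(int ord) {
    return (long) ord * DIM * Float.BYTES;
  }

  // Largest ordinal whose byte offset still fits in an int.
  static int maxSafeOrd() {
    return Integer.MAX_VALUE / (DIM * Float.BYTES);
  }

  public static void main(String[] args) {
    int safe = maxSafeOrd();
    System.out.println("max safe ord: " + safe);
    System.out.println("int offset at safe+1: " + intOffset(safe + 1));
    System.out.println("long offset at safe+1: " + longOffset(safe + 1));
  }
}
```

Under these assumptions, the last safe ordinal is 699,050. For ordinal 699,051, `intOffset` returns a negative value, while `longOffset` returns the correct 2,147,484,672. A segment needs several hundred thousand such vectors before this code path misbehaves, which is why small randomized tests are unlikely to reach it.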