RKSPD commented on PR #14892:
URL: https://github.com/apache/lucene/pull/14892#issuecomment-3026031780
# Testing JVectorCodec Using luceneutil-jvector

This guide provides step-by-step instructions for benchmarking and testing JVectorCodec performance with the luceneutil-jvector testing framework.

## Prerequisites

* Java development environment with Gradle support
* Python 3.x installed
* Git installed
* SSD storage (recommended for optimal I/O performance)

## Setup Instructions

### 1. Environment Preparation

Create a benchmark directory on an SSD for optimal I/O performance:

```
mkdir LUCENE_BENCH_HOME
cd LUCENE_BENCH_HOME
```

In the commands that follow, `LUCENE_BENCH_HOME` stands for the path to this directory.

### 2. Repository Cloning

Clone the required repositories:

```
git clone https://github.com/RKSPD/lucene-jvector lucene_candidate
git clone https://github.com/RKSPD/luceneutil-jvector util
```

**Note:** The `lucene-jvector` repository contains the same code as the PR under review.

### 3. Initial Setup and Data Download

Navigate to the `util` directory and run the initial setup:

```
cd util
python3 src/python/initial_setup.py -d
```

This command downloads the test datasets. The download may take a while, depending on your internet connection.

### 4. Lucene Build

While the data is downloading, open a new terminal session and build Lucene:

```
cd LUCENE_BENCH_HOME/lucene_candidate
./gradlew build
```

## Running Performance Tests

### 5. Initial Test Run

Once both the build and the download have finished, return to the `util` directory:

```
cd LUCENE_BENCH_HOME/util
```

Run the KNN performance test:

```
./gradlew runKnnPerfTest
```

**Important:** The first execution is expected to fail. This initial run generates the path definitions for your Lucene repository and determines the Lucene version.

### 6. Successful Test Execution

Run the performance test a second time:

```
./gradlew runKnnPerfTest
```

This execution should complete successfully and report performance metrics.

## Configuration and Tuning

### 7. Parameter Customization

To customize the testing parameters for your specific benchmarking needs, edit the following files.

#### Merge Policy Configuration

* **File:** `util/src/main/knn/KnnIndexer.java`
* **Purpose:** Configure the merge policy used when building the index

#### Codec Configuration

* **File:** `util/src/main/knn/KnnGraphTester.java`
* **Method:** `getCodec()`
* **Purpose:** Specify which codec implementation to test

#### Performance Test Parameters

* **File:** `util/src/python/knnPerfTest.py`
* **Section:** `params` block
* **Purpose:** Adjust performance testing parameters, including:
  * Vector dimensions
  * Index size
  * Query parameters
  * Recall targets
  * Other algorithm-specific settings

## Expected Outcomes

Upon successful completion, you will have:

* A fully configured benchmarking environment
* Performance metrics for JVectorCodec, which you can compare against other codec implementations by changing `getCodec()`
* Configurable parameters for comprehensive testing scenarios

## Troubleshooting

* Ensure there is sufficient disk space for the dataset downloads and index generation
* Verify that the Java and Python environments are properly configured
* Check network connectivity if the initial setup fails during the download phase
* Confirm that the benchmark directory is on an SSD for optimal I/O performance
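It is easier to edit the `params` block in `knnPerfTest.py` (step 7) if you know how it is likely consumed. This sketch assumes the block follows the convention of upstream luceneutil's `knnPerfTest.py`:

* It is a dict that maps each parameter name to a tuple of candidate values.
* The script performs one benchmark run per combination of those values.

The dict name `PARAMS`, the helper `expand`, and every key and value here are hypothetical illustrations, not copied from the luceneutil-jvector fork. Check them against the actual file before reusing any of them.

```python
from itertools import product

# Hypothetical parameter block. Key names are assumed from upstream
# luceneutil's knnPerfTest.py convention; verify them in
# util/src/python/knnPerfTest.py before copying.
PARAMS = {
    "ndoc": (100_000, 1_000_000),  # index size (number of vectors), assumed key
    "maxConn": (16, 32),           # max graph connections per node, assumed key
    "beamWidthIndex": (100,),      # index-time beam width, assumed key
    "fanout": (0, 50),             # query-time fanout, assumed key
    "topK": (100,),                # hits requested per query, assumed key
}


def expand(params):
    """Yield one dict per combination of the tuple-valued parameters."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    runs = list(expand(PARAMS))
    print(f"{len(runs)} benchmark runs")
    for run in runs:
        print(run)
```

Under this assumption, the run count is the product of the tuple lengths (2 × 2 × 1 × 2 × 1 = 8 with the values shown). Widening one tuple therefore scales the whole sweep, and single-element tuples keep the parameters you are not studying fixed.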