RKSPD commented on PR #14892:
URL: https://github.com/apache/lucene/pull/14892#issuecomment-3026031780

   # Testing JVectorCodec Using luceneutil-jvector
   
   This guide gives step-by-step instructions for benchmarking JVectorCodec with the luceneutil-jvector testing framework.
   
   ## Prerequisites
   
   * Java development environment with Gradle support
   * Python 3.x installed
   * Git installed
   * SSD storage recommended for optimal performance
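
One way to confirm the command-line prerequisites above before starting is a small Python check. This is a minimal sketch, not part of luceneutil-jvector: the helper name `missing_tools` is hypothetical, and the default tool list simply mirrors the bullets above. It only checks that the commands are on `PATH`, not their versions.

```python
import shutil


def missing_tools(tools=("java", "python3", "git")):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All prerequisite tools found on PATH.")
```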
   
   ## Setup Instructions
   
   ### 1. Environment Preparation
   
   Create a benchmark directory on an SSD for optimal I/O performance:
   
   ```
   mkdir LUCENE_BENCH_HOME
   cd LUCENE_BENCH_HOME
   ```
   
   ### 2. Repository Cloning
   
   Clone the required repositories:
   
   ```
   git clone https://github.com/RKSPD/lucene-jvector lucene_candidate
   git clone https://github.com/RKSPD/luceneutil-jvector util
   ```
   
   **Note:** The `lucene-jvector` repository contains the same code as the PR under review.
   
   ### 3. Initial Setup and Data Download
   
   Navigate to the utilities directory and run the initial setup:
   
   ```
   cd util
   python3 src/python/initial_setup.py -d
   ```
   
   This command downloads the necessary test datasets. The download may take some time, depending on your connection speed.
   
   ### 4. Lucene Build
   
   While the data is downloading, open a new terminal session in the directory that contains `LUCENE_BENCH_HOME` and build Lucene:
   
   ```
   cd LUCENE_BENCH_HOME/lucene_candidate
   ./gradlew build
   ```
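
Steps 3 and 4 can also be started together from one script rather than from two terminals. The following is a rough sketch under stated assumptions: `run_in_parallel` and `BENCH_JOBS` are hypothetical names, `sys.executable` stands in for the `python3` command shown above, and the working directories assume the script is launched from inside `LUCENE_BENCH_HOME`.

```python
import subprocess
import sys


def run_in_parallel(jobs):
    """Start every (command, working_dir) job at once, then wait for all.

    Returns the exit codes in the same order as `jobs`.
    """
    procs = [subprocess.Popen(cmd, cwd=cwd) for cmd, cwd in jobs]
    return [proc.wait() for proc in procs]


# The two commands from steps 3 and 4; directories are relative to
# LUCENE_BENCH_HOME (an assumption about where this script is started).
BENCH_JOBS = [
    ([sys.executable, "src/python/initial_setup.py", "-d"], "util"),
    (["./gradlew", "build"], "lucene_candidate"),
]
```

Calling `run_in_parallel(BENCH_JOBS)` would launch the download and the build side by side and return both exit codes once they finish. The output of the two processes will be interleaved on the console, so separate terminals remain easier to read.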
   
   ## Running Performance Tests
   
   ### 5. Initial Test Run
   
   Once both the build and the download are complete, return to the utilities directory (the path below is relative to the directory that contains `LUCENE_BENCH_HOME`):
   
   ```
   cd LUCENE_BENCH_HOME/util
   ```
   
   Run the KNN performance test:
   
   ```
   ./gradlew runKnnPerfTest
   ```
   
   **Important:** The first execution is expected to fail. This initial run generates the path definitions for your Lucene repository and determines the Lucene version.
   
   ### 6. Successful Test Execution
   
   Run the performance test a second time:
   
   ```
   ./gradlew runKnnPerfTest
   ```
   
   This execution should complete successfully and provide performance metrics.
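
The run-twice pattern in steps 5 and 6 can be wrapped in a small helper. This is only a sketch: `run_with_one_retry` is a hypothetical name, and it assumes the expected first failure shows up as a non-zero exit code from `./gradlew runKnnPerfTest`, which the guide above does not state explicitly.

```python
import subprocess
import sys


def run_with_one_retry(cmd, cwd=None):
    """Run `cmd`; if it exits non-zero, run it exactly once more.

    Returns (exit_code, attempts_used).
    """
    for attempt in (1, 2):
        code = subprocess.run(cmd, cwd=cwd).returncode
        if code == 0:
            return code, attempt
        print(f"attempt {attempt} exited with code {code}", file=sys.stderr)
    return code, 2
```

For example, `run_with_one_retry(["./gradlew", "runKnnPerfTest"], cwd="util")` run from inside `LUCENE_BENCH_HOME` would perform the initial run and, if it fails, the second run. A result of `(0, 2)` would then correspond to the sequence described above.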
   
   ## Configuration and Tuning
   
   ### 7. Parameter Customization
   
   To customize the testing parameters for your specific benchmarking needs:
   
   #### Merge Policy Configuration
   
   * **File:** `util/src/main/knn/KnnIndexer.java`
   * **Purpose:** Configure the merge policy for index optimization
   
   #### Codec Configuration
   
   * **File:** `util/src/main/knn/KnnGraphTester.java`
   * **Method:** `getCodec()`
   * **Purpose:** Specify which codec implementation to test
   
   #### Performance Test Parameters
   
   * **File:** `util/src/python/knnPerfTest.py`
   * **Section:** `params` block
   * **Purpose:** Adjust various performance testing parameters including: 
       * Vector dimensions
       * Index size
       * Query parameters
       * Recall targets
       * Other algorithm-specific settings
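
To illustrate how a parameter block of this kind can drive a benchmark sweep, here is a minimal sketch. The key names and values are placeholders and are not copied from `knnPerfTest.py`. The sketch also assumes that each parameter lists one or more values and that every combination is run, which should be checked against the actual file.

```python
import itertools

# Hypothetical parameter grid: key names and values are placeholders,
# not the real contents of the params block in knnPerfTest.py.
params = {
    "dimensions": (768,),
    "index_size": (100_000, 1_000_000),
    "top_k": (10, 100),
}


def expand_grid(grid):
    """Yield one dict per combination of the values in `grid`."""
    keys = list(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    for combo in expand_grid(params):
        print(combo)
```

Under that assumption, adding a value to any entry multiplies the number of runs, so it helps to keep the grid small while verifying the setup.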
   
   ## Expected Outcomes
   
   Upon successful completion, you will have:
   
   * A fully configured benchmarking environment
   * Performance metrics comparing JVectorCodec against baseline implementations
   * Configurable parameters for comprehensive testing scenarios
   
   ## Troubleshooting
   
   * Ensure sufficient disk space for dataset downloads and index generation
   * Verify Java and Python environments are properly configured
   * Check network connectivity if initial setup fails during download phase
   * Confirm SSD usage for optimal I/O performance during benchmarking
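
The disk-space check in the first item can be scripted. This is a minimal sketch with hypothetical helper names. The 50 GiB default is a placeholder threshold, not a requirement documented by the guide, so adjust it to the datasets and index sizes you plan to use.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem containing `path`."""
    return shutil.disk_usage(path).free / 2**30


def has_enough_space(path=".", required_gib=50.0):
    """Check free space against `required_gib` (a placeholder threshold)."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"Free space here: {free_gib():.1f} GiB")
```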


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

