kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2715605453
Thanks for the review @navneet1v!

> lucene util branch

You can find some (very hacky) changes [here](https://github.com/kaivalnp/luceneutil/tree/faiss). Broad steps to run the benchmark:

1. Install conda (I used `miniconda3`) and install `faiss-cpu=1.10.0` as explained [here](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md), so that all shared library dependencies land directly in the `$CONDA_PREFIX/lib` folder.
2. Check out Faiss at [this branch](https://github.com/kaivalnp/faiss/tree/custom_io_c) until the corresponding PR is merged.
3. Build the C API (`libfaiss_c.so`) as explained [here](https://github.com/facebookresearch/faiss/blob/main/INSTALL.md). This is required until a corresponding PR that publishes the C API to conda is merged. After that, we won't need to build it!
4. Add the library and all its dependencies to the runtime library path (something like `export LD_LIBRARY_PATH=$FAISS_DIR/build/c_api:$CONDA_PREFIX/lib`).
5. Run the setup as explained in luceneutil and generate vectors (something like `./gradlew vector-300`).
6. Run the benchmark (using `./gradlew runKnnPerfTest`).
7. To switch between Lucene and Faiss, comment out the corresponding function in `KnnGraphTester`.
8. Set `OMP_NUM_THREADS=1` (e.g. `export OMP_NUM_THREADS=1`). See the conversation above for why; I'll also add a comment in the codec for this.

Sample outputs showing a ~20% speedup with Faiss (search latency 0.734 ms vs. 0.568 ms, indexing throughput 1235.25 vs. 1476.61 docs/s):

| Engine | recall | latency (ms) | nDoc | topK | fanout | maxConn | beamWidth | quantized | visited | index s | index docs/s | force merge s | num segments | index size (MB) | vec disk (MB) | vec RAM (MB) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lucene | 0.812 | 0.734 | 200000 | 100 | 50 | 32 | 200 | no | 1385 | 161.91 | 1235.25 | 0.01 | 1 | 236.93 | 228.882 | 228.882 |
| Faiss | 0.811 | 0.568 | 200000 | 100 | 50 | 32 | 200 | no | 0 | 135.45 | 1476.61 | 0.01 | 1 | 511.97 | 228.882 | 228.882 |
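The ~20% figure can be checked against the Lucene and Faiss sample rows. Below is a small POSIX shell sketch that recomputes the relative differences. It is not part of luceneutil. `pct_reduction` and `pct_gain` are hypothetical helper names, and the inputs are simply the values copied from the sample output.

```shell
#!/bin/sh
# Hypothetical helpers (not part of luceneutil): compute percentage changes
# with POSIX awk, forcing the C locale so '.' is always the decimal separator.

# Percentage reduction from baseline $1 to candidate $2 (e.g. latency).
pct_reduction() {
  LC_ALL=C awk -v base="$1" -v value="$2" \
    'BEGIN { printf "%.1f\n", (1 - value / base) * 100 }'
}

# Percentage gain from baseline $1 to candidate $2 (e.g. docs/s throughput).
pct_gain() {
  LC_ALL=C awk -v base="$1" -v value="$2" \
    'BEGIN { printf "%.1f\n", (value / base - 1) * 100 }'
}

# Values copied from the Lucene (baseline) and Faiss sample rows.
echo "search latency reduction: $(pct_reduction 0.734 0.568)%"
echo "indexing throughput gain: $(pct_gain 1235.25 1476.61)%"
```

With these rows it reports a 22.6% latency reduction and a 19.5% indexing-throughput gain, consistent with the "~20%" summary.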