benwtrent commented on PR #13463: URL: https://github.com/apache/lucene/pull/13463#issuecomment-2155711517
@gsmiller My directories are:

- `<common_parent_path>/candidate` <- Lucene branch
- `<common_parent_path>/baseline` <- Lucene main
- `<common_parent_path>/util` <- lucene util

Once you have the directories all set up:

- `ant build` to compile whenever you adjust things and before your first run. For this particular test, I went into `KnnIndexer.java` and adjusted `WRITER_BUFFER_MB` down to 12MB.
- `python ./src/python/knnPerfTest.py` to actually run the test, but you will probably need some data and need to point the script at it.
- [cohere_download_and_format.zip](https://github.com/user-attachments/files/15745875/cohere_download_and_format.zip) contains a bash script to download a bunch of parquet files and a python script to format them for ingesting. I think this might download cohere v3 (1024 dims, dot_product for the similarity).
- For knnPerfTest, I adjust it to look at the train and test set I just built, and adjust whatever settings I care about.

Pro tip: build your index just once (via the `reindex` parameter in `knnPerfTest`), then run your candidate vs baseline queries against that same index, which is WAY faster.
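
A quick way to sanity-check the three-checkout layout described above before running anything is a small script like the one below. This is only a sketch: the helper name `missing_checkouts` is hypothetical and not part of luceneutil, and it checks nothing beyond whether the `candidate`, `baseline`, and `util` directories exist under a common parent.

```python
from pathlib import Path

# Directory names come from the layout in the comment above.
# The helper itself is a hypothetical illustration, not a luceneutil API.
EXPECTED_DIRS = ("candidate", "baseline", "util")


def missing_checkouts(common_parent):
    """Return the expected checkout directories that are absent under common_parent."""
    parent = Path(common_parent)
    return [name for name in EXPECTED_DIRS if not (parent / name).is_dir()]
```

Calling `missing_checkouts("<common_parent_path>")` with your own parent path returns an empty list when all three checkouts are in place, and otherwise the names that still need to be cloned.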