benwtrent commented on PR #13463:
URL: https://github.com/apache/lucene/pull/13463#issuecomment-2155711517

   @gsmiller 
   
   My directories are:
   
   `<common_parent_path>/candidate` <- Lucene branch
   `<common_parent_path>/baseline` <- Lucene main
   `<common_parent_path>/util` <- lucene util
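
A quick way to confirm that layout before building is sketched below. The helper name `missing_checkouts` is hypothetical and not part of luceneutil; it only checks that the three sibling directories listed above exist under one parent.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_CHECKOUTS = ("candidate", "baseline", "util")


def missing_checkouts(common_parent):
    """Return the expected checkout names that are not directories under common_parent."""
    parent = Path(common_parent)
    return [name for name in EXPECTED_CHECKOUTS if not (parent / name).is_dir()]
```

Calling `missing_checkouts("<common_parent_path>")` with your own parent path returns an empty list when all three checkouts are in place.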
   
   Once you have the directories all set up:
   
    - `ant build` to compile whenever you adjust things and before your first run. For this particular test, I went into `KnnIndexer.java` and adjusted `WRITER_BUFFER_MB` down to 12MB.
    - `python ./src/python/knnPerfTest.py` to actually run the test, but you first need some data and need to point the script at it.
    - [cohere_download_and_format.zip](https://github.com/user-attachments/files/15745875/cohere_download_and_format.zip) contains a bash script to download a bunch of parquet files and a Python script to format them for ingestion. I think this downloads Cohere v3 (1024 dims, dot_product for the similarity).
    - For `knnPerfTest`, I then edit it to point at the train and test sets I just built, and adjust whatever settings I care about.
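
For anyone formatting their own data instead of using the attached scripts, here is a minimal sketch of the formatting step. It assumes the tester reads a headerless file of little-endian float32 values, which you should verify against the reader code in luceneutil. The function names `write_vectors` and `split_train_test` are hypothetical, and this is not the script from the zip.

```python
import struct


def split_train_test(vectors, num_test):
    """Hold out the last num_test vectors as queries; the rest are indexed."""
    if not 0 < num_test < len(vectors):
        raise ValueError("num_test must be between 1 and len(vectors) - 1")
    return vectors[:-num_test], vectors[-num_test:]


def write_vectors(path, vectors, dims):
    """Write vectors back to back as little-endian float32 with no header.

    The file layout is an assumption about what the tester expects.
    """
    with open(path, "wb") as out:
        for vec in vectors:
            if len(vec) != dims:
                raise ValueError(f"expected {dims} dims, got {len(vec)}")
            out.write(struct.pack(f"<{dims}f", *vec))
```

With this layout a file of `n` vectors is exactly `n * dims * 4` bytes, which is an easy sanity check before pointing `knnPerfTest` at it.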
   
   
   Pro tip: build your index just once (via the `reindex` parameter in `knnPerfTest`), then run both your candidate and baseline queries against it, which is WAY faster.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org