benwtrent commented on issue #13611: URL: https://github.com/apache/lucene/issues/13611#issuecomment-3347751096
@david-sitsky Missing one document for one query with top-k=100 is 99% recall for that query, which is really, really good. If you are missing 5 or more of the true top-k for many queries (recall <95%), that might be concerning.

Have you sampled a couple hundred queries and calculated the overall average recall? That is the best way to benchmark.

If you want to always have the best vector, no matter what, all the time, that means 100% recall, which gets very expensive, and approximate kNN may not be the best thing for you.

> Also, the parent doc in this instance only has a single nested...

I wouldn't expect that to be a significant factor. You could try removing the parent/join thing, but I suspect you will run into other problems where you have to oversample a bunch of child docs, as I expect multiple child documents to be similarly near the same query vector.
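
For reference, a minimal sketch of the recall calculation described above. It assumes you already have, for each sampled query, the exact top-k doc ids (for example from a brute-force scan) and the ids returned by the approximate kNN search. The function names `recall_at_k` and `average_recall` and the sample ids are hypothetical, for illustration only, and are not part of Lucene.

```python
from typing import Sequence


def recall_at_k(exact_ids: Sequence[int], approx_ids: Sequence[int]) -> float:
    """Fraction of the true top-k ids that the approximate search also returned."""
    truth = set(exact_ids)
    if not truth:
        raise ValueError("exact_ids must not be empty")
    return len(truth & set(approx_ids)) / len(truth)


def average_recall(
    exact: Sequence[Sequence[int]], approx: Sequence[Sequence[int]]
) -> float:
    """Mean per-query recall over a sample of queries."""
    if not exact or len(exact) != len(approx):
        raise ValueError("need one approximate result list per exact result list")
    return sum(recall_at_k(e, a) for e, a in zip(exact, approx)) / len(exact)


# Hypothetical data: one query with k=100 where a single true neighbor is missed.
exact = list(range(100))
approx_miss_one = list(range(99)) + [1000]
print(recall_at_k(exact, approx_miss_one))  # 0.99

# Missing 5 of the true top 100 gives 95% recall for that query.
approx_miss_five = list(range(95)) + [1000, 1001, 1002, 1003, 1004]
print(recall_at_k(exact, approx_miss_five))  # 0.95
```

Averaging `recall_at_k` over a few hundred sampled queries with `average_recall` gives the overall number to compare against whatever recall target you choose.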
