mikemccand opened a new issue, #12487: URL: https://github.com/apache/lucene/issues/12487
### Description

Over in https://github.com/mikemccand/luceneutil/issues/226, while trying to fix a sneaky and long-standing Lucene nightly benchmark non-determinism that affected `VectorSearch` and some `*TaxoFacets` performance measures, I struggled and failed/cheated to pick which `VectorSearch` queries to keep for disambiguation.

The tasks file has:

```
VectorSearch: vector//publisher backstory # freq=194856 freq=148
VectorSearch: vector//many geografia # freq=99550 freq=104
VectorSearch: vector//many foundation # freq=99550 freq=10894
VectorSearch: vector//this school # freq=238551 freq=29912
VectorSearch: vector//such 2007 # freq=111526 freq=90200 1.2
VectorSearch: vector//year work # freq=175324 freq=102732 1.7
VectorSearch: vector//interviews # freq=31768
VectorSearch: vector//golf # freq=31760
VectorSearch: vector//http # freq=389790
```

The benchy then computes embeddings from each of these lexical terms, and creates a `KnnFloatVectorQuery` for each. But then later, if something goes wrong, the `toString` of these queries just renders the first dimension float:

```
TASK: cat=VectorSearch q=KnnFloatVectorQuery:vector[0.02625591,...][100] s=null group=null hits=100 facets=[]
```

I realize from the machine's standpoint it really is only this vector that "matters", but we humans still think in terms of words (so far, anyways, heh).

Could we maybe allow for an optional opaque string, not counting towards `hashCode`/`equals`/etc., that is then regurgitated back out in `toString` to help us humans who still need to interact with the machines?

If we had this, I could have made the correct fix over in https://github.com/mikemccand/luceneutil/issues/226 to try to gain back some continuity in the vector nightly charts. But instead I just picked the top 5 vector queries, which is most likely wrong.

Also, there is precedent in Lucene for such "opaque for-human strings": the `String resourceDescription` passed to the base `IndexInput` constructor.
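To make the request concrete, here is a minimal standalone sketch of the idea. It is not Lucene's API: the class `DescribedVectorQuery` and its `description` field are hypothetical names invented for illustration, and the class deliberately does not extend any Lucene type. It only shows an opaque label that appears in `toString` while being ignored by `equals` and `hashCode`.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for a KNN vector query; NOT a Lucene class.
final class DescribedVectorQuery {
  private final String field;
  private final float[] target;
  private final int k;
  // Opaque, human-only label (assumption: may be null when not supplied).
  private final String description;

  DescribedVectorQuery(String field, float[] target, int k, String description) {
    if (target.length == 0) {
      throw new IllegalArgumentException("target vector must not be empty");
    }
    this.field = Objects.requireNonNull(field);
    this.target = target.clone(); // defensive copy
    this.k = k;
    this.description = description;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof DescribedVectorQuery)) return false;
    DescribedVectorQuery other = (DescribedVectorQuery) o;
    // The description is intentionally left out of the comparison.
    return k == other.k
        && field.equals(other.field)
        && Arrays.equals(target, other.target);
  }

  @Override
  public int hashCode() {
    // The description is intentionally left out of the hash as well.
    return Objects.hash(field, k, Arrays.hashCode(target));
  }

  @Override
  public String toString() {
    // Same shape as the rendering quoted above: first dimension only, then k.
    String base = "DescribedVectorQuery:" + field + "[" + target[0] + ",...][" + k + "]";
    // Append the human-readable label only when one was given.
    return description == null ? base : base + " (" + description + ")";
  }
}
```

With something along these lines, two queries built from the same vector stay equal whether or not a label is attached, so the label should not interfere with anything that relies on query equality, while a failing task's log line could still show the original words (for example `publisher backstory`).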