zacharymorn commented on pull request #180: URL: https://github.com/apache/lucene/pull/180#issuecomment-877530400
> I think my preference is for you to commit this PR as-is after all. It introduces a new TermVectors interface (not great usability) but I think in LUCENE-10018 it would be reasonable to have TermVectors abstraction change to _be_ what Fields is today for the TV use-case, with some modifications. TV would look like: > > ``` > interface TermVectors { > Collection<String> termVectorFields(int docId) throws IOException; > Terms termVectorTerms(int docId, String fieldName) throws IOException; > } > ``` > > Since TermVectorsReader will be a thread-safe clone of the original (unlike 8x), it's then reasonable for it to cache the current TV info for the most recently accessed doc. So if you get the Terms for all fields for the current doc, it won't have to re-read anything. WDYT? I think this is a great idea! I like that it removes the need of having an extra class, and also mirrors `Fields` 's APIs and inheritance relationship with `FieldsProducer` for TV usage. In terms of avoiding re-read, in addition to caching, I guess we can also add some java doc to recommend calling `termVectorTerms` immediately after `termVectorFields` for the same doc id before jumping to the next, so that users / applications won't accidentally call them in the wrong way and encounter performance issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org