zacharymorn commented on pull request #180:
URL: https://github.com/apache/lucene/pull/180#issuecomment-877530400
> I think my preference is for you to commit this PR as-is after all. It
introduces a new TermVectors interface (not great usability) but I think in
LUCENE-10018 it would be reasonable to have TermVectors abstraction change to
_be_ what Fields is today for the TV use-case, with some modifications. TV
would look like:
>
> ```
> interface TermVectors {
> Collection<String> termVectorFields(int docId) throws IOException;
> Terms termVectorTerms(int docId, String fieldName) throws IOException;
> }
> ```
>
> Since TermVectorsReader will be a thread-safe clone of the original
(unlike 8x), it's then reasonable for it to cache the current TV info for the
most recently accessed doc. So if you get the Terms for all fields for the
current doc, it won't have to re-read anything. WDYT?
I think this is a great idea! I like that it removes the need of having an
extra class, and also mirrors `Fields` 's APIs and inheritance relationship
with `FieldsProducer` for TV usage. In terms of avoiding re-read, in addition
to caching, I guess we can also add some java doc to recommend calling
`termVectorTerms` immediately after `termVectorFields` for the same doc id
before jumping to the next, so that users / applications won't accidentally
call them in the wrong way and encounter performance issue?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]