zacharymorn commented on pull request #180:
URL: https://github.com/apache/lucene/pull/180#issuecomment-877530400


   > I think my preference is for you to commit this PR as-is after all. It 
introduces a new TermVectors interface (not great usability) but I think in 
LUCENE-10018 it would be reasonable to have TermVectors abstraction change to 
_be_ what Fields is today for the TV use-case, with some modifications. TV 
would look like:
   > 
   > ```
   > interface TermVectors {
   >   Collection<String> termVectorFields(int docId) throws IOException;
   >   Terms termVectorTerms(int docId, String fieldName) throws IOException;
   > }
   > ```
   > 
   > Since TermVectorsReader will be a thread-safe clone of the original 
(unlike 8x), it's then reasonable for it to cache the current TV info for the 
most recently accessed doc. So if you get the Terms for all fields for the 
current doc, it won't have to re-read anything. WDYT?
   
   I think this is a great idea! I like that it removes the need of having an 
extra class, and also mirrors `Fields` 's APIs and inheritance relationship 
with `FieldsProducer` for TV usage. In terms of avoiding re-read, in addition 
to caching, I guess we can also add some java doc to recommend calling 
`termVectorTerms` immediately after `termVectorFields` for the same doc id 
before jumping to the next, so that users / applications won't accidentally 
call them in the wrong way and encounter performance issue?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to