[GitHub] [lucene] rmuir opened a new pull request, #11998: Migrate away from per-segment-per-threadlocals on SegmentReader

GitBox Mon, 05 Dec 2022 19:15:29 -0800


rmuir opened a new pull request, #11998:
URL: https://github.com/apache/lucene/pull/11998


   Currently the stored fields and term vectors APIs on the index are "stateless".
   Unlike the other parts of the API, users don't obtain an iterator or enumerator; they just do things like:
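   ```java
   // no iterator: each call stands alone, and threadlocals hide the per-thread state
   Document doc0 = indexReader.document(0);
   Document doc1 = indexReader.document(1);
   // ... (up to potentially thousands of docs, because users do that)
   ```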
   
   Instead of adding a real iterator, threadlocals were added to avoid having to clone() the reader for every document. For example, this reduces the number of NIOFS buffer refills.
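   The following is a minimal sketch of the threadlocal pattern described above. `ClonedInput` and `ThreadLocalDocReader` are hypothetical stand-ins, not Lucene's actual SegmentReader internals. Each thread clones the input once and then reuses that clone for every `document(n)` call, instead of cloning per document.

   ```java
   import java.util.concurrent.atomic.AtomicInteger;

   // Hypothetical per-segment input that is expensive to clone
   // (in Lucene this role is played by something like an IndexInput with its own buffer).
   class ClonedInput {
     static final AtomicInteger CLONES = new AtomicInteger();
     final byte[] buffer = new byte[8192]; // per-clone read buffer

     ClonedInput cloneInput() {
       CLONES.incrementAndGet();
       return new ClonedInput();
     }

     String read(int docId) {
       return "doc-" + docId; // placeholder for decoding a stored document
     }
   }

   // Hypothetical "stateless" reader: callers just call document(n),
   // and a ThreadLocal hides the per-thread clone.
   class ThreadLocalDocReader {
     private final ClonedInput original = new ClonedInput();
     private final ThreadLocal<ClonedInput> perThread =
         ThreadLocal.withInitial(original::cloneInput);

     String document(int docId) {
       return perThread.get().read(docId); // clone is created at most once per thread
     }
   }
   ```

   In this sketch, a thousand `document(n)` calls on one thread create a single clone; every other thread that calls in gets its own.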
   
   But this API, a holdover from an earlier time, only gets worse these days, because the implementations are more complicated: they do block compression, use dictionaries, etc.
   
   The threadlocals in SegmentReader can cause memory issues if you have tons of segments, tons of threads, or especially both. Plenty of Java developers seem to run into this.
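   Here is a rough model of why this blows up. It assumes, hypothetically, that each segment keeps one buffer per thread in a ThreadLocal. Once every thread has touched every segment, the allocated bytes are segments × threads × buffer size. `SegmentWithThreadLocal` and `ThreadLocalFootprint` are illustrative names, not Lucene classes, and the buffer size is an arbitrary parameter.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.atomic.AtomicLong;

   // Hypothetical segment whose reader keeps one buffer per thread in a ThreadLocal.
   class SegmentWithThreadLocal {
     static final AtomicLong BYTES_ALLOCATED = new AtomicLong();
     private final ThreadLocal<byte[]> buffer;

     SegmentWithThreadLocal(int bufferSize) {
       this.buffer = ThreadLocal.withInitial(() -> {
         BYTES_ALLOCATED.addAndGet(bufferSize); // counts every per-thread allocation
         return new byte[bufferSize];
       });
     }

     void fetchDocument() {
       buffer.get(); // first access on a given thread allocates that thread's buffer
     }
   }

   class ThreadLocalFootprint {
     // Every thread touches every segment, as a search across the whole index would.
     static long simulate(int segments, int threads, int bufferSize) {
       SegmentWithThreadLocal.BYTES_ALLOCATED.set(0);
       List<SegmentWithThreadLocal> index = new ArrayList<>();
       for (int s = 0; s < segments; s++) {
         index.add(new SegmentWithThreadLocal(bufferSize));
       }
       List<Thread> workers = new ArrayList<>();
       for (int t = 0; t < threads; t++) {
         Thread worker = new Thread(() -> index.forEach(SegmentWithThreadLocal::fetchDocument));
         workers.add(worker);
         worker.start();
       }
       for (Thread worker : workers) {
         try {
           worker.join();
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
           throw new IllegalStateException(e);
         }
       }
       return SegmentWithThreadLocal.BYTES_ALLOCATED.get();
     }
   }
   ```

   For example, 50 segments, 8 threads and 16 KiB buffers give 50 × 8 × 16,384 = 6,553,600 bytes.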
   
   I propose we deprecate these APIs and let users obtain the iterator themselves, e.g. once per search, without any threadlocal:
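   ```java
   // obtain the stored fields "iterator" once, e.g. per search
   StoredFields fields = indexReader.storedFields();
   for (ScoreDoc hit : results.scoreDocs) {
     Document doc = fields.document(hit.doc);
     // ... do something with doc
   }
   // now fields can be gc'd
   ```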
   This still reuses the data structures when someone fetches thousands and thousands of hits, but avoids the threadlocal pain.
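   The following is a minimal sketch of the proposed shape. It uses hypothetical classes (`HypotheticalReader`, `PerSearchFields`, `PerSearchExample`), not the PR's actual `StoredFields` implementation. The caller obtains one object per search, that object carries the reusable state, and nothing is stored in a ThreadLocal.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical per-search fetcher: the caller creates it explicitly and it owns
   // its reusable state, so no ThreadLocal is involved.
   class PerSearchFields {
     static int instancesCreated = 0;
     private final byte[] buffer = new byte[8192]; // reusable state, allocated once per instance

     PerSearchFields() {
       instancesCreated++;
     }

     String document(int docId) {
       // placeholder: a real implementation would decode the document using `buffer`
       return "doc-" + docId;
     }
   }

   class HypotheticalReader {
     PerSearchFields storedFields() {
       return new PerSearchFields();
     }
   }

   class PerSearchExample {
     static List<String> fetchHits(HypotheticalReader reader, int[] hits) {
       PerSearchFields fields = reader.storedFields(); // once per search
       List<String> docs = new ArrayList<>();
       for (int hit : hits) {
         docs.add(fields.document(hit)); // the same PerSearchFields instance serves every hit
       }
       return docs; // `fields` is unreachable after this returns and can be GC'd
     }
   }
   ```

   Because the reusable state is owned by the returned object rather than by the reader, it becomes garbage once the search is done, regardless of how many threads or segments exist.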
   
   NOTES: this is just a draft to demonstrate the idea. I'm not sure I have the resources to see it through, since it is a lot of labor.
   The old APIs are still here, now deprecated: if you call the deprecated methods, threadlocals are used just as before. If you don't call any deprecated APIs, no threadlocals are used.
   
   I didn't cut over all the tests (that would be an enormous effort), and some tests will fail with `java.lang.UnsupportedOperationException: deprecated document access is not supported`. That's because I don't want threadlocal nonsense in CodecReader just to support deprecated methods: it belongs only in SegmentReader, and CodecReader should stay clean. Unfortunately, lots of tests like to wrap their readers with codec/merging readers, so those tests need to be migrated off the deprecated APIs before they will pass again.
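   The split described above can be sketched as follows. All class names here (`SketchCodecReader`, `SketchSegmentReader`, `SketchWrappingReader`, `DocFetcher`) are hypothetical and are not the PR's actual code. The base class throws for the deprecated method, and only the segment-level reader keeps a threadlocal to support it. A wrapping reader inherits the throwing behavior.

   ```java
   // Hypothetical explicit per-caller fetcher (stand-in for a StoredFields-like object).
   interface DocFetcher {
     String document(int docId);
   }

   // Hypothetical clean base reader: only the explicit API is supported.
   abstract class SketchCodecReader {
     abstract DocFetcher storedFields();

     @Deprecated
     String document(int docId) {
       throw new UnsupportedOperationException("deprecated document access is not supported");
     }
   }

   class SketchSegmentReader extends SketchCodecReader {
     // Only this class holds a ThreadLocal; a per-thread value is created
     // only when the deprecated method is actually called.
     private final ThreadLocal<DocFetcher> legacy = ThreadLocal.withInitial(this::storedFields);

     @Override
     DocFetcher storedFields() {
       return docId -> "doc-" + docId;
     }

     @Deprecated
     @Override
     String document(int docId) {
       return legacy.get().document(docId);
     }
   }

   // Hypothetical wrapper of the kind tests use: it extends the base reader,
   // so deprecated calls fail until the caller switches to storedFields().
   class SketchWrappingReader extends SketchCodecReader {
     private final SketchCodecReader in;

     SketchWrappingReader(SketchCodecReader in) {
       this.in = in;
     }

     @Override
     DocFetcher storedFields() {
       return in.storedFields();
     }
   }
   ```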
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


For additional commands, e-mail: issues-h...@lucene.apache.org
