rmuir opened a new pull request, #11998:
URL: https://github.com/apache/lucene/pull/11998
Currently the stored fields and term vectors APIs on the index are
"stateless".
Unlike the other parts of the API, they give users no iterator or
enumerator to hold on to; users just make calls like:
```
indexReader.document(0);
indexReader.document(1);
// ... (up to potentially thousands of docs because lusers do that)
```
Instead of adding a real iterator, threadlocals were added to avoid
having to clone() the reader on every document. For example, this can reduce
the number of NIOFS buffer refills and so on.
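A minimal sketch of that threadlocal pattern, for readers who haven't seen it. The `ToyFieldsReader` and `ToySegmentReader` classes are hypothetical stand-ins, not Lucene classes; the only point is that a stateless `document(int)` call can hide a per-thread clone that is created once and then reused.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalReaderSketch {
  /** Hypothetical stand-in for a codec's stored fields reader; not a Lucene class. */
  static final class ToyFieldsReader {
    ToyFieldsReader copy() {
      // A real clone would also duplicate file handles, buffers, decompression state, etc.
      return new ToyFieldsReader();
    }

    String document(int docID) {
      return "doc-" + docID;
    }
  }

  /** Hypothetical stand-in for a segment reader exposing a stateless document(int). */
  static final class ToySegmentReader {
    final AtomicInteger clones = new AtomicInteger();
    private final ToyFieldsReader original = new ToyFieldsReader();
    // One clone per calling thread, created lazily on that thread's first call.
    private final ThreadLocal<ToyFieldsReader> perThread =
        ThreadLocal.withInitial(() -> {
          clones.incrementAndGet();
          return original.copy();
        });

    String document(int docID) {
      return perThread.get().document(docID);
    }
  }

  public static void main(String[] args) {
    ToySegmentReader reader = new ToySegmentReader();
    for (int i = 0; i < 1000; i++) {
      reader.document(i);
    }
    // A single thread made 1000 calls but only one clone was created.
    System.out.println("clones made: " + reader.clones.get()); // prints "clones made: 1"
  }
}
```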
But this old API, designed in an earlier era, only gets worse these days,
because the implementations are more complicated and do block compression,
dictionaries, etc.
The threadlocals in SegmentReader can cause memory issues if you have tons
of segments, tons of threads, or especially both. It seems plenty of Java
developers can't help but run into it.
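To make the "segments times threads" growth concrete, here is a toy simulation. `ToySegment` and `clonesCreated` are hypothetical names and nothing here touches Lucene; each toy segment just keeps a per-thread object the way the threadlocal approach would.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalGrowthSketch {
  /** Hypothetical segment that keeps one cached object per calling thread. */
  static final class ToySegment {
    private final ThreadLocal<Object> perThread;

    ToySegment(AtomicInteger clones) {
      this.perThread = ThreadLocal.withInitial(() -> {
        clones.incrementAndGet();
        return new Object(); // stands in for a cloned reader with its buffers
      });
    }

    void document(int docID) {
      perThread.get();
    }
  }

  /** Every thread touches every segment once; returns how many per-thread objects got created. */
  static int clonesCreated(int segments, int threads) {
    AtomicInteger clones = new AtomicInteger();
    List<ToySegment> index = new ArrayList<>();
    for (int s = 0; s < segments; s++) {
      index.add(new ToySegment(clones));
    }
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (ToySegment segment : index) {
          segment.document(0);
        }
      });
      workers.add(worker);
      worker.start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return clones.get();
  }

  public static void main(String[] args) {
    System.out.println(clonesCreated(50, 32)); // prints 1600
  }
}
```

In this sketch the worker threads exit, so their thread-local values become collectible. With a long-lived thread pool the threads stay alive, and each of those per-thread objects stays reachable for as long as its thread and its segment do.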
I propose we deprecate these APIs and let the user get the iterator
themselves, e.g. per search, without any threadlocal.
```
StoredFields fields = indexReader.storedFields();
for (int docID : hitDocIDs) { // hitDocIDs: hypothetical array of matching doc IDs
  Document doc = fields.document(docID);
  // do something with doc
}
// now fields can be gc'd
```
It will reuse the data structures if someone has thousands and thousands of
hits, but avoid the threadlocal pain.
NOTES: this is just a draft to demonstrate the idea. I'm not sure I have the
resources to see it through, since it is a lot of labor.
The old APIs are still here, now deprecated, and if you use the deprecated
methods, you will use threadlocals just like before. But if you don't call
deprecated APIs, then no threadlocals are used.
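A rough sketch of how the two paths could coexist, under the assumption that the deprecated method simply delegates to a lazily initialized threadlocal. `ToyStoredFields` and `ToySegmentReader` are hypothetical names, not the classes in this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class DeprecatedPathSketch {
  /** Hypothetical stand-in for the per-caller stored fields instance. */
  static final class ToyStoredFields {
    String document(int docID) {
      return "doc-" + docID;
    }
  }

  /** Hypothetical reader offering both the new accessor and the old stateless call. */
  static final class ToySegmentReader {
    final AtomicInteger threadLocalInits = new AtomicInteger();
    // Only populated if some thread actually calls the deprecated method.
    private final ThreadLocal<ToyStoredFields> legacy =
        ThreadLocal.withInitial(() -> {
          threadLocalInits.incrementAndGet();
          return new ToyStoredFields();
        });

    /** New-style access: the caller owns the instance and drops it when done. */
    ToyStoredFields storedFields() {
      return new ToyStoredFields();
    }

    /** Old-style access: backed by the threadlocal, as before. */
    @Deprecated
    String document(int docID) {
      return legacy.get().document(docID);
    }
  }

  public static void main(String[] args) {
    ToySegmentReader reader = new ToySegmentReader();
    ToyStoredFields fields = reader.storedFields();
    fields.document(0);
    System.out.println(reader.threadLocalInits.get()); // prints 0
    reader.document(0);
    System.out.println(reader.threadLocalInits.get()); // prints 1
  }
}
```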
I didn't cut over all tests (which would be an enormous effort), and some
tests will fail with
`java.lang.UnsupportedOperationException: deprecated document access is not supported`.
That's because I don't want threadlocal nonsense supporting deprecated stuff
in CodecReader (it belongs only in SegmentReader). We should keep CodecReader
clean. Unfortunately, lots of tests like to wrap their readers with
codec/merging readers, so those tests really need to be fixed to get off the
deprecated stuff before they will pass again.
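The failure mode can be sketched with toy classes. Everything below is a hypothetical model (`ToyCodecReader`, `ToySegmentReader`, `ToyWrappingReader` are not the real classes), assuming the base reader rejects the deprecated call and only the segment-level reader overrides it.

```java
public class WrappedReaderSketch {
  /** Hypothetical stand-in for the per-caller stored fields instance. */
  static final class ToyStoredFields {
    String document(int docID) {
      return "doc-" + docID;
    }
  }

  /** Hypothetical base reader: new API only, deprecated access rejected. */
  abstract static class ToyCodecReader {
    abstract ToyStoredFields storedFields();

    @Deprecated
    String document(int docID) {
      throw new UnsupportedOperationException("deprecated document access is not supported");
    }
  }

  /** Hypothetical segment reader: the one place that keeps the threadlocal for old callers. */
  static final class ToySegmentReader extends ToyCodecReader {
    private final ThreadLocal<ToyStoredFields> legacy =
        ThreadLocal.withInitial(ToyStoredFields::new);

    @Override
    ToyStoredFields storedFields() {
      return new ToyStoredFields();
    }

    @Deprecated
    @Override
    String document(int docID) {
      return legacy.get().document(docID);
    }
  }

  /** Hypothetical wrapper, like the ones tests put around their readers. */
  static final class ToyWrappingReader extends ToyCodecReader {
    private final ToyCodecReader in;

    ToyWrappingReader(ToyCodecReader in) {
      this.in = in;
    }

    @Override
    ToyStoredFields storedFields() {
      return in.storedFields();
    }
  }

  /** Returns true if the deprecated call succeeds on the given reader. */
  static boolean legacyAccessWorks(ToyCodecReader reader) {
    try {
      reader.document(0);
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    ToyCodecReader segment = new ToySegmentReader();
    ToyCodecReader wrapped = new ToyWrappingReader(segment);
    System.out.println(legacyAccessWorks(segment)); // prints true
    System.out.println(legacyAccessWorks(wrapped)); // prints false
    // The fix for such a test: go through storedFields() instead.
    System.out.println(wrapped.storedFields().document(5)); // prints doc-5
  }
}
```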