dungba88 commented on PR #12624: URL: https://github.com/apache/lucene/pull/12624#issuecomment-1828936176
I checked some of the usages in the analysis module. SynonymGraphFilter caches the `BytesReader` in its constructor, and I think TokenFilters are cached per field by default? But many other places do not have this cache, such as:

- Stemmer
- GeneratingSuggester
- ...

I think the main bottleneck is the non-trivial creation of the BufferList (in `toBufferList` or `toWriteableBufferList`) and the ByteBufferDataInput.

In the worst case, what do you think about reviving the BytesStore (with a much simpler implementation, since it would no longer need the scratch-bytes operations)? Alternatively, we could create a reusable block-based byte array DataOutput (not sure if something like that already exists).
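To make the second alternative concrete, here is a rough sketch of what I mean by a reusable block-based byte array output. It is plain Java and does not extend Lucene's `DataOutput`; the class name `ReusableBlockBuffer` and all of its methods are hypothetical, not an existing Lucene API. The idea is only that reads go straight to the blocks, so no buffer list or `ByteBufferDataInput` has to be built per reader, and `reset()` keeps the blocks for the next use.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; this is not a Lucene class. */
final class ReusableBlockBuffer {
  private final int blockSize;
  private final List<byte[]> blocks = new ArrayList<>();
  private long size; // number of valid bytes written since the last reset

  ReusableBlockBuffer(int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be positive");
    }
    this.blockSize = blockSize;
  }

  /** Appends one byte, allocating a new block only if no recycled block is available. */
  void writeByte(byte b) {
    int blockIndex = (int) (size / blockSize);
    if (blockIndex == blocks.size()) {
      blocks.add(new byte[blockSize]);
    }
    blocks.get(blockIndex)[(int) (size % blockSize)] = b;
    size++;
  }

  /** Appends length bytes from src, starting at offset. */
  void writeBytes(byte[] src, int offset, int length) {
    for (int i = 0; i < length; i++) {
      writeByte(src[offset + i]);
    }
  }

  /** Random-access read directly from the blocks, with no intermediate copy. */
  byte readByte(long pos) {
    if (pos < 0 || pos >= size) {
      throw new IndexOutOfBoundsException("pos=" + pos);
    }
    return blocks.get((int) (pos / blockSize))[(int) (pos % blockSize)];
  }

  long size() {
    return size;
  }

  int allocatedBlocks() {
    return blocks.size();
  }

  /** Forgets the written content but keeps the allocated blocks for reuse. */
  void reset() {
    size = 0;
  }
}
```

A caller would write into the buffer, read from it by position, then call `reset()` and write again without allocating new blocks. A real version would still need bulk copies and a way to expose a reader, which this sketch leaves out.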