msokolov commented on issue #13565: URL: https://github.com/apache/lucene/issues/13565#issuecomment-2301788507
Hi, thanks for that @jpountz, no worries; this was something we all agreed on. I'm able to continue with the "research" part of this by simply increasing the heap size, so it's not a blocker. At the same time, I think we might want to reintroduce random-access vector readers as a first-class API for other reasons. Even the current case of merging multiple large segments containing vectors would be affected by this, wouldn't it? Since SortingCodecReader is used by IndexWriter when merging sorted indexes, in that case all the vector data of the segments being merged is held in RAM, potentially requiring quite a lot of it, when we could instead read from "disk" at the cost of some random accesses. I guess random disk accesses are generally to be avoided, but given that the alternative is to "page in" every vector page to the heap, I would think we would prefer to let the OS do the paging, as it usually does for our index data.
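To make the trade-off concrete, here is a rough sketch of what "let the OS do the paging" could look like for reading vectors by ordinal. It is only an illustration in plain Java NIO, not Lucene's API: the class `MappedVectorFile`, its `write` helper, and the flat little-endian file layout are all hypothetical and chosen for this example. The point is that a memory-mapped file lets us fetch one vector at a time, so the only heap cost per read is a single `float[]`, while the OS page cache decides what stays resident.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

/** Hypothetical illustration only; this is not Lucene's vector reader API. */
public class MappedVectorFile {
    private final FloatBuffer floats; // view over the mapped (off-heap) region
    private final int dimension;

    public MappedVectorFile(Path file, int dimension) throws IOException {
        this.dimension = dimension;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A single mapping is limited to 2 GB; a real reader would map in chunks.
            // The mapping stays valid after the channel is closed.
            this.floats = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                    .order(ByteOrder.LITTLE_ENDIAN)
                    .asFloatBuffer();
        }
    }

    /** Number of vectors stored in the file. */
    public int size() {
        return floats.capacity() / dimension;
    }

    /** Random access by ordinal: copies just one vector onto the heap. */
    public float[] vectorValue(int ord) {
        float[] out = new float[dimension];
        FloatBuffer view = floats.duplicate(); // independent position, shared content
        view.position(ord * dimension);
        view.get(out);
        return out;
    }

    /** Helper for the demo: writes vectors back to back as little-endian floats. */
    public static void write(Path file, float[][] vectors) throws IOException {
        int dim = vectors.length == 0 ? 0 : vectors[0].length;
        ByteBuffer buf = ByteBuffer.allocate(vectors.length * dim * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float[] v : vectors) {
            for (float f : v) {
                buf.putFloat(f);
            }
        }
        Files.write(file, buf.array());
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("vectors", ".bin");
        tmp.toFile().deleteOnExit();
        write(tmp, new float[][] {{1f, 2f}, {3f, 4f}, {5f, 6f}});

        MappedVectorFile reader = new MappedVectorFile(tmp, 2);
        // Read in an arbitrary (e.g. sorted-merge) order without loading the whole file.
        System.out.println(Arrays.toString(reader.vectorValue(2)));
        System.out.println(Arrays.toString(reader.vectorValue(0)));
    }
}
```

The contrast with the heap-resident approach is that nothing here scales with the number of vectors except the mapping itself, which lives outside the Java heap. Whether the resulting random accesses are acceptable during a sorted merge is exactly the open question above.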