jpountz opened a new issue, #13179: URL: https://github.com/apache/lucene/issues/13179
### Description

Currently, Lucene's I/O concurrency is bound by the search concurrency. If `IndexSearcher` runs on N threads, then Lucene will never perform more than N I/Os concurrently. Unless you significantly overprovision your search thread pool, which is bad for other reasons, Lucene will bottleneck on I/O latency without even maxing out the IOPS of the host.

I don't think that Lucene should fully embrace asynchrony in its APIs, or query evaluation would become overly complicated. But I still expect that we have a lot of room for improvement to allow each search thread to perform multiple I/Os concurrently under the hood when needed.

Some examples:
 - When running a query on two terms, e.g. `apache OR lucene`, could the I/O lookups in the `tim` file (terms dictionary) be performed concurrently for both terms?
 - When running a query on two terms and the start offsets in the `doc` file (postings) have been resolved, could we start loading the first bytes of these postings lists from disk concurrently?
 - When fetching the top N=100 stored documents that match a query, could we load bytes from the `fdt` file (stored fields) for all these documents concurrently?

This would require API changes in our `Directory` APIs, and in some low-level `IndexReader` APIs (`TermsEnum`, `StoredFieldsReader`?).
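To make the idea concrete, here is a minimal sketch of one thread issuing several reads before waiting on any of them, using only the JDK's `AsynchronousFileChannel`. It is not a proposal for the actual `Directory` API: the class `ConcurrentReadsSketch` and the method `readAll` are hypothetical names, and the offsets stand in for whatever positions in the `tim`, `doc` or `fdt` file a search thread has already resolved.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper for illustration only; this is not a Lucene API. */
final class ConcurrentReadsSketch {

  /**
   * Reads up to {@code length} bytes at each offset. All reads are submitted
   * before the calling thread waits on any of them, so a single caller can
   * have several I/Os in flight at once.
   */
  static List<byte[]> readAll(Path file, long[] offsets, int length)
      throws IOException, InterruptedException, ExecutionException {
    try (AsynchronousFileChannel channel =
        AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
      List<ByteBuffer> buffers = new ArrayList<>();
      List<Future<Integer>> pending = new ArrayList<>();

      // Phase 1: submit every read without blocking on the result.
      for (long offset : offsets) {
        ByteBuffer buffer = ByteBuffer.allocate(length);
        buffers.add(buffer);
        pending.add(channel.read(buffer, offset));
      }

      // Phase 2: collect results in submission order.
      List<byte[]> results = new ArrayList<>();
      for (int i = 0; i < pending.size(); i++) {
        // A read returns -1 at end of file and may return fewer bytes than
        // requested; this sketch keeps whatever arrived and does not retry.
        int bytesRead = Math.max(0, pending.get(i).get());
        ByteBuffer buffer = buffers.get(i);
        buffer.flip();
        byte[] bytes = new byte[bytesRead];
        buffer.get(bytes);
        results.add(bytes);
      }
      return results;
    }
  }
}
```

A caller would invoke something like `ConcurrentReadsSketch.readAll(path, new long[] {0L, 4096L}, 512)` from its search thread. How much real I/O parallelism this yields depends on the platform, since the JDK may back `AsynchronousFileChannel` with an internal thread pool, so treat it as an illustration of the submit-then-wait shape rather than of a specific mechanism.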