rmuir commented on issue #14655: URL: https://github.com/apache/lucene/issues/14655#issuecomment-2875770908
> So, I suggest a radical shift in the underlying posting format. Give up log-structured format (LSM)! Give up segments. LSM aims for sequential I/O, which is no longer relevant for NVMe SSDs. Have one monolithic index. Each term initially gets a page for its posting and related data. If the term is popular, progressively assign it chunks of multiple pages. Pages/chunks are chained using forward pointers. These pointers will be stored in an on-heap (or in-memory) data structure similar to skip offsets. (Embrace random SSD I/Os.) A single request can be broken down into many random pages on an SSD, and all these pages are part of the batch for io_submit(). On the write path, similarly, gather all updates in a submit buffer and issue hundreds of random I/Os.

> Terms are no longer sorted

You're free to write your own separate search engine, but this doesn't sound like lucene to me.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org