martijnvg opened a new issue, #14881: URL: https://github.com/apache/lucene/issues/14881
### Description

Today, when index sorting is enabled and stored fields get flushed, `SortingStoredFieldsConsumer` is used to store stored fields in the order defined by the index sort. This class writes temp files to disk that are then read completely twice. The first read performs an integrity check. The second read accesses the temp files in random order, so that stored fields can be written to the new segment in the order defined by index sorting.

During heavy indexing, reading the stored fields temp files twice is expensive, especially given that these temp files are removed once flushing has completed.

In other formats (postings, BKD tree, quantized vectors), temp files created during writing seem to be read only once. While reading, integrity is checked either with `Directory#openChecksumInput()` (only possible if the temp file is read sequentially from beginning to end) or with a footer check (which reads and validates the CRC, footer magic and algorithm id).

I wonder whether it makes sense to remove the separate full [integrity check](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SortingStoredFieldsConsumer.java#L106) in `SortingStoredFieldsConsumer`. This check can be quite costly, especially for the temp fdt file, and `Lucene90CompressingStoredFieldsReader` already does some light integrity checking via `CodecUtil.checkFooter(...)`.
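To make the cost difference concrete, here is a minimal sketch that uses only the JDK, not Lucene's API. It assumes a hypothetical temp-file layout loosely modeled on a codec footer: the payload, then a 4-byte footer magic (an arbitrary value here), a 4-byte algorithm id and an 8-byte CRC-32 computed over everything before the checksum. `fullCheck` stands in for a separate integrity pass that reads every byte, `footerCheck` only reads the last 16 bytes and validates their structure without recomputing the CRC, and `readVerified` shows the single-pass approach where the checksum is computed while the data is consumed sequentially (similar in spirit to reading through `Directory#openChecksumInput()`). All class, method and constant names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class TempFileChecks {
    // Hypothetical footer layout: magic (4 bytes) + algorithm id (4 bytes) + CRC-32 stored in a long (8 bytes).
    public static final int FOOTER_MAGIC = 0x7E57F00D; // arbitrary value for this sketch
    public static final int FOOTER_LENGTH = 16;

    /** Writes payload + footer; the CRC covers the payload, the magic and the algorithm id. */
    public static byte[] writeWithFooter(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + FOOTER_LENGTH);
        buf.put(payload).putInt(FOOTER_MAGIC).putInt(0); // algorithm id 0 = CRC-32 here
        CRC32 crc = new CRC32();
        crc.update(buf.array(), 0, buf.position());
        buf.putLong(crc.getValue());
        return buf.array();
    }

    /** Light check: reads only the footer and does not recompute the CRC over the data. */
    public static boolean footerCheck(byte[] file) {
        if (file.length < FOOTER_LENGTH) return false;
        ByteBuffer footer = ByteBuffer.wrap(file, file.length - FOOTER_LENGTH, FOOTER_LENGTH);
        int magic = footer.getInt();
        int algorithmId = footer.getInt();
        long checksum = footer.getLong();
        // A CRC-32 value fits in 32 bits, so any high bits indicate a corrupt footer.
        return magic == FOOTER_MAGIC && algorithmId == 0 && (checksum & 0xFFFFFFFF00000000L) == 0;
    }

    /** Separate integrity pass: reads every byte once only to recompute and compare the CRC. */
    public static boolean fullCheck(byte[] file) {
        if (!footerCheck(file)) return false;
        int checksumOffset = file.length - Long.BYTES;
        CRC32 crc = new CRC32();
        crc.update(file, 0, checksumOffset);
        long stored = ByteBuffer.wrap(file, checksumOffset, Long.BYTES).getLong();
        return crc.getValue() == stored;
    }

    /** Single sequential pass: consumes the payload and verifies the CRC as a side effect. */
    public static byte[] readVerified(byte[] file) {
        if (!footerCheck(file)) throw new IllegalStateException("corrupt footer");
        CheckedInputStream checked = new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        try (DataInputStream in = new DataInputStream(checked)) {
            byte[] payload = new byte[file.length - FOOTER_LENGTH];
            in.readFully(payload); // the actual read of the data
            in.readInt();          // footer magic, already validated
            in.readInt();          // algorithm id, already validated
            long computed = checked.getChecksum().getValue(); // CRC of everything read so far
            long stored = in.readLong();
            if (computed != stored) throw new IllegalStateException("checksum mismatch");
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithFooter("stored fields".getBytes(StandardCharsets.UTF_8));
        System.out.println(footerCheck(file) + " " + fullCheck(file)); // true true
        byte[] corrupted = file.clone();
        corrupted[0] ^= 1; // flip one payload bit, leaving the footer intact
        System.out.println(footerCheck(corrupted) + " " + fullCheck(corrupted)); // true false
    }
}
```

In this sketch, `readVerified` gets full verification without an extra pass because it reads the file from beginning to end. The temp fdt file is read in random order, so the CRC cannot be fused into that read in the same way; the options are a separate full pass (like `fullCheck`) or a cheaper footer-only check (like `footerCheck`), which here does not detect a flipped payload bit.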