jpountz commented on code in PR #14510: URL: https://github.com/apache/lucene/pull/14510#discussion_r2071603158
########## lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java: ########## @@ -277,6 +277,32 @@ public final long getMaxChunkSize() { return 1L << chunkSizePower; } + public static ReadAdvice toReadAdvice(IOContext context) { + if (context.context() == IOContext.Context.MERGE + || context.context() == IOContext.Context.FLUSH) { + return ReadAdvice.SEQUENTIAL; + } + + if (context.hints().contains(DataAccessHint.RANDOM)) { + return ReadAdvice.RANDOM; + } + if (context.hints().contains(DataAccessHint.SEQUENTIAL)) { + return ReadAdvice.SEQUENTIAL; + } + + if (context.hints().contains(FileTypeHint.DATA) + || context.hints().contains(FileTypeHint.INDEX)) { + return ReadAdvice.NORMAL; + } + // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform + // readahead. + if (context.hints().contains(FileDataHint.POSTINGS)) { + return ReadAdvice.NORMAL; + } Review Comment: I feel like this default implementation should be dumb and trust the `DataAccessHint`, and the `FileDataHint` hint should only be used by users to override defaults when they know better about their data / access pattern? ########## lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java: ########## @@ -72,7 +72,7 @@ public ChecksumIndexInput openChecksumInput(String name) throws IOException { @Override public IndexInput openInput(String name, IOContext context) throws IOException { - ReadAdvice readAdvice = context.readAdvice().orElse(Constants.DEFAULT_READADVICE); + ReadAdvice readAdvice = MMapDirectory.toReadAdvice(context); Review Comment: I would expect this class to make the following assumptions - METADATA implies that the file is read once and its content is loaded into heap, - INDEX implies that the file should fit in the page cache, so count only one I/O (potentially per block) at open time - DATA implies that the file may be larger than the page cache, so it needs to increment the counter when moving to a different page. (Related to my other comment about the default implementation in `MMapDirectory` needing to be simple and not looking at the kind of data it's working on.) Currently, the test tries to differenciate between random-access data files and sequential-access data files, I'd be happy with either trusting the `DataAccessHint` or always assuming sequential access. ########## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundReader.java: ########## @@ -74,7 +74,7 @@ public Lucene50CompoundReader(Directory directory, SegmentInfo si) throws IOExce } expectedLength += CodecUtil.footerLength(); - handle = directory.openInput(dataFileName, IOContext.DEFAULT.withReadAdvice(ReadAdvice.NORMAL)); + handle = directory.openInput(dataFileName, IOContext.DEFAULT.withHints(FileTypeHint.DATA)); Review Comment: `FileTypeHint.DATA` feels confusing as this file has `METADATA` and `INDEX` sub files? ########## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene101/Lucene101PostingsReader.java: ########## @@ -149,9 +150,9 @@ public Lucene101PostingsReader(SegmentReadState state) throws IOException { IndexFileNames.segmentFileName( state.segmentInfo.name, state.segmentSuffix, Lucene101PostingsFormat.DOC_EXTENSION); try { - // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL to perform - // readahead. - docIn = state.directory.openInput(docName, state.context.withReadAdvice(ReadAdvice.NORMAL)); + docIn = + state.directory.openInput( + docName, state.context.withHints(FileTypeHint.DATA, FileDataHint.POSTINGS)); Review Comment: Should this call also pass a `DataAccessHint`? The fact that it's sometimes provided and sometimes not is a bit confusing to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org