Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

via GitHub Fri, 02 May 2025 06:32:54 -0700


jpountz commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071603158



##########
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##########
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
     return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {
+    if (context.context() == IOContext.Context.MERGE
+        || context.context() == IOContext.Context.FLUSH) {
+      return ReadAdvice.SEQUENTIAL;
+    }
+
+    if (context.hints().contains(DataAccessHint.RANDOM)) {
+      return ReadAdvice.RANDOM;
+    }
+    if (context.hints().contains(DataAccessHint.SEQUENTIAL)) {
+      return ReadAdvice.SEQUENTIAL;
+    }
+
+    if (context.hints().contains(FileTypeHint.DATA)
+        || context.hints().contains(FileTypeHint.INDEX)) {
+      return ReadAdvice.NORMAL;
+    }
+    // Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL 
to perform
+    // readahead.
+    if (context.hints().contains(FileDataHint.POSTINGS)) {
+      return ReadAdvice.NORMAL;
+    }

Review Comment:
   I feel like this default implementation should be dumb and trust the 
`DataAccessHint`, and the `FileDataHint` hint should only be used by users to 
override defaults when they know better about their data / access pattern?



##########
lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java:
##########
@@ -72,7 +72,7 @@ public ChecksumIndexInput openChecksumInput(String name) 
throws IOException {
 
   @Override
   public IndexInput openInput(String name, IOContext context) throws 
IOException {
-    ReadAdvice readAdvice = 
context.readAdvice().orElse(Constants.DEFAULT_READADVICE);
+    ReadAdvice readAdvice = MMapDirectory.toReadAdvice(context);

Review Comment:
   I would expect this class to make the following assumptions
    - METADATA implies that the file is read once and its content is loaded 
into heap,
    - INDEX implies that the file should fit in the page cache, so count only 
one I/O (potentially per block) at open time
    - DATA implies that the file may be larger than the page cache, so it needs 
to increment the counter when moving to a different page.
   
   (Related to my other comment about the default implementation in 
`MMapDirectory` needing to be simple and not looking at the kind of data it's 
working on.)
   
   Currently, the test tries to differenciate between random-access data files 
and sequential-access data files, I'd be happy with either trusting the 
`DataAccessHint` or always assuming sequential access.



##########
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundReader.java:
##########
@@ -74,7 +74,7 @@ public Lucene50CompoundReader(Directory directory, 
SegmentInfo si) throws IOExce
     }
     expectedLength += CodecUtil.footerLength();
 
-    handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withReadAdvice(ReadAdvice.NORMAL));
+    handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withHints(FileTypeHint.DATA));

Review Comment:
   `FileTypeHint.DATA` feels confusing as this file has `METADATA` and `INDEX` 
sub files?



##########
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -149,9 +150,9 @@ public Lucene101PostingsReader(SegmentReadState state) 
throws IOException {
         IndexFileNames.segmentFileName(
             state.segmentInfo.name, state.segmentSuffix, 
Lucene101PostingsFormat.DOC_EXTENSION);
     try {
-      // Postings have a forward-only access pattern, so pass 
ReadAdvice.NORMAL to perform
-      // readahead.
-      docIn = state.directory.openInput(docName, 
state.context.withReadAdvice(ReadAdvice.NORMAL));
+      docIn =
+          state.directory.openInput(
+              docName, state.context.withHints(FileTypeHint.DATA, 
FileDataHint.POSTINGS));

Review Comment:
   Should this call also pass a `DataAccessHint`? The fact that it's sometimes 
provided and sometimes not is a bit confusing to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

Reply via email to