uschindler commented on code in PR #13219:
URL: https://github.com/apache/lucene/pull/13219#discussion_r1539364622
##########
lucene/core/src/java21/org/apache/lucene/store/PosixNativeAccess.java:
##########
@@ -137,17 +136,11 @@ public void madvise(MemorySegment segment, IOContext context) throws IOException
}
private Integer mapIOContext(IOContext ctx) {
- // Merging always wins and implies sequential access, because kernel is advised to free pages
- // after use:
- if (ctx.context() == Context.MERGE) {
- return POSIX_MADV_SEQUENTIAL;
- }
- if (ctx.randomAccess()) {
- return POSIX_MADV_RANDOM;
- }
- if (ctx.readOnce()) {
- return POSIX_MADV_SEQUENTIAL;
- }
- return null;
+ return switch (ctx.readAdvice()) {
Review Comment:
I think we can remove the context from the signature and change it to
`madvise(MemorySegment, ReadAdvice)`. `MemorySegmentIndexInputProvider` would
then just pass `context.readAdvice()` to `madvise()`.
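A minimal sketch of what the body of such a `madvise(MemorySegment, ReadAdvice)` could reduce to, assuming the four `ReadAdvice` constants from this PR and the Linux values of the `POSIX_MADV_*` constants. The class name `ReadAdviceMapping` is hypothetical. `null` means "make no native call", as in the removed `mapIOContext`. Mapping `LOAD` to `POSIX_MADV_RANDOM` is an assumption that mirrors the old `IOContext.LOAD`, which appears to have been built with `randomAccess` set, so the old code advised `POSIX_MADV_RANDOM` for it.

```java
// Hypothetical sketch: mirrors the ReadAdvice enum proposed in this PR.
enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, LOAD }

final class ReadAdviceMapping {
  // Assumed Linux/glibc values of the POSIX_MADV_* constants.
  static final int POSIX_MADV_RANDOM = 1;
  static final int POSIX_MADV_SEQUENTIAL = 2;

  private ReadAdviceMapping() {}

  /** Returns the advice to pass to posix_madvise, or null to skip the native call. */
  static Integer toPosixAdvice(ReadAdvice readAdvice) {
    return switch (readAdvice) {
      // Default kernel behavior: nothing to advise.
      case NORMAL -> null;
      case RANDOM -> POSIX_MADV_RANDOM;
      // Also covers merges, if merge contexts must carry SEQUENTIAL advice.
      case SEQUENTIAL -> POSIX_MADV_SEQUENTIAL;
      // Assumption: preloaded files are read randomly afterwards, so avoid SEQUENTIAL.
      case LOAD -> POSIX_MADV_RANDOM;
    };
  }
}
```

With a mapping like this, the merge special case in the old `mapIOContext` is no longer needed, because the decision moves into `IOContext` itself.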
##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
DEFAULT
};
- public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+ /** Advice regarding the read access pattern. */
+ public enum ReadAdvice {
+ /**
+ * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+ * cache the hottest pages.
+ */
+ NORMAL,
+ /**
+ * Data is expected to be read in a random-access fashion, either by {@link
+ * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+ * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+ */
+ RANDOM,
+ /** Data is expected to be read sequentially with very little seeking at most. */
+ SEQUENTIAL,
+ /**
+ * Data is treated as random-access memory in practice. {@link Directory} implementations may
+ * explicitly load the content of the file in memory, or provide hints to the system so that it
+ * loads the content of the file into the page cache at open time. This should only be used on
+ * very small files that can be expected to fit in RAM with very high confidence.
+ */
+ LOAD
+ }
+
+ public static final IOContext DEFAULT =
+ new IOContext(Context.DEFAULT, null, null, ReadAdvice.NORMAL);
- public static final IOContext READONCE = new IOContext(true, false, false);
+ public static final IOContext READONCE = new IOContext(ReadAdvice.SEQUENTIAL);
- public static final IOContext READ = new IOContext(false, false, false);
+ public static final IOContext READ = new IOContext(ReadAdvice.NORMAL);
- public static final IOContext LOAD = new IOContext(false, true, true);
+ public static final IOContext LOAD = new IOContext(ReadAdvice.LOAD);
- public static final IOContext RANDOM = new IOContext(false, false, true);
+ public static final IOContext RANDOM = new IOContext(ReadAdvice.RANDOM);
@SuppressWarnings("incomplete-switch")
public IOContext {
+ Objects.requireNonNull(context, "context must not be null");
+ Objects.requireNonNull(readAdvice, "readAdvice must not be null");
switch (context) {
case MERGE -> Objects.requireNonNull(
mergeInfo, "mergeInfo must not be null if context is MERGE");
case FLUSH -> Objects.requireNonNull(
flushInfo, "flushInfo must not be null if context is FLUSH");
}
- if (load && readOnce) {
- throw new IllegalArgumentException("load and readOnce are mutually exclusive");
- }
- if (readOnce && randomAccess) {
- throw new IllegalArgumentException("readOnce and randomAccess are mutually exclusive");
+ if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {
Review Comment:
This is a really good idea! It makes the code much simpler, and the merge case
needs no special handling in MMapDir.
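A reduced sketch of the invariant the new compact constructor enforces: a `MERGE` context must carry `SEQUENTIAL` read advice, so a directory can act on `readAdvice()` alone. The record name `IOContextSketch` is hypothetical, `mergeInfo` and `flushInfo` are omitted, and throwing `IllegalArgumentException` is an assumption based on the checks this PR removes (the diff does not show the body of the new `if`).

```java
import java.util.Objects;

enum Context { MERGE, READ, FLUSH, DEFAULT }

enum ReadAdvice { NORMAL, RANDOM, SEQUENTIAL, LOAD }

// Hypothetical, reduced version of IOContext: mergeInfo and flushInfo are omitted.
record IOContextSketch(Context context, ReadAdvice readAdvice) {
  IOContextSketch {
    Objects.requireNonNull(context, "context must not be null");
    Objects.requireNonNull(readAdvice, "readAdvice must not be null");
    // Merges read sequentially, and the kernel may free their pages after use,
    // so any other advice is rejected up front.
    if (context == Context.MERGE && readAdvice != ReadAdvice.SEQUENTIAL) {
      throw new IllegalArgumentException("MERGE contexts must use SEQUENTIAL read advice");
    }
  }
}
```

Because the rule lives in the constructor, no `IOContext` with a contradictory combination can exist, which is what lets MMapDir drop its merge-specific branch.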
##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -27,22 +27,11 @@
* @param context An object of a enumerator Context type
* @param mergeInfo must be given when {@code context == MERGE}
* @param flushInfo must be given when {@code context == FLUSH}
- * @param readOnce This flag indicates that the file will be opened, then fully read sequentially
- * then closed.
- * @param load This flag is used for files that are a small fraction of the total index size and are
- * expected to be heavily accessed in random-access fashion. Some {@link Directory}
- * implementations may choose to load such files into physical memory (e.g. Java heap) as a way
- * to provide stronger guarantees on query latency.
- * @param randomAccess This flag indicates that the file will be accessed randomly. If this flag is
- * set, then readOnce will be false.
+ * @param readAdvice Advice regarding the read access pattern. Write operations should disregard
Review Comment:
Writing is always sequential in our case (OutputStream). If we find a
solution for fadvise when writing files, we can add another enum.
##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
DEFAULT
};
- public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+ /** Advice regarding the read access pattern. */
+ public enum ReadAdvice {
+ /**
+ * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+ * cache the hottest pages.
+ */
+ NORMAL,
+ /**
+ * Data is expected to be read in a random-access fashion, either by {@link
+ * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+ * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+ */
+ RANDOM,
+ /** Data is expected to be read sequentially with very little seeking at most. */
Review Comment:
The madvise man page also says this about `MADV_SEQUENTIAL`: "Expect page
references in sequential order. (Hence, pages in the given range can be
aggressively read ahead, and may be freed soon after they are accessed.)"
The second sentence is important, as this is exactly our use case.
It is also the reason why we don't use sequential advice for preloaded files,
as it is a "read once"-like approach.
##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
DEFAULT
};
- public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+ /** Advice regarding the read access pattern. */
+ public enum ReadAdvice {
Review Comment:
Maybe make this a top-level class? I am torn between both variants.
Could we maybe also rename the inner `Context`, as its name is so similar to
`IOContext`?
##########
lucene/core/src/java/org/apache/lucene/store/IOContext.java:
##########
@@ -54,58 +43,74 @@ public enum Context {
DEFAULT
};
- public static final IOContext DEFAULT = new IOContext(Context.DEFAULT);
+ /** Advice regarding the read access pattern. */
+ public enum ReadAdvice {
+ /**
+ * Normal behavior. Data is expected to be read mostly sequentially. The system is expected to
+ * cache the hottest pages.
+ */
+ NORMAL,
+ /**
+ * Data is expected to be read in a random-access fashion, either by {@link
+ * IndexInput#seek(long) seeking} often and reading relatively short sequences of bytes at once,
+ * or by reading data through the {@link RandomAccessInput} abstraction in random order.
+ */
+ RANDOM,
+ /** Data is expected to be read sequentially with very little seeking at most. */
+ SEQUENTIAL,
+ /**
+ * Data is treated as random-access memory in practice. {@link Directory} implementations may
+ * explicitly load the content of the file in memory, or provide hints to the system so that it
+ * loads the content of the file into the page cache at open time. This should only be used on
+ * very small files that can be expected to fit in RAM with very high confidence.
+ */
+ LOAD
Review Comment:
I don't like `LOAD`; it should say preload. Maybe `RANDOM_PRELOAD`?
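To illustrate the naming argument: a name like `RANDOM_PRELOAD` states both decisions a directory has to make, namely whether to preload the file at open time and which access pattern to advise afterwards. This is a hypothetical sketch; the helper methods `preload()` and `accessPattern()` are illustrative and not part of the PR.

```java
// Hypothetical sketch of the enum with LOAD renamed to RANDOM_PRELOAD.
enum ReadAdvice {
  NORMAL,
  RANDOM,
  SEQUENTIAL,
  RANDOM_PRELOAD;

  /** Hypothetical helper: whether a Directory should load the file's content at open time. */
  boolean preload() {
    return this == RANDOM_PRELOAD;
  }

  /** Hypothetical helper: the access pattern to advise once the file is open. */
  ReadAdvice accessPattern() {
    // A preloaded file is still read randomly afterwards, never "read once".
    return this == RANDOM_PRELOAD ? RANDOM : this;
  }
}
```

Keeping the random access pattern explicit in the name also steers implementations away from sequential advice for preloaded files, where the kernel may free pages soon after they are accessed.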
--
This is an automated message from the Apache Git Service.