Re: [I] Introduce a `pread` Directory based on Panama-FFI ? [lucene]

via GitHub Mon, 11 May 2026 09:04:05 -0700


neoremind commented on issue #16044:
URL: https://github.com/apache/lucene/issues/16044#issuecomment-4422435972


   Hi Guo Feng,
   
   This is a really interesting issue. I spent some time putting together JMH benchmarks to vet and reproduce the `NativeThreadSet` contention and the mmap page-fault storms, and to evaluate FFI pread as an alternative. I figured I'd share the results here.
   
   ## Setup
   
   I wrote two JMH benchmarks (see Appendix 1):
   - **RandomReadIOBenchmark**: 16 random 16 KiB reads per op
   - **SequentialReadIOBenchmark**: 1 random seek + 15 sequential 16 KiB reads per op
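The difference between the two access patterns comes down to how the 16 offsets of one op are chosen. Here is a sketch, assuming the appendix's 16 KiB read size and 16 reads per op. The class and method names are hypothetical, and the sequential bound follows the "leave room for all 16 reads" idea behind `MAX_START_OFFSET` in `SequentialReadIOBenchmark`.

```java
import java.util.random.RandomGenerator;

/** Hypothetical helper: per-op read offsets for the two benchmark access patterns. */
final class AccessPatterns {
    static final int READ_SIZE = 16 * 1024; // 16 KiB per read, as in the appendix
    static final int READS_PER_OP = 16;

    /** Random pattern: 16 independent offsets, each leaving room for a full 16 KiB read. */
    static long[] randomOffsets(RandomGenerator rng, long fileSize) {
        long maxOffset = fileSize - READ_SIZE; // exclusive upper bound for nextLong
        long[] offsets = new long[READS_PER_OP];
        for (int i = 0; i < READS_PER_OP; i++) {
            offsets[i] = rng.nextLong(maxOffset);
        }
        return offsets;
    }

    /** Sequential pattern: one random seek, then 15 back-to-back reads from that point. */
    static long[] sequentialOffsets(RandomGenerator rng, long fileSize) {
        long maxStart = fileSize - (long) READ_SIZE * READS_PER_OP; // room for all 16 reads
        long start = rng.nextLong(maxStart);
        long[] offsets = new long[READS_PER_OP];
        for (int i = 0; i < READS_PER_OP; i++) {
            offsets[i] = start + (long) i * READ_SIZE;
        }
        return offsets;
    }
}
```

Only the first read of a sequential op is a random seek; the other 15 land on adjacent ranges, which is what gives kernel readahead something to work with.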
   
   Each benchmark tests five I/O strategies:
   1. **mmap** (`MemorySegment.copy` from a mapped file, simulates `MMapDirectory`)
   2. **FFI pread** (direct `pread(2)` syscall via FFI)
   3. **FileChannel + DirectByteBuffer** (simulates `NIOFSDirectory`)
   4. **FileChannel + HeapByteBuffer** (same, with an extra bounce-buffer copy)
   5. **FFI pread + O_DIRECT** (bypasses the kernel page cache)
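For reference, strategy 2 boils down to a downcall to libc's `pread`. Below is a self-contained sketch of that path with `errno` reporting added, assuming Linux (or another POSIX system) on a 64-bit platform and JDK 22+, where the FFM API is final. `PreadFile` is a hypothetical name, not a Lucene or JDK class, and unlike the benchmark it checks every return value.

```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemoryLayout;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.StructLayout;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.VarHandle;
import java.nio.file.Path;

/** Hypothetical helper (not a Lucene API): positional reads via pread(2) through Panama FFI. */
final class PreadFile implements AutoCloseable {
    private static final Linker LINKER = Linker.nativeLinker();
    // Capture errno right after each native call, before the JVM can overwrite it.
    private static final Linker.Option CAPTURE_ERRNO = Linker.Option.captureCallState("errno");
    private static final StructLayout STATE_LAYOUT = Linker.Option.captureStateLayout();
    private static final VarHandle ERRNO =
            STATE_LAYOUT.varHandle(MemoryLayout.PathElement.groupElement("errno"));
    private static final int O_RDONLY = 0;

    // Each handle takes an extra leading MemorySegment that receives the captured errno.
    private static final MethodHandle OPEN = downcall("open",
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_INT));
    private static final MethodHandle PREAD = downcall("pread",
            FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT, ValueLayout.ADDRESS,
                    ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG));
    private static final MethodHandle CLOSE = downcall("close",
            FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));

    private final int fd;

    private PreadFile(int fd) {
        this.fd = fd;
    }

    private static MethodHandle downcall(String name, FunctionDescriptor desc) {
        MemorySegment symbol = LINKER.defaultLookup().find(name).orElseThrow();
        return LINKER.downcallHandle(symbol, desc, CAPTURE_ERRNO);
    }

    static PreadFile open(Path path) throws IOException {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(STATE_LAYOUT);
            MemorySegment cPath = arena.allocateFrom(path.toString());
            int fd = (int) OPEN.invokeExact(state, cPath, O_RDONLY);
            if (fd < 0) throw new IOException("open failed, errno=" + (int) ERRNO.get(state, 0L));
            return new PreadFile(fd);
        } catch (IOException e) {
            throw e;
        } catch (Throwable t) {
            throw new IOException(t);
        }
    }

    /** Reads up to dst.byteSize() bytes at offset into dst (must be a native segment); 0 means EOF. */
    long read(MemorySegment dst, long offset) throws IOException {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment state = arena.allocate(STATE_LAYOUT);
            // No Java-side lock: pread(2) takes an explicit offset and never moves the shared
            // file position, so concurrent calls on the same fd are safe.
            long n = (long) PREAD.invokeExact(state, fd, dst, dst.byteSize(), offset);
            if (n < 0) throw new IOException("pread failed, errno=" + (int) ERRNO.get(state, 0L));
            return n;
        } catch (IOException e) {
            throw e;
        } catch (Throwable t) {
            throw new IOException(t);
        }
    }

    @Override
    public void close() throws IOException {
        try (Arena arena = Arena.ofConfined()) {
            int rc = (int) CLOSE.invokeExact(arena.allocate(STATE_LAYOUT), fd);
            if (rc != 0) throw new IOException("close failed");
        } catch (IOException e) {
            throw e;
        } catch (Throwable t) {
            throw new IOException(t);
        }
    }
}
```

Allocating a fresh confined arena per call keeps the sketch short; a real read path would reuse a per-thread segment for the captured `errno` state.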
   
   Thread counts: 1, 4, 8, 16.
   
   **Environment:** c5.4xlarge (32 GiB RAM, 16 vCPUs), io2 EBS (20K provisioned IOPS). JDK 25.0.2, `-Xms2g -Xmx2g`. Three data files: 16 GiB, 32 GiB, 64 GiB, created with commands like `dd if=/dev/urandom of=/path/data.dat bs=1M count=2048` (with `count` set to the file size in MiB). I didn't test on NVMe SSDs with higher IOPS and throughput, but I think the EBS numbers below speak for themselves.
   
   **Procedure:** Before each run, drop the page cache (`echo 3 > /proc/sys/vm/drop_caches`), then warm it by reading the file (`cat file > /dev/null`). This gives a controlled starting state with as much of the file in the page cache as possible. The key variable is how much of the working set fits in the ~30 GiB available for the page cache.
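If you'd rather do the warm-up step inside the Java harness instead of with `cat`, a minimal sketch is a sequential read that discards the data. `PageCacheWarmer` is a hypothetical helper, not part of the appendix, and whether the file actually stays resident still depends on how much memory is available, which is exactly the variable under test.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: rough Java equivalent of `cat file > /dev/null`. */
final class PageCacheWarmer {
    /** Reads the whole file sequentially and discards the data; returns the number of bytes read. */
    static long warm(Path file) throws IOException {
        // 1 MiB direct buffer: large sequential reads, no extra copy into the Java heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        long total = 0;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // the data is discarded; only the page-cache side effect matters
            }
        }
        return total;
    }
}
```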
   
   ## Results
   
   ### Random Read - 16G file (fits in memory, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 38.2      | 24.7      | 23.8 | 19.7 | 0.15 |
   | 4       | 144.9     | 86.8      | 76.8 | 64.9 | 0.60 |
   | 8       | 267.1     | 149.7     | 95.9 | 88.3 | 1.21 |
   | 16      | **309.9** | **205.4** | 92.1 | 86.0 | 1.25 |
   
   ### Random Read - 32G file (at memory limit, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 0.70 | 1.14      | 0.90 | 0.88 | 0.15 |
   | 4       | 2.87 | 4.04      | 3.62 | 3.52 | 0.60 |
   | 8       | 3.50 | 4.47      | 4.01 | 3.87 | 1.21 |
   | 16      | 3.32 | **4.21**  | 3.94 | 3.83 | 1.25 |
   
   ### Random Read - 64G file (exceeds memory, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 0.12 | 0.23      | 0.20 | 0.19 | 0.15 |
   | 4       | 0.50 | 0.87      | 0.81 | 0.78 | 0.60 |
   | 8       | 0.50 | 1.57      | 1.35 | 1.28 | 1.19 |
   | 16      | 0.51 | **1.43**  | 1.32 | 1.26 | 1.25 |
   
   ### Sequential Read - 16G file (fits in memory, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 46.1      | 27.7      | 26.6 | 21.0 | 0.16 |
   | 4       | 172.7     | 92.8      | 79.6 | 68.1 | 0.63 |
   | 8       | 311.7     | 157.2     | 88.7 | 90.0 | 1.25 |
   | 16      | **334.9** | **218.6** | 95.9 | 85.6 | 1.25 |
   
   ### Sequential Read - 32G file (at memory limit, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 2.63 | 3.19      | 2.49 | 2.33 | 0.15 |
   | 4       | 10.8 | 11.0      | 9.85 | 9.41 | 0.62 |
   | 8       | 14.9 | 16.3      | 14.8 | 14.4 | 1.23 |
   | 16      | 14.9 | **15.3**  | 14.5 | 14.3 | 1.25 |
   
   ### Sequential Read - 64G file (exceeds memory, ops/ms)
   
   | Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
   |---------|------|-----------|--------------------|------------------|--------------|
   | 1       | 0.75 | 0.70      | 0.67 | 0.67 | 0.16 |
   | 4       | 2.19 | 2.36      | 2.47 | 2.47 | 0.62 |
   | 8       | 2.19 | 2.42      | 2.44 | 2.43 | 1.24 |
   | 16      | 2.19 | **2.44**  | 2.44 | 2.44 | 1.25 |
   
   ## Key Observations
   
   **1. NativeThreadSet contention is real and dramatic.**
   
   When the working set fits in RAM (the "happy path"), FFI pread delivers **2.2x the throughput** of FileChannel at 16 threads. FileChannel actually *regresses* from T8 to T16: lock contention sets in, and adding threads makes things worse, while FFI pread keeps scaling throughout.
   
   At T1 the difference is tiny because the lock is uncontended. The gap widens as concurrency grows.
   
   (Random Read - 16G file)
   
   | Threads | FFI pread | FileChannel Direct | FFI advantage |
   |---------|-----------|-------------------|---------------|
   | 1 | 24.7 | 23.8 | 1.04x |
   | 4 | 86.8 | 76.8 | 1.13x |
   | 8 | 149.7 | 95.9 | **1.56x** |
   | 16 | 205.4 | 92.1 | **2.23x** |
   
   FFI pread keeps scaling (8.3x at T16 vs T1). FileChannel peaks at T8 and then *drops*. That's the `NativeThreadSet` synchronized block becoming the bottleneck.
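To put the scaling numbers on a common footing, here is a tiny helper (hypothetical names) that turns the table's throughput into speedup over T1 and parallel efficiency, i.e. speedup divided by thread count, where 1.0 would be perfectly linear.

```java
/** Hypothetical helper: speedup and parallel efficiency from throughput numbers (ops/ms). */
final class Scaling {
    /** Speedup of throughput at N threads relative to the single-thread baseline. */
    static double speedup(double baseT1, double atN) {
        return atN / baseT1;
    }

    /** Parallel efficiency: speedup divided by thread count (1.0 = perfectly linear). */
    static double efficiency(double baseT1, double atN, int threads) {
        return speedup(baseT1, atN) / threads;
    }
}
```

On the 16G random-read numbers, FFI pread reaches about 8.32x at T16 (roughly 52% efficiency), while FileChannel Direct reaches about 4.03x at T8 and falls back to about 3.87x at T16 (roughly 24%).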
   
   **2. mmap collapses under memory pressure - exactly as described.**
   
   mmap goes from 310 ops/ms (16G, warm) to 0.51 ops/ms (64G working set) at 16 threads. When the working set exceeds available memory, `pgmajfault` surges (grepping for `pgfault|pgmajfault` shows the counters jumping) and throughput collapses, which is exactly what the issue observed.
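A minimal sketch of taking those counter readings from inside the harness, assuming Linux, where the kernel exposes cumulative `pgfault` and `pgmajfault` counters in `/proc/vmstat` as `name value` lines. `VmStat` and its methods are hypothetical names, not part of the appendix code; sampling before and after a run gives the major-fault delta.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: samples Linux page-fault counters to measure a run's major-fault delta. */
final class VmStat {
    /** Parses "name value" lines (the /proc/vmstat format) into a map of counters. */
    static Map<String, Long> parse(String text) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : text.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                counters.put(parts[0], Long.parseLong(parts[1]));
            }
        }
        return counters;
    }

    /** Reads the current cumulative counters (Linux only). */
    static Map<String, Long> sample() throws IOException {
        return parse(Files.readString(Path.of("/proc/vmstat")));
    }

    /** Major faults (faults that had to go to disk) between two samples. */
    static long majorFaultsBetween(Map<String, Long> before, Map<String, Long> after) {
        return after.getOrDefault("pgmajfault", 0L) - before.getOrDefault("pgmajfault", 0L);
    }
}
```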
   
   The full scaling picture under pressure (64G random read) tells the story:
   
   | Threads | mmap | FFI pread | FileChannel |
   |---------|------|-----------|-------------|
   | 1 | 0.12 | 0.23 | 0.20 |
   | 4 | 0.50 | 0.87 | 0.81 |
   | 8 | 0.50 | 1.57 | 1.35 |
   | 16 | 0.51 | 1.43 | 1.32 |
   
   mmap flatlines after T4: quadrupling the thread count (T4 to T16) gives no additional throughput. The pread variants keep scaling up to T8 before hitting the IOPS ceiling, likely because the page-cache read path has a cheaper eviction penalty than mmap's page-fault handling.
   
   **3. Sequential access helps mmap significantly, but doesn't save it.**
   
   Under pressure (64G, T16), sequential mmap is 4.3x faster than random mmap (2.19 vs 0.51 ops/ms) thanks to kernel readahead. But pread is still slightly ahead (2.44 vs 2.19).
   
   **4. When disk-bound, FFI pread and FileChannel converge.**
   
   At 64G (severe pressure), the gap between FFI pread (1.43) and FileChannel (1.32) at T16 is only about 8%. The `NativeThreadSet` lock overhead is negligible when disk I/O dominates the cost of each read; the contention only matters when I/O is fast, as with page-cache hits.
   
   **5. O_DIRECT confirms the IOPS ceiling.**
   
   O_DIRECT saturates at 1.25 ops/ms, and 1.25 ops/ms × 1,000 ms/s × 16 reads/op = 20,000 reads/s, i.e. 20K IOPS, which is exactly our provisioned limit. This serves as a nice baseline confirming that the disk is the bottleneck when cache misses dominate.
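The same conversion from JMH throughput to device-level rates can be written as a small helper (hypothetical, for illustration only): multiply ops/ms by 1,000 ms/s and by reads per op, and by the read size for bandwidth.

```java
/** Hypothetical helper: converts JMH throughput (ops/ms) into device-level I/O rates. */
final class IoRate {
    /** Reads per second: ops/ms x 1000 ms/s x reads per op. */
    static double iops(double opsPerMs, int readsPerOp) {
        return opsPerMs * 1000.0 * readsPerOp;
    }

    /** Bandwidth in MiB/s for a given read size in bytes. */
    static double mibPerSec(double opsPerMs, int readsPerOp, int readSizeBytes) {
        return iops(opsPerMs, readsPerOp) * readSizeBytes / (1024.0 * 1024.0);
    }
}
```

At the O_DIRECT plateau, `iops(1.25, 16)` gives 20,000 reads/s and `mibPerSec(1.25, 16, 16 * 1024)` gives 312.5 MiB/s.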
   
   **6. Cold start with a small file (2G, fits in RAM).**
   
   I also ran a test with a 2G file (which fits in 32G RAM), dropping the page cache before each iteration to simulate a cold start, and extending each iteration to 10 seconds instead of the 5s used above:
   
   | Threads | mmap | FFI pread | FileChannel Direct |
   |---------|------|-----------|-------------------|
   | 1       | 0.21 | 0.14      | 0.14              |
   | 4       | 148.7| 8.8       | 7.5               |
   | 8       | 278.6| 57.3      | 38.4              |
   | 16      | 316.6| 80.2      | 41.0              |
   
   At T1, all three strategies are disk-bound and in the same range. But mmap warms the cache much faster: by T4 it's already at 148.7 ops/ms while pread is still at 8.8. This is because once mmap faults in a page, subsequent accesses to it within the same iteration are pure memory reads (zero-copy, no syscall). The 2G file becomes fully resident quickly, and the rest of the iteration runs at memory speed.
   
   Command used: `BENCH_FILE=/home/ec2-user/environment/data/pread-bench-2G.dat BENCH_FILE_SIZE_MIB=2048 BENCH_DROP_CACHES=true ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark'`
   
   ## Conclusions
   
   The benchmark data supports the direction of this issue:
   
   1. **mmap degrades under memory pressure** once the index working set exceeds available memory: major page faults (`pgmajfault`) surge and throughput collapses.
   
   2. **mmap is the most performant option when memory is not the limit**, i.e. when the whole working set stays in the page cache. In that scenario, nothing beats zero-copy memory reads. Even in a cold-start scenario where the file fits in RAM, mmap warms the cache faster than pread and quickly reaches memory-speed throughput.
   
   3. **FileChannel thread contention is verified**: FFI pread eliminates the `NativeThreadSet` contention and delivers 2.2x the throughput at 16 threads in the warm-cache case, while FileChannel actually regresses.
   
   4. **FFI pread is a good option for memory-constrained scenarios**: it matches or beats FileChannel in almost all scenarios and avoids mmap's severe degradation under memory pressure. That makes it attractive for memory-constrained deployments (cgroup-limited containers, indices that are large relative to available RAM).
   
   ## Appendix
   
   ### Appendix 1. JMH test cases
   
   <details>
   <summary>RandomReadIOBenchmark.java</summary>
   
   ```java
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.FunctionDescriptor;
   import java.lang.foreign.Linker;
   import java.lang.foreign.MemorySegment;
   import java.lang.foreign.SymbolLookup;
   import java.lang.foreign.ValueLayout;
   import java.lang.invoke.MethodHandle;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.channels.FileChannel.MapMode;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;
   import java.util.concurrent.ThreadLocalRandom;
   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Level;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.Setup;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.TearDown;
   import org.openjdk.jmh.annotations.Threads;
   import org.openjdk.jmh.annotations.Warmup;
   import org.openjdk.jmh.infra.Blackhole;
   
   @BenchmarkMode(Mode.Throughput)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   @State(Scope.Benchmark)
   @Warmup(iterations = 3, time = 3)
   @Measurement(iterations = 5, time = 5)
   @Fork(
           value = 2,
           jvmArgsPrepend = {"--enable-native-access=ALL-UNNAMED", "-Xms2g", "-Xmx2g"})
   public class RandomReadIOBenchmark {
   
       private static final int READ_SIZE = 16 * 1024; // 16 KiB
       private static final int READS_PER_OP = 16;
       // O_DIRECT requires offsets aligned to filesystem block size
       private static final long ALIGNMENT = 4096;
   
       /**
        * File size in MiB. Read from env var BENCH_FILE_SIZE_MIB or system property bench.fileSizeMiB.
        * Env var takes precedence (inherited by forked JVMs automatically).
        */
       private static final long FILE_SIZE =
               Long.parseLong(envOrProp("BENCH_FILE_SIZE_MIB", "bench.fileSizeMiB", "1024"))
                       * 1024L
                       * 1024L;
       private static final long MAX_OFFSET = FILE_SIZE - READ_SIZE;
       private static final long MAX_ALIGNED_OFFSET = (MAX_OFFSET / ALIGNMENT) * ALIGNMENT;
   
       // FFI handles for pread / open / close
       private static final MethodHandle PREAD;
       private static final MethodHandle OPEN;
       private static final MethodHandle CLOSE;
   
       static {
           Linker linker = Linker.nativeLinker();
           SymbolLookup lookup = linker.defaultLookup();
   
           PREAD =
                   linker.downcallHandle(
                           lookup.find("pread").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_LONG, // ssize_t return
                                   ValueLayout.JAVA_INT, // int fd
                                   ValueLayout.ADDRESS, // void *buf
                                   ValueLayout.JAVA_LONG, // size_t count
                                   ValueLayout.JAVA_LONG // off_t offset
                           ));
   
           OPEN =
                   linker.downcallHandle(
                           lookup.find("open").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_INT, // int return (fd)
                                   ValueLayout.ADDRESS, // const char *pathname
                                   ValueLayout.JAVA_INT // int flags
                           ));
   
           CLOSE =
                   linker.downcallHandle(
                           lookup.find("close").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_INT, // int return
                                   ValueLayout.JAVA_INT // int fd
                           ));
       }
   
       /** Per-thread pre-allocated buffers to avoid allocation noise in the measured path. */
       @State(Scope.Thread)
       public static class ThreadBuffers {
           ByteBuffer directBuf;
           ByteBuffer heapBuf;
           Arena ffiArena;
           MemorySegment ffiBuf;
           MemorySegment ffiDirectIoBuf;
   
           @Setup(Level.Trial)
           public void setup() {
               directBuf = ByteBuffer.allocateDirect(READ_SIZE);
               heapBuf = ByteBuffer.allocate(READ_SIZE);
               ffiArena = Arena.ofConfined();
               ffiBuf = ffiArena.allocate(READ_SIZE);
               // O_DIRECT requires buffer aligned to filesystem block size (typically 4096)
               ffiDirectIoBuf = ffiArena.allocate(READ_SIZE, 4096);
           }
   
           @TearDown(Level.Trial)
           public void tearDown() {
               ffiArena.close();
           }
       }
   
       private Path tempFile;
       private FileChannel fileChannel;
       private MemorySegment mmapSegment;
       private int nativeFd;
       private int directIoFd;
       private Arena arena;
   
       /**
        * Path to the benchmark data file. Create it before running with:
        *
        * <pre>
        * dd if=/dev/urandom of=/tmp/pread-bench.dat bs=1M count=1024
        * </pre>
        */
       private static final String BENCH_FILE =
               envOrProp("BENCH_FILE", "bench.file", "/tmp/pread-bench.dat");
   
       /**
        * Whether to drop page caches before each iteration. Requires root or sudo without password.
        * Pass via env var BENCH_DROP_CACHES=true or -Dbench.dropCaches=true
        *
        * <p>When enabled, caches are dropped at the start of each iteration (warmup and measurement).
        * JIT still warms up across iterations since the JVM persists, but each iteration starts with a
        * cold page cache, simulating memory-constrained containers.
        */
       private static final boolean DROP_CACHES =
               Boolean.parseBoolean(envOrProp("BENCH_DROP_CACHES", "bench.dropCaches", "false"));
   
       @Setup(Level.Trial)
       public void setup() throws Exception {
           System.out.println("[bench] ===== RandomReadIOBenchmark Configuration =====");
           System.out.println("[bench]   file:           " + BENCH_FILE);
           System.out.println("[bench]   fileSizeMiB:    " + (FILE_SIZE / (1024 * 1024)));
           System.out.println("[bench]   dropCaches:     " + DROP_CACHES);
           System.out.println("[bench]   readSize:       " + READ_SIZE + " bytes");
           System.out.println("[bench]   readsPerOp:     " + READS_PER_OP);
           System.out.println("[bench] ================================================");
   
           tempFile = Path.of(BENCH_FILE);
           if (!Files.exists(tempFile)) {
               throw new IOException(
                       "Benchmark file not found: "
                               + tempFile
                               + "\nCreate it with: dd if=/dev/urandom of="
                               + BENCH_FILE
                               + " bs=1M count="
                               + (FILE_SIZE / (1024 * 1024)));
           }
           long size = Files.size(tempFile);
           if (size < FILE_SIZE) {
               throw new IOException(
                       "Benchmark file too small: "
                               + size
                               + " bytes, expected at least "
                               + FILE_SIZE
                               + "\nRecreate with: dd if=/dev/urandom of="
                               + BENCH_FILE
                               + " bs=1M count="
                               + (FILE_SIZE / (1024 * 1024)));
           }
   
           // Open FileChannel for the benchmark
           fileChannel = FileChannel.open(tempFile, StandardOpenOption.READ);
   
           // Shared arena: owns the mapping and the native path string, accessible from all threads
           arena = Arena.ofShared();
   
           // Memory-map the file (simulates MMapDirectory)
           mmapSegment = fileChannel.map(MapMode.READ_ONLY, 0, FILE_SIZE, arena);
   
           // Open native fd via FFI
           MemorySegment pathStr = arena.allocateFrom(tempFile.toString());
           int O_RDONLY = 0;
           try {
               nativeFd = (int) OPEN.invokeExact(pathStr, O_RDONLY);
           } catch (Throwable t) {
               throw new RuntimeException("Failed to open file via FFI", t);
           }
           if (nativeFd < 0) {
               throw new IOException("FFI open() returned " + nativeFd);
           }
   
           // Open native fd with O_DIRECT for Direct I/O (Linux only, bypasses page cache)
           int O_DIRECT = 0x4000; // Linux x86_64 value for O_DIRECT (other architectures differ)
           try {
               directIoFd = (int) OPEN.invokeExact(pathStr, O_RDONLY | O_DIRECT);
           } catch (Throwable t) {
               throw new RuntimeException("Failed to open file with O_DIRECT via FFI", t);
           }
           if (directIoFd < 0) {
               // O_DIRECT may not be supported on all filesystems (e.g. tmpfs)
               System.err.println(
                       "WARNING: O_DIRECT open failed (fd=" + directIoFd + "). "
                               + "Direct I/O benchmarks will be no-ops. Use a filesystem that supports O_DIRECT.");
               directIoFd = -1;
           }
               directIoFd = -1;
           }
       }
   
       @TearDown(Level.Trial)
       public void tearDown() throws Exception {
           fileChannel.close();
           try {
               int rc = (int) CLOSE.invokeExact(nativeFd);
               if (directIoFd >= 0) {
                   rc = (int) CLOSE.invokeExact(directIoFd);
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
           arena.close();
       }
   
       /**
        * Drops page caches before each iteration (warmup and measurement).
        * This ensures each iteration starts with a cold page cache.
        * JIT still warms up across iterations since the same JVM runs every iteration within a fork.
        */
       @Setup(Level.Iteration)
       public void setupIteration() throws IOException {
           if (DROP_CACHES) {
               dropPageCaches();
           }
       }
   
       /**
        * Drops the kernel page cache to simulate cold-cache / memory-constrained scenarios.
        * Requires running as root or with passwordless sudo.
        * Uses: sync && echo 3 > /proc/sys/vm/drop_caches
        */
       private static void dropPageCaches() throws IOException {
           // Sync first to flush dirty pages
           Process sync = new ProcessBuilder("sync").inheritIO().start();
           try {
               if (sync.waitFor() != 0) {
                   throw new IOException("sync failed with exit code " + sync.exitValue());
               }
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IOException("Interrupted during sync", e);
           }
   
           // Drop page cache (requires root)
           Process drop =
                   new ProcessBuilder("sudo", "bash", "-c", "echo 3 > /proc/sys/vm/drop_caches")
                           .inheritIO()
                           .start();
           try {
               if (drop.waitFor() != 0) {
                   throw new IOException(
                           "Failed to drop page caches (exit code "
                                   + drop.exitValue()
                                   + "). Run as root or with: sudo sysctl vm.drop_caches=3");
               }
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IOException("Interrupted during drop_caches", e);
           }
           System.out.println("[bench] Page caches dropped.");
       }
   
       /** Reads a config value from env var first, then system property, then default. */
       private static String envOrProp(String envKey, String propKey, String defaultValue) {
           String env = System.getenv(envKey);
           if (env != null && !env.isEmpty()) {
               return env;
           }
           return System.getProperty(propKey, defaultValue);
       }
   
       // ---- FileChannel + DirectByteBuffer (contended NativeThreadSet) ----
   
       @Benchmark
       @Threads(1)
       public void fileChannelDirect_T01(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void fileChannelDirect_T04(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void fileChannelDirect_T08(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void fileChannelDirect_T16(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       // ---- FileChannel + HeapByteBuffer (extra copy + contended NativeThreadSet) ----
   
       @Benchmark
       @Threads(1)
       public void fileChannelHeap_T01(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void fileChannelHeap_T04(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void fileChannelHeap_T08(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void fileChannelHeap_T16(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       // ---- FFI pread benchmark (no contention) ----
   
       @Benchmark
       @Threads(1)
       public void ffiPread_T01(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void ffiPread_T04(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void ffiPread_T08(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void ffiPread_T16(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       // ---- mmap benchmark (simulates MMapDirectory; page faults under memory pressure) ----
   
       @Benchmark
       @Threads(1)
       public void mmap_T01(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void mmap_T04(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void mmap_T08(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void mmap_T16(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       // ---- FFI pread + O_DIRECT benchmark (bypasses page cache, Linux only) ----
   
       @Benchmark
       @Threads(1)
       public void ffiPreadDirectIO_T01(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void ffiPreadDirectIO_T04(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void ffiPreadDirectIO_T08(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void ffiPreadDirectIO_T16(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       // ---- Implementation ----
   
       private void doFileChannelDirectReads(ThreadBuffers tb, Blackhole bh) throws IOException {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           ByteBuffer buf = tb.directBuf;
           for (int i = 0; i < READS_PER_OP; i++) {
               long offset = rng.nextLong(MAX_OFFSET);
               buf.clear();
               int n = fileChannel.read(buf, offset);
               bh.consume(n);
           }
       }
   
       private void doFileChannelHeapReads(ThreadBuffers tb, Blackhole bh) throws IOException {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           ByteBuffer buf = tb.heapBuf;
           for (int i = 0; i < READS_PER_OP; i++) {
               long offset = rng.nextLong(MAX_OFFSET);
               buf.clear();
               int n = fileChannel.read(buf, offset);
               bh.consume(n);
           }
       }
   
       private void doFfiReads(ThreadBuffers tb, Blackhole bh) {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           MemorySegment buf = tb.ffiBuf;
           try {
               for (int i = 0; i < READS_PER_OP; i++) {
                   long offset = rng.nextLong(MAX_OFFSET);
                   long n = (long) PREAD.invokeExact(nativeFd, buf, (long) READ_SIZE, offset);
                   if (n < 0) {
                       // pread(2) returns -1 on error; fail like FileChannel would instead of counting it
                       throw new IllegalStateException("pread failed at offset " + offset);
                   }
                   bh.consume(n);
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
       }
   
       private void doMmapReads(ThreadBuffers tb, Blackhole bh) {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           byte[] dst = tb.heapBuf.array();
           for (int i = 0; i < READS_PER_OP; i++) {
               long offset = rng.nextLong(MAX_OFFSET);
               // Copy from the mmap'd region into a byte array; this is what MMapDirectory does.
               // If the page is not resident, this triggers a page fault (major fault if evicted).
               MemorySegment.copy(mmapSegment, ValueLayout.JAVA_BYTE, offset, dst, 0, READ_SIZE);
               bh.consume(dst[0]);
           }
       }
   
       private void doFfiDirectIoReads(ThreadBuffers tb, Blackhole bh) {
           if (directIoFd < 0) {
               // O_DIRECT not available on this filesystem — skip silently
               bh.consume(0);
               return;
           }
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           MemorySegment buf = tb.ffiDirectIoBuf;
           try {
               for (int i = 0; i < READS_PER_OP; i++) {
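                   // O_DIRECT requires an aligned offset; generate a random aligned position
                   long offset = (rng.nextLong(MAX_ALIGNED_OFFSET / ALIGNMENT)) * ALIGNMENT;
                   long n = (long) PREAD.invokeExact(directIoFd, buf, (long) READ_SIZE, offset);
                   if (n < 0) {
                       // e.g. EINVAL from a misaligned buffer or offset; don't count it as a read
                       throw new IllegalStateException("O_DIRECT pread failed at offset " + offset);
                   }
                   bh.consume(n);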
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
       }
   }
   ```
   </details>
   
   <details>
   <summary>SequentialReadIOBenchmark.java</summary>
   
   ```java
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.FunctionDescriptor;
   import java.lang.foreign.Linker;
   import java.lang.foreign.MemorySegment;
   import java.lang.foreign.SymbolLookup;
   import java.lang.foreign.ValueLayout;
   import java.lang.invoke.MethodHandle;
   import java.nio.ByteBuffer;
   import java.nio.channels.FileChannel;
   import java.nio.channels.FileChannel.MapMode;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;
   import java.util.concurrent.ThreadLocalRandom;
   import java.util.concurrent.TimeUnit;
   import org.openjdk.jmh.annotations.Benchmark;
   import org.openjdk.jmh.annotations.BenchmarkMode;
   import org.openjdk.jmh.annotations.Fork;
   import org.openjdk.jmh.annotations.Level;
   import org.openjdk.jmh.annotations.Measurement;
   import org.openjdk.jmh.annotations.Mode;
   import org.openjdk.jmh.annotations.OutputTimeUnit;
   import org.openjdk.jmh.annotations.Scope;
   import org.openjdk.jmh.annotations.Setup;
   import org.openjdk.jmh.annotations.State;
   import org.openjdk.jmh.annotations.TearDown;
   import org.openjdk.jmh.annotations.Threads;
   import org.openjdk.jmh.annotations.Warmup;
   import org.openjdk.jmh.infra.Blackhole;
   
   @BenchmarkMode(Mode.Throughput)
   @OutputTimeUnit(TimeUnit.MILLISECONDS)
   @State(Scope.Benchmark)
   @Warmup(iterations = 3, time = 3)
   @Measurement(iterations = 5, time = 5)
   @Fork(
           value = 2,
           jvmArgsPrepend = {"--enable-native-access=ALL-UNNAMED", "-Xms2g", "-Xmx2g"})
   public class SequentialReadIOBenchmark {
   
       private static final int READ_SIZE = 16 * 1024; // 16 KiB
       private static final int READS_PER_OP = 16;
       // O_DIRECT requires offsets aligned to filesystem block size
       private static final long ALIGNMENT = 4096;
   
       /**
        * File size in MiB. Read from env var BENCH_FILE_SIZE_MIB or system property bench.fileSizeMiB.
        * Env var takes precedence (inherited by forked JVMs automatically).
        */
       private static final long FILE_SIZE =
               Long.parseLong(envOrProp("BENCH_FILE_SIZE_MIB", "bench.fileSizeMiB", "1024"))
                       * 1024L
                       * 1024L;
       // Must leave room for 16 sequential reads from the starting offset
       private static final long MAX_START_OFFSET = FILE_SIZE - ((long) READ_SIZE * READS_PER_OP);
       private static final long MAX_ALIGNED_START = (MAX_START_OFFSET / ALIGNMENT) * ALIGNMENT;
   
       // FFI handles for pread / open / close
       private static final MethodHandle PREAD;
       private static final MethodHandle OPEN;
       private static final MethodHandle CLOSE;
   
       static {
           Linker linker = Linker.nativeLinker();
           SymbolLookup lookup = linker.defaultLookup();
   
           PREAD =
                   linker.downcallHandle(
                           lookup.find("pread").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_LONG, // ssize_t return
                                   ValueLayout.JAVA_INT, // int fd
                                   ValueLayout.ADDRESS, // void *buf
                                   ValueLayout.JAVA_LONG, // size_t count
                                   ValueLayout.JAVA_LONG // off_t offset
                           ));
   
           OPEN =
                   linker.downcallHandle(
                           lookup.find("open").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_INT, // int return (fd)
                                   ValueLayout.ADDRESS, // const char *pathname
                                   ValueLayout.JAVA_INT // int flags
                           ));
   
           CLOSE =
                   linker.downcallHandle(
                           lookup.find("close").orElseThrow(),
                           FunctionDescriptor.of(
                                   ValueLayout.JAVA_INT, // int return
                                   ValueLayout.JAVA_INT // int fd
                           ));
       }
   
       /** Per-thread pre-allocated buffers to avoid allocation noise in the measured path. */
       @State(Scope.Thread)
       public static class ThreadBuffers {
           ByteBuffer directBuf;
           ByteBuffer heapBuf;
           Arena ffiArena;
           MemorySegment ffiBuf;
           MemorySegment ffiDirectIoBuf;
   
           @Setup(Level.Trial)
           public void setup() {
               directBuf = ByteBuffer.allocateDirect(READ_SIZE);
               heapBuf = ByteBuffer.allocate(READ_SIZE);
               ffiArena = Arena.ofConfined();
               ffiBuf = ffiArena.allocate(READ_SIZE);
               // O_DIRECT requires the buffer to be aligned to the filesystem block size (typically 4096)
               ffiDirectIoBuf = ffiArena.allocate(READ_SIZE, ALIGNMENT);
           }
   
           @TearDown(Level.Trial)
           public void tearDown() {
               ffiArena.close();
           }
       }
   
       private Path tempFile;
       private FileChannel fileChannel;
       private MemorySegment mmapSegment;
       private int nativeFd;
       private int directIoFd;
       private Arena arena;
   
       /**
        * Path to the benchmark data file. Create it before running with:
        *
        * <pre>
        * dd if=/dev/urandom of=/tmp/pread-bench.dat bs=1M count=1024
        * </pre>
        */
       private static final String BENCH_FILE =
               envOrProp("BENCH_FILE", "bench.file", "/tmp/pread-bench.dat");
   
       /**
        * Whether to drop page caches before each iteration. Requires root or passwordless sudo.
        * Pass via env var BENCH_DROP_CACHES=true or -Dbench.dropCaches=true.
        *
        * <p>When enabled, caches are dropped at the start of each iteration (warmup and measurement).
        * JIT still warms up across iterations since the JVM persists, but each iteration starts with a
        * cold page cache, simulating memory-constrained containers.
        */
       private static final boolean DROP_CACHES =
               Boolean.parseBoolean(envOrProp("BENCH_DROP_CACHES", "bench.dropCaches", "false"));
   
       @Setup(Level.Trial)
       public void setup() throws Exception {
           System.out.println("[bench] ===== SequentialReadIOBenchmark Configuration =====");
           System.out.println("[bench]   file:           " + BENCH_FILE);
           System.out.println("[bench]   fileSizeMiB:    " + (FILE_SIZE / (1024 * 1024)));
           System.out.println("[bench]   dropCaches:     " + DROP_CACHES);
           System.out.println("[bench]   readSize:       " + READ_SIZE + " bytes");
           System.out.println("[bench]   readsPerOp:     " + READS_PER_OP + " (sequential)");
           System.out.println("[bench]   bytesPerOp:     " + ((long) READ_SIZE * READS_PER_OP) + " bytes");
           System.out.println("[bench] ====================================================");
   
           tempFile = Path.of(BENCH_FILE);
           if (!Files.exists(tempFile)) {
               throw new IOException(
                       "Benchmark file not found: "
                               + tempFile
                               + "\nCreate it with: dd if=/dev/urandom of="
                               + BENCH_FILE
                               + " bs=1M count="
                               + (FILE_SIZE / (1024 * 1024)));
           }
           long size = Files.size(tempFile);
           if (size < FILE_SIZE) {
               throw new IOException(
                       "Benchmark file too small: "
                               + size
                               + " bytes, expected at least "
                               + FILE_SIZE
                               + "\nRecreate with: dd if=/dev/urandom of="
                               + BENCH_FILE
                               + " bs=1M count="
                               + (FILE_SIZE / (1024 * 1024)));
           }
   
           // Open FileChannel for the benchmark
           fileChannel = FileChannel.open(tempFile, StandardOpenOption.READ);
   
           // Shared arena: owns the mmap segment and the native path string passed to open()
           arena = Arena.ofShared();
   
           // Memory-map the file (simulates MMapDirectory)
           mmapSegment = fileChannel.map(MapMode.READ_ONLY, 0, FILE_SIZE, arena);
   
           // Open native fds via FFI
           MemorySegment pathStr = arena.allocateFrom(tempFile.toString());
           int O_RDONLY = 0;
           try {
               nativeFd = (int) OPEN.invokeExact(pathStr, O_RDONLY);
           } catch (Throwable t) {
               throw new RuntimeException("Failed to open file via FFI", t);
           }
           if (nativeFd < 0) {
               throw new IOException("FFI open() returned " + nativeFd);
           }
   
           // Open a second native fd with O_DIRECT (Linux only; bypasses the page cache)
           int O_DIRECT = 0x4000; // Linux x86_64 value for O_DIRECT
           try {
               directIoFd = (int) OPEN.invokeExact(pathStr, O_RDONLY | O_DIRECT);
           } catch (Throwable t) {
               throw new RuntimeException("Failed to open file with O_DIRECT via FFI", t);
           }
           if (directIoFd < 0) {
               // O_DIRECT may not be supported on all filesystems (e.g. tmpfs)
               System.err.println(
                       "WARNING: O_DIRECT open failed (fd=" + directIoFd + "). "
                               + "Direct I/O benchmarks will be no-ops and their scores meaningless. "
                               + "Use a filesystem that supports O_DIRECT.");
               directIoFd = -1;
           }
       }
   
       @TearDown(Level.Trial)
       public void tearDown() throws Exception {
           fileChannel.close();
           try {
               int rc = (int) CLOSE.invokeExact(nativeFd);
               if (directIoFd >= 0) {
                   rc = (int) CLOSE.invokeExact(directIoFd);
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
           arena.close();
       }
   
       /**
        * Drops page caches before each iteration (warmup and measurement), so each iteration starts
        * with a cold page cache. JIT state still carries over between iterations because they all run
        * in the same forked JVM.
        */
       @Setup(Level.Iteration)
       public void setupIteration() throws IOException {
           if (DROP_CACHES) {
               dropPageCaches();
           }
       }
   
       /**
        * Drops the kernel page cache to simulate cold-cache / memory-constrained scenarios.
        * Requires running as root or with passwordless sudo.
        * Equivalent to: sync && echo 3 > /proc/sys/vm/drop_caches
        */
       private static void dropPageCaches() throws IOException {
           // Sync first to flush dirty pages
           Process sync = new ProcessBuilder("sync").inheritIO().start();
           try {
               if (sync.waitFor() != 0) {
                   throw new IOException("sync failed with exit code " + sync.exitValue());
               }
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IOException("Interrupted during sync", e);
           }
   
           // Drop page cache (requires root)
           Process drop =
                   new ProcessBuilder("sudo", "bash", "-c", "echo 3 > /proc/sys/vm/drop_caches")
                           .inheritIO()
                           .start();
           try {
               if (drop.waitFor() != 0) {
                   throw new IOException(
                           "Failed to drop page caches (exit code "
                                   + drop.exitValue()
                                   + "). Run as root or with: sudo sysctl vm.drop_caches=3");
               }
           } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               throw new IOException("Interrupted during drop_caches", e);
           }
           System.out.println("[bench] Page caches dropped.");
       }
   
       /** Reads a config value from env var first, then system property, then default. */
       private static String envOrProp(String envKey, String propKey, String defaultValue) {
           String env = System.getenv(envKey);
           if (env != null && !env.isEmpty()) {
               return env;
           }
           return System.getProperty(propKey, defaultValue);
       }
   
       // ---- FileChannel + DirectByteBuffer (contended NativeThreadSet) ----
   
       @Benchmark
       @Threads(1)
       public void fileChannelDirect_T01(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void fileChannelDirect_T04(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void fileChannelDirect_T08(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void fileChannelDirect_T16(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelDirectReads(tb, bh);
       }
   
       // ---- FileChannel + HeapByteBuffer (extra copy + contended NativeThreadSet) ----
   
       @Benchmark
       @Threads(1)
       public void fileChannelHeap_T01(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void fileChannelHeap_T04(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void fileChannelHeap_T08(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void fileChannelHeap_T16(ThreadBuffers tb, Blackhole bh) throws IOException {
           doFileChannelHeapReads(tb, bh);
       }
   
       // ---- FFI pread benchmark (no contention) ----
   
       @Benchmark
       @Threads(1)
       public void ffiPread_T01(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void ffiPread_T04(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void ffiPread_T08(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void ffiPread_T16(ThreadBuffers tb, Blackhole bh) {
           doFfiReads(tb, bh);
       }
   
       // ---- mmap benchmark (simulates MMapDirectory; page faults under memory pressure) ----
   
       @Benchmark
       @Threads(1)
       public void mmap_T01(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void mmap_T04(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void mmap_T08(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void mmap_T16(ThreadBuffers tb, Blackhole bh) {
           doMmapReads(tb, bh);
       }
   
       // ---- FFI pread + O_DIRECT benchmark (bypasses page cache, Linux only) ----
   
       @Benchmark
       @Threads(1)
       public void ffiPreadDirectIO_T01(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(4)
       public void ffiPreadDirectIO_T04(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(8)
       public void ffiPreadDirectIO_T08(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       @Benchmark
       @Threads(16)
       public void ffiPreadDirectIO_T16(ThreadBuffers tb, Blackhole bh) {
           doFfiDirectIoReads(tb, bh);
       }
   
       // ---- Implementation: sequential reads (random start, then scan forward) ----
   
       private void doFileChannelDirectReads(ThreadBuffers tb, Blackhole bh) throws IOException {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           ByteBuffer buf = tb.directBuf;
           long startOffset = rng.nextLong(MAX_START_OFFSET);
           for (int i = 0; i < READS_PER_OP; i++) {
               buf.clear();
               int n = fileChannel.read(buf, startOffset + (long) i * READ_SIZE);
               bh.consume(n);
           }
       }
   
       private void doFileChannelHeapReads(ThreadBuffers tb, Blackhole bh) throws IOException {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           ByteBuffer buf = tb.heapBuf;
           long startOffset = rng.nextLong(MAX_START_OFFSET);
           for (int i = 0; i < READS_PER_OP; i++) {
               buf.clear();
               // A heap buffer makes the JDK read into a temporary direct buffer and copy it back
               int n = fileChannel.read(buf, startOffset + (long) i * READ_SIZE);
               bh.consume(n);
           }
       }
   
       private void doFfiReads(ThreadBuffers tb, Blackhole bh) {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           MemorySegment buf = tb.ffiBuf;
           long startOffset = rng.nextLong(MAX_START_OFFSET);
           try {
               for (int i = 0; i < READS_PER_OP; i++) {
                   long n =
                           (long)
                                   PREAD.invokeExact(
                                           nativeFd, buf, (long) READ_SIZE, startOffset + (long) i * READ_SIZE);
                   bh.consume(n);
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
       }
   
       private void doMmapReads(ThreadBuffers tb, Blackhole bh) {
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           byte[] dst = tb.heapBuf.array();
           long startOffset = rng.nextLong(MAX_START_OFFSET);
           for (int i = 0; i < READS_PER_OP; i++) {
               // Copy from the mmap'd region into a byte array, as MMapDirectory does.
               // If the page is not resident, this triggers a page fault (a major fault if evicted).
               MemorySegment.copy(
                       mmapSegment, ValueLayout.JAVA_BYTE, startOffset + (long) i * READ_SIZE, dst, 0,
                       READ_SIZE);
               bh.consume(dst[0]);
           }
       }
   
       private void doFfiDirectIoReads(ThreadBuffers tb, Blackhole bh) {
           if (directIoFd < 0) {
               // O_DIRECT not available on this filesystem: skip (the score is then meaningless)
               bh.consume(0);
               return;
           }
           ThreadLocalRandom rng = ThreadLocalRandom.current();
           MemorySegment buf = tb.ffiDirectIoBuf;
           // Align the start offset for O_DIRECT; READ_SIZE is a multiple of ALIGNMENT, so every read stays aligned
           long startOffset = rng.nextLong(MAX_ALIGNED_START / ALIGNMENT) * ALIGNMENT;
           try {
               for (int i = 0; i < READS_PER_OP; i++) {
                   long n =
                           (long)
                                   PREAD.invokeExact(
                                           directIoFd, buf, (long) READ_SIZE, startOffset + (long) i * READ_SIZE);
                   bh.consume(n);
               }
           } catch (Throwable t) {
               throw new RuntimeException(t);
           }
       }
   }
   ```
   </details>
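   The benchmark's `doFfiReads` ignores the value `pread` returns, which is fine for measuring throughput but not for a real Directory. Below is a minimal sketch, assuming JDK 22+ FFM and a POSIX libc, of how a read path might handle short reads, EOF and `EINTR`, capturing `errno` through `Linker.Option.captureCallState`. `PreadSketch`, `openReadOnly`, `readFully` and `close` are hypothetical names, not Lucene APIs. The destination must be a native (off-heap) segment, and the per-call confined arena only keeps the sketch short; a real implementation would more likely reuse a per-thread call-state segment.
   
   ```java
   import java.io.EOFException;
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.FunctionDescriptor;
   import java.lang.foreign.Linker;
   import java.lang.foreign.MemoryLayout;
   import java.lang.foreign.MemorySegment;
   import java.lang.foreign.StructLayout;
   import java.lang.foreign.ValueLayout;
   import java.lang.invoke.MethodHandle;
   import java.lang.invoke.VarHandle;
   import java.nio.file.Path;
   
   /** Hypothetical sketch: pread(2) via FFM with errno capture and short-read handling. */
   public final class PreadSketch {
       private static final Linker LINKER = Linker.nativeLinker();
       private static final StructLayout CALL_STATE = Linker.Option.captureStateLayout();
       // Since JDK 22, layout var handles take an extra leading long base-offset coordinate
       private static final VarHandle ERRNO =
               CALL_STATE.varHandle(MemoryLayout.PathElement.groupElement("errno"));
       private static final int EINTR = 4; // value on Linux and macOS
   
       // ssize_t pread(int fd, void *buf, size_t count, off_t offset), with a leading call-state segment
       private static final MethodHandle PREAD = LINKER.downcallHandle(
               LINKER.defaultLookup().find("pread").orElseThrow(),
               FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT,
                       ValueLayout.ADDRESS, ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG),
               Linker.Option.captureCallState("errno"));
       // int open(const char *path, int flags), called with O_RDONLY (0) as in the benchmark
       private static final MethodHandle OPEN = LINKER.downcallHandle(
               LINKER.defaultLookup().find("open").orElseThrow(),
               FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_INT));
       private static final MethodHandle CLOSE = LINKER.downcallHandle(
               LINKER.defaultLookup().find("close").orElseThrow(),
               FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));
   
       public static int openReadOnly(Path path) throws IOException {
           try (Arena arena = Arena.ofConfined()) {
               int fd = (int) OPEN.invokeExact(arena.allocateFrom(path.toString()), 0);
               if (fd < 0) throw new IOException("open() failed for " + path);
               return fd;
           } catch (IOException e) {
               throw e;
           } catch (Throwable t) {
               throw new IOException(t);
           }
       }
   
       /** Fills all of dst (a native segment) from the file, starting at offset. */
       public static void readFully(int fd, MemorySegment dst, long offset) throws IOException {
           try (Arena arena = Arena.ofConfined()) {
               MemorySegment state = arena.allocate(CALL_STATE);
               long len = dst.byteSize();
               long done = 0;
               while (done < len) {
                   long n = (long) PREAD.invokeExact(state, fd, dst.asSlice(done), len - done, offset + done);
                   if (n > 0) {
                       done += n; // short read: loop and ask for the remainder
                   } else if (n == 0) {
                       throw new EOFException("EOF at offset " + (offset + done));
                   } else {
                       int errno = (int) ERRNO.get(state, 0L);
                       if (errno != EINTR) throw new IOException("pread failed, errno=" + errno);
                   }
               }
           } catch (IOException e) {
               throw e;
           } catch (Throwable t) {
               throw new IOException(t);
           }
       }
   
       public static void close(int fd) throws IOException {
           int rc;
           try {
               rc = (int) CLOSE.invokeExact(fd);
           } catch (Throwable t) {
               throw new IOException(t);
           }
           if (rc != 0) throw new IOException("close() failed for fd " + fd);
       }
   }
   ```
   
   Reading past the end surfaces as an `EOFException` rather than a silently short buffer, which is the contract an `IndexInput` would need.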
   
   
   ### Appendix 2. Running steps and logs
   
   <details>
   <summary>Running commands</summary>
   
   ```bash
   #!/bin/bash
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   
   sleep 30
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   
   sleep 30
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   
   sleep 30
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   
   sleep 30
   
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   ```
   </details>
   
   The full log is available [here](https://neoremind.com/report/log/lucene/issue-16044/0517.log).
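   
   The procedure relies on `cat file > /dev/null` to pull the file into the page cache, and the key variable is how much of it actually stays resident. Below is a minimal sketch, assuming JDK 22+, for checking that from Java before a run: it maps the file read-only and asks `MemorySegment.isLoaded()` about each chunk. `ResidencyProbe` and `residentFraction` are hypothetical names, not part of the benchmark, and `isLoaded()` is documented as a best-effort hint, so the result is only an estimate.
   
   ```java
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.MemorySegment;
   import java.nio.channels.FileChannel;
   import java.nio.channels.FileChannel.MapMode;
   import java.nio.file.Path;
   import java.nio.file.StandardOpenOption;
   
   /** Hypothetical helper: estimates what fraction of a file is resident in the page cache. */
   public final class ResidencyProbe {
       /** Returns the fraction of chunkBytes-sized chunks that isLoaded() reports as resident. */
       public static double residentFraction(Path file, long chunkBytes) throws IOException {
           try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ);
                   Arena arena = Arena.ofConfined()) {
               long size = ch.size();
               if (size == 0) return 1.0; // nothing to load
               MemorySegment seg = ch.map(MapMode.READ_ONLY, 0, size, arena);
               long chunks = 0;
               long loaded = 0;
               for (long off = 0; off < size; off += chunkBytes) {
                   long len = Math.min(chunkBytes, size - off);
                   if (seg.asSlice(off, len).isLoaded()) loaded++;
                   chunks++;
               }
               return (double) loaded / chunks;
           }
       }
   
       public static void main(String[] args) throws IOException {
           Path file = Path.of(args.length > 0 ? args[0] : "/tmp/pread-bench.dat");
           double f = residentFraction(file, 2L << 20); // 2 MiB chunks
           System.out.printf("~%.1f%% of %s appears resident%n", 100 * f, file);
       }
   }
   ```
   
   Running it right after the `cat` step and again after a benchmark would show how far the page cache drifts from the "warmed as much as possible" starting state for the 32G and 64G files.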
   
   
   ### Appendix 3. Raw JMH benchmark results
   
   <details>
   <summary>Results</summary>
   
   ## Random Read IO benchmark
   
   The page cache was warmed up as much as possible before each run.
   
   ### 16G file random read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                     Mode  Cnt    Score    Error   Units
   RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.147 ±  0.003  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.599 ±  0.013  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.206 ±  0.023  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ±  0.001  ops/ms
   RandomReadIOBenchmark.ffiPread_T01           thrpt   10   24.691 ±  0.418  ops/ms
   RandomReadIOBenchmark.ffiPread_T04           thrpt   10   86.814 ±  0.584  ops/ms
   RandomReadIOBenchmark.ffiPread_T08           thrpt   10  149.697 ±  0.433  ops/ms
   RandomReadIOBenchmark.ffiPread_T16           thrpt   10  205.366 ±  1.534  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10   23.781 ±  0.425  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10   76.825 ±  0.681  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10   95.923 ±  3.866  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10   92.143 ±  7.888  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10   19.698 ±  0.123  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10   64.903 ±  0.427  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10   88.341 ±  5.815  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10   85.964 ±  3.464  ops/ms
   RandomReadIOBenchmark.mmap_T01               thrpt   10   38.191 ±  1.018  ops/ms
   RandomReadIOBenchmark.mmap_T04               thrpt   10  144.854 ±  0.461  ops/ms
   RandomReadIOBenchmark.mmap_T08               thrpt   10  267.100 ±  2.101  ops/ms
   RandomReadIOBenchmark.mmap_T16               thrpt   10  309.911 ±  0.937  ops/ms
   ```
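   
   A small worked computation on the 16G random-read table above, with the T01 and T16 scores copied from it: each op is 16 reads of 16 KiB (256 KiB), so ops/ms converts directly to MiB/s, and T16/T01 shows how each strategy scales with threads. `RandomRead16GScaling` and its methods are illustrative names only.
   
   ```java
   /** Worked arithmetic on the 16G random-read JMH scores (values copied from the table). */
   public final class RandomRead16GScaling {
       static final int READ_SIZE = 16 * 1024;
       static final int READS_PER_OP = 16;
   
       /** Converts JMH ops/ms into MiB/s copied (READS_PER_OP x READ_SIZE bytes per op). */
       static double mibPerSecond(double opsPerMs) {
           return opsPerMs * 1000.0 * READS_PER_OP * READ_SIZE / (1024.0 * 1024.0);
       }
   
       /** Throughput ratio from 1 to 16 threads; 16.0 would be perfectly linear. */
       static double scaling(double t01, double t16) {
           return t16 / t01;
       }
   
       public static void main(String[] args) {
           String[] names = {"mmap", "ffiPread", "fileChannelDirect", "fileChannelHeap"};
           double[] t01 = {38.191, 24.691, 23.781, 19.698};
           double[] t16 = {309.911, 205.366, 92.143, 85.964};
           for (int i = 0; i < names.length; i++) {
               System.out.printf("%-18s T16 = %9.1f MiB/s, T01->T16 scaling = %.2fx%n",
                       names[i], mibPerSecond(t16[i]), scaling(t01[i], t16[i]));
           }
       }
   }
   ```
   
   FFI pread scales by about 8.3x from 1 to 16 threads, while FileChannel + DirectByteBuffer scales by only about 3.9x and actually dips between T08 (95.923) and T16 (92.143). That plateau is consistent with the `NativeThreadSet` contention under discussion, although 16 threads on 16 vCPUs caps every strategy well below 16x.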
   
   ### 32G file random read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                     Mode  Cnt  Score    Error   Units
   RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10  0.145 ±  0.003  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10  0.601 ±  0.005  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10  1.207 ±  0.016  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10  1.250 ±  0.001  ops/ms
   RandomReadIOBenchmark.ffiPread_T01           thrpt   10  1.136 ±  0.044  ops/ms
   RandomReadIOBenchmark.ffiPread_T04           thrpt   10  4.039 ±  0.191  ops/ms
   RandomReadIOBenchmark.ffiPread_T08           thrpt   10  4.473 ±  0.170  ops/ms
   RandomReadIOBenchmark.ffiPread_T16           thrpt   10  4.207 ±  0.086  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10  0.896 ±  0.013  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10  3.620 ±  0.037  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10  4.011 ±  0.050  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10  3.941 ±  0.051  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10  0.877 ±  0.016  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10  3.523 ±  0.063  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10  3.868 ±  0.036  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10  3.832 ±  0.018  ops/ms
   RandomReadIOBenchmark.mmap_T01               thrpt   10  0.695 ±  0.019  ops/ms
   RandomReadIOBenchmark.mmap_T04               thrpt   10  2.868 ±  0.041  ops/ms
   RandomReadIOBenchmark.mmap_T08               thrpt   10  3.500 ±  0.367  ops/ms
   RandomReadIOBenchmark.mmap_T16               thrpt   10  3.322 ±  0.383  ops/ms
   ```
   
   ### 64G file random read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                     Mode  Cnt  Score   Error   Units
   RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10  0.148 ± 0.002  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10  0.601 ± 0.013  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10  1.188 ± 0.029  ops/ms
   RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10  1.250 ± 0.001  ops/ms
   RandomReadIOBenchmark.ffiPread_T01           thrpt   10  0.230 ± 0.007  ops/ms
   RandomReadIOBenchmark.ffiPread_T04           thrpt   10  0.869 ± 0.019  ops/ms
   RandomReadIOBenchmark.ffiPread_T08           thrpt   10  1.565 ± 0.113  ops/ms
   RandomReadIOBenchmark.ffiPread_T16           thrpt   10  1.432 ± 0.032  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10  0.198 ± 0.003  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10  0.806 ± 0.014  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10  1.348 ± 0.016  ops/ms
   RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10  1.315 ± 0.020  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10  0.192 ± 0.004  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10  0.775 ± 0.014  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10  1.284 ± 0.013  ops/ms
   RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10  1.263 ± 0.009  ops/ms
   RandomReadIOBenchmark.mmap_T01               thrpt   10  0.124 ± 0.002  ops/ms
   RandomReadIOBenchmark.mmap_T04               thrpt   10  0.495 ± 0.023  ops/ms
   RandomReadIOBenchmark.mmap_T08               thrpt   10  0.503 ± 0.008  ops/ms
   RandomReadIOBenchmark.mmap_T16               thrpt   10  0.508 ± 0.015  ops/ms
   ```
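   
   The three random-read tables above can be boiled down to a few ratios at 16 threads. A small worked computation, with the T16 scores copied from those tables; `PreadVsMmap` and `ratio` are illustrative names only.
   
   ```java
   /** Ratios of 16-thread random-read scores across file sizes (values copied from the tables). */
   public final class PreadVsMmap {
       static double ratio(double numerator, double denominator) {
           return numerator / denominator;
       }
   
       public static void main(String[] args) {
           String[] sizes = {"16G", "32G", "64G"};
           double[] ffiPreadT16 = {205.366, 4.207, 1.432};
           double[] mmapT16 = {309.911, 3.322, 0.508};
           double[] fileChannelDirectT16 = {92.143, 3.941, 1.315};
           for (int i = 0; i < sizes.length; i++) {
               System.out.printf("%s: ffiPread/mmap = %.2f, ffiPread/fileChannelDirect = %.2f%n",
                       sizes[i], ratio(ffiPreadT16[i], mmapT16[i]),
                       ratio(ffiPreadT16[i], fileChannelDirectT16[i]));
           }
       }
   }
   ```
   
   FFI pread over mmap goes from about 0.66 (mmap wins while the file fits in memory) to about 1.27 at 32G and 2.82 at 64G. FFI pread's lead over FileChannel + DirectByteBuffer, on the other hand, shrinks from about 2.2x at 16G to roughly 1.07x and 1.09x once most reads presumably miss the page cache and device latency dominates.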
   
   ## Sequential Read IO benchmark
   
   The page cache was warmed up as much as possible before each run.
   
   ### 16G file sequential read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                         Mode  Cnt    Score   Error   Units
   SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.155 ± 0.003  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.631 ± 0.005  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.253 ± 0.017  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
   SequentialReadIOBenchmark.ffiPread_T01           thrpt   10   27.720 ± 0.093  ops/ms
   SequentialReadIOBenchmark.ffiPread_T04           thrpt   10   92.820 ± 1.452  ops/ms
   SequentialReadIOBenchmark.ffiPread_T08           thrpt   10  157.172 ± 3.699  ops/ms
   SequentialReadIOBenchmark.ffiPread_T16           thrpt   10  218.619 ± 1.288  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10   26.577 ± 0.042  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10   79.615 ± 2.439  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10   88.710 ± 3.450  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10   95.892 ± 4.006  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10   21.036 ± 0.359  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10   68.143 ± 0.942  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10   90.006 ± 3.340  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10   85.597 ± 5.060  ops/ms
   SequentialReadIOBenchmark.mmap_T01               thrpt   10   46.078 ± 0.084  ops/ms
   SequentialReadIOBenchmark.mmap_T04               thrpt   10  172.748 ± 0.472  ops/ms
   SequentialReadIOBenchmark.mmap_T08               thrpt   10  311.737 ± 1.623  ops/ms
   SequentialReadIOBenchmark.mmap_T16               thrpt   10  334.911 ± 1.108  ops/ms
   ```
   
   ### 32G file sequential read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                         Mode  Cnt   Score    Error   Units
   SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10   0.154 ±  0.002  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10   0.621 ±  0.007  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10   1.232 ±  0.017  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10   1.250 ±  0.001  ops/ms
   SequentialReadIOBenchmark.ffiPread_T01           thrpt   10   3.192 ±  0.202  ops/ms
   SequentialReadIOBenchmark.ffiPread_T04           thrpt   10  11.043 ±  0.441  ops/ms
   SequentialReadIOBenchmark.ffiPread_T08           thrpt   10  16.266 ±  0.612  ops/ms
   SequentialReadIOBenchmark.ffiPread_T16           thrpt   10  15.321 ±  0.227  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10   2.488 ±  0.057  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10   9.853 ±  0.163  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10  14.824 ±  0.146  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10  14.457 ±  0.176  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10   2.328 ±  0.050  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10   9.408 ±  0.220  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10  14.448 ±  0.208  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10  14.331 ±  0.146  ops/ms
   SequentialReadIOBenchmark.mmap_T01               thrpt   10   2.633 ±  0.049  ops/ms
   SequentialReadIOBenchmark.mmap_T04               thrpt   10  10.784 ±  0.201  ops/ms
   SequentialReadIOBenchmark.mmap_T08               thrpt   10  14.853 ±  0.303  ops/ms
   SequentialReadIOBenchmark.mmap_T16               thrpt   10  14.937 ±  0.242  ops/ms
   ```
   
   ### 64G file sequential read
   
   - Command
   ```
   sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
   
   cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
   
   BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false ./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
   ```
   
   - Result
   ```
   Benchmark                                         Mode  Cnt  Score   Error   Units
   SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10  0.156 ± 0.002  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10  0.622 ± 0.009  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10  1.239 ± 0.014  ops/ms
   SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10  1.250 ± 0.001  ops/ms
   SequentialReadIOBenchmark.ffiPread_T01           thrpt   10  0.695 ± 0.017  ops/ms
   SequentialReadIOBenchmark.ffiPread_T04           thrpt   10  2.361 ± 0.049  ops/ms
   SequentialReadIOBenchmark.ffiPread_T08           thrpt   10  2.424 ± 0.018  ops/ms
   SequentialReadIOBenchmark.ffiPread_T16           thrpt   10  2.439 ± 0.016  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10  0.669 ± 0.012  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10  2.469 ± 0.103  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10  2.442 ± 0.024  ops/ms
   SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10  2.442 ± 0.024  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10  0.667 ± 0.014  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10  2.472 ± 0.123  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10  2.428 ± 0.025  ops/ms
   SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10  2.440 ± 0.027  ops/ms
   SequentialReadIOBenchmark.mmap_T01               thrpt   10  0.745 ± 0.018  ops/ms
   SequentialReadIOBenchmark.mmap_T04               thrpt   10  2.189 ± 0.026  ops/ms
   SequentialReadIOBenchmark.mmap_T08               thrpt   10  2.188 ± 0.021  ops/ms
   SequentialReadIOBenchmark.mmap_T16               thrpt   10  2.186 ± 0.012  ops/ms
   ```
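   The 64G sequential numbers are easier to compare as ratios. The small Java snippet below derives two from the scores above: each strategy's 16-thread throughput relative to mmap, and its 16-thread vs 1-thread scaling. The inputs are copied by hand from the result block, and the class and record names are only for illustration. By this arithmetic, the pread-based and FileChannel variants land at about 1.12x mmap at T16 and O_DIRECT at about 0.57x. O_DIRECT still scales roughly 8x from T01 to T16, and almost all of that gain is already reached by T08.

   ```java
   import java.util.List;

   /**
    * Illustrative arithmetic on the 64G SequentialReadIOBenchmark scores (ops/ms).
    * Numbers are copied by hand from the result block; names are hypothetical.
    */
   public final class SeqRead64GRatios {
       /** One strategy's score at 1 thread and at 16 threads. */
       record Row(String strategy, double t01, double t16) {
           /** How much throughput grows from 1 to 16 threads. */
           double scaling() {
               return t16 / t01;
           }
       }

       // mmap is listed first because it is the baseline for relativeToMmapAtT16.
       static final List<Row> ROWS = List.of(
           new Row("mmap", 0.745, 2.186),
           new Row("ffiPread", 0.695, 2.439),
           new Row("fileChannelDirect", 0.669, 2.442),
           new Row("fileChannelHeap", 0.667, 2.440),
           new Row("ffiPreadDirectIO", 0.156, 1.250));

       /** 16-thread throughput relative to mmap at 16 threads. */
       static double relativeToMmapAtT16(Row row) {
           return row.t16() / ROWS.get(0).t16();
       }

       public static void main(String[] args) {
           for (Row r : ROWS) {
               System.out.printf("%-18s T16 = %.3f ops/ms  %.2fx mmap  %.2fx vs T01%n",
                   r.strategy(), r.t16(), relativeToMmapAtT16(r), r.scaling());
           }
       }
   }
   ```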
   </details>
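   For readers who have not used the FFM API, here is a minimal sketch of the kind of call the `ffiPread_*` variants make: `open(2)` once, then positional `pread(2)` into an off-heap `MemorySegment`. This is not the Appendix#1 benchmark code. The class and method names are hypothetical, and the sketch assumes JDK 22+ on a 64-bit Linux (or other POSIX) libc where `size_t` and `off_t` are 64-bit. Error handling is deliberately simplified, with no `errno` capture.

   ```java
   import java.io.EOFException;
   import java.io.IOException;
   import java.lang.foreign.Arena;
   import java.lang.foreign.FunctionDescriptor;
   import java.lang.foreign.Linker;
   import java.lang.foreign.MemorySegment;
   import java.lang.foreign.SymbolLookup;
   import java.lang.foreign.ValueLayout;
   import java.lang.invoke.MethodHandle;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.Arrays;

   /** Hypothetical sketch: positional reads via pread(2) through the FFM API (JDK 22+, POSIX). */
   public final class FfiPread {
       private static final Linker LINKER = Linker.nativeLinker();
       private static final SymbolLookup LIBC = LINKER.defaultLookup();
       private static final int O_RDONLY = 0; // value on Linux and macOS

       // int open(const char *path, int flags, ...): variadic, so mark where the varargs begin
       private static final MethodHandle OPEN = LINKER.downcallHandle(
           LIBC.find("open").orElseThrow(),
           FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.ADDRESS, ValueLayout.JAVA_INT, ValueLayout.JAVA_INT),
           Linker.Option.firstVariadicArg(2));

       // ssize_t pread(int fd, void *buf, size_t count, off_t offset); assumes 64-bit size_t/off_t
       private static final MethodHandle PREAD = LINKER.downcallHandle(
           LIBC.find("pread").orElseThrow(),
           FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.JAVA_INT, ValueLayout.ADDRESS,
               ValueLayout.JAVA_LONG, ValueLayout.JAVA_LONG));

       // int close(int fd)
       private static final MethodHandle CLOSE = LINKER.downcallHandle(
           LIBC.find("close").orElseThrow(),
           FunctionDescriptor.of(ValueLayout.JAVA_INT, ValueLayout.JAVA_INT));

       static int open(Path path) throws Throwable {
           try (Arena arena = Arena.ofConfined()) {
               MemorySegment cPath = arena.allocateFrom(path.toString()); // NUL-terminated UTF-8
               int fd = (int) OPEN.invokeExact(cPath, O_RDONLY, 0);
               if (fd < 0) throw new IOException("open failed: " + path);
               return fd;
           }
       }

       /** Reads exactly len bytes at offset; pread uses no shared file position, so no Java-side lock. */
       static byte[] readAt(int fd, long offset, int len) throws Throwable {
           try (Arena arena = Arena.ofConfined()) {
               MemorySegment buf = arena.allocate(len);
               long done = 0;
               while (done < len) { // pread may return fewer bytes than requested
                   long n = (long) PREAD.invokeExact(fd, buf.asSlice(done), len - done, offset + done);
                   if (n < 0) throw new IOException("pread failed at offset " + (offset + done));
                   if (n == 0) throw new EOFException("EOF at offset " + (offset + done));
                   done += n;
               }
               return buf.toArray(ValueLayout.JAVA_BYTE);
           }
       }

       static void close(int fd) throws Throwable {
           int rc = (int) CLOSE.invokeExact(fd);
           if (rc != 0) throw new IOException("close failed");
       }

       public static void main(String[] args) throws Throwable {
           Path tmp = Files.createTempFile("pread-sketch", ".dat");
           byte[] data = new byte[64 * 1024];
           for (int i = 0; i < data.length; i++) data[i] = (byte) (i * 31);
           Files.write(tmp, data);
           int fd = open(tmp);
           try {
               byte[] block = readAt(fd, 16 * 1024, 16 * 1024); // one 16 KiB read at a 16 KiB offset
               System.out.println(Arrays.equals(block, Arrays.copyOfRange(data, 16 * 1024, 32 * 1024)));
           } finally {
               close(fd);
               Files.delete(tmp);
           }
       }
   }
   ```

   Because the downcall goes straight to libc rather than through `FileChannel`, it skips the per-read `NativeThreadSet` registration discussed earlier. Many threads can share the one descriptor. On JDK 22+ the JVM may print a restricted-method warning unless it is started with `--enable-native-access`.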
   
   
   
   


