neoremind commented on issue #16044:
URL: https://github.com/apache/lucene/issues/16044#issuecomment-4422435972
Hi Guo Feng,
This is a really interesting issue. I spent some time putting together a JMH
test to reproduce the `NativeThreadSet` contention and the mmap page-fault
storms, and to evaluate FFI pread as an alternative. I figured I'd share the
results here.
## Setup
I wrote two JMH benchmarks (see Appendix 1):
- **RandomReadIOBenchmark** : 16 random 16 KiB reads per op
- **SequentialReadIOBenchmark** : 1 random seek + 15 sequential 16 KiB reads
per op
Each benchmark tests five I/O strategies:
1. **mmap** (`MemorySegment.copy` from a mapped file, simulates
`MMapDirectory`)
2. **FFI pread** (direct `pread(2)` syscall via FFI)
3. **FileChannel + DirectByteBuffer** (simulates `NIOFSDirectory`)
4. **FileChannel + HeapByteBuffer** (same, with extra bounce-buffer copy)
5. **FFI pread + O_DIRECT** (bypasses kernel's page cache)
Thread counts: 1, 4, 8, 16.
**Environment:** c5.4xlarge (32 GiB RAM, 16 vCPU), io2 EBS (20K provisioned
IOPS). JDK 25.0.2, `-Xms2g -Xmx2g`. Three data files: 16 GiB, 32 GiB, 64 GiB
(each created with a command like `dd if=/dev/urandom of=/path/data.dat bs=1M
count=2048`, with `count` adjusted to the target size). I didn't test against
NVMe SSDs with higher IOPS and throughput, but I think the block-store numbers
below speak for themselves.
**Procedure:** Before each run, drop the system page cache (`echo 3 >
/proc/sys/vm/drop_caches`), then warm it by reading the file (`cat
file > /dev/null`). This gives a controlled starting state with as much of the
file as possible resident in the page cache. The key variable is how much of
the working set fits in the ~30G available for page cache.
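As a rough way to reason about that last point, the sketch below computes a
best-case page-cache hit ratio for each file size. It assumes uniformly random
reads and a cache that holds nothing but the benchmark file; `CacheFit` and
`expectedHitRatio` are illustrative names, not part of the benchmark.

```java
/** Back-of-the-envelope model: best-case page-cache hit ratio for uniform random reads. */
final class CacheFit {
  /**
   * Assumption: reads are uniformly random over the file and the cache holds
   * only this file, so at most cacheGiB / fileGiB of the file can be resident.
   */
  static double expectedHitRatio(double cacheGiB, double fileGiB) {
    return Math.min(1.0, cacheGiB / fileGiB);
  }

  public static void main(String[] args) {
    double cacheGiB = 30.0; // approximate page cache available on the 32 GiB host
    for (double fileGiB : new double[] {16, 32, 64}) {
      System.out.printf(
          "%.0f GiB file: at most %.0f%% of reads can hit the cache%n",
          fileGiB, 100 * expectedHitRatio(cacheGiB, fileGiB));
    }
  }
}
```

The measured results need not track this bound closely; it only frames the
three regimes tested (fits, borderline, exceeds).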
## Results
### Random Read - 16G file (fits in memory, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 38.2 | 24.7 | 23.8 | 19.7 | 0.15 |
| 4 | 144.9 | 86.8 | 76.8 | 64.9 | 0.60 |
| 8 | 267.1 | 149.7 | 95.9 | 88.3 | 1.21 |
| 16 | **309.9** | **205.4** | 92.1 | 86.0 | 1.25 |
### Random Read - 32G file (at memory limit, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 0.70 | 1.14 | 0.90 | 0.88 | 0.15 |
| 4 | 2.87 | 4.04 | 3.62 | 3.52 | 0.60 |
| 8 | 3.50 | 4.47 | 4.01 | 3.87 | 1.21 |
| 16 | 3.32 | **4.21** | 3.94 | 3.83 | 1.25 |
### Random Read - 64G file (exceeds memory, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 0.12 | 0.23 | 0.20 | 0.19 | 0.15 |
| 4 | 0.50 | 0.87 | 0.81 | 0.78 | 0.60 |
| 8 | 0.50 | 1.57 | 1.35 | 1.28 | 1.19 |
| 16 | 0.51 | **1.43** | 1.32 | 1.26 | 1.25 |
### Sequential Read - 16G file (fits in memory, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 46.1 | 27.7 | 26.6 | 21.0 | 0.16 |
| 4 | 172.7 | 92.8 | 79.6 | 68.1 | 0.63 |
| 8 | 311.7 | 157.2 | 88.7 | 90.0 | 1.25 |
| 16 | **334.9** | **218.6** | 95.9 | 85.6 | 1.25 |
### Sequential Read - 32G file (at memory limit, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 2.63 | 3.19 | 2.49 | 2.33 | 0.15 |
| 4 | 10.8 | 11.0 | 9.85 | 9.41 | 0.62 |
| 8 | 14.9 | 16.3 | 14.8 | 14.4 | 1.23 |
| 16 | 14.9 | **15.3** | 14.5 | 14.3 | 1.25 |
### Sequential Read - 64G file (exceeds memory, ops/ms)
| Threads | mmap | FFI pread | FileChannel Direct | FileChannel Heap | FFI O_DIRECT |
|---------|------|-----------|--------------------|------------------|--------------|
| 1 | 0.75 | 0.70 | 0.67 | 0.67 | 0.16 |
| 4 | 2.19 | 2.36 | 2.47 | 2.47 | 0.62 |
| 8 | 2.19 | 2.42 | 2.44 | 2.43 | 1.24 |
| 16 | 2.19 | **2.44** | 2.44 | 2.44 | 1.25 |
## Key Observations
**1. NativeThreadSet contention is real and dramatic.**
When the working set fits in RAM (the "happy path"), FFI pread delivers
**2.2x the throughput** of FileChannel at 16 threads. FileChannel actually
*regresses* from T8 to T16: under lock contention, adding threads makes
things worse. FFI pread keeps scaling all the way to T16.
At T1 the difference is tiny because the lock is uncontended. The gap opens
progressively with concurrency.
(Random Read - 16G file)
| Threads | FFI pread | FileChannel Direct | FFI advantage |
|---------|-----------|-------------------|---------------|
| 1 | 24.7 | 23.8 | 1.04x |
| 4 | 86.8 | 76.8 | 1.13x |
| 8 | 149.7 | 95.9 | **1.56x** |
| 16 | 205.4 | 92.1 | **2.23x** |
FFI pread keeps scaling through T16 (8.3x vs. T1), while FileChannel peaks at
T8 and then *drops*. That pattern is consistent with the NativeThreadSet
synchronized block becoming the bottleneck.
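The ratios quoted here can be re-derived from the table with a few divisions.
The sketch below does only that arithmetic on the reported ops/ms figures;
`ScalingRatios` is an illustrative name, not part of the benchmark.

```java
/** Re-derives the scaling ratios from the 16G random-read numbers above. */
final class ScalingRatios {
  static double ratio(double numerator, double denominator) {
    return numerator / denominator;
  }

  public static void main(String[] args) {
    // Reported throughput in ops/ms (Random Read, 16G file).
    double ffiT1 = 24.7, ffiT16 = 205.4;            // FFI pread
    double fcT1 = 23.8, fcT8 = 95.9, fcT16 = 92.1;  // FileChannel Direct

    System.out.printf("FFI pread speedup, T16 vs T1:    %.2fx%n", ratio(ffiT16, ffiT1));
    System.out.printf("FileChannel speedup, T16 vs T1:  %.2fx%n", ratio(fcT16, fcT1));
    System.out.printf("FileChannel T16 relative to T8:  %.2fx%n", ratio(fcT16, fcT8));
    System.out.printf("FFI advantage at T16:            %.2fx%n", ratio(ffiT16, fcT16));
  }
}
```

A FileChannel T16/T8 ratio below 1.0 is what "regresses" means in this
context: doubling the thread count lowered total throughput.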
**2. mmap collapses under memory pressure - exactly as described.**
mmap goes from 310 ops/ms (16G file, fully cached) to 0.51 ops/ms (64G working
set) at 16 threads. When the working set exceeds available memory,
`pgmajfault` surges (grepping for "pgfault|pgmajfault" shows the counters
jumping) and throughput collapses, which is exactly what the issue observed.
The full scaling picture under pressure (64G random read) tells the story:
| Threads | mmap | FFI pread | FileChannel |
|---------|------|-----------|-------------|
| 1 | 0.12 | 0.23 | 0.20 |
| 4 | 0.50 | 0.87 | 0.81 |
| 8 | 0.50 | 1.57 | 1.35 |
| 16 | 0.51 | 1.43 | 1.32 |
mmap flatlines after T4: adding 4x more threads gives no additional
throughput. The pread variants continue scaling to T8 before hitting the IOPS
ceiling, likely because the page cache read path has a cheaper eviction penalty
than mmap's page fault handling.
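One way to watch the fault counters while a benchmark runs is to sample them
periodically. The sketch below assumes the counters are read from
`/proc/vmstat` on Linux (the comment does not say which file was grepped);
`MajorFaultProbe` is a hypothetical helper, not part of the benchmark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Samples the kernel's major-fault counter (assumed source: /proc/vmstat, Linux only). */
final class MajorFaultProbe {
  /** Finds a counter in lines of the form "name value"; returns -1 if it is absent. */
  static long counter(List<String> vmstatLines, String name) {
    for (String line : vmstatLines) {
      String[] parts = line.trim().split("\\s+");
      if (parts.length == 2 && parts[0].equals(name)) {
        return Long.parseLong(parts[1]);
      }
    }
    return -1L;
  }

  static long readMajorFaults() throws IOException {
    return counter(Files.readAllLines(Path.of("/proc/vmstat")), "pgmajfault");
  }

  public static void main(String[] args) throws Exception {
    long before = readMajorFaults();
    Thread.sleep(1000); // sample interval: one second
    long after = readMajorFaults();
    System.out.println("pgmajfault delta over 1s: " + (after - before));
  }
}
```

Run alongside the mmap benchmark, a delta that climbs sharply once the file
outgrows the page cache would match the behavior described above.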
**3. Sequential access helps mmap significantly, but doesn't save it.**
Under pressure (64G), sequential mmap is 4.3x better than random mmap (2.19
vs 0.51) thanks to kernel readahead. But pread is still slightly ahead (2.44 vs
2.19).
**4. When disk-bound, FFI pread and FileChannel converge.**
At 64G (severe pressure) and 16 threads, FFI pread (1.43) and FileChannel
(1.32) differ by less than 10%. The NativeThreadSet lock overhead is invisible
when time spent on disk I/O dominates. The contention story only matters when
I/O is fast, as with page cache hits.
**5. O_DIRECT confirms the IOPS ceiling.**
O_DIRECT saturates at exactly 1.25 ops/ms (1.25 ops/ms × 16 reads/op = 20
reads/ms, i.e. 20K IOPS, our provisioned limit). This serves as a nice baseline
confirming the disk is the bottleneck when cache misses dominate.
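The unit conversion behind that figure is small enough to spell out. This is a
sketch of the arithmetic only; `IopsCeiling` and `impliedIops` are illustrative
names.

```java
/** Converts benchmark throughput (ops/ms) into device reads per second. */
final class IopsCeiling {
  static double impliedIops(double opsPerMs, int readsPerOp) {
    // ops/ms * reads/op gives reads/ms; multiply by 1000 for reads per second.
    return opsPerMs * readsPerOp * 1000.0;
  }

  public static void main(String[] args) {
    int readsPerOp = 16;            // READS_PER_OP in the benchmark
    double oDirectOpsPerMs = 1.25;  // O_DIRECT plateau from the tables above
    System.out.printf(
        "O_DIRECT plateau implies %.0f reads/s%n",
        impliedIops(oDirectOpsPerMs, readsPerOp));
  }
}
```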
**6. Cold start with small file (2G, fits in RAM).**
I also ran a test with a 2G file (fits easily in 32G RAM), dropping the page
cache before each iteration to simulate a cold start, and extending each
iteration to 10 seconds instead of the 5 seconds used above:
| Threads | mmap | FFI pread | FileChannel Direct |
|---------|------|-----------|-------------------|
| 1 | 0.21 | 0.14 | 0.14 |
| 4 | 148.7| 8.8 | 7.5 |
| 8 | 278.6| 57.3 | 38.4 |
| 16 | 316.6| 80.2 | 41.0 |
At T1, everyone is disk-bound and roughly equal. But mmap warms the cache
much faster: by T4 it's already at 148.7 ops/ms while pread is still at 8.8.
This is because once mmap faults in the pages, subsequent accesses within the
same iteration are pure memory reads (zero-copy, no syscall). The 2G file
becomes fully resident quickly, and the rest of the iteration runs at memory
speed.
Command used:
`BENCH_FILE=/home/ec2-user/environment/data/pread-bench-2G.dat
BENCH_FILE_SIZE_MIB=2048 BENCH_DROP_CACHES=true ./gradlew jmh --rerun
-Pjmh.includes='RandomReadIOBenchmark'`.
## Conclusions
The benchmark data supports the direction of this issue:
1. **mmap degrades under memory pressure** once the index working set
exceeds the available memory: `pgmajfault` surges and throughput collapses.
2. **mmap is the most performant option when memory is not the limit**, i.e.
when the whole working set stays resident in the page cache. In that scenario,
nothing beats zero-copy memory reads. Even in a cold-start scenario where the
file fits in RAM, mmap warms the cache faster than pread and quickly reaches
memory-speed throughput.
3. **FileChannel thread contention is verified.** FFI pread eliminates
`NativeThreadSet` contention and delivers 2.2x the throughput at 16 threads in
the warm-cache case, while FileChannel actually regresses from T8 to T16.
4. **FFI pread is a good option for memory-bound scenarios.** It matches or
beats FileChannel in almost all scenarios and avoids mmap's severe degradation
under memory pressure, which makes it attractive for memory-bound deployments
(cgroup-limited containers, indices that are large relative to available RAM).
## Appendix
### Appendix 1. JMH test cases
<details>
<summary>RandomReadIOBenchmark.java</summary>
```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Fork(
value = 2,
jvmArgsPrepend = {"--enable-native-access=ALL-UNNAMED", "-Xms2g",
"-Xmx2g"})
public class RandomReadIOBenchmark {
private static final int READ_SIZE = 16 * 1024; // 16 KiB
private static final int READS_PER_OP = 16;
// O_DIRECT requires offsets aligned to filesystem block size
private static final long ALIGNMENT = 4096;
/**
* File size in MiB. Read from env var BENCH_FILE_SIZE_MIB or system
property bench.fileSizeMiB.
* Env var takes precedence (inherited by forked JVMs automatically).
*/
private static final long FILE_SIZE =
Long.parseLong(envOrProp("BENCH_FILE_SIZE_MIB",
"bench.fileSizeMiB", "1024"))
* 1024L
* 1024L;
private static final long MAX_OFFSET = FILE_SIZE - READ_SIZE;
private static final long MAX_ALIGNED_OFFSET = (MAX_OFFSET / ALIGNMENT)
* ALIGNMENT;
// FFI handles for pread / open / close
private static final MethodHandle PREAD;
private static final MethodHandle OPEN;
private static final MethodHandle CLOSE;
static {
Linker linker = Linker.nativeLinker();
SymbolLookup lookup = linker.defaultLookup();
PREAD =
linker.downcallHandle(
lookup.find("pread").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_LONG, // ssize_t return
ValueLayout.JAVA_INT, // int fd
ValueLayout.ADDRESS, // void *buf
ValueLayout.JAVA_LONG, // size_t count
ValueLayout.JAVA_LONG // off_t offset
));
OPEN =
linker.downcallHandle(
lookup.find("open").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // int return (fd)
ValueLayout.ADDRESS, // const char *pathname
ValueLayout.JAVA_INT // int flags
));
CLOSE =
linker.downcallHandle(
lookup.find("close").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // int return
ValueLayout.JAVA_INT // int fd
));
}
/** Per-thread pre-allocated buffers to avoid allocation noise in the
measured path. */
@State(Scope.Thread)
public static class ThreadBuffers {
ByteBuffer directBuf;
ByteBuffer heapBuf;
Arena ffiArena;
MemorySegment ffiBuf;
MemorySegment ffiDirectIoBuf;
@Setup(Level.Trial)
public void setup() {
directBuf = ByteBuffer.allocateDirect(READ_SIZE);
heapBuf = ByteBuffer.allocate(READ_SIZE);
ffiArena = Arena.ofConfined();
ffiBuf = ffiArena.allocate(READ_SIZE);
// O_DIRECT requires buffer aligned to filesystem block size (typically 4096)
ffiDirectIoBuf = ffiArena.allocate(READ_SIZE, 4096);
}
@TearDown(Level.Trial)
public void tearDown() {
ffiArena.close();
}
}
private Path tempFile;
private FileChannel fileChannel;
private MemorySegment mmapSegment;
private int nativeFd;
private int directIoFd;
private Arena arena;
/**
* Path to the benchmark data file. Create it before running with:
*
* <pre>
* dd if=/dev/urandom of=/tmp/pread-bench.dat bs=1M count=1024
* </pre>
*/
private static final String BENCH_FILE =
envOrProp("BENCH_FILE", "bench.file", "/tmp/pread-bench.dat");
/**
* Whether to drop page caches before each iteration. Requires root or
sudo without password.
* Pass via env var BENCH_DROP_CACHES=true or -Dbench.dropCaches=true
*
* <p>When enabled, caches are dropped at the start of each iteration
(warmup and measurement).
* JIT still warms up across iterations since the JVM persists, but each
iteration starts with a
* cold page cache — simulating memory-constrained containers.
*/
private static final boolean DROP_CACHES =
Boolean.parseBoolean(envOrProp("BENCH_DROP_CACHES",
"bench.dropCaches", "false"));
@Setup(Level.Trial)
public void setup() throws Exception {
System.out.println("[bench] ===== RandomReadIOBenchmark Configuration =====");
System.out.println("[bench] file: " + BENCH_FILE);
System.out.println("[bench] fileSizeMiB: " + (FILE_SIZE / (1024
* 1024)));
System.out.println("[bench] dropCaches: " + DROP_CACHES);
System.out.println("[bench] readSize: " + READ_SIZE + " bytes");
System.out.println("[bench] readsPerOp: " + READS_PER_OP);
System.out.println("[bench] ================================================");
tempFile = Path.of(BENCH_FILE);
if (!Files.exists(tempFile)) {
throw new IOException(
"Benchmark file not found: "
+ tempFile
+ "\nCreate it with: dd if=/dev/urandom of="
+ BENCH_FILE
+ " bs=1M count="
+ (FILE_SIZE / (1024 * 1024)));
}
long size = Files.size(tempFile);
if (size < FILE_SIZE) {
throw new IOException(
"Benchmark file too small: "
+ size
+ " bytes, expected at least "
+ FILE_SIZE
+ "\nRecreate with: dd if=/dev/urandom of="
+ BENCH_FILE
+ " bs=1M count="
+ (FILE_SIZE / (1024 * 1024)));
}
// Open FileChannel for the benchmark
fileChannel = FileChannel.open(tempFile, StandardOpenOption.READ);
// Open native fd via FFI
arena = Arena.ofShared();
// Memory-map the file (simulates MMapDirectory)
mmapSegment = fileChannel.map(MapMode.READ_ONLY, 0, FILE_SIZE,
arena);
MemorySegment pathStr = arena.allocateFrom(tempFile.toString());
int O_RDONLY = 0;
try {
nativeFd = (int) OPEN.invokeExact(pathStr, O_RDONLY);
} catch (Throwable t) {
throw new RuntimeException("Failed to open file via FFI", t);
}
if (nativeFd < 0) {
throw new IOException("FFI open() returned " + nativeFd);
}
// Open native fd with O_DIRECT for Direct I/O (Linux only, bypasses page cache)
int O_DIRECT = 0x4000; // Linux x86_64 value for O_DIRECT
try {
directIoFd = (int) OPEN.invokeExact(pathStr, O_RDONLY |
O_DIRECT);
} catch (Throwable t) {
throw new RuntimeException("Failed to open file with O_DIRECT via FFI", t);
}
if (directIoFd < 0) {
// O_DIRECT may not be supported on all filesystems (e.g. tmpfs)
System.err.println(
"WARNING: O_DIRECT open failed (fd=" + directIoFd + "). "
+ "Direct I/O benchmarks will fail. Use a filesystem that supports O_DIRECT.");
directIoFd = -1;
}
}
@TearDown(Level.Trial)
public void tearDown() throws Exception {
fileChannel.close();
try {
int rc = (int) CLOSE.invokeExact(nativeFd);
if (directIoFd >= 0) {
rc = (int) CLOSE.invokeExact(directIoFd);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
arena.close();
}
/**
* Drops page caches before each iteration (warmup and measurement).
* This ensures each iteration starts with a cold page cache.
* JIT still warms up across iterations since the JVM persists across
the fork.
*/
@Setup(Level.Iteration)
public void setupIteration() throws IOException {
if (DROP_CACHES) {
dropPageCaches();
}
}
/**
* Drops the kernel page cache to simulate cold-cache /
memory-constrained scenarios.
* Requires running as root or with passwordless sudo.
* Uses: sync && echo 3 > /proc/sys/vm/drop_caches
*/
private static void dropPageCaches() throws IOException {
// Sync first to flush dirty pages
Process sync = new ProcessBuilder("sync").inheritIO().start();
try {
if (sync.waitFor() != 0) {
throw new IOException("sync failed with exit code " +
sync.exitValue());
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted during sync", e);
}
// Drop page cache (requires root)
Process drop =
new ProcessBuilder("sudo", "bash", "-c", "echo 3 > /proc/sys/vm/drop_caches")
.inheritIO()
.start();
try {
if (drop.waitFor() != 0) {
throw new IOException(
"Failed to drop page caches (exit code "
+ drop.exitValue()
+ "). Run as root or with: sudo sysctl vm.drop_caches=3");
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted during drop_caches", e);
}
System.out.println("[bench] Page caches dropped.");
}
/** Reads a config value from env var first, then system property, then
default. */
private static String envOrProp(String envKey, String propKey, String
defaultValue) {
String env = System.getenv(envKey);
if (env != null && !env.isEmpty()) {
return env;
}
return System.getProperty(propKey, defaultValue);
}
// ---- FileChannel + DirectByteBuffer (contended NativeThreadSet) ----
@Benchmark
@Threads(1)
public void fileChannelDirect_T01(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(4)
public void fileChannelDirect_T04(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(8)
public void fileChannelDirect_T08(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(16)
public void fileChannelDirect_T16(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
// ---- FileChannel + HeapByteBuffer (extra copy + contended NativeThreadSet) ----
@Benchmark
@Threads(1)
public void fileChannelHeap_T01(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(4)
public void fileChannelHeap_T04(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(8)
public void fileChannelHeap_T08(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(16)
public void fileChannelHeap_T16(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
// ---- FFI pread benchmark (no contention) ----
@Benchmark
@Threads(1)
public void ffiPread_T01(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(4)
public void ffiPread_T04(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(8)
public void ffiPread_T08(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(16)
public void ffiPread_T16(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
// ---- mmap benchmark (simulates MMapDirectory; page faults under memory pressure) ----
@Benchmark
@Threads(1)
public void mmap_T01(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(4)
public void mmap_T04(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(8)
public void mmap_T08(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(16)
public void mmap_T16(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
// ---- FFI pread + O_DIRECT benchmark (bypasses page cache, Linux only) ----
@Benchmark
@Threads(1)
public void ffiPreadDirectIO_T01(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(4)
public void ffiPreadDirectIO_T04(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(8)
public void ffiPreadDirectIO_T08(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(16)
public void ffiPreadDirectIO_T16(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
// ---- Implementation ----
private void doFileChannelDirectReads(ThreadBuffers tb, Blackhole bh)
throws IOException {
ThreadLocalRandom rng = ThreadLocalRandom.current();
ByteBuffer buf = tb.directBuf;
for (int i = 0; i < READS_PER_OP; i++) {
long offset = rng.nextLong(MAX_OFFSET);
buf.clear();
int n = fileChannel.read(buf, offset);
bh.consume(n);
}
}
private void doFileChannelHeapReads(ThreadBuffers tb, Blackhole bh)
throws IOException {
ThreadLocalRandom rng = ThreadLocalRandom.current();
ByteBuffer buf = tb.heapBuf;
for (int i = 0; i < READS_PER_OP; i++) {
long offset = rng.nextLong(MAX_OFFSET);
buf.clear();
int n = fileChannel.read(buf, offset);
bh.consume(n);
}
}
private void doFfiReads(ThreadBuffers tb, Blackhole bh) {
ThreadLocalRandom rng = ThreadLocalRandom.current();
MemorySegment buf = tb.ffiBuf;
try {
for (int i = 0; i < READS_PER_OP; i++) {
long offset = rng.nextLong(MAX_OFFSET);
long n = (long) PREAD.invokeExact(nativeFd, buf, (long)
READ_SIZE, offset);
bh.consume(n);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
}
private void doMmapReads(ThreadBuffers tb, Blackhole bh) {
ThreadLocalRandom rng = ThreadLocalRandom.current();
byte[] dst = tb.heapBuf.array();
for (int i = 0; i < READS_PER_OP; i++) {
long offset = rng.nextLong(MAX_OFFSET);
// Copy from the mmap'd region into a byte array, as MMapDirectory does.
// If the page is not resident, this triggers a page fault (major fault if evicted).
MemorySegment.copy(mmapSegment, ValueLayout.JAVA_BYTE, offset,
dst, 0, READ_SIZE);
bh.consume(dst[0]);
}
}
private void doFfiDirectIoReads(ThreadBuffers tb, Blackhole bh) {
if (directIoFd < 0) {
// O_DIRECT not available on this filesystem — skip silently
bh.consume(0);
return;
}
ThreadLocalRandom rng = ThreadLocalRandom.current();
MemorySegment buf = tb.ffiDirectIoBuf;
try {
for (int i = 0; i < READS_PER_OP; i++) {
// O_DIRECT requires aligned offset; generate random aligned position
long offset = (rng.nextLong(MAX_ALIGNED_OFFSET / ALIGNMENT))
* ALIGNMENT;
long n = (long) PREAD.invokeExact(directIoFd, buf, (long)
READ_SIZE, offset);
bh.consume(n);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
}
}
```
</details>
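The aligned-offset handling in `doFfiDirectIoReads` above can be looked at in
isolation. The sketch below mirrors that arithmetic under the same 4096-byte
alignment assumption; `AlignedOffsets` and its method names are illustrative,
not taken from the benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Isolates the offset alignment used for O_DIRECT reads (4096-byte blocks assumed). */
final class AlignedOffsets {
  /** Rounds value down to the nearest multiple of alignment. */
  static long alignDown(long value, long alignment) {
    return (value / alignment) * alignment;
  }

  /** Picks a random multiple of alignment in [0, maxAlignedOffset). */
  static long randomAligned(long maxAlignedOffset, long alignment) {
    long slots = maxAlignedOffset / alignment;
    return ThreadLocalRandom.current().nextLong(slots) * alignment;
  }

  public static void main(String[] args) {
    long alignment = 4096;
    long fileSize = 1024L * 1024L * 1024L; // 1 GiB, the benchmark's default size
    long readSize = 16 * 1024;             // 16 KiB per read
    long maxAligned = alignDown(fileSize - readSize, alignment);
    long offset = randomAligned(maxAligned, alignment);
    System.out.println("offset=" + offset + " aligned=" + (offset % alignment == 0));
  }
}
```

Rounding the upper bound down first keeps every generated offset both aligned
and far enough from the end of the file for a full read.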
<details>
<summary>SequentialReadIOBenchmark.java</summary>
```java
import java.io.IOException;
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.SymbolLookup;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.TearDown;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Fork(
value = 2,
jvmArgsPrepend = {"--enable-native-access=ALL-UNNAMED", "-Xms2g",
"-Xmx2g"})
public class SequentialReadIOBenchmark {
private static final int READ_SIZE = 16 * 1024; // 16 KiB
private static final int READS_PER_OP = 16;
// O_DIRECT requires offsets aligned to filesystem block size
private static final long ALIGNMENT = 4096;
/**
* File size in MiB. Read from env var BENCH_FILE_SIZE_MIB or system
property bench.fileSizeMiB.
* Env var takes precedence (inherited by forked JVMs automatically).
*/
private static final long FILE_SIZE =
Long.parseLong(envOrProp("BENCH_FILE_SIZE_MIB",
"bench.fileSizeMiB", "1024"))
* 1024L
* 1024L;
// Must leave room for 16 sequential reads from the starting offset
private static final long MAX_START_OFFSET = FILE_SIZE - ((long)
READ_SIZE * READS_PER_OP);
private static final long MAX_ALIGNED_START = (MAX_START_OFFSET /
ALIGNMENT) * ALIGNMENT;
// FFI handles for pread / open / close
private static final MethodHandle PREAD;
private static final MethodHandle OPEN;
private static final MethodHandle CLOSE;
static {
Linker linker = Linker.nativeLinker();
SymbolLookup lookup = linker.defaultLookup();
PREAD =
linker.downcallHandle(
lookup.find("pread").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_LONG, // ssize_t return
ValueLayout.JAVA_INT, // int fd
ValueLayout.ADDRESS, // void *buf
ValueLayout.JAVA_LONG, // size_t count
ValueLayout.JAVA_LONG // off_t offset
));
OPEN =
linker.downcallHandle(
lookup.find("open").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // int return (fd)
ValueLayout.ADDRESS, // const char *pathname
ValueLayout.JAVA_INT // int flags
));
CLOSE =
linker.downcallHandle(
lookup.find("close").orElseThrow(),
FunctionDescriptor.of(
ValueLayout.JAVA_INT, // int return
ValueLayout.JAVA_INT // int fd
));
}
/** Per-thread pre-allocated buffers to avoid allocation noise in the
measured path. */
@State(Scope.Thread)
public static class ThreadBuffers {
ByteBuffer directBuf;
ByteBuffer heapBuf;
Arena ffiArena;
MemorySegment ffiBuf;
MemorySegment ffiDirectIoBuf;
@Setup(Level.Trial)
public void setup() {
directBuf = ByteBuffer.allocateDirect(READ_SIZE);
heapBuf = ByteBuffer.allocate(READ_SIZE);
ffiArena = Arena.ofConfined();
ffiBuf = ffiArena.allocate(READ_SIZE);
// O_DIRECT requires buffer aligned to filesystem block size (typically 4096)
ffiDirectIoBuf = ffiArena.allocate(READ_SIZE, 4096);
}
@TearDown(Level.Trial)
public void tearDown() {
ffiArena.close();
}
}
private Path tempFile;
private FileChannel fileChannel;
private MemorySegment mmapSegment;
private int nativeFd;
private int directIoFd;
private Arena arena;
/**
* Path to the benchmark data file. Create it before running with:
*
* <pre>
* dd if=/dev/urandom of=/tmp/pread-bench.dat bs=1M count=1024
* </pre>
*/
private static final String BENCH_FILE =
envOrProp("BENCH_FILE", "bench.file", "/tmp/pread-bench.dat");
/**
* Whether to drop page caches before each iteration. Requires root or
sudo without password.
* Pass via env var BENCH_DROP_CACHES=true or -Dbench.dropCaches=true
*
* <p>When enabled, caches are dropped at the start of each iteration
(warmup and measurement).
* JIT still warms up across iterations since the JVM persists, but each
iteration starts with a
* cold page cache — simulating memory-constrained containers.
*/
private static final boolean DROP_CACHES =
Boolean.parseBoolean(envOrProp("BENCH_DROP_CACHES",
"bench.dropCaches", "false"));
@Setup(Level.Trial)
public void setup() throws Exception {
System.out.println("[bench] ===== SequentialReadIOBenchmark Configuration =====");
System.out.println("[bench] file: " + BENCH_FILE);
System.out.println("[bench] fileSizeMiB: " + (FILE_SIZE / (1024
* 1024)));
System.out.println("[bench] dropCaches: " + DROP_CACHES);
System.out.println("[bench] readSize: " + READ_SIZE + " bytes");
System.out.println("[bench] readsPerOp: " + READS_PER_OP + " (sequential)");
System.out.println("[bench] bytesPerOp: " + ((long) READ_SIZE
* READS_PER_OP) + " bytes");
System.out.println("[bench] ====================================================");
tempFile = Path.of(BENCH_FILE);
if (!Files.exists(tempFile)) {
throw new IOException(
"Benchmark file not found: "
+ tempFile
+ "\nCreate it with: dd if=/dev/urandom of="
+ BENCH_FILE
+ " bs=1M count="
+ (FILE_SIZE / (1024 * 1024)));
}
long size = Files.size(tempFile);
if (size < FILE_SIZE) {
throw new IOException(
"Benchmark file too small: "
+ size
+ " bytes, expected at least "
+ FILE_SIZE
+ "\nRecreate with: dd if=/dev/urandom of="
+ BENCH_FILE
+ " bs=1M count="
+ (FILE_SIZE / (1024 * 1024)));
}
// Open FileChannel for the benchmark
fileChannel = FileChannel.open(tempFile, StandardOpenOption.READ);
// Open native fd via FFI
arena = Arena.ofShared();
// Memory-map the file (simulates MMapDirectory)
mmapSegment = fileChannel.map(MapMode.READ_ONLY, 0, FILE_SIZE,
arena);
MemorySegment pathStr = arena.allocateFrom(tempFile.toString());
int O_RDONLY = 0;
try {
nativeFd = (int) OPEN.invokeExact(pathStr, O_RDONLY);
} catch (Throwable t) {
throw new RuntimeException("Failed to open file via FFI", t);
}
if (nativeFd < 0) {
throw new IOException("FFI open() returned " + nativeFd);
}
// Open native fd with O_DIRECT for Direct I/O (Linux only, bypasses page cache)
int O_DIRECT = 0x4000; // Linux x86_64 value for O_DIRECT
try {
directIoFd = (int) OPEN.invokeExact(pathStr, O_RDONLY |
O_DIRECT);
} catch (Throwable t) {
throw new RuntimeException("Failed to open file with O_DIRECT via FFI", t);
}
if (directIoFd < 0) {
// O_DIRECT may not be supported on all filesystems (e.g. tmpfs)
System.err.println(
"WARNING: O_DIRECT open failed (fd=" + directIoFd + "). "
+ "Direct I/O benchmarks will fail. Use a filesystem that supports O_DIRECT.");
directIoFd = -1;
}
}
@TearDown(Level.Trial)
public void tearDown() throws Exception {
fileChannel.close();
try {
int rc = (int) CLOSE.invokeExact(nativeFd);
if (directIoFd >= 0) {
rc = (int) CLOSE.invokeExact(directIoFd);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
arena.close();
}
/**
* Drops page caches before each iteration (warmup and measurement).
* This ensures each iteration starts with a cold page cache.
* JIT still warms up across iterations since the JVM persists across
the fork.
*/
@Setup(Level.Iteration)
public void setupIteration() throws IOException {
if (DROP_CACHES) {
dropPageCaches();
}
}
/**
* Drops the kernel page cache to simulate cold-cache /
memory-constrained scenarios.
* Requires running as root or with passwordless sudo.
* Uses: sync && echo 3 > /proc/sys/vm/drop_caches
*/
private static void dropPageCaches() throws IOException {
// Sync first to flush dirty pages
Process sync = new ProcessBuilder("sync").inheritIO().start();
try {
if (sync.waitFor() != 0) {
throw new IOException("sync failed with exit code " +
sync.exitValue());
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted during sync", e);
}
// Drop page cache (requires root)
Process drop =
new ProcessBuilder("sudo", "bash", "-c", "echo 3 > /proc/sys/vm/drop_caches")
.inheritIO()
.start();
try {
if (drop.waitFor() != 0) {
throw new IOException(
"Failed to drop page caches (exit code "
+ drop.exitValue()
+ "). Run as root or with: sudo sysctl vm.drop_caches=3");
}
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
throw new IOException("Interrupted during drop_caches", e);
}
System.out.println("[bench] Page caches dropped.");
}
/** Reads a config value from env var first, then system property, then
default. */
private static String envOrProp(String envKey, String propKey, String
defaultValue) {
String env = System.getenv(envKey);
if (env != null && !env.isEmpty()) {
return env;
}
return System.getProperty(propKey, defaultValue);
}
// ---- FileChannel + DirectByteBuffer (contended NativeThreadSet) ----
@Benchmark
@Threads(1)
public void fileChannelDirect_T01(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(4)
public void fileChannelDirect_T04(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(8)
public void fileChannelDirect_T08(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
@Benchmark
@Threads(16)
public void fileChannelDirect_T16(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelDirectReads(tb, bh);
}
// ---- FileChannel + HeapByteBuffer (extra copy + contended NativeThreadSet) ----
@Benchmark
@Threads(1)
public void fileChannelHeap_T01(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(4)
public void fileChannelHeap_T04(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(8)
public void fileChannelHeap_T08(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
@Benchmark
@Threads(16)
public void fileChannelHeap_T16(ThreadBuffers tb, Blackhole bh) throws
IOException {
doFileChannelHeapReads(tb, bh);
}
// ---- FFI pread benchmark (no contention) ----
@Benchmark
@Threads(1)
public void ffiPread_T01(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(4)
public void ffiPread_T04(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(8)
public void ffiPread_T08(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
@Benchmark
@Threads(16)
public void ffiPread_T16(ThreadBuffers tb, Blackhole bh) {
doFfiReads(tb, bh);
}
// ---- mmap benchmark (simulates MMapDirectory; page faults under memory pressure) ----
@Benchmark
@Threads(1)
public void mmap_T01(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(4)
public void mmap_T04(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(8)
public void mmap_T08(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
@Benchmark
@Threads(16)
public void mmap_T16(ThreadBuffers tb, Blackhole bh) {
doMmapReads(tb, bh);
}
// ---- FFI pread + O_DIRECT benchmark (bypasses page cache, Linux only) ----
@Benchmark
@Threads(1)
public void ffiPreadDirectIO_T01(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(4)
public void ffiPreadDirectIO_T04(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(8)
public void ffiPreadDirectIO_T08(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
@Benchmark
@Threads(16)
public void ffiPreadDirectIO_T16(ThreadBuffers tb, Blackhole bh) {
doFfiDirectIoReads(tb, bh);
}
// ---- Implementation: sequential reads (random start, then scan forward) ----
private void doFileChannelDirectReads(ThreadBuffers tb, Blackhole bh)
throws IOException {
ThreadLocalRandom rng = ThreadLocalRandom.current();
ByteBuffer buf = tb.directBuf;
long startOffset = rng.nextLong(MAX_START_OFFSET);
for (int i = 0; i < READS_PER_OP; i++) {
buf.clear();
int n = fileChannel.read(buf, startOffset + (long) i *
READ_SIZE);
bh.consume(n);
}
}
private void doFileChannelHeapReads(ThreadBuffers tb, Blackhole bh)
throws IOException {
ThreadLocalRandom rng = ThreadLocalRandom.current();
ByteBuffer buf = tb.heapBuf;
long startOffset = rng.nextLong(MAX_START_OFFSET);
for (int i = 0; i < READS_PER_OP; i++) {
buf.clear();
int n = fileChannel.read(buf, startOffset + (long) i *
READ_SIZE);
bh.consume(n);
}
}
private void doFfiReads(ThreadBuffers tb, Blackhole bh) {
ThreadLocalRandom rng = ThreadLocalRandom.current();
MemorySegment buf = tb.ffiBuf;
long startOffset = rng.nextLong(MAX_START_OFFSET);
try {
for (int i = 0; i < READS_PER_OP; i++) {
long n =
(long)
PREAD.invokeExact(
nativeFd, buf, (long) READ_SIZE,
startOffset + (long) i * READ_SIZE);
bh.consume(n);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
}
private void doMmapReads(ThreadBuffers tb, Blackhole bh) {
ThreadLocalRandom rng = ThreadLocalRandom.current();
byte[] dst = tb.heapBuf.array();
long startOffset = rng.nextLong(MAX_START_OFFSET);
for (int i = 0; i < READS_PER_OP; i++) {
// Copy from the mmap'd region into a byte array, as MMapDirectory does.
// If the page is not resident, this triggers a page fault
// (a major fault if the page was evicted).
MemorySegment.copy(
mmapSegment, ValueLayout.JAVA_BYTE, startOffset + (long) i * READ_SIZE, dst, 0, READ_SIZE);
bh.consume(dst[0]);
}
}
private void doFfiDirectIoReads(ThreadBuffers tb, Blackhole bh) {
if (directIoFd < 0) {
// O_DIRECT not available on this filesystem — skip silently
bh.consume(0);
return;
}
ThreadLocalRandom rng = ThreadLocalRandom.current();
MemorySegment buf = tb.ffiDirectIoBuf;
// Align start offset for O_DIRECT
long startOffset = (rng.nextLong(MAX_ALIGNED_START / ALIGNMENT)) *
ALIGNMENT;
try {
for (int i = 0; i < READS_PER_OP; i++) {
long n =
(long)
PREAD.invokeExact(
directIoFd, buf, (long) READ_SIZE,
startOffset + (long) i * READ_SIZE);
bh.consume(n);
}
} catch (Throwable t) {
throw new RuntimeException(t);
}
}
}
```
</details>
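The O_DIRECT path in the benchmark above picks its start offset by drawing a random block index and multiplying it by the alignment, so every `pread` in the op lands on an aligned offset. Here is a minimal standalone sketch of that idea. The class name, the 4096-byte `ALIGNMENT`, and the helper names are assumptions for illustration only; the benchmark defines its own constants, and O_DIRECT generally also expects the buffer address and read length to be aligned.

```java
import java.util.concurrent.ThreadLocalRandom;

public final class AlignedOffsetSketch {
  // Assumed values for illustration; the benchmark's own constants may differ.
  static final long ALIGNMENT = 4096;       // assumed logical block size
  static final int READ_SIZE = 16 * 1024;   // 16 KiB per read, as in the Setup section
  static final int READS_PER_OP = 16;       // reads per benchmark op, as in the Setup section

  /** Largest aligned start offset that still leaves room for one full op. */
  static long maxAlignedStart(long fileSize) {
    long lastStart = fileSize - (long) READS_PER_OP * READ_SIZE;
    return (lastStart / ALIGNMENT) * ALIGNMENT;
  }

  /** Draws a random start offset that is a multiple of ALIGNMENT. */
  static long randomAlignedStart(long fileSize) {
    long blocks = maxAlignedStart(fileSize) / ALIGNMENT;
    // nextLong(bound) returns a value in [0, bound); bound must be positive.
    return ThreadLocalRandom.current().nextLong(blocks) * ALIGNMENT;
  }

  public static void main(String[] args) {
    long fileSize = 16L * 1024 * 1024 * 1024; // 16 GiB
    long start = randomAlignedStart(fileSize);
    System.out.println(start % ALIGNMENT); // prints 0
  }
}
```

Because `READ_SIZE` is itself a multiple of the assumed alignment, each of the following sequential reads at `start + i * READ_SIZE` stays aligned as well.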
### Appendix 2. Running steps and logs
<details>
<summary>Running commands</summary>
```bash
#!/bin/bash
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat \
BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
sleep 30
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat \
BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
sleep 30
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat \
BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat \
BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
sleep 30
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat \
BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
sleep 30
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat \
BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
```
</details>
The full log can be found [here](https://neoremind.com/report/log/lucene/issue-16044/0517.log).
### Appendix 3. Raw JMH benchmark results
<details>
<summary>Results</summary>
## Random Read IO benchmark
## Warm up the page cache as much as possible
### 16G file random read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat \
BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                     Mode  Cnt    Score   Error  Units
RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.147 ± 0.003  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.599 ± 0.013  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.206 ± 0.023  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
RandomReadIOBenchmark.ffiPread_T01           thrpt   10   24.691 ± 0.418  ops/ms
RandomReadIOBenchmark.ffiPread_T04           thrpt   10   86.814 ± 0.584  ops/ms
RandomReadIOBenchmark.ffiPread_T08           thrpt   10  149.697 ± 0.433  ops/ms
RandomReadIOBenchmark.ffiPread_T16           thrpt   10  205.366 ± 1.534  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10   23.781 ± 0.425  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10   76.825 ± 0.681  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10   95.923 ± 3.866  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10   92.143 ± 7.888  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10   19.698 ± 0.123  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10   64.903 ± 0.427  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10   88.341 ± 5.815  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10   85.964 ± 3.464  ops/ms
RandomReadIOBenchmark.mmap_T01               thrpt   10   38.191 ± 1.018  ops/ms
RandomReadIOBenchmark.mmap_T04               thrpt   10  144.854 ± 0.461  ops/ms
RandomReadIOBenchmark.mmap_T08               thrpt   10  267.100 ± 2.101  ops/ms
RandomReadIOBenchmark.mmap_T16               thrpt   10  309.911 ± 0.937  ops/ms
```
### 32G file random read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat \
BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                     Mode  Cnt    Score   Error  Units
RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.145 ± 0.003  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.601 ± 0.005  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.207 ± 0.016  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
RandomReadIOBenchmark.ffiPread_T01           thrpt   10    1.136 ± 0.044  ops/ms
RandomReadIOBenchmark.ffiPread_T04           thrpt   10    4.039 ± 0.191  ops/ms
RandomReadIOBenchmark.ffiPread_T08           thrpt   10    4.473 ± 0.170  ops/ms
RandomReadIOBenchmark.ffiPread_T16           thrpt   10    4.207 ± 0.086  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10    0.896 ± 0.013  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10    3.620 ± 0.037  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10    4.011 ± 0.050  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10    3.941 ± 0.051  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10    0.877 ± 0.016  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10    3.523 ± 0.063  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10    3.868 ± 0.036  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10    3.832 ± 0.018  ops/ms
RandomReadIOBenchmark.mmap_T01               thrpt   10    0.695 ± 0.019  ops/ms
RandomReadIOBenchmark.mmap_T04               thrpt   10    2.868 ± 0.041  ops/ms
RandomReadIOBenchmark.mmap_T08               thrpt   10    3.500 ± 0.367  ops/ms
RandomReadIOBenchmark.mmap_T16               thrpt   10    3.322 ± 0.383  ops/ms
```
### 64G file random read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat \
BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='RandomReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                     Mode  Cnt    Score   Error  Units
RandomReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.148 ± 0.002  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.601 ± 0.013  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.188 ± 0.029  ops/ms
RandomReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
RandomReadIOBenchmark.ffiPread_T01           thrpt   10    0.230 ± 0.007  ops/ms
RandomReadIOBenchmark.ffiPread_T04           thrpt   10    0.869 ± 0.019  ops/ms
RandomReadIOBenchmark.ffiPread_T08           thrpt   10    1.565 ± 0.113  ops/ms
RandomReadIOBenchmark.ffiPread_T16           thrpt   10    1.432 ± 0.032  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T01  thrpt   10    0.198 ± 0.003  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T04  thrpt   10    0.806 ± 0.014  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T08  thrpt   10    1.348 ± 0.016  ops/ms
RandomReadIOBenchmark.fileChannelDirect_T16  thrpt   10    1.315 ± 0.020  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T01    thrpt   10    0.192 ± 0.004  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T04    thrpt   10    0.775 ± 0.014  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T08    thrpt   10    1.284 ± 0.013  ops/ms
RandomReadIOBenchmark.fileChannelHeap_T16    thrpt   10    1.263 ± 0.009  ops/ms
RandomReadIOBenchmark.mmap_T01               thrpt   10    0.124 ± 0.002  ops/ms
RandomReadIOBenchmark.mmap_T04               thrpt   10    0.495 ± 0.023  ops/ms
RandomReadIOBenchmark.mmap_T08               thrpt   10    0.503 ± 0.008  ops/ms
RandomReadIOBenchmark.mmap_T16               thrpt   10    0.508 ± 0.015  ops/ms
```
## Sequential Read IO benchmark
## Warm up the page cache as much as possible
### 16G file sequential read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-16G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-16G.dat \
BENCH_FILE_SIZE_MIB=16384 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                         Mode  Cnt    Score   Error  Units
SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.155 ± 0.003  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.631 ± 0.005  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.253 ± 0.017  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
SequentialReadIOBenchmark.ffiPread_T01           thrpt   10   27.720 ± 0.093  ops/ms
SequentialReadIOBenchmark.ffiPread_T04           thrpt   10   92.820 ± 1.452  ops/ms
SequentialReadIOBenchmark.ffiPread_T08           thrpt   10  157.172 ± 3.699  ops/ms
SequentialReadIOBenchmark.ffiPread_T16           thrpt   10  218.619 ± 1.288  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10   26.577 ± 0.042  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10   79.615 ± 2.439  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10   88.710 ± 3.450  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10   95.892 ± 4.006  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10   21.036 ± 0.359  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10   68.143 ± 0.942  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10   90.006 ± 3.340  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10   85.597 ± 5.060  ops/ms
SequentialReadIOBenchmark.mmap_T01               thrpt   10   46.078 ± 0.084  ops/ms
SequentialReadIOBenchmark.mmap_T04               thrpt   10  172.748 ± 0.472  ops/ms
SequentialReadIOBenchmark.mmap_T08               thrpt   10  311.737 ± 1.623  ops/ms
SequentialReadIOBenchmark.mmap_T16               thrpt   10  334.911 ± 1.108  ops/ms
```
### 32G file sequential read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-32G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-32G.dat \
BENCH_FILE_SIZE_MIB=32768 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                         Mode  Cnt    Score   Error  Units
SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.154 ± 0.002  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.621 ± 0.007  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.232 ± 0.017  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
SequentialReadIOBenchmark.ffiPread_T01           thrpt   10    3.192 ± 0.202  ops/ms
SequentialReadIOBenchmark.ffiPread_T04           thrpt   10   11.043 ± 0.441  ops/ms
SequentialReadIOBenchmark.ffiPread_T08           thrpt   10   16.266 ± 0.612  ops/ms
SequentialReadIOBenchmark.ffiPread_T16           thrpt   10   15.321 ± 0.227  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10    2.488 ± 0.057  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10    9.853 ± 0.163  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10   14.824 ± 0.146  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10   14.457 ± 0.176  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10    2.328 ± 0.050  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10    9.408 ± 0.220  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10   14.448 ± 0.208  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10   14.331 ± 0.146  ops/ms
SequentialReadIOBenchmark.mmap_T01               thrpt   10    2.633 ± 0.049  ops/ms
SequentialReadIOBenchmark.mmap_T04               thrpt   10   10.784 ± 0.201  ops/ms
SequentialReadIOBenchmark.mmap_T08               thrpt   10   14.853 ± 0.303  ops/ms
SequentialReadIOBenchmark.mmap_T16               thrpt   10   14.937 ± 0.242  ops/ms
```
### 64G file sequential read
- Command
```
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
cat /home/ec2-user/environment/data/pread-bench-64G.dat > /dev/null
BENCH_FILE=/home/ec2-user/environment/data/pread-bench-64G.dat \
BENCH_FILE_SIZE_MIB=65536 BENCH_DROP_CACHES=false \
./gradlew jmh --rerun -Pjmh.includes='SequentialReadIOBenchmark' >> 0517.log
```
- Result
```
Benchmark                                         Mode  Cnt    Score   Error  Units
SequentialReadIOBenchmark.ffiPreadDirectIO_T01   thrpt   10    0.156 ± 0.002  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T04   thrpt   10    0.622 ± 0.009  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T08   thrpt   10    1.239 ± 0.014  ops/ms
SequentialReadIOBenchmark.ffiPreadDirectIO_T16   thrpt   10    1.250 ± 0.001  ops/ms
SequentialReadIOBenchmark.ffiPread_T01           thrpt   10    0.695 ± 0.017  ops/ms
SequentialReadIOBenchmark.ffiPread_T04           thrpt   10    2.361 ± 0.049  ops/ms
SequentialReadIOBenchmark.ffiPread_T08           thrpt   10    2.424 ± 0.018  ops/ms
SequentialReadIOBenchmark.ffiPread_T16           thrpt   10    2.439 ± 0.016  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T01  thrpt   10    0.669 ± 0.012  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T04  thrpt   10    2.469 ± 0.103  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T08  thrpt   10    2.442 ± 0.024  ops/ms
SequentialReadIOBenchmark.fileChannelDirect_T16  thrpt   10    2.442 ± 0.024  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T01    thrpt   10    0.667 ± 0.014  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T04    thrpt   10    2.472 ± 0.123  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T08    thrpt   10    2.428 ± 0.025  ops/ms
SequentialReadIOBenchmark.fileChannelHeap_T16    thrpt   10    2.440 ± 0.027  ops/ms
SequentialReadIOBenchmark.mmap_T01               thrpt   10    0.745 ± 0.018  ops/ms
SequentialReadIOBenchmark.mmap_T04               thrpt   10    2.189 ± 0.026  ops/ms
SequentialReadIOBenchmark.mmap_T08               thrpt   10    2.188 ± 0.021  ops/ms
SequentialReadIOBenchmark.mmap_T16               thrpt   10    2.186 ± 0.012  ops/ms
```
</details>
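To make the raw scores easier to interpret, here is a small sketch that converts a JMH score (ops/ms) into individual reads per second and computes a throughput ratio between two strategies. It assumes 16 reads per op, as described in the Setup section; the class and method names are made up for illustration, and the input numbers are copied from the 16G random-read results above.

```java
public final class ResultArithmeticSketch {
  // Each benchmark op issues 16 reads of 16 KiB (per the Setup section).
  static final int READS_PER_OP = 16;

  /** Converts a JMH throughput score in ops/ms into individual reads per second. */
  static double readsPerSecond(double opsPerMs) {
    return opsPerMs * 1000.0 * READS_PER_OP;
  }

  /** Ratio of two throughput scores (how many times faster a is than b). */
  static double ratio(double a, double b) {
    return a / b;
  }

  public static void main(String[] args) {
    // Scores (ops/ms) from the 16G random-read run at 16 threads.
    double oDirectT16 = 1.250;
    double ffiPreadT16 = 205.366;
    double fileChannelDirectT16 = 92.143;

    System.out.printf("O_DIRECT T16: %.0f reads/s%n", readsPerSecond(oDirectT16));
    System.out.printf(
        "FFI pread vs FileChannel Direct at T16: %.2fx%n",
        ratio(ffiPreadT16, fileChannelDirectT16));
  }
}
```

The O_DIRECT score at 16 threads is 1.250 ops/ms in all six runs, which works out to 20,000 reads/s. That is consistent with the volume's 20K provisioned IOPS being the ceiling once the page cache is bypassed, although the benchmark itself does not prove that this is the limiting factor.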