Hello Ceph community,
I am evaluating Crimson OSD + Seastore performance for potential deployment
in a distributed storage environment.
With BlueStore, my FIO tests achieve satisfactory 4K random read/write IOPS.
However, when testing Crimson OSD + Seastore, I observed that 4K random
read/write IOPS do not scale as expected as I increase the number of
SSDs/OSDs: throughput plateaus beyond a certain point, or is much lower
than I expected. (See the test results below.)
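For reference, a representative FIO job for the 4K random read case looks
roughly like the sketch below; pool/image names, iodepth, and job count are
illustrative, not an exact copy of my job file (swap rw=randread for
rw=randwrite for the write test):

```
; illustrative 4K random read job against an RBD image
[global]
ioengine=rbd
clientname=admin     ; cephx user, illustrative
pool=rbd             ; pool name, illustrative
rbdname=bench0       ; image name, illustrative
time_based=1
runtime=60
group_reporting=1

[randread-4k]
rw=randread
bs=4k
iodepth=32           ; queue depth, illustrative
numjobs=8            ; one job per client, matching the 8 clients
```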
Test Environment:
- Cluster: 8 clients, 1 OSD per SSD
- Hardware: 40-core CPUs, 377 GiB DRAM
- Image SHA (quay.io): e0543089a9e9cae97999761059eaccdf6bb22e9e
- Configuration parameters:
osd_memory_target = 34359738368
crimson_osd_scheduler_concurrency = 0
seastore_max_concurrent_transactions = 16
crimson_osd_obc_lru_size = 8192
seastore_cache_lru_size = 16G
seastore_obj_data_write_amplification = 4
seastore_journal_batch_capacity = 1024
seastore_journal_batch_flush_size = 256M
seastore_journal_iodepth_limit = 16
seastore_journal_batch_preferred_fullness = 0.8
seastore_segment_size = 128M
seastore_device_size = 512G
seastore_block_create = true
seastore_default_object_metadata_reservation = 1073741824
rbd_cache = false
rbd_cache_writethrough_until_flush = true
rbd_op_threads = 16
Replication policy:
- 4096 PGs, no replication (only 1 copy)
Test Results:
1 SSD test (varying the number of allocated CPUs; alien threads pinned to
CPUs 26-29 and 36-39):
num CPU | 4k randread | 4k randwrite | Allocated CPU set
      2 |      126772 |        14830 | 0-1
      4 |      107860 |        16451 | 0-3
      6 |      113741 |        17019 | 0-5
      8 |      132060 |        16099 | 0-7
SSD scaling test (2 CPUs per SSD):
OSD CPU mapping: OSD.0 (0-1), OSD.1 (10-11), OSD.2 (2-3), OSD.3 (12-13),
..., OSD.15 (34-35), Alien threads (26-29, 36-39)
num SSD | 4k randread | 4k randwrite
      4 |      861273 |        22360
      8 |     1022793 |        22786
     12 |     1019161 |        21211
     16 |      927570 |        20502
SSD scaling test (1 CPU per SSD):
OSD CPU mapping: OSD.0 (0), OSD.1 (10), OSD.2 (2), OSD.3 (12), ..., OSD.15
(24), Alien CPUs: 1, 11, 3, 13, ..., 15, 25
num SSD | 4k randread | 4k randwrite
      4 |      936685 |        13730
      8 |     1048204 |        18259
     12 |      922727 |        23078
     16 |      987838 |        30792
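One way to express the CPU mappings above in configuration is sketched below.
I believe crimson_seastar_cpu_cores and crimson_alien_thread_cpu_cores are
the relevant options, but the names are my assumption; please correct me if
they have changed in current builds:

```
; illustrative sketch of the 2-CPUs-per-SSD layout (option names assumed)
[osd.0]
crimson_seastar_cpu_cores = 0-1          ; reactor cores for OSD.0
[osd.1]
crimson_seastar_cpu_cores = 10-11        ; reactor cores for OSD.1
; ... one stanza per OSD, following the mapping above ...
[global]
crimson_alien_thread_cpu_cores = 26-29,36-39   ; alien thread cores
```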
Questions:
1. Since Seastore is still under active development, are there any known
unresolved performance issues that could explain this scaling behavior?
2. Are there recommended tuning parameters for improving small-block read
scalability in multi-SSD configurations?
3. Regarding alien threads, are there best practices for CPU pinning or
NUMA-aware placement that have shown measurable improvements?
4. Any additional guidance for maximizing IOPS with Crimson OSD + Seastore
would be greatly appreciated.
My goal is to be ready to switch from BlueStore to Crimson + Seastore once
it becomes stable and delivers performance comparable to BlueStore, so I'd
like to understand the current limitations and tuning opportunities.
Thank you,
Ki-taek Lee
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]