Hi Kasper, Thanks for sharing.
I don't see anything wrong with this specific OSD when it comes to bluestore_rocksdb_*. It's RocksDB database is using column families and this OSD was resharded properly (if not created or recreated in Pacific). What the perf dump shows is that the db_used_bytes is above the db_total_bytes. If this cluster makes heavy use of metadata (RGW workloads for example) then 90GB of DB device for 10TB drives is less than 1% which is not enough. General recommendation for RGW workloads is to use a DB device of at least 4% in size of the data device [1]. Now, your best move is probably to enable RocksDB compression (ceph config set osd bluestore_rocksdb_options_annex 'compression=kLZ4Compression'), restart and compact these OSDs to update bluefs stats, and consider giving those OSDs larger RocksDB partitions in the future. Regards, Frédéric. [1] https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#sizing ----- Le 15 Mai 25, à 7:44, Kasper Rasmussen [email protected] a écrit : > perf dump: > "bluefs": { > "db_total_bytes": 88906653696, > "db_used_bytes": 11631853568, > "wal_total_bytes": 0, > "wal_used_bytes": 0, > "slow_total_bytes": 9796816207872, > "slow_used_bytes": 1881341952, > "num_files": 229, > "log_bytes": 11927552, > "log_compactions": 78, > "log_write_count": 281792, > "logged_bytes": 1154220032, > "files_written_wal": 179, > "files_written_sst": 311, > "write_count_wal": 280405, > "write_count_sst": 29432, > "bytes_written_wal": 4015595520, > "bytes_written_sst": 15728308224, > "bytes_written_slow": 2691231744, > "max_bytes_wal": 0, > "max_bytes_db": 13012828160, > "max_bytes_slow": 3146252288, > "alloc_unit_slow": 65536, > "alloc_unit_db": 1048576, > "alloc_unit_wal": 0, > "read_random_count": 1871590, > "read_random_bytes": 18959576586, > "read_random_disk_count": 563421, > "read_random_disk_bytes": 17110012647, > "read_random_disk_bytes_wal": 0, > "read_random_disk_bytes_db": 11373755941, > "read_random_disk_bytes_slow": 5736256706, > "read_random_buffer_count": 1313456, > "read_random_buffer_bytes": 1849563939, > "read_count": 275731, > "read_bytes": 4825912551, > "read_disk_count": 225997, > "read_disk_bytes": 4016943104, > "read_disk_bytes_wal": 0, > "read_disk_bytes_db": 3909947392, > "read_disk_bytes_slow": 106999808, > "read_prefetch_count": 274534, > "read_prefetch_bytes": 4785141168, > "write_count": 591760, > "write_disk_count": 591838, > "write_bytes": 21062987776, > "compact_lat": { > "avgcount": 78, > "sum": 0.572247346, > "avgtime": 0.007336504 > }, > "compact_lock_lat": { > "avgcount": 78, > "sum": 0.182746199, > "avgtime": 0.002342899 > }, > "alloc_slow_fallback": 0, > "alloc_slow_size_fallback": 0, > "read_zeros_candidate": 0, > "read_zeros_errors": 0, > "wal_alloc_lat": { > "avgcount": 0, > "sum": 0.000000000, > "avgtime": 0.000000000 > }, > "db_alloc_lat": { > "avgcount": 969, > "sum": 0.006368060, > "avgtime": 0.000006571 > }, > "slow_alloc_lat": { > "avgcount": 39, > "sum": 0.004502210, > "avgtime": 0.000115441 > }, > "alloc_wal_max_lat": 0.000000000, > "alloc_db_max_lat": 0.000113831, > "alloc_slow_max_lat": 0.000301347 > }, > > > config show: > "bluestore_rocksdb_cf": "true", > "bluestore_rocksdb_cfs": "m(3) p(3,0-12) > O(3,0-13)=block_cache={type=binned_lru} > L=min_write_buffer_number_to_merge=32 P=min_write_buffer_number_to_merge=32", > "bluestore_rocksdb_options": > "compression=kLZ4Compression,max_write_buffer_number=64,min_write_buffer_number_to_merge=6,compaction_style=kCompactionStyleLevel,write_buffer_size=16777216,max_background_jobs=4,level0_file_num_compaction_trigger=8,max_bytes_for_level_base=1073741824,max_bytes_for_level_multiplier=8,compaction_readahead_size=2MB,max_total_wal_size=1073741824,writable_file_max_buffer_size=0", > "bluestore_rocksdb_options_annex": "", > > > Dono if it is of any help, but I've compared the config from an OSD not > reporting an issues, and there is no difference. > > > ________________________________ > From: Enrico Bocchi <[email protected]> > Sent: Wednesday, May 14, 2025 22:47 > To: Kasper Rasmussen <[email protected]>; ceph-users > <[email protected]> > Subject: Re: BLUEFS_SPILLOVER after Reef upgrade > > Hi Kasper, > > Would you mind sharing the output of `perf dump` and `config show` from the > daemon socket of one of the OSDs reporting blues spillover? I am interested in > the bluefs part of the former and in the bluestore_rocksdb options of the > latter. > > The warning about slow ops in bluestore is a different story. There have been > several messages on this mailing list recently with suggestions on how to tune > the alert threshold. From my experience, they very likely relate to some > problem with the underlying storage device, so I'd recommend investigating the > root cause rather than simply silencing the warning. > > Cheers, > Enrico > > > ________________________________ > From: Kasper Rasmussen <[email protected]> > Sent: Wednesday, May 14, 2025 8:22:46 PM > To: ceph-users <[email protected]> > Subject: [ceph-users] BLUEFS_SPILLOVER after Reef upgrade > > I've just upgraded our ceph cluster from pacific 16.2.15 -> Reef 18.2.7 > > After that I see the warnings: > > [WRN] BLUEFS_SPILLOVER: 5 OSD(s) experiencing BlueFS spillover > osd.110 spilled over 4.5 GiB metadata from 'db' device (8.0 GiB used of > 83 GiB) > to slow device > osd.455 spilled over 1.1 GiB metadata from 'db' device (11 GiB used of 83 > GiB) > to slow device > osd.533 spilled over 426 MiB metadata from 'db' device (10 GiB used of 83 > GiB) > to slow device > osd.560 spilled over 389 MiB metadata from 'db' device (9.8 GiB used of > 83 GiB) > to slow device > osd.597 spilled over 8.6 GiB metadata from 'db' device (7.7 GiB used of > 83 GiB) > to slow device > [WRN] BLUESTORE_SLOW_OP_ALERT: 4 OSD(s) experiencing slow operations in > BlueStore > osd.410 observed slow operation indications in BlueStore > osd.443 observed slow operation indications in BlueStore > osd.508 observed slow operation indications in BlueStore > osd.593 observed slow operation indications in BlueStore > > I've tried to run ceph tell osd.XXX compact with no result. > > Bluefs stats: > > ceph tell osd.110 bluefs stats > 1 : device size 0x14b33fe000 : using 0x202c00000(8.0 GiB) > 2 : device size 0x8e8ffc00000 : using 0x5d31d150000(5.8 TiB) > RocksDBBlueFSVolumeSelector >>>Settings<< extra=0 B, l0_size=1 GiB, l_base=1 GiB, l_multi=8 B > DEV/LEV WAL DB SLOW * * REAL > FILES > LOG 0 B 16 MiB 0 B 0 B 0 B 15 MiB > 1 > WAL 0 B 18 MiB 0 B 0 B 0 B 6.3 > MiB > 1 > DB 0 B 8.0 GiB 0 B 0 B 0 B 8.0 > GiB > 140 > SLOW 0 B 0 B 4.5 GiB 0 B 0 B 4.5 > GiB > 78 > TOTAL 0 B 8.0 GiB 4.5 GiB 0 B 0 B 0 B > 220 > MAXIMUMS: > LOG 0 B 25 MiB 0 B 0 B 0 B 21 MiB > WAL 0 B 118 MiB 0 B 0 B 0 B 93 MiB > DB 0 B 8.2 GiB 0 B 0 B 0 B 8.2 > GiB > SLOW 0 B 0 B 14 GiB 0 B 0 B 14 GiB > TOTAL 0 B 8.2 GiB 14 GiB 0 B 0 B 0 B >>> SIZE << 0 B 79 GiB 8.5 TiB > > Help with what to do next will, be much appreciated > > > _______________________________________________ > ceph-users mailing list -- [email protected] > To unsubscribe send an email to [email protected] > > _______________________________________________ > ceph-users mailing list -- [email protected] > To unsubscribe send an email to [email protected] _______________________________________________ ceph-users mailing list -- [email protected] To unsubscribe send an email to [email protected]
