Hi Michael,

On Wed, May 21, 2025 at 10:09 PM Michael Götting
<[email protected]> wrote:
>
> Hi all,
>
> We have the following problem with our CephFS setup (Ceph version 19.2.2).
>
> Today our two active MDS nodes failed, and the nodes that were in
> "standby-replay" took over and then failed as well.
>
>
> The CephFS system is equipped as follows:
> - 3 monitor nodes
> - 4 MDS nodes
>         - 2 active
>         - 2 standby-replay
> - CephFS file system
>         - max_mds = 2
>         - 1x metadata pool
>         - 2x data pools (hdd_pool, ssd_pool)
>
>
>
> << ----------------- Ceph fs status output  START----------------- >>
>
> cephfs - 0 clients
>
> RANK STATE MDS ACTIVITY DNS INOS DIRS CAPS
> 0 failed
> 1 failed
>
> << ----------------- Ceph fs status output  END-----------------  >>
>
> To restore the service, we followed the documentation at
>
> https://docs.ceph.com/en/quincy/cephfs/disaster-recovery-experts/?highlight=mds+repair
>
> and carried out the steps up to and including "MDS table wipes". We did
> not carry out the "MDS map reset" step, as we were not sure whether we
> would then lose all the data from rank 1. We also carried out the steps
> under "Avoiding recovery roadblocks":
> https://docs.ceph.com/en/quincy/cephfs/troubleshooting/#avoiding-recovery-roadblocks.
>
>
> Parameters set on the MDS nodes:
>
> mds advanced mds_abort_on_newly_corrupt_dentry false
> mds advanced mds_bal_interval 0
> mds basic mds_cache_memory_limit 274877906944
> mds advanced mds_cache_trim_threshold 524288
> mds advanced mds_go_bad_corrupt_dentry false
> mds advanced mds_heartbeat_grace 3600.000000
> mds advanced mds_min_caps_working_set 60000
> mds advanced mds_oft_prefetch_dirfrags false
>
>
> After trying the recovery steps (truncating the journal), the MDS
> daemons are stuck in a crash/restart loop.
>
>
> << ----------------- Example log file mds-1:  START ----------------- >>
>
> -14> 2025-05-21T17:49:27.003+0200 7f36cfc7e640  1 mds.0.42052 active_start
>     -13> 2025-05-21T17:49:27.003+0200 7f36d2c84640 10 monclient:
> get_auth_request con 0x559a76e8a400 auth_method 0
>     -12> 2025-05-21T17:49:27.003+0200 7f36d2483640 10 monclient:
> get_auth_request con 0x559a7e041400 auth_method 0
>     -11> 2025-05-21T17:49:27.003+0200 7f36d3485640 10 monclient:
> get_auth_request con 0x559a76e8b000 auth_method 0
>     -10> 2025-05-21T17:49:27.003+0200 7f36d3485640 10 monclient:
> get_auth_request con 0x559ab89cb800 auth_method 0
>      -9> 2025-05-21T17:49:27.003+0200 7f36d2c84640 10 monclient:
> get_auth_request con 0x559a7bcc6400 auth_method 0
>      -8> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  1 mds.0.42052
> cluster recovered.
>      -7> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  4 mds.0.42052
> set_osd_epoch_barrier: epoch=492573
>      -6> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  5 quiesce.mds.0
> <quiesce_cluster_update> epoch:42055 me:7764062 leader:7764062
> members:7764062
>      -5> 2025-05-21T17:49:27.015+0200 7f36cfc7e640  5 quiesce.mgr.0
> <update_membership> starting the db mgr thread at epoch: 42055
>      -4> 2025-05-21T17:49:27.015+0200 7f36c5c6a640  5 quiesce.mgr.0
> <quiesce_db_thread_main> Entering the main thread
>      -3> 2025-05-21T17:49:27.015+0200 7f36c5c6a640  5 quiesce.mgr.0
> <membership_upkeep> a reset of the db has been requested
>      -2> 2025-05-21T17:49:27.015+0200 7f36c9471640 -1
> mds.0.cache.den(0x1 techfak) newly corrupt dentry to be committed:
> [dentry #0x1/techfak [c,head] auth (dversion lock) pv=0 v=52947746
> ino=0x1000a58d072 state=1073741824 | inodepin=1 0x559a755b2c80]
>      -1> 2025-05-21T17:49:27.015+0200 7f36c9471640 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.2/rpm/el9/BUILD/ceph-19.2.2/src/mds/MDCache.cc:
> In function 'void MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*,
> CDentry*, snapid_t, CInode**, CDentry::linkage_t*)' thread 7f36c9471640
> time 2025-05-21T17:49:27.020101+0200
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/19.2.2/rpm/el9/BUILD/ceph-19.2.2/src/mds/MDCache.cc:
> 1687: FAILED ceph_assert(follows >= realm->get_newest_seq())

You are running into

        https://tracker.ceph.com/issues/36349

which was closed because it wasn't reproducible and there wasn't
any further debug information to make progress. To recover from
this situation, please refer to

        https://tracker.ceph.com/issues/36349#note-5
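For anyone following along later: the steps Michael describes (up to and including the table wipes) correspond roughly to the commands below from the disaster-recovery-experts documentation. This is only a sketch, not a recipe: the fs name `cephfs` and ranks 0/1 are taken from this thread, exact flags and required confirmations can differ between releases, and you should stop the MDS daemons and take journal backups before touching anything. Check the documentation for your exact version first.

```shell
# Sketch of the documented CephFS disaster-recovery steps; fs name and
# ranks are assumptions based on this thread (max_mds = 2).

# 1. Back up the journal of each rank before any destructive operation.
cephfs-journal-tool --rank=cephfs:0 journal export backup.rank0.bin
cephfs-journal-tool --rank=cephfs:1 journal export backup.rank1.bin

# 2. Recover what dentries can be salvaged from the journal into the
#    metadata store.
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
cephfs-journal-tool --rank=cephfs:1 event recover_dentries summary

# 3. Truncate the journals (the "truncating the journal" step mentioned
#    above). Recent releases may require an extra confirmation flag.
cephfs-journal-tool --rank=cephfs:0 journal reset
cephfs-journal-tool --rank=cephfs:1 journal reset

# 4. Wipe the MDS tables ("MDS table wipes"). The snap table reset is
#    the one most relevant to the snaprealm assertion in the backtrace.
cephfs-table-tool all reset session
cephfs-table-tool all reset snap
cephfs-table-tool all reset inode
```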

>
>   ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid
> (stable)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x121) [0x7f36d5709cf9]
>   2: /usr/lib64/ceph/libceph-common.so.2(+0x182eb8) [0x7f36d5709eb8]
>   3: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*,
> snapid_t, CInode**, CDentry::linkage_t*)+0xac3) [0x559a51ca4583]
>   4: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*,
> snapid_t)+0xbd) [0x559a51ca50cd]
>   5:
> (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>,
> EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xe71) [0x559a51cab8f1]
>   6: (Locker::check_inode_max_size(CInode*, bool, unsigned long,
> unsigned long, utime_t)+0x473) [0x559a51d55a33]
>   7: (RecoveryQueue::_recovered(CInode*, int, unsigned long,
> utime_t)+0x390) [0x559a51d2e750]
>   8: (MDSContext::complete(int)+0x5c) [0x559a51e4617c]
>   9: (MDSIOContextBase::complete(int)+0x34c) [0x559a51e4884c]
>   10: /usr/bin/ceph-mds(+0x4f5970) [0x559a51eed970]
>   11: /usr/bin/ceph-mds(+0x160f0d) [0x559a51b58f0d]
>   12: (Finisher::finisher_thread_entry()+0x17d) [0x7f36d57c885d]
>   13: /lib64/libc.so.6(+0x8a0ca) [0x7f36d50a30ca]
>   14: /lib64/libc.so.6(+0x10f150) [0x7f36d5128150]
>
>       0> 2025-05-21T17:49:27.019+0200 7f36c9471640 -1 *** Caught signal
> (Aborted) **
>   in thread 7f36c9471640 thread_name:
>
>   ceph version 19.2.2 (0eceb0defba60152a8182f7bd87d164b639885b8) squid
> (stable)
>   1: /lib64/libc.so.6(+0x3ebf0) [0x7f36d5057bf0]
>   2: /lib64/libc.so.6(+0x8be0c) [0x7f36d50a4e0c]
>   3: raise()
>   4: abort()
>   5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x17b) [0x7f36d5709d53]
>   6: /usr/lib64/ceph/libceph-common.so.2(+0x182eb8) [0x7f36d5709eb8]
>   7: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*,
> snapid_t, CInode**, CDentry::linkage_t*)+0xac3) [0x559a51ca4583]
>   8: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*,
> snapid_t)+0xbd) [0x559a51ca50cd]
>   9:
> (MDCache::predirty_journal_parents(boost::intrusive_ptr<MutationImpl>,
> EMetaBlob*, CInode*, CDir*, int, int, snapid_t)+0xe71) [0x559a51cab8f1]
>   10: (Locker::check_inode_max_size(CInode*, bool, unsigned long,
> unsigned long, utime_t)+0x473) [0x559a51d55a33]
>   11: (RecoveryQueue::_recovered(CInode*, int, unsigned long,
> utime_t)+0x390) [0x559a51d2e750]
>   12: (MDSContext::complete(int)+0x5c) [0x559a51e4617c]
>   13: (MDSIOContextBase::complete(int)+0x34c) [0x559a51e4884c]
>   14: /usr/bin/ceph-mds(+0x4f5970) [0x559a51eed970]
>   15: /usr/bin/ceph-mds(+0x160f0d) [0x559a51b58f0d]
>   16: (Finisher::finisher_thread_entry()+0x17d) [0x7f36d57c885d]
>   17: /lib64/libc.so.6(+0x8a0ca) [0x7f36d50a30ca]
>   18: /lib64/libc.so.6(+0x10f150) [0x7f36d5128150]
>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
>
> << ----------------- Example log file mds-1:  END -----------------  >>
>
>
> << ----------------- Ceph fs dump output  START----------------- >>
>
> e41371
> btime 2025-05-21T16:19:27.085643+0200
> enable_multiple, ever_enabled_multiple: 1,1
> default compat: compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no
> anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 3
>
> Filesystem 'cephfs' (3)
> fs_name cephfs
> epoch   41370
> flags   73 allow_snaps allow_multimds_snaps allow_standby_replay
> refuse_client_session
> created 2024-03-31T23:36:25.302389+0200
> modified        2025-05-21T16:19:09.237977+0200
> tableserver     0
> root    0
> session_timeout 60
> session_autoclose       300
> max_file_size   1099511627776
> max_xattr_size  65536
> required_client_features        {}
> last_failure    0
> last_failure_osd_epoch  492429
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable
> ranges,3=default file layouts on dirs,4=dir inode in separate
> object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds
> uses inline data,8=no anchor table,9=file layout v2,10=snaprealm
> v2,11=minor log segments,12=quiesce subvolumes}
> max_mds 2
> in      0,1
> up      {1=7761302}
> failed  0
> damaged
> stopped
> data_pools      [5,3]
> metadata_pool   4
> inline_data     disabled
> balancer
> bal_rank_mask   -1
> standby_count_wanted    2
> qdb_cluster     leader: 7761302 members: 7761302
> [mds.mds-1{1:7761302} state up:active seq 5 addr
> [v2:[2001:638:504:2011:9:3:1:1]:6800/2463826788,v1:[2001:638:504:2011:9:3:1:1]:6801/2463826788]
> compat {c=[1],r=[1],i=[1fff]}]
>
>
> Standby daemons:
>
> [mds.mds-2{-1:7771506} state up:standby seq 1 addr
> [v2:[2001:638:504:2011:6:3:2:2]:6800/2809657192,v1:[2001:638:504:2011:6:3:2:2]:6801/2809657192]
> compat {c=[1],r=[1],i=[1fff]}]
> dumped fsmap epoch 41371
>
> << ----------------- Ceph fs dump output  END -----------------  >>
>
> But to be honest, after everything we tried, I don't know exactly what
> information to provide. We can supply much more on request.
>
> We really need the service back online, so any help would be very much
> appreciated.
>
>
> Regards,
> Michael
>
> _______________________________________________
> ceph-users mailing list -- [email protected]
> To unsubscribe send an email to [email protected]