Hi,
I am having a peculiar problem with my Ceph Octopus cluster. Two weeks ago
an issue started with too many scrub errors; later, random OSDs stopped,
which led to corrupt PGs and missing replicas. Since it's a testing
cluster, I wanted to understand the issue.
I tried to recover the PGs but it didn't help. When I set the `norecover`,
`norebalance`, and `nodown` flags, the OSD services keep running without
stopping.
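For clarity, these are the standard Ceph CLI commands for the flags I mentioned (run on a monitor/admin node; unset them to let recovery resume):

```shell
# Stop OSDs from being marked down, and pause recovery/rebalancing
ceph osd set nodown
ceph osd set norecover
ceph osd set norebalance

# Later, to return the cluster to normal operation:
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset nodown
```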
I have gone through the steps in the Ceph OSD troubleshooting guide, but
nothing helps or leads to finding the issue.
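For reference, these are the kinds of checks I ran while going through the troubleshooting guide (all standard commands; the OSD id below is just an example):

```shell
# Overall cluster state and detailed health warnings
ceph -s
ceph health detail

# Which OSDs are up/down and where they sit in the CRUSH tree
ceph osd tree

# PGs stuck in inactive/unclean/stale states
ceph pg dump_stuck

# Logs from a crashed OSD daemon (example: osd.3)
journalctl -u ceph-osd@3 --no-pager | tail -n 100
```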
I mailed the list earlier but couldn't get a solution. Any help in finding
the cause would be appreciated.
*Error message from one of the OSDs that failed:*
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/15.2.7/rpm/el8/BUILD/
ceph-15.2.7/src/osd/OSD.cc: 9521: FAILED ceph_assert(started <=
reserved_pushes)
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x158) [0x55fcb6621dbe]
2: (()+0x504fd8) [0x55fcb6621fd8]
3: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
4: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
5: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
6: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x55fcb6d5b224]
7: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
8: (()+0x82de) [0x7f04c1b1c2de]
9: (clone()+0x43) [0x7f04c0853e83]
0> 2021-08-28T13:53:37.444+0000 7f04a128d700 -1 *** Caught signal
(Aborted) **
in thread 7f04a128d700 thread_name:tp_osd_tp
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
(stable)
1: (()+0x12dd0) [0x7f04c1b26dd0]
2: (gsignal()+0x10f) [0x7f04c078f70f]
3: (abort()+0x127) [0x7f04c0779b25]
4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x1a9) [0x55fcb6621e0f]
5: (()+0x504fd8) [0x55fcb6621fd8]
6: (OSD::do_recovery(PG*, unsigned int, unsigned long,
ThreadPool::TPHandle&)+0x5f5) [0x55fcb6704c25]
7: (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x1d) [0x55fcb6960a3d]
8: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x12ef) [0x55fcb67224df]
9: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
[0x55fcb6d5b224]
10: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55fcb6d5de84]
11: (()+0x82de) [0x7f04c1b1c2de]
12: (clone()+0x43) [0x7f04c0853e83]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
Thanks
Amudhan
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]