[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-02-21 Thread Maksym Medvied
The fix looks at the byte 4 bytes ahead of the current position (if the current position is 0x3C9, the code looks at the byte at 0x3CD). In the squid release this byte would most likely be 0 (it could be non-zero only for extended attributes of 4 GiB or more, which is highly unlikely). In the squid git snapshot from
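Here is a minimal C++ sketch of this check. It uses a plain byte vector instead of Ceph's bufferlist iterator. `FieldOrder` and `guess_order` are hypothetical names, not the code from the attached patch. The sketch assumes the layouts described in this thread: max_xattr_size is a little-endian 64-bit integer, and bal_rank_mask is a 4-byte little-endian length followed by its characters. It also assumes bal_rank_mask is non-empty.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical names for illustration; not the code from the Launchpad patch.
enum class FieldOrder {
  MaxXattrSizeFirst,  // u64 max_xattr_size, then the bal_rank_mask string
  BalRankMaskFirst,   // bal_rank_mask string, then u64 max_xattr_size
};

// Guess which field starts at `pos` by peeking at the byte at pos + 4.
// - If a little-endian u64 max_xattr_size starts at pos, buf[pos + 4] holds
//   bits 32..39 of the value, which are zero for any value below 4 GiB.
// - If the bal_rank_mask string starts at pos, buf[pos..pos+3] is its u32
//   length and buf[pos + 4] is its first character, a printable ASCII
//   character and therefore non-zero (assuming the string is non-empty).
FieldOrder guess_order(const std::vector<uint8_t>& buf, std::size_t pos) {
  if (pos >= buf.size() || buf.size() - pos <= 4)
    throw std::out_of_range("not enough bytes to peek at pos + 4");
  return buf[pos + 4] == 0 ? FieldOrder::MaxXattrSizeFirst
                           : FieldOrder::BalRankMaskFirst;
}
```

For the position mentioned above, `guess_order(buf, 0x3C9)` inspects `buf[0x3CD]`.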

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-02-21 Thread Maksym Medvied
The following patch was used to get the hexdumps above: diff --git a/src/mon/MDSMonitor.cc b/src/mon/MDSMonitor.cc index 76a57ac443..d36bed2257 100644 --- a/src/mon/MDSMonitor.cc +++ b/src/mon/MDSMonitor.cc @@ -143,6 +143,7 @@ void MDSMonitor::update_from_paxos(bool *need_bootstrap) ceph_asser
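The patch itself is truncated here. As a rough illustration of this kind of output, here is a hypothetical C++ helper, not the one used in the patch. It prints a byte range with absolute hex offsets, 16 bytes per row, so that positions such as 0x3C9 can be found in the dump.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats up to `len` bytes of `buf`, starting at
// `start`, as rows of 16 bytes, each row prefixed with its absolute offset.
std::string hexdump(const std::vector<uint8_t>& buf, std::size_t start,
                    std::size_t len) {
  std::string out;
  char cell[32];
  const std::size_t end = std::min(buf.size(), start + len);
  for (std::size_t row = start; row < end; row += 16) {
    std::snprintf(cell, sizeof cell, "%08zx:", row);  // absolute offset
    out += cell;
    for (std::size_t i = row; i < row + 16 && i < end; ++i) {
      std::snprintf(cell, sizeof cell, " %02x", buf[i]);  // one byte in hex
      out += cell;
    }
    out += '\n';
  }
  return out;
}
```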

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-02-21 Thread Maksym Medvied
** Attachment removed: "src/mds/MDSMap: decode max_xattr_size and bal_rank_mask in the right order" https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/2089565/+attachment/5859087/+files/mds-MDSMap-decode-max_xattr_size-and-bal_rank_mask.patch

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-02-20 Thread Maksym Medvied
(This description of the fix is also added to the patch.) bal_rank_mask is stored as a text string containing the decimal representation of a number. The string is encoded as its length (4 bytes, little endian) followed by the characters themselves, without a trailing 0. max
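The string layout just described can be sketched in C++ as follows. `encode_string` and `decode_string` are hypothetical helpers written for illustration over a `std::vector<uint8_t>`. They only mimic the described byte layout and are not Ceph's `encode()`/`decode()` templates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: 4-byte little-endian length, then the characters,
// with no trailing '\0'.
void encode_string(const std::string& s, std::vector<uint8_t>& out) {
  const uint32_t len = static_cast<uint32_t>(s.size());
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<uint8_t>(len >> (8 * i)));  // least significant byte first
  out.insert(out.end(), s.begin(), s.end());
}

// Hypothetical helper: reads the same layout back and advances `pos`.
std::string decode_string(const std::vector<uint8_t>& in, std::size_t& pos) {
  if (pos > in.size() || in.size() - pos < 4)
    throw std::out_of_range("buffer too short for the length field");
  uint32_t len = 0;
  for (int i = 0; i < 4; ++i)
    len |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
  pos += 4;
  if (in.size() - pos < len)
    throw std::out_of_range("buffer too short for the string body");
  std::string s(reinterpret_cast<const char*>(in.data() + pos), len);
  pos += len;
  return s;
}
```

For example, encoding "-1" yields the six bytes `02 00 00 00 2d 31`.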

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-02-19 Thread Maksym Medvied
** Changed in: ceph (Ubuntu) Status: Confirmed => In Progress

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2025-01-27 Thread Maksym Medvied
** Changed in: ceph (Ubuntu) Assignee: (unassigned) => Maksym Medvied (medvied)

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
Now we see that the dir with the Ceph source is ceph-19.2.0. Let's create a symlink so that gdb can find it: > sudo ln -sv ceph-19.2.0 ceph-19.2.0-0ubuntu0.24.04.1 'ceph-19.2.0-0ubuntu0.24.04.1' -> 'ceph-19.2.0' Let's restart gdb with ceph-mon again: (gdb) start Temporary breakpoint 1

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
The addresses here are not contiguous, so it makes sense to look at the full disassembled version as well (i.e. disassemble without /m): (gdb) disassemble 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl&)' Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04li

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
As we see in the diff above, these two decode() calls were swapped: if (ev >= 17) { -decode(max_xattr_size, p); +decode(bal_rank_mask, p); } if (ev >= 18) { -decode(bal_rank_mask, p); +decode(max_xattr_size, p); + } + Let's find out why. To do so we need to clone the up
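To see why the swap breaks decoding, here is a small C++ sketch. It assumes max_xattr_size is a little-endian uint64 and bal_rank_mask uses the length-prefixed string layout described in this thread. `Fields`, `encode_xattr_first` and `decode_mask_first` are hypothetical. The writer follows the "-" lines of the diff and the reader follows the "+" lines.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the two MDSMap fields; not Ceph's MDSMap.
struct Fields {
  uint64_t max_xattr_size = 0;
  std::string bal_rank_mask;
};

// Append `v` as `nbytes` little-endian bytes.
static void put_le(std::vector<uint8_t>& out, uint64_t v, int nbytes) {
  for (int i = 0; i < nbytes; ++i) out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read `nbytes` little-endian bytes at `pos`; throw if the buffer ends first.
static uint64_t get_le(const std::vector<uint8_t>& in, std::size_t& pos, int nbytes) {
  if (pos > in.size() || in.size() - pos < static_cast<std::size_t>(nbytes))
    throw std::out_of_range("end of buffer");
  uint64_t v = 0;
  for (int i = 0; i < nbytes; ++i) v |= static_cast<uint64_t>(in[pos + i]) << (8 * i);
  pos += nbytes;
  return v;
}

// Read a u32 length followed by that many characters.
static std::string get_str(const std::vector<uint8_t>& in, std::size_t& pos) {
  const uint64_t len = get_le(in, pos, 4);
  if (in.size() - pos < len) throw std::out_of_range("end of buffer");
  std::string s(reinterpret_cast<const char*>(in.data() + pos), len);
  pos += len;
  return s;
}

// Writer: max_xattr_size first, then bal_rank_mask ("-" side of the diff).
std::vector<uint8_t> encode_xattr_first(const Fields& f) {
  std::vector<uint8_t> out;
  put_le(out, f.max_xattr_size, 8);
  put_le(out, f.bal_rank_mask.size(), 4);
  out.insert(out.end(), f.bal_rank_mask.begin(), f.bal_rank_mask.end());
  return out;
}

// Reader: bal_rank_mask first, then max_xattr_size ("+" side of the diff).
Fields decode_mask_first(const std::vector<uint8_t>& in) {
  Fields f;
  std::size_t pos = 0;
  f.bal_rank_mask = get_str(in, pos);
  f.max_xattr_size = get_le(in, pos, 8);
  return f;
}
```

With max_xattr_size = 65536, the reader takes the bytes `00 00 01 00` as a 65536-byte string length and runs past the end of the buffer. With max_xattr_size = 0, it silently produces an empty bal_rank_mask and max_xattr_size = 8589934592 (2^33) from the wrong bytes.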

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
git clone https://git.launchpad.net/ubuntu/+source/ceph cd ceph > git grep -n MDSMap::decode src/mds/FSMap.cc:1086: * Insert INLINE; see comment in MDSMap::decode. src/mds/MDSMap.cc:836:void MDSMap::decode(bufferlist::const_iterator& p) So we're interested in src/mds/MDSMap.cc (if the file was

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
Let's find this offset in the disassembled function: (gdb) disassemble/m 'MDSMap::decode(ceph::buffer::v15_2_0::list::iterator_impl&)' Dump of assembler code for function _ZN6MDSMap6decodeERN4ceph6buffer7v15_2_04list13iterator_implILb1EEE: Address range 0x77cc2e10 to 0x77cc3c4d: 837

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
This is the SIGABRT stack backtrace: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x45320) [0x749752045320] 2: pthread_kill() 3: gsignal() 4: abort() 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa5ff5) [0x7497524a5ff5] 6: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xbb0da) [0x7497524bb0da] 7: (std::unexpec

[Bug 2089565] Re: MON and MDS crash upgrading CEPH on ubuntu 24.04 LTS

2024-12-21 Thread Maksym Medvied
The root cause of this bug is that the on-wire representation changed between the git snapshot 19.2.0~git20240301.4c76c50-0ubuntu6 and the squid release 19.2.0-0ubuntu0.24.04.1, so the cluster couldn't be upgraded without downtime. We don't have upgrade tests from the snapshot to the squid release,