> Apparently those UUIDs aren't as reliable as I thought.
>
> I've had problems with a server box that hosts a ceph VM.
VM?
> Looks like the mobo disk controller is unreliable
Lemme guess, is it an IR / RoC / RAID type? As opposed to JBOD / IT?
If the former, and it’s an LSI SKU as most are, I’d love it if you could
privately send me the output of
storcli64 /c0 show termlog >/tmp/termlog.txt
Sometimes flakiness is actually with the drive backplane, especially when it
has an embedded expander. In either case, updating HBA firmware sometimes
makes a real difference.
And drive firmware.
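If you want to compare against what's currently flashed, something like this
should show both (the /c0 and /dev/sdX paths are just placeholders):

storcli64 /c0 show all | grep -iE 'fw|firmware'   # controller FW package / version
smartctl -i /dev/sdX                              # drive model and firmware revision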
> AND one of the disks passes SMART
I’m curious if it shows SATA downshifts.
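Something along these lines usually surfaces them (sdX being wherever the drive
happens to land):

smartctl -l sataphy /dev/sdX     # SATA Phy event counters, CRC errors etc.
dmesg | grep -i 'sata link'      # link resets / "limiting SATA link speed" messages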
> but has interface problems. So I moved the disks to an alternate box.
>
> Between relocation and dropping the one disk, neither of the 2 OSDs for that
> host will come up. If everything was running solely on static UUIDs, the good
> disk should have been findable even if its physical disk device name shifted.
> But it wasn't.
Did you try
ceph-volume lvm activate --all
?
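If that doesn't bring them up, it's worth checking whether ceph-volume can still
read the LVM tags at all, something along the lines of:

ceph-volume lvm list                            # enumerates OSDs from their LVM tags
ceph-volume lvm activate <osd-id> <osd-fsid>    # activate a single OSD by hand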
> Which brings up something I've wondered about for some time. Shouldn't it be
> possible for OSDs to be portable?
I haven’t tried it much, but that *should* be true, modulo CRUSH location.
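By default the OSD re-homes itself under whatever host bucket it starts on, so
roughly (osd.7, the weight, and the host name here are made-up values):

ceph config get osd osd_crush_update_on_start                        # true by default
ceph osd crush create-or-move osd.7 1.0 host=newhost root=default   # manual re-home if it isn't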
> That is, if a box goes bad, in theory I should be able to remove the drive
> and jack it into a hot-swap bay on another server and have that server able
> to import the relocated OSD.
I’ve effectively done a chassis swap, moving all the drives including the boot
volume, but that admittedly was in the ceph-disk days.
> True, the metadata for an OSD is currently located on its host, but it seems
> like it should be possible to carry a copy on the actual device.
My limited understanding is that this *is* the case with LVM.
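You can eyeball those tags with plain LVM tooling, e.g.:

lvs -o lv_name,vg_name,lv_tags

which should list ceph.osd_id, ceph.osd_fsid, ceph.cluster_fsid and friends
against the OSD LVs, and that's what ceph-volume reads back at activation time.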
>
> Tim
>
> On 4/11/25 16:23, Anthony D'Atri wrote:
>> Filestore, pre-ceph-volume, may have been entirely different. IIRC LVM is
>> used these days to exploit persistent metadata tags.
>>
>>> On Apr 11, 2025, at 4:03 PM, Tim Holloway <[email protected]> wrote:
>>>
>>> I just checked an OSD, and the "block" entry is indeed linked to storage
>>> through a /dev/mapper UUID LV, not a raw /dev device. When ceph builds an
>>> LV-based OSD, it creates a VG named "ceph-uuuu", where "uuuu" is a UUID, and
>>> an LV named "osd-block-vvvv", where "vvvv" is also a UUID. So although you'd
>>> map the OSD to something like /dev/vdb in a VM, the name ceph actually uses
>>> is UUID-based (and LVM-based) and thus not subject to change with
>>> alterations in the hardware, since the UUIDs are part of the metadata in the
>>> VGs and LVs that ceph creates.
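(That link typically looks something like the following, with the OSD id and
UUIDs as made-up placeholders:

ls -l /var/lib/ceph/osd/ceph-0/block
block -> /dev/ceph-<vg-uuid>/osd-block-<lv-uuid>
)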
>>>
>>> Since I got that from a VM, I can't vouch for all cases, but I thought it
>>> especially interesting that ceph was creating LVM counterparts even for
>>> devices that were not themselves LVM-based.
>>>
>>> And yeah, I understand that it's the amount of replicated OSD data that
>>> counts more than the number of hosts, but when an entire host goes down and
>>> there are few hosts, that can take a large bite out of the replicas.
>>>
>>> Tim
>>>
>>> On 4/11/25 10:36, Anthony D'Atri wrote:
>>>> I thought those links were to the by-uuid paths for that reason?
>>>>
>>>>> On Apr 11, 2025, at 6:39 AM, Janne Johansson <[email protected]> wrote:
>>>>>
>>>>> Den fre 11 apr. 2025 kl 09:59 skrev Anthony D'Atri
>>>>> <[email protected]>:
>>>>>> Filestore IIRC used partitions, with cute hex GPT types for various
>>>>>> states and roles. Udev activation was sometimes problematic, and LVM
>>>>>> tags are more flexible and reliable than the prior approach. There no
>>>>>> doubt is more to it but that’s what I recall.
>>>>> Filestore used to have symlinks to the journal device (if used) that
>>>>> pointed at sdX, where that X would of course jump around if you changed
>>>>> the number of drives in the box or the kernel's disk detection order
>>>>> changed, breaking the OSD.
>>>>>
>>>>> --
>>>>> May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]