[ceph-users] Re: Patching Ceph cluster

Michael Worsham Thu, 24 Apr 2025 21:06:39 -0700

I've been reading over the playbook code, and it's nicely written. I know it's 
primarily RHEL-focused, but I think it could be modified for Ubuntu/Debian 
platforms as well.


A couple of questions though...

In the test example hosts file, what is the tiebreaker?

I know there isn't a role in the roles folder, but do you have an example of 
one, just so we know what it does?

Thanks.

-- Michael


________________________________
From: Sake Ceph <[email protected]>
Sent: Friday, June 14, 2024 4:28:34 AM
To: Michael Worsham <[email protected]>; [email protected] 
<[email protected]>
Subject: Re: [ceph-users] Re: Patching Ceph cluster

This is an external email. Please take care when clicking links or opening 
attachments. When in doubt, check with the Help Desk or Security.


I needed to do some cleaning before I could share this :)
Maybe you or someone else can use it.

Kind regards,
Sake

> On 14-06-2024 03:53 CEST, Michael Worsham 
> <[email protected]> wrote:
>
>
> I'd love to see what your playbook(s) look like for doing this.
>
> -- Michael
> ________________________________
> From: Sake Ceph <[email protected]>
> Sent: Thursday, June 13, 2024 4:05 PM
> To: [email protected] <[email protected]>
> Subject: [ceph-users] Re: Patching Ceph cluster
>
>
>
> Yeah, we fully automated this with Ansible. In short, we do the following:
>
> 1. Check that the cluster is healthy before continuing (via the REST API); 
>    only HEALTH_OK is acceptable.
> 2. Disable scrub and deep-scrub.
> 3. Update all applications on all the hosts in the cluster.
> 4. For every host, one by one, do the following:
>    4a. Check whether applications got updated.
>    4b. Check via reboot-hint whether a reboot is necessary.
>    4c. If applications got updated or a reboot is necessary, do the following:
>        4c1. Put the host in maintenance.
>        4c2. Reboot the host if necessary.
>        4c3. Check and wait via 'ceph orch host ls' until the status of the 
>             host is maintenance and nothing else.
>        4c4. Take the host out of maintenance.
>    4d. Check that the cluster is healthy before continuing (via the REST API); 
>        only the warning about scrub and deep-scrub is allowed, and no PGs 
>        may be degraded.
> 5. Enable scrub and deep-scrub when all hosts are done.
> 6. Check that the cluster is healthy (via the REST API); only HEALTH_OK is 
>    acceptable.
> 7. Done.
>
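The health gates in steps 1, 4d and 6 above can be sketched as a small pure function. This is only an illustration, not the logic of the actual playbook: it assumes the health data has the shape of the "health" object in `ceph status --format json` (a "status" string plus a "checks" dict keyed by check name), and it assumes the scrub warning appears as the OSDMAP_FLAGS check. The names `health_gate` and `ALLOWED_DURING_PATCHING` are made up for this example.

```python
from typing import AbstractSet

# Assumption: while noscrub/nodeep-scrub are set, the cluster reports
# HEALTH_WARN with an OSDMAP_FLAGS check. Other OSD flags raise the same
# check, so a stricter gate would also inspect the check's summary text.
ALLOWED_DURING_PATCHING = frozenset({"OSDMAP_FLAGS"})


def health_gate(health: dict, allowed_checks: AbstractSet[str] = frozenset()) -> bool:
    """Return True if patching may continue.

    `health` is assumed to look like the "health" object from
    `ceph status --format json`: {"status": "...", "checks": {name: {...}}}.
    """
    status = health.get("status")
    if status == "HEALTH_OK":
        return True
    if status != "HEALTH_WARN":
        # HEALTH_ERR or anything unexpected always stops the run.
        return False
    active = set(health.get("checks", {}))
    # A warning is tolerated only if every active check is explicitly allowed
    # (so PG_DEGRADED, for example, blocks the run).
    return bool(active) and active <= set(allowed_checks)
```

For steps 1 and 6 the function would be called with no allowed checks, so only HEALTH_OK passes; for step 4d it would be called with `ALLOWED_DURING_PATCHING`.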
> For upgrading the OS we have something similar, but exiting maintenance mode 
> is broken (with 17.2.7) :(
> I need to check the tracker for similar issues, and if I can't find anything, 
> I will create a ticket.
>
> Kind regards,
> Sake
>
> > On 12-06-2024 19:02 CEST, Daniel Brown 
> > <[email protected]> wrote:
> >
> >
> > I have two Ansible roles, one for enter, one for exit. There are likely 
> > better ways to do this, and I'll not be surprised if someone here lets me 
> > know. They're using orch commands via the cephadm shell. I'm using Ansible 
> > for other configuration management in my environment as well, including 
> > setting up clients of the Ceph cluster.
> >
> >
> > Below are excerpts from main.yml in the "tasks" for the enter/exit roles. 
> > The host I'm running Ansible from is one of my Ceph servers. I've limited 
> > which processes run there, though, so it's in the cluster but not equal to 
> > the others.
> >
> >
> > -----
> > Enter
> > -----
> >
> > # Put this host into maintenance mode through the orchestrator.
> > - name: Ceph Maintenance Mode Enter
> >   shell:
> >     cmd: >-
> >       cephadm shell ceph orch host maintenance enter
> >       {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}
> >       --force --yes-i-really-mean-it
> >   become: True
> >
> >
> >
> > -----
> > Exit
> > -----
> >
> >
> > # Runs on the Ansible control host, which is itself a Ceph server.
> > - name: Ceph Maintenance Mode Exit
> >   shell:
> >     cmd: >-
> >       cephadm shell ceph orch host maintenance exit
> >       {{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}
> >   become: True
> >   connection: local
> >
> >
> > # Wait 60 seconds, then poll until port 9100 answers on the host.
> > - name: Wait for Ceph to be available
> >   ansible.builtin.wait_for:
> >     delay: 60
> >     host: '{{ (ansible_ssh_host|default(ansible_host))|default(inventory_hostname) }}'
> >     port: 9100
> >   connection: local
> >
> >
> >
> >
> >
> >
> > > On Jun 12, 2024, at 11:28 AM, Michael Worsham 
> > > <[email protected]> wrote:
> > >
> > > Interesting. How do you set this "maintenance mode"? If you have a series 
> > > of documented steps that you have to do and could provide as an example, 
> > > that would be beneficial for my efforts.
> > >
> > > We are in the process of standing up both a dev-test environment 
> > > consisting of 3 Ceph servers (strictly for testing purposes) and a new 
> > > production environment consisting of 20+ Ceph servers.
> > >
> > > We are using Ubuntu 22.04.
> > >
> > > -- Michael
> > > From: Daniel Brown <[email protected]>
> > > Sent: Wednesday, June 12, 2024 9:18 AM
> > > To: Anthony D'Atri <[email protected]>
> > > Cc: Michael Worsham <[email protected]>; [email protected] 
> > > <[email protected]>
> > > Subject: Re: [ceph-users] Patching Ceph cluster
> > >
> > >
> > > There's also a maintenance mode that you can set for each server while 
> > > you're doing updates, so that the cluster doesn't try to move data from 
> > > the affected OSDs while the server being updated is offline or down. I've 
> > > worked some on automating this with Ansible, but have found that my 
> > > process (and/or my cluster) still requires some manual intervention while 
> > > it's running to get things done cleanly.
> > >
> > >
> > >
> > > > On Jun 12, 2024, at 8:49 AM, Anthony D'Atri <[email protected]> 
> > > > wrote:
> > > >
> > > > Do you mean patching the OS?
> > > >
> > > > If so, easy -- one node at a time, then after it comes back up, wait 
> > > > until all PGs are active+clean and the mon quorum is complete before 
> > > > proceeding.
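
The wait condition described above (all PGs active+clean and the mon quorum complete) could be checked along these lines. This is a sketch under assumptions: the input is taken to be the parsed output of `ceph status --format json`, with the fields `pgmap.num_pgs`, `pgmap.pgs_by_state`, `monmap.num_mons` and `quorum_names`; the function name `safe_to_proceed` is hypothetical.

```python
def safe_to_proceed(status: dict) -> bool:
    """Return True when every PG is active+clean and all mons are in quorum.

    `status` is assumed to be the parsed `ceph status --format json` output.
    """
    pgmap = status.get("pgmap", {})
    total_pgs = pgmap.get("num_pgs", 0)
    # Only the exact state "active+clean" counts here; variants such as
    # "active+clean+scrubbing" are treated as not yet clean in this sketch.
    clean_pgs = sum(
        entry["count"]
        for entry in pgmap.get("pgs_by_state", [])
        if entry["state_name"] == "active+clean"
    )
    total_mons = status.get("monmap", {}).get("num_mons", 0)
    mons_in_quorum = len(status.get("quorum_names", []))
    return (
        total_pgs > 0
        and clean_pgs == total_pgs
        and total_mons > 0
        and mons_in_quorum == total_mons
    )
```

A patch loop would poll this after each node comes back up and move on to the next node only once it returns True.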
> > > >
> > > >
> > > >
> > > >> On Jun 12, 2024, at 07:56, Michael Worsham 
> > > >> <[email protected]> wrote:
> > > >>
> > > > >> What is the proper way to patch a Ceph cluster and reboot the servers 
> > > > >> in said cluster if a reboot is necessary for said updates? And is it 
> > > > >> possible to automate it via Ansible?
> > > > >>
> > > > >> This message and its attachments are from Data Dimensions and are 
> > > > >> intended only for the use of the individual or entity to which it is 
> > > > >> addressed, and may contain information that is privileged, 
> > > > >> confidential, and exempt from disclosure under applicable law. If the 
> > > > >> reader of this message is not the intended recipient, or the employee 
> > > > >> or agent responsible for delivering the message to the intended 
> > > > >> recipient, you are hereby notified that any dissemination, 
> > > > >> distribution, or copying of this communication is strictly 
> > > > >> prohibited. If you have received this communication in error, please 
> > > > >> notify the sender immediately and permanently delete the original 
> > > > >> email and destroy any copies or printouts of this email as well as 
> > > > >> any attachments.
> > > >> _______________________________________________
> > > >> ceph-users mailing list -- [email protected]
> > > >> To unsubscribe send an email to [email protected]
