Yeah, it's getting better :) Have a look at the diff in my fork to see what could work: https://github.com/sameergithub5/prometheusrole/pull/1
It is still pretty raw, but it contains the idea.

On Sunday, November 19, 2023 at 8:54:59 UTC+1, Sameer Modak wrote:
> Thanks a lot, Zdenek.
>
> I got it now; I have taken your comments on board and converted this into something closer:
>
> https://github.com/sameergithub5/prometheusrole/tree/main/node_exporter_and_prometheus_jmx_exporter
>
> Can you please spot whether there is room for improvement?
>
> On Thursday, November 16, 2023 at 8:10:02 PM UTC+5:30, Zdenek Pyszko wrote:
>
>> Hello Sameer,
>> my two cents here, as I had a quick look at your repo.
>> I would suggest refactoring your repo to use roles.
>> You have three different playbooks referenced in main.yml, which are doing more or less the same job.
>> Create a role 'enable prometheus' that is dynamic enough to make decisions based on input variables (zookeeper, Kafka, ...),
>> and one tiny role to restart the services (if needed).
>> Outcome: a single playbook, one prometheus role, one service-management (restart) role, no repeated code (DRY: don't repeat yourself), re-usable.
>>
>> On Thursday, November 9, 2023 at 5:29:28 PM UTC+1, Sameer Modak wrote:
>>
>>> Hello Todd,
>>>
>>> I tried serial and it works, but my problem is that serial works at the playbook level, so when I write import_playbook inside include_task: zookeeper.yaml, it fails, saying you can't import a playbook inside a task.
>>> Now, how do I do it then?
>>>
>>> OK, so let me show you how I am running this. Basically, I have created a role "prometheus", which you can find in my personal public repo below. The role has its usual main.yml, which includes tasks, and I have created Restartandcheck.yml, which I am unable to use because of the import_playbook error if I put it in the zookeeper.yml file.
>>>
>>> https://github.com/sameergithub5/prometheusrole/tree/main/prometheus
>>>
>>> On Friday, November 3, 2023 at 9:00:13 PM UTC+5:30, Todd Lewis wrote:
>>>
>>>> That's correct; serial is not a task or block keyword. It's a playbook keyword.
>>>>
>>>>     - name: One host at a time
>>>>       hosts: ducks_in_a_row
>>>>       serial: 1
>>>>       max_fail_percentage: 0
>>>>       tasks:
>>>>         - task1
>>>>         - task2
>>>>         - task3
>>>>
>>>> Read up on serial
>>>> <https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-batch-size-with-serial>
>>>> and max_fail_percentage
>>>> <https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_error_handling.html#setting-a-maximum-failure-percentage>.
>>>> Blocks don't come into it.
>>>>
>>>> On 11/3/23 9:22 AM, Sameer Modak wrote:
>>>>
>>>> Hello Will,
>>>>
>>>> I tried to do it with block and serial; no, it does not work. It says a block can't have serial:
>>>>
>>>>     tasks:
>>>>       - name: block check
>>>>         block:
>>>>           - name: run this shell
>>>>             shell: 'systemctl restart "{{ zookeeper_service_name }}"'
>>>>
>>>>           - name: debug
>>>>             debug:
>>>>               msg: "running my task"
>>>>
>>>>           - name: now run this task
>>>>             shell: timeout -k 3 1m sh -c 'until nc -zv localhost {{ hostvars[inventory_hostname].zk_port }}; do sleep 1; done'
>>>>         when:
>>>>           - not zkmode is search('leader')
>>>>         serial: 1
>>>>
>>>> On Wednesday, November 1, 2023 at 3:39:54 PM UTC+5:30, Sameer Modak wrote:
>>>>
>>>>> Let me try with block and serial and get back to you.
>>>>>
>>>>> On Wednesday, November 1, 2023 at 5:33:14 AM UTC+5:30, Will McDonald wrote:
>>>>>
>>>>>> Edit: s/along with a failed_when/along with wait_for/
>>>>>>
>>>>>> On Tue, 31 Oct 2023 at 23:58, Will McDonald <[email protected]> wrote:
>>>>>>
>>>>>>> I don't entirely understand your approach, constraints or end-to-end requirements here, but trying to read between the lines...
>>>>>>>
>>>>>>> 1. You have a cluster of zookeeper nodes (presumably 2n+1, so 3, 5 or more nodes)
>>>>>>> 2.
>>>>>>> You want to do a rolling restart of these nodes one at a time, wait for each node to come back up, check it is functioning, and if that doesn't work, fail the run
>>>>>>> 3. With your existing approach you can limit the restart of a service using throttle at the task level, but then you don't know how to handle failure in a subsequent task
>>>>>>> 4. You don't think wait_for will work because you only throttle the restart task
>>>>>>>
>>>>>>> (Essentially you want your condition "has the service restarted successfully" to be in the task itself.)
>>>>>>>
>>>>>>> Again, some thoughts that might help you work through this...
>>>>>>>
>>>>>>> 1. Is there any reason you couldn't just use serial at the playbook level? If so, what is it?
>>>>>>> 2. If you must throttle rather than use serial, consider using it in a block along with a failed_when
>>>>>>> 3. Try to avoid using shell; use built-in modules like service instead. It'll save you pain in the long term.
>>>>>>>
>>>>>>> Read through the links I posted earlier and explain what might stop you from using the documented approach.
>>>>>>>
>>>>>>> This post from Vladimir on Super User might be useful too:
>>>>>>> https://superuser.com/questions/1664197/ansible-keyword-throttle
>>>>>>> (There are loads of other 2n+1 rolling update/restart examples out there too:
>>>>>>> https://stackoverflow.com/questions/62378317/ansible-rolling-restart-multi-cluster-environment)
>>>>>>>
>>>>>>> On Tue, 31 Oct 2023 at 17:54, Sameer Modak <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello Will,
>>>>>>>>
>>>>>>>> I have used throttle, so that part is sorted. But I don't think wait_for works here. For example:
>>>>>>>> task 1: restart <-- in this task it has already restarted all the hosts, one by one
>>>>>>>> task 2: wait_for <-- this will fail if the port does not come up, but that is no use, because the restart has already been triggered.
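[Editorial sketch, not part of the original thread] Will's option 2, written out under the assumptions in this thread: zookeeper_service_name, zk_port, and zkmode come from Sameer's snippets, while the 60-second timeout is invented. The caveat in the comment is essentially the problem Sameer describes in his reply:

```yaml
# Sketch of "throttle in a block with a per-host check". Caveat (and
# the reason serial at the play level is the cleaner fix): throttle
# limits concurrency per task, so under the default linear strategy
# the restart task still finishes on every host before wait_for
# starts -- the block does not interleave the two tasks per host.
- name: Restart zookeeper and verify the port
  when: not zkmode is search('leader')
  any_errors_fatal: true
  block:
    - name: Restart the zookeeper service
      ansible.builtin.service:
        name: "{{ zookeeper_service_name }}"
        state: restarted
      throttle: 1

    - name: Fail this host if the port never comes back
      ansible.builtin.wait_for:
        host: localhost
        port: "{{ hostvars[inventory_hostname].zk_port }}"
        timeout: 60          # assumed; tune to your startup time
      throttle: 1
```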
>>>>>>>>
>>>>>>>> We just want to know if, in one task, it can restart and check, and abort the play if that fails. That's it. We got the result we wanted, but we used the shell module.
>>>>>>>>
>>>>>>>> On Tuesday, October 31, 2023 at 7:53:31 PM UTC+5:30, Will McDonald wrote:
>>>>>>>>
>>>>>>>>> I'd suggest reading up on rolling updates using serial:
>>>>>>>>>
>>>>>>>>> https://docs.ansible.com/ansible/latest/playbook_guide/guide_rolling_upgrade.html#the-rolling-upgrade
>>>>>>>>> https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html#setting-the-batch-size-with-serial
>>>>>>>>>
>>>>>>>>> You can use wait_for or wait_for_connection to ensure service availability before continuing:
>>>>>>>>>
>>>>>>>>> https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_module.html
>>>>>>>>> https://docs.ansible.com/ansible/latest/collections/ansible/builtin/wait_for_connection_module.html
>>>>>>>>>
>>>>>>>>> On Tue, 31 Oct 2023 at 14:08, Sameer Modak <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Restart the service, then check whether the service is ready to accept connections, because it takes time to come up. Only once we are sure it is listening on the port do we move to the next host; otherwise, don't move on, because we can only afford to have one service down at a time.
>>>>>>>>>>
>>>>>>>>>> Is there any shorthand or Ansible-native way to handle this using an Ansible module?
>>>>>>>>>>
>>>>>>>>>> Code:
>>>>>>>>>>
>>>>>>>>>>     - name: Restart zookeeper followers
>>>>>>>>>>       throttle: 1
>>>>>>>>>>       any_errors_fatal: true
>>>>>>>>>>       shell: |
>>>>>>>>>>         systemctl restart {{ zookeeper_service_name }}
>>>>>>>>>>         timeout 22 sh -c 'until nc localhost {{ zookeeper_server_port }}; do sleep 1; done'
>>>>>>>>>>       when: not zkmode.stdout_lines is search('leader')
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> You received this message because you are subscribed to the Google Groups "Ansible Project" group.
>>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/67ca5f13-855d-4d40-a47a-c0fbe11ea3b5n%40googlegroups.com.
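[Editorial sketch, not part of the original thread] An Ansible-native equivalent of the shell task above, combining the advice from the thread: serial at the play level with the service managed by ansible.builtin.systemd and the port check done by wait_for. The variable names and the 22-second timeout come from the snippet above; the play name and the host group name are assumptions:

```yaml
# Rolling restart, one follower at a time. The play moves to the next
# host only after this host is listening again, and max_fail_percentage
# aborts the run as soon as any single host fails.
- name: Rolling restart of zookeeper followers
  hosts: zookeeper          # assumed inventory group name
  serial: 1
  max_fail_percentage: 0
  tasks:
    - name: Restart zookeeper
      ansible.builtin.systemd:
        name: "{{ zookeeper_service_name }}"
        state: restarted
      when: not zkmode.stdout_lines is search('leader')

    - name: Wait until zookeeper is accepting connections again
      ansible.builtin.wait_for:
        host: localhost
        port: "{{ zookeeper_server_port }}"
        timeout: 22          # matches the original shell timeout
      when: not zkmode.stdout_lines is search('leader')
```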
>>>>
>>>> --
>>>> Todd
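[Editorial sketch, not part of the original thread] The layout Zdenek recommends at the top of the thread (a single playbook, one dynamic prometheus role driven by an input variable, and one tiny service-restart role) could look roughly like this; every file, role, and variable name here is hypothetical:

```yaml
# site.yml -- single entry-point playbook (all names hypothetical)
- name: Enable prometheus exporters and restart services
  hosts: all
  vars:
    exporter_target: kafka    # or zookeeper, ... drives the role's decisions
  roles:
    - prometheus              # one dynamic role instead of three playbooks
    - service_restart         # tiny role that restarts services if needed
```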
