>>> "Ulrich Windl" <[email protected]> wrote on 20.07.2021 at 12:01 in message <[email protected]>:
> Hi!
>
> In the commands traced, no command (that is, the monitor) took more than 3
> seconds, so either that is *not* the timeout, or pacemaker got significantly
> delayed.
> One reason I could imagine is a "read stall". For example, you could trigger
> one if you rapidly fill your block cache with dirty blocks (to be written)
> while some read request has to wait for buffers (to be written out, and thus
> become available). If you are writing like mad, available read buffers might
> be rare.
> Fortunately you can tune the kernel to limit the amount of dirty buffers.
> I'm not saying that is your problem, but the trace looks OK.
>
> Regards,
> Ulrich
>
> >>> PASERO Florent <[email protected]> wrote on 20.07.2021 at 11:51 in message
> <pr0p264mb2139d47d0acf66a81f3dcd37b4...@pr0p264mb2139.frap264.prod.outlook.coM>:
>
>> Hi,
>>
>> Once or twice a week, we have a 'Timed out' on our VIP.
>>
>> The latest:
>> Cluster Summary:
>>   * Stack: corosync
>>   * Current DC: server07 (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
>>   * Last updated: Tue Jul 20 11:39:22 2021
>>   * Last change: Mon Jul 5 09:42:14 2021 by hacluster via cibadmin on server06
>>   * 2 nodes configured
>>   * 2 resource instances configured
>>
>> Node List:
>>   * Online: [ server06 server07 ]
>>
>> Active Resources:
>>   * Resource Group: zbx_prod_Web_Core:
>>     * VIP (ocf::heartbeat:IPaddr2): Started server07
>>     * ZabbixServer (systemd:zabbix-server): Started server07
>>
>> Failed Resource Actions:
>>   * VIP_monitor_10000 on server07 'error' (1): call=123, status='Timed Out', exitreason='', last-rc-change='2021-07-19 15:02:27 +02:00', queued=0ms, exec=0ms
>>
>> Any idea? Nothing very revealing shows up in the following log files.
>>
>> Here are the monitor trace files from just before and just after the timeout.
>> >> VIP.monitor.2021-07-19.15:01:27 : >> +++ 15:01:27: ocf_start_trace:999: echo >> +++ 15:01:27: ocf_start_trace:999: printenv >> +++ 15:01:27: ocf_start_trace:999: sort >> ++ 15:01:27: ocf_start_trace:999: env=' >> HA_LOGFACILITY=daemon >> HA_LOGFILE=/var/log/pacemaker/pacemaker.log >> HA_cluster_type=corosync >> HA_debug=0 >> HA_logfacility=daemon >> HA_logfile=/var/log/pacemaker/pacemaker.log >> HA_mcp=true >> HA_quorum_type=corosync >> INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4 >> JOURNAL_STREAM=9:4433035 >> LC_ALL=C >> OCF_EXIT_REASON_PREFIX=ocf-exit-reason: >> OCF_RA_VERSION_MAJOR=1 >> OCF_RA_VERSION_MINOR=0 >> OCF_RESKEY_CRM_meta_interval=10000 >> OCF_RESKEY_CRM_meta_name=monitor >> OCF_RESKEY_CRM_meta_on_node=server07 >> OCF_RESKEY_CRM_meta_on_node_uuid=2 >> OCF_RESKEY_CRM_meta_timeout=20000 >> OCF_RESKEY_crm_feature_set=3.7.1 >> OCF_RESKEY_ip=10.0.0.67 >> OCF_RESKEY_monitor_retries=10 >> OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core >> OCF_RESKEY_trace_ra=1 >> OCF_RESOURCE_INSTANCE=VIP >> OCF_RESOURCE_PROVIDER=heartbeat >> OCF_RESOURCE_TYPE=IPaddr2 >> OCF_ROOT=/usr/lib/ocf >> > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin: >> /usr/bin:/usr/ucb >> PCMK_cluster_type=corosync >> PCMK_debug=0 >> PCMK_logfacility=daemon >> PCMK_logfile=/var/log/pacemaker/pacemaker.log >> PCMK_mcp=true >> PCMK_quorum_type=corosync >> PCMK_service=pacemaker-execd >> PCMK_watchdog=false >> PWD=/var/lib/pacemaker/cores >> SHLVL=1 >> VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no >> --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p >> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions >> --gen-suppressions=all >> _=/usr/bin/printenv >> > __OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15 >> :01:27 >> __OCF_TRC_MANAGE=1' >> ++ 15:01:27: source:1053: ocf_is_true '' >> ++ 15:01:27: ocf_is_true:103: case "$1" in >> ++ 15:01:27: ocf_is_true:103: case "$1" in >> ++ 15:01:27: 
ocf_is_true:105: false >> + 15:01:27: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh >> + 15:01:27: main:72: OCF_RESKEY_lvs_support_default=false >> + 15:01:27: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false >> + 15:01:27: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99 >> + 15:01:27: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport >> + 15:01:27: main:76: OCF_RESKEY_unique_clone_address_default=false >> + 15:01:27: main:77: OCF_RESKEY_arp_interval_default=200 >> + 15:01:27: main:78: OCF_RESKEY_arp_count_default=5 >> + 15:01:27: main:79: OCF_RESKEY_arp_count_refresh_default=0 >> + 15:01:27: main:80: OCF_RESKEY_arp_bg_default=true >> + 15:01:27: main:81: OCF_RESKEY_run_arping_default=false >> + 15:01:27: main:82: OCF_RESKEY_noprefixroute_default=false >> + 15:01:27: main:83: OCF_RESKEY_preferred_lft_default=forever >> + 15:01:27: main:84: OCF_RESKEY_monitor_retries=1 >> + 15:01:27: main:86: : false >> + 15:01:27: main:87: : false >> + 15:01:27: main:88: : 99 >> + 15:01:27: main:89: : sourceip-sourceport >> + 15:01:27: main:90: : false >> + 15:01:27: main:91: : 200 >> + 15:01:27: main:92: : 5 >> + 15:01:27: main:93: : 0 >> + 15:01:27: main:94: : true >> + 15:01:27: main:95: : false >> + 15:01:27: main:96: : false >> + 15:01:27: main:97: : forever >> + 15:01:27: main:98: : 1 >> + 15:01:27: main:101: SENDARP=/usr/libexec/heartbeat/send_arp >> + 15:01:27: main:102: SENDUA=/usr/libexec/heartbeat/send_ua >> + 15:01:27: main:103: FINDIF=findif >> + 15:01:27: main:104: VLDIR=/run/resource-agents >> + 15:01:27: main:105: SENDARPPIDDIR=/run/resource-agents >> + 15:01:27: main:106: >> CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67 >> + 15:01:27: main:108: IPADDR2_CIP_IPTABLES=iptables >> + 15:01:27: main:1261: ocf_is_true false >> + 15:01:27: ocf_is_true:103: case "$1" in >> + 15:01:27: ocf_is_true:105: false >> + 15:01:27: main:1268: case $__OCF_ACTION in >> + 15:01:27: main:1276: ip_validate >> + 15:01:27: ip_validate:1184: check_binary 
ip >> + 15:01:27: check_binary:57: have_binary ip >> + 15:01:27: have_binary:69: '[' '' = 1 ']' >> ++ 15:01:27: have_binary:72: echo ip >> ++ 15:01:27: have_binary:72: sed -e 's/ -.*//' >> + 15:01:27: have_binary:72: local bin=ip >> ++ 15:01:27: have_binary:73: which ip >> + 15:01:27: have_binary:73: test -x /usr/sbin/ip >> + 15:01:27: ip_validate:1185: IP_CIP= >> + 15:01:27: ip_validate:1187: ip_init >> + 15:01:27: ip_init:423: local rc >> ++ 15:01:27: ip_init:425: uname -s >> + 15:01:27: ip_init:425: '[' XLinux '!=' XLinux ']' >> + 15:01:27: ip_init:430: '[' X10.0.0.67 = X ']' >> + 15:01:27: ip_init:436: case $__OCF_ACTION in >> + 15:01:27: ip_init:438: true >> + 15:01:27: ip_init:441: : 'YAY!' >> + 15:01:27: ip_init:447: BASEIP=10.0.0.67 >> + 15:01:27: ip_init:448: BRDCAST= >> + 15:01:27: ip_init:449: NIC= >> + 15:01:27: ip_init:453: '[' '!' -z '' -a -z '' ']' >> + 15:01:27: ip_init:458: NETMASK= >> + 15:01:27: ip_init:459: IFLABEL= >> + 15:01:27: ip_init:460: IF_MAC= >> + 15:01:27: ip_init:462: IP_INC_GLOBAL=1 >> ++ 15:01:27: ip_init:463: expr 0 + 1 >> + 15:01:27: ip_init:463: IP_INC_NO=1 >> + 15:01:27: ip_init:465: ocf_is_true false >> + 15:01:27: ocf_is_true:103: case "$1" in >> + 15:01:27: ocf_is_true:105: false >> + 15:01:27: ip_init:470: ocf_is_decimal 1 >> + 15:01:27: ocf_is_decimal:94: case "$1" in >> + 15:01:27: ocf_is_decimal:98: true >> + 15:01:27: ip_init:470: '[' 1 -gt 0 ']' >> + 15:01:27: ip_init:471: : >> + 15:01:27: ip_init:477: echo 10.0.0.67 >> + 15:01:27: ip_init:477: grep -qs : >> + 15:01:27: ip_init:478: '[' 1 -ne 0 ']' >> + 15:01:27: ip_init:479: FAMILY=inet >> + 15:01:27: ip_init:480: ocf_is_true false >> + 15:01:27: ocf_is_true:103: case "$1" in >> + 15:01:27: ocf_is_true:105: false >> + 15:01:27: ip_init:507: case $NIC in >> + 15:01:27: ip_init:507: case $NIC in >> ++ 15:01:27: ip_init:518: findif >> ++ 15:01:27: findif:197: local match=10.0.0.67 >> ++ 15:01:27: findif:198: local family >> ++ 15:01:27: findif:199: local scope >> ++ 
15:01:27: findif:200: local nic= >> ++ 15:01:27: findif:201: local netmask= >> ++ 15:01:27: findif:202: local brdcast= >> ++ 15:01:27: findif:204: echo 10.0.0.67 >> ++ 15:01:27: findif:204: grep -qs : >> ++ 15:01:27: findif:205: '[' 1 = 0 ']' >> ++ 15:01:27: findif:208: family=inet >> ++ 15:01:27: findif:209: scope='scope link' >> ++ 15:01:27: findif:211: findif_check_params inet >> ++ 15:01:27: findif_check_params:123: local family=inet >> ++ 15:01:27: findif_check_params:124: local match=10.0.0.67 >> ++ 15:01:27: findif_check_params:125: local nic= >> ++ 15:01:27: findif_check_params:127: netmask= >> ++ 15:01:27: findif_check_params:128: local brdcast= >> ++ 15:01:27: findif_check_params:129: local errmsg >> ++ 15:01:27: findif_check_params:131: maybe_convert_dotted_quad_to_cidr >> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in >> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:55: case $netmask in >> ++ 15:01:27: maybe_convert_dotted_quad_to_cidr:68: return >> ++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in >> ++ 15:01:27: findif_check_params:135: case $__OCF_ACTION in >> ++ 15:01:27: findif_check_params:137: return 0 >> ++ 15:01:27: findif:213: '[' -n '' ']' >> ++ 15:01:27: findif:216: '[' -n '' ']' >> +++ 15:01:27: findif:220: ip -o -f inet route list match 10.0.0.67 scope >> link >> +++ 15:01:27: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/", "", > >> mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print >> best_ln}' >> ++ 15:01:27: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src >> 10.0.0.66 metric 350 >> ++ 15:01:27: findif:222: '[' 9 = 0 ']' >> ++ 15:01:27: findif:229: '[' -z '' -o -z '' ']' >> ++ 15:01:27: findif:230: '[' 9 = 0 ']' >> ++ 15:01:27: findif:234: case $1 in >> ++ 15:01:27: findif:234: case $1 in >> ++ 15:01:27: findif:235: : OK >> ++ 15:01:27: findif:243: '[' -z '' ']' >> ++ 15:01:27: findif:243: nic=team0 >> ++ 15:01:27: findif:244: '[' -z '' ']' >> ++ 15:01:27: 
findif:244: netmask=24 >> ++ 15:01:27: findif:245: '[' inet = inet ']' >> ++ 15:01:27: findif:246: '[' -z '' ']' >> ++ 15:01:27: findif:247: '[' -n 10.0.0.66 ']' >> +++ 15:01:27: findif:248: ip -o -f inet addr show >> +++ 15:01:27: findif:248: grep 10.0.0.66 >> ++ 15:01:27: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 >> scope global noprefixroute 'team0\' valid_lft forever preferred_lft forever >> ++ 15:01:27: findif:249: '[' brd = brd ']' >> ++ 15:01:27: findif:249: brdcast=10.0.0.255 >> ++ 15:01:27: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255' >> ++ 15:01:27: findif:259: return 0 >> + 15:01:27: ip_init:507: case $NIC in >> + 15:01:27: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255' >> + 15:01:27: ip_init:519: rc=0 >> + 15:01:27: ip_init:521: '[' 0 -eq 0 ']' >> ++ 15:01:27: ip_init:523: echo 'team0 netmask 24 broadcast 10.0.0.255' >> ++ 15:01:27: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //' >> + 15:01:27: ip_init:523: NICINFO='team0 24 10.0.0.255' >> ++ 15:01:27: ip_init:524: echo 'team0 24 10.0.0.255' >> ++ 15:01:27: ip_init:524: cut '-d ' -f1 >> + 15:01:27: ip_init:524: NIC=team0 >> ++ 15:01:27: ip_init:525: echo 'team0 24 10.0.0.255' >> ++ 15:01:27: ip_init:525: cut '-d ' -f2 >> + 15:01:27: ip_init:525: NETMASK=24 >> ++ 15:01:27: ip_init:526: echo 'team0 24 10.0.0.255' >> ++ 15:01:27: ip_init:526: cut '-d ' -f3 >> + 15:01:27: ip_init:526: BRDCAST=10.0.0.255 >> + 15:01:27: ip_init:541: >> SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67 >> + 15:01:27: ip_init:543: '[' -n '' ']' >> + 15:01:27: ip_init:551: '[' 1 -gt 1 ']' >> + 15:01:27: ip_validate:1189: set_send_arp_program >> + 15:01:27: set_send_arp_program:1149: ARP_SENDER=send_arp >> + 15:01:27: set_send_arp_program:1150: '[' -n '' ']' >> + 15:01:27: set_send_arp_program:1171: is_infiniband >> + 15:01:27: is_infiniband:767: grep link/infiniband >> + 15:01:27: is_infiniband:767: ip link show team0 >> + 15:01:27: ip_validate:1191: '[' -n '' ']' >> + 
15:01:27: ip_validate:1202: ocf_is_true false >> + 15:01:27: ocf_is_true:103: case "$1" in >> + 15:01:27: ocf_is_true:105: false >> + 15:01:27: ip_validate:1208: ocf_is_decimal 200 >> + 15:01:27: ocf_is_decimal:94: case "$1" in >> + 15:01:27: ocf_is_decimal:98: true >> + 15:01:27: ip_validate:1208: '[' 200 -gt 0 ']' >> + 15:01:27: ip_validate:1209: : >> + 15:01:27: ip_validate:1215: ocf_is_decimal 5 >> + 15:01:27: ocf_is_decimal:94: case "$1" in >> + 15:01:27: ocf_is_decimal:98: true >> + 15:01:28: ip_validate:1215: '[' 5 -gt 0 ']' >> + 15:01:28: ip_validate:1216: : >> + 15:01:28: ip_validate:1222: '[' -z forever ']' >> + 15:01:28: ip_validate:1227: '[' -n '' ']' >> + 15:01:28: main:1278: case $__OCF_ACTION in >> + 15:01:28: main:1292: ip_monitor >> ++ 15:01:28: ip_monitor:1131: ip_served >> ++ 15:01:28: ip_served:925: '[' -z team0 ']' >> +++ 15:01:28: ip_served:930: find_interface 10.0.0.67 24 >> +++ 15:01:28: find_interface:579: local ipaddr=10.0.0.67 >> +++ 15:01:28: find_interface:580: local netmask=24 >> +++ 15:01:28: find_interface:581: local iface= >> ++++ 15:01:28: find_interface:586: seq 1 1 >> +++ 15:01:28: find_interface:586: for i in $(seq 1 >> $OCF_RESKEY_monitor_retries) >> ++++ 15:01:28: find_interface:590: ip -o -f inet addr show >> ++++ 15:01:28: find_interface:590: cut -d ' ' -f2 >> ++++ 15:01:28: find_interface:590: grep '\ 10.0.0.67/24' >> ++++ 15:01:28: find_interface:590: grep -v '^ipsec[0-9][0-9]*$' >> +++ 15:01:28: find_interface:590: iface=team0 >> +++ 15:01:28: find_interface:592: '[' -n team0 ']' >> +++ 15:01:28: find_interface:593: break >> +++ 15:01:28: find_interface:601: echo team0 >> +++ 15:01:28: find_interface:602: return 0 >> ++ 15:01:28: ip_served:930: cur_nic=team0 >> ++ 15:01:28: ip_served:932: '[' -z team0 ']' >> ++ 15:01:28: ip_served:937: '[' -z '' ']' >> ++ 15:01:28: ip_served:938: for i in $cur_nic >> ++ 15:01:28: ip_served:940: '[' team0 = team0 ']' >> ++ 15:01:28: ip_served:941: echo ok >> ++ 15:01:28: ip_served:942: 
return 0 >> + 15:01:28: ip_monitor:1131: local ip_status=ok >> + 15:01:28: ip_monitor:1132: case $ip_status in >> + 15:01:28: ip_monitor:1134: run_arp_sender refresh >> + 15:01:28: run_arp_sender:844: '[' xrefresh = xrefresh ']' >> + 15:01:28: run_arp_sender:845: ARP_COUNT=0 >> + 15:01:28: run_arp_sender:846: LOGLEVEL=debug >> + 15:01:28: run_arp_sender:851: '[' 0 -eq 0 ']' >> + 15:01:28: run_arp_sender:852: return >> + 15:01:28: ip_monitor:1135: return 0 >> >> VIP.monitor.2021-07-19.15:03:14 : >> +++ 15:03:14: ocf_start_trace:999: echo >> +++ 15:03:14: ocf_start_trace:999: printenv >> +++ 15:03:14: ocf_start_trace:999: sort >> ++ 15:03:14: ocf_start_trace:999: env=' >> HA_LOGFACILITY=daemon >> HA_LOGFILE=/var/log/pacemaker/pacemaker.log >> HA_cluster_type=corosync >> HA_debug=0 >> HA_logfacility=daemon >> HA_logfile=/var/log/pacemaker/pacemaker.log >> HA_mcp=true >> HA_quorum_type=corosync >> INVOCATION_ID=5cd03e610fbf4a9bb3ffe2b30e1fb5d4 >> JOURNAL_STREAM=9:4433035 >> LC_ALL=C >> OCF_EXIT_REASON_PREFIX=ocf-exit-reason: >> OCF_RA_VERSION_MAJOR=1 >> OCF_RA_VERSION_MINOR=0 >> OCF_RESKEY_CRM_meta_interval=10000 >> OCF_RESKEY_CRM_meta_name=monitor >> OCF_RESKEY_CRM_meta_on_node=server07 >> OCF_RESKEY_CRM_meta_on_node_uuid=2 >> OCF_RESKEY_CRM_meta_timeout=20000 >> OCF_RESKEY_crm_feature_set=3.7.1 >> OCF_RESKEY_ip=10.0.0.67 >> OCF_RESKEY_monitor_retries=10 >> OCF_RESKEY_trace_file=/apps/Zabbix_Log/Core >> OCF_RESKEY_trace_ra=1 >> OCF_RESOURCE_INSTANCE=VIP >> OCF_RESOURCE_PROVIDER=heartbeat >> OCF_RESOURCE_TYPE=IPaddr2 >> OCF_ROOT=/usr/lib/ocf >> > PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin: >> /usr/bin:/usr/ucb >> PCMK_cluster_type=corosync >> PCMK_debug=0 >> PCMK_logfacility=daemon >> PCMK_logfile=/var/log/pacemaker/pacemaker.log >> PCMK_mcp=true >> PCMK_quorum_type=corosync >> PCMK_service=pacemaker-execd >> PCMK_watchdog=false >> PWD=/var/lib/pacemaker/cores >> SHLVL=1 >> VALGRIND_OPTS=--leak-check=full --trace-children=no --vgdb=no 
>> --num-callers=25 --log-file=/var/lib/pacemaker/valgrind-%p >> --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions >> --gen-suppressions=all >> _=/usr/bin/printenv >> > __OCF_TRC_DEST=/var/lib/heartbeat/trace_ra/IPaddr2/VIP.monitor.2021-07-19.15 >> :03:14 >> __OCF_TRC_MANAGE=1' >> ++ 15:03:14: source:1053: ocf_is_true '' >> ++ 15:03:14: ocf_is_true:103: case "$1" in >> ++ 15:03:14: ocf_is_true:103: case "$1" in >> ++ 15:03:14: ocf_is_true:105: false >> + 15:03:14: main:69: . /usr/lib/ocf/lib/heartbeat/findif.sh >> + 15:03:14: main:72: OCF_RESKEY_lvs_support_default=false >> + 15:03:14: main:73: OCF_RESKEY_lvs_ipv6_addrlabel_default=false >> + 15:03:14: main:74: OCF_RESKEY_lvs_ipv6_addrlabel_value_default=99 >> + 15:03:14: main:75: OCF_RESKEY_clusterip_hash_default=sourceip-sourceport >> + 15:03:14: main:76: OCF_RESKEY_unique_clone_address_default=false >> + 15:03:14: main:77: OCF_RESKEY_arp_interval_default=200 >> + 15:03:14: main:78: OCF_RESKEY_arp_count_default=5 >> + 15:03:14: main:79: OCF_RESKEY_arp_count_refresh_default=0 >> + 15:03:14: main:80: OCF_RESKEY_arp_bg_default=true >> + 15:03:14: main:81: OCF_RESKEY_run_arping_default=false >> + 15:03:14: main:82: OCF_RESKEY_noprefixroute_default=false >> + 15:03:14: main:83: OCF_RESKEY_preferred_lft_default=forever >> + 15:03:14: main:84: OCF_RESKEY_monitor_retries=1 >> + 15:03:14: main:86: : false >> + 15:03:14: main:87: : false >> + 15:03:14: main:88: : 99 >> + 15:03:14: main:89: : sourceip-sourceport >> + 15:03:14: main:90: : false >> + 15:03:14: main:91: : 200 >> + 15:03:14: main:92: : 5 >> + 15:03:14: main:93: : 0 >> + 15:03:14: main:94: : true >> + 15:03:14: main:95: : false >> + 15:03:14: main:96: : false >> + 15:03:14: main:97: : forever >> + 15:03:14: main:98: : 1 >> + 15:03:14: main:101: SENDARP=/usr/libexec/heartbeat/send_arp >> + 15:03:14: main:102: SENDUA=/usr/libexec/heartbeat/send_ua >> + 15:03:14: main:103: FINDIF=findif >> + 15:03:14: main:104: VLDIR=/run/resource-agents >> + 
15:03:14: main:105: SENDARPPIDDIR=/run/resource-agents >> + 15:03:14: main:106: >> CIP_lockfile=/run/resource-agents/IPaddr2-CIP-10.0.0.67 >> + 15:03:14: main:108: IPADDR2_CIP_IPTABLES=iptables >> + 15:03:14: main:1261: ocf_is_true false >> + 15:03:14: ocf_is_true:103: case "$1" in >> + 15:03:14: ocf_is_true:105: false >> + 15:03:14: main:1268: case $__OCF_ACTION in >> + 15:03:14: main:1276: ip_validate >> + 15:03:14: ip_validate:1184: check_binary ip >> + 15:03:14: check_binary:57: have_binary ip >> + 15:03:14: have_binary:69: '[' '' = 1 ']' >> ++ 15:03:14: have_binary:72: echo ip >> ++ 15:03:14: have_binary:72: sed -e 's/ -.*//' >> + 15:03:14: have_binary:72: local bin=ip >> ++ 15:03:14: have_binary:73: which ip >> + 15:03:14: have_binary:73: test -x /usr/sbin/ip >> + 15:03:14: ip_validate:1185: IP_CIP= >> + 15:03:14: ip_validate:1187: ip_init >> + 15:03:14: ip_init:423: local rc >> ++ 15:03:14: ip_init:425: uname -s >> + 15:03:14: ip_init:425: '[' XLinux '!=' XLinux ']' >> + 15:03:14: ip_init:430: '[' X10.0.0.67 = X ']' >> + 15:03:14: ip_init:436: case $__OCF_ACTION in >> + 15:03:14: ip_init:438: true >> + 15:03:14: ip_init:441: : 'YAY!' >> + 15:03:14: ip_init:447: BASEIP=10.0.0.67 >> + 15:03:14: ip_init:448: BRDCAST= >> + 15:03:14: ip_init:449: NIC= >> + 15:03:14: ip_init:453: '[' '!' 
-z '' -a -z '' ']' >> + 15:03:14: ip_init:458: NETMASK= >> + 15:03:14: ip_init:459: IFLABEL= >> + 15:03:14: ip_init:460: IF_MAC= >> + 15:03:14: ip_init:462: IP_INC_GLOBAL=1 >> ++ 15:03:14: ip_init:463: expr 0 + 1 >> + 15:03:14: ip_init:463: IP_INC_NO=1 >> + 15:03:14: ip_init:465: ocf_is_true false >> + 15:03:14: ocf_is_true:103: case "$1" in >> + 15:03:14: ocf_is_true:105: false >> + 15:03:14: ip_init:470: ocf_is_decimal 1 >> + 15:03:14: ocf_is_decimal:94: case "$1" in >> + 15:03:14: ocf_is_decimal:98: true >> + 15:03:14: ip_init:470: '[' 1 -gt 0 ']' >> + 15:03:14: ip_init:471: : >> + 15:03:14: ip_init:477: echo 10.0.0.67 >> + 15:03:14: ip_init:477: grep -qs : >> + 15:03:14: ip_init:478: '[' 1 -ne 0 ']' >> + 15:03:14: ip_init:479: FAMILY=inet >> + 15:03:14: ip_init:480: ocf_is_true false >> + 15:03:14: ocf_is_true:103: case "$1" in >> + 15:03:14: ocf_is_true:105: false >> + 15:03:14: ip_init:507: case $NIC in >> + 15:03:14: ip_init:507: case $NIC in >> ++ 15:03:14: ip_init:518: findif >> ++ 15:03:14: findif:197: local match=10.0.0.67 >> ++ 15:03:14: findif:198: local family >> ++ 15:03:14: findif:199: local scope >> ++ 15:03:14: findif:200: local nic= >> ++ 15:03:14: findif:201: local netmask= >> ++ 15:03:14: findif:202: local brdcast= >> ++ 15:03:14: findif:204: echo 10.0.0.67 >> ++ 15:03:14: findif:204: grep -qs : >> ++ 15:03:14: findif:205: '[' 1 = 0 ']' >> ++ 15:03:14: findif:208: family=inet >> ++ 15:03:14: findif:209: scope='scope link' >> ++ 15:03:14: findif:211: findif_check_params inet >> ++ 15:03:14: findif_check_params:123: local family=inet >> ++ 15:03:14: findif_check_params:124: local match=10.0.0.67 >> ++ 15:03:14: findif_check_params:125: local nic= >> ++ 15:03:14: findif_check_params:127: netmask= >> ++ 15:03:14: findif_check_params:128: local brdcast= >> ++ 15:03:14: findif_check_params:129: local errmsg >> ++ 15:03:14: findif_check_params:131: maybe_convert_dotted_quad_to_cidr >> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in 
>> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:55: case $netmask in >> ++ 15:03:14: maybe_convert_dotted_quad_to_cidr:68: return >> ++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in >> ++ 15:03:14: findif_check_params:135: case $__OCF_ACTION in >> ++ 15:03:14: findif_check_params:137: return 0 >> ++ 15:03:14: findif:213: '[' -n '' ']' >> ++ 15:03:14: findif:216: '[' -n '' ']' >> +++ 15:03:14: findif:220: ip -o -f inet route list match 10.0.0.67 scope >> link >> +++ 15:03:14: findif:220: awk 'BEGIN{best=0} /\// { mask=$1; sub(".*/", "", > >> mask); if( int(mask)>=best ) { best=int(mask); best_ln=$0; } } END{print >> best_ln}' >> ++ 15:03:14: findif:220: set -- 10.0.0.0/24 dev team0 proto kernel src >> 10.0.0.66 metric 350 >> ++ 15:03:14: findif:222: '[' 9 = 0 ']' >> ++ 15:03:14: findif:229: '[' -z '' -o -z '' ']' >> ++ 15:03:14: findif:230: '[' 9 = 0 ']' >> ++ 15:03:14: findif:234: case $1 in >> ++ 15:03:14: findif:234: case $1 in >> ++ 15:03:14: findif:235: : OK >> ++ 15:03:14: findif:243: '[' -z '' ']' >> ++ 15:03:14: findif:243: nic=team0 >> ++ 15:03:14: findif:244: '[' -z '' ']' >> ++ 15:03:14: findif:244: netmask=24 >> ++ 15:03:14: findif:245: '[' inet = inet ']' >> ++ 15:03:14: findif:246: '[' -z '' ']' >> ++ 15:03:14: findif:247: '[' -n 10.0.0.66 ']' >> +++ 15:03:14: findif:248: ip -o -f inet addr show >> +++ 15:03:14: findif:248: grep 10.0.0.66 >> ++ 15:03:14: findif:248: set -- 5: team0 inet 10.0.0.66/24 brd 10.0.0.255 >> scope global noprefixroute 'team0\' valid_lft forever preferred_lft forever >> ++ 15:03:14: findif:249: '[' brd = brd ']' >> ++ 15:03:14: findif:249: brdcast=10.0.0.255 >> ++ 15:03:14: findif:258: echo 'team0 netmask 24 broadcast 10.0.0.255' >> ++ 15:03:14: findif:259: return 0 >> + 15:03:14: ip_init:507: case $NIC in >> + 15:03:14: ip_init:518: NICINFO='team0 netmask 24 broadcast 10.0.0.255' >> + 15:03:14: ip_init:519: rc=0 >> + 15:03:14: ip_init:521: '[' 0 -eq 0 ']' >> ++ 15:03:14: ip_init:523: echo 'team0 netmask 24 
broadcast 10.0.0.255' >> ++ 15:03:14: ip_init:523: sed -e 's/netmask\ //;s/broadcast\ //' >> + 15:03:14: ip_init:523: NICINFO='team0 24 10.0.0.255' >> ++ 15:03:14: ip_init:524: echo 'team0 24 10.0.0.255' >> ++ 15:03:14: ip_init:524: cut '-d ' -f1 >> + 15:03:14: ip_init:524: NIC=team0 >> ++ 15:03:14: ip_init:525: echo 'team0 24 10.0.0.255' >> ++ 15:03:14: ip_init:525: cut '-d ' -f2 >> + 15:03:14: ip_init:525: NETMASK=24 >> ++ 15:03:14: ip_init:526: echo 'team0 24 10.0.0.255' >> ++ 15:03:14: ip_init:526: cut '-d ' -f3 >> + 15:03:14: ip_init:526: BRDCAST=10.0.0.255 >> + 15:03:14: ip_init:541: >> SENDARPPIDFILE=/run/resource-agents/send_arp-10.0.0.67 >> + 15:03:14: ip_init:543: '[' -n '' ']' >> + 15:03:14: ip_init:551: '[' 1 -gt 1 ']' >> + 15:03:14: ip_validate:1189: set_send_arp_program >> + 15:03:14: set_send_arp_program:1149: ARP_SENDER=send_arp >> + 15:03:14: set_send_arp_program:1150: '[' -n '' ']' >> + 15:03:14: set_send_arp_program:1171: is_infiniband >> + 15:03:14: is_infiniband:767: ip link show team0 >> + 15:03:14: is_infiniband:767: grep link/infiniband >> + 15:03:14: ip_validate:1191: '[' -n '' ']' >> + 15:03:14: ip_validate:1202: ocf_is_true false >> + 15:03:14: ocf_is_true:103: case "$1" in >> + 15:03:14: ocf_is_true:105: false >> + 15:03:14: ip_validate:1208: ocf_is_decimal 200 >> + 15:03:14: ocf_is_decimal:94: case "$1" in >> + 15:03:14: ocf_is_decimal:98: true >> + 15:03:14: ip_validate:1208: '[' 200 -gt 0 ']' >> + 15:03:14: ip_validate:1209: : >> + 15:03:14: ip_validate:1215: ocf_is_decimal 5 >> + 15:03:14: ocf_is_decimal:94: case "$1" in >> + 15:03:14: ocf_is_decimal:98: true >> + 15:03:14: ip_validate:1215: '[' 5 -gt 0 ']' >> + 15:03:14: ip_validate:1216: : >> + 15:03:14: ip_validate:1222: '[' -z forever ']' >> + 15:03:14: ip_validate:1227: '[' -n '' ']' >> + 15:03:14: main:1278: case $__OCF_ACTION in >> + 15:03:14: main:1292: ip_monitor >> ++ 15:03:14: ip_monitor:1131: ip_served >> ++ 15:03:14: ip_served:925: '[' -z team0 ']' >> +++ 15:03:14: 
ip_served:930: find_interface 10.0.0.67 24 >> +++ 15:03:14: find_interface:579: local ipaddr=10.0.0.67 >> +++ 15:03:14: find_interface:580: local netmask=24 >> +++ 15:03:14: find_interface:581: local iface= >> ++++ 15:03:14: find_interface:586: seq 1 1 >> +++ 15:03:14: find_interface:586: for i in $(seq 1 >> $OCF_RESKEY_monitor_retries) >> ++++ 15:03:14: find_interface:590: ip -o -f inet addr show >> ++++ 15:03:14: find_interface:590: grep '\ 10.0.0.67/24' >> ++++ 15:03:14: find_interface:590: cut -d ' ' -f2 >> ++++ 15:03:14: find_interface:590: grep -v '^ipsec[0-9][0-9]*$' >> +++ 15:03:14: find_interface:590: iface=team0 >> +++ 15:03:14: find_interface:592: '[' -n team0 ']' >> +++ 15:03:14: find_interface:593: break >> +++ 15:03:14: find_interface:601: echo team0 >> +++ 15:03:14: find_interface:602: return 0 >> ++ 15:03:14: ip_served:930: cur_nic=team0 >> ++ 15:03:14: ip_served:932: '[' -z team0 ']' >> ++ 15:03:14: ip_served:937: '[' -z '' ']' >> ++ 15:03:14: ip_served:938: for i in $cur_nic >> ++ 15:03:14: ip_served:940: '[' team0 = team0 ']' >> ++ 15:03:14: ip_served:941: echo ok >> ++ 15:03:14: ip_served:942: return 0 >> + 15:03:14: ip_monitor:1131: local ip_status=ok >> + 15:03:14: ip_monitor:1132: case $ip_status in >> + 15:03:14: ip_monitor:1134: run_arp_sender refresh >> + 15:03:14: run_arp_sender:844: '[' xrefresh = xrefresh ']' >> + 15:03:14: run_arp_sender:845: ARP_COUNT=0 >> + 15:03:14: run_arp_sender:846: LOGLEVEL=debug >> + 15:03:14: run_arp_sender:851: '[' 0 -eq 0 ']' >> + 15:03:14: run_arp_sender:852: return >> + 15:03:14: ip_monitor:1135: return 0 >> >> Best regards, >> >> Florent >> >> De : Users > <[email protected]<mailto:[email protected]>> > >> De la part de Klaus Wenninger >> Envoyé : lundi 5 juillet 2021 09:14 >> À : Cluster Labs - All topics related to open-source clustering welcomed >> <[email protected]<mailto:[email protected]>> >> Objet : Re: [ClusterLabs] Antw: [EXT] VIP monitor Timed Out >> >> Using DHCP? 
>> Maybe a glitch/issue during renewal ... but elaborate monitoring
>> as suggested should show that ...
>>
>> On Mon, Jul 5, 2021 at 9:03 AM Ulrich Windl <[email protected]> wrote:
>> Hi!
>>
>> See "ip_served" and "find_interface" (essentially "$IP2UTIL -o -f $FAMILY
>> addr show") in the RA.
>> Basically it searches _all_ interfaces for $ipaddr/$netmask to locate the
>> interface, when it could instead examine the one interface and look at its
>> address.
>> With many interfaces that could make a difference performance-wise, IMHO.
>> Maybe do a periodic sampling of how long the corresponding command takes on
>> your setup.
>> If it's not a timing issue, the interface may actually be gone temporarily,
>> or the tools could have bugs.
>>
>> Regards,
>> Ulrich
>>
>> >>> PASERO Florent <[email protected]> wrote on 01.07.2021 at 17:29 in message
>> <pr0p264mb21394030d5c5120bb885e95db4...@pr0p264mb2139.frap264.prod.outlook.coM>:
>>
>>> Hi,
>>>
>>> Once or twice a week, we have a 'Timed out' on our VIP:
>>> ~$ pcs status
>>> Cluster name: zbx_pprod_Web_Core
>>> Cluster Summary:
>>>   * Stack: corosync
>>>   * Current DC: ##### (version 2.0.5-9.el8_4.1-ba59be7122) - partition with quorum
>>>   * Last updated: Mon Jun 28 16:32:09 2021
>>>   * Last change: Mon Jun 14 12:42:57 2021 by root via cibadmin on ######
>>>   * 2 nodes configured
>>>   * 2 resource instances configured
>>>
>>> Node List:
>>>   * Online: [ ##### ##### ]
>>>
>>> Full List of Resources:
>>>   * Resource Group: zbx_pprod_Web_Core:
>>>     * VIP (ocf::heartbeat:IPaddr2): Started #####
>>>     * ZabbixServer (systemd:zabbix-server): Started ######
>>>
>>> Failed Resource Actions:
>>>   * VIP_monitor_5000 on ##### 'error' (1): call=69, status='Timed Out', exitreason='', last-rc-change='2021-06-24 14:41:57 +02:00', queued=0ms, exec=0ms
>>>   * VIP_monitor_5000 on ##### 'error' (1): call=11, status='Timed Out', exitreason='', last-rc-change='2021-06-17 14:18:20 +02:00', queued=0ms, exec=0ms
>>>
>>> We have the same issue on two completely different clusters.
>>>
>>> We can see in the log:
>>> Jun 24 14:41:29 ##### pacemaker-execd [1442069] (child_timeout_callback) warning: VIP_monitor_5000 process (PID 2752333) timed out
>>> Jun 24 14:41:34 ##### pacemaker-execd [1442069] (child_timeout_callback) crit: VIP_monitor_5000 process (PID 2752333) will not die!
>>> Jun 24 14:41:57 ##### pacemaker-execd [1442069] (operation_finished) warning: VIP_monitor_5000[2752333] timed out after 20000ms
>>> Jun 24 14:41:57 ##### pacemaker-controld [1442072] (process_lrm_event) error: Result of monitor operation for VIP on #####: Timed Out | call=69 key=VIP_monitor_5000 timeout=20000ms
>>> Jun 24 14:41:57 ##### pacemaker-based [1442067] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/722)
>>> Jun 24 14:41:57 ##### pacemaker-based [1442067] (cib_perform_op) info: Diff: --- 0.54.443 2
>>> Jun 24 14:41:57 ##### pacemaker-based [1442067] (cib_perform_op) info: Diff: +++ 0.54.444 (null)
>>> Jun 24 14:41:57 ##### pacemaker-based [1442067] (cib_perform_op) info: + /cib: @num_updates=444
>>>
>>> Thanks for your help.

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
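
The suggestion earlier in the thread, to periodically sample how long the
address lookup takes, could be sketched roughly as below. This is only a
minimal sketch and not part of the IPaddr2 agent: the function names
elapsed_ms and sample_cmd, their arguments, and the threshold are made up for
illustration, and it assumes GNU date (for the %N nanosecond format).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of resource-agents.

# Print how many milliseconds the given command took to run.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    start_ns=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end_ns=$(date +%s%N)
    echo $(( (end_ns - start_ns) / 1000000 ))
}

# sample_cmd COUNT INTERVAL THRESHOLD_MS COMMAND [ARGS...]
# Run COMMAND COUNT times, sleeping INTERVAL seconds between runs, and
# print a timestamped line for every run that took THRESHOLD_MS or longer.
sample_cmd() {
    count=$1
    interval=$2
    threshold_ms=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        ms=$(elapsed_ms "$@")
        if [ "$ms" -ge "$threshold_ms" ]; then
            echo "$(date '+%F %T') ${ms} ms: $*"
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, "sample_cmd 8640 10 1000 ip -o -f inet addr show" would run the
same lookup that find_interface uses in the traces every 10 seconds for
roughly a day, and print only the runs that took one second or longer. Lines
clustered around the time of a 'Timed Out' failure would point at a slow
command rather than at a delay inside pacemaker.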
