It is this system: https://www.supermicro.com/products/system/1u/1029/SYS-1029TP-DC0R.cfm
It has a SAS3 backplane with hot-swap SAS disks that are visible to both nodes at the same time.

Gabriele

Sonicle S.r.l. : http://www.sonicle.com
Music: http://www.gabrielebulfon.com
Quantum Mechanics : http://www.cdbaby.com/cd/gabrielebulfon

----------------------------------------------------------------------------------
From: Ulrich Windl
To: [email protected]
Date: 29 July 2020, 15:15:17 CEST
Subject: [ClusterLabs] Antw: Re: Antw: Re: Antw: [EXT] Stonith failing

Gabriele Bulfon wrote on 29.07.2020 at 14:18:
> Hi, it's a single controller, shared to both nodes, SM server.

You mean an external controller, like a NAS or SAN? I thought you were talking about an internal controller like SCSI... I don't know what an "SM server" is.

Regards,
Ulrich

> Thanks!
> Gabriele

----------------------------------------------------------------------------------
From: Ulrich Windl
To: [email protected]
Date: 29 July 2020, 09:26:39 CEST
Subject: [ClusterLabs] Antw: Re: Antw: [EXT] Stonith failing

Gabriele Bulfon wrote on 29.07.2020 at 08:01:
> That one was taken from a specific implementation on Solaris 11. The situation is a dual-node server with a shared storage controller: both nodes see the same disks concurrently.

You mean you have a dual-controller setup (one controller on each node, both connected to the same bus)? If so: use sbd!

> Here we must be sure that the two nodes are not going to import/mount the same zpool at the same time, or we will encounter data corruption: node 1 will be preferred for pool 1, node 2 for pool 2. Only if one of the nodes goes down or is taken offline should the resources first be freed by the leaving node and then taken over by the other node. Would you suggest one of the available stonith agents in this case? Thanks!
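The "use sbd!" advice and the per-pool node preference above can be sketched in cluster configuration. This is a minimal, hypothetical example in crm shell syntax: the SBD device path, resource names (zpool1/zpool2), scores, and node names xstha1/xstha2 are assumptions, and note that sbd is primarily a Linux tool, so its availability on an illumos/Solaris-based node would need to be verified first.

```shell
# Hypothetical sketch, not the poster's configuration.
# 1. Dedicate a small shared LUN (visible to both nodes) and write the SBD header:
sbd -d /dev/disk/by-id/shared-sbd-lun create   # device path is an assumption

# 2. On each node, point the sbd daemon at the device (e.g. in /etc/sysconfig/sbd):
#      SBD_DEVICE="/dev/disk/by-id/shared-sbd-lun"
#    then restart the cluster stack so sbd starts with it.

# 3. Register the fence device with Pacemaker:
crm configure primitive stonith-sbd stonith:external/sbd
crm configure property stonith-enabled=true

# 4. Express "node 1 preferred for pool 1, node 2 for pool 2" as location
#    constraints (zpool1/zpool2 stand in for the real zpool resources):
crm configure location pool1-pref zpool1 100: xstha1
crm configure location pool2-pref zpool2 100: xstha2
```

With positive (non-INFINITY) scores, each pool fails over to the surviving node when its preferred node leaves the cluster, which matches the behavior described above.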
----------------------------------------------------------------------------------
From: Strahil Nikolov
To: Cluster Labs - All topics related to open-source clustering welcomed; Gabriele Bulfon
Date: 29 July 2020, 06:39:08 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing

Do you have a reason not to use any stonith agent already available?

Best Regards,
Strahil Nikolov

On 28 July 2020, 13:26:52 GMT+03:00, Gabriele Bulfon wrote:
> Thanks, I attach the script here. It basically runs ssh on the other node with no password (which must be preconfigured via authorized keys) to execute commands. It was taken from an OpenIndiana script, I think. As stated in the comments, we don't want to halt or boot via ssh, only reboot. Maybe this is the problem; we should at least have it shut down when asked to.
>
> Actually, if I stop corosync on node 2, I don't want it to shut down the system, but just let node 1 keep control of all resources. Likewise, if I shut down node 2 manually, node 1 should keep control of all resources and release them back on reboot. Instead, when I stopped corosync on node 2, the log showed an attempt to stonith node 2: why?
>
> Thanks!

----------------------------------------------------------------------------------
From: Reid Wahl
To: Cluster Labs - All topics related to open-source clustering welcomed
Date: 28 July 2020, 12:03:46 CEST
Subject: Re: [ClusterLabs] Antw: [EXT] Stonith failing

Gabriele,

"No route to host" is a somewhat generic error message when we can't find anyone to fence the node. It doesn't mean there's necessarily a network routing issue at fault; no need to focus on that error message. I agree with Ulrich about needing to know what the script does.
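The script described above only handles "reboot", which is likely why fencing fails when Pacemaker requests a poweroff. A minimal sketch of the action dispatch such an agent needs follows; the attached script is not reproduced here, so the ssh commands, the way the target hostname reaches the agent, and the function name are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a fence agent's action dispatch, not the original
# OpenIndiana-derived script. The key point from the thread: "off" must
# succeed rather than return 1, because Pacemaker issues poweroff requests.

TARGET="${port:-$2}"   # fenced node; how it is passed in is an assumption

fence_action() {
    case "$1" in
        reboot)
            # Echoing instead of executing, so the sketch is side-effect free.
            echo "would run: ssh root@$TARGET 'reboot'"
            return 0 ;;
        off)
            # Pacemaker's poweroff request must be supported, not rejected.
            echo "would run: ssh root@$TARGET 'init 5'"
            return 0 ;;
        monitor|status)
            return 0 ;;
        on|*)
            # ssh cannot power on a node that is down; report unsupported.
            return 1 ;;
    esac
}
```

An ssh-based agent inherently cannot implement "on" (there is nothing to ssh into), which is one reason such agents are recommended only for testing.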
But based on your initial message, it sounds like your custom fence agent returns 1 in response to "on" and "off" actions. Am I understanding correctly? If so, why does it behave that way? Pacemaker is trying to run a poweroff action based on the logs, so it needs your script to support an off action.

On Tue, Jul 28, 2020 at 2:47 AM Ulrich Windl [email protected] wrote:

Gabriele Bulfon [email protected] wrote on 28.07.2020 at 10:56:
> Hi, now I have my two nodes (xstha1 and xstha2) with IPs configured by Corosync. To check how stonith would work, I turned off the Corosync service on the second node. The first node attempts to stonith the second node and take over its resources, but this fails. The stonith action is configured to run a custom script that runs ssh commands,

I think you should explain what that script does exactly.
[...]

_______________________________________________
Manage your subscription: https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/

--
Regards,
Reid Wahl, RHCA
Software Maintenance Engineer, Red Hat
CEE - Platform Support Delivery - ClusterHA
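Strahil's question about using an already-available stonith agent could be answered with the stock ssh-based agent shipped with cluster-glue, which already implements the expected action set. A minimal sketch in crm shell syntax, reusing the node names from the thread; suitability on the poster's illumos-based platform is an assumption, and ssh-based fencing is generally recommended for testing only, since it cannot fence a hung or powered-off node.

```shell
# Hypothetical sketch: register cluster-glue's ssh stonith plugin instead of
# a custom script. It requires passwordless root ssh between the nodes, just
# like the custom agent described in the thread.
crm configure primitive st-ssh stonith:external/ssh \
    params hostlist="xstha1 xstha2"
crm configure property stonith-enabled=true
```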
