Hi Andrei,
No, these are the only ones:
Location Constraints:
  resource 'Apache' (id: location-Apache)
    Rules:
      Rule: boolean-op=or score=-INFINITY (id: location-Apache-rule)
        Expression: pingd lt 1 (id: location-Apache-rule-expr)
        Expression: not_defined pingd (id: location-Apache-rule-expr-1)
Colocation Constraints:
  resource 'Apache' with resource 'HA-IPs' (id: colocation-Apache-HA-IPs-INFINITY)
    score=INFINITY
  resource 'Apache' with resource 'Webcontent_FS' (id: colocation-Apache-Webcontent_FS-INFINITY)
    score=INFINITY
Order Constraints:
  start resource 'HA-IPs' then start resource 'Apache' (id: order-HA-IPs-Apache-mandatory)
  start resource 'Webcontent_FS' then start resource 'Apache' (id: order-Webcontent_FS-Apache-mandatory)
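
Note that none of the constraints listed above mention Webcontent_DRBD, so nothing tells the cluster to wait for DRBD before mounting Webcontent_FS. As a rough sketch only, constraints tying the two together might look like the following. The clone ID Webcontent_DRBD-clone is an assumption (the real ID does not appear in this thread), the role keyword "Promoted" is what recent pcs releases use (older ones used "master"), and the run wrapper is a hypothetical helper that only prints the commands unless APPLY=1 is set:

```shell
#!/bin/sh
# Sketch under assumptions: adjust the resource IDs to your cluster.
DRBD_CLONE="Webcontent_DRBD-clone"   # hypothetical ID of the promotable DRBD clone
FS="Webcontent_FS"                   # filesystem resource from this thread

# Hypothetical helper: print the command by default, execute it only if APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Place the filesystem only on the node where DRBD is promoted ...
run pcs constraint colocation add "$FS" with Promoted "$DRBD_CLONE" INFINITY
# ... and start it only after the promotion has finished.
run pcs constraint order promote "$DRBD_CLONE" then start "$FS"
```

Running the script as-is just prints the two pcs commands for review; check them against "pcs constraint --help" on your version before applying anything.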
> Sent: Tuesday, 22 October 2024 at 15:41
> From: "Andrei Borzenkov" <[email protected]>
> To: "Cluster Labs - All topics related to open-source clustering welcomed" <[email protected]>
> CC: "Testuser SST" <[email protected]>
> Subject: Re: [ClusterLabs] Problem with a new cluster with drbd on AlmaLinux 9
>
> On Tue, Oct 22, 2024 at 3:18 PM Testuser SST via Users
> <[email protected]> wrote:
> >
> > Hi,
> > I'm running a 2-node web cluster on AlmaLinux 9 with pacemaker 2.1.7, drbd9 and corosync 3.1.
> > I have trouble with promoting and mounting the drbd device. After activating the cluster,
> > the drbd device does not get mounted, and an error message shows up almost immediately:
> >
> > pacemaker-schedulerd[4879]: warning: Unexpected result (error: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs) was recorded for start of Webcontent_FS on ...
> > pacemaker-schedulerd[4879]: warning: Webcontent_FS cannot run on kathie3 due to reaching migration threshold (clean up resource to allow again)
> >
>
> Do you have any ordering constraints between Webcontent_DRBD and Webcontent_FS?
>
> > It's as if the cluster tries to mount the device before the device is ready.
> > The device is drbd1, and I'm trying to mount it on /mnt/clusterfs. After the error has occurred, if I run "pcs resource cleanup", the cluster is able to mount it.
> > The drbd resource is named Webcontent_DRBD.
> > The mounted filesystem is named Webcontent_FS.
> > All other resources, like httpd and the HA-IPs, are working like a charm.
> >
> > This is the log from the start of the cluster:
> >
> > Oct 22 11:48:12 kathie3 pacemaker-controld[4880]: notice: State transition S_ELECTION -> S_INTEGRATION
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_1 ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_2 ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start HA-IP_3 ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start Webcontent_DRBD:0 ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start Webcontent_FS ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Actions: Start ping_fw:0 ( kathie3 )
> > Oct 22 11:48:13 kathie3 pacemaker-schedulerd[4879]: notice: Calculated transition 1106, saving inputs in /var/lib/pacemaker/pengine/pe-input-336.bz2
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_1_start_0 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_FS_start_0 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation ping_fw_start_0 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation Webcontent_DRBD_start_0 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_1 on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for ping_fw on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_DRBD on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for Webcontent_FS on kathie3
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682892]: INFO: Adding inet address 192.168.16.75/24 with broadcast address 192.168.16.255 to device ens3
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682912]: INFO: Bringing device ens3 up
> > Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1682923]: INFO: Running start for /dev/drbd1 on /mnt/clusterfs
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_1)[1682929]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.75 ens3 192.168.16.75 auto not_used not_used
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Starting worker thread (node-id 0)
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_1 on kathie3: ok
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating monitor operation HA-IP_1_monitor_30000 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of monitor operation for HA-IP_1 on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Initiating start operation HA-IP_2_start_0 locally on kathie3
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Requesting local execution of start operation for HA-IP_2 on kathie3
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Auto-promote failed: Need access to UpToDate data (-2)
> > Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> > Oct 22 11:48:13 kathie3 kernel: /dev/drbd1: Can't open blockdev
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: meta-data IO uses: blk-bio
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Diskless -> Attaching ) [attach]
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Maximum number of peer devices = 1
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data: Method to ensure write ordering: flush
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: drbd_bm_resize called with capacity == 104854328
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: resync bitmap: bits=13106791 words=204794 pages=400
> > Oct 22 11:48:13 kathie3 kernel: drbd1: detected capacity change from 0 to 104854328
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: size = 50 GB (52427164 KB)
> > Oct 22 11:48:13 kathie3 Filesystem(Webcontent_FS)[1683017]: ERROR: Couldn't mount device [/dev/drbd1] as /mnt/clusterfs
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_FS on kathie3: error (Couldn't mount device [/dev/drbd1] as /mnt/clusterfs)
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Webcontent_FS_start_0@kathie3 output [ blockdev: cannot open /dev/drbd1: No data available\nmount: /mnt/clusterfs: mount(2) system call failed: No data available.\nocf-exit-reason:Couldn't mount device [/dev/drbd1] as /mnt/clusterfs\n ]
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by operation Webcontent_FS_start_0 'modify' on kathie3: Event failed
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 action 37 (Webcontent_FS_start_0 on kathie3): expected 'ok' but got 'error'
> > Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting last-failure-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> 1729590493
> > Oct 22 11:48:13 kathie3 pacemaker-attrd[4878]: notice: Setting fail-count-Webcontent_FS#start_0[kathie3] in instance_attributes: (unset) -> INFINITY
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Transition 1106 aborted by status-1-last-failure-Webcontent_FS.start_0 doing create last-failure-Webcontent_FS#start_0=1729590493: Transient attribute change
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: bitmap READ of 400 pages took 34 ms
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: disk( Attaching -> UpToDate ) [attach]
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: attached to current UUID: 826E8850CF10C812
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data/0 drbd1: Setting exposed data uuid: 826E8850CF10C812
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of monitor operation for HA-IP_1 on kathie3: ok
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting sender thread (peer-node-id 1)
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( StandAlone -> Unconnected ) [connect]
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: Starting receiver thread (peer-node-id 1)
> > Oct 22 11:48:13 kathie3 kernel: drbd webcontent_data stacy3: conn( Unconnected -> Connecting ) [connecting]
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683100]: INFO: Adding inet address 192.168.16.76/24 with broadcast address 192.168.16.255 to device ens3
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683106]: INFO: Bringing device ens3 up
> > Oct 22 11:48:13 kathie3 IPaddr2(HA-IP_2)[1683112]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /run/resource-agents/send_arp-192.168.16.76 ens3 192.168.16.76 auto not_used not_used
> > Oct 22 11:48:13 kathie3 pacemaker-controld[4880]: notice: Result of start operation for HA-IP_2 on kathie3: ok
> > Oct 22 11:48:15 kathie3 pacemaker-attrd[4878]: notice: Setting pingd[kathie3] in instance_attributes: (unset) -> 1000
> > Oct 22 11:48:15 kathie3 pacemaker-controld[4880]: notice: Result of start operation for ping_fw on kathie3: ok
> > Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_1)[1683126]: INFO: ARPING 192.168.16.75 from 192.168.16.75 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
> > Oct 22 11:48:17 kathie3 IPaddr2(HA-IP_2)[1683130]: INFO: ARPING 192.168.16.76 from 192.168.16.76 ens3#012Sent 5 probes (5 broadcast(s))#012Received 0 response(s)
> > Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683138]: INFO: webcontent_data: Called drbdsetup wait-connect-resource webcontent_data --wfc-timeout=5 --degr-wfc-timeout=5 --outdated-wfc-timeout=5
> > Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683142]: INFO: webcontent_data: Exit code 5
> > Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683146]: INFO: webcontent_data: Command output:
> > Oct 22 11:48:18 kathie3 drbd(Webcontent_DRBD)[1683150]: INFO: webcontent_data: Command stderr:
> > Oct 22 11:48:19 kathie3 pacemaker-attrd[4878]: notice: Setting master-Webcontent_DRBD[kathie3] in instance_attributes: (unset) -> 1000
> > Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Result of start operation for Webcontent_DRBD on kathie3: ok
> > Oct 22 11:48:19 kathie3 pacemaker-controld[4880]: notice: Initiating notify operation Webcontent_DRBD_post_notify_start_0 locally on kathie3
> > ...
> >
> > Is some kind of timeout set wrong, or what am I missing?
> >
> > Any suggestions are welcome
> >
> > Kind regards
> >
> > fatcharly
> >
> >
> > _______________________________________________
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: https://www.clusterlabs.org/
>