On 20.4.2017 19:33, Ken Gaillot wrote:
> On 04/20/2017 10:52 AM, Jan Wrona wrote:
>> Hello,
>>
>> my problem is closely related to the thread [1], but I didn't find a
>> solution there. I have a resource that is set up as a clone C restricted
>> to two copies (using the clone-max=2 meta attribute), because the
>> resource takes a long time to get ready (it starts immediately though),
> A resource agent must not return from "start" until a "monitor"
> operation would return success.
>
> Beyond that, the cluster doesn't care what "ready" means, so it's OK if
> it's not fully operational by some measure. However, that raises the
> question of what you're accomplishing with your monitor.
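Concretely, the contract Ken describes is often implemented by polling the agent's own monitor logic from "start". A minimal sketch, with a placeholder probe and launch step (the file path and function names are illustrative, not from the thread):

```shell
# Sketch of the OCF start/monitor contract: "start" must not return
# until "monitor" would report success. Names are illustrative.
PIDFILE="${PIDFILE:-/tmp/collector.pid}"

collector_monitor() {
    # Placeholder probe; a real agent would check the daemon itself.
    [ -f "$PIDFILE" ]
}

collector_start() {
    : > "$PIDFILE" &    # stand-in for launching the daemon
    while ! collector_monitor; do
        sleep 1         # block until monitor would succeed
    done
    return 0            # OCF_SUCCESS
}
```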
I know all that, and my RA respects it. I didn't want to go into details about the service I'm running, but maybe it will help you understand. It's a data collector which receives and processes data from a UDP stream. To make sense of the data, it needs templates which appear periodically in the stream (every five minutes or so). After "start" the service is up and running and "monitor" operations succeed, but until the templates arrive the service is not "ready". I basically need a way to model this "ready" state.
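One pattern that can model such a "ready" state (not discussed in the thread; the attribute name, resource names, and rule score below are assumptions) is to have the RA publish a transient node attribute once the templates have arrived, and prefer nodes carrying that attribute when placing the IP:

```shell
# In the RA's monitor action, once templates have been seen
# (detection is service-specific), publish a transient node attribute:
attrd_updater -n collector-ready -U 1

# Clear it on stop, or whenever readiness is lost:
attrd_updater -n collector-ready -D

# Then prefer nodes where the attribute is set when placing the IP
# (crmsh syntax; pcs has an equivalent "constraint location ... rule"):
crm configure location ip-on-ready ip rule 100: collector-ready eq 1
```

Because the attribute is transient (stored in attrd, not the CIB), it is cleared automatically on node reboot, which matches a "must re-learn templates after restart" service.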

>> and by having it ready as a clone, I can fail over in the time it takes
>> to move an IP resource. I have a colocation constraint "resource IP with
>> clone C", which will make sure IP runs with a working instance of C:
>>
>> Configuration:
>>   Clone: dummy-clone
>>    Meta Attrs: clone-max=2 interleave=true
>>    Resource: dummy (class=ocf provider=heartbeat type=Dummy)
>>     Operations: start interval=0s timeout=20 (dummy-start-interval-0s)
>>                 stop interval=0s timeout=20 (dummy-stop-interval-0s)
>>                 monitor interval=10 timeout=20 (dummy-monitor-interval-10)
>>   Resource: ip (class=ocf provider=heartbeat type=Dummy)
>>    Operations: start interval=0s timeout=20 (ip-start-interval-0s)
>>                stop interval=0s timeout=20 (ip-stop-interval-0s)
>>                monitor interval=10 timeout=20 (ip-monitor-interval-10)
>>
>> Colocation Constraints:
>>    ip with dummy-clone (score:INFINITY)
>>
>> State:
>>   Clone Set: dummy-clone [dummy]
>>       Started: [ sub1.example.org sub3.example.org ]
>>   ip     (ocf::heartbeat:Dummy): Started sub1.example.org
>>
>>
>> This is fine until the active node (sub1.example.org) fails. Instead
>> of moving the IP to the passive node (sub3.example.org) with a ready
>> clone instance, Pacemaker will move it to the node where it has just
>> started a fresh instance of the clone (sub2.example.org in my case):
>>
>> New state:
>>   Clone Set: dummy-clone [dummy]
>>       Started: [ sub2.example.org sub3.example.org ]
>>   ip     (ocf::heartbeat:Dummy): Started sub2.example.org
>>
>>
>> Documentation states that the cluster will choose a copy based on where
>> the clone is running and the resource's own location preferences, so I
>> don't understand why this is happening. Is there a way to tell Pacemaker
>> to move the IP to the node where the resource is already running?
>>
>> Thanks!
>> Jan Wrona
>>
>> [1] http://lists.clusterlabs.org/pipermail/users/2016-November/004540.html
> The cluster places ip based on where the clone will be running at that
> point in the recovery, rather than where it was running before recovery.
>
> Unfortunately I can't think of a way to do exactly what you want;
> hopefully someone else has an idea.
>
> One possibility would be to use on-fail=standby on the clone monitor.
> That way, instead of recovering the clone when it fails, all resources
> on the node would move elsewhere. You'd then have to manually take the
> node out of standby for it to be usable again.
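For reference, that suggestion is an operation-level option on the clone's monitor. A sketch in pcs syntax (the resource name is taken from the configuration above; the exact update command may differ by pcs version):

```shell
# Make a failed monitor put the whole node into standby
# instead of recovering the resource locally:
pcs resource update dummy op monitor interval=10 timeout=20 on-fail=standby
```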
I don't see how that would solve it. The node would be put into standby, the cluster would recover the clone instance on some other node, and it could place the IP there too. Moreover, I don't want to put the whole node into standby because of one failed monitor.

> It might be possible to do something more if you convert the clone to a
> master/slave resource, and colocate ip with the master role. For
> example, you could set the master score based on how long the service
> has been running, so the longest-running instance is always master.
This sounds promising. I have heard about master/slave resources but never actually used one. I'll look into it more, thank you for the advice!
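A sketch of Ken's idea, assuming the RA records its start time in a state file during "start" (the file path, score cap, and function name are assumptions; `crm_master` is the helper shipped with resource-agents for setting master scores):

```shell
# Sketch: derive the master score from how long this instance has been
# running, so the longest-running clone instance is preferred as master.
STATE_FILE="${STATE_FILE:-/var/run/collector.started}"  # written by "start"

collector_score() {
    now=$(date +%s)
    started=$(cat "$STATE_FILE")
    elapsed=$((now - started))
    # Cap the score well below INFINITY (1000000) so other rules can
    # still override the preference if needed.
    [ "$elapsed" -gt 100000 ] && elapsed=100000
    echo "$elapsed"
}

# In the monitor action, publish the score as this node's master
# preference for the master/slave resource:
# crm_master -l reboot -v "$(collector_score)"
```

With this, a freshly started instance has a near-zero score, so after a failover the already-running (and presumably "ready") instance keeps the master role and the colocated IP.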

_______________________________________________
Users mailing list: [email protected]
http://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org

