Brian,

On Fri, Mar 22, 2019 at 08:57:20AM +0100, Jan Friesse wrote:
- If I manually set 'totem.token' to a higher value, am I responsible
   for tracking the number of nodes in the cluster, to keep it in
   alignment with what Red Hat's page says?

Nope. I've tried to explain what is really happening in the manpage
corosync.conf(5). totem.token and totem.token_coefficient are used in
the following formula:
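[The formula, per the token_coefficient description in corosync.conf(5), is
roughly:

   real token timeout = token + (number_of_nodes - 2) * token_coefficient

and it only applies once the nodelist contains at least three nodes.]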

I do see this under token_coefficient, thanks.

Corosync used runtime.config.token.
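[If it helps, the value corosync is actually using can be checked at runtime
with something like:

   corosync-cmapctl | grep runtime.config

which should list the effective totem values, token included.]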

Cool; thanks.  Bumping up totem.token to 2000 got me over this hump.
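[For reference, that's the token directive in the totem section of
/etc/corosync/corosync.conf; a minimal sketch of the change:

   totem {
       # ... existing settings (version, cluster_name, etc.) ...
       token: 2000
   }

with the file kept in sync across nodes and corosync restarted so the new
value takes effect.]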

- Under these conditions, when corosync exits, why does it do so
   with a zero status? It seems to me that if it exited at all,

That's a good question. How reproducible is the issue? Corosync
shouldn't "exit" with zero status.

If I leave totem.token set to the default, 100% in my case.

I stand corrected; yesterday, it was 100%.  Today, I cannot reproduce
this at all, even after reverting to the defaults.


That's sad.

Here is a snippet of output from yesterday's experiments; this is
based on a typescript capture file, so please excuse any stray ANSI
screen codes that slipped through.


Yep, np. Looks just fine.

- by default, systemd doesn't report full log lines.

- by default, CentOS's config of systemd doesn't persist journaled
   logs, so I can't directly review yesterday's efforts (see the
   journald note below).

- and, it looks like I misinterpreted the 'exited' message; corosync
   was enabled and running, but the 'Process' line doesn't report
   on the 'corosync' process itself, but on some systemd utility.

(Let me count the ways I'm coming to dislike systemd...)
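[In case it's useful: journald can be made persistent on CentOS 7 by setting
Storage=persistent in /etc/systemd/journald.conf (or simply by creating
/var/log/journal) and then restarting systemd-journald; after that,
'journalctl -u corosync' keeps entries across reboots, and
'systemctl status -l corosync' avoids the truncated log lines.]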

I was able to recover logs from /var/log/messages, but other than
the 'Consider token timeout increase' message, it looks hunky-dory.

With what I've since learned:

- I cannot explain why I can't reproduce the symptoms, even after
   reverting to the defaults.

- And without being able to reproduce, I can't pursue why 'pcs
   status cluster' was actually failing for me. :/

So, I appreciate your attention to this message, and I guess I'm
off to further explore all of this.

   [root@node1 ~]# systemctl status corosync.service
   ● corosync.service - Corosync Cluster Engine
      Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; vendor preset: disabled)
      Active: active (running) since Thu 2019-03-21 14:26:56 UTC; 1min 35s ago
        Docs: man:corosync
              man:corosync.conf
              man:corosync_overview
     Process: 5474 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
    Main PID: 5490 (corosync)
      CGroup: /system.slice/corosync.service
              └─5490 corosync



As you can see, the corosync service unit in CentOS 7 executes an init script, which execs corosync and then waits until a connection to the local IPC can be established. The IPC connection can only be established once corosync is ready. The init script's IPC timeout is 1 minute, and its return code is 1 if the connection cannot be established; on success the init script returns 0. So ExecStart (the init script) exited with 0/SUCCESS = corosync was successfully started and it is running as PID 5490.
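Roughly, the start action of that init script does something like the
following (just a sketch of the logic, not the actual script; the IPC probe
used here, corosync-cfgtool -s, is an assumption):

   # start corosync (it daemonizes itself), then wait for its IPC to answer
   corosync
   tries=60
   while [ $tries -gt 0 ]; do
       if corosync-cfgtool -s >/dev/null 2>&1; then
           exit 0                      # IPC reachable: corosync is ready
       fi
       sleep 1
       tries=$((tries - 1))
   done
   exit 1                              # no IPC connection within a minute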

Regards,
  Honza



_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
