But the command: corosync-quorumtool -ps
still gives: Cannot initialize QUORUM service
Consider that a few minutes before it gave me the message: Cannot initialize CFG service
I do not know the difference between CFG and QUORUM in this case.
If I try to start pacemaker the service is OK, but I see only pacemaker, and the transport does not work if I try to run a crm command. Any suggestion?
On 26/06/18 09:40, Salvatore D'angelo wrote:
Hi,
Yes,
I am reproducing only the required part for the test. I think the original system has a larger shm. The problem is that I do not know exactly how to change it. I tried the following steps, but I have the impression I didn't perform the right one:
1. Remove everything under /tmp
2. Add the following line to /etc/fstab:
   tmpfs /tmp tmpfs defaults,nodev,nosuid,mode=1777,size=128M 0 0
3. mount /tmp
4. df -h
   Filesystem  Size  Used  Avail Use% Mounted on
   overlay      63G   11G   49G  19%  /
   tmpfs        64M  4.0K   64M   1%  /dev
   tmpfs      1000M     0 1000M   0%  /sys/fs/cgroup
   osxfs       466G  158G  305G  35%  /Users
   /dev/sda1    63G   11G   49G  19%  /etc/hosts
   shm          64M   11M   54M  16%  /dev/shm
   tmpfs      1000M     0 1000M   0%  /sys/firmware
   *tmpfs      128M     0  128M   0%  /tmp*
The errors are exactly the same. I have the impression that I changed the wrong parameter. Probably I have to change:
shm 64M 11M 54M 16% /dev/shm
but I do not know how to do that. Any suggestion?
According to Google, you just add a new line to /etc/fstab for /dev/shm:

tmpfs /dev/shm tmpfs defaults,size=512m 0 0

Chrissie

On 26 Jun 2018, at 09:48, Christine Caulfield <[email protected] <mailto:[email protected]>> wrote:
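A sketch of the full procedure (assuming root inside the container and that /dev/shm is already a tmpfs mount; 512m is just the example size from above):

```shell
# Make the larger size persistent across remounts/reboots
echo 'tmpfs /dev/shm tmpfs defaults,size=512m 0 0' >> /etc/fstab

# Apply it immediately without rebooting
mount -o remount,size=512m /dev/shm

# Verify the new size
df -h /dev/shm
```

Note that with Docker specifically, the container's /dev/shm is usually sized from the outside with the --shm-size run option (e.g. docker run --shm-size=512m ...), which may be simpler than editing fstab inside the container.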
On 25/06/18 20:41, Salvatore D'angelo wrote:
Hi,
Let me add one important detail here. I use Docker for my tests, with 5 containers deployed on my Mac. Basically, the team that worked on this project installed the cluster on SoftLayer bare metal. The PostgreSQL cluster was hard to test, and if a misconfiguration occurred, recreating the cluster from scratch was not easy. Testing it was cumbersome, considering that we access the machines through a complex system that is hard to describe here. For this reason I ported the cluster to Docker for test purposes. I am not interested in having it working for months; I just need a proof of concept.
When the migration works I'll port everything to bare metal, where resources are abundant.
Now I have enough RAM and disk space on my Mac, so if you tell me what an acceptable size would be for several days of running, that is fine for me. It is also fine to have commands to clean the shm when required. I know I can find them on Google, but if you can suggest this info I would appreciate it. I have the OS knowledge to do that, but I would like to avoid days of guesswork and trial and error if possible.
I would recommend at least 128MB of space on /dev/shm, 256MB if you can spare it. My 'standard' system uses 75MB under normal running allowing for one command-line query to run.
If I read this right then you're reproducing a bare-metal system in containers now? so the original systems will have a default /dev/shm size which is probably much larger than your containers?
I'm just checking here that we don't have a regression in memory usage as Poki suggested.
Chrissie
On 25 Jun 2018, at 21:18, Jan Pokorný <[email protected] <mailto:[email protected]>> wrote:
On 25/06/18 19:06 +0200, Salvatore D'angelo wrote:
Thanks for the reply. I scratched my cluster and created it again, and then migrated as before. This time I uninstalled pacemaker, corosync, crmsh and the resource agents with
make uninstall
then I installed the new packages. The problem is the same: when I launch
corosync-quorumtool -ps
I get:
Cannot initialize QUORUM service
Here the log with debug enabled:
[18019] pg3 corosyncerror [QB ] couldn't create circular mmap on /dev/shm/qb-cfg-event-18020-18028-23-data
[18019] pg3 corosyncerror [QB ] qb_rb_open:cfg-event-18020-18028-23: Resource temporarily unavailable (11)
[18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-request-18020-18028-23-header
[18019] pg3 corosyncdebug [QB ] Free'ing ringbuffer: /dev/shm/qb-cfg-response-18020-18028-23-header
[18019] pg3 corosyncerror [QB ] shm connection FAILED: Resource temporarily unavailable (11)
[18019] pg3 corosyncerror [QB ] Error in connection setup (18020-18028-23): Resource temporarily unavailable (11)
I tried to check /dev/shm and I am not sure these are the right commands, however:
df -h /dev/shm
Filesystem  Size  Used  Avail Use% Mounted on
shm          64M   16M   49M  24%  /dev/shm
ls /dev/shm
qb-cmap-request-18020-18036-25-data    qb-corosync-blackbox-data    qb-quorum-request-18020-18095-32-data
qb-cmap-request-18020-18036-25-header  qb-corosync-blackbox-header  qb-quorum-request-18020-18095-32-header
Is 64 MB enough for /dev/shm? If not, why did it work with the previous corosync release?
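To see where that space is going, a sketch of the usual checks (the qb-* pattern is the one libqb uses for its shared-memory ringbuffers, as in the listing above; the glob may match nothing if corosync is not running):

```shell
# Overall usage of the shm mount
df -h /dev/shm

# Per-file usage of libqb ringbuffers, largest first
du -sh /dev/shm/qb-* 2>/dev/null | sort -rh

# Stale ringbuffers left behind by crashed clients can be removed,
# but only once corosync is stopped (do NOT do this while it runs):
# rm -f /dev/shm/qb-*
```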
For a start, can you try configuring corosync with --enable-small-memory-footprint switch?
Hard to say why the space provisioned to /dev/shm is the direct opposite of generous (by today's standards), but it may be the result of automatic HW adaptation, and if RAM is so scarce in your case, the above build-time toggle might help.
If not, then exponentially increasing the size of /dev/shm is likely your best bet (I don't recommend fiddling with mlockall() and similar measures in corosync).
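For reference, rebuilding with that switch would look roughly like this (a sketch assuming a corosync source checkout; everything except --enable-small-memory-footprint is the stock autotools flow):

```shell
# From the corosync source tree; the switch shrinks the IPC/ringbuffer
# sizes corosync requests from libqb, lowering /dev/shm pressure
./autogen.sh
./configure --enable-small-memory-footprint
make
make install   # typically as root
```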
Of course, feel free to raise a regression if you have a reproducible comparison between two corosync (plus possibly different libraries like libqb) versions, one that works and one that won't, in reproducible conditions (like this small /dev/shm, VM image, etc.).
-- Jan (Poki)

_______________________________________________
Users mailing list: [email protected] <mailto:[email protected]>
https://lists.clusterlabs.org/mailman/listinfo/users
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org