Oh, any chance you could install the debug packages? It will make the output even more useful :-)
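On Ubuntu that should just be a matter of something like the following; the exact -dbg package names are my best guess at the Debian/Ubuntu packaging of the time, so check with apt-cache search first:

# Debug symbols for pacemaker (lrmd) and for glib, whose allocator
# appears in most of the lrmd allocation stacks.
# Package names are assumptions; verify with:
#   apt-cache search dbg | grep -i -e pacemaker -e glib
sudo apt-get install pacemaker-dbg libglib2.0-0-dbg

That, plus --show-reachable=yes, should give symbolised stacks for the reachable blocks too. (There's a sketch of the updated VALGRIND_OPTS, and of a snapshot loop for the smaps captures, after the quoted thread below.)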
On 6 May 2014, at 7:06 pm, Andrew Beekhof <[email protected]> wrote:
>
> On 6 May 2014, at 6:05 pm, Greg Murphy <[email protected]> wrote:
>
>> Attached are the valgrind outputs from two separate runs of lrmd with the
>> suggested variables set. Do they help narrow the issue down?
>
> They do somewhat. I'll investigate. But much of the memory is still
> reachable:
>
> ==26203==   indirectly lost: 17,945,950 bytes in 642,546 blocks
> ==26203==     possibly lost: 2,805 bytes in 60 blocks
> ==26203==   still reachable: 26,104,781 bytes in 544,782 blocks
> ==26203==        suppressed: 8,652 bytes in 176 blocks
> ==26203== Reachable blocks (those to which a pointer was found) are not shown.
> ==26203== To see them, rerun with: --leak-check=full --show-reachable=yes
>
> Could you add --show-reachable=yes to the VALGRIND_OPTS variable?
>
>>
>> Thanks
>>
>> Greg
>>
>>
>> On 02/05/2014 03:01, "Andrew Beekhof" <[email protected]> wrote:
>>
>>>
>>> On 30 Apr 2014, at 9:01 pm, Greg Murphy <[email protected]>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> I'm running a two-node Pacemaker cluster on Ubuntu Saucy (13.10),
>>>> kernel 3.11.0-17-generic and the Ubuntu Pacemaker package, version
>>>> 1.1.10+git20130802-1ubuntu1.
>>>
>>> The problem is that I have no way of knowing what code is/isn't included
>>> in '1.1.10+git20130802-1ubuntu1'.
>>> You could try setting the following in your environment before starting
>>> pacemaker, though:
>>>
>>> # Variables for running child daemons under valgrind and/or checking
>>> # for memory problems
>>> G_SLICE=always-malloc
>>> MALLOC_PERTURB_=221   # or 0
>>> MALLOC_CHECK_=3       # or 0,1,2
>>> PCMK_valgrind_enabled=lrmd
>>> VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25
>>>   --log-file=/var/lib/pacemaker/valgrind-%p
>>>   --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions
>>>   --gen-suppressions=all"
>>>
>>>> The cluster is configured with a DRBD master/slave set and then a
>>>> failover resource group containing MySQL (along with its DRBD
>>>> filesystem) and a Zabbix Proxy and Agent.
>>>>
>>>> Since I built the cluster around two months ago I've noticed that on
>>>> the active node the memory footprint of lrmd gradually grows to
>>>> quite a significant size. The cluster was last restarted three weeks
>>>> ago, and now lrmd has over 1GB of mapped memory on the active node and
>>>> only 151MB on the passive node.
>>>> Current excerpts from /proc/PID/status are:
>>>>
>>>> Active node
>>>> VmPeak:  1146740 kB
>>>> VmSize:  1146740 kB
>>>> VmLck:         0 kB
>>>> VmPin:         0 kB
>>>> VmHWM:    267680 kB
>>>> VmRSS:    188764 kB
>>>> VmData:  1065860 kB
>>>> VmStk:       136 kB
>>>> VmExe:        32 kB
>>>> VmLib:     10416 kB
>>>> VmPTE:      2164 kB
>>>> VmSwap:   822752 kB
>>>>
>>>> Passive node
>>>> VmPeak:   220832 kB
>>>> VmSize:   155428 kB
>>>> VmLck:         0 kB
>>>> VmPin:         0 kB
>>>> VmHWM:      4568 kB
>>>> VmRSS:      3880 kB
>>>> VmData:    74548 kB
>>>> VmStk:       136 kB
>>>> VmExe:        32 kB
>>>> VmLib:     10416 kB
>>>> VmPTE:       172 kB
>>>> VmSwap:        0 kB
>>>>
>>>> During the last week or so I've taken a couple of snapshots of
>>>> /proc/PID/smaps on the active node, and the heap particularly stands out
>>>> as growing (I have the full outputs captured if they'll help):
>>>>
>>>> 20140422
>>>> 7f92e1578000-7f92f218b000 rw-p 00000000 00:00 0   [heap]
>>>> Size:           274508 kB
>>>> Rss:            180152 kB
>>>> Pss:            180152 kB
>>>> Shared_Clean:        0 kB
>>>> Shared_Dirty:        0 kB
>>>> Private_Clean:       0 kB
>>>> Private_Dirty:  180152 kB
>>>> Referenced:     120472 kB
>>>> Anonymous:      180152 kB
>>>> AnonHugePages:       0 kB
>>>> Swap:            91568 kB
>>>> KernelPageSize:      4 kB
>>>> MMUPageSize:         4 kB
>>>> Locked:              0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>>
>>>> 20140423
>>>> 7f92e1578000-7f92f305e000 rw-p 00000000 00:00 0   [heap]
>>>> Size:           289688 kB
>>>> Rss:            184136 kB
>>>> Pss:            184136 kB
>>>> Shared_Clean:        0 kB
>>>> Shared_Dirty:        0 kB
>>>> Private_Clean:       0 kB
>>>> Private_Dirty:  184136 kB
>>>> Referenced:      69748 kB
>>>> Anonymous:      184136 kB
>>>> AnonHugePages:       0 kB
>>>> Swap:           103112 kB
>>>> KernelPageSize:      4 kB
>>>> MMUPageSize:         4 kB
>>>> Locked:              0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>>
>>>> 20140430
>>>> 7f92e1578000-7f92fc01d000 rw-p 00000000 00:00 0   [heap]
>>>> Size:           436884 kB
>>>> Rss:            140812 kB
>>>> Pss:            140812 kB
>>>> Shared_Clean:        0 kB
>>>> Shared_Dirty:        0 kB
>>>> Private_Clean:     744 kB
>>>> Private_Dirty:  140068 kB
>>>> Referenced:      43600 kB
>>>> Anonymous:      140812 kB
>>>> AnonHugePages:       0 kB
>>>> Swap:           287392 kB
>>>> KernelPageSize:      4 kB
>>>> MMUPageSize:         4 kB
>>>> Locked:              0 kB
>>>> VmFlags: rd wr mr mw me ac
>>>>
>>>> I noticed in the release notes for 1.1.10-rc1
>>>> (https://github.com/ClusterLabs/pacemaker/releases/tag/Pacemaker-1.1.10-rc1)
>>>> that there was work done to fix "crmd: lrmd: stonithd: fixed memory
>>>> leaks", but I'm not sure which particular bug this was related to (and
>>>> those fixes should be in the version I'm running anyway).
>>>>
>>>> I've also spotted a few memory leak fixes in
>>>> https://github.com/beekhof/pacemaker, but I'm not sure whether they
>>>> relate to my issue (assuming I have a memory leak and this isn't
>>>> expected behaviour).
>>>>
>>>> Is there additional debugging that I can perform to check whether I
>>>> have a leak, or is there enough evidence to justify upgrading to 1.1.11?
>>>>
>>>> Thanks in advance
>>>>
>>>> Greg Murphy
>>
>> <lrmd.tgz>
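For reference, the variable block from the earlier message with --show-reachable=yes appended would look something like this (everything except the new flag is exactly as quoted above):

# Same variables as before, with --show-reachable=yes added so the
# 'still reachable' blocks show up in the valgrind report.
G_SLICE=always-malloc
MALLOC_PERTURB_=221   # or 0
MALLOC_CHECK_=3       # or 0,1,2
PCMK_valgrind_enabled=lrmd
VALGRIND_OPTS="--leak-check=full --trace-children=no --num-callers=25 \
  --log-file=/var/lib/pacemaker/valgrind-%p \
  --suppressions=/usr/share/pacemaker/tests/valgrind-pcmk.suppressions \
  --gen-suppressions=all --show-reachable=yes"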
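And if it's useful to keep tracking the heap growth alongside the valgrind runs, a throwaway loop along these lines (the output directory, interval, and use of pidof are arbitrary choices, adjust to taste) saves dated copies of the status and smaps files for comparison:

#!/bin/sh
# Sketch: snapshot lrmd's memory accounting once a day for later diffing.
PID=$(pidof lrmd)
while true; do
    STAMP=$(date +%Y%m%d)
    cp "/proc/$PID/status" "/var/tmp/lrmd-status-$STAMP"
    cp "/proc/$PID/smaps"  "/var/tmp/lrmd-smaps-$STAMP"
    sleep 86400
done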
_______________________________________________
Pacemaker mailing list: [email protected]
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
