Hi,
I think I have finally been able to figure out what is happening. First, here
are the steps to reproduce it on any kernel (it is not specific to 3.16/3.18 as
I said earlier):
1. On an SSD, create a big journal partition, say > 20 GB or so.
2. Keep all the filestore/journal parameters at their defaults other than the
following; set these values in your conf file. This is just to make sure journal
writes are not throttled and can run far ahead of the backend writes.
filestore_queue_max_ops = 5000000
filestore_queue_max_bytes = 1000000000000
filestore_queue_committing_max_ops = 5000000
filestore_queue_committing_max_bytes = 1000000000000
3. Run any release, say Firefly/Giant/Hammer, and create a single-OSD cluster,
giving the rest of the SSD as the data partition.
4. Run, say, an fio rbd random-write workload with 16K block size, QD 64,
numjobs=8 (a sample job file is below).
5. Run 'dstat -m' and watch how the used memory keeps rising.
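For reference, the fio job I run looks roughly like the following (a sketch;
the pool and image names are placeholders for whatever you created in step 3):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=testimage
    rw=randwrite
    bs=16k
    iodepth=64
    numjobs=8
    time_based=1
    runtime=600

    [rbd-randwrite]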
Now, I saw this behavior with glibc malloc/tcmalloc/jemalloc, as I communicated
earlier. But I didn't wait long enough earlier to see whether the memory comes
down when the IO is stopped :-)
I saw that in the case of tcmalloc it does not come down, *but* in the case of
jemalloc it *does come down* to the old level, in the following way. I didn't go
back to glibc malloc again, but it should be releasing as well:
1. If I stop the IO, journal writes stop, but the backend flash keeps catching
up and the memory comes down accordingly. This is expected: all the transactions
pile up in the workqueue while the journal is way ahead, but the moment they are
processed the transactions are deleted and the memory usage drops.
2. Once the journal is full, it starts throttling and the overall IO rate comes
down. The backend flash now has the opportunity to catch up and thus the memory
is released.
None of the above happens in the case of tcmalloc; there it is not releasing the
memory at all. Digging through some of the documentation, I found that this can
happen and that there is a flag to control the release rate, but I had no luck
after changing that either. I didn't invest much time on that, though.
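For reference, the knob I tried is the release rate, which gperftools exposes as
an environment variable; the command line below is just a sketch of starting the
OSD by hand:

    # Higher values make tcmalloc return freed pages to the OS more
    # aggressively (the default is 1.0); it is read by the process at startup.
    TCMALLOC_RELEASE_RATE=10 ceph-osd -i 0 -c /etc/ceph/ceph.conf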
What I saw the next time I ran IO is that, in the case of tcmalloc, the memory
does not rise at the beginning; probably it is reusing that memory, and it
starts rising again after some time. But I doubt this behavior is good.
So, this could be another argument for removing tcmalloc as Ceph's default
allocator and moving to jemalloc.
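For anyone who wants to repeat the allocator comparison, this is roughly how the
allocator is switched at build time on a hammer-era autotools tree (flag names
from memory, so please verify against './configure --help'):

    # default build links gperftools tcmalloc
    ./autogen.sh && ./configure && make
    # link jemalloc instead of tcmalloc
    ./configure --without-tcmalloc --with-jemalloc && make
    # plain glibc malloc
    ./configure --without-tcmalloc && make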
Thanks, Greg, for asking me to take another look at tcmalloc; otherwise I was
kind of out of options :-)
Regards
Somnath
-----Original Message-----
From: Somnath Roy
Sent: Wednesday, July 01, 2015 4:58 PM
To: 'Gregory Farnum'
Cc: [email protected]
Subject: RE: Probable memory leak in Hammer write path ?
Thanks Greg!
Yeah, I will double check. But I built the code without tcmalloc (with glibc
malloc) and it was also showing similar behavior.
Thanks & Regards
Somnath
-----Original Message-----
From: Gregory Farnum [mailto:[email protected]]
Sent: Wednesday, July 01, 2015 9:07 AM
To: Somnath Roy
Cc: [email protected]
Subject: Re: Probable memory leak in Hammer write path ?
On Mon, Jun 29, 2015 at 4:39 PM, Somnath Roy <[email protected]> wrote:
> Greg,
> Updating to the new kernel updates the gcc version too. The recent kernel
> also changes the tcmalloc version, but 3.16 has the old tcmalloc and is
> still exhibiting the issue.
> Yes, the behavior is very confusing, and the compiler is the main variable I
> could think of from the application perspective.
> If you have a 3.16/3.19 kernel, you can reproduce this by following these
> steps.
>
> 1. Build ceph-hammer code base
>
> 2. Run with single OSD.
>
> 3. Create an image and run an fio rbd workload from a client (say 16K bs,
> 8 numjobs)
>
> 4. Run 'dstat -m' and observe the memory usage.
>
> What I am thinking of doing is to install ceph from ceph.com and see the
> behavior.
In addition to that, I'd look for any known bugs in the tcmalloc version you're
using on the leaky systems, and check the tcmalloc stats to see whether it is
holding a bunch of free memory which hasn't been released to the OS yet.
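(When the OSD is built with tcmalloc, those stats should be reachable through
the heap commands, roughly as below; commands as I recall them on hammer, so
verify on your build, and adjust the OSD id to your single-OSD setup:

    # dump tcmalloc's view of in-use vs. freed-but-unreleased memory
    ceph tell osd.0 heap stats
    # ask tcmalloc to hand freed pages back to the OS
    ceph tell osd.0 heap release
)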
-Greg