Jeff,

Agreed, this is a well-known problem. It does not work well with some
applications.
We will provide this as an option, and our users will have the freedom to
enable or disable it.

Best,
Pasha

From: Jeff Hammond [mailto:[email protected]]
Sent: Tuesday, October 20, 2015 5:54 PM
To: Shamis, Pavel
Cc: [email protected]; Baker, Matthew B.; Yossi Etigin
Subject: Re: Memory allocation/release hooks

You don't want to use malloc interception if you care about 
portability/interoperability.  For example, see 
http://mailman.cse.ohio-state.edu/pipermail/mvapich-discuss/2014-December/005276.html.

I can't find the archives of [email protected] online (perhaps
I am just stupid), but we recently encountered the same issue with jemalloc
and other MPI implementations.

I understand why it is tempting to intercept malloc/free for MPI purposes,
but the end result is brittle software that forces users to disable all the
optimizations you are trying to enable.

And it is worth noting that the abuse of malloc/free interception by MPI
developers has forced the MADNESS team (or rather me) to completely bypass the
ISO C/C++ language-defined heap routines, to prevent MPI from hijacking them
and breaking our code.

Anyway, this is in no way a statement about jemalloc. The results with
MADNESS indicate that it is the best available allocator (vs. the GNU, TCE,
and TBB mallocs), but we will have to call it explicitly rather than via
symbol interception.
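
For what it is worth, jemalloc already supports exactly this usage: built with
a symbol prefix, it never shadows the libc heap and must be called explicitly.
A minimal sketch, assuming a build configured with --with-jemalloc-prefix=je_
(the buffer size and contents here are illustrative):

/* Build jemalloc so it does not intercept the system malloc/free:
 *   ./configure --with-jemalloc-prefix=je_ && make && make install
 */
#include <stdio.h>
#include <jemalloc/jemalloc.h>  /* declares je_malloc/je_free in a prefixed build */

int main(void) {
    /* Explicit call: the rest of the process keeps using the libc heap. */
    double *buf = je_malloc(1024 * sizeof(*buf));
    if (buf == NULL)
        return 1;
    buf[0] = 3.14;
    printf("%f\n", buf[0]);
    je_free(buf);
    return 0;
}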

Best,

Jeff

On Tue, Oct 20, 2015 at 12:18 PM, Shamis, Pavel
<[email protected]> wrote:
Hi Jeff,

Thanks for the link, seems like a very useful library.

Our goal is a bit different (and very simple/basic).
We are looking for a malloc library that we can use for integration with our
registration cache.
Essentially, we redirect the application's malloc() calls (through LD_PRELOAD
or rpath) to jemalloc, which is hooked up with the cache (just like in HPX);
a sketch of such a shim follows below.
At this stage we don't play with locality.
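
To make this concrete, a minimal sketch of the LD_PRELOAD side of such a
redirect, assuming a jemalloc build configured with
--with-jemalloc-prefix=je_; rcache_insert()/rcache_remove() are hypothetical
placeholders for the registration-cache calls:

/* shim.c: LD_PRELOAD=./shim.so redirects an application's malloc/free to
 * jemalloc and notifies a (hypothetical) registration cache.
 * Build: gcc -shared -fPIC shim.c -o shim.so -ljemalloc
 */
#include <stddef.h>
#include <jemalloc/jemalloc.h>  /* je_malloc/je_free, prefixed build */

void rcache_insert(void *ptr, size_t size);  /* hypothetical cache hooks */
void rcache_remove(void *ptr);

void *malloc(size_t size) {
    void *ptr = je_malloc(size);
    if (ptr != NULL)
        rcache_insert(ptr, size);   /* track for (lazy) registration */
    return ptr;
}

void free(void *ptr) {
    if (ptr != NULL)
        rcache_remove(ptr);         /* deregister/unpin before release */
    je_free(ptr);
}

/* A real shim would also cover calloc, realloc, posix_memalign, etc. */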

Thanks!

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory





On Oct 20, 2015, at 11:31 AM, Jeff Hammond
<[email protected]> wrote:


Hi Pavel,

You may find http://memkind.github.io/memkind/ relevant.  In particular,
sections 2.2 and 2.3 of http://memkind.github.io/memkind/memkind_arch_20150318.pdf
discuss exactly the issues you raise.  We also note that memkind is intended
to support multiple types of memory within a node, such as one might encounter
on a platform like Knights Landing.  You are free to imagine how it might
map to OpenPOWER based upon your superior knowledge of that platform :-)
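
For context, a minimal sketch of memkind's kind-tagged allocation calls,
assuming the memkind_malloc()/memkind_free() API from its man page
(MEMKIND_HBW targets high-bandwidth memory where the platform exposes it; the
fallback logic here is illustrative):

#include <stdio.h>
#include <memkind.h>  /* link with -lmemkind */

int main(void) {
    /* Try high-bandwidth memory first; fall back to ordinary memory. */
    size_t n = 1 << 20;
    double *hot = memkind_malloc(MEMKIND_HBW, n * sizeof(*hot));
    if (hot == NULL)
        hot = memkind_malloc(MEMKIND_DEFAULT, n * sizeof(*hot));
    if (hot == NULL)
        return 1;
    hot[0] = 1.0;
    printf("%f\n", hot[0]);
    /* A NULL kind asks memkind to detect which kind owns the allocation. */
    memkind_free(NULL, hot);
    return 0;
}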

While I recognize that the origins of memkind at Intel may pose a challenge for
some in the OpenPOWER family, it would be tremendously valuable to the
community if it were reused for the MPI and OpenSHMEM projects, rather than the
UCX team trying to implement something new.  As you know, both MPI and
OpenSHMEM should run on a range of platforms, and it doubles the implementation
effort in all relevant projects (MPICH, Open MPI, the OpenSHMEM reference,
etc.) if UCX goes in a different direction.

I would be happy to introduce you to the memkind developers (I am not one of 
them, just someone who helps them understand user/client requirements).

Best,

Jeff


On Thu, Oct 15, 2015 at 8:45 AM, Shamis, Pavel
<[email protected]> wrote:
Dear Jemalloc Community,

We are the developers of the UCX project [1], and as part of this effort
we are looking for a malloc library that supports hooks for chunk
allocation/deallocation and can be used for the following:

(a) Allocation of memory that can be shared transparently between processes on
the same node. For this purpose we would like to mmap memory with MAP_SHARED.
This is very useful for implementing Remote Memory Access (RMA) operations in
MPI-3 one-sided [2] and OpenSHMEM [3] communication libraries: it allows a
remote process to map user-allocated memory and implement RMA operations
through memcpy() (see the first sketch below).

(b) Implementation of memory de-allocation hooks for RDMA hardware (InfiniBand,
RoCE, iWARP, etc.). For optimization purposes we implement a lazy memory
de-registration (memory unpinning) policy, and we use the hook to notify the
communication library about memory release events. On such an event, we clean
up our registration cache and de-register (unpin) the memory on the hardware
(see the second sketch below).
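
To make (a) concrete, a minimal sketch of a node-local shared allocation using
POSIX shared memory; the segment name and error handling are illustrative:

/* Allocate a buffer that other processes on the node can map.
 * Link with -lrt on older glibc.
 */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

void *alloc_shared(const char *name, size_t size) {
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);  /* e.g. "/ucx_seg_0" */
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return MAP_FAILED;
    }
    /* MAP_SHARED: a peer that shm_open()s the same name sees our writes,
     * so it can implement RMA put/get with plain memcpy(). */
    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps the segment alive */
    return ptr;
}

And for (b), the hook boils down to a callback invoked just before memory is
returned to the OS; ibv_dereg_mr() is the standard verbs de-registration call,
while rcache_lookup()/rcache_evict() are hypothetical placeholders for the
registration cache:

#include <stddef.h>
#include <infiniband/verbs.h>

struct ibv_mr *rcache_lookup(void *addr, size_t length);  /* hypothetical */
void rcache_evict(void *addr, size_t length);             /* hypothetical */

/* Called from the allocator's release hook before pages go back to the OS. */
void on_release(void *addr, size_t length) {
    struct ibv_mr *mr = rcache_lookup(addr, length);
    if (mr != NULL) {
        ibv_dereg_mr(mr);            /* unpin the memory on the HCA */
        rcache_evict(addr, length);  /* drop the stale cache entry */
    }
}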

Based on these requirements, we would like to understand the best approach
for integrating this functionality within jemalloc.

Regards,
Pasha & Yossi

[1] OpenUCX: https://github.com/openucx/ucx or http://www.openucx.org/
[2] MPI SPEC: http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
[3] OpenSHMEM SPEC: 
http://bongo.cs.uh.edu/site/sites/default/site_files/openshmem-specification-1.2.pdf








--
Jeff Hammond
[email protected]<mailto:[email protected]>
http://jeffhammond.github.io/




--
Jeff Hammond
[email protected]<mailto:[email protected]>
http://jeffhammond.github.io/