Greetings! This is just a discussion post on where things stand. Please feel free to skip whatever you wish, but any feedback is of course helpful.
Bob makes the excellent point that we should design things to make one process run as fast as possible, and worry about other jobs as little as possible. Given my experiments thus far, it looks like this approach might win out in any case. Bob, I hope you are pleased by this :-). That said, attempting to use all of physical ram conflicts openly with running multiple jobs, so something must be done, even if minimal.

The minimal solution is this environment variable: GCL_MEM_MULTIPLE=0.125 will multiply the physical ram seen by each process by this value. So make -j 8 GCL_MEM_MULTIPLE=0.125 is the logical approach, though one might do better by raising the 0.125 somewhat, as all jobs won't use all of that memory anyway. On the plus side, each process decides when to start gc independently. On the minus side, big jobs will bear a larger gc load than they would have to in theory.

The other approach is this environment variable: GCL_MULTIPROCESS_MEMORY_POOL=t, which (only) when set will maintain a shared locked file, /tmp/gcl_pool, containing the summed resident set size of all participating processes, and use that sum as the value to compare against physical ram when deciding we're full enough to start gc. This is working, and one can see (via top) how big jobs are afforded more ram. Paradoxically, it may or may not improve the overall regression time. We'll know more here soon.

There are two environment variables which jointly determine the gc threshold. GCL_GC_PAGE_THRESH (default 0.75) means we will not start gc until the data size is at least 0.75 of physical ram. This could be set to 1.0, and perhaps logically should be, but remember that GCL is constantly calling gcc in a subprocess, which can be a memory hog leading to a swap storm. Alas, at this point I know of no way to manage the memory use of gcc, so this value is a heuristic. GCL_GC_ALLOCATION_THRESH (default 0.125) means we will not gc until we have allocated, since the last gc, one eighth of physical ram.
This is an alternative solution to the problem of rebalancing maxpages, whereby a job could load up on cons for a long time, leave a tiny array allocation, then start allocating arrays when there is no more physical ram to expand into. Recall that the variable si::*optimize-maximum-pages* would attempt to collect gc statistics and rebalance these maxpage limits based on the actual demand. This is OK as a workaround, but it does require that you start collecting statistics before it's 'too late' and you've already allocated most of physical ram. The real problem is that gc cost is proportional to heap size and live heap size, and triggering on an unrelated quantity (suballocation of a given data element size) makes no real sense. Earlier in the 2.6.13 series, we found that simply scaling the maxpages to physical ram at the outset was a big win, but then again, all we had to scale by was the current allocation in the saved image, which makes no real sense either.

So in short: when si::*optimize-maximum-pages* is set, GCL will now ignore maxpage settings as a gc trigger and use the above thresholds instead. When it is unset, GCL will use minimal maxpage expansion via its traditional algorithm and trigger (frequent) gc when these maxpage limits are hit, without any attempt to collect statistics to expand/rebalance them. This latter mode is to be used when preparing a small image to be saved to disk, e.g. at acl2 build time.

My concern is that there appear to be too many variables here. At a minimum, we need a 'small image to be saved to disk' mode, a 'use as much ram as possible for speed' mode, and some mechanism to reduce the ram used when running multiple jobs. But in principle the last three environment variables could be removed and replaced with constants.

Version_2_6_13pre14a is built and installed at ut, and has been undergoing testing since last night. It looks solid so far. Thoughts most appreciated.
Take care,

Robert Boyer <[email protected]> writes:

>> This seems closest in the spirit to sol-gc.
>
> As best I can guess, Acl2 is headed towards not using sol-gc in CCL in
> the 7.1 release of Acl2.
>
> It's not my place to speak, and those who know may say that any problem
> with sol-gc may have been, who really knows, that it was using
> interrupts of the gc and that was too dangerous to do. Interrupts
> should scare the crap out of anyone.
>
> But Sol's main idea I think was to allocate a hell of a lot of memory,
> all of the memory, for the heap to free space after a gc in order to
> keep gc costs as low as possible for this one process. And to hell with
> any other processes except this one.
>
> Camm,
>
> I think that your objective should be for j=1 speed and not j=8 at all.
> The ordinary user almost all of the time is using j=1, and as far as I
> know, only people like Matt regularly use j=8, and that only for
> regression testing before they release a new version of Acl2.
>
> Just my two cents worth. I would certainly go with whatever Matt
> advises, rather than with what I advise.
>
> Bob
>
> On Mon, May 4, 2015 at 11:51 AM, Camm Maguire <[email protected]> wrote:

--
Camm Maguire                                            [email protected]
==========================================================================
"The earth is but one country, and mankind its citizens."  -- Baha'u'llah
