from:"Camm Maguire"

Uninitialized stack gaps and conservative garbage collection

2005-05-24 Thread Camm Maguire

Greetings!  GCL is a lisp system that compiles to native object code
via the intermediary of gcc.  It uses a conservative garbage
collection algorithm, which means that it walks the C stack to find
likely objects in use by automatic variables, and holds on to these.
This works quite well in practice.

For very large systems, the likelihood of holding onto objects which
should be collected increases.  In looking into this, it has come to
my attention that regardless of how carefully the C programmer
initializes variables on the stack, gcc will quite commonly allocate
extra space inaccessbile via any C variable.  These 'stack gaps' can
wind up permanently preventing a large portion of user memory from
ever being collected with this algorithm.

Ideally, I'd like to be able to have each C function carefully
initialize a contiguous stack to minimize or eliminate this problem.
Even better would be a gcc switch which would simply zero out each
stack frame at the beginning of each function, though this could be
expensive in terms of performance.  Advice is most appreciated.  I'd
most like the solution to be portable.  I know about
-mpreferred-stack-boundary, but I am unsure if its absense on other
architectures means there is no stack padding there anyway.

Here is my ridiculous kludge which nevertheless works at least here:

void wipe_stack(VOL void *) __attribute__ ((noinline));
void
wipe_stack(VOL void *l) {

#if CSTACK_DIRECTION == -1
   if (l>(void *)&l) bzero((void *)&l,l-(void *)&l);
#else
  l+=sizeof(l);
  if ((void *)&l>l) bzero(l,(void *)&l-l);
#endif

}

object fLfuncall(object fun,...) { 

  va_list ap;
  object *new;
  int i,n = VFUN_NARGS-1;

  if (n>=65) FEerror("arg limit exceeded",0);
  new=ZALLOCA(n*sizeof(*new));

  /* There are 3 unused words on top of esp before and after the
  alloca, as well as extra esp allocation in the alloca itself, even
  though the alloca call is padded to 16 bytes.  So let's wipe the
  crud from the stack... */

  wipe_stack(&n);
  ...

}

Take care,

-- 
Camm Maguire[EMAIL PROTECTED]
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: [Fwd: Uninitialized stack gaps and conservative garbage collection]

2005-05-27 Thread Camm Maguire

Greetings!

"Boehm, Hans" <[EMAIL PROTECTED]> writes:

> The collector also tries to address this with a complicated hack that
> occasionally checks for opportune times to clear sections of the
> stack that are not currently in use.  (See GC_clear_stack()
> in misc.c.)
> 

BTW, where is the official repository of your source these days?

> I think this is currently much more sophisticated in the single-threaded
> case.  It may be possible to enhance this code to deal with the
> problems that were observed.  But it has some inherent limitations.
> It only rarely gets control, and it can't clear areas in frames that
> are always live when it gets control.  It should handle deep recursions
> somewhat acceptably.
> 

I have no problem with accidental junk on the stack, nor values left
over after function return.  The issue to me is that when one pushes
up the stack a second time, one does not wipe out the old values, but
rather they could fall into 'stack holes'.  For GCL, this is most
critical above toplevel, as one never clears this stack again once
entered.  It also appears important in alloca, which appears to 16byte
pad the stack increment regardless of -mprefered-stack-boundary.
Beyond this, I think the best strategy, if any additional strategy is
needed, is to randomly allocate a little zeroed out stack in certain
key functions just to ensure that large loops don't preserve the
location of the stack holes forever.

Given the stack clearing algorithm, my other question is how to
portably implement it.  Here again is my trial function, which appears
quite dangerous.  It assumes that anything above the address of the
first argument passed to a function belongs to the parent stack frame,
for example.

void wipe_stack(VOL void *) __attribute__ ((noinline));
void
wipe_stack(VOL void *l) {

#if CSTACK_DIRECTION == -1
   if (l>(void *)&l) bzero((void *)&l,l-(void *)&l);
#else
  l+=sizeof(l);
  if ((void *)&l>l) bzero(l,(void *)&l-l);
#endif

}

void clear_c_stack(VOL unsigned) __attribute__ ((noinline));
void
clear_c_stack(VOL unsigned n) {

  void *v=OBJNULL;
  alloca(n);
  wipe_stack(&v);

}

void
funcall(object fun)
{ 
object VOL sfirst=NULL;
wipe_stack(&sfirst);
{
object temporary=OBJNULL;
object x=OBJNULL;
object * VOL top=NULL;
object *lex=NULL;
bds_ptr old_bds_top=NULL;
VOL bool b=0;
bool c=0;
DEBUG_AVMA


}}

Raymond Toy writes:
>On the sparc port, this area can be zeroed out with appropriate
>optimization settings.  I ran some tests using Eric Marsden's
>cl-bench.   If the stack is always cleared, the cost of some
>benchmarks go up, but some go down, because the cost of GC is
>decreased.  (The benchmarks include GC time.)

Thanks for the tip -- does this use gcc, and if so, what is the
relevant switch?

Take care,

> Hans
> 
> > -Original Message-
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> > On Behalf Of Bryce McKinlay
> > Sent: Wednesday, May 25, 2005 6:55 AM
> > To: Ranjit Mathew
> > Cc: [EMAIL PROTECTED]
> > Subject: Re: [Fwd: Uninitialized stack gaps and conservative 
> > garbage collection]
> > 
> > 
> > Ranjit Mathew wrote:
> > 
> > >Interesting stuff found on the GCC list. Doesn't this
> > >affect GCJ as well where we use the conservative
> > >Boehm GC?
> > >  
> > >
> > > Original Message 
> > >Subject: Uninitialized stack gaps and conservative garbage collection
> > >Date: Tue, 24 May 2005 17:40:46 -0400
> > >From: Camm Maguire <[EMAIL PROTECTED]>
> > >Newsgroups: gmane.comp.gcc.devel,gmane.lisp.gcl.devel
> > >
> > >Greetings!  GCL is a lisp system that compiles to native object code 
> > >via the intermediary of gcc.  It uses a conservative garbage 
> > collection 
> > >algorithm, which means that it walks the C stack to find 
> > likely objects 
> > >in use by automatic variables, and holds on to these. This 
> > works quite 
> > >well in practice.
> > >
> > >For very large systems, the likelihood of holding onto objects which 
> > >should be collected increases.  In looking into this, it has 
> > come to my 
> > >attention that regardless of how carefully the C programmer 
> > initializes 
> > >variables on the stack, gcc will quite commonly allocate extra space 
> > >inaccessbile via any C variable.  These 'stack gaps' can wind up 
> > >permanently preventing a large portion of user memory from 
> > ever being 
> > >collected with this algorithm.
> > >
> > 
> > 
>

Re: [Fwd: Uninitialized stack gaps and conservative garbage collection]

2005-06-03 Thread Camm Maguire

Greetings!

Raymond Toy <[EMAIL PROTECTED]> writes:

> Camm Maguire wrote:
> > Raymond Toy writes:
> >
> >>On the sparc port, this area can be zeroed out with appropriate
> >>optimization settings.  I ran some tests using Eric Marsden's
> >>cl-bench.   If the stack is always cleared, the cost of some
> >>benchmarks go up, but some go down, because the cost of GC is
> >>decreased.  (The benchmarks include GC time.)
> > Thanks for the tip -- does this use gcc, and if so, what is the
> 
> You know, of course, that CMUCL is a native compiler that doesn't use
> gcc.  I hand-wrote the assembly code to clear the stack area. :-)
> 

:-)  I forgot.  I think I've discovered a way to do the same from
:within C, but I don't like it.  Right now we've addressed most issues
:by pushing the stack mark origin to the frame just above toplevel --
at least a toplevel gc should be able to reclaim all memory.

The stress test here is repeatedly making a list the size of the whole
heap :-).

Take care,

> Ray
> 
> 
> 

-- 
Camm Maguire[EMAIL PROTECTED]
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Incremental gcc

2006-03-25 Thread Camm Maguire

Greetings! GCL is a lisp compiler system which outputs C code normally
compiled by gcc into an object, which is then loaded and relocated
into the running GCL image.  In lisp, compiling is a very incremental
process, with many, often thousands of small functions compiled one at
a time.  GCL/gcc compilation speed would be greatly improved if gcc
could be run in some sort of incremental mode, perhaps piping from
stdin and outputting assemly only on stdout, if there was some way to
instruct gcc to flush intermediary results.  gas, which requires a
seekable bfd device realistically on some sort of file system, could
be run separately.

Is anything like this conceivable?

To Robert Dodier -- yes I agree, filesystem i/o is negligible as it is
all cached anyway on typical modern machines.

To Robert Boyer -- what an idea!  Eventually, the proper way is likely
to write GCL as a gcc frontend, as previously discussed briefly on
both lists.  Now is not the time, however, alas.

Take care,

Robert Boyer <[EMAIL PROTECTED]> writes:

> Let me make the following zany suggestion.  The first time gcl starts gcc, do
> it under gdb.  Give the gcc command two files, not one, with one of them
> being the cmpinclude.h and the other being the ordinary user's gazonk.c; I
> think gcc can work this way on several files to be compiled to one.  Before
> running, in gdb, put in a break point trap for the opening of files; in this
> way, trap the opening of the typical, second user file.  When that file
> opening point is reached after starting gcc under gdb, force feed GDB a
> machine code "fork" instruction.  Keep one of two forked processes alive but
> waiting as the "warm start" version for use later, and use the other fork to
> continue compiling gazonk.c as usual.  When the second fork later starts up,
> force feed it the new correct file and force it to fork itself.  I couldn't
> do any of this myself in 100 years, but maybe it would be possible for
> someone!  Just ignore this message as soon as it sounds insane.
> 
> Bob
> 
> 
> 
> 
> 

-- 
Camm Maguire[EMAIL PROTECTED]
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: Incremental gcc

2006-03-30 Thread Camm Maguire

Greetings, and thanks for your reply!


Mike Stump <[EMAIL PROTECTED]> writes:

> On Mar 25, 2006, at 9:14 PM, Camm Maguire wrote:
> > Greetings! GCL is a lisp compiler system which outputs C code normally
> > compiled by gcc into an object, which is then loaded and relocated
> > into the running GCL image.  In lisp, compiling is a very incremental
> > process, with many, often thousands of small functions compiled one at
> > a time.  GCL/gcc compilation speed would be greatly improved if gcc
> > could be run in some sort of incremental mode
> 
> Ignoring an ideal world for a second, if you have large amounts of  headers 
> that are stable that are needed for compilation, be sure to
> precompile them.  After that, batch up all code to be compiled into  larger 
> unit and compile it all in one go, as it saves a substantial

Thanks!  We've done this just recently to some benefit.

> amount of time (2x faster maybe), this way startup costs are amortized.
> 

This is often impossible to arrange at the compiler level in lisp.

> Longer term, it would be nice to have someone from your camp layout  where 
> the time is spent and what changes might be worth while in gcc  to
> make it more suitable for that type of work.

This would be interesting, how does one benchmark gcc performance in
this way?


> 
> You can run the compiler via stdin and stdout:
> 
> $ gcc -x c - -o - -S
> int i;
> main() { }
> ^D  .section __TEXT,__text,regular,pure_instructions
>  .section  __TEXT,__picsymbolstub1,symbol_stubs,pure_instructions,32
>  .machine ppc
>  .text
>  .align 2
>  .globl _main
> _main:
>  stmw r30,-8(r1)
>  stwu r1,-48(r1)
>  mr r30,r1
>  lwz r1,0(r1)
>  lmw r30,-8(r1)
>  blr
> .comm _i,4
>  .subsections_via_symbols
> 
> if you want, so that isn't very hard to do.  The sub-problem of an as  that 
> can operate this way is off-topic for this list, so I won't

This was the most promising.  If I could run gcc as a pipe with
assembler only output, all I need is a 'flush instruction on stdin to
get the assemly of the function(s) input thus far.  Then the pipe
should be ready for another function definition followed by a flush,
etc.  Is there such?

> address it.

The assembly problem is different, as you state.  Among other reasons,
the output must be seekable as it is bfd based.  Coincidentally, GCL's
binary input is bfd based, so it should be possible to incorporate the
assembler and leave the output in memory.

> 
> Long term, might be nice to have a way to wire in the assemblers into  gcc 
> directly.  If you guys feel ambitious, that might be doable; but
> since gc is driven by people that want to code, you'll have to find  someone 
> that wants to code it up.  The java folks might be interested
> in JITifying the compiler as well, if enough people are interested in  doing 
> that, it might be an interesting direction to take gcc, though
> building in as strikes me as a prerequisite.
> 

Thanks again.  There is interest, but it might be a ways off before
sufficient time becomes available.

Take care,

> 
> 

-- 
Camm Maguire[EMAIL PROTECTED]
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: clear_cache on Alpha architecture not implemented?

2012-05-03 Thread Camm Maguire

Greetings, and thanks foro looking into this!

Witold Baryluk  writes:

> On 05-03 09:19, Camm Maguire wrote:
>> Greetings!  The build succeeded, which, alas, means the failure at
>> 
>> http://buildd.debian-ports.org/status/fetch.php?pkg=axiom&arch=alpha&ver=20120301-3&stamp=1335722507
>> 
>> is again unreproducible, like the kfree_amd64 and ppc examples.  Sigh.
>> Are there known differences in the cpu cache clearing instructions
>> between imago and your machine? 
>> 
>> In any case, can one upload by hand builds to debian-ports like one can
>> into the official repository?
>> 
>> Take care,
>> -- 
>> Camm Maguire c...@maguirefamily.org
>> ==
>> "The earth is but one country, and mankind its citizens."  --  Baha'u'llah
>
> After lots of investigation, it looks that imb (I-cache memory barrier,
> used for flushing instruction cache on single cpu), is not emited or
> called when doing __builtin___clear_cache!
>
> It shouldn't be hard to add, for example using (define_expand
> "clear_cache" ... in gcc/config/alpha/alpha.md in gcc, or in 
> libgcc/config/alpha/linux.h
> as #define CLEAR_INSN_CACHE(beg, end) ...
>
> Either of them should just emit, "call_pal 0x86". It is better than
> "imb" because imb is not implemented in Tru64 assembler, and better than
> "call_pal pal_imb", because pal_imb (which is equal 134), requires
> include .
>
> Beyond that it would be naccassary to update kernel, to detect that user
> code called imb, by checking if proper flag in HWPCB structure is set,
> and call smp_imb() if naccassary on multi-processor system (probably
> when rescheduling this process on different cpu) and clear hwpcb flag,
> becuause imb invalidated Icache only on running cpu.
>
>
> Build of axiom package successed on my machine probably by luck, and by
> the fact it is single CPU machine with smaller cache than imago (and
> older subarchitecture, thus probably less agressive caching).
>
> What alpha maintainers thinks?

Pending a true fix, here is the gcl situation when built on alpha:

checking for main in -lX11... yes
checking for xdr_double... yes
checking __builtin___clear_cache... yes


and the relvant gcl code:

#define PAL_imb 134
#define imb() \
__asm__ __volatile__ ("call_pal %0 #imb" : : "i" (PAL_imb) : "memory")
#define CLEAR_CACHE imb()


#ifdef HAVE_BUILTIN_CLEAR_CACHE
static int
clear_protect_memory(object memory) {

  void *p,*pe;
  int i;

  p=(void *)((unsigned long)memory->cfd.cfd_start & ~(PAGESIZE-1));
  pe=(void *)((unsigned long)(memory->cfd.cfd_start+memory->cfd.cfd_size) & 
~(PAGESIZE-1)) + PAGESIZE-1;

  i=mprotect(p,pe-p,PROT_READ|PROT_WRITE|PROT_EXEC);

  __builtin___clear_cache((void *)memory->cfd.cfd_start,(void 
*)memory->cfd.cfd_start+memory->cfd.cfd_size);

  return i;

}
#endif



#ifdef HAVE_BUILTIN_CLEAR_CACHE
  massert(!clear_protect_memory(memory));
#else
#ifdef CLEAR_CACHE
  CLEAR_CACHE;
#endif
#endif  


The goal was to exercise the very helpful gcc __builtin___clear_cache
support, and to avoid having to maintain our own assembler for all the
different cpus in this regard.  Clearly, it is easy to revert this on a
per architecture basis if absolutely necessary.  If gcc does or does not
plan on fixing this, please let me know so gcl can adjust as needed.

Take care,
-- 
Camm Maguirec...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings, and thanks for this very helpful synopsis.

I'm wondering if there is a simple configure time test to detect when
this has been fixed.  If I just aborted using __builtin___clear_cache if
it is in fact a noop on alpha, ppc, ppc64, and ia64, would this suffice?

Take care,

Richard Henderson  writes:

> On 05/03/2012 10:51 AM, Camm Maguire wrote:
>> The goal was to exercise the very helpful gcc __builtin___clear_cache
>> support, and to avoid having to maintain our own assembler for all the
>> different cpus in this regard.  Clearly, it is easy to revert this on a
>> per architecture basis if absolutely necessary.  If gcc does or does not
>> plan on fixing this, please let me know so gcl can adjust as needed.
>
> While we can probably fix this, you should know that __builtin_clear_cache
> is highly tied to the implementation of trampolines for the target.  Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
>
> For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
> cache flushing for trampolines is inside the __trampoline_setup routine.
>
> For powerpc64 and ia64, the ABI for function calls allows trampolines to
> be implemented without emitting any insns, and thus the icache need not be
> flushed at all.  And thus we never bothered implementing 
> __builtin_clear_cache.
>
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7.  Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.
>

-- 
Camm Maguirec...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings!  As a followup, I should note that sh4 and hppa are also
broken.  I currently have this in configure.in

case $use in
 sh4*) ;; #FIXME
 hppa*) ;; #FIXME
 *) 
 AC_MSG_CHECKING(__builtin___clear_cache)
AC_TRY_COMPILE([],
   [void *v,*ve;
__builtin___clear_cache(v,ve);
],
[AC_DEFINE(HAVE_BUILTIN_CLEAR_CACHE)
 AC_MSG_RESULT(yes)],
 AC_MSG_RESULT(no));;
esac

Perhaps I should just add alpha, powerpc, ppc64, and ia64 to the
exception list and deal with this in the future if memory serves.

Take care,


Richard Henderson  writes:

> On 05/03/2012 10:51 AM, Camm Maguire wrote:
>> The goal was to exercise the very helpful gcc __builtin___clear_cache
>> support, and to avoid having to maintain our own assembler for all the
>> different cpus in this regard.  Clearly, it is easy to revert this on a
>> per architecture basis if absolutely necessary.  If gcc does or does not
>> plan on fixing this, please let me know so gcl can adjust as needed.
>
> While we can probably fix this, you should know that __builtin_clear_cache
> is highly tied to the implementation of trampolines for the target.  Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
>
> For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
> cache flushing for trampolines is inside the __trampoline_setup routine.
>
> For powerpc64 and ia64, the ABI for function calls allows trampolines to
> be implemented without emitting any insns, and thus the icache need not be
> flushed at all.  And thus we never bothered implementing 
> __builtin_clear_cache.
>
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7.  Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.
>
>
>
> r~
>
>
>
>

-- 
Camm Maguirec...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Re: clear_cache on Alpha architecture not implemented?

2012-05-04 Thread Camm Maguire

Greetings!  Last followup:

Nothing here should affect any kfreebsd_amd64 machine, right?  My
understanding is that these do not require cache flushing.  Thus the
failure:

https://buildd.debian.org/status/fetch.php?pkg=acl2&arch=kfreebsd-amd64&ver=4.3-1&stamp=1326315213

mv *saved_acl2.gcl saved_acl2
/usr/bin/make mini-proveall
make[1]: Entering directory 
`/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
Aborted
make[1]: *** [mini-proveall] Error 134
make[1]: Leaving directory 
`/build/buildd-acl2_4.3-1-kfreebsd-amd64-SScGlk/acl2-4.3'
make: *** [debian/mini-proveall.out] Error 2

which has shown up on

  * Host name:fasch.debian.org
  * Architecture:kfreebsd-amd64
  * Distribution:Debian GNU/Linux
  * Sponsor:
  + Greek Research and Technology Network (GRNET) (hosting)
  + Nordic Gaming (hardware)
  * Processor:PowerEdge 1855

and 

  * Host name:fano.debian.org
  * Architecture:kfreebsd-amd64
  * Distribution:Debian GNU/kFreeBSD
  * Sponsor:
  + University of British Columbia - Department of Electrical and Computer 
Engineering (hosting)
  + Hewlett-Packard (hardware)


but is not reproducible on

  * Host name:asdfasdf.debian.net
  * Architecture:kfreebsd-amd64
  * Distribution:Debian GNU/kFreeBSD
  * Sponsor:
  + ETH Zurich - Department of Physics (hosting+hw)
  * Processor:AMD Sempron 3000+ 1600 MHz

is not related?

Thanks so much!

Richard Henderson  writes:

> On 05/03/2012 10:51 AM, Camm Maguire wrote:
>> The goal was to exercise the very helpful gcc __builtin___clear_cache
>> support, and to avoid having to maintain our own assembler for all the
>> different cpus in this regard.  Clearly, it is easy to revert this on a
>> per architecture basis if absolutely necessary.  If gcc does or does not
>> plan on fixing this, please let me know so gcl can adjust as needed.
>
> While we can probably fix this, you should know that __builtin_clear_cache
> is highly tied to the implementation of trampolines for the target.  Thus
> there are at least 3 targets that do not handle this "properly":
>
> For alpha, we emit imb directly during the trampoline_init target hook.
>
> For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
> cache flushing for trampolines is inside the __trampoline_setup routine.
>
> For powerpc64 and ia64, the ABI for function calls allows trampolines to
> be implemented without emitting any insns, and thus the icache need not be
> flushed at all.  And thus we never bothered implementing 
> __builtin_clear_cache.
>
> So, the fact of the matter is that you can't reliably use this builtin for
> arbitrary targets for any gcc version up to 4.7.  Feel free to submit an
> enhancement request via bugzilla so that we can remember to address this
> for gcc 4.8.
>
>
>
> r~
>
>
>
>

-- 
Camm Maguirec...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Uninitialized stack gaps and conservative garbage collection

Re: [Fwd: Uninitialized stack gaps and conservative garbage collection]

Re: [Fwd: Uninitialized stack gaps and conservative garbage collection]

Incremental gcc

Re: Incremental gcc

Re: clear_cache on Alpha architecture not implemented?

Re: clear_cache on Alpha architecture not implemented?

Re: clear_cache on Alpha architecture not implemented?

Re: clear_cache on Alpha architecture not implemented?

9 matches

Site Navigation

Mail list logo

Footer information