4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

2012-05-03 Thread Mailaripillai, Kannan Jeganathan
Hi,

This is regarding gcc.c-torture/execute/vla-dealloc-1.c failure.

4.7.0 ia64-hp-hpux:   program timed out (time out 300 seconds).
4.7.0 ia64-redhat-linux:  program timed out (time out 300 seconds).
4.7.0 x86_64-suse-linux:  execution completes successfully.

Inserting a printf statement in the loop path makes the executable
complete without any issues.

4.6.3 ia64-hp-hpux:   execution completes successfully.

So it looks like a regression in 4.7.0. Any suggestions as to which check-in
between 4.6.3 and 4.7.0 could have caused this failure?

 gcc.c-torture/execute/vla-dealloc-1.c

#if (__SIZEOF_INT__ <= 2)
#define LIMIT 1
#else
#define LIMIT 100
#endif

void *volatile p;

int
main (void)
{
  int n = 0;
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;
  n++;
  if (n < LIMIT)
    goto lab;
  return 0;
}
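
One way to tell runaway stack growth apart from some other kind of hang is to
record where the VLA lands on each pass through the loop. The sketch below is
not part of the GCC testsuite; the function name vla_address_spread and the
iteration count passed to it are assumptions made for illustration. If the VLA
is released on every backward jump, the spread should stay close to the size
of the largest array; if it is never released, the spread grows with the sum
of all array sizes.

```c
#include <stddef.h>
#include <stdint.h>

void *volatile p;

/* Hypothetical helper: runs the same goto loop as the test case and
   returns the distance in bytes between the lowest and the highest
   address at which the VLA was placed.  */
size_t
vla_address_spread (int iterations)
{
  uintptr_t lo = UINTPTR_MAX, hi = 0;
  int n = 0;
  if (0)
    {
    lab:;
    }
  int x[n % 1000 + 1];
  x[0] = 1;
  x[n % 1000] = 2;
  p = x;
  /* Track the extreme addresses seen so far.  */
  uintptr_t addr = (uintptr_t) x;
  if (addr < lo)
    lo = addr;
  if (addr > hi)
    hi = addr;
  n++;
  if (n < iterations)
    goto lab;
  return (size_t) (hi - lo);
}
```

Calling vla_address_spread (1000) and printing the result on a failing and on
a working target would show whether the stack pointer is being restored on the
jump to lab.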

Regards,
Kannan



RE: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

2012-05-03 Thread Mailaripillai, Kannan Jeganathan
Hi,

The same happens with the following two test cases, which deal with VLA
de-allocation in a branch-back situation:
  1. gcc.c-torture/execute/pr43220.c
  2. gcc.c-torture/execute/20040811-1.c

Regards,
Kannan




Re: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

2012-05-03 Thread Richard Guenther
On Thu, May 3, 2012 at 1:33 PM, Mailaripillai, Kannan Jeganathan
 wrote:
> Hi,
>
> Similarly for the following two test case which deals with VLA
> de-allocation in a branch back situation:
>  1. gcc.c-torture/execute/pr43220.c
>  2. gcc.c-torture/execute/20040811-1.c

Can you try --param large-stack-frame=1?  Is sizeof (int) <= 2?



Re: Porting new target architecture to GCC

2012-05-03 Thread Ben Morgan

Thank you all very much for your comments and advice!

It certainly has helped me to gain a better perspective on porting GCC
to a new architecture.

It's not directly a suggestion from my "Betreuer"; a few suggested it
as a Praktikum or as a masters thesis, and I wanted to look into it and
see what was possible.

Ben


RE: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

2012-05-03 Thread Mailaripillai, Kannan Jeganathan

> Can you try --param large-stack-frame=1?  
No change. The program did not return/exit for more than 12 minutes.

> Is sizeof (int) <= 2?
No.

Regards,
Kannan



Re: Porting new target architecture to GCC

2012-05-03 Thread Ben Morgan

On 02/05/12 16:36, Ian Lance Taylor wrote:

> I don't know what a bachelor thesis is, so I don't know if this would be
> suitable.  A GCC port by itself would be too simple for a masters thesis
> in the U.S.


Yes, on second thought, the Betreuer I mentioned did not say anything 
about a masters thesis; I just recall seeing that someone at some 
university had done a port for his masters thesis (albeit with GCC 2.9).


Re: Paradoxical subreg reload issue

2012-05-03 Thread Aurelien Buhrig
02/05/2012 21:36, Eric Botcazou :
>> I have an issue (gcc 4.6.3, private backend) when reloading operands of
>> this insn:
>> (set (subreg:SI (reg:QI 21 [ iftmp.1 ]) 0)
>>  (lshiftrt:SI (reg/v:SI 24 [ w ]) (const_int 31 [0x1f])))
>>
>> The register 21 is reloaded into
>> (reg:QI 0 r0 [orig:21 iftmp.1 ] [21]), which is a HI-wide hw register.
>> Since it is a BIG_ENDIAN target, the SI subreg regno is then -1.
>>
>> Note that word_mode is SImode, whereas the class r0 belongs to is
>> HI-wide. I don't know if this matters when reloading.
>>
>> I have no idea how to debug this, if it is a backend or a reload bug.
> 
> RA/reload is known to have issues with word-mode paradoxical subregs on 
> big-endian machines.  For example, on SPARC 64-bit, we run into similar 
> problems for FP regs, which are 32-bit.  Likewise on HP-PA 64-bit I think.
> 
> So we have kludges in the back-end:
> 
> /* Defines invalid mode changes.  Borrowed from the PA port.
> 
>    SImode loads to floating-point registers are not zero-extended.
>    The definition for LOAD_EXTEND_OP specifies that integer loads
>    narrower than BITS_PER_WORD will be zero-extended.  As a result,
>    we inhibit changes from SImode unless they are to a mode that is
>    identical in size.
> 
>    Likewise for SFmode, since word-mode paradoxical subregs are
>    problematic on big-endian architectures.  */
> 
> #define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS) \
>   (TARGET_ARCH64                                  \
>    && GET_MODE_SIZE (FROM) == 4                   \
>    && GET_MODE_SIZE (TO) != 4                     \
>    ? reg_classes_intersect_p (CLASS, FP_REGS) : 0)
> 


I modified CANNOT_CHANGE_MODE_CLASS as you suggested. Strangely, it has
no effect on this reload, and I can't find a way to make it work...

BTW, has this bug already been filed?
Do you have an idea how deep this bug sits in reload, how complex it
is, and which part of reload it is related to?

Thanks,
Aurélien


Re: 4.7.0 regression? gcc.c-torture/execute/vla-dealloc-1.c failure.

2012-05-03 Thread Richard Guenther
On Thu, May 3, 2012 at 2:02 PM, Mailaripillai, Kannan Jeganathan
 wrote:
>
>> Can you try --param large-stack-frame=1?
> No change. The program did not return/exit for more than 12 minutes.
>
>> Is sizeof (int) <= 2?
> No.

I suppose you should open a bug report then (or check whether one already
exists for these failures).  I have no idea what could have caused them, but
ia64 is basically unmaintained.

Richard.



-Os seems to remove a variable necessary for my transformation

2012-05-03 Thread Matt Davis
I have been fighting with a simple problem for the past few nights.  In my
GIMPLE pass I create a variable, an ssa name for that variable, and then assign
it to the LHS of a call I build:

  call = gimple_build_call(fndecl, 0);
  decl = create_tmp_var(TREE_TYPE(TREE_TYPE(test_decode_fndecl)), "FOO");
  ssa  = make_ssa_name(decl, call);
  gimple_set_lhs(call, ssa);
  gsi_insert_before(&gsi, call, GSI_SAME_STMT);

I verify the GIMPLE and it looks proper:

  char * FOO.2;
  /* some gimple code */
  FOO.2_8 = fn();
  strcat(&mylocal, FOO.2_8);

However, when I build with -Os the GIMPLE looks like:
  char * FOO.2;
  /* some gimple code */
  fn();
  strcat(&mylocal, FOO.2_8);

And the compiler segfaults in "ptr_deref_may_alias_decl_p" for 'ptr' FOO.2_8 and
'decl' mylocal arguments.  The segfault occurs because TREE_TYPE(ptr) is NULL
and calling POINTER_TYPE_P(TREE_TYPE(ptr)) dereferences a NULL.  

After more debugging, it is eliminate_unnecessary_stmts() which tries to remove
the 'FOO.2_8 = fn();' statement.  In that routine, the LHS is considered "dead"
and because of that the routine sets the LHS to NULL.

So, I guess my question is, how can I force this stmt to hang around?  I looked
at eliminate_unnecessary_stmts, and do not see any specific flags I can set to
the stmt to make 'em hang around, and I do not know what to do to make LHS
appear not "dead."  Even if I set 'ssa' TREE_USED and 'decl' as DECL_PRESERVE_P.

Thanks for any information!

-Matt


Re: -Os seems to remove a variable necessary for my transformation

2012-05-03 Thread Richard Guenther
On Thu, May 3, 2012 at 3:30 PM, Matt Davis  wrote:
> I have been fighting with a simple problem for the past few nights.  In my
> GIMPLE pass I create a variable, an ssa name for that variable, and then
> assign it to the LHS of a call I build:
>
>  call = gimple_build_call(fndecl, 0);
>  decl = create_tmp_var(TREE_TYPE(TREE_TYPE(test_decode_fndecl)), "FOO");
>  ssa  = make_ssa_name(decl, call);
>  gimple_set_lhs(call, ssa);
>  gsi_insert_before(&gsi, call, GSI_SAME_STMT);
>
> I verify the GIMPLE and it looks proper:
>
>  char * FOO.2;
>  /* some gimple code */
>  FOO.2_8 = fn();
>  strcat(&mylocal, FOO.2_8);

You probably did not call update_stmt on the use stmt.  You can check
in the debugger with debug_immediate_uses (), which will likely print no
use for FOO.2_8.

>
> However, when I build with -Os the GIMPLE looks like:
>  char * FOO.2;
>  /* some gimple code */
>  fn();
>  strcat(&mylocal, FOO.2_8);
>
> And the compiler segfaults in "ptr_deref_may_alias_decl_p" for 'ptr' FOO.2_8
> and 'decl' mylocal arguments.  The segfault occurs because TREE_TYPE(ptr) is
> NULL and calling POINTER_TYPE_P(TREE_TYPE(ptr)) dereferences a NULL.

That's what happens when an SSA name is released.

> After more debugging, it is eliminate_unnecessary_stmts() which tries to
> remove the 'FOO.2_8 = fn();' statement.  In that routine, the LHS is
> considered "dead" and because of that the routine sets the LHS to NULL.

It's dead if it has no uses.

> So, I guess my question is, how can I force this stmt to hang around?  I
> looked at eliminate_unnecessary_stmts, and do not see any specific flags I
> can set to the stmt to make 'em hang around, and I do not know what to do to
> make LHS appear not "dead."  Even if I set 'ssa' TREE_USED and 'decl' as
> DECL_PRESERVE_P.

Well, that's not how SSA uses/defs work.

> Thanks for any information!
>
> -Matt
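
The point made in the replies above (an LHS is dead if it has no registered
uses, and a use is only registered once the using statement has been updated)
can be illustrated with a toy model in plain C. This is not GCC code: every
name prefixed with toy_ is hypothetical, and the real bookkeeping in GCC is
considerably more involved. It only shows why a textual reference to FOO.2_8
in the strcat call is not enough if the use was never recorded.

```c
#include <stdbool.h>

#define TOY_MAX_NAMES 16

/* One statement: it may define one SSA name and may read one.  */
struct toy_stmt
{
  int lhs;   /* SSA name defined here, or -1 for none */
  int use;   /* SSA name read here, or -1 for none */
};

/* Registered use counts, indexed by SSA name.  A use only counts once
   the using statement has been passed to toy_update_stmt.  */
static int toy_num_uses[TOY_MAX_NAMES];

/* Stand-in for rescanning a statement's operands after it was built
   or modified.  */
void
toy_update_stmt (const struct toy_stmt *s)
{
  if (s->use >= 0)
    toy_num_uses[s->use]++;
}

/* Stand-in for the DCE step: the call itself is kept for its side
   effects, but an LHS with no registered uses is dropped.  Returns
   true if the LHS was removed.  */
bool
toy_drop_dead_lhs (struct toy_stmt *s)
{
  if (s->lhs >= 0 && toy_num_uses[s->lhs] == 0)
    {
      s->lhs = -1;
      return true;
    }
  return false;
}
```

In this model, a definition of name 8 loses its LHS when toy_drop_dead_lhs
runs before the statement that reads name 8 has gone through toy_update_stmt,
and keeps it afterwards.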


Re: Missing compile-time warning for orphaned memory

2012-05-03 Thread Samuel David
Hi gcc team,

The following code creates orphaned memory by ignoring the return value of new:

#include <iostream>
using namespace std;

int main ( void ) {
    for (int i=0; i< 1000; i++)
        new int[1000];
    int a;
    cin >> a;

    return 0;
}

Should g++ report a compile-time warning for this case?

root@quant:/tmp# g++ -Wall -Wextra MemoryLeakCheckCompilerWarning.cpp
-o MemoryLeakCheckCompilerWarning.exe
root@quant:/tmp#

Warm regards,
-S


clear_cache on Alpha architecture not implemented?

2012-05-03 Thread Witold Baryluk
On 05-03 09:19, Camm Maguire wrote:
> Greetings!  The build succeeded, which, alas, means the failure at
> 
> http://buildd.debian-ports.org/status/fetch.php?pkg=axiom&arch=alpha&ver=20120301-3&stamp=1335722507
> 
> is again unreproducible, like the kfree_amd64 and ppc examples.  Sigh.
> Are there known differences in the cpu cache clearing instructions
> between imago and your machine? 
> 
> In any case, can one upload by hand builds to debian-ports like one can
> into the official repository?
> 
> Take care,
> -- 
> Camm Maguire  c...@maguirefamily.org
> ==
> "The earth is but one country, and mankind its citizens."  --  Baha'u'llah

After a lot of investigation, it looks like imb (I-cache memory barrier,
used for flushing the instruction cache on a single CPU) is not emitted or
called when using __builtin___clear_cache!

It shouldn't be hard to add, for example using (define_expand
"clear_cache" ... in gcc/config/alpha/alpha.md in gcc, or in
libgcc/config/alpha/linux.h as #define CLEAR_INSN_CACHE(beg, end) ...

Either of them should just emit "call_pal 0x86". It is better than
"imb" because imb is not implemented in the Tru64 assembler, and better than
"call_pal pal_imb", because pal_imb (which is equal to 134) requires
include .
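
A minimal sketch of the macro variant described above, written as ordinary
user code rather than as a libgcc patch. The names SKETCH_CLEAR_INSN_CACHE and
sketch_flush_after_write are hypothetical, and the non-alpha branch is only a
placeholder so that the file compiles on other targets; it is a compiler
barrier and flushes nothing. The alpha branch assumes the assembler accepts
the numeric PAL code exactly as written in the text.

```c
/* PAL function code 0x86 (decimal 134) is imb, as stated above.  */
#if defined(__alpha__)
# define SKETCH_CLEAR_INSN_CACHE(beg, end) \
  __asm__ __volatile__ ("call_pal 0x86" : : : "memory")
#else
/* Placeholder for other targets: a compiler barrier only.  */
# define SKETCH_CLEAR_INSN_CACHE(beg, end) \
  __asm__ __volatile__ ("" : : : "memory")
#endif

/* Hypothetical caller: flush after code was written into [beg, end).
   Returns 0 on success, -1 if the range is not usable.  */
int
sketch_flush_after_write (char *beg, char *end)
{
  if (beg == 0 || end < beg)
    return -1;
  SKETCH_CLEAR_INSN_CACHE (beg, end);
  return 0;
}
```

Note that imb takes no address range, so the beg and end arguments are unused
on alpha; they are kept only to match the CLEAR_INSN_CACHE(beg, end) shape.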

Beyond that, it would be necessary to update the kernel to detect that user
code called imb, by checking whether the proper flag in the HWPCB structure
is set, and to call smp_imb() if necessary on a multi-processor system
(probably when rescheduling the process on a different CPU) and clear the
HWPCB flag, because imb invalidates the Icache only on the running CPU.


The build of the axiom package probably succeeded on my machine by luck, and
because it is a single-CPU machine with a smaller cache than imago (and an
older subarchitecture, thus probably less aggressive caching).

What do the alpha maintainers think?

Regards,
Witek

-- 
Witold Baryluk


Re: Missing compile-time warning for orphaned memory

2012-05-03 Thread Jonathan Wakely
On 3 May 2012 17:16, Samuel David wrote:
> Hi gcc team,
>
> The following code creates orphaned memory by ignoring the return value of
> new:

[...]

> Should g++ report a compile-time warning for this case?

Maybe.  If you think so then please open a bugzilla report asking for
a new warning, thanks.


Re: Missing compile-time warning for orphaned memory

2012-05-03 Thread Samuel David
On Thu, May 3, 2012 at 1:29 PM, Jonathan Wakely  wrote:
>
> Maybe.  If you think so then please open a bugzilla report asking for
> a new warning, thanks.

Done! http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53215

With sincere regards,
-S


Re: clear_cache on Alpha architecture not implemented?

2012-05-03 Thread Camm Maguire
Greetings, and thanks for looking into this!


Pending a true fix, here is the gcl situation when built on alpha:

checking for main in -lX11... yes
checking for xdr_double... yes
checking __builtin___clear_cache... yes


and the relevant gcl code:

#define PAL_imb 134
#define imb() \
__asm__ __volatile__ ("call_pal %0 #imb" : : "i" (PAL_imb) : "memory")
#define CLEAR_CACHE imb()


#ifdef HAVE_BUILTIN_CLEAR_CACHE
static int
clear_protect_memory(object memory) {

  void *p,*pe;
  int i;

  p=(void *)((unsigned long)memory->cfd.cfd_start & ~(PAGESIZE-1));
  pe=(void *)((unsigned long)(memory->cfd.cfd_start+memory->cfd.cfd_size)
              & ~(PAGESIZE-1)) + PAGESIZE-1;

  i=mprotect(p,pe-p,PROT_READ|PROT_WRITE|PROT_EXEC);

  __builtin___clear_cache((void *)memory->cfd.cfd_start,
                          (void *)memory->cfd.cfd_start+memory->cfd.cfd_size);

  return i;

}
#endif



#ifdef HAVE_BUILTIN_CLEAR_CACHE
  massert(!clear_protect_memory(memory));
#else
#ifdef CLEAR_CACHE
  CLEAR_CACHE;
#endif
#endif  


The goal was to exercise the very helpful gcc __builtin___clear_cache
support, and to avoid having to maintain our own assembler for all the
different cpus in this regard.  Clearly, it is easy to revert this on a
per architecture basis if absolutely necessary.  If gcc does or does not
plan on fixing this, please let me know so gcl can adjust as needed.

Take care,
-- 
Camm Maguire  c...@maguirefamily.org
==
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah


Re: clear_cache on Alpha architecture not implemented?

2012-05-03 Thread Richard Henderson

On 05/03/2012 10:51 AM, Camm Maguire wrote:

> The goal was to exercise the very helpful gcc __builtin___clear_cache
> support, and to avoid having to maintain our own assembler for all the
> different cpus in this regard.  Clearly, it is easy to revert this on a
> per architecture basis if absolutely necessary.  If gcc does or does not
> plan on fixing this, please let me know so gcl can adjust as needed.


While we can probably fix this, you should know that __builtin___clear_cache
is highly tied to the implementation of trampolines for the target.  Thus
there are at least 3 targets that do not handle this "properly":

For alpha, we emit imb directly during the trampoline_init target hook.

For powerpc32, the libgcc routine __clear_cache is unimplemented, but the
cache flushing for trampolines is inside the __trampoline_setup routine.

For powerpc64 and ia64, the ABI for function calls allows trampolines to
be implemented without emitting any insns, and thus the icache need not be
flushed at all.  And thus we never bothered implementing
__builtin___clear_cache.

So, the fact of the matter is that you can't reliably use this builtin for
arbitrary targets for any gcc version up to 4.7.  Feel free to submit an
enhancement request via bugzilla so that we can remember to address this
for gcc 4.8.



r~


GNU Tools Cauldron 2012 - Hotels and registered presentations

2012-05-03 Thread Diego Novillo

An update on the GNU Tools Cauldron (http://gcc.gnu.org/wiki/cauldron2012)

We have published an initial schedule for the workshop.  It is
available at http://gcc.gnu.org/wiki/cauldron2012.

Presenters, please double-check your entries.  If you find
anything missing or wrong, please contact me and I will correct
it.


Thank you.


Re: clear_cache on Alpha architecture not implemented?

2012-05-03 Thread Witold Baryluk

__builtin___clear_cache was introduced in gcc 4.3.0 (March 2008), so
I understand how alpha could have been omitted.

I believe that on alpha, trampoline init just emits imb directly using inline
asm. Implementing the necessary part in clear_cache should not break it,
and will actually make it possible to simplify it in the future.

Also, the kernel-side improvement should not break trampoline init. Right
now it is more a matter of luck that it works on multi-CPU systems. The
Alpha ARM says that an Alpha implementation does not need to guarantee
that imb will invalidate the Icache on other CPUs. It currently works maybe
because the actual implementation does invalidate the Icache even on
multi-CPU systems, or because trampoline init happens very early,
definitely before any threads other than main are started, so with high
probability the process will not be migrated to another physical CPU.

I cannot find any actual code in the kernel for handling invalidation of the
Icache from userspace, so I currently assume it is not implemented. But it
should be.

Generally, Icache invalidation on alpha is IMHO badly designed, because
any user can invalidate the whole Icache of the system (including other
users'/processes' code and kernel code), thus decreasing performance
considerably. Many other architectures take an explicit memory range
for such an operation, and ownership of this memory region is checked (by
hardware or kernel). I understand this is probably an artificial problem now
(due to the small importance of the alpha arch nowadays), but there are
possible workarounds for handling such misbehaving processes. Anyway, a
process can still invalidate the Icache without kernel help, by just starting
multiple threads, pinning them to all processors and doing imb in userspace.
Despite it being an unprivileged userspace PAL_CALL, there is probably some
way to trap the imb call (maybe even without patching the actual PALcode) and
handle it in kernel space?



The problem with __builtin___clear_cache is that it defaults to a no-op,
and code like axiom, which checks whether __builtin___clear_cache is
present, assumes that presence equals support. I think
__builtin___clear_cache should at least default to a compile-time warning
about its unimplemented status. (Architectures which don't need
cache invalidation, like x86 or amd64, would not emit such a warning; if any
other architecture needs that too, it can always just make sure
CLEAR_INSN_CACHE is defined as a constant macro.) If you do not want to
change the defaults, then it is probably better to make sure that use of
__builtin___clear_cache on architectures like Itanium, PowerPC or
Alpha actually produces an error at compile time.
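
Until the compiler itself warns or errors, a project can approximate the
proposed behaviour on its own side. The sketch below is an assumption-laden
illustration, not GCC behaviour: the target list is simply the set of
architectures named in this thread, and PROJECT_CLEAR_CACHE and
project_install_code are hypothetical names. On the listed targets the build
stops unless the project supplies its own flush macro; elsewhere the builtin
is used.

```c
#include <stddef.h>
#include <string.h>

/* Targets named in this thread as not implementing the builtin.  The
   list is taken from the discussion, not from GCC documentation.  */
#if defined(__alpha__) || defined(__ia64__) \
    || defined(__powerpc__) || defined(__powerpc64__)
# ifndef PROJECT_CLEAR_CACHE   /* hypothetical project-supplied fallback */
#  error "__builtin___clear_cache may be a no-op here; define PROJECT_CLEAR_CACHE"
# endif
#else
# define PROJECT_CLEAR_CACHE(beg, end) \
  __builtin___clear_cache ((char *) (beg), (char *) (end))
#endif

/* Copies LEN bytes of generated code into DST and flushes that range.
   Returns DST.  */
void *
project_install_code (void *dst, const void *src, size_t len)
{
  memcpy (dst, src, len);
  PROJECT_CLEAR_CACHE (dst, (char *) dst + len);
  return dst;
}
```

The design choice here is to fail loudly at build time instead of relying on a
configure check that only proves the builtin's name is accepted.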



As for the breakage itself, it is of course better to add support for
__builtin___clear_cache (especially since it is not hard to), but there will
always be new archs, and changing the defaults will be better for future
archs and for those which are less maintained. It took me a few hours to find
the problem in axiom and gcc. Stopping compilation with an error on such
architectures would be the best thing.

This will make __builtin___clear_cache reliable for clearing the
cache, and will make it possible to clean up the code of programs similar to
gcl (mainly various compilers, JITs and interpreters). Isn't this what a
compiler is for? To abstract machine-related differences, like cache
handling or vector manipulations? Similarly, atomic instructions should be
available in gcc in an abstract way. Currently, software which for example
adds two 64-bit numbers atomically in memory needs to steal code from
the kernel or other libraries. This is the compiler's job, and should sit in
the compiler as buil

gcc-4.5-20120503 is now available

2012-05-03 Thread gccadmin
Snapshot gcc-4.5-20120503 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20120503/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 187123

You'll find:

 gcc-4.5-20120503.tar.bz2 Complete GCC

  MD5=5e03e53436893d24f7bd29ce3dd3f7c4
  SHA1=e60271c72fa3d4684078f92b388ee73d0f67240a

Diffs from 4.5-20120426 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.