Re: [RFC] Slightly fix up vgather* patterns

2011-10-10 Thread Jakub Jelinek
On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote:
> About memory - can't we use (mem:BLK (match_operand:P
> "register_operand" "r")) here?

I don't think it is sufficient.
Consider e.g. _mm_i32gather_pd (NULL, index, 1); where index
is initialized from loading consecutive (32-bit) double * pointers from an
array.  Then it loads for elt 0 through 1 *(double *)(0 + index[elt]).
Describing this as mem:BLK (register initialized to 0) is wrong.
But even with non-zero base, say if base is a pointer pointing into
a middle of some array and some offsets are positive and some negative
using mem:BLK of the base would just mean non-negative offsets from it.

OT, seems avx2intrin.h is weird for many of the gather patterns:
E.g. the _mm_i32gather_pd inline uses:
  __v2df src = _mm_setzero_pd ();
  __v2df mask = _mm_cmpeq_pd (src, src);
which will work and set mask to all ones floating point vector, but
e.g. _mm256_i32gather_pd uses
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd((double)(long long int) -1);
which I believe will create a { -1.0, -1.0, -1.0, -1.0 }; vector.
Either it could be
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_cmp_pd (src, src, _CMP_EQ_OQ);
or it would need to be something like
#define __MM_ALL_ONES_DOUBLE \
  (__extension__ ((union { long long int __l; double __d; }) { __l: -1 }).__d)
  __v4df src = _mm256_setzero_pd ();
  __v4df mask = _mm256_set1_pd (__MM_ALL_ONES_DOUBLE);

Though, only the most significant bit of the mask is used by the instruction
and thus perhaps -1.0 is useful too.  Though, it is certainly more
expensive than the _mm256_cmp_pd alternative (needs to be loaded from
memory).  BTW, the expander probably needs some help to emit code for
the second case for the third case, it loads it from memory too.

> BTW: No need to use %c modifier:
> 
> /* Meaning of CODE:
>L,W,B,Q,S,T -- print the opcode suffix for specified size of operand.
>C -- print opcode suffix for set/cmov insn.
>c -- like C, but print reversed condition
>...
> */

Ok.

Jakub


Re: [Patch, Fortran] Fix PR 50564

2011-10-10 Thread Tobias Burnus
Hi Thomas,

> the attached patch fixes the PR by removing common function elimination
> in FORALL statements.

I think that part is OK. However, you do more - you also avoid the handling
for DO CONCURRENT.

However, I do not see why it could fail for DO CONCURRENT or rather: I do
not see a case where it would be possible to fail for DO CONCURRENT but
still work for a "normal" DO loop.

In case of the FORALL "assignment", the RHS needs to be evaluated before
the LHS and thus the compiler needs to insert temporaries if needed. That
will potentially also lead to wrong code if there is an interdependence
and one adds some other temporary variable. Additionally, essentially only
assignment is allowed in the FORALL body. The idea was to allow parallel
execution but due to the requirement to evalute first the RHS, parallel
executions are in the general case not possible and the compiler needs to
insert temporaries if it is unsure about the interdependence.

That different to the DO CONTIGUOUS construct. Here, one has essentially
a normal loop, where one may, for instance, allocate variables, do I/O etc.
The only thing *the programmer* guarantees is that the result remains the
same independent which order the loop is executed. (There are a few
constrains to avoid some obvious constraint violations.) Hence, the
compiler could parallelize the loop without doing further analysis.
(That's currently not done - for the ME it is a normal loop. One could
either give hints to the ME - for which one either needs to code something
or wait/port some patch from the CILK++ branch. Or one could insert OMP
statements.)

However, in terms of the front-end optimization, I do not see how that's
different to a normal loop: Also a normal loop can be parallelized using
OpenMP or different means and thus run out of order and the statements
which are allowed in a loop are essentially the same.

In conclusion: I am fine with the FORALL part, but not with the
DO CONCURRENT part.

Tobias

--- frontend-passes.c   (Revision 179709)
+++ frontend-passes.c
...
case EXEC_FORALL:
case EXEC_DO_CONCURRENT:
  {
gfc_forall_iterator *fa;
for (fa = co->ext.forall_iterator; fa; fa = fa->next)
  {
WALK_SUBEXPR (fa->var);
WALK_SUBEXPR (fa->start);
WALK_SUBEXPR (fa->end);
WALK_SUBEXPR (fa->stride);
  }
+   forall_level ++;
break;
  }




Re: [wwwdocs] add libstdc++/1773 change to gcc-4.7/changes.html

2011-10-10 Thread Jonathan Wakely
On 10 October 2011 02:10, Gerald Pfeifer wrote:
> On Tue, 4 Oct 2011, Jonathan Wakely wrote:
>> I've committed this, which documents the fix for
>> http://gcc.gnu.org/PR1773 in gcc-4.7/changes.html, and also replaces
>> some > characters with the > entity.
>
> Interesting that the latter was not caught by the validator?

I think it might be valid (at least xmllint says so in --html mode) so
that part was probably unnecessary.


[Backport 4.6,AVR,Comitted] Fix PR50652

2011-10-10 Thread Georg-Johann Lay
Backported to 4.6 branch:

http://gcc.gnu.org/viewcvs?view=revision&revision=179738

Johann

PR target/50652
Backport from Mainline r179737.
* config/avr/avr-devices.c (avr_mcu_types): Set
.data_section_start of atmega164a to 0x100.



Re: Avoid double mangling at WHOPR

2011-10-10 Thread Richard Guenther
On Sun, 9 Oct 2011, Jan Hubicka wrote:

> Hi,
> whopr currently produce local_static.1234.43124 type symbols. This is because
> everything gets mangled at WPA time and then again at ltrans time.  This 
> simply
> avoids the second mangling. This save some space & makes WHOPR/non_WHOPR 
> symbol
> tables comparable more directly.
> 
> Bootstrapped/regtested x86_64-linux, also tested with Mozilla LTO, OK?

Hmmm.  I'd really like to defer mangling to LTRANS at one point, as
we will know if there will be conflicts of local symbols at that point
(another point would be the partitioning code).  So this patch goes
backward ... (ideally a first step would move it to symbol resolution
time, another place where we can check for conflicts, and then only
apply it a LTRANs stage).

Richard.

> Honza
> 
>   * lto.c (lto_register_var_decl_in_symtab,
>   lto_register_function_decl_in_symtab): Do not mangle at ltrans time.
>   * lto-lang.c (lto_set_decl_assembler_name): Likewise.
> Index: lto/lto.c
> ===
> --- lto/lto.c (revision 179664)
> +++ lto/lto.c (working copy)
> @@ -604,7 +604,7 @@ lto_register_var_decl_in_symtab (struct
>  
>/* Variable has file scope, not local. Need to ensure static variables
>   between different files don't clash unexpectedly.  */
> -  if (!TREE_PUBLIC (decl)
> +  if (!TREE_PUBLIC (decl) && !flag_ltrans
>&& !((context = decl_function_context (decl))
>  && auto_var_in_fn_p (decl, context)))
>  {
> @@ -646,7 +646,7 @@ lto_register_function_decl_in_symtab (st
>  {
>/* Need to ensure static entities between different files
>   don't clash unexpectedly.  */
> -  if (!TREE_PUBLIC (decl))
> +  if (!TREE_PUBLIC (decl) && !flag_ltrans)
>  {
>/* We must not use the DECL_ASSEMBLER_NAME macro here, as it
>may set the assembler name where it was previously empty.  */
> Index: lto/lto-lang.c
> ===
> --- lto/lto-lang.c(revision 179664)
> +++ lto/lto-lang.c(working copy)
> @@ -954,7 +954,7 @@ lto_set_decl_assembler_name (tree decl)
>   TREE_PUBLIC, to avoid conflicts between individual files.  */
>tree id;
>  
> -  if (TREE_PUBLIC (decl))
> +  if (TREE_PUBLIC (decl) || flag_ltrans)
>  id = targetm.mangle_decl_assembler_name (decl, DECL_NAME (decl));
>else
>  {
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

[PATCH] Mark static const strings as read-only.

2011-10-10 Thread Tom de Vries
Eric,

without this patch, the mips code for the memcpy test-case in the patch looks
like this. The loadbytes (lbu) and the storebytes (sb) are considered to alias,
so the scheduler cannot rearrange them in a more optimal order.
...
lui $2,%hi($LC0)
lbu $3,%lo($LC0)($2)
addiu   $2,$2,%lo($LC0)
sb  $3,0($4)
lbu $3,1($2)
nop
sb  $3,1($4)
lbu $2,2($2)
j   $31
sb  $2,2($4)
...

With the patch, the loadbytes and the storebytes are considered not to alias
because the static const string src argument is marked as read-only, and the
scheduler rearranges them.
...
lui $3,%hi($LC0)
addiu   $2,$3,%lo($LC0)
lbu $5,%lo($LC0)($3)
lbu $3,1($2)
lbu $2,2($2)
sb  $5,0($4)
sb  $3,1($4)
j   $31
sb  $2,2($4)
...

The patch makes sure that the src_mem computed here in expand_builtin_memcpy is
marked as MEM_READONLY_P:
...
  src_mem = get_memory_rtx (src, len);
...

build and reg-tested on i686, arm, and mips.

OK for trunk?

Thanks,
- Tom

2011-10-10  Tom de Vries  

* gcc/emit-rtl.c (set_mem_attributes_minus_bitpos): Set MEM_READONLY_P
for static const strings.

* gcc/testsuite/gcc.dg/memcpy-3.c: New test.


Index: gcc/emit-rtl.c
===
--- gcc/emit-rtl.c (revision 179662)
+++ gcc/emit-rtl.c (working copy)
@@ -1696,6 +1696,11 @@ set_mem_attributes_minus_bitpos (rtx ref
 	  && !TREE_THIS_VOLATILE (base))
 	MEM_READONLY_P (ref) = 1;
 
+  /* Mark static const strings readonly as well.  */
+  if (base && TREE_CODE (base) == STRING_CST && TREE_READONLY (base)
+	  && TREE_STATIC (base))
+	MEM_READONLY_P (ref) = 1;
+
   /* If this expression uses it's parent's alias set, mark it such
 	 that we won't change it.  */
   if (component_uses_parent_alias_set (t))
Index: gcc/testsuite/gcc.dg/memcpy-4.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/memcpy-4.c (revision 0)
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+void
+f1 (char *p)
+{
+  __builtin_memcpy (p, "123", 3);
+}
+
+/* { dg-final { scan-rtl-dump-times "mem/s/u" 3 "expand" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */


[Patch,AVR]: Save instruction when squaring HI

2011-10-10 Thread Georg-Johann Lay
This is an obvious mini-optimization if the input operands of mulhi are the
same, one multiply operation can be saved.

Ok for trunk?

Johann

* config/avr/avr.md (*mulhi3_enh): Treat squaring smarter.
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 179738)
+++ config/avr/avr.md	(working copy)
@@ -1560,13 +1560,11 @@ (define_insn "*mulhi3_enh"
 	(mult:HI (match_operand:HI 1 "register_operand" "r")
 		 (match_operand:HI 2 "register_operand" "r")))]
   "AVR_HAVE_MUL"
-  "mul %A1,%A2
-	movw %0,r0
-	mul %A1,%B2
-	add %B0,r0
-	mul %B1,%A2
-	add %B0,r0
-	clr r1"
+  {
+return REGNO (operands[1]) == REGNO (operands[2])
+   ? "mul %A1,%A1\;movw %0,r0\;mul %A1,%B1\;add %B0,r0\;add %B0,r0\;clr r1"
+   : "mul %A1,%A2\;movw %0,r0\;mul %A1,%B2\;add %B0,r0\;mul %B1,%A2\;add %B0,r0\;clr r1";
+  }
   [(set_attr "length" "7")
(set_attr "cc" "clobber")])
 


Re: [committed] More e-mail address fixes in ChangeLogs: dead e-mail address

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 8:07 PM, Mikael Morin  wrote:
> On Sunday 09 October 2011 19:30:20 Richard Guenther wrote:
>> We usually don't retroactively change ChangeLogs this way.
> On the other hand, ChangeLogs usually don't need to be changed.
>
>> Please refrain from making further changes like this.
> OK, I will. Is there a reason for such a policy?

e-mail addresses in changelogs show contributor association information
(and they did so in your case).  It also makes ChangeLog diffs hard to
read when you are bisecting and looking at a ChangeLog diff of a
revision range.

Richard.

> Mikael
>


Re: [PATCH 3/5] On a Linux kernel ask explicitely for a huge page in ggc

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Benchmarks show slightly faster build times on a kernel
> build, near the measurement error unfortunately.
>
> This will only work with a recent glibc that defines MADV_HUGEPAGE.

Will partial unmaps fail or split the page?

Richard.

> 2011-10-08   Andi Kleen 
>
>        * ggc-page.c (alloc_page): Add madvise for hugepage
> ---
>  gcc/ggc-page.c |    5 +
>  1 files changed, 5 insertions(+), 0 deletions(-)
>
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index 1f52b56..6e08cda 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -779,6 +779,11 @@ alloc_page (unsigned order)
>
>       page = alloc_anon (NULL, G.pagesize * GGC_QUIRE_SIZE);
>
> +#if defined(HAVE_MADVISE) && defined(MADV_HUGEPAGE)
> +      /* Kernel, I would like to have hugepages, please. */
> +      madvise(page, G.pagesize * GGC_QUIRE_SIZE, MADV_HUGEPAGE);
> +#endif
> +
>       /* This loop counts down so that the chain will be in ascending
>         memory order.  */
>       for (i = GGC_QUIRE_SIZE - 1; i >= 1; i--)
> --
> 1.7.5.4
>
>


Re: [PATCH 2/5] Increase the GGC quite size to 2MB

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Using 2MB allows modern kernels to use 2MB huge pages on x86.

Ok.

Thanks,
Richard.

> gcc/:
>
> 2011-10-08   Andi Kleen 
>
>        * ggc-page.c (GGC_QUIRE_SIZE): Increase to 512
> ---
>  gcc/ggc-page.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index b0b3b3f..1f52b56 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -469,7 +469,7 @@ static struct globals
>    can override this by defining GGC_QUIRE_SIZE explicitly.  */
>  #ifndef GGC_QUIRE_SIZE
>  # ifdef USING_MMAP
> -#  define GGC_QUIRE_SIZE 256
> +#  define GGC_QUIRE_SIZE 512   /* 2MB for 4K pages */
>  # else
>  #  define GGC_QUIRE_SIZE 16
>  # endif
> --
> 1.7.5.4
>
>


Re: [PATCH 5/5] Add error checking to lto_section_read

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Various callers of lto_section_read segfault on a NULL return
> when the mmap fails. Add some internal_errors to give a better
> message to the user.

Hm, shouldn't these be fatal_error () then?  Ok with that change.

Thanks,
Richard.

> gcc/lto/:
>
> 2011-10-09   Andi Kleen 
>
>        * lto.c (lto_section_read): Call internal_error on IO or mmap errors.
> ---
>  gcc/lto/lto.c |   11 +--
>  1 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index a77eeb4..dc16db4 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -1237,7 +1237,10 @@ lto_read_section_data (struct lto_file_decl_data 
> *file_data,
>     {
>       fd = open (file_data->file_name, O_RDONLY|O_BINARY);
>       if (fd == -1)
> -       return NULL;
> +        {
> +         internal_error ("Cannot open %s", file_data->file_name);
> +         return NULL;
> +        }
>       fd_name = xstrdup (file_data->file_name);
>     }
>
> @@ -1255,7 +1258,10 @@ lto_read_section_data (struct lto_file_decl_data 
> *file_data,
>   result = (char *) mmap (NULL, computed_len, PROT_READ, MAP_PRIVATE,
>                          fd, computed_offset);
>   if (result == MAP_FAILED)
> -    return NULL;
> +    {
> +      internal_error ("Cannot map %s", file_data->file_name);
> +      return NULL;
> +    }
>
>   return result + diff;
>  #else
> @@ -1264,6 +1270,7 @@ lto_read_section_data (struct lto_file_decl_data 
> *file_data,
>       || read (fd, result, len) != (ssize_t) len)
>     {
>       free (result);
> +      internal_error ("Cannot read %s", file_data->file_name);
>       result = NULL;
>     }
>  #ifdef __MINGW32__
> --
> 1.7.5.4
>
>


Re: [PATCH 4/5] Add a freeing threshold for the garbage collector.

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Add a threshold to avoid freeing pages back too early to the OS.
> This avoid virtual memory map fragmentation.
>
> Based on a idea from Honza

Less than 20% looks high.  Shouldn't ggc-free-min be enough to
avoid fragmentation?

This will hide gc bugs with always-collect (ggc-checking), so
the parameter(s) need to be adjusted for that case to always
give pages back.  The current values should probably be printed
where the two existing ones are printed as well (with -v).

Thanks,
Richard.

> ggc/doc/:
>
> 2011-10-08   Andi Kleen 
>
>        PR other/50636
>        * invoke.texi (ggc-free-threshold, ggc-free-min): Add.
>
> ggc/:
>
> 2011-10-08   Andi Kleen 
>
>        PR other/50636
>        * ggc-page.c (ggc_collect): Add free threshold.
>        * params.def (GGC_FREE_THRESHOLD, GGC_FREE_MIN): Add.
> ---
>  gcc/doc/invoke.texi |   11 +++
>  gcc/ggc-page.c      |   13 +
>  gcc/params.def      |   10 ++
>  3 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index ef7ac68..6557f66 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8837,6 +8837,17 @@ very large effectively disables garbage collection.  
> Setting this
>  parameter and @option{ggc-min-expand} to zero causes a full collection
>  to occur at every opportunity.
>
> +@item ggc-free-threshold
> +
> +Only free memory back to the system when it would free more than this
> +many percent of the total allocated memory. Default is 20 percent.
> +This avoids memory fragmentation.
> +
> +@item ggc-free-min
> +
> +Only free memory back to the system when it would free more than this.
> +Unit is kilobytes.
> +
>  @item max-reload-search-insns
>  The maximum number of instruction reload should look backward for equivalent
>  register.  Increasing values mean more aggressive optimization, making the
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index 6e08cda..cd1c41a 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -1968,14 +1968,19 @@ ggc_collect (void)
>   if (GGC_DEBUG_LEVEL >= 2)
>     fprintf (G.debug_file, "BEGIN COLLECTING\n");
>
> +  /* Release the pages we freed the last time we collected, but didn't
> +     reuse in the interim.  But only do this if this would free a
> +     reasonable number of pages. Otherwise hold on to them
> +     to avoid virtual memory fragmentation. */
> +  if (G.bytes_mapped - G.allocated >=
> +       (PARAM_VALUE (GGC_FREE_THRESHOLD) / 100.0) * G.bytes_mapped &&
> +      G.bytes_mapped - G.allocated >= (size_t)PARAM_VALUE (GGC_FREE_MIN) * 
> 1024)
> +    release_pages ();
> +
>   /* Zero the total allocated bytes.  This will be recalculated in the
>      sweep phase.  */
>   G.allocated = 0;
>
> -  /* Release the pages we freed the last time we collected, but didn't
> -     reuse in the interim.  */
> -  release_pages ();
> -
>   /* Indicate that we've seen collections at this context depth.  */
>   G.context_depth_collections = ((unsigned long)1 << (G.context_depth + 1)) - 
> 1;
>
> diff --git a/gcc/params.def b/gcc/params.def
> index 5e49c48..ca28715 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -561,6 +561,16 @@ DEFPARAM(GGC_MIN_HEAPSIZE,
>  #undef GGC_MIN_EXPAND_DEFAULT
>  #undef GGC_MIN_HEAPSIZE_DEFAULT
>
> +DEFPARAM(GGC_FREE_THRESHOLD,
> +       "ggc-free-threshold",
> +       "Dont free memory back to system less this percent of the total 
> memory",
> +       20, 0, 100)
> +
> +DEFPARAM(GGC_FREE_MIN,
> +        "ggc-free-min",
> +        "Dont free less memory than this back to the system, in kilobytes",
> +        8 * 1024, 0, 0)
> +
>  DEFPARAM(PARAM_MAX_RELOAD_SEARCH_INSNS,
>         "max-reload-search-insns",
>         "The maximum number of instructions to search backward when looking 
> for equivalent reload",
> --
> 1.7.5.4
>
>


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Richard Guenther
On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> From: Andi Kleen 
>
> Use the Linux MADV_DONTNEED call to unmap free pages in the garbage
> collector.Then keep the unmapped pages in the free list. This avoid
> excessive memory fragmentation on large LTO bulds, which can lead
> to gcc bumping into the Linux vm_max_map limit per process.
>
> Based on a idea from Jakub.

Shouldn't we prefer still "mapped" pages when allocating?  Thus, keep
the freepages list "sorted"?

With the new params to call release_pages less, how does this
interact with using MADV_DONTNEED?  The only reason to delay
MADV_DONTNEED is to avoid splitting huge-pages?  Which would
mean that we should rather be better at controlling where we allocate
from from the free-list?

Richard.

> gcc/:
>
> 2011-10-08   Andi Kleen 
>
>        PR other/50636
>        * config.in, configure: Regenerate.
>        * configure.ac (madvise): Add to AC_CHECK_FUNCS.
>        * ggc-page.c (USING_MADVISE): Add.
>        (page_entry): Add unmapped field.
>        (alloc_page): Check for unmapped pages.
>        (release_pages): Add USING_MADVISE branch.
> ---
>  gcc/config.in    |    6 ++
>  gcc/configure    |    2 +-
>  gcc/configure.ac |    2 +-
>  gcc/ggc-page.c   |   48 +++-
>  4 files changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/config.in b/gcc/config.in
> index f2847d8..e8148b6 100644
> --- a/gcc/config.in
> +++ b/gcc/config.in
> @@ -1276,6 +1276,12 @@
>  #endif
>
>
> +/* Define to 1 if you have the `madvise' function. */
> +#ifndef USED_FOR_TARGET
> +#undef HAVE_MADVISE
> +#endif
> +
> +
>  /* Define to 1 if you have the  header file. */
>  #ifndef USED_FOR_TARGET
>  #undef HAVE_MALLOC_H
> diff --git a/gcc/configure b/gcc/configure
> index cb55dda..4a54adf 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -9001,7 +9001,7 @@ fi
>  for ac_func in times clock kill getrlimit setrlimit atoll atoq \
>        sysconf strsignal getrusage nl_langinfo \
>        gettimeofday mbstowcs wcswidth mmap setlocale \
> -       clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
> fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked 
> fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked 
> getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked
> +       clearerr_unlocked feof_unlocked   ferror_unlocked fflush_unlocked 
> fgetc_unlocked fgets_unlocked   fileno_unlocked fprintf_unlocked 
> fputc_unlocked fputs_unlocked   fread_unlocked fwrite_unlocked 
> getchar_unlocked getc_unlocked   putchar_unlocked putc_unlocked madvise
>  do :
>   as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
>  ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index a7b94e6..357902e 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -1027,7 +1027,7 @@ define(gcc_UNLOCKED_FUNCS, clearerr_unlocked 
> feof_unlocked dnl
>  AC_CHECK_FUNCS(times clock kill getrlimit setrlimit atoll atoq \
>        sysconf strsignal getrusage nl_langinfo \
>        gettimeofday mbstowcs wcswidth mmap setlocale \
> -       gcc_UNLOCKED_FUNCS)
> +       gcc_UNLOCKED_FUNCS madvise)
>
>  if test x$ac_cv_func_mbstowcs = xyes; then
>   AC_CACHE_CHECK(whether mbstowcs works, gcc_cv_func_mbstowcs_works,
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index 624f029..b0b3b3f 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -50,6 +50,10 @@ along with GCC; see the file COPYING3.  If not see
>  #define USING_MALLOC_PAGE_GROUPS
>  #endif
>
> +#if defined(HAVE_MADVISE) && defined(MADV_DONTNEED)
> +# define USING_MADVISE
> +#endif
> +
>  /* Strategy:
>
>    This garbage-collecting allocator allocates objects on one of a set
> @@ -277,6 +281,9 @@ typedef struct page_entry
>   /* The lg of size of objects allocated from this page.  */
>   unsigned char order;
>
> +  /* Unmapped page? */
> +  bool unmapped;
> +
>   /* A bit vector indicating whether or not objects are in use.  The
>      Nth bit is one if the Nth object on this page is allocated.  This
>      array is dynamically sized.  */
> @@ -740,6 +747,10 @@ alloc_page (unsigned order)
>
>   if (p != NULL)
>     {
> +      if (p->unmapped)
> +        G.bytes_mapped += p->bytes;
> +      p->unmapped = false;
> +
>       /* Recycle the allocated memory from this page ...  */
>       *pp = p->next;
>       page = p->page;
> @@ -956,7 +967,42 @@ free_page (page_entry *entry)
>  static void
>  release_pages (void)
>  {
> -#ifdef USING_MMAP
> +#ifdef USING_MADVISE
> +  page_entry *p, *start_p;
> +  char *start;
> +  size_t len;
> +
> +  for (p = G.free_pages; p; )
> +    {
> +      if (p->unmapped)
> +        {
> +          p = p->next;
> +          continue;
> +        }
> +      start = p->page;
> +      len = p->bytes;
> +      start_p = p;
> +      p = p->next;
> +      while (p && p->page == start + len)
> +        {
> +          len += p->bytes;
> +          p = p->next;
> +        }

Re: [PATCH 3/5] On a Linux kernel ask explicitely for a huge page in ggc

2011-10-10 Thread Jakub Jelinek
On Mon, Oct 10, 2011 at 12:15:14PM +0200, Richard Guenther wrote:
> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> > From: Andi Kleen 
> >
> > Benchmarks show slightly faster build times on a kernel
> > build, near the measurement error unfortunately.
> >
> > This will only work with a recent glibc that defines MADV_HUGEPAGE.
> 
> Will partial unmaps fail or split the page?

I think it will not do anything, because with G.pagesize * GGC_QUIRE_SIZE
being just 2MB (and most likely not 2MB aligned either) it would do
something only if it happens to be 2MB aligned or the following VMA
is also MADV_HUGEPAGE hinted and no pages in the 2MB region have been
paged in yet.
So, either we need to ensure that the address is likely 2MB aligned,
or use on x86_64 even bigger GGC_QUIRE_SIZE (such as 16MB or 32MB) and
then huge pages would be likely used at least for the aligned inner
huge pages in the region.

Additionally, IMHO we shouldn't use this MADV_HUGEPAGE size on all hosts
that define it, that is a useless syscall for hosts which don't support
huge pages.  So should be guarded by some macro defined in host headers.
> > +#if defined(HAVE_MADVISE) && defined(MADV_HUGEPAGE)
> > +      /* Kernel, I would like to have hugepages, please. */
> > +      madvise(page, G.pagesize * GGC_QUIRE_SIZE, MADV_HUGEPAGE);

Please watch formatting, space is missing after madvise.

Jakub


Re: [patch] C6X unwinding/exception handling

2011-10-10 Thread Andrew Haley
On 10/09/2011 12:09 PM, Matthias Klose wrote:
> This did break libobjc and libjava on arm-linux-gnueabi.
>
> libobjc now has an undefined reference to _Unwind_decode_target2, which can be
> avoided with
>
> --- libobjc/exception.c.orig2011-07-21 15:33:57.0 +
> +++ libobjc/exception.c 2011-10-09 10:53:12.554940776 +
> @@ -182,7 +182,7 @@
>_Unwind_Ptr ptr;
>
>ptr = (_Unwind_Ptr) (info->TType - (i * 4));
> -  ptr = _Unwind_decode_target2 (ptr);
> +  ptr = _Unwind_decode_typeinfo_ptr (info->ttype_base, (_Unwind_Word) ptr);
>
>/* NULL ptr means catch-all.  Note that if the class is not found,
>   this will abort the program.  */
>
> libjava fails to build, the same change doesn't work for libjava/exception.cc,
> because the struct lsda_header_info in exception.cc is missing the ttype_base
> member. Any suggestions?

Is this fixable without backing out Paul Brooks' patch?

Andrew.



[Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Georg-Johann Lay
The auto-detect machinery fails to correctly detect that TLS is not supported
on AVR.  This leads to gross noise of several hundreds of FAILs in testsuite.

This patch explicitly tells that AVR doesn't support TLS.

Lightly tested with testsuite run on avr-unknown-none where tests like
gcc.dg/debug/tls-1.c ran into ICE are UNSUPPORTED now.

Okay for trunk?

testsuite/
* lib/target-supports.exp (check_effective_target_tls): AVR has no
support for TLS.
(check_effective_target_tls_native): Ditto.
(check_effective_target_tls_emulated): Ditto.
(check_effective_target_tls_runtime): Ditto.
Index: testsuite/lib/target-supports.exp
===
--- testsuite/lib/target-supports.exp	(revision 179738)
+++ testsuite/lib/target-supports.exp	(working copy)
@@ -607,6 +607,11 @@ proc add_options_for_tls { flags } {
 # Return 1 if thread local storage (TLS) is supported, 0 otherwise.
 
 proc check_effective_target_tls {} {
+# AVR has no support for TLS.  Auto-detect fails here.
+if { [istarget avr-*-*] } {
+	return 0
+}
+
 return [check_no_compiler_messages tls assembly {
 	__thread int i;
 	int f (void) { return i; }
@@ -623,6 +628,11 @@ proc check_effective_target_tls_native {
 	return 0
 }
 
+# AVR has no support for TLS.  Auto-detect fails here.
+if { [istarget avr-*-*] } {
+	return 0
+}
+
 return [check_no_messages_and_pattern tls_native "!emutls" assembly {
 	__thread int i;
 	int f (void) { return i; }
@@ -639,6 +649,11 @@ proc check_effective_target_tls_emulated
 	return 1
 }
 
+# AVR has no support for TLS.  Auto-detect fails here.
+if { [istarget avr-*-*] } {
+	return 0
+}
+
 return [check_no_messages_and_pattern tls_emulated "emutls" assembly {
 	__thread int i;
 	int f (void) { return i; }
@@ -649,6 +664,11 @@ proc check_effective_target_tls_emulated
 # Return 1 if TLS executables can run correctly, 0 otherwise.
 
 proc check_effective_target_tls_runtime {} {
+# AVR has no support for TLS.  Auto-detect fails here.
+if { [istarget avr-*-*] } {
+	return 0
+}
+
 return [check_runtime tls_runtime {
 	__thread int thr = 0;
 	int main (void) { return thr; }


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/7 Kai Tietz :
> Hello,
>
> this is the updated version with the suggestion
>
> 2011/10/7 Richard Guenther :
>> On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
>>> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>>> +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
>>> +           && simple_operand_p (arg1))
>>
>> As I said previously simple_operand_p already rejects covers
>> comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
>> TREE_SIDE_EFFECTS set if the comparison might trap, as
>> it might just be hidden in something more complicated - so
>> the simple check isn't enough anyway (and if simple_operand_p
>> would cover it, the check would be better placed there).
>
> I reworked simple_operand_p so that it does this special-casing and 
> additionally
> also checks for trapping.
>
>>> +      if (TREE_CODE (arg0) == code
>>> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
>>> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
>>> +       {
>>> +         tem = build2_loc (loc,
>>> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>>> +                                                     : TRUTH_OR_EXPR),
>>> +                           type, TREE_OPERAND (arg0, 1), arg1);
>>> +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
>>> +                            tem);
>>
>> All trees should be folded, don't use plain build without a good reason.
>
> Ok, done
>
>>> +       }
>>> +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
>>> +        are simple operands and have no side-effects.  */
>>> +      if (simple_operand_p (arg0)
>>> +          && !TREE_SIDE_EFFECTS (arg0))
>>
>> Again, the checks you do for arg0 do not match those for arg1.  OTOH
>> it doesn't matter whether arg0 is simple or not or has side-effects or
>> not for this transformation, so why check it at all?
>
> It is required.  For left-hand operand, if it isn't a logical
> and/or/xor, we need to check for side-effects (and for trapping).  I
> see that calling of simple_operand_p is wrong here, as it rejects too
> much.  Nevertheless the check for side-effects is necessary for having
> valid sequence-points.  Without that checking a simple test

So said, it is even required to use for right-hand and left-hand side
of arguments, if one of them have side-effects or isn't simple.  Means
the check in my patch should use for

> +     else if (TREE_CODE (arg0) != TRUTH_AND_EXPR
> +             && TREE_CODE (arg0) != TRUTH_OR_EXPR
> +             && TREE_CODE (arg0) != TRUTH_ANDIF_EXPR
> +             && TREE_CODE (arg0) != TRUTH_ORIF_EXPR
> +             && TREE_CODE (arg0) != TRUTH_XOR_EXPR
> +             /* Needed for sequence points and trappings, or side-effects.  
> */
> +             && !TREE_SIDE_EFFECTS (arg0)
> +             && !tree_could_trap_p (arg0))
> +       return fold_build2_loc (loc, ncode, type, arg0, arg1);

instead if (!TREE_SIDE_EFFECTS (arg0) && simple_operand_p (arg0)) 
instead.

The cause for this are the consitancies of sequences in tree.  I
noticed that by tests in gcc.dg/tree-ssa about builitin_expect.

for example we have

extern int foo (void); /* foo modifies gbl1 */
int gbl1 = 0;

int foo (int ns1)
{
  if (ns1 && foo () && gbl1)
return 1;
  return 0;
}

so chain of trees has to look like this:
(ANDIF (ns1 (ANDIF foo () gbl1))

but if we don't check here for side-effects for left-hand chaining
operand, then we end up with
(AND ns1 (ANDIF foo () gbl1))

As AND and has associative property, tree says that right-hand and
left-hand are exchangable, which is obviously wrong.

Cheers,
Kai


Re: [Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Rainer Orth
Georg-Johann Lay  writes:

> The auto-detect machinery fails to correctly detect that TLS is not supported
> on AVR.  This leads to gross noise of several hundreds of FAILs in testsuite.
>
> This patch explicitly tells that AVR doesn't support TLS.
>
> Lightly tested with testsuite run on avr-unknown-none where tests like
> gcc.dg/debug/tls-1.c ran into ICE are UNSUPPORTED now.

This seems wrong: all targets not supporting native TLS fall back to
emutls and pass the tests just fine.  You should investigate what's
happening on AVR instead of papering over the problem.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz  wrote:
> 2011/10/7 Kai Tietz :
>> Hello,
>>
>> this is the updated version with the suggestion
>>
>> 2011/10/7 Richard Guenther :
>>> On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
 +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
 +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
 +           && simple_operand_p (arg1))
>>>
>>> As I said previously simple_operand_p already rejects covers
>>> comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
>>> TREE_SIDE_EFFECTS set if the comparison might trap, as
>>> it might just be hidden in something more complicated - so
>>> the simple check isn't enough anyway (and if simple_operand_p
>>> would cover it, the check would be better placed there).
>>
>> I reworked simple_operand_p so that it does this special-casing and 
>> additionally
>> also checks for trapping.
>>
 +      if (TREE_CODE (arg0) == code
 +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
 +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
 +       {
 +         tem = build2_loc (loc,
 +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
 +                                                     : TRUTH_OR_EXPR),
 +                           type, TREE_OPERAND (arg0, 1), arg1);
 +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
 +                            tem);
>>>
>>> All trees should be folded, don't use plain build without a good reason.
>>
>> Ok, done
>>
 +       }
 +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
 +        are simple operands and have no side-effects.  */
 +      if (simple_operand_p (arg0)
 +          && !TREE_SIDE_EFFECTS (arg0))
>>>
>>> Again, the checks you do for arg0 do not match those for arg1.  OTOH
>>> it doesn't matter whether arg0 is simple or not or has side-effects or
>>> not for this transformation, so why check it at all?
>>
>> It is required.  For left-hand operand, if it isn't a logical
>> and/or/xor, we need to check for side-effects (and for trapping).  I
>> see that calling of simple_operand_p is wrong here, as it rejects too
>> much.  Nevertheless the check for side-effects is necessary for having
>> valid sequence-points.  Without that checking a simple test
>
> So said, it is even required to use for right-hand and left-hand side
> of arguments, if one of them have side-effects or isn't simple.  Means
> the check in my patch should use for
>
>> +     else if (TREE_CODE (arg0) != TRUTH_AND_EXPR
>> +             && TREE_CODE (arg0) != TRUTH_OR_EXPR
>> +             && TREE_CODE (arg0) != TRUTH_ANDIF_EXPR
>> +             && TREE_CODE (arg0) != TRUTH_ORIF_EXPR
>> +             && TREE_CODE (arg0) != TRUTH_XOR_EXPR
>> +             /* Needed for sequence points and trappings, or side-effects.  
>> */
>> +             && !TREE_SIDE_EFFECTS (arg0)
>> +             && !tree_could_trap_p (arg0))
>> +       return fold_build2_loc (loc, ncode, type, arg0, arg1);
>
> instead if (!TREE_SIDE_EFFECTS (arg0) && simple_operand_p (arg0)) 
> instead.
>
> The cause for this are the consitancies of sequences in tree.  I
> noticed that by tests in gcc.dg/tree-ssa about builitin_expect.
>
> for example we have
>
> extern int foo (void); /* foo modifies gbl1 */
> int gbl1 = 0;
>
> int foo (int ns1)
> {
>  if (ns1 && foo () && gbl1)
>    return 1;
>  return 0;
> }
>
> so chain of trees has to look like this:
> (ANDIF (ns1 (ANDIF foo () gbl1))
>
> but if we don't check here for side-effects for left-hand chaining
> operand, then we end up with
> (AND ns1 (ANDIF foo () gbl1))

No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects.

> As AND and has associative property, tree says that right-hand and
> left-hand are exchangable, which is obviously wrong.

The poitn is that if the right-hand does not have side-effects it doesn't
matter if we execute it before the left-hand (independent on whether
that has side-effects or not).

Richard.

> Cheers,
> Kai
>


Re: [Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Georg-Johann Lay
Rainer Orth schrieb:
> Georg-Johann Lay  writes:
> 
>> The auto-detect machinery fails to correctly detect that TLS is not supported
>> on AVR.  This leads to gross noise of several hundreds of FAILs in testsuite.
>>
>> This patch explicitly tells that AVR doesn't support TLS.
>>
>> Lightly tested with testsuite run on avr-unknown-none where tests like
>> gcc.dg/debug/tls-1.c ran into ICE are UNSUPPORTED now.
> 
> This seems wrong: all targets not supporting native TLS fall back to
> emutls and pass the tests just fine.  You should investigate what's
> happening on AVR instead of papering over the problem.
> 
>   Rainer

For example, after updating trunk to 179738,




(gdb) set args  -fpreprocessed tls-1.i -quiet -dumpbase tls-1.c -mmcu=atmega128
-auxbase tls-1 -gdwarf-2 -O3 -O2 -version -o tls-1.s
(gdb) cd ~/test
(gdb) r
GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP
version 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP 
version
5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 4e4900df2fe6cd3a5bd440dbb3a30bf2

Program received signal SIGSEGV, Segmentation fault.
0x0864e398 in set_is_used (var=0xb7dd3600) at
../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
(gdb) bt
#0  0x0864e398 in set_is_used (var=0xb7dd3600) at
../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
#1  0x0864e256 in mark_all_vars_used_1 (tp=0xb7dc97c4,
walk_subtrees=0xbfffe004, data=0x0) at
../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:379
#2  0x087c69b6 in walk_tree_1 ()
#3  0x0864f37c in mark_all_vars_used (expr_p=0xb7dd24e0, data=0x0) at
../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:595
#4  0x0864fd71 in remove_unused_locals () at
../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:727
#5  0x08492d5d in execute_function_todo (data=0x8000) at
../../../gcc.gnu.org/trunk/gcc/passes.c:1695
#6  0x08492e6c in do_per_function (callback=0x8492bf0 ,
data=0x8000) at ../../../gcc.gnu.org/trunk/gcc/passes.c:1548
#7  0x08492fad in execute_todo (flags=32768) at
../../../gcc.gnu.org/trunk/gcc/passes.c:1741
#8  0x08493e9a in execute_one_pass (pass=0x8b98560) at
../../../gcc.gnu.org/trunk/gcc/passes.c:2087
#9  0x0849419d in execute_pass_list (pass=0x8b98560) at
../../../gcc.gnu.org/trunk/gcc/passes.c:2119
#10 0x085d3461 in tree_rest_of_compilation (fndecl=0xb7dca500) at
../../../gcc.gnu.org/trunk/gcc/tree-optimize.c:420
#11 0x08202f4a in cgraph_expand_function (node=0xb7dcb09c) at
../../../gcc.gnu.org/trunk/gcc/cgraphunit.c:1805
#12 0x08203f61 in cgraph_optimize () at
../../../gcc.gnu.org/trunk/gcc/cgraphunit.c:1864
#13 0x08205325 in cgraph_finalize_compilation_unit () at
../../../gcc.gnu.org/trunk/gcc/cgraphunit.c:1312
#14 0x080bb5b8 in c_write_global_declarations () at
../../../gcc.gnu.org/trunk/gcc/c-decl.c:9936
#15 0x08565cce in toplev_main (argc=15, argv=0xbfffe5d4) at
../../../gcc.gnu.org/trunk/gcc/toplev.c:581
#16 0x0816cc02 in main (argc=-1210281312, argv=0xb7dd1070) at
../../../gcc.gnu.org/trunk/gcc/main.c:36
(gdb) frame 1
#1  0x0864e256 in mark_all_vars_used_1 (tp=0xb7dc97c4,
walk_subtrees=0xbfffe004, data=0x0) at
../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:379
(gdb) p t
$1 = (tree) 0xb7dd3600
(gdb) pt
 
unit size 
align 8 symtab 0 alias set -1 canonical type 0xb7dd35a0
fields 
unsigned QI file (null) line 0 col 0
size 
unit size 
align 8 offset_align 8
offset 
bit offset  context  chain >
pointer_to_this >
addressable used static ignored BLK file
/mnt/nfs/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.dg/debug/tls-1.c
line 7 col 21 size  unit size 
align 8 context  initial
>
(gdb) frame 0
#0  0x0864e398 in set_is_used (var=0xb7dd3600) at
../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
(gdb) p ann
$2 = (var_ann_t) 0x0
(gdb)

So there is a NULL pointer dereference.


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Jakub Jelinek
On Mon, Oct 10, 2011 at 12:25:15PM +0200, Richard Guenther wrote:
> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> > From: Andi Kleen 
> >
> > Use the Linux MADV_DONTNEED call to unmap free pages in the garbage
> > collector.Then keep the unmapped pages in the free list. This avoid
> > excessive memory fragmentation on large LTO bulds, which can lead
> > to gcc bumping into the Linux vm_max_map limit per process.
> >
> > Based on a idea from Jakub.
> 
> Shouldn't we prefer still "mapped" pages when allocating?  Thus, keep
> the freepages list "sorted"?

I don't see why.  MADV_DONTNEED isn't perfect, what it does (on Linux)
is that it zaps the whole page range, which essentially brings it into
the exact same state as immediately after mmap.  Any touch of the
pages will result in a zeroed page being inserted into the page tables.

4 years ago there was a MADV_FREE proposal which behaved much better
(page was removed from page tables only when the kernel actually needed
them for something else, if the page wasn't needed and has been accessed
again by the application, it would still contain the old content (which
the app couldn't rely on, it could as well be cleared), but it would be much
cheaper in that case.  With MADV_FREE it would be actually preferrable
to pick the MADV_FREEd pages over picking up freshly munmapped but not yet
touched pages.

> With the new params to call release_pages less, how does this
> interact with using MADV_DONTNEED?  The only reason to delay
> MADV_DONTNEED is to avoid splitting huge-pages?  Which would

Not just that.  MADV_DONTNEED needs to flush the dirty pages from the page
tables and when they are touched again, they need to be cleared (or
pre-cleared pages inserted).  So, while MADV_DONTNEED is less expensive than
munmap + mmap, it is still not free.

> > 2011-10-08   Andi Kleen 

Two space in between name and <.
> >
> >        PR other/50636
> >        * config.in, configure: Regenerate.

Please write each file on a separate line, and better below
* configure.ac line because of which they have been regenerated.

> >
> > +  /* Unmapped page? */
> > +  bool unmapped;
> > +

Not sure if unmapped is the best name of the flag here, because
it hasn't been unmapped, it just has been madvised.  Under unmap
most people would imagine munmap I'd say.

Jakub


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/10 Richard Guenther :
> On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz  wrote:
>> 2011/10/7 Kai Tietz :
>>> Hello,
>>>
>>> this is the updated version with the suggestion
>>>
>>> 2011/10/7 Richard Guenther :
 On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
> +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
> +           && simple_operand_p (arg1))

 As I said previously simple_operand_p already rejects covers
 comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
 TREE_SIDE_EFFECTS set if the comparison might trap, as
 it might just be hidden in something more complicated - so
 the simple check isn't enough anyway (and if simple_operand_p
 would cover it, the check would be better placed there).
>>>
>>> I reworked simple_operand_p so that it does this special-casing and 
>>> additionally
>>> also checks for trapping.
>>>
> +      if (TREE_CODE (arg0) == code
> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
> +       {
> +         tem = build2_loc (loc,
> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
> +                                                     : TRUTH_OR_EXPR),
> +                           type, TREE_OPERAND (arg0, 1), arg1);
> +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
> +                            tem);

 All trees should be folded, don't use plain build without a good reason.
>>>
>>> Ok, done
>>>
> +       }
> +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
> +        are simple operands and have no side-effects.  */
> +      if (simple_operand_p (arg0)
> +          && !TREE_SIDE_EFFECTS (arg0))

 Again, the checks you do for arg0 do not match those for arg1.  OTOH
 it doesn't matter whether arg0 is simple or not or has side-effects or
 not for this transformation, so why check it at all?
>>>
>>> It is required.  For left-hand operand, if it isn't a logical
>>> and/or/xor, we need to check for side-effects (and for trapping).  I
>>> see that calling of simple_operand_p is wrong here, as it rejects too
>>> much.  Nevertheless the check for side-effects is necessary for having
>>> valid sequence-points.  Without that checking a simple test
>>
>> So said, it is even required to use for right-hand and left-hand side
>> of arguments, if one of them have side-effects or isn't simple.  Means
>> the check in my patch should use for
>>
>>> +     else if (TREE_CODE (arg0) != TRUTH_AND_EXPR
>>> +             && TREE_CODE (arg0) != TRUTH_OR_EXPR
>>> +             && TREE_CODE (arg0) != TRUTH_ANDIF_EXPR
>>> +             && TREE_CODE (arg0) != TRUTH_ORIF_EXPR
>>> +             && TREE_CODE (arg0) != TRUTH_XOR_EXPR
>>> +             /* Needed for sequence points and trappings, or side-effects. 
>>>  */
>>> +             && !TREE_SIDE_EFFECTS (arg0)
>>> +             && !tree_could_trap_p (arg0))
>>> +       return fold_build2_loc (loc, ncode, type, arg0, arg1);
>>
>> instead if (!TREE_SIDE_EFFECTS (arg0) && simple_operand_p (arg0)) 
>> instead.
>>
>> The cause for this are the consitancies of sequences in tree.  I
>> noticed that by tests in gcc.dg/tree-ssa about builitin_expect.
>>
>> for example we have
>>
>> extern int foo (void); /* foo modifies gbl1 */
>> int gbl1 = 0;
>>
>> int foo (int ns1)
>> {
>>  if (ns1 && foo () && gbl1)
>>    return 1;
>>  return 0;
>> }
>>
>> so chain of trees has to look like this:
>> (ANDIF (ns1 (ANDIF foo () gbl1))
>>
>> but if we don't check here for side-effects for left-hand chaining
>> operand, then we end up with
>> (AND ns1 (ANDIF foo () gbl1))
>
> No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects.
>
>> As AND and has associative property, tree says that right-hand and
>> left-hand are exchangable, which is obviously wrong.
>
> The poitn is that if the right-hand does not have side-effects it doesn't
> matter if we execute it before the left-hand (independent on whether
> that has side-effects or not).
>
> Richard.

thats just true as long as we don't make use of associative law for
AND expressions.
Otherwise we would fail for much simpler cases like
entern int doo ();
int foo ()
{
  int c, r = 0;
  if ((c = foo ()) != 0 && c < 20)
r = 1;
  return 0;
}

as for this left-hand operand has side-effects, but as it is the first
one, we would chain it as AND.  So we get wrong sequence.

Kai


Re: [Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Rainer Orth
Georg-Johann Lay  writes:

> For example, after updating trunk to 179738,
>
>
>
>
> (gdb) set args  -fpreprocessed tls-1.i -quiet -dumpbase tls-1.c 
> -mmcu=atmega128
> -auxbase tls-1 -gdwarf-2 -O3 -O2 -version -o tls-1.s
> (gdb) cd ~/test
> (gdb) r
> GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
> compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP
> version 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
>   compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP 
> version
> 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
> Compiler executable checksum: 4e4900df2fe6cd3a5bd440dbb3a30bf2
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0864e398 in set_is_used (var=0xb7dd3600) at
> ../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
> (gdb) bt
> #0  0x0864e398 in set_is_used (var=0xb7dd3600) at
> ../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
> #1  0x0864e256 in mark_all_vars_used_1 (tp=0xb7dc97c4,
> walk_subtrees=0xbfffe004, data=0x0) at
> ../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:379

This has nothing to do with AVR, but is PR middle-end/50638.  It breaks
TLS on all emutls targets.  A patch has been posted and approved, but
apparently not yet installed.

So please be more careful investigating testsuite failures before
XFAILing or skipping them.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
Sample had a typo. Correct sample has of course to return r.

 int foo ()
 {
  int c, r = 0;
  if ((c = foo ()) != 0 && c < 20)
    r = 1;
  return r;
}


Re: New warning for expanded vector operations

2011-10-10 Thread Richard Guenther
On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov
 wrote:
> On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov
>  wrote:
>> On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther
>>  wrote:
>>> On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov
>>>  wrote:
 On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
  wrote:
> On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
>  wrote:
>> Hi
>>
>> Here is a patch to inform a programmer about the expanded vector 
>> operation.
>> Bootstrapped on x86-unknown-linux-gnu.
>>
>> ChangeLog:
>>
>>        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
>>          produce the warning.
>>          (expand_vector_parallel): Adjust to produce the warning.
>
> Entries start without gcc/, they are relative to the gcc/ChangeLog file.

 Sure, sorry.

>>          (lower_vec_shuffle): Adjust to produce the warning.
>>        * gcc/common.opt: New warning Wvector-operation-expanded.
>>        * gcc/doc/invoke.texi: Document the wawning.
>>
>>
>> Ok?
>
> I don't like the name -Wvector-operation-expanded.  We emit a
> similar warning for missed inline expansions with -Winline, so
> maybe -Wvector-extensions (that's the name that appears
> in the C extension documentation).

 Hm, I don't care much about the name, unless it gets clear what the
 warning is used for.  I am not really sure that Wvector-extensions
 makes it clear.  Also, I don't see anything bad if the warning will
 pop up during the vectorisation. Any vector operation performed
 outside the SIMD accelerator looks suspicious, because it actually
 doesn't improve performance.  Such a warning during the vectorisation
 could mean that a programmer forgot some flag, or the constant
 propagation failed to deliver a constant, or something else.

 Conceptually the text I am producing is not really a warning, it is
 more like an information, but I am not aware of the mechanisms that
 would allow me to introduce a flag triggering inform () or something
 similar.

 What I think we really need to avoid is including this warning in the
 standard Ox.

> +  location_t loc = gimple_location (gsi_stmt (*gsi));
> +
> +  warning_at (loc, OPT_Wvector_operation_expanded,
> +             "vector operation will be expanded piecewise");
>
>   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
>   for (i = 0; i < nunits;
> @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
>   tree result, compute_type;
>   enum machine_mode mode;
>   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD;
> +  location_t loc = gimple_location (gsi_stmt (*gsi));
> +
> +  warning_at (loc, OPT_Wvector_operation_expanded,
> +             "vector operation will be expanded in parallel");
>
> what's the difference between 'piecewise' and 'in parallel'?

 Parallel is a little bit better for performance than piecewise.
>>>
>>> I see.  That difference should probably be documented, maybe with
>>> an example.
>>>
>>> Richard.
>>>
> @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
>  {
>   int parts_per_word = UNITS_PER_WORD
>                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 
> 1);
> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>
>   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>       && parts_per_word >= 4
>       && TYPE_VECTOR_SUBPARTS (type) >= 4)
> -    return expand_vector_parallel (gsi, f_parallel,
> -                                  type, a, b, code);
> +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
>   else
> -    return expand_vector_piecewise (gsi, f,
> -                                   type, TREE_TYPE (type),
> -                                   a, b, code);
> +    return expand_vector_piecewise (gsi, f, type,
> +                                   TREE_TYPE (type), a, b, code);
>  }
>
>  /* Check if vector VEC consists of all the equal elements and
>
> unless i miss something loc is unused here.  Please avoid random
> whitespace changes (just review your patch yourself before posting
> and revert pieces that do nothing).

 Yes you are right, sorry.

> +@item -Wvector-operation-expanded
> +@opindex Wvector-operation-expanded
> +@opindex Wno-vector-operation-expanded
> +Warn if vector operation is not implemented via SIMD capabilities of the
> +architecture. Mainly useful for the performance tuning.
>
> I'd mention that this is for vector operations as of the C extension
> documented in "Vector Extensions".
>
> The vectorizer can produce some operations that will need further
> lowering - we probably should make sure to _not_ warn

Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 12:45 PM, Jakub Jelinek  wrote:
> On Mon, Oct 10, 2011 at 12:25:15PM +0200, Richard Guenther wrote:
>> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
>> > From: Andi Kleen 
>> >
>> > Use the Linux MADV_DONTNEED call to unmap free pages in the garbage
>> > collector.Then keep the unmapped pages in the free list. This avoid
>> > excessive memory fragmentation on large LTO bulds, which can lead
>> > to gcc bumping into the Linux vm_max_map limit per process.
>> >
>> > Based on a idea from Jakub.
>>
>> Shouldn't we prefer still "mapped" pages when allocating?  Thus, keep
>> the freepages list "sorted"?
>
> I don't see why.  MADV_DONTNEED isn't perfect, what it does (on Linux)
> is that it zaps the whole page range, which essentially brings it into
> the exact same state as immediately after mmap.  Any touch of the
> pages will result in a zeroed page being inserted into the page tables.

Which means we save the zeroing when allocating non-MADV_DONTNEEDed
pages first.  And will be eventually able to unmap zapped pages.

> 4 years ago there was a MADV_FREE proposal which behaved much better
> (page was removed from page tables only when the kernel actually needed
> them for something else, if the page wasn't needed and has been accessed
> again by the application, it would still contain the old content (which
> the app couldn't rely on, it could as well be cleared), but it would be much
> cheaper in that case.  With MADV_FREE it would be actually preferrable
> to pick the MADV_FREEd pages over picking up freshly munmapped but not yet
> touched pages.
>
>> With the new params to call release_pages less, how does this
>> interact with using MADV_DONTNEED?  The only reason to delay
>> MADV_DONTNEED is to avoid splitting huge-pages?  Which would
>
> Not just that.  MADV_DONTNEED needs to flush the dirty pages from the page
> tables and when they are touched again, they need to be cleared (or
> pre-cleared pages inserted).  So, while MADV_DONTNEED is less expensive than
> munmap + mmap, it is still not free.

But it's free at madvise time.  munmap is "synchronous" at least (well,
when file-backed).

>> > 2011-10-08   Andi Kleen 
>
> Two space in between name and <.
>> >
>> >        PR other/50636
>> >        * config.in, configure: Regenerate.
>
> Please write each file on a separate line, and better below
> * configure.ac line because of which they have been regenerated.
>
>> >
>> > +  /* Unmapped page? */
>> > +  bool unmapped;
>> > +
>
> Not sure if unmapped is the best name of the flag here, because
> it hasn't been unmapped, it just has been madvised.  Under unmap
> most people would imagine munmap I'd say.
>
>        Jakub
>


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 12:47 PM, Kai Tietz  wrote:
> 2011/10/10 Richard Guenther :
>> On Mon, Oct 10, 2011 at 12:35 PM, Kai Tietz  wrote:
>>> 2011/10/7 Kai Tietz :
 Hello,

 this is the updated version with the suggestion

 2011/10/7 Richard Guenther :
> On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
>> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>> +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
>> +           && simple_operand_p (arg1))
>
> As I said previously simple_operand_p already rejects covers
> comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
> TREE_SIDE_EFFECTS set if the comparison might trap, as
> it might just be hidden in something more complicated - so
> the simple check isn't enough anyway (and if simple_operand_p
> would cover it, the check would be better placed there).

 I reworked simple_operand_p so that it does this special-casing and 
 additionally
 also checks for trapping.

>> +      if (TREE_CODE (arg0) == code
>> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
>> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
>> +       {
>> +         tem = build2_loc (loc,
>> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>> +                                                     : TRUTH_OR_EXPR),
>> +                           type, TREE_OPERAND (arg0, 1), arg1);
>> +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
>> +                            tem);
>
> All trees should be folded, don't use plain build without a good reason.

 Ok, done

>> +       }
>> +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
>> +        are simple operands and have no side-effects.  */
>> +      if (simple_operand_p (arg0)
>> +          && !TREE_SIDE_EFFECTS (arg0))
>
> Again, the checks you do for arg0 do not match those for arg1.  OTOH
> it doesn't matter whether arg0 is simple or not or has side-effects or
> not for this transformation, so why check it at all?

 It is required.  For left-hand operand, if it isn't a logical
 and/or/xor, we need to check for side-effects (and for trapping).  I
 see that calling of simple_operand_p is wrong here, as it rejects too
 much.  Nevertheless the check for side-effects is necessary for having
 valid sequence-points.  Without that checking a simple test
>>>
>>> So said, it is even required to use for right-hand and left-hand side
>>> of arguments, if one of them have side-effects or isn't simple.  Means
>>> the check in my patch should use for
>>>
 +     else if (TREE_CODE (arg0) != TRUTH_AND_EXPR
 +             && TREE_CODE (arg0) != TRUTH_OR_EXPR
 +             && TREE_CODE (arg0) != TRUTH_ANDIF_EXPR
 +             && TREE_CODE (arg0) != TRUTH_ORIF_EXPR
 +             && TREE_CODE (arg0) != TRUTH_XOR_EXPR
 +             /* Needed for sequence points and trappings, or 
 side-effects.  */
 +             && !TREE_SIDE_EFFECTS (arg0)
 +             && !tree_could_trap_p (arg0))
 +       return fold_build2_loc (loc, ncode, type, arg0, arg1);
>>>
>>> instead if (!TREE_SIDE_EFFECTS (arg0) && simple_operand_p (arg0)) 
>>> instead.
>>>
>>> The cause for this are the consitancies of sequences in tree.  I
>>> noticed that by tests in gcc.dg/tree-ssa about builitin_expect.
>>>
>>> for example we have
>>>
>>> extern int foo (void); /* foo modifies gbl1 */
>>> int gbl1 = 0;
>>>
>>> int foo (int ns1)
>>> {
>>>  if (ns1 && foo () && gbl1)
>>>    return 1;
>>>  return 0;
>>> }
>>>
>>> so chain of trees has to look like this:
>>> (ANDIF (ns1 (ANDIF foo () gbl1))
>>>
>>> but if we don't check here for side-effects for left-hand chaining
>>> operand, then we end up with
>>> (AND ns1 (ANDIF foo () gbl1))
>>
>> No we don't, as the right-hand (ANDIF foo () glbl1) has side-effects.
>>
>>> As AND and has associative property, tree says that right-hand and
>>> left-hand are exchangable, which is obviously wrong.
>>
>> The poitn is that if the right-hand does not have side-effects it doesn't
>> matter if we execute it before the left-hand (independent on whether
>> that has side-effects or not).
>>
>> Richard.
>
> thats just true as long as we don't make use of associative law for
> AND expressions.
> Otherwise we would fail for much simpler cases like
> entern int doo ();
> int foo ()
> {
>  int c, r = 0;
>  if ((c = foo ()) != 0 && c < 20)
>    r = 1;
>  return 0;
> }
>
> as for this left-hand operand has side-effects, but as it is the first
> one, we would chain it as AND.  So we get wrong sequence.

Well, then the predicates checking for simplicity and side-effects
should better match.

Richard.

> Kai
>


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Fri, Oct 7, 2011 at 11:36 PM, Kai Tietz  wrote:
> Hello,
>
> this is the updated version with the suggestion
>
> 2011/10/7 Richard Guenther :
>> On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
>>> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>>> +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
>>> +           && simple_operand_p (arg1))
>>
>> As I said previously simple_operand_p already rejects covers
>> comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
>> TREE_SIDE_EFFECTS set if the comparison might trap, as
>> it might just be hidden in something more complicated - so
>> the simple check isn't enough anyway (and if simple_operand_p
>> would cover it, the check would be better placed there).
>
> I reworked simple_operand_p so that it does this special-casing and 
> additionally
> also checks for trapping.
>
>>> +      if (TREE_CODE (arg0) == code
>>> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
>>> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
>>> +       {
>>> +         tem = build2_loc (loc,
>>> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>>> +                                                     : TRUTH_OR_EXPR),
>>> +                           type, TREE_OPERAND (arg0, 1), arg1);
>>> +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
>>> +                            tem);
>>
>> All trees should be folded, don't use plain build without a good reason.
>
> Ok, done
>
>>> +       }
>>> +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
>>> +        are simple operands and have no side-effects.  */
>>> +      if (simple_operand_p (arg0)
>>> +          && !TREE_SIDE_EFFECTS (arg0))
>>
>> Again, the checks you do for arg0 do not match those for arg1.  OTOH
>> it doesn't matter whether arg0 is simple or not or has side-effects or
>> not for this transformation, so why check it at all?
>
> It is required.  For left-hand operand, if it isn't a logical
> and/or/xor, we need to check for side-effects (and for trapping).  I
> see that calling of simple_operand_p is wrong here, as it rejects too
> much.  Nevertheless the check for side-effects is necessary for having
> valid sequence-points.  Without that checking a simple test
>
> int getter (void);
>
> int foo (void)
> {
>  int c, r = 0;
>  while ((c = getter ()) != '"' && c >= 0)
>    r +=c;
>  return r;
> }
>
> would give a warning about sequence-points.  As left-hand operand has
> side-effects, but right-hand not.  If we would combine it as AND, the
> operands are exchange-able.  So right-hand operand needs to be another
> ANDIF expression instead.
> Same apply on trapping.
>
>> In fold_truthop we still have the same (albeit more restricted transform),
>> but guarded with
>>
>>  if (BRANCH_COST (optimize_function_for_speed_p (cfun),
>>                   false) >= 2
>>
>> too.  Why not here?  Please delete redundant code in fold_truthop.
> Well, in general this is the default definition of
> LOGICAL_OP_NON_SHORT_CIRCUIT, so I missed that.  As for some targets
> the macro LOGICAL_OP_NON_SHORT_CIRCUIT might be defined differently,
> it might make sense to check for BRANCH_COST again.
>
>> It's also odd that this is only called from fold_truth_andor but has
>> a more generic name, so maybe rename it to fold_truth_andor_1 on the way.
>
> I renamed it.
>
>> Richard.
>
> ChangeLog
>
> 2011-10-07  Kai Tietz  
>
>        * fold-const.c (simple_operand_p): Make argument non-const
>        and add floating-point trapping check, and special cases for
>        comparisons, and logical-not's.
>        (fold_truthop): Rename to
>        (fold_truth_andor_1): function name.
>        Additionally remove here TRUTH-AND|OR_EXPR generation.
>        (fold_truth_andor): Handle branching at one place.
>
> Bootstrapped and regression-tested for all languages plus Ada and
> Obj-C++ on x86_64-pc-linux-gnu.
> Ok for apply?
>
> Regards,
> Kai
>
> Index: gcc/gcc/fold-const.c
> ===
> --- gcc.orig/gcc/fold-const.c
> +++ gcc/gcc/fold-const.c
> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>                                    tree *, tree *);
>  static int all_ones_mask_p (const_tree, int);
>  static tree sign_bit_p (tree, const_tree);
> -static int simple_operand_p (const_tree);
> +static int simple_operand_p (tree);
>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>  static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
> tree, tree);
>  static tree unextend (tree, int, int, tree);
> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>                                        tree, tree, tree);
>  static tree ex

[patch] remove empty directory common/config/m32c

2011-10-10 Thread Matthias Klose
2011-10-10  Matthias Klose 

* common/config/m32c: Remove empty directory.

committed as obvious. the last file in this directory was removed in r175969.


Re: [patch] C6X unwinding/exception handling

2011-10-10 Thread Matthias Klose
On 10/10/2011 12:32 PM, Andrew Haley wrote:
> On 10/09/2011 12:09 PM, Matthias Klose wrote:
>> This did break libobjc and libjava on arm-linux-gnueabi.
>>
>> libobjc now has an undefined reference to _Unwind_decode_target2, which can 
>> be
>> avoided with

with this patch, the libobjc testsuite results looks as before this change.

>>
>> --- libobjc/exception.c.orig2011-07-21 15:33:57.0 +
>> +++ libobjc/exception.c 2011-10-09 10:53:12.554940776 +
>> @@ -182,7 +182,7 @@
>>_Unwind_Ptr ptr;
>>
>>ptr = (_Unwind_Ptr) (info->TType - (i * 4));
>> -  ptr = _Unwind_decode_target2 (ptr);
>> +  ptr = _Unwind_decode_typeinfo_ptr (info->ttype_base, (_Unwind_Word) ptr);
>>
>>/* NULL ptr means catch-all.  Note that if the class is not found,
>>   this will abort the program.  */
>>
>> libjava fails to build, the same change doesn't work for 
>> libjava/exception.cc,
>> because the struct lsda_header_info in exception.cc is missing the ttype_base
>> member. Any suggestions?
> 
> Is this fixable without backing out Paul Brooks' patch?

currently testing

Index: libjava/exception.cc
===
--- libjava/exception.cc(revision 179739)
+++ libjava/exception.cc(working copy)
@@ -135,6 +135,7 @@
 {
   _Unwind_Ptr Start;
   _Unwind_Ptr LPStart;
+  _Unwind_Ptr ttype_base;
   const unsigned char *TType;
   const unsigned char *action_table;
   unsigned char ttype_encoding;
@@ -184,7 +185,7 @@
   _Unwind_Ptr ptr;

   ptr = (_Unwind_Ptr) (info->TType - (i * 4));
-  ptr = _Unwind_decode_target2(ptr);
+  ptr = _Unwind_decode_typeinfo_ptr (info->ttype_base, (_Unwind_Word) ptr);

   return reinterpret_cast(ptr);
 }
@@ -325,6 +326,7 @@

   // Parse the LSDA header.
   p = parse_lsda_header (context, language_specific_data, &info);
+  info.ttype_base = base_of_encoded_value (info.ttype_encoding, context);
 #ifdef HAVE_GETIPINFO
   ip = _Unwind_GetIPInfo (context, &ip_before_insn);
 #else




[patch, libgo] remove empty directories

2011-10-10 Thread Matthias Klose
libgo currently has some empty directories. ok to remove?

D   go/encoding/line
D   go/exp/ogle
D   go/exp/eval
D   go/exp/draw
D   go/exp/draw/x11


Re: [PATCH] Fix PR48193 and dups

2011-10-10 Thread Richard Guenther
On Wed, Mar 23, 2011 at 4:14 PM, Richard Guenther  wrote:
>
> This fixes VOP renaming in IPA split (well, or rather adds to its
> crudeness).  We really need to rename the reaching VDEF of the
> exit of the SESE region we kill, as that VDEF will be released
> when removing the definitions basic-block.  What we really would
> want to do is replace that SESE region with the call using the
> reaching VDEF of its entry and the reaching VDEF of its exit
> as virtual operands.  But the current code is somewhat twisted
> so I got lost and the following is what also works for me.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Honza, I fail to see the exact constraints on the region we
> outline, so you might want to provide a better fix?

Honza?  Ping?

Thanks,
Richard.

> Thanks,
> Richard.
>
> 2011-03-23  Richard Guenther  
>
>        PR tree-optimization/48193
>        * ipa-slit.c (split_function): Properly rename the reaching
>        VDEF.
>
> Index: gcc/ipa-split.c
> ===
> --- gcc/ipa-split.c     (revision 171352)
> +++ gcc/ipa-split.c     (working copy)
> @@ -169,8 +169,9 @@ static void
>  dump_split_point (FILE * file, struct split_point *current)
>  {
>   fprintf (file,
> -          "Split point at BB %i header time:%i header size: %i"
> -          " split time: %i split size: %i\n  bbs: ",
> +          "Split point at BB %i\n"
> +          "  header time: %i header size: %i\n"
> +          "  split time: %i split size: %i\n  bbs: ",
>           current->entry_bb->index, current->header_time,
>           current->header_size, current->split_time, current->split_size);
>   dump_bitmap (file, current->split_bbs);
> @@ -1036,12 +1037,13 @@ split_function (struct split_point *spli
>
>   /* If RETURN_BB has virtual operand PHIs, they must be removed and the
>      virtual operand marked for renaming as we change the CFG in a way that
> -     tree-inline is not able to compensate for.
> +     tree-inline is not able to compensate for.
>
>      Note this can happen whether or not we have a return value.  If we have
>      a return value, then RETURN_BB may have PHIs for real operands too.  */
>   if (return_bb != EXIT_BLOCK_PTR)
>     {
> +      bool phi_p = false;
>       for (gsi = gsi_start_phis (return_bb); !gsi_end_p (gsi);)
>        {
>          gimple stmt = gsi_stmt (gsi);
> @@ -1052,7 +1054,28 @@ split_function (struct split_point *spli
>            }
>          mark_virtual_phi_result_for_renaming (stmt);
>          remove_phi_node (&gsi, true);
> +         phi_p = true;
>        }
> +      /* In reality we have to rename the reaching definition of the
> +        virtual operand at return_bb as we will eventually release it
> +        when we remove the code region we outlined.
> +        So we have to rename all immediate virtual uses of that region
> +        if we didn't see a PHI definition yet.  */
> +      /* ???  In real reality we want to set the reaching vdef of the
> +         entry of the SESE region as the vuse of the call and the reaching
> +        vdef of the exit of the SESE region as the vdef of the call.  */
> +      if (!phi_p)
> +       for (gsi = gsi_start_bb (return_bb); !gsi_end_p (gsi); gsi_next 
> (&gsi))
> +         {
> +           gimple stmt = gsi_stmt (gsi);
> +           if (gimple_vuse (stmt))
> +             {
> +               gimple_set_vuse (stmt, NULL_TREE);
> +               update_stmt (stmt);
> +             }
> +           if (gimple_vdef (stmt))
> +             break;
> +         }
>     }
>
>   /* Now create the actual clone.  */
>


[C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

Hi,

reporter complains that, for:

struct T {} t;
bool b = 1.1 < t;

we output (on x86_64-linux):

33067.C:2:18: error: no match for ‘operator<’ in 
‘1.100088817841970012523233890533447265625e+0 < t’


which is clearly pretty dumb. In my opinion, a definite improvement 
would be following:


http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1822.pdf

thus output a "minimal" number of decimal digits. If I apply the 
attached patchlet, I get:


33067.C:2:18: error: no match for ‘operator<’ in ‘1.1001e+0 < t’

which looks much better to me. Comments? Does the idea make sense to 
everybody?


Thanks,
Paolo.

PS: I didn't carefully check the decimal floating point case, maybe we 
could do better.




Index: c-family/c-pretty-print.c
===
--- c-family/c-pretty-print.c   (revision 179739)
+++ c-family/c-pretty-print.c   (working copy)
@@ -1018,8 +1018,19 @@ pp_c_enumeration_constant (c_pretty_printer *pp, t
 static void
 pp_c_floating_constant (c_pretty_printer *pp, tree r)
 {
+  const struct real_format *fmt
+= REAL_MODE_FORMAT (TYPE_MODE (TREE_TYPE (r)));
+
+  REAL_VALUE_TYPE floating_cst = TREE_REAL_CST (r);
+  bool is_decimal = floating_cst.decimal;
+
+  // The fraction 643/2136 approximates log10(2) to 7 significant digits.
+  int max_digits10 = 2 + (is_decimal ? fmt->p : fmt->p * 643L / 2136);
+
   real_to_decimal (pp_buffer (pp)->digit_buffer, &TREE_REAL_CST (r),
-  sizeof (pp_buffer (pp)->digit_buffer), 0, 1);
+  sizeof (pp_buffer (pp)->digit_buffer),
+  max_digits10, 1);
+
   pp_string (pp, pp_buffer(pp)->digit_buffer);
   if (TREE_TYPE (r) == float_type_node)
 pp_character (pp, 'f');


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Jakub Jelinek
On Mon, Oct 10, 2011 at 01:11:13PM +0200, Richard Guenther wrote:
> > I don't see why.  MADV_DONTNEED isn't perfect, what it does (on Linux)
> > is that it zaps the whole page range, which essentially brings it into
> > the exact same state as immediately after mmap.  Any touch of the
> > pages will result in a zeroed page being inserted into the page tables.
> 
> Which means we save the zeroing when allocating non-MADV_DONTNEEDed
> pages first.  And will be eventually able to unmap zapped pages.

Well, there are 3 kind of pages in G.free_pages list after the patch.
1) pages added by alloc_page (GGC_QUIRK_SIZE - 1 pages each time, for p->bytes 
==  G.pagesize only)
2) pages on which free_page has been called during last ggc_collect's 
sweep_pages
3) MADV_DONTNEED hinted pages from release_pages

Pages of 1) and 3) category have the same cost, pages of 2) category are
cheaper to access.  Pages of the 3) category are guaranteed to be at the
end of list (simply because free_pages marks all pages in the G.free_pages
list).  So this patch doesn't make things worse than it was before in this
regard, though maybe we should munmap p->bytes != G.pagesize pages right
away in release_pages instead of MADV_DONTNEED marking them.  They are
rarely to be reused.  For p->bytes == G.pagesize pages (the vast majority)
on the other side 1) shouldn't be really added until there are no 2) and 3)
category pages.

> > Not just that.  MADV_DONTNEED needs to flush the dirty pages from the page
> > tables and when they are touched again, they need to be cleared (or
> > pre-cleared pages inserted).  So, while MADV_DONTNEED is less expensive than
> > munmap + mmap, it is still not free.
> 
> But it's free at madvise time.  munmap is "synchronous" at least (well,
> when file-backed).

The zapping means some TLB flush and page table clearly, so not free even at
madvise time.  And here we are talking about anon memory anyway, so not
file-backed.

Jakub


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/10 Richard Guenther :
> On Fri, Oct 7, 2011 at 11:36 PM, Kai Tietz  wrote:
>> Hello,
>>
>> this is the updated version with the suggestion
>>
>> 2011/10/7 Richard Guenther :
>>> On Thu, Oct 6, 2011 at 4:25 PM, Kai Tietz  wrote:
 +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
 +           && TREE_CODE (arg1) != TRUTH_NOT_EXPR
 +           && simple_operand_p (arg1))
>>>
>>> As I said previously simple_operand_p already rejects covers
>>> comparisons and TRUTH_NOT_EXPR.  Also arg1 had better
>>> TREE_SIDE_EFFECTS set if the comparison might trap, as
>>> it might just be hidden in something more complicated - so
>>> the simple check isn't enough anyway (and if simple_operand_p
>>> would cover it, the check would be better placed there).
>>
>> I reworked simple_operand_p so that it does this special-casing and 
>> additionally
>> also checks for trapping.
>>
 +      if (TREE_CODE (arg0) == code
 +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
 +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
 +       {
 +         tem = build2_loc (loc,
 +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
 +                                                     : TRUTH_OR_EXPR),
 +                           type, TREE_OPERAND (arg0, 1), arg1);
 +         return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
 +                            tem);
>>>
>>> All trees should be folded, don't use plain build without a good reason.
>>
>> Ok, done
>>
 +       }
 +      /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
 +        are simple operands and have no side-effects.  */
 +      if (simple_operand_p (arg0)
 +          && !TREE_SIDE_EFFECTS (arg0))
>>>
>>> Again, the checks you do for arg0 do not match those for arg1.  OTOH
>>> it doesn't matter whether arg0 is simple or not or has side-effects or
>>> not for this transformation, so why check it at all?
>>
>> It is required.  For left-hand operand, if it isn't a logical
>> and/or/xor, we need to check for side-effects (and for trapping).  I
>> see that calling of simple_operand_p is wrong here, as it rejects too
>> much.  Nevertheless the check for side-effects is necessary for having
>> valid sequence-points.  Without that checking a simple test
>>
>> int getter (void);
>>
>> int foo (void)
>> {
>>  int c, r = 0;
>>  while ((c = getter ()) != '"' && c >= 0)
>>    r +=c;
>>  return r;
>> }
>>
>> would give a warning about sequence-points.  As left-hand operand has
>> side-effects, but right-hand not.  If we would combine it as AND, the
>> operands are exchange-able.  So right-hand operand needs to be another
>> ANDIF expression instead.
>> Same apply on trapping.
>>
>>> In fold_truthop we still have the same (albeit more restricted transform),
>>> but guarded with
>>>
>>>  if (BRANCH_COST (optimize_function_for_speed_p (cfun),
>>>                   false) >= 2
>>>
>>> too.  Why not here?  Please delete redundant code in fold_truthop.
>> Well, in general this is the default definition of
>> LOGICAL_OP_NON_SHORT_CIRCUIT, so I missed that.  As for some targets
>> the macro LOGICAL_OP_NON_SHORT_CIRCUIT might be defined differently,
>> it might make sense to check for BRANCH_COST again.
>>
>>> It's also odd that this is only called from fold_truth_andor but has
>>> a more generic name, so maybe rename it to fold_truth_andor_1 on the way.
>>
>> I renamed it.
>>
>>> Richard.
>>
>> ChangeLog
>>
>> 2011-10-07  Kai Tietz  
>>
>>        * fold-const.c (simple_operand_p): Make argument non-const
>>        and add floating-point trapping check, and special cases for
>>        comparisons, and logical-not's.
>>        (fold_truthop): Rename to
>>        (fold_truth_andor_1): function name.
>>        Additionally remove here TRUTH-AND|OR_EXPR generation.
>>        (fold_truth_andor): Handle branching at one place.
>>
>> Bootstrapped and regression-tested for all languages plus Ada and
>> Obj-C++ on x86_64-pc-linux-gnu.
>> Ok for apply?
>>
>> Regards,
>> Kai
>>
>> Index: gcc/gcc/fold-const.c
>> ===
>> --- gcc.orig/gcc/fold-const.c
>> +++ gcc/gcc/fold-const.c
>> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>>                                    tree *, tree *);
>>  static int all_ones_mask_p (const_tree, int);
>>  static tree sign_bit_p (tree, const_tree);
>> -static int simple_operand_p (const_tree);
>> +static int simple_operand_p (tree);
>>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>>  static tree range_predecessor (tree);
>>  static tree range_successor (tree);
>>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
>> tree, tree);
>>  static tree unextend (tree, int, int, tree);
>> -static tree fold_truthop (location_t, enum tree_code, tree, tree,

Re: [4/4] Make SMS schedule register moves

2011-10-10 Thread Richard Sandiford
Ayal Zaks  writes:
>> I agree it's natural to schedule moves for intra-iteration dependencies
>> in the normal get_sched_window way.  But suppose we have a dependency:
>>
>>   A --(T,N,1)--> B
>>
>> that requires two moves M1 and M2.  If we think in terms of cycles
>> (in the SCHED_TIME sense), then this effectively becomes:
>>
>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>
>> because it is now M1 that is fed by both the loop and the incoming edge.
>> But if there is a second dependency:
>>
>>   A --(T,M,0)--> C
>>
>> that also requires two moves, we end up with:
>>
>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>                                        -->(T,M3,-1)--> B
>>
>> and dependence distances of -1 feel a bit weird. :-)  Of course,
>> what we really have are two parallel dependencies:
>>
>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>
>>   A --(T,M1,0)--> M1' -->(T,M2,0)--> M2' -->(T,N3,0)--> B
>>
>> where M1' and M2' occupy the same position as M1 and M2 in the schedule,
>> but are one stage further along.  But we only schedule them once,
>> so if we take the cycle/SCHED_TIME route, we have to introduce
>> dependencies of distance -1.
>>
>
> Interesting; had to digest this distance 1 business, a result of
> thinking in cycles instead of rows (or conversely), and mixing
> dependences with scheduling; here's my understanding, based on your
> explanations:
>
> Suppose a Use is truely dependent on a Def, where both have been
> scheduled at some absolute cycles; think of them as timing the first
> iteration of the loop.
> Assume first that Use appears originally after Def in the original
> instruction sequence of the loop (dependence distance 0). In this
> case, Use requires register moves if its distance D from Def according
> to the schedule is more than ii cycles long -- by the time Use is
> executed, the value it needs is no longer available in the def'd
> register due to intervening occurrences of Def. So in this case, the
> first reg-move (among D/ii) should appear after Def, recording its
> value before the next occurrence of Def overwrites it, and feeding
> subsequent moves as needed before each is overwritten. Thus the
> scheduling window of this first reg-move is within (Def, Def+ii).
>
> Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if
> it remains scheduled before the Def, no reg-move is needed. If, otoh,
> Def is scheduled (D>0 cycles) before Use, breaking the anti-dependence
> between them, (D/ii + 1) reg-moves are needed in order to feed Use
> with the live-in value before Def. So in this case, the first reg-move
> should appear before Def (again feeding subsequent moves as needed
> before each is overwritten). Thus the scheduling window of this first
> reg-move is within (Def-ii, Def).
>
> In any case, each use requires a specific number of reg-moves, which
> begin either before or after the def; it is always safe (though
> potentially redundant) to start reg-moving before the def -- uses that
> do not need the live-in value will ignore it by feeding from a later
> reg-move.

Right.  The distance on the Def->Use dependency is effectively transferred
to the dependency between the Def and first move.

And we can potentially have both kinds of Use at the same time.
We then end up scheduling two moves, one in each window, but require
them to occupy the same row and column.  It feels more convenient to
schedule the earlier of the two moves.

> Q: if we generate reg-moves starting from before the def, would
> redundant reg-moves be eliminated if no use needs access to live-in
> value? If so, would that simplify their generation? (If not, it may be
> interesting to understand how to improve DCE to catch it.)

To be clear, the new version of the patch (unlike the old one)
doesn't emit reg-moves before the def if all uses are distance 0.
We only do it where some or all uses are distance 1.  The first move
before the def is always needed in that case.

So redudant moves are confined to the case where (a) we have more
than one move, (b) we have both distance 0 and distance 1 uses, and
(c) one of the two distance sets requires more moves than the other.
If the distance 0 dependencies require more moves than the distance
1 dependencies, we will have a redudant move in the prologue.
If the distance 1 dependencies require more moves than the distance
0 dependencies, we will have a redudant move in the epilogue.
But I believe that this is already handled by dce.

(For avoidance of doubt, the new code is more accurate than
current trunk in this respect.)

> The issue of assigning stages to reg-moves is mostly relevant for
> prolog and epilog generation, which requires and receives special
> attention -- handled very nicely by ps_num_consecutive_stages! Note
> that currently a simple boolean indicator for (the exceptional case
> of) double stages would suffice, instead of generalizing to arbitrary
> nums of consecutive stages (see any potenti

Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini
. looks like we want to do something else, not printing the number at 
all. See audit trail.


Paolo.



Re: [PATCH] Remove "bogus" g++.dg/init/copy7.C testcase

2011-10-10 Thread Richard Guenther
On Mon, Aug 15, 2011 at 2:42 PM, Richard Guenther  wrote:
>
> The g++.dg/init/copy7.C testcase checks whether the C++ frontend
> guards memcpy it emits via a conditional verifying that src != dst
> because calling memcpy with overlapping source / destination is
> not supported.
>
> The testcase is misguided though (and the C++ frontend was, until
> recently) - the middle-end itself will replace aggregate copies
> with memcpy libcalls if it suits - without such conditional.
> As PR39480 shows (the bug that prompted to "fixing" the C++ frontend),
> the "error" was diagnosed by valgrind, not any real memcpy implemenation.
>
> The argument still holds that no reasonable memcpy implementation
> will reject the src == dest case.  Arguing about explicit cache
> write-allocation is moot, as you'd still have to handle the
> case of memcpy (&a, &a+1, 1) correctly - and thus any reasonable
> implementation would handle the src == dest case explicitly if
> that is necessary.
>
> Thus, the following simply removes the now FAILing testcase on
> the basis that it never was PASSing really (as my modified
> C testcases in PR50079 show).  If we ever encounter a platform
> that fails for memcpy (&a, &a, ...) and we decide it's not the
> platform that is broken we have to invent a fix in the middle-end
> and (conditionally) guard any libcall block moves.
>
> Comments?  Ok to commit?

Ping?  Richard? Jason?

Thanks,
Richard.

> Thanks,
> Richard.
>
> 2011-08-15  Richard Guenther  
>
>        PR middle-end/50079
>        * g++.dg/init/copy7.C: Remove testcase.
>
> Index: gcc/testsuite/g++.dg/init/copy7.C
> ===
> --- gcc/testsuite/g++.dg/init/copy7.C   (revision 177759)
> +++ gcc/testsuite/g++.dg/init/copy7.C   (working copy)
> @@ -1,39 +0,0 @@
> -// PR c++/39480
> -// It isn't always safe to call memcpy with identical arguments.
> -// { dg-do run }
> -
> -extern "C" void abort();
> -extern "C" void *
> -memcpy(void *dest, void *src, __SIZE_TYPE__ n)
> -{
> -  if (dest == src)
> -    abort();
> -  else
> -    {
> -      __SIZE_TYPE__ i;
> -      for (i = 0; i < n; i++)
> -        ((char *)dest)[i] = ((const char*)src)[i];
> -    }
> -}
> -
> -struct A
> -{
> -  double d[10];
> -};
> -
> -struct B: public A
> -{
> -  char bc;
> -};
> -
> -B b;
> -
> -void f(B *a1, B* a2)
> -{
> -  *a1 = *a2;
> -}
> -
> -int main()
> -{
> -  f(&b,&b);
> -}
>


Re: [patch] C6X unwinding/exception handling

2011-10-10 Thread Paul Brook
> Index: libjava/exception.cc
> ===
> --- libjava/exception.cc  (revision 179739)
> +++ libjava/exception.cc  (working copy)
> @@ -135,6 +135,7 @@
>  {
>_Unwind_Ptr Start;
>_Unwind_Ptr LPStart;
> +  _Unwind_Ptr ttype_base;
>const unsigned char *TType;
>const unsigned char *action_table;
>unsigned char ttype_encoding;
> @@ -184,7 +185,7 @@
>_Unwind_Ptr ptr;
> 
>ptr = (_Unwind_Ptr) (info->TType - (i * 4));
> -  ptr = _Unwind_decode_target2(ptr);
> +  ptr = _Unwind_decode_typeinfo_ptr (info->ttype_base, (_Unwind_Word)
> ptr);
> 
>return reinterpret_cast(ptr);
>  }
> @@ -325,6 +326,7 @@
> 
>// Parse the LSDA header.
>p = parse_lsda_header (context, language_specific_data, &info);
> +  info.ttype_base = base_of_encoded_value (info.ttype_encoding, context);
>  #ifdef HAVE_GETIPINFO
>ip = _Unwind_GetIPInfo (context, &ip_before_insn);
>  #else

No.  The purpose of my patch was to remove the arm specific code.

The only difference I can see in this bit of code is that libstdc++ uses 
base_of_encoded_value/read_encoded_value_with_base whereas libjava uses 
context/read_encoded_value.

I expect you want something like the patch below (completely untested).

Paul

Index: exception.cc
===
--- exception.cc(revision 178906)
+++ exception.cc(working copy)
@@ -161,6 +161,11 @@ parse_lsda_header (_Unwind_Context *cont
   info->ttype_encoding = *p++;
   if (info->ttype_encoding != DW_EH_PE_omit)
 {
+#if _GLIBCXX_OVERRIDE_TTYPE_ENCODING
+  /* Older ARM EABI toolchains set this value incorrectly, so use a
+hardcoded OS-specific format.  */
+  info->ttype_encoding = _GLIBCXX_OVERRIDE_TTYPE_ENCODING;
+#endif
   p = read_uleb128 (p, &tmp);
   info->TType = p + tmp;
 }
@@ -176,21 +181,6 @@ parse_lsda_header (_Unwind_Context *cont
   return p;
 }
 
-#ifdef __ARM_EABI_UNWINDER__
-
-static void **
-get_ttype_entry(_Unwind_Context *, lsda_header_info* info, _uleb128_t i)
-{
-  _Unwind_Ptr ptr;
-
-  ptr = (_Unwind_Ptr) (info->TType - (i * 4));
-  ptr = _Unwind_decode_target2(ptr);
-  
-  return reinterpret_cast(ptr);
-}
-
-#else
-
 static void **
 get_ttype_entry (_Unwind_Context *context, lsda_header_info *info, long i)
 {
@@ -202,8 +192,6 @@ get_ttype_entry (_Unwind_Context *contex
   return reinterpret_cast(ptr);
 }
 
-#endif
-
 // Using a different personality function name causes link failures
 // when trying to mix code using different exception handling models.
 #ifdef SJLJ_EXCEPTIONS


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
has to be checked that the LHS code is same as outer CODE, as
otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
LHS, which is of course wrong.

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -111,14 +111,13 @@ static tree decode_field_reference (loca
tree *, tree *);
 static int all_ones_mask_p (const_tree, int);
 static tree sign_bit_p (tree, const_tree);
-static int simple_operand_p (const_tree);
+static int simple_operand_p (tree);
 static tree range_binop (enum tree_code, tree, tree, int, tree, int);
 static tree range_predecessor (tree);
 static tree range_successor (tree);
 static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
 static tree fold_cond_expr_with_comparison (location_t, tree, tree,
tree, tree);
 static tree unextend (tree, int, int, tree);
-static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
 static tree optimize_minmax_comparison (location_t, enum tree_code,
tree, tree, tree);
 static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
@@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
   return lhs;
 }
 
-/* Subroutine for fold_truthop: decode a field reference.
+/* Subroutine for fold_truth_andor_1: decode a field reference.

If EXP is a comparison reference, we return the innermost reference.

@@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
   return NULL_TREE;
 }

-/* Subroutine for fold_truthop: determine if an operand is simple enough
+/* Subroutine for fold_truth_andor_1: determine if an operand is simple enough
to be evaluated unconditionally.  */

 static int
-simple_operand_p (const_tree exp)
+simple_operand_p (tree exp)
 {
+  enum tree_code code;
   /* Strip any conversions that don't change the machine mode.  */
   STRIP_NOPS (exp);

+  code = TREE_CODE (exp);
+
+  /* Handle some trivials   */
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+return (tree_could_trap_p (exp)
+   && simple_operand_p (TREE_OPERAND (exp, 0))
+   && simple_operand_p (TREE_OPERAND (exp, 1)));
+
+  if (FLOAT_TYPE_P (TREE_TYPE (exp))
+  && tree_could_trap_p (exp))
+return false;
+
+  switch (code)
+{
+case SSA_NAME:
+  return true;
+case TRUTH_NOT_EXPR:
+  return simple_operand_p (TREE_OPERAND (exp, 0));
+case BIT_NOT_EXPR:
+  if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
+return false;
+  return simple_operand_p (TREE_OPERAND (exp, 0));
+default:
+  break;
+}
+
   return (CONSTANT_CLASS_P (exp)
- || TREE_CODE (exp) == SSA_NAME
  || (DECL_P (exp)
  && ! TREE_ADDRESSABLE (exp)
  && ! TREE_THIS_VOLATILE (exp)
@@ -4888,7 +4913,7 @@ fold_range_test (location_t loc, enum tr
   return 0;
 }
 
-/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
+/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
bit value.  Arrange things so the extra bits will be set to zero if and
only if C is signed-extended to its full width.  If MASK is nonzero,
it is an INTEGER_CST that should be AND'ed with the extra bits.  */
@@ -5025,8 +5050,8 @@ merge_truthop_with_opposite_arm (locatio
We return the simplified tree or 0 if no optimization is possible.  */

 static tree
-fold_truthop (location_t loc, enum tree_code code, tree truth_type,
- tree lhs, tree rhs)
+fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
+   tree lhs, tree rhs)
 {
   /* If this is the "or" of two comparisons, we can do something if
  the comparisons are NE_EXPR.  If this is the "and", we can do something
@@ -5054,8 +5079,6 @@ fold_truthop (location_t loc, enum tree_
   tree lntype, rntype, result;
   HOST_WIDE_INT first_bit, end_bit;
   int volatilep;
-  tree orig_lhs = lhs, orig_rhs = rhs;
-  enum tree_code orig_code = code;

   /* Start by getting the comparison codes.  Fail if anything is volatile.
  If one operand is a BIT_AND_EXPR with the constant one, treat it as if
@@ -5119,8 +5142,7 @@ fold_truthop (location_t loc, enum tree_
   /* If the RHS can be evaluated unconditionally and its operands are
  simple, it wins to evaluate the RHS unconditionally on machines
  with expensive branches.  In this case, this isn't a comparison
- that can be merged.  Avoid doing this if the RHS is a floating-point
- comparison since those can trap.  */
+ that can be merged.  */

   if (BRANCH_COST (optimize_function_for_speed_p (cfun),
   false) >= 2
@@ -5149,13 +5171,6 @@ fold_truthop (location_t loc, enum tree_
   build2 (BIT_IOR_EXPR, TREE_TYPE (ll_arg),
   ll_arg, rl_arg),
   

Ping^3: PR middle-end/48660: Assigning to BLKmode RESULT_DECL

2011-10-10 Thread Richard Sandiford
Ping for:

http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00558.html

which fixes an expand-time bug in thunks that return BLKmode structures
in registers.  Tested on x86_64-linux-gnu and arm-linux-gnueabi.

Thanks,
Richard


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

On 10/10/2011 02:13 PM, Paolo Carlini wrote:
. looks like we want to do something else, not printing the number at 
all. See audit trail.
An option, for 4.7 at least, would be, instead of just closing 33067 as 
a duplicate of the much more general 49152, doing something like:


Index: c-family/c-pretty-print.c
===
--- c-family/c-pretty-print.c   (revision 179745)
+++ c-family/c-pretty-print.c   (working copy)
@@ -1019,7 +1019,7 @@ static void
 pp_c_floating_constant (c_pretty_printer *pp, tree r)
 {
   real_to_decimal (pp_buffer (pp)->digit_buffer, &TREE_REAL_CST (r),
-  sizeof (pp_buffer (pp)->digit_buffer), 0, 1);
+  sizeof (pp_buffer (pp)->digit_buffer), 6, 1);
   pp_string (pp, pp_buffer(pp)->digit_buffer);
   if (TREE_TYPE (r) == float_type_node)
 pp_character (pp, 'f');

which gives:

33067.C:2:18: error: no match for ‘operator<’ in ‘1.1e+0 < t’

Given in particular that we now have column numbers, as a user of GCC I 
would certainly consider it a good improvement.


Just let me know...

Paolo.


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  wrote:
> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
> has to be checked that the LHS code is same as outer CODE, as
> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
> LHS, which is of course wrong.
>
> Index: gcc/gcc/fold-const.c
> ===
> --- gcc.orig/gcc/fold-const.c
> +++ gcc/gcc/fold-const.c
> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>                                    tree *, tree *);
>  static int all_ones_mask_p (const_tree, int);
>  static tree sign_bit_p (tree, const_tree);
> -static int simple_operand_p (const_tree);
> +static int simple_operand_p (tree);
>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>  static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
> tree, tree);
>  static tree unextend (tree, int, int, tree);
> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>                                        tree, tree, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>   return lhs;
>  }
>
> -/* Subroutine for fold_truthop: decode a field reference.
> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>
>    If EXP is a comparison reference, we return the innermost reference.
>
> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>   return NULL_TREE;
>  }
>
> -/* Subroutine for fold_truthop: determine if an operand is simple enough
> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
> enough
>    to be evaluated unconditionally.  */
>
>  static int
> -simple_operand_p (const_tree exp)
> +simple_operand_p (tree exp)
>  {
> +  enum tree_code code;
>   /* Strip any conversions that don't change the machine mode.  */
>   STRIP_NOPS (exp);
>
> +  code = TREE_CODE (exp);
> +
> +  /* Handle some trivials   */
> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
> +    return (tree_could_trap_p (exp)
> +           && simple_operand_p (TREE_OPERAND (exp, 0))
> +           && simple_operand_p (TREE_OPERAND (exp, 1)));

And that's still wrong.

Stopped reading here.

Richard.

> +  if (FLOAT_TYPE_P (TREE_TYPE (exp))
> +      && tree_could_trap_p (exp))
> +    return false;
> +
> +  switch (code)
> +    {
> +    case SSA_NAME:
> +      return true;
> +    case TRUTH_NOT_EXPR:
> +      return simple_operand_p (TREE_OPERAND (exp, 0));
> +    case BIT_NOT_EXPR:
> +      if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
> +        return false;
> +      return simple_operand_p (TREE_OPERAND (exp, 0));
> +    default:
> +      break;
> +    }
> +
>   return (CONSTANT_CLASS_P (exp)
> -         || TREE_CODE (exp) == SSA_NAME
>          || (DECL_P (exp)
>              && ! TREE_ADDRESSABLE (exp)
>              && ! TREE_THIS_VOLATILE (exp)
> @@ -4888,7 +4913,7 @@ fold_range_test (location_t loc, enum tr
>   return 0;
>  }
>
> -/* Subroutine for fold_truthop: C is an INTEGER_CST interpreted as a P
> +/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
>    bit value.  Arrange things so the extra bits will be set to zero if and
>    only if C is signed-extended to its full width.  If MASK is nonzero,
>    it is an INTEGER_CST that should be AND'ed with the extra bits.  */
> @@ -5025,8 +5050,8 @@ merge_truthop_with_opposite_arm (locatio
>    We return the simplified tree or 0 if no optimization is possible.  */
>
>  static tree
> -fold_truthop (location_t loc, enum tree_code code, tree truth_type,
> -             tree lhs, tree rhs)
> +fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
> +                   tree lhs, tree rhs)
>  {
>   /* If this is the "or" of two comparisons, we can do something if
>      the comparisons are NE_EXPR.  If this is the "and", we can do something
> @@ -5054,8 +5079,6 @@ fold_truthop (location_t loc, enum tree_
>   tree lntype, rntype, result;
>   HOST_WIDE_INT first_bit, end_bit;
>   int volatilep;
> -  tree orig_lhs = lhs, orig_rhs = rhs;
> -  enum tree_code orig_code = code;
>
>   /* Start by getting the comparison codes.  Fail if anything is volatile.
>      If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> @@ -5119,8 +5142,7 @@ fold_truthop (location_t loc, enum tree_
>   /* If the RHS can be evaluated unconditionally and its operands are
>      simple, it wins to evaluate the RHS unconditionally on machines
>      with expensive branches.  In this case, this isn't a comparison
> -     that can be merged.  Avoid doing this if the RHS is a floating-point
> -     comparison since those can trap.  */
> +     th

Re: Ping^3: PR middle-end/48660: Assigning to BLKmode RESULT_DECL

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 2:35 PM, Richard Sandiford
 wrote:
> Ping for:
>
>    http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00558.html
>
> which fixes an expand-time bug in thunks that return BLKmode structures
> in registers.  Tested on x86_64-linux-gnu and arm-linux-gnueabi.

Ok with the changes suggested by Micha.

Richard.

> Thanks,
> Richard
>


Re: Avoid double mangling at WHOPR

2011-10-10 Thread Jan Hubicka
> On Sun, 9 Oct 2011, Jan Hubicka wrote:
> 
> > Hi,
> > whopr currently produce local_static.1234.43124 type symbols. This is 
> > because
> > everything gets mangled at WPA time and then again at ltrans time.  This 
> > simply
> > avoids the second mangling. This save some space & makes WHOPR/non_WHOPR 
> > symbol
> > tables comparable more directly.
> > 
> > Bootstrapped/regtested x86_64-linux, also tested with Mozilla LTO, OK?
> 
> Hmmm.  I'd really like to defer mangling to LTRANS at one point, as
> we will know if there will be conflicts of local symbols at that point
> (another point would be the partitioning code).  So this patch goes
> backward ... (ideally a first step would move it to symbol resolution
> time, another place where we can check for conflicts, and then only
> apply it a LTRANs stage).

Actually it seems to me that mangling at WPA makes more sense - you don't get
symbol name sensitive on partitioning decisions so things go a bit more
consistently. Partitioning depends on global properties of program so any code
that depends on particular partitioning decisions will have tendency to
randomly break and unbreak.

We can not really make any promises about our ability to not mangle particular
symbol (for use in asm code or whatever):  we need to mangle on the occasion of
conflict with another static symbol but also when we decide to promote it
hidden. We have no information about another hidden symbol in the non-LTO
world.  Either linker plugin API needs to be extended by providing us with list
of forbidden names or ability to have "hidden in LTO world" visibility or we
probably need to start using random seeds on all promoted symbols.
(in fact I already do sort of mangling to avoid conflict on comdats that has 
been
brought local and then again promoted global, I just did not noticed it is a 
general
problem back then when I first saw the linker complaining).

Honza


Re: Avoid double mangling at WHOPR

2011-10-10 Thread Richard Guenther
On Mon, 10 Oct 2011, Jan Hubicka wrote:

> > On Sun, 9 Oct 2011, Jan Hubicka wrote:
> > 
> > > Hi,
> > > whopr currently produce local_static.1234.43124 type symbols. This is 
> > > because
> > > everything gets mangled at WPA time and then again at ltrans time.  This 
> > > simply
> > > avoids the second mangling. This save some space & makes WHOPR/non_WHOPR 
> > > symbol
> > > tables comparable more directly.
> > > 
> > > Bootstrapped/regtested x86_64-linux, also tested with Mozilla LTO, OK?
> > 
> > Hmmm.  I'd really like to defer mangling to LTRANS at one point, as
> > we will know if there will be conflicts of local symbols at that point
> > (another point would be the partitioning code).  So this patch goes
> > backward ... (ideally a first step would move it to symbol resolution
> > time, another place where we can check for conflicts, and then only
> > apply it a LTRANs stage).
> 
> Actually it seems to me that mangling at WPA makes more sense - you don't get
> symbol name sensitive on partitioning decisions so things go a bit more
> consistently. Partitioning depends on global properties of program so any code
> that depends on particular partitioning decisions will have tendency to
> randomly break and unbreak.
> 
> We can not really make any promises about our ability to not mangle particular
> symbol (for use in asm code or whatever):  we need to mangle on the occasion 
> of
> conflict with another static symbol but also when we decide to promote it
> hidden. We have no information about another hidden symbol in the non-LTO
> world.  Either linker plugin API needs to be extended by providing us with 
> list
> of forbidden names or ability to have "hidden in LTO world" visibility or we
> probably need to start using random seeds on all promoted symbols.
> (in fact I already do sort of mangling to avoid conflict on comdats that has 
> been
> brought local and then again promoted global, I just did not noticed it is a 
> general
> problem back then when I first saw the linker complaining).

Ok, I see why it makes sense on WPA time.  But then why not mangle
during partitioning?  I think it still makes sense to avoid mangling local
decls, if not for debugging experience.

We do mangle late when we bring symbols local anyway, no?  I also
seem to remember we mangle at LTO time, too ...

Richard.


Re: New warning for expanded vector operations

2011-10-10 Thread Artem Shinkarov
On Mon, Oct 10, 2011 at 12:02 PM, Richard Guenther
 wrote:
> On Fri, Oct 7, 2011 at 9:44 AM, Artem Shinkarov
>  wrote:
>> On Fri, Oct 7, 2011 at 6:22 AM, Artem Shinkarov
>>  wrote:
>>> On Wed, Oct 5, 2011 at 12:35 PM, Richard Guenther
>>>  wrote:
 On Wed, Oct 5, 2011 at 1:28 PM, Artem Shinkarov
  wrote:
> On Wed, Oct 5, 2011 at 9:40 AM, Richard Guenther
>  wrote:
>> On Wed, Oct 5, 2011 at 12:18 AM, Artem Shinkarov
>>  wrote:
>>> Hi
>>>
>>> Here is a patch to inform a programmer about the expanded vector 
>>> operation.
>>> Bootstrapped on x86-unknown-linux-gnu.
>>>
>>> ChangeLog:
>>>
>>>        * gcc/tree-vect-generic.c (expand_vector_piecewise): Adjust to
>>>          produce the warning.
>>>          (expand_vector_parallel): Adjust to produce the warning.
>>
>> Entries start without gcc/, they are relative to the gcc/ChangeLog file.
>
> Sure, sorry.
>
>>>          (lower_vec_shuffle): Adjust to produce the warning.
>>>        * gcc/common.opt: New warning Wvector-operation-expanded.
>>>        * gcc/doc/invoke.texi: Document the wawning.
>>>
>>>
>>> Ok?
>>
>> I don't like the name -Wvector-operation-expanded.  We emit a
>> similar warning for missed inline expansions with -Winline, so
>> maybe -Wvector-extensions (that's the name that appears
>> in the C extension documentation).
>
> Hm, I don't care much about the name, unless it gets clear what the
> warning is used for.  I am not really sure that Wvector-extensions
> makes it clear.  Also, I don't see anything bad if the warning will
> pop up during the vectorisation. Any vector operation performed
> outside the SIMD accelerator looks suspicious, because it actually
> doesn't improve performance.  Such a warning during the vectorisation
> could mean that a programmer forgot some flag, or the constant
> propagation failed to deliver a constant, or something else.
>
> Conceptually the text I am producing is not really a warning, it is
> more like an information, but I am not aware of the mechanisms that
> would allow me to introduce a flag triggering inform () or something
> similar.
>
> What I think we really need to avoid is including this warning in the
> standard Ox.
>
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>> +
>> +  warning_at (loc, OPT_Wvector_operation_expanded,
>> +             "vector operation will be expanded piecewise");
>>
>>   v = VEC_alloc(constructor_elt, gc, (nunits + delta - 1) / delta);
>>   for (i = 0; i < nunits;
>> @@ -260,6 +264,10 @@ expand_vector_parallel (gimple_stmt_iter
>>   tree result, compute_type;
>>   enum machine_mode mode;
>>   int n_words = tree_low_cst (TYPE_SIZE_UNIT (type), 1) / UNITS_PER_WORD;
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>> +
>> +  warning_at (loc, OPT_Wvector_operation_expanded,
>> +             "vector operation will be expanded in parallel");
>>
>> what's the difference between 'piecewise' and 'in parallel'?
>
> Parallel is a little bit better for performance than piecewise.

 I see.  That difference should probably be documented, maybe with
 an example.

 Richard.

>> @@ -301,16 +309,15 @@ expand_vector_addition (gimple_stmt_iter
>>  {
>>   int parts_per_word = UNITS_PER_WORD
>>                       / tree_low_cst (TYPE_SIZE_UNIT (TREE_TYPE (type)), 
>> 1);
>> +  location_t loc = gimple_location (gsi_stmt (*gsi));
>>
>>   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
>>       && parts_per_word >= 4
>>       && TYPE_VECTOR_SUBPARTS (type) >= 4)
>> -    return expand_vector_parallel (gsi, f_parallel,
>> -                                  type, a, b, code);
>> +    return expand_vector_parallel (gsi, f_parallel, type, a, b, code);
>>   else
>> -    return expand_vector_piecewise (gsi, f,
>> -                                   type, TREE_TYPE (type),
>> -                                   a, b, code);
>> +    return expand_vector_piecewise (gsi, f, type,
>> +                                   TREE_TYPE (type), a, b, code);
>>  }
>>
>>  /* Check if vector VEC consists of all the equal elements and
>>
>> unless i miss something loc is unused here.  Please avoid random
>> whitespace changes (just review your patch yourself before posting
>> and revert pieces that do nothing).
>
> Yes you are right, sorry.
>
>> +@item -Wvector-operation-expanded
>> +@opindex Wvector-operation-expanded
>> +@opindex Wno-vector-operation-expanded
>> +Warn if vector operation is not implemented via SIMD capabilities of the
>> +architecture. Mainly useful for the performance tuning.
>>
>> I'd mention that this is for vector operations as of the C extension

Re: Avoid double mangling at WHOPR

2011-10-10 Thread Jan Hubicka
> > Actually it seems to me that mangling at WPA makes more sense - you don't 
> > get
> > symbol name sensitive on partitioning decisions so things go a bit more
> > consistently. Partitioning depends on global properties of program so any 
> > code
> > that depends on particular partitioning decisions will have tendency to
> > randomly break and unbreak.
> > 
> > We can not really make any promises about our ability to not mangle 
> > particular
> > symbol (for use in asm code or whatever):  we need to mangle on the 
> > occasion of
> > conflict with another static symbol but also when we decide to promote it
> > hidden. We have no information about another hidden symbol in the non-LTO
> > world.  Either linker plugin API needs to be extended by providing us with 
> > list
> > of forbidden names or ability to have "hidden in LTO world" visibility or we
> > probably need to start using random seeds on all promoted symbols.
> > (in fact I already do sort of mangling to avoid conflict on comdats that 
> > has been
> > brought local and then again promoted global, I just did not noticed it is 
> > a general
> > problem back then when I first saw the linker complaining).
> 
> Ok, I see why it makes sense on WPA time.  But then why not mangle
> during partitioning?  I think it still makes sense to avoid mangling local
> decls, if not for debugging experience.

Yeah, we could do that. Debugging experience is quite good reason (though it 
will
also make bogus asm statements magically work and break on random basis. In a 
way
just breaking them seems more sensible behaviour to me ;) ).
> 
> We do mangle late when we bring symbols local anyway, no?  I also
> seem to remember we mangle at LTO time, too ...

Hmm, I think we mangle at stream in since we do make hashtables based on symbol 
names
that are supposed to be unique, but perhaps I am wrong.
LTO time you mean at compilation time when streaming out? I am not aware of 
that.

Honza
> 
> Richard.


[Patch,AVR]: Reuse __tablejump2__

2011-10-10 Thread Georg-Johann Lay
This patch does better re-usage of __tablejump2__ from libgcc so that
switch/case statements with jump tables become smaller because the sequence
needs 6 to 9 words compared to 2 when JMPing to __tablejump2__.

Moreover, __tablejump2__ does no more rely in the content of EIND (by using
EIJMP) and uses RET instead.

The insn conditions of tablejump insns now are the same as in
avr_output_addr_vec_elt.

Ok for trunk?

Johann

* config/avr/avr.md (*tablejump_rjmp): Change insn condition to
!AVR_HAVE_JMP_CALL.
(*tablejump_lib): Change insn condition to AVR_HAVE_JMP_CALL.
(*tablejump_enh, *tablejump): Remove insns.
* config/avr/libgcc.S (__tablejump__): Use RET instead of EIND +
EIJM for indirect jump.  Use LPM Z+ where available.
Index: config/avr/libgcc.S
===
--- config/avr/libgcc.S	(revision 179744)
+++ config/avr/libgcc.S	(working copy)
@@ -821,13 +821,17 @@ ENDF __tablejump2__
 
 DEFUN __tablejump__
 #if defined (__AVR_HAVE_LPMX__)
+#if defined (__AVR_HAVE_EIJMP_EICALL__)
+	lpm  __tmp_reg__, Z+		
+	push __tmp_reg__
+	lpm  __tmp_reg__, Z		
+	push __tmp_reg__
+	push __zero_reg__
+	ret
+#else
 	lpm	__tmp_reg__, Z+
 	lpm	r31, Z
 	mov	r30, __tmp_reg__
-
-#if defined (__AVR_HAVE_EIJMP_EICALL__)
-	eijmp
-#else
 	ijmp
 #endif
 
Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 179744)
+++ config/avr/avr.md	(working copy)
@@ -3719,62 +3719,36 @@ (define_insn "*indirect_jump_avr6"
(set_attr "cc" "none")])
 
 ;; table jump
+;; For entries in jump table see avr_output_addr_vec_elt.
 
-;; Table made from "rjmp" instructions for <=8K devices.
+;; Table made from "rjmp .L" instructions for <= 8K devices.
 (define_insn "*tablejump_rjmp"
-  [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "!z,*r")]
-			UNSPEC_INDEX_JMP))
+  [(set (pc)
+(unspec:HI [(match_operand:HI 0 "register_operand" "!z,*r")]
+   UNSPEC_INDEX_JMP))
(use (label_ref (match_operand 1 "" "")))
(clobber (match_dup 0))]
-  "(!AVR_HAVE_JMP_CALL) && (!AVR_HAVE_EIJMP_EICALL)"
+  "!AVR_HAVE_JMP_CALL"
   "@
 	ijmp
 	push %A0\;push %B0\;ret"
   [(set_attr "length" "1,3")
(set_attr "cc" "none,none")])
 
-;; Not a prologue, but similar idea - move the common piece of code to libgcc.
+;; Move the common piece of code to libgcc.
+;; Table made from ".word gs(.L)" addresses for > 8K devices.
+;; Read jump address from table and perform indirect jump.
 (define_insn "*tablejump_lib"
-  [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "z")]
-			UNSPEC_INDEX_JMP))
+  [(set (pc)
+(unspec:HI [(match_operand:HI 0 "register_operand" "z")]
+   UNSPEC_INDEX_JMP))
(use (label_ref (match_operand 1 "" "")))
(clobber (match_dup 0))]
-  "AVR_HAVE_JMP_CALL && TARGET_CALL_PROLOGUES"
-  "%~jmp __tablejump2__"
+  "AVR_HAVE_JMP_CALL"
+  "jmp __tablejump2__"
   [(set_attr "length" "2")
(set_attr "cc" "clobber")])
 
-(define_insn "*tablejump_enh"
-  [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "z")]
-			UNSPEC_INDEX_JMP))
-   (use (label_ref (match_operand 1 "" "")))
-   (clobber (match_dup 0))]
-  "AVR_HAVE_JMP_CALL && AVR_HAVE_LPMX"
-  "lsl r30
-	rol r31
-	lpm __tmp_reg__,Z+
-	lpm r31,Z
-	mov r30,__tmp_reg__
-	%!ijmp"
-  [(set_attr "length" "6")
-   (set_attr "cc" "clobber")])
-
-(define_insn "*tablejump"
-  [(set (pc) (unspec:HI [(match_operand:HI 0 "register_operand" "z")]
-			UNSPEC_INDEX_JMP))
-   (use (label_ref (match_operand 1 "" "")))
-   (clobber (match_dup 0))]
-  "AVR_HAVE_JMP_CALL && !AVR_HAVE_EIJMP_EICALL"
-  "lsl r30
-	rol r31
-	lpm
-	inc r30
-	push r0
-	lpm
-	push r0
-	ret"
-  [(set_attr "length" "8")
-   (set_attr "cc" "clobber")])
 
 (define_expand "casesi"
   [(set (match_dup 6)


Re: Avoid double mangling at WHOPR

2011-10-10 Thread Richard Guenther
On Mon, 10 Oct 2011, Jan Hubicka wrote:

> > > Actually it seems to me that mangling at WPA makes more sense - you don't 
> > > get
> > > symbol name sensitive on partitioning decisions so things go a bit more
> > > consistently. Partitioning depends on global properties of program so any 
> > > code
> > > that depends on particular partitioning decisions will have tendency to
> > > randomly break and unbreak.
> > > 
> > > We can not really make any promises about our ability to not mangle 
> > > particular
> > > symbol (for use in asm code or whatever):  we need to mangle on the 
> > > occasion of
> > > conflict with another static symbol but also when we decide to promote it
> > > hidden. We have no information about another hidden symbol in the non-LTO
> > > world.  Either linker plugin API needs to be extended by providing us 
> > > with list
> > > of forbidden names or ability to have "hidden in LTO world" visibility or 
> > > we
> > > probably need to start using random seeds on all promoted symbols.
> > > (in fact I already do sort of mangling to avoid conflict on comdats that 
> > > has been
> > > brought local and then again promoted global, I just did not noticed it 
> > > is a general
> > > problem back then when I first saw the linker complaining).
> > 
> > Ok, I see why it makes sense on WPA time.  But then why not mangle
> > during partitioning?  I think it still makes sense to avoid mangling local
> > decls, if not for debugging experience.
> 
> Yeah, we could do that. Debugging experience is quite good reason (though it 
> will
> also make bogus asm statements magically work and break on random basis. In a 
> way
> just breaking them seems more sensible behaviour to me ;) ).

;)

> > We do mangle late when we bring symbols local anyway, no?  I also
> > seem to remember we mangle at LTO time, too ...
> 
> Hmm, I think we mangle at stream in since we do make hashtables based on 
> symbol names
> that are supposed to be unique, but perhaps I am wrong.

I think we do not hash local symbols, so that shouldn't be an issue.

> LTO time you mean at compilation time when streaming out? I am not aware of 
> that.

Maybe that changed then or I misremember.

Can you try the "obvious" and simply mangle all local statics at
partitioning time?  (leaving the non-conflict case for a further
improvement)

Richard.


Re: Avoid double mangling at WHOPR

2011-10-10 Thread Jan Hubicka
> Can you try the "obvious" and simply mangle all local statics at
> partitioning time?  (leaving the non-conflict case for a further
> improvement)

Hmm, it needs to be done with non-WHOPR too, but sure I can give it a try.
I should however finally push out the weakref bits.  (here one can produce
weakref of static that is bit nonsential but possible and that breaks at
renaming time).  Will try to look into that today.

Will lto-symtab be happy about this?

Honza
> 
> Richard.


Re: patch ping: thread testing infrastructure

2011-10-10 Thread Aldy Hernandez

On 10/07/11 17:50, Mike Stump wrote:

On Oct 7, 2011, at 2:52 PM, Aldy Hernandez wrote:

First, thanks so much for tackling this review.  I don't think anyone's overly 
enthusiastic about reviewing dejagnu crap^H^H^H^Hcode.


I'll review it.  :-)  The last version looks fine to me, watch out for failures.


How do you like this approach?


Ok.


Thank you.  Tested on x86-64 Linux and committed.

I will now adjust the cxx-memory-model branch accordingly and merge 
these changes into the branch.


Aldy


Re: [PATCH] Mark static const strings as read-only.

2011-10-10 Thread Eric Botcazou
> The patch makes sure that the src_mem computed here in
> expand_builtin_memcpy is marked as MEM_READONLY_P:
> ...
>   src_mem = get_memory_rtx (src, len);
> ...
>
> build and reg-tested on i686, arm, and mips.
>
> OK for trunk?

Do you get the same result by calling gen_const_mem from build_constant_desc 
instead of gen_rtx_MEM?  If so, this would be better I think.

-- 
Eric Botcazou


Re: [PATCH 3/5] On a Linux kernel ask explicitely for a huge page in ggc

2011-10-10 Thread Andi Kleen
On Mon, Oct 10, 2011 at 12:15:14PM +0200, Richard Guenther wrote:
> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> > From: Andi Kleen 
> >
> > Benchmarks show slightly faster build times on a kernel
> > build, near the measurement error unfortunately.
> >
> > This will only work with a recent glibc that defines MADV_HUGEPAGE.
> 
> Will partial unmaps fail or split the page?

They split the page.

-Andi


Re: [PATCH 2/5] Increase the GGC quite size to 2MB

2011-10-10 Thread Jan Hubicka
> From: Andi Kleen 
> 
> Using 2MB allows modern kernels to use 2MB huge pages on x86.
> 
> gcc/:
> 
> 2011-10-08   Andi Kleen 
> 
>   * ggc-page.c (GGC_QUIRE_SIZE): Increase to 512
> ---
>  gcc/ggc-page.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
> index b0b3b3f..1f52b56 100644
> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -469,7 +469,7 @@ static struct globals
> can override this by defining GGC_QUIRE_SIZE explicitly.  */
>  #ifndef GGC_QUIRE_SIZE
>  # ifdef USING_MMAP
> -#  define GGC_QUIRE_SIZE 256
> +#  define GGC_QUIRE_SIZE 512 /* 2MB for 4K pages */
Perhaps comment that 2MB was chosen to help Kernel huge pages logic?

Honza
>  # else
>  #  define GGC_QUIRE_SIZE 16
>  # endif
> -- 
> 1.7.5.4


Re: [PATCH 3/5] On a Linux kernel ask explicitely for a huge page in ggc

2011-10-10 Thread Andi Kleen
On Mon, Oct 10, 2011 at 12:29:03PM +0200, Jakub Jelinek wrote:
> On Mon, Oct 10, 2011 at 12:15:14PM +0200, Richard Guenther wrote:
> > On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> > > From: Andi Kleen 
> > >
> > > Benchmarks show slightly faster build times on a kernel
> > > build, near the measurement error unfortunately.
> > >
> > > This will only work with a recent glibc that defines MADV_HUGEPAGE.
> > 
> > Will partial unmaps fail or split the page?
> 
> I think it will not do anything, because with G.pagesize * GGC_QUIRE_SIZE

On large builds (LTO) I see the vmstat THP counter increasing, 
it doesn't do too much on small builds.

> being just 2MB (and most likely not 2MB aligned either) it would do
> something only if it happens to be 2MB aligned or the following VMA
> is also MADV_HUGEPAGE hinted and no pages in the 2MB region have been
> paged in yet.
> So, either we need to ensure that the address is likely 2MB aligned,
> or use on x86_64 even bigger GGC_QUIRE_SIZE (such as 16MB or 32MB) and
> then huge pages would be likely used at least for the aligned inner
> huge pages in the region.

Hmm, that gets too complicated for mee. I'll drop the patch for now.
It's not strictly needed to fix the fragmentation problem
and it's also possible to force THP from the kernel.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH 4/5] Add a freeing threshold for the garbage collector.

2011-10-10 Thread Andi Kleen
On Mon, Oct 10, 2011 at 12:20:53PM +0200, Richard Guenther wrote:
> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
> > From: Andi Kleen 
> >
> > Add a threshold to avoid freeing pages back too early to the OS.
> > This avoid virtual memory map fragmentation.
> >
> > Based on a idea from Honza
> 
> Less than 20% looks high.  Shouldn't ggc-free-min be enough to
> avoid fragmentation?

It depends on the working set. If there's more to garbage collect
than max(ggc-free-min, threshold*total) a host without MADV_DONTNEED
will get holes. And ggc-free-min isn't very much on a large 
build.

So it seems safer to me to have a threshold which adjusts for large
working sets. What value do you prefer instead of 20%? (or just 0)

> 
> This will hide gc bugs with always-collect (ggc-checking), so
> the parameter(s) need to be adjusted for that case to always
> give pages back.  The current values should probably be printed
> where the two existing ones are printed as well (with -v).

Will fix.
-Andi


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Andi Kleen
> Shouldn't we prefer still "mapped" pages when allocating?  Thus, keep
> the freepages list "sorted"?

Possibly. I can look at it in a followup if you want. 
I would prefer to not complicate this patch too much.

> 
> With the new params to call release_pages less, how does this
> interact with using MADV_DONTNEED?  The only reason to delay
> MADV_DONTNEED is to avoid splitting huge-pages?  Which would
> mean that we should rather be better at controlling where we allocate
> from from the free-list?

I first had a patch that tried to cluster inside the freelist
with multiple passes (and only free aligned quire clusters first), but it 
ran into various problems, so I chose this simpler approach.

With MADV_DONTNEED the param is not really needed I think,
I mainly added the param for the benefit of hosts that don't 
have MADV_DONTNEED to let them not suffer from fragmentation too much.
It would be possible to set the thresholds all to 0 if MADV_DONTNEED
is available.

-Andi


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/10 Richard Guenther :
> On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  wrote:
>> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
>> has to be checked that the LHS code is same as outer CODE, as
>> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
>> LHS, which is of course wrong.
>>
>> Index: gcc/gcc/fold-const.c
>> ===
>> --- gcc.orig/gcc/fold-const.c
>> +++ gcc/gcc/fold-const.c
>> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>>                                    tree *, tree *);
>>  static int all_ones_mask_p (const_tree, int);
>>  static tree sign_bit_p (tree, const_tree);
>> -static int simple_operand_p (const_tree);
>> +static int simple_operand_p (tree);
>>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>>  static tree range_predecessor (tree);
>>  static tree range_successor (tree);
>>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
>> tree, tree);
>>  static tree unextend (tree, int, int, tree);
>> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>>                                        tree, tree, tree);
>>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>>   return lhs;
>>  }
>>
>> -/* Subroutine for fold_truthop: decode a field reference.
>> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>>
>>    If EXP is a comparison reference, we return the innermost reference.
>>
>> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>>   return NULL_TREE;
>>  }
>>
>> -/* Subroutine for fold_truthop: determine if an operand is simple enough
>> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
>> enough
>>    to be evaluated unconditionally.  */
>>
>>  static int
>> -simple_operand_p (const_tree exp)
>> +simple_operand_p (tree exp)
>>  {
>> +  enum tree_code code;
>>   /* Strip any conversions that don't change the machine mode.  */
>>   STRIP_NOPS (exp);
>>
>> +  code = TREE_CODE (exp);
>> +
>> +  /* Handle some trivials   */
>> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
>> +    return (tree_could_trap_p (exp)
>> +           && simple_operand_p (TREE_OPERAND (exp, 0))
>> +           && simple_operand_p (TREE_OPERAND (exp, 1)));
>
> And that's still wrong.
>
> Stopped reading here.
>
> Richard.

Oh, there is a not missing.  I didn't spot that, sorry.

To the point why we need to handle comparisons within simple_operand_p.

If we reject comparisons and logical not here, we won't have any
branching optimization anymore, as this the patch moves into
fold_truthandor.

The result with rejecting in simple_operand_p compares and logic-not
provides for the following example:

extern int doo (void);
extern int doo2 (void);

int foo (int a, int b, int c, int d, int e, int f)
{
  if (a && b && c && d && e && f)
return doo ();
  return doo2 ();
}

we get the following gimple dump (-fdump-tree-gimple):

foo (int a, int b, int c, int d, int e, int f)
{
  int D.2752;

  if (a != 0) goto ; else goto ;
  :
  if (b != 0) goto ; else goto ;
  :
  if (c != 0) goto ; else goto ;
  :
  if (d != 0) goto ; else goto ;
  :
  if (e != 0) goto ; else goto ;
  :
  if (f != 0) goto ; else goto ;
  :
  D.2752 = doo ();
  return D.2752;
  :
  :
  :
  :
  :
  :
  D.2752 = doo2 ();
  return D.2752;
}

So this result is caused by the fact that a logical-and/or is always a
comparison.  As we are rejecting comparisons/logical-not,  we end up
with a sequence of TRUTH_(AND|OR)_IFs.  So the hole loop in
fold_truth_andor for trying to pack simple and side-effect-less
operands in tupels of two is useless.

For the new patch (attached with proper ! fix for compares) we get for
the same sample gimple output:

foo (int a, int b, int c, int d, int e, int f)
{
  _Bool D.2740;
  _Bool D.2741;
  _Bool D.2742;
  _Bool D.2745;
  _Bool D.2746;
  _Bool D.2747;
  _Bool D.2750;
  _Bool D.2751;
  _Bool D.2752;
  int D.2755;

  D.2740 = a != 0;
  D.2741 = b != 0;
  D.2742 = D.2740 & D.2741;
  if (D.2742 != 0) goto ; else goto ;
  :
  D.2745 = c != 0;
  D.2746 = d != 0;
  D.2747 = D.2745 & D.2746;
  if (D.2747 != 0) goto ; else goto ;
  :
  D.2750 = e != 0;
  D.2751 = f != 0;
  D.2752 = D.2750 & D.2751;
  if (D.2752 != 0) goto ; else goto ;
  :
  D.2755 = doo ();
  return D.2755;
  :
  :
  :
  D.2755 = doo2 ();
  return D.2755;
}

which is the proper output for high branching cost, as it tries to
avoid branches and not to tend them.

--
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -111,14 +111,13 @@ static tree decode_field_reference (loca
 

[PATCH] Fix PR50195

2011-10-10 Thread Richard Guenther

Canonicalize x*x to pow(x,2) only when optimizing which is when
we do optimized expansion.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-10-10  Richard Guenther  

PR middle-end/50195
* fold-const.c (fold_binary_loc): Canonicalize x*x to pow (x, 2)
only when optimizing.

* gcc.dg/builtins-47.c: Optimize.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 179738)
+++ gcc/fold-const.c(working copy)
@@ -10693,9 +10693,9 @@ fold_binary_loc (location_t loc,
}
}
 
- /* Optimize x*x as pow(x,2.0), which is expanded as x*x.  */
+ /* Canonicalize x*x as pow(x,2.0), which is expanded as x*x.  */
  if (!in_gimple_form
- && optimize_function_for_speed_p (cfun)
+ && optimize
  && operand_equal_p (arg0, arg1, 0))
{
  tree powfn = mathfn_built_in (type, BUILT_IN_POW);
Index: gcc/testsuite/gcc.dg/builtins-47.c
===
--- gcc/testsuite/gcc.dg/builtins-47.c  (revision 179738)
+++ gcc/testsuite/gcc.dg/builtins-47.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do run } */
-/* { dg-options "-ffast-math -fdump-tree-gimple" } */
+/* { dg-options "-O -ffast-math -fdump-tree-gimple" } */
 
 extern double sqrt (double);
 extern double pow (double, double);


Re: [PATCH 4/5] Add a freeing threshold for the garbage collector.

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 3:58 PM, Andi Kleen  wrote:
> On Mon, Oct 10, 2011 at 12:20:53PM +0200, Richard Guenther wrote:
>> On Sun, Oct 9, 2011 at 9:55 PM, Andi Kleen  wrote:
>> > From: Andi Kleen 
>> >
>> > Add a threshold to avoid freeing pages back too early to the OS.
>> > This avoid virtual memory map fragmentation.
>> >
>> > Based on a idea from Honza
>>
>> Less than 20% looks high.  Shouldn't ggc-free-min be enough to
>> avoid fragmentation?
>
> It depends on the working set. If there's more to garbage collect
> than max(ggc-free-min, threshold*total) a host without MADV_DONTNEED
> will get holes. And ggc-free-min isn't very much on a large
> build.
>
> So it seems safer to me to have a threshold which adjusts for large
> working sets. What value do you prefer instead of 20%? (or just 0)

I'm not sure honestly - 10% maybe?  I realize it's quite arbitrary ...

>>
>> This will hide gc bugs with always-collect (ggc-checking), so
>> the parameter(s) need to be adjusted for that case to always
>> give pages back.  The current values should probably be printed
>> where the two existing ones are printed as well (with -v).
>
> Will fix.
> -Andi
>


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 4:04 PM, Andi Kleen  wrote:
>> Shouldn't we prefer still "mapped" pages when allocating?  Thus, keep
>> the freepages list "sorted"?
>
> Possibly. I can look at it in a followup if you want.
> I would prefer to not complicate this patch too much.
>
>>
>> With the new params to call release_pages less, how does this
>> interact with using MADV_DONTNEED?  The only reason to delay
>> MADV_DONTNEED is to avoid splitting huge-pages?  Which would
>> mean that we should rather be better at controlling where we allocate
>> from from the free-list?
>
> I first had a patch that tried to cluster inside the freelist
> with multiple passes (and only free aligned quire clusters first), but it
> ran into various problems, so I chose this simpler approach.
>
> With MADV_DONTNEED the param is not really needed I think,
> I mainly added the param for the benefit of hosts that don't
> have MADV_DONTNEED to let them not suffer from fragmentation too much.
> It would be possible to set the thresholds all to 0 if MADV_DONTNEED
> is available.

So can we move the param patch back as a possible followup?

Thanks,
Richard.

> -Andi
>


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz  wrote:
> 2011/10/10 Richard Guenther :
>> On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  wrote:
>>> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
>>> has to be checked that the LHS code is same as outer CODE, as
>>> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
>>> LHS, which is of course wrong.
>>>
>>> Index: gcc/gcc/fold-const.c
>>> ===
>>> --- gcc.orig/gcc/fold-const.c
>>> +++ gcc/gcc/fold-const.c
>>> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>>>                                    tree *, tree *);
>>>  static int all_ones_mask_p (const_tree, int);
>>>  static tree sign_bit_p (tree, const_tree);
>>> -static int simple_operand_p (const_tree);
>>> +static int simple_operand_p (tree);
>>>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>>>  static tree range_predecessor (tree);
>>>  static tree range_successor (tree);
>>>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>>>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
>>> tree, tree);
>>>  static tree unextend (tree, int, int, tree);
>>> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>>>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>>>                                        tree, tree, tree);
>>>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>>> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>>>   return lhs;
>>>  }
>>>
>>> -/* Subroutine for fold_truthop: decode a field reference.
>>> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>>>
>>>    If EXP is a comparison reference, we return the innermost reference.
>>>
>>> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>>>   return NULL_TREE;
>>>  }
>>>
>>> -/* Subroutine for fold_truthop: determine if an operand is simple enough
>>> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
>>> enough
>>>    to be evaluated unconditionally.  */
>>>
>>>  static int
>>> -simple_operand_p (const_tree exp)
>>> +simple_operand_p (tree exp)
>>>  {
>>> +  enum tree_code code;
>>>   /* Strip any conversions that don't change the machine mode.  */
>>>   STRIP_NOPS (exp);
>>>
>>> +  code = TREE_CODE (exp);
>>> +
>>> +  /* Handle some trivials   */
>>> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
>>> +    return (tree_could_trap_p (exp)
>>> +           && simple_operand_p (TREE_OPERAND (exp, 0))
>>> +           && simple_operand_p (TREE_OPERAND (exp, 1)));
>>
>> And that's still wrong.
>>
>> Stopped reading here.
>>
>> Richard.
>
> Oh, there is a not missing.  I didn't spot that, sorry.
>
> To the point why we need to handle comparisons within simple_operand_p.
>
> If we reject comparisons and logical not here, we won't have any
> branching optimization anymore, as this the patch moves into
> fold_truthandor.
>
> The result with rejecting in simple_operand_p compares and logic-not
> provides for the following example:

But you change what simple_operand_p accepts and thus change what
fold_truthop accepts as operands to its simplifications.

Richard.

> extern int doo (void);
> extern int doo2 (void);
>
> int foo (int a, int b, int c, int d, int e, int f)
> {
>  if (a && b && c && d && e && f)
>    return doo ();
>  return doo2 ();
> }
>
> we get the following gimple dump (-fdump-tree-gimple):
>
> foo (int a, int b, int c, int d, int e, int f)
> {
>  int D.2752;
>
>  if (a != 0) goto ; else goto ;
>  :
>  if (b != 0) goto ; else goto ;
>  :
>  if (c != 0) goto ; else goto ;
>  :
>  if (d != 0) goto ; else goto ;
>  :
>  if (e != 0) goto ; else goto ;
>  :
>  if (f != 0) goto ; else goto ;
>  :
>  D.2752 = doo ();
>  return D.2752;
>  :
>  :
>  :
>  :
>  :
>  :
>  D.2752 = doo2 ();
>  return D.2752;
> }
>
> So this result is caused by the fact that a logical-and/or is always a
> comparison.  As we are rejecting comparisons/logical-not,  we end up
> with a sequence of TRUTH_(AND|OR)_IFs.  So the hole loop in
> fold_truth_andor for trying to pack simple and side-effect-less
> operands in tupels of two is useless.
>
> For the new patch (attached with proper ! fix for compares) we get for
> the same sample gimple output:
>
> foo (int a, int b, int c, int d, int e, int f)
> {
>  _Bool D.2740;
>  _Bool D.2741;
>  _Bool D.2742;
>  _Bool D.2745;
>  _Bool D.2746;
>  _Bool D.2747;
>  _Bool D.2750;
>  _Bool D.2751;
>  _Bool D.2752;
>  int D.2755;
>
>  D.2740 = a != 0;
>  D.2741 = b != 0;
>  D.2742 = D.2740 & D.2741;
>  if (D.2742 != 0) goto ; else goto ;
>  :
>  D.2745 = c != 0;
>  D.2746 = d != 0;
>  D.2747 = D.2745 & D.2746;
>  if (D.2747 != 0) goto ; else goto ;
>  :
>  D.2750 = e != 0;
>  D.2751 = f != 0;
>  D.2752 = D.2750 & D.2751;
>  if (D.2752 != 0) goto ; else goto ;
>  :
>  D.2755 = doo ();
>  return D.2755;
>  :
>  :
>  :
>  D.2755 = doo2 

Re: [PATCH] Don't assume that constants can clobber vtbl

2011-10-10 Thread Martin Jambor
Hi,

sorry that taking care of the devirtualization patches takes me longer
than expected for various reasons.  But I have not forgotten about
this.

On Sat, Oct 01, 2011 at 01:58:57PM +1300, Maxim Kuvyrkov wrote:
> This patch makes detect_type_change analysis assume that only ADDR_EXPRs can 
> be assigned to vtable entries.
> 
> Initially, the patch made a less strict assumption that constants
> are not assigned to vtables.  I then bumped the assumption to "only
> ADDR_EXPRs can be assigned to vtables".  I have this patch since GCC
> 4.6 and did not came across a testcase that would invalidate either
> of the assumptions.
> 

> diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
> index 3e56e70..491bb11 100644
> --- a/gcc/ipa-prop.c
> +++ b/gcc/ipa-prop.c
> @@ -320,6 +320,7 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
>else if (is_gimple_assign (stmt))
>  {
>tree lhs = gimple_assign_lhs (stmt);
> +  tree rhs;
> 
>if (!AGGREGATE_TYPE_P (TREE_TYPE (lhs)))
> {
> @@ -334,6 +335,13 @@ stmt_may_be_vtbl_ptr_store (gimple stmt)
>  if there is a field corresponding to the offset and if
> so, procee> d
>  almost like if it was a component ref.  */
> }

Just before this, there is the following test (guarded by
flag_strict_aliasing per explicit Richi's request). 

  if (flag_strict_aliasing
  && !POINTER_TYPE_P (TREE_TYPE (lhs)))
return false;

Why is it not enough to catch integer constants?

I'd be OK with ignoring invariants which are not pointers but those
should already be handled by the test above.  Your code also ignores
even variable right hand sides which I think might be dangerous,
e.g. if SRA ever decided to copy an object by copying all fields.

Thanks,

Martin


> +
> +  /* Assume that only ADDR_EXPRs can be assigned to vtables.
> +In particular, ignore *ptr =  when searching for aliasing
> +entries.  */
> +  rhs = gimple_assign_rhs1 (stmt);
> +  if (TREE_CODE (rhs) != ADDR_EXPR)
> +   return false;
>  }
>return true;
>  }
> 


> 
> Martin, you are the author of stmt_may_be_vtbl_ptr_store; is there
> any reaso> n to assume that something other than ADDR_EXPR can be
> assigned to a vtable?
> 
> Bootstrapped and regtested on x86_64-linux-gnu {-m64/-m32} with no 
> regressions.
> 
> OK for trunk?
> 
> Thank you,
> 
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
> 
> 



fsf-gcc-vtbl-assign.patch
Description: Binary data


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/10 Richard Guenther :
> On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz  wrote:
>> 2011/10/10 Richard Guenther :
>>> On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  wrote:
 Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
 has to be checked that the LHS code is same as outer CODE, as
 otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
 LHS, which is of course wrong.

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c
 +++ gcc/gcc/fold-const.c
 @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
                                    tree *, tree *);
  static int all_ones_mask_p (const_tree, int);
  static tree sign_bit_p (tree, const_tree);
 -static int simple_operand_p (const_tree);
 +static int simple_operand_p (tree);
  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
  static tree range_predecessor (tree);
  static tree range_successor (tree);
  static tree fold_range_test (location_t, enum tree_code, tree, tree, 
 tree);
  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
 tree, tree);
  static tree unextend (tree, int, int, tree);
 -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
  static tree optimize_minmax_comparison (location_t, enum tree_code,
                                        tree, tree, tree);
  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
 @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
   return lhs;
  }

 -/* Subroutine for fold_truthop: decode a field reference.
 +/* Subroutine for fold_truth_andor_1: decode a field reference.

    If EXP is a comparison reference, we return the innermost reference.

 @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
   return NULL_TREE;
  }

 -/* Subroutine for fold_truthop: determine if an operand is simple enough
 +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
 enough
    to be evaluated unconditionally.  */

  static int
 -simple_operand_p (const_tree exp)
 +simple_operand_p (tree exp)
  {
 +  enum tree_code code;
   /* Strip any conversions that don't change the machine mode.  */
   STRIP_NOPS (exp);

 +  code = TREE_CODE (exp);
 +
 +  /* Handle some trivials   */
 +  if (TREE_CODE_CLASS (code) == tcc_comparison)
 +    return (tree_could_trap_p (exp)
 +           && simple_operand_p (TREE_OPERAND (exp, 0))
 +           && simple_operand_p (TREE_OPERAND (exp, 1)));
>>>
>>> And that's still wrong.
>>>
>>> Stopped reading here.
>>>
>>> Richard.
>>
>> Oh, there is a not missing.  I didn't spot that, sorry.
>>
>> To the point why we need to handle comparisons within simple_operand_p.
>>
>> If we reject comparisons and logical not here, we won't have any
>> branching optimization anymore, as this the patch moves into
>> fold_truthandor.
>>
>> The result with rejecting in simple_operand_p compares and logic-not
>> provides for the following example:
>
> But you change what simple_operand_p accepts and thus change what
> fold_truthop accepts as operands to its simplifications.
>
> Richard.

Well, not really.  I assume you mean fold_truth_andor_1 (aka fold_truthop).

It checks for
...
  if (TREE_CODE_CLASS (lcode) != tcc_comparison
  || TREE_CODE_CLASS (rcode) != tcc_comparison)
return 0;
...
before checking for simple_operand_p.  So there is actual no change.
It might be of some interest here to add in a different patch support
for logic-not, but well, this is would be material for a different
patch.
So, it won't operate on anything else then comparisons as before.

Kai


Re: [Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Georg-Johann Lay
Rainer Orth schrieb:
> Georg-Johann Lay  writes:
> 
>> For example, after updating trunk to 179738,
>>
>> (gdb) set args  -fpreprocessed tls-1.i -quiet -dumpbase tls-1.c 
>> -mmcu=atmega128
>> -auxbase tls-1 -gdwarf-2 -O3 -O2 -version -o tls-1.s
>> (gdb) cd ~/test
>> (gdb) r
>> GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
>> compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP
>> version 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> GNU C (GCC) version 4.7.0 20111010 (experimental) (avr)
>>  compiled by GNU C version 4.3.2 [gcc-4_3-branch revision 141291], GMP 
>> version
>> 5.0.1, MPFR version 3.0.0-p8, MPC version 0.8.2
>> GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
>> Compiler executable checksum: 4e4900df2fe6cd3a5bd440dbb3a30bf2
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> 0x0864e398 in set_is_used (var=0xb7dd3600) at
>> ../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
>> (gdb) bt
>> #0  0x0864e398 in set_is_used (var=0xb7dd3600) at
>> ../../../gcc.gnu.org/trunk/gcc/tree-flow-inline.h:562
>> #1  0x0864e256 in mark_all_vars_used_1 (tp=0xb7dc97c4,
>> walk_subtrees=0xbfffe004, data=0x0) at
>> ../../../gcc.gnu.org/trunk/gcc/tree-ssa-live.c:379
> 
> This has nothing to do with AVR, but is PR middle-end/50638.  It breaks
> TLS on all emutls targets.  A patch has been posted and approved, but
> apparently not yet installed.

Thanks for the notice.
The GCC documentation says that TLS need support of a dynamic linker. It's
unlikely there will ever be a dynamic linker on a device with some tens k for
code and some k of RAM?

> So please be more careful investigating testsuite failures before
> XFAILing or skipping them.

Sorry, experience is that many test cases are not well written to parametrize
for small targets and it's merely impossible to catch up with hundreds of test
case fall out from them and to see the real problems.

Johann

> 
>   Rainer
> 


Re: [PATCH] Mark static const strings as read-only.

2011-10-10 Thread Tom de Vries
On 10/10/2011 03:44 PM, Eric Botcazou wrote:
>> The patch makes sure that the src_mem computed here in
>> expand_builtin_memcpy is marked as MEM_READONLY_P:
>> ...
>>   src_mem = get_memory_rtx (src, len);
>> ...
>>
>> build and reg-tested on i686, arm, and mips.
>>
>> OK for trunk?
> 
> Do you get the same result by calling gen_const_mem from build_constant_desc 
> instead of gen_rtx_MEM?  If so, this would be better I think.
> 

I tried the patch that you describe:
...
Index: gcc/varasm.c
===
--- gcc/varasm.c (revision 179662)
+++ gcc/varasm.c (working copy)
@@ -3119,7 +3119,7 @@ build_constant_desc (tree exp)
   SET_SYMBOL_REF_DECL (symbol, decl);
   TREE_CONSTANT_POOL_ADDRESS_P (symbol) = 1;

-  rtl = gen_rtx_MEM (TYPE_MODE (TREE_TYPE (exp)), symbol);
+  rtl = gen_const_mem (TYPE_MODE (TREE_TYPE (exp)), symbol);
   set_mem_attributes (rtl, exp, 1);
   set_mem_alias_set (rtl, 0);
   set_mem_alias_set (rtl, const_alias_set);
...

The src_mem in expand_builtin_memcpy is generated in get_memory_rtx:
...
static rtx
get_memory_rtx (tree exp, tree len)
{
  tree orig_exp = exp;
  rtx addr, mem;
  HOST_WIDE_INT off;

  /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived
 from its expression, for expr->a.b only .a.b is recorded.  */
  if (TREE_CODE (exp) == SAVE_EXPR && !SAVE_EXPR_RESOLVED_P (exp))
exp = TREE_OPERAND (exp, 0);

  addr = expand_expr (orig_exp, NULL_RTX, ptr_mode, EXPAND_NORMAL);
  mem = gen_rtx_MEM (BLKmode, memory_address (BLKmode, addr));
...

With the patch for build_constant_desc, a mem with MEM_READONLY_P set (/u) is
generated in expand_expr_constant during expand_expr:
...
(mem/s/u/c:BLK (symbol_ref/f:SI ("*$LC0") [flags 0x2]  ) [1 S4 A32])
...

but the expand_expr returns this:
...
(symbol_ref/f:SI ("*$LC0") [flags 0x2]  )
...

The subsequent gen_rtx_MEM generates this:
...
(mem:BLK (lo_sum:SI (reg:SI 195)
(symbol_ref/f:SI ("*$LC0") [flags 0x2]  )) [0
A8])
...

and after set_mem_attributes it looks like this:
...
(mem/s/c:BLK (lo_sum:SI (reg:SI 195)
(symbol_ref/f:SI ("*$LC0") [flags 0x2]  )) [0
S4 A32])
...

So, the patch for build_constant_desc does not have the desired effect.

Thanks,
- Tom


Re: [Patch,testsuite,AVR]: target-supports.exp: AVR does not support TLS

2011-10-10 Thread Rainer Orth
Georg-Johann Lay  writes:

>> This has nothing to do with AVR, but is PR middle-end/50638.  It breaks
>> TLS on all emutls targets.  A patch has been posted and approved, but
>> apparently not yet installed.
>
> Thanks for the notice.
> The GCC documentation says that TLS need support of a dynamic linker. It's
> unlikely there will ever be a dynamic linker on a device with some tens k for
> code and some k of RAM?

I suppose this applies to native TLS, not emutls.

>> So please be more careful investigating testsuite failures before
>> XFAILing or skipping them.
>
> Sorry, experience is that many test cases are not well written to parametrize
> for small targets and it's merely impossible to catch up with hundreds of test

Not only small, but unusual targets in general ;-)

> case fall out from them and to see the real problems.

But it's still important to either do the analysis (or have someone who
knows that area in depth do the analysis :-) to avoid papering over the
real problem.

I like to keep the number of testsuite failure manageable myself, so I
know how your feeling ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH] sel-sched: fix merging of LHS reg availability (PR 50340)

2011-10-10 Thread Alexander Monakov
Ping.  This is now a P1 bug, as it breaks spec2k gcc on ia64.

On Tue, 13 Sep 2011, Alexander Monakov wrote:

> Hello,
> 
> This patches fixes an ICE on an assert that performs a sanity check on
> target_available field of expr_t, which is tri-state: LHS register is
> available (1), not available (0) or unknown (-1).
> 
> The problem is, when merging expr data of separable exprs with differing
> LHSes, we can't claim we know anything about availability of the target
> register if the unavailable LHS of the other expr is not the same register.
> Thus, we should set the field to -1, not 0.
> 
> Fixed as follows, bootstrapped and regtested on x86_64-linux and ia64-linux
> (without java, with one recent SRA patch reverted to unbreak bootstrap) with
> sel-sched enabled at -O2.  OK for trunk?
> 
> (a small testcase is not available at the moment, but I can try to produce one
> using delta before committing)
> 
> 
> 2011-09-13  Andrey Belevantsev  
> 
>   * sel-sched-ir.c (update_target_availability): LHS register
>   availability is not known if the unavailable LHS of the other
>   expression is a different register.
> 
> diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
> index 4878460..b132392 100644
> --- a/gcc/sel-sched-ir.c
> +++ b/gcc/sel-sched-ir.c
> @@ -1746,7 +1746,13 @@ update_target_availability (expr_t to, expr_t from,
> insn_t split_point)
>  EXPR_TARGET_AVAILABLE (to) = -1;
>  }
>else
> -EXPR_TARGET_AVAILABLE (to) &= EXPR_TARGET_AVAILABLE (from);
> +if (EXPR_TARGET_AVAILABLE (from) == 0
> +&& EXPR_LHS (from)
> +&& REG_P (EXPR_LHS (from))
> +&& REGNO (EXPR_LHS (to)) != REGNO (EXPR_LHS (from)))
> +  EXPR_TARGET_AVAILABLE (to) = -1;
> +else
> +  EXPR_TARGET_AVAILABLE (to) &= EXPR_TARGET_AVAILABLE (from);
>  }
>  }
> 


[PATCH] Verify stuff after IPA split

2011-10-10 Thread Richard Guenther

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-10-10  Richard Guenther  

* ipa-split.c (pass_split_functions): Add verification TODOs.
(pass_feedback_split_functions): Likewise.

Index: gcc/ipa-split.c
===
--- gcc/ipa-split.c (revision 179738)
+++ gcc/ipa-split.c (working copy)
@@ -1451,7 +1451,7 @@ struct gimple_opt_pass pass_split_functi
   0,   /* properties_provided */
   0,   /* properties_destroyed */
   0,   /* todo_flags_start */
-  0/* todo_flags_finish */
+  TODO_verify_all  /* todo_flags_finish */
  }
 };
 
@@ -1492,6 +1492,6 @@ struct gimple_opt_pass pass_feedback_spl
   0,   /* properties_provided */
   0,   /* properties_destroyed */
   0,   /* todo_flags_start */
-  0/* todo_flags_finish */
+  TODO_verify_all  /* todo_flags_finish */
  }
 };


Re: [PATCH] Don't assume that constants can clobber vtbl

2011-10-10 Thread Martin Jambor
On Mon, Oct 10, 2011 at 05:05:14PM +0200, Martin Jambor wrote:
> Hi,
> 
> sorry that taking care of the devirtualization patches takes me longer
> than expected for various reasons.  But I have not forgotten about
> this.

Ah, please ignore the attachment, I was trying to persuade mutt to add
it to the original email so that I could comment on it inline but it
re-attached it instead, for some weird reason.

Thanks,

Martin


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Michael Matz
Hi,

On Mon, 10 Oct 2011, Kai Tietz wrote:

> extern int foo (void); /* foo modifies gbl1 */
> int gbl1 = 0;
> 
> int foo (int ns1)
> {
>   if (ns1 && foo () && gbl1)
> return 1;
>   return 0;
> }
> 
> so chain of trees has to look like this:
> (ANDIF (ns1 (ANDIF foo () gbl1))

Okay, indeed.  I was wrong when I claimed that only the RHS needs to be 
side-effect-free for ANDIF->AND to be valid.  It's true when the RHS can't 
depend on the LHS, but I only thought about the fact that the side-effect 
must occur, not that it possibly influences RHS.


Ciao,
Michael.


Re: [PATCH] sel-sched: forbid differing modes in substitution (PR 50205)

2011-10-10 Thread Alexander Monakov
Ping?

On Wed, 7 Sep 2011, Alexander Monakov wrote:

> Hello,
> 
> The patch repairs a problem when we attempt to substitute an insn like
> (... (cmp (mem (reg:DI ax)) (reg:SI ax))) (note different modes) through
> (set (reg:DI ax) (reg:DI dx)), which leaves the (reg:SI ax) part of the
> comparison intact, causing an ICE later on when we notice that the dependency
> on ax is still present.  As this is quite rare, we can simply forbid
> substitution in such circumstances, much like substitution of multiple hard
> reg references is forbidden now.
> 
> En passant, the patch simplifes the code a little, as we never try to
> substitute anything but registers.
> 
> Bootstrapped and regtested on x86_64-linux with sel-sched enabled at -O2, OK?
> (I'll add the testcase from Bugzilla when committing)
> 
> 2011-09-07  Alexander Monakov  
> 
>   PR rtl-optimization/50205
>   * sel-sched.c (count_occurrences_1): Simplify on the assumption that
>   p->x is a register.  Forbid substitution when the same register is
>   found in a different mode.
>   (count_occurrences_equiv): Assert that 'what' is a register.
> 
> diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
> index f11faca..2af01ae 100644
> --- a/gcc/sel-sched.c
> +++ b/gcc/sel-sched.c
> @@ -813,18 +813,12 @@ count_occurrences_1 (rtx *cur_rtx, void *arg)
>  {
>rtx_search_arg_p p = (rtx_search_arg_p) arg;
>  
> -  /* The last param FOR_GCSE is true, because otherwise it performs excessive
> -substitutions like
> - r8 = r33
> - r16 = r33
> -for the last insn it presumes r33 equivalent to r8, so it changes it to
> -r33.  Actually, there's no change, but it spoils debugging.  */
> -  if (exp_equiv_p (*cur_rtx, p->x, 0, true))
> -{
> -  /* Bail out if we occupy more than one register.  */
> -  if (REG_P (*cur_rtx)
> -  && HARD_REGISTER_P (*cur_rtx)
> -  && hard_regno_nregs[REGNO(*cur_rtx)][GET_MODE (*cur_rtx)] > 1)
> +  if (REG_P (*cur_rtx) && REGNO (*cur_rtx) == REGNO (p->x))
> +{
> +  /* Bail out if mode is different or more than one register is used.  */
> +  if (GET_MODE (*cur_rtx) != GET_MODE (p->x)
> +  || (HARD_REGISTER_P (*cur_rtx)
> +   && hard_regno_nregs[REGNO(*cur_rtx)][GET_MODE (*cur_rtx)] > 1))
>  {
>p->n = 0;
>return 1;
> @@ -837,7 +831,6 @@ count_occurrences_1 (rtx *cur_rtx, void *arg)
>  }
>  
>if (GET_CODE (*cur_rtx) == SUBREG
> -  && REG_P (p->x)
>&& (!REG_P (SUBREG_REG (*cur_rtx))
> || REGNO (SUBREG_REG (*cur_rtx)) == REGNO (p->x)))
>  {
> @@ -859,6 +852,7 @@ count_occurrences_equiv (rtx what, rtx where)
>  {
>struct rtx_search_arg arg;
>  
> +  gcc_assert (REG_P (what));
>arg.x = what;
>arg.n = 0;
>  
> 
> 


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Richard Guenther
On Mon, Oct 10, 2011 at 5:07 PM, Kai Tietz  wrote:
> 2011/10/10 Richard Guenther :
>> On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz  wrote:
>>> 2011/10/10 Richard Guenther :
 On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  wrote:
> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
> has to be checked that the LHS code is same as outer CODE, as
> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
> LHS, which is of course wrong.
>
> Index: gcc/gcc/fold-const.c
> ===
> --- gcc.orig/gcc/fold-const.c
> +++ gcc/gcc/fold-const.c
> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>                                    tree *, tree *);
>  static int all_ones_mask_p (const_tree, int);
>  static tree sign_bit_p (tree, const_tree);
> -static int simple_operand_p (const_tree);
> +static int simple_operand_p (tree);
>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>  static tree range_predecessor (tree);
>  static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, 
> tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
> tree, tree);
>  static tree unextend (tree, int, int, tree);
> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>                                        tree, tree, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>   return lhs;
>  }
>
> -/* Subroutine for fold_truthop: decode a field reference.
> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>
>    If EXP is a comparison reference, we return the innermost reference.
>
> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>   return NULL_TREE;
>  }
>
> -/* Subroutine for fold_truthop: determine if an operand is simple enough
> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
> enough
>    to be evaluated unconditionally.  */
>
>  static int
> -simple_operand_p (const_tree exp)
> +simple_operand_p (tree exp)
>  {
> +  enum tree_code code;
>   /* Strip any conversions that don't change the machine mode.  */
>   STRIP_NOPS (exp);
>
> +  code = TREE_CODE (exp);
> +
> +  /* Handle some trivials   */
> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
> +    return (tree_could_trap_p (exp)
> +           && simple_operand_p (TREE_OPERAND (exp, 0))
> +           && simple_operand_p (TREE_OPERAND (exp, 1)));

 And that's still wrong.

 Stopped reading here.

 Richard.
>>>
>>> Oh, there is a not missing.  I didn't spot that, sorry.
>>>
>>> To the point why we need to handle comparisons within simple_operand_p.
>>>
>>> If we reject comparisons and logical not here, we won't have any
>>> branching optimization anymore, as this the patch moves into
>>> fold_truthandor.
>>>
>>> The result with rejecting in simple_operand_p compares and logic-not
>>> provides for the following example:
>>
>> But you change what simple_operand_p accepts and thus change what
>> fold_truthop accepts as operands to its simplifications.
>>
>> Richard.
>
> Well, not really.  I assume you mean fold_truth_andor_1 (aka fold_truthop).
>
> It checks for
> ...
>  if (TREE_CODE_CLASS (lcode) != tcc_comparison
>      || TREE_CODE_CLASS (rcode) != tcc_comparison)
>    return 0;
> ...
> before checking for simple_operand_p.  So there is actual no change.
> It might be of some interest here to add in a different patch support
> for logic-not, but well, this is would be material for a different
> patch.
> So, it won't operate on anything else then comparisons as before.

Sure, because simple_operand_p is checked on the comparison
operands, not the comparison itself.

Richard.

> Kai
>


Re: [4/4] Make SMS schedule register moves

2011-10-10 Thread Ayal Zaks
On Mon, Oct 10, 2011 at 1:57 PM, Richard Sandiford
 wrote:
> Ayal Zaks  writes:
>>> I agree it's natural to schedule moves for intra-iteration dependencies
>>> in the normal get_sched_window way.  But suppose we have a dependency:
>>>
>>>   A --(T,N,1)--> B
>>>
>>> that requires two moves M1 and M2.  If we think in terms of cycles
>>> (in the SCHED_TIME sense), then this effectively becomes:
>>>
>>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>>
>>> because it is now M1 that is fed by both the loop and the incoming edge.
>>> But if there is a second dependency:
>>>
>>>   A --(T,M,0)--> C
>>>
>>> that also requires two moves, we end up with:
>>>
>>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>>                                        -->(T,M3,-1)--> B
>>>
>>> and dependence distances of -1 feel a bit weird. :-)  Of course,
>>> what we really have are two parallel dependencies:
>>>
>>>   A --(T,N1,1)--> M1 -->(T,N2,0)--> M2 -->(T,N3,0)--> B
>>>
>>>   A --(T,M1,0)--> M1' -->(T,M2,0)--> M2' -->(T,N3,0)--> B
>>>
>>> where M1' and M2' occupy the same position as M1 and M2 in the schedule,
>>> but are one stage further along.  But we only schedule them once,
>>> so if we take the cycle/SCHED_TIME route, we have to introduce
>>> dependencies of distance -1.
>>>
>>
>> Interesting; had to digest this distance 1 business, a result of
>> thinking in cycles instead of rows (or conversely), and mixing
>> dependences with scheduling; here's my understanding, based on your
>> explanations:
>>
>> Suppose a Use is truely dependent on a Def, where both have been
>> scheduled at some absolute cycles; think of them as timing the first
>> iteration of the loop.
>> Assume first that Use appears originally after Def in the original
>> instruction sequence of the loop (dependence distance 0). In this
>> case, Use requires register moves if its distance D from Def according
>> to the schedule is more than ii cycles long -- by the time Use is
>> executed, the value it needs is no longer available in the def'd
>> register due to intervening occurrences of Def. So in this case, the
>> first reg-move (among D/ii) should appear after Def, recording its
>> value before the next occurrence of Def overwrites it, and feeding
>> subsequent moves as needed before each is overwritten. Thus the
>> scheduling window of this first reg-move is within (Def, Def+ii).
>>
>> Now, suppose Use appears before Def, i.e., Use is upwards-exposed; if
>> it remains scheduled before the Def, no reg-move is needed. If, otoh,
>> Def is scheduled (D>0 cycles) before Use, breaking the anti-dependence
>> between them, (D/ii + 1) reg-moves are needed in order to feed Use
>> with the live-in value before Def. So in this case, the first reg-move
>> should appear before Def (again feeding subsequent moves as needed
>> before each is overwritten). Thus the scheduling window of this first
>> reg-move is within (Def-ii, Def).
>>
>> In any case, each use requires a specific number of reg-moves, which
>> begin either before or after the def; it is always safe (though
>> potentially redundant) to start reg-moving before the def -- uses that
>> do not need the live-in value will ignore it by feeding from a later
>> reg-move.
>
> Right.  The distance on the Def->Use dependency is effectively transferred
> to the dependency between the Def and first move.
>
> And we can potentially have both kinds of Use at the same time.
> We then end up scheduling two moves, one in each window, but require
> them to occupy the same row and column.  It feels more convenient to
> schedule the earlier of the two moves.
>
>> Q: if we generate reg-moves starting from before the def, would
>> redundant reg-moves be eliminated if no use needs access to live-in
>> value? If so, would that simplify their generation? (If not, it may be
>> interesting to understand how to improve DCE to catch it.)
>
> To be clear, the new version of the patch (unlike the old one)
> doesn't emit reg-moves before the def if all uses are distance 0.
> We only do it where some or all uses are distance 1.  The first move
> before the def is always needed in that case.
>

Understood. The question was whether it would simplify things if we
always emit reg-moves before the def. And rely on DCE to hopefully
eliminate redundancies.

> So redudant moves are confined to the case where (a) we have more
> than one move, (b) we have both distance 0 and distance 1 uses, and
> (c) one of the two distance sets requires more moves than the other.
> If the distance 0 dependencies require more moves than the distance
> 1 dependencies, we will have a redudant move in the prologue.
> If the distance 1 dependencies require more moves than the distance
> 0 dependencies, we will have a redudant move in the epilogue.
> But I believe that this is already handled by dce.
>
> (For avoidance of doubt, the new code is more accurate than
> current trunk in this respect.)
>
>> The issue of assigning stages to reg-moves is mostl

[PATCH] Fix PR50389

2011-10-10 Thread Richard Guenther

This fixes missed placements of VUSEs during 
gimplify_and_update_call_from_tree.  The gimplifier doesn't fill
out virtual operands (it doesn't call update_stmt) - for a reason,
we'd mark it for renaming else.  So we have to stick a VUSE on
every stmt that can possibly have one.  update_stmt will remove
it again when it is not necessary.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2011-10-10  Richard Guenther  

PR middle-end/50389
* gimple-fold.c (gimplify_and_update_call_from_tree): Do not
mark symbols for renaming.  Append the VUSE to all statements
that possibly can have one.

* gcc.dg/torture/pr50389.c: New testcase.

Index: gcc/gimple-fold.c
===
*** gcc/gimple-fold.c   (revision 179738)
--- gcc/gimple-fold.c   (working copy)
*** gimplify_and_update_call_from_tree (gimp
*** 588,600 
}
new_stmt = gsi_stmt (i);
if (gimple_in_ssa_p (cfun))
!   {
! find_new_referenced_vars (new_stmt);
! mark_symbols_for_renaming (new_stmt);
!   }
!   /* If the new statement has a VUSE, update it with exact SSA name we
!  know will reach this one.  */
!   if (gimple_vuse (new_stmt))
{
  /* If we've also seen a previous store create a new VDEF for
 the latter one, and make that the new reaching VUSE.  */
--- 588,597 
}
new_stmt = gsi_stmt (i);
if (gimple_in_ssa_p (cfun))
!   find_new_referenced_vars (new_stmt);
!   /* If the new statement possibly has a VUSE, update it with exact SSA
!name we know will reach this one.  */
!   if (gimple_has_mem_ops (new_stmt))
{
  /* If we've also seen a previous store create a new VDEF for
 the latter one, and make that the new reaching VUSE.  */
Index: gcc/testsuite/gcc.dg/torture/pr50389.c
===
*** gcc/testsuite/gcc.dg/torture/pr50389.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr50389.c  (revision 0)
***
*** 0 
--- 1,15 
+ /* { dg-do compile } */
+ /* { dg-options "-freorder-blocks -ftracer" } */
+ 
+ extern int data[];
+ extern int i;
+ 
+ void
+ foo (void)
+ {
+   char buf[8];
+   __builtin___memcpy_chk (buf, data, i ? 8 : 4,
+ __builtin_object_size (buf, 0));
+   __builtin___memcpy_chk (buf, data, i ? 8 : 4,
+ __builtin_object_size (buf, 0));
+ }


Re: [Patch,AVR]: Reuse __tablejump2__

2011-10-10 Thread Denis Chertykov
2011/10/10 Georg-Johann Lay :
> This patch does better re-usage of __tablejump2__ from libgcc so that
> switch/case statements with jump tables become smaller because the sequence
> needs 6 to 9 words compared to 2 when JMPing to __tablejump2__.
>
> Moreover, __tablejump2__ does no more rely in the content of EIND (by using
> EIJMP) and uses RET instead.
>
> The insn conditions of tablejump insns now are the same as in
> avr_output_addr_vec_elt.
>
> Ok for trunk?

Approved.

Denis.


Re: [PATCH] Mark static const strings as read-only.

2011-10-10 Thread Eric Botcazou
> So, the patch for build_constant_desc does not have the desired effect.

OK, too bad that we need to play this back-and-forth game with MEMs.  So the 
original patch is OK (with TREE_READONLY (base) on the next line to mimic what 
is done just above and without the gcc/ prefix in the ChangeLog).  If you have 
some available cycles, you can test and install the build_constant_desc change 
in the same commit, otherwise I'll do it myself.

-- 
Eric Botcazou


simulate-thread changes to avoid infinite loops

2011-10-10 Thread Aldy Hernandez

Hi folks.

This was a change Andrew had made to the simulate-thread framework to 
avoid infinite loops in hostile threads.  I forgot to merge this in my 
original submission, and am doing so now.


The original patch is here:

http://permalink.gmane.org/gmane.comp.gcc.patches/247678

My patch is a mere sed run on Andrew's patch, plus a few fixed typos in 
the comments.  I also kept the original simulate_thread_fini name as 
opposed to the __gdb_memmodel_fini name Andrew has (to keep names 
homogeneous throughout).


Tested on x86-64 Linux by running "make check 
RUNTESTFLAGS=simulate-thread.exp" on the build tree I already had for 
the original submission.


OK for trunk?
* gcc.dg/simulate-thread/simulate-thread.gdb: Call
wrappers for *other_threads() and *final_verify().
* gcc.dg/simulate-thread/simulate-thread.h
(simulate_thread_wrapper_other_threads): New.
(simulate_thread_wrapper_final_verify): New.

Index: gcc.dg/simulate-thread/simulate-thread.gdb
===
--- gcc.dg/simulate-thread/simulate-thread.gdb  (revision 179751)
+++ gcc.dg/simulate-thread/simulate-thread.gdb  (working copy)
@@ -5,13 +5,13 @@ run
 
 set $ret = 0
 while (simulate_thread_fini != 1) && (! $ret)
-  call simulate_thread_other_threads()
+  call simulate_thread_wrapper_other_threads()
   stepi
   set $ret |= simulate_thread_step_verify()
 end
 
 if (! $ret)
-  set $ret |= simulate_thread_final_verify()
+  set $ret |= simulate_thread_wrapper_final_verify()
 end
 continue
 quit $ret
Index: gcc.dg/simulate-thread/simulate-thread.h
===
--- gcc.dg/simulate-thread/simulate-thread.h(revision 179751)
+++ gcc.dg/simulate-thread/simulate-thread.h(working copy)
@@ -5,3 +5,107 @@ simulate_thread_done ()
 {
   simulate_thread_fini = 1;
 }
+
+/* A hostile thread is one which changes a memory location so quickly
+   that another thread may never see the same value again.  This is
+   simulated when simulate_thread_other_thread() is defined to modify
+   a memory location every cycle.
+
+   A process implementing a dependency on this value can run into
+   difficulties with such a hostile thread.  For instance,
+   implementing an add with a compare_and_swap loop goes something
+   like:
+
+ expected = *mem;
+   loop:
+ new = expected += value;
+ if (!succeed (expected = compare_and_swap (mem, expected, new)))
+   goto loop;
+
+   If the content of 'mem' are changed every cycle by
+   simulate_thread_other_thread () this will become an infinite loop
+   since the value *mem will never be 'expected' by the time the
+   compare_and_swap is executed.
+
+   HOSTILE_THREAD_THRESHOLD defines the number of intructions which a
+   program will execute before triggering the hostile thread
+   pause. The pause will last for HOSTILE_THREAD_PAUSE instructions,
+   and then the counter will reset and begin again.  During the pause
+   period, simulate_thread_other_thread will not be called.
+
+   This provides a chance for forward progress to be made and the
+   infinite loop to be avoided.
+
+   If the testcase defines HOSTILE_PAUSE_ERROR, then it will be
+   considered an RUNTIME FAILURE if the hostile pause is triggered.
+   This will allow to test for guaranteed forward progress routines.
+
+   If the default values for HOSTILE_THREAD_THRESHOLD or
+   HOSTILE_THREAD_PAUSE are insufficient, then the testcase may
+   override these by defining the values before including this file.
+
+   Most testcase are intended to run for very short periods of time,
+   so these defaults are considered to be high enough to not trigger
+   on a typical case, but not drag the test time out too much if a
+   hostile condition is interferring.  */
+
+  
+/* Define the threshold to start pausing the hostile thread.  */
+#if !defined (HOSTILE_THREAD_THRESHOLD)
+#define HOSTILE_THREAD_THRESHOLD   500
+#endif
+
+/* Define the length of pause in cycles for the hostile thread to pause to
+   allow forward progress to be made.  */
+#if !defined (HOSTILE_THREAD_PAUSE)
+#define HOSTILE_THREAD_PAUSE   20
+#endif
+
+void simulate_thread_other_threads (void);
+int simulate_thread_final_verify (void);
+
+static int simulate_thread_hostile_pause = 0;
+
+/* This function wraps simulate_thread_other_threads an monitors for
+   an infinite loop.  If the threshold value HOSTILE_THREAD_THRESHOLD
+   is reached, the other_thread process is paused for
+   HOSTILE_THREAD_PAUSE cycles before resuming, and the counters start
+   again.  */
+void
+simulate_thread_wrapper_other_threads()
+{
+  static int count = 0;
+  static int pause = 0;
+
+  if (++count >= HOSTILE_THREAD_THRESHOLD)
+{
+  if (!simulate_thread_hostile_pause)
+simulate_thread_hostile_pause = 1;
+
+  /* Count cycles before calling the hostile thread again.  */
+  if (pause++ < HOSTILE_THREAD_PAUSE)
+   return;
+
+  /

Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-10 Thread Kai Tietz
2011/10/10 Richard Guenther :
> On Mon, Oct 10, 2011 at 5:07 PM, Kai Tietz  wrote:
>> 2011/10/10 Richard Guenther :
>>> On Mon, Oct 10, 2011 at 4:06 PM, Kai Tietz  wrote:
 2011/10/10 Richard Guenther :
> On Mon, Oct 10, 2011 at 2:29 PM, Kai Tietz  
> wrote:
>> Recent patch had a thinko on rhs of inner lhs check for TRUTH-IF.  It
>> has to be checked that the LHS code is same as outer CODE, as
>> otherwise we wouldn't apply different TRUTH-IF only on inner RHS of
>> LHS, which is of course wrong.
>>
>> Index: gcc/gcc/fold-const.c
>> ===
>> --- gcc.orig/gcc/fold-const.c
>> +++ gcc/gcc/fold-const.c
>> @@ -111,14 +111,13 @@ static tree decode_field_reference (loca
>>                                    tree *, tree *);
>>  static int all_ones_mask_p (const_tree, int);
>>  static tree sign_bit_p (tree, const_tree);
>> -static int simple_operand_p (const_tree);
>> +static int simple_operand_p (tree);
>>  static tree range_binop (enum tree_code, tree, tree, int, tree, int);
>>  static tree range_predecessor (tree);
>>  static tree range_successor (tree);
>>  static tree fold_range_test (location_t, enum tree_code, tree, tree, 
>> tree);
>>  static tree fold_cond_expr_with_comparison (location_t, tree, tree,
>> tree, tree);
>>  static tree unextend (tree, int, int, tree);
>> -static tree fold_truthop (location_t, enum tree_code, tree, tree, tree);
>>  static tree optimize_minmax_comparison (location_t, enum tree_code,
>>                                        tree, tree, tree);
>>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>> @@ -3500,7 +3499,7 @@ optimize_bit_field_compare (location_t l
>>   return lhs;
>>  }
>>
>> -/* Subroutine for fold_truthop: decode a field reference.
>> +/* Subroutine for fold_truth_andor_1: decode a field reference.
>>
>>    If EXP is a comparison reference, we return the innermost reference.
>>
>> @@ -3668,17 +3667,43 @@ sign_bit_p (tree exp, const_tree val)
>>   return NULL_TREE;
>>  }
>>
>> -/* Subroutine for fold_truthop: determine if an operand is simple enough
>> +/* Subroutine for fold_truth_andor_1: determine if an operand is simple 
>> enough
>>    to be evaluated unconditionally.  */
>>
>>  static int
>> -simple_operand_p (const_tree exp)
>> +simple_operand_p (tree exp)
>>  {
>> +  enum tree_code code;
>>   /* Strip any conversions that don't change the machine mode.  */
>>   STRIP_NOPS (exp);
>>
>> +  code = TREE_CODE (exp);
>> +
>> +  /* Handle some trivials   */
>> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
>> +    return (tree_could_trap_p (exp)
>> +           && simple_operand_p (TREE_OPERAND (exp, 0))
>> +           && simple_operand_p (TREE_OPERAND (exp, 1)));
>
> And that's still wrong.
>
> Stopped reading here.
>
> Richard.

 Oh, there is a not missing.  I didn't spot that, sorry.

 To the point why we need to handle comparisons within simple_operand_p.

 If we reject comparisons and logical not here, we won't have any
 branching optimization anymore, as this the patch moves into
 fold_truthandor.

 The result with rejecting in simple_operand_p compares and logic-not
 provides for the following example:
>>>
>>> But you change what simple_operand_p accepts and thus change what
>>> fold_truthop accepts as operands to its simplifications.
>>>
>>> Richard.
>>
>> Well, not really.  I assume you mean fold_truth_andor_1 (aka fold_truthop).
>>
>> It checks for
>> ...
>>  if (TREE_CODE_CLASS (lcode) != tcc_comparison
>>      || TREE_CODE_CLASS (rcode) != tcc_comparison)
>>    return 0;
>> ...
>> before checking for simple_operand_p.  So there is actual no change.
>> It might be of some interest here to add in a different patch support
>> for logic-not, but well, this is would be material for a different
>> patch.
>> So, it won't operate on anything else then comparisons as before.
>
> Sure, because simple_operand_p is checked on the comparison
> operands, not the comparison itself.
>
> Richard.

Right,  we would allow by this things like (a != b) < (c != d) etc.
Here it seems that only some cases for comparison in comparison are
folded.  (eg (a == b) == (c == d) -> (a == b) ^ (c == d) ). Well,
other material for a different patch.

To ensure that we use simple_operand_p in all cases, beside for
branching AND/OR chains, in same way as before, I added to this
function an additional argument, by which
the looking into comparisons can be activated.

Regards,
Kai

Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -111,14 +111,13 @@ static tree decode_field_reference (loca

Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-10 Thread H.J. Lu
On Sun, Oct 9, 2011 at 12:49 PM, Kirill Yukhin  wrote:
> Hi guys,
> This is a Ping. Could anyboady with appropriate rights commit that?
>
>

I checked it in for you. Please provide ChangeLog entries together
with the new patch next time.

-- 
H.J.


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-10 Thread Kirill Yukhin
Thank you

K

On Mon, Oct 10, 2011 at 8:08 PM, H.J. Lu  wrote:
> On Sun, Oct 9, 2011 at 12:49 PM, Kirill Yukhin  
> wrote:
>> Hi guys,
>> This is a Ping. Could anyboady with appropriate rights commit that?
>>
>>
>
> I checked it in for you. Please provide ChangeLog entries together
> with the new patch next time.
>
> --
> H.J.
>


Re: simulate-thread changes to avoid infinite loops

2011-10-10 Thread Jeff Law

On 10/10/11 09:54, Aldy Hernandez wrote:

Hi folks.

This was a change Andrew had made to the simulate-thread framework to
avoid infinite loops in hostile threads. I forgot to merge this in my
original submission, and am doing so now.

The original patch is here:

http://permalink.gmane.org/gmane.comp.gcc.patches/247678

My patch is a mere sed run on Andrew's patch, plus a few fixed typos in
the comments. I also kept the original simulate_thread_fini name as
opposed to the __gdb_memmodel_fini name Andrew has (to keep names
homogeneous throughout).

Tested on x86-64 Linux by running "make check
RUNTESTFLAGS=simulate-thread.exp" on the build tree I already had for
the original submission.

OK for trunk?

OK.
jeff


Re: Rename vshuffle/vec_shuffle to vec_perm

2011-10-10 Thread Richard Henderson
On 10/09/2011 06:09 AM, Joseph S. Myers wrote:
> That looks like it broke "make pdf" (something did at about that time and 
> this patch is the most likely candidate).
> 
>> +@cindex @code{vec_perm_const@var{m}) instruction pattern
> 
> has opening '{' matched with closing ')'.
> 

Fixed, thanks.


r~


[Patch,AVR]: Avoid unwind warning from toplev.c

2011-10-10 Thread Georg-Johann Lay
toplev.c complains about "unwind tables currently require a frame pointer for
correctness".

This patchlet supplies a fix to avoid build warnings/test fails in that it sets
flag_omit_frame_pointer to 0 if unwind needs FP.

toplev.c:process_options sets flag_unwind_tables depending on
flag_non_call_exceptions and flag_asynchronous_unwind_tables after calling
targetm.target_option.override() so that the test includes these flags, too.

Ok?

Johann

* config/avr/avr.c (avr_option_override): Set
flag_omit_frame_pointer to 0 if frame pointer is needed for
unwinding.
Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 179744)
+++ config/avr/avr.c	(working copy)
@@ -351,6 +351,20 @@ avr_option_override (void)
 {
   flag_delete_null_pointer_checks = 0;
 
+  /* Avoid warning from toplev.c:process_options */
+
+  if ((flag_unwind_tables
+   || flag_non_call_exceptions
+   || flag_asynchronous_unwind_tables)
+  && !ACCUMULATE_OUTGOING_ARGS)
+{
+  flag_omit_frame_pointer = 0;
+}
+  else
+{
+  flag_omit_frame_pointer = (optimize >= 1);
+}
+
   avr_current_device = &avr_mcu_types[avr_mcu_index];
   avr_current_arch = &avr_arch_types[avr_current_device->arch];
   avr_extra_arch_macro = avr_current_device->macro;


Re: [patch, libgo] remove empty directories

2011-10-10 Thread Ian Lance Taylor
Matthias Klose  writes:

> libgo currently has some empty directories. ok to remove?
>
> D   go/encoding/line
> D   go/exp/ogle
> D   go/exp/eval
> D   go/exp/draw
> D   go/exp/draw/x11

Yes, thanks.

Ian


int_cst_hash_table mapping persistence and the garbage collector

2011-10-10 Thread Gary Funck
Recently, a few UPC test programs failed to compile
due to mis-matches of parameters in a prototype and its
corresponding function definition.  The mis-match was
based upon the apparent inequality of UPC layout qualifiers
(blocking factors).

UPC blocking factors are integer constants.  They are
recorded in a hash table indexed by the type tree node
that they correspond to.

Currently, the test for equality of blocking factors
tests only the pointer to the tree node defining the constant.
All blocking factors are recorded as "sizetype" type'd nodes.
Given that integer constants are hashed by type/value, it seemed
safe to assume that a given blocking factor would map to
a single tree node due to the underlying hash method that is used
when integral constants are created.

Is it valid to assume that pointer equality is sufficient
to ensure that two integer constants are equal as long
as their type and values are equal?

The bug that we ran into occurred because a garbage collection
pass was run between the point that the function prototype
tree node was created and the point at which the function declaration
was processed.  The garbage collector decided that the integer
constant representing the blocking factor was no longer in use,
because it had not been marked.

In fact, the integer constant was needed because it appeared
in the blocking factor hash table, but not via a direct pointer.
Rather it was referenced by nature of the fact that
the blocking factor hash table referenced the integer constant
that is mapped in the integer constant hash table.

Here's a rough diagram:

   tree (type) -> [ blocking factor hash ] -> tree (integer constant)
   tree (integer constant) -> [ integer constant hash ] {unique map}
  -> tree (integer constant)

When the garbage collector deleted the entry from the
"integer constant hash", it forced a new integer constant tree
node to be created for the same (type, value) integral constant
blocking factor.

One easy way to address the current issue is to call
tree_int_cst_equal() if the integer constant tree pointers
do not match:

if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
  /* integer constants aren't equal.  */

This may be necessary if 'int_cst_hash_table' is viewed as
a cache rather than a persistent, unique mapping.

Another approach, would be to somehow mark the node
in int_cst_hash_table as in use when the blocking factor
hash table is traversed by the garbage collector, or
to add logic the hash table delete function associated
with int_cst_hash_table; to dis-allow the delete if the
integer constant is present in the UPC blocking factor
hash table.

To effect this change in a modular way probably the hash table
delete function associated with 'int_cst_hash_table' would have
to be daisy-chained, where the UPC blocking factor check is made
first.  The difficulty with implementing the daisy chaining is that
int_cst_hash_table needs to exist before the UPC-related initialization
code is run.  One way to handle this might be yet another language
hook, called from the code that creates 'int_cst_hash_table'.
That seems overly complex.

For reference, the current blocking factor mapping table
is created as follows:

  upc_block_factor_for_type = htab_create_ggc (512, tree_map_hash,
   tree_map_eq, 0);

Summary:

1. Is it valid to assume that pointer equality is sufficient
to compare two integer constants for equality as long as they
have identical type and value?

2. Should 'int_cst_hash_table' be viewed as a cache, where
the mapping of a given (type, value) integer constant may
vary over time?

3. If the answer to 1. is "yes" and the answer to 2. is "no"
then what is the recommended way to ensure that nodes in
'int_cst_hash_table' are not removed if the integer constant
is being referenced via the 'upc_block_factor_for_type'
hash table?

thanks,
- Gary


[PATCH]: Fix PR bootstrap/50665

2011-10-10 Thread Uros Bizjak
Hello!

Attached patch moves DOI_vec_perm index to the right place.

2011-10-10  Uros Bizjak  

PR bootstrap/50665
* optabs.h (DOI_vec_perm): Rename from OTI_vec_perm.  Move from enum
optab_index to enum direct_optab_index.
(vec_perm_optab): Update.

Tested by bootstrapping i686-pc-linux-gnu, committed to mainline SVN as obvious.

Uros.
Index: optabs.h
===
--- optabs.h(revision 179754)
+++ optabs.h(working copy)
@@ -377,9 +377,6 @@ enum optab_index
   OTI_vec_pack_sfix_trunc,
   OTI_vec_pack_ufix_trunc,
 
-  /* Vector shuffling.  */
-  OTI_vec_perm,
-
   /* Perform a raise to the power of integer.  */
   OTI_powi,
 
@@ -560,7 +557,6 @@ enum optab_index
 #define vec_pack_usat_optab (&optab_table[OTI_vec_pack_usat])
 #define vec_pack_sfix_trunc_optab (&optab_table[OTI_vec_pack_sfix_trunc])
 #define vec_pack_ufix_trunc_optab (&optab_table[OTI_vec_pack_ufix_trunc])
-#define vec_perm_optab (&direct_optab_table[(int) OTI_vec_perm])
 
 #define powi_optab (&optab_table[OTI_powi])
 
@@ -642,6 +638,9 @@ enum direct_optab_index
   DOI_reload_in,
   DOI_reload_out,
 
+  /* Vector shuffling.  */
+  DOI_vec_perm,
+
   /* Block move operation.  */
   DOI_movmem,
 
@@ -705,6 +704,7 @@ typedef struct direct_optab_d *direct_optab;
 #endif
 #define reload_in_optab (&direct_optab_table[(int) DOI_reload_in])
 #define reload_out_optab (&direct_optab_table[(int) DOI_reload_out])
+#define vec_perm_optab (&direct_optab_table[(int) DOI_vec_perm])
 #define movmem_optab (&direct_optab_table[(int) DOI_movmem])
 #define setmem_optab (&direct_optab_table[(int) DOI_setmem])
 #define cmpstr_optab (&direct_optab_table[(int) DOI_cmpstr])


Re: [Patch,AVR]: Avoid unwind warning from toplev.c

2011-10-10 Thread Denis Chertykov
2011/10/10 Georg-Johann Lay :
> toplev.c complains about "unwind tables currently require a frame pointer for
> correctness".
>
> This patchlet supplies a fix to avoid build warnings/test fails in that it 
> sets
> flag_omit_frame_pointer to 0 if unwind needs FP.
>
> toplev.c:process_options sets flag_unwind_tables depending on
> flag_non_call_exceptions and flag_asynchronous_unwind_tables after calling
> targetm.target_option.override() so that the test includes these flags, too.
>
> Ok?
>
> Johann
>
>        * config/avr/avr.c (avr_option_override): Set
>        flag_omit_frame_pointer to 0 if frame pointer is needed for
>        unwinding.
>

Approved.

Denis.


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Gabriel Dos Reis
On Mon, Oct 10, 2011 at 7:47 AM, Paolo Carlini  wrote:
> On 10/10/2011 02:13 PM, Paolo Carlini wrote:
>>
>> . looks like we want to do something else, not printing the number at all.
>> See audit trail.
>
> An option, for 4.7 at least, would be, instead of just closing 33067 as a
> duplicate of the much more general 49152, doing something like:
>
> Index: c-family/c-pretty-print.c
> ===
> --- c-family/c-pretty-print.c   (revision 179745)
> +++ c-family/c-pretty-print.c   (working copy)
> @@ -1019,7 +1019,7 @@ static void
>  pp_c_floating_constant (c_pretty_printer *pp, tree r)
>  {
>   real_to_decimal (pp_buffer (pp)->digit_buffer, &TREE_REAL_CST (r),
> -                  sizeof (pp_buffer (pp)->digit_buffer), 0, 1);
> +                  sizeof (pp_buffer (pp)->digit_buffer), 6, 1);
>   pp_string (pp, pp_buffer(pp)->digit_buffer);
>   if (TREE_TYPE (r) == float_type_node)
>     pp_character (pp, 'f');
>
> which gives:
>
> 33067.C:2:18: error: no match for ‘operator<’ in ‘1.1e+0 < t’
>
> Given in particular that we now have column numbers, as a user of GCC I
> would certainly consider it a good improvement.
>
> Just let me know...

on this particular input, '6' looks OK.  However, the
question is why '6'?
Why can't we retain the original number spelling from the
source code and use that instead?


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

On 10/10/2011 07:13 PM, Gabriel Dos Reis wrote:
on this particular input, '6' looks OK. However, the question is why 
'6'? Why can't we retain the original number spelling from the source 
code and use that instead? 
Yes, that would be 49152, no? It's quite a bit of work, I don't think 
somebody will be able to do that in time for 4.7.0, right? I I were a 
user of gcc, I would not be suprised to see 6 used, which after all it's 
the historical printf default, I would find it much better anyway than 
the current behavior. But I don't have a strong opinion, I already 
unassigned myself from the PR, fwiw.


Paolo.


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Gabriel Dos Reis
On Mon, Oct 10, 2011 at 12:16 PM, Paolo Carlini
 wrote:
> On 10/10/2011 07:13 PM, Gabriel Dos Reis wrote:
>>
>> on this particular input, '6' looks OK. However, the question is why '6'?
>> Why can't we retain the original number spelling from the source code and
>> use that instead?
>
> Yes, that would be 49152, no? It's quite a bit of work, I don't think
> somebody will be able to do that in time for 4.7.0, right? I I were a user
> of gcc, I would not be suprised to see 6 used, which after all it's the
> historical printf default, I would find it much better anyway than the
> current behavior. But I don't have a strong opinion, I already unassigned
> myself from the PR, fwiw.

I bet most C++ programers don't know what printf default is :-)

In any case that is a compiler implementation detail that has no bearing
with the input program.

A GCC user not interested in numerics probably won't care.
However, I do not think that extends to people who do care about numerics
and have literals in their program.  If GCC displays an error message
with literals truncated, it is not at clear to the user that GCC is
behaving correctly:
for one thing, the error might be mysterious and hard to understand,
now the compiler
is indicating that apparently it does not even parse and understand
correctly his or
her numbers.  How if you have several numbers like that that differs
only on the 10th
digit?  If the user is using an IDE that pattern match the output on
the source code, it
would be even more confusing (as it is now.)


>
> Paolo.
>


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

On 10/10/2011 07:28 PM, Gabriel Dos Reis wrote:
A GCC user not interested in numerics probably won't care. However, I 
do not think that extends to people who do care about numerics and 
have literals in their program. If GCC displays an error message with 
literals truncated, it is not at clear to the user that GCC is 
behaving correctly: for one thing, the error might be mysterious and 
hard to understand, now the compiler is indicating that apparently it 
does not even parse and understand correctly his or her numbers. How 
if you have several numbers like that that differs only on the 10th 
digit? If the user is using an IDE that pattern match the output on 
the source code, it would be even more confusing (as it is now.) 
Would you like to see the max_digits10 - inspired patch for 4.7.0? 
Again, no strong opinion, just let me know. Personally, I find that too 
quite a bit better than the current behavior.


Paolo.


Re: [PATCH 1/5] Use MADV_DONTNEED for freeing in garbage collector

2011-10-10 Thread Andi Kleen
> So can we move the param patch back as a possible followup?

I can drop it, however it will still mean fragmentation on any
non Linux (or rather non MADV_DONTNEED) hosts.

-Andi


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Gabriel Dos Reis
On Mon, Oct 10, 2011 at 12:30 PM, Paolo Carlini
 wrote:
> On 10/10/2011 07:28 PM, Gabriel Dos Reis wrote:
>>
>> A GCC user not interested in numerics probably won't care. However, I do
>> not think that extends to people who do care about numerics and have
>> literals in their program. If GCC displays an error message with literals
>> truncated, it is not at clear to the user that GCC is behaving correctly:
>> for one thing, the error might be mysterious and hard to understand, now the
>> compiler is indicating that apparently it does not even parse and understand
>> correctly his or her numbers. How if you have several numbers like that that
>> differs only on the 10th digit? If the user is using an IDE that pattern
>> match the output on the source code, it would be even more confusing (as it
>> is now.)
>
> Would you like to see the max_digits10 - inspired patch for 4.7.0? Again, no
> strong opinion, just let me know. Personally, I find that too quite a bit
> better than the current behavior.

Yes, I suspect the max_digits10 patch would be definitely an improvement.

Of course, the actual fix would be to write back what was in the
input source program, but I understand that is more work that you would
like to tackle or have time for.


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

On 10/10/2011 07:59 PM, Gabriel Dos Reis wrote:

Yes, I suspect the max_digits10 patch would be definitely an improvement.
Good. It's at the beginning of this thread, passes testing. Please have 
a closer look.


If you like it, we can have it for 4.7.0 and otherwise also mark this 
specific PR as duplicate of 49152 (which, actually, for this *specific* 
case leans toward not printing any constant at all, similarly to the 
status quo of the C front end)


Paolo.


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Gabriel Dos Reis
On Mon, Oct 10, 2011 at 1:07 PM, Paolo Carlini  wrote:
> On 10/10/2011 07:59 PM, Gabriel Dos Reis wrote:
>>
>> Yes, I suspect the max_digits10 patch would be definitely an improvement.
>
> Good. It's at the beginning of this thread, passes testing. Please have a
> closer look.

I did, but you seemed to show a preference for '6' digits which
prompted my comments.

OK for 4.6

>
> If you like it, we can have it for 4.7.0 and otherwise also mark this
> specific PR as duplicate of 49152 (which, actually, for this *specific* case
> leans toward not printing any constant at all, similarly to the status quo
> of the C front end)

I suspect printing the literal is better.  I believe the actual fix is to
print the lexeme as it appears in the source code.


Re: [C++ Patch / RFC] PR 33067

2011-10-10 Thread Paolo Carlini

Hi,

On 10/10/2011 07:59 PM, Gabriel Dos Reis wrote:

Yes, I suspect the max_digits10 patch would be definitely an improvement.

Good. It's at the beginning of this thread, passes testing. Please have a
closer look.

I did, but you seemed to show a preference for '6' digits which
prompted my comments.

OK for 4.6
Did I understand correctly, only 4.6, not 4.7? Works for me but seems a 
little unusual not applying anything to mainline.

If you like it, we can have it for 4.7.0 and otherwise also mark this
specific PR as duplicate of 49152 (which, actually, for this *specific* case
leans toward not printing any constant at all, similarly to the status quo
of the C front end)

I suspect printing the literal is better.  I believe the actual fix is to print 
the lexeme as it appears in the source code.
I would suggest adding something to the audit trail of 49152. I 
understand that otherwise for this specific type of error message people 
will lean toward not printing the constants. In general, I (we) hear 
you, of course, it's a lond standing issue, isn't it?


Paolo.


Re: [PATCH, testsuite]: Remove *.gdb files from testsuite dir

2011-10-10 Thread Uros Bizjak
On Sun, Oct 9, 2011 at 9:38 PM, Uros Bizjak  wrote:

> Attached patch removes *.gdb temporary files from testsuite directory.
>
> 2011-10-09  Uros Bizjak  
>
>        * lib/gcc-gdb-test.exp (gdb-test): Delete $cmd_file before return.
>
> Tested on x86_64-pc-linux-gnu {,-m32}. OK for mainline and branches?

I went ahead and install this patch to mainline SVN. It looks kind of
obvious, though.

Uros.


Re: [Patch, Fortran] Fix PR 50564

2011-10-10 Thread Thomas Koenig

Hi Tobias,


In conclusion: I am fine with the FORALL part, but not with the
DO CONCURRENT part.


Yep, you're right.  I have modified the patch accordingly.  Here
is what I committed.

Best regards

Thomas
Index: fortran/ChangeLog
===
--- fortran/ChangeLog	(revision 179768)
+++ fortran/ChangeLog	(working copy)
@@ -1,3 +1,12 @@
+2011-10-10  Thomas Koenig  
+
+	PR fortran/50564
+	* frontend-passes (forall_level):  New variable.
+	(cfe_register_funcs):  Don't register functions if we
+	are within a forall loop.
+	(optimize_namespace):  Set forall_level to 0 before entry.
+	(gfc_code_walker):  Increase/decrease forall_level.
+
 2011-10-09  Tobias Burnus  
 
 	PR fortran/45044
Index: fortran/frontend-passes.c
===
--- fortran/frontend-passes.c	(revision 179709)
+++ fortran/frontend-passes.c	(working copy)
@@ -62,6 +62,10 @@
 
 gfc_namespace *current_ns;
 
+/* If we are within any forall loop.  */
+
+static int forall_level;
+
 /* Entry point - run all passes for a namespace.  So far, only an
optimization pass is run.  */
 
@@ -165,6 +169,12 @@
 	  || (*e)->ts.u.cl->length->expr_type != EXPR_CONSTANT))
 return 0;
 
+  /* We don't do function elimination within FORALL statements, it can
+ lead to wrong-code in certain circumstances.  */
+
+  if (forall_level > 0)
+return 0;
+
   /* If we don't know the shape at compile time, we create an allocatable
  temporary variable to hold the intermediate result, but only if
  allocation on assignment is active.  */
@@ -493,6 +503,7 @@
 {
 
   current_ns = ns;
+  forall_level = 0;
 
   gfc_code_walker (&ns->code, convert_do_while, dummy_expr_callback, NULL);
   gfc_code_walker (&ns->code, cfe_code, cfe_expr_0, NULL);
@@ -1193,6 +1204,8 @@
 		WALK_SUBEXPR (fa->end);
 		WALK_SUBEXPR (fa->stride);
 		  }
+		if (co->op == EXEC_FORALL)
+		  forall_level ++;
 		break;
 	  }
 
@@ -1335,6 +1348,10 @@
 	  WALK_SUBEXPR (b->expr2);
 	  WALK_SUBCODE (b->next);
 	}
+
+	  if (co->op == EXEC_FORALL)
+	forall_level --;
+
 	}
 }
   return 0;


  1   2   >