date:20111101

IVopts bug?

2011-11-01 Thread 杜越海

Hi all

 I found that IVopts rewrites a memory access using a weird IV candidate,
which makes the access lose its original memory attributes.
 The base pointer of a non-local memory access was rewritten into a local one,
and the store was then deleted in pass_cd_dce because
it was recognized as a local memory access.

Here is the case I simplified from a decoder source:

foo1(unsigned char* pSrcLeft,
 unsigned char* pSrcAbove,
 unsigned char* pSrcAboveLeft,
 unsigned char* pDst,
 int dstStep,
 int leftStep)
{
  signed int x, y, s;
  unsigned char  p1[5], p2[5],  p3;

  p1[0] = *pSrcAboveLeft;
  p2[0] = p1[0];
  p2[1] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[2] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[3] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[4] = pSrcLeft[0];

  p1[1] = pSrcAbove[0];
  p1[2] = pSrcAbove[1];
  p1[3] = pSrcAbove[2];
  p1[4] = pSrcAbove[3];

  p3 = (unsigned char)(((signed int)p1[1] + (signed int)p2[1] +
(signed int)p1[0]
+(signed int)p1[0] + 2 ) >> 2 );

  for( y=0; y<4; y++, pDst += dstStep ) {
for( x=y+1; x<4; x++ ) {
s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2;
pDst[x] = (unsigned char)s;
}

pDst[y] = p3; /* <-- this memory access */
  }
}

before IVopts

  D.6508_65 = pDst_88 + y.6_64;
  *D.6508_65 = p3_37;

after IVopts
it was rewritten to
MEM[symbol: p1, index: ivtmp.161_200, offset: 0B] = p3_37 ,

by
candidate 15
  depends on 3
  var_before ivtmp.161
  var_after ivtmp.161
  incremented before exit test
  type unsigned int
  base (unsigned int) pDst_39(D) - (unsigned int) &p1
  step (unsigned int) (pretmp.28_118 + 1)

So the address is still &p1 + (pDst - &p1) + step = pDst + step,
but in pass_cd_dce, is_hidden_global_store () returns false for this store
because it thinks the statement only accesses the local array p1.
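
A minimal sketch of the arithmetic above in plain C, assuming a flat address space where pointer/integer round trips are exact (implementation-defined, but true on the usual targets). The function name and values are made up for illustration and are not taken from the GCC sources; the point is only that an address spelled as "p1 + index" can still land in caller-visible memory, so the store must not be treated as local:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the shape of the IVopts candidate:
   index = (unsigned) dst - (unsigned) &p1, address = &p1 + index + y.  */
void store_via_local_symbol (unsigned char *dst, int y, unsigned char val)
{
  unsigned char p1[5] = { 0 };   /* local array used only as the "symbol" */

  assert (dst != NULL);

  /* Base of the induction variable: distance from p1 to dst.  */
  uintptr_t index = (uintptr_t) dst - (uintptr_t) p1;

  /* Syntactically based on p1, numerically equal to dst + y.  */
  unsigned char *addr
    = (unsigned char *) ((uintptr_t) p1 + index + (uintptr_t) y);

  *addr = val;   /* writes the caller's buffer, not p1 */
}
```

Calling store_via_local_symbol (buf, 2, 0x5a) on a caller-owned buffer changes buf[2], which is exactly the effect that gets lost if the store is classified as a dead local store.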



gcc version r180694

Configured with: /home/croseadu/android/_src/src/gcc-src/configure
--host=i486-linux-gnu --build=i486-linux-gnu
--target=arm-none-linux-gnueabi
--prefix=/home/croseadu/android/_src/install/arm-none-linux-gnueabi
--enable-threads --disable-libmudflap --disable-libssp
--disable-libstdcxx-pch --with-gnu-as --with-gnu-ld
--enable-languages=c,c++ --enable-shared --enable-symvers=gnu
--enable-__cxa_atexit
--with-specs='%{funwind-tables|fno-unwind-tables|mabi=*|ffreestanding|nostdlib:;:-funwind-tables}'
--disable-nls --enable-lto
--with-sysroot=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/libc
--with-build-sysroot=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/libc
--with-gmp=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
--with-mpfr=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
--with-ppl=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
--with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic
-lm' 
--with-cloog=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
--enable-cloog-backend=isl
--with-mpc=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
--enable-poison-system-directories --disable-libquadmath --enable-lto
--enable-libgomp
--with-build-time-tools=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/bin
--with-cpu=cortex-a8 --with-float=soft

compile flags:
-O3 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-double

Do I need to file a bug?


Yuehai Du

#include 

#define N 10

__attribute__ ((noinline)) void 
foo1(unsigned char* pSrcLeft,
 unsigned char* pSrcAbove,
 unsigned char* pSrcAboveLeft,
 unsigned char* pDst,
 int dstStep,
 int leftStep)
{
  signed int x, y, s;
  unsigned char  p1[5], p2[5],  p3;

  p1[0] = *pSrcAboveLeft;
  p2[0] = p1[0];
  p2[1] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[2] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[3] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[4] = pSrcLeft[0];

  p1[1] = pSrcAbove[0];
  p1[2] = pSrcAbove[1];
  p1[3] = pSrcAbove[2];
  p1[4] = pSrcAbove[3];

  p3 = (unsigned char)(((signed int)p1[1] + (signed int)p2[1] + (signed int)p1[0]
		+(signed int)p1[0] + 2 ) >> 2 );

  for( y=0; y<4; y++, pDst += dstStep ) {
for( x=y+1; x<4; x++ ) {
s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 2;
pDst[x] = (unsigned char)s;
}
   
pDst[y] = p3;
  }
}

__attribute__ ((noinline)) void 
foo2(unsigned char* pSrcLeft,
 unsigned char* pSrcAbove,
 unsigned char* pSrcAboveLeft,
 unsigned char* pDst,
 int dstStep,
 int leftStep)
{
  signed int x, y, s;
  unsigned char  p1[5], p2[5], p3;

  p1[0] = *pSrcAboveLeft;
  p2[0] = p1[0];
  p2[1] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[2] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[3] = pSrcLeft[0];
  pSrcLeft += leftStep;
  p2[4] = pSrcLeft[0];

  p1[1] = pSrcAbove[0];
  p1[2] = pSrcAbove[1];
  p1[3] = pSrcAbove[2];
  p1[4] = pSrcAbove[3];

  p3 = (unsigned char)(((signed

Re: Adding official support into the main tree for SPARC Leon

2011-11-01 Thread Konrad Eisele


I'll send new patches as a reply.

Eric Botcazou wrote:
> [CCing David Miller, the SPARC binutils maintainer]
> OK, so you're proposing a new 'leon' sub-architecture for binutils.

Yes.

> 
>> The appended 2 patches do:
>> 1. 0001-sparc-leon-Use-Aleon-assembler-switch-for-mcpu-leon-.patch
>>    Append "-Aleon" to the assembler
> 
> This looks incomplete.  Don't you also want to enable the instructions?

The [casa,smac,umac] instructions are only used in inline assembler.

> 
>> 2. 0001-sparc-leon-add-leon-architecture-to-GAS.patch
>>    Define new "leon" processor type in GAS + enable for "leon"
>>    umac/smac and "casa".
> 
> The configure.tgt change looks useless to me.

I have removed it; it is not needed once gcc passes "-Aleon".

> 
> Other nits:
> 
> @@ -1668,9 +1671,8 @@ EFPOP2_2 ("efcmpes",0x055, "e,f"),
>  { "cpop2",   F3(2, 0x37, 0), F3(~2, ~0x37, ~1), "[1+2],d", F_ALIAS, v6notv9 },
>  
>  /* sparclet specific insns */
> -
> -COMMUTEOP ("umac", 0x3e, sparclet),
> -COMMUTEOP ("smac", 0x3f, sparclet),
> +COMMUTEOP ("umac", 0x3e, sparclet|MASK_LEON),
> +COMMUTEOP ("smac", 0x3f, sparclet|MASK_LEON),
>  COMMUTEOP ("umacd", 0x2e, sparclet),
>  COMMUTEOP ("smacd", 0x2f, sparclet),
>  COMMUTEOP ("umuld", 0x09, sparclet),
> 
> sparclet|leon
> 
> -{ "casa",F3(3, 0x3c, 0), F3(~3, ~0x3c, ~0), "[1]A,2,d", 0, v9 },
> -{ "casa",F3(3, 0x3c, 1), F3(~3, ~0x3c, ~1), "[1]o,2,d", 0, v9 },
> +{ "casa",F3(3, 0x3c, 0), F3(~3, ~0x3c, ~0), "[1]A,2,d", 0, v9|MASK_LEON },
> +{ "casa",F3(3, 0x3c, 1), F3(~3, ~0x3c, ~1), "[1]o,2,d", 0, v9|MASK_LEON },
> 
> v9|leon
> 
> +{ "cas", F3(3, 0x3c, 0)|ASI(0x80), F3(~3, ~0x3c, ~0)|ASI(~0x80), "[1],2,d", F_ALIAS, v9|MASK_LEON }, /* casa [rs1]ASI_P,rs2,rd */
> +{ "casl",F3(3, 0x3c, 0)|ASI(0x88), F3(~3, ~0x3c, ~0)|ASI(~0x88), "[1],2,d", F_ALIAS, v9|MASK_LEON }, /* casa [rs1]ASI_P_L,rs2,rd */
> 
> Likewise.
> 

I fixed that.
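
For readers following the "sparclet|leon" and "v9|leon" nits, here is a miniature model of how per-architecture bit masks can gate instructions. This is only a sketch under my own assumptions: the enum, macro and function names are hypothetical and are not the binutils definitions, and the real assembler does more than a single mask test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of the opcode-table scheme; not the binutils code.  */
enum arch { ARCH_V6, ARCH_V7, ARCH_V8, ARCH_LEON, ARCH_SPARCLET, ARCH_V9 };

#define ARCH_MASK(a) (1u << (a))
#define BASE_V8 (ARCH_MASK (ARCH_V6) | ARCH_MASK (ARCH_V7) | ARCH_MASK (ARCH_V8))

/* Instruction classes each selectable architecture accepts.  */
static const unsigned supported[] = {
  [ARCH_V6]       = ARCH_MASK (ARCH_V6),
  [ARCH_V7]       = ARCH_MASK (ARCH_V6) | ARCH_MASK (ARCH_V7),
  [ARCH_V8]       = BASE_V8,
  [ARCH_LEON]     = BASE_V8 | ARCH_MASK (ARCH_LEON),
  [ARCH_SPARCLET] = BASE_V8 | ARCH_MASK (ARCH_SPARCLET),
  [ARCH_V9]       = BASE_V8 | ARCH_MASK (ARCH_V9),
};

/* Per-instruction tags, in the spirit of "sparclet|leon" and "v9|leon".  */
#define INSN_UMAC  (ARCH_MASK (ARCH_SPARCLET) | ARCH_MASK (ARCH_LEON))
#define INSN_UMACD (ARCH_MASK (ARCH_SPARCLET))
#define INSN_CASA  (ARCH_MASK (ARCH_V9) | ARCH_MASK (ARCH_LEON))

/* An instruction is usable if its tag overlaps the selected architecture's set.  */
bool insn_available (enum arch selected, unsigned insn_archs)
{
  assert ((unsigned) selected <= ARCH_V9);
  return (supported[selected] & insn_archs) != 0;
}
```

In this model, selecting ARCH_LEON accepts umac and casa but still rejects umacd, while plain ARCH_V8 rejects casa.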

Intro

2011-11-01 Thread Konrad Eisele

Here are the new patches that add -Aleon to binutils and make -Aleon
the default assembler switch in gcc:

 - [PATCH 1/1] sparc leon: add -Aleon architecture to GAS:
   Binutils patch

 - [PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch
   Gcc patch

[PATCH 1/1] sparc leon: add -Aleon architecture to GAS

2011-11-01 Thread Konrad Eisele

Add -Aleon architecture selection to GAS. -Aleon enables [umac,smac] and
[casa,casl].

Signed-off-by: Konrad Eisele 
---
 gas/config/tc-sparc.c  |    3 ++-
 include/opcode/sparc.h |    1 +
 opcodes/sparc-opc.c    |   16 +++++++++-------
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/gas/config/tc-sparc.c b/gas/config/tc-sparc.c
index 77fda56..47f4386 100644
--- a/gas/config/tc-sparc.c
+++ b/gas/config/tc-sparc.c
@@ -221,7 +221,7 @@ static void output_insn (const struct sparc_opcode *, struct sparc_it *);
for this use.  That table is for opcodes only.  This table is for opcodes
and file formats.  */
 
-enum sparc_arch_types {v6, v7, v8, sparclet, sparclite, sparc86x, v8plus,
+enum sparc_arch_types {v6, v7, v8, leon, sparclet, sparclite, sparc86x, v8plus,
   v8plusa, v9, v9a, v9b, v9_64};
 
 static struct sparc_arch {
@@ -246,6 +246,7 @@ static struct sparc_arch {
   { "sparcima", "v9b", v9, 0, 1, F_MUL32|F_DIV32|F_FSMULD|F_POPC|F_VIS|F_VIS2|F_FMAF|F_IMA },
   { "sparcvis3", "v9b", v9, 0, 1, F_MUL32|F_DIV32|F_FSMULD|F_POPC|F_VIS|F_VIS2|F_FMAF|F_VIS3|F_HPC },
   { "sparcvis3r", "v9b", v9, 0, 1, F_MUL32|F_DIV32|F_FSMULD|F_POPC|F_VIS|F_VIS2|F_FMAF|F_VIS3|F_HPC|F_RANDOM|F_TRANS|F_FJFMAU },
+  { "leon", "leon", leon, 32, 1, F_MUL32|F_DIV32|F_FSMULD },
   { "sparclet", "sparclet", sparclet, 32, 1, F_MUL32|F_DIV32|F_FSMULD },
   { "sparclite", "sparclite", sparclite, 32, 1, F_MUL32|F_DIV32|F_FSMULD },
   { "sparc86x", "sparclite", sparc86x, 32, 1, F_MUL32|F_DIV32|F_FSMULD },
diff --git a/include/opcode/sparc.h b/include/opcode/sparc.h
index 7ae3641..2283a93 100644
--- a/include/opcode/sparc.h
+++ b/include/opcode/sparc.h
@@ -42,6 +42,7 @@ enum sparc_opcode_arch_val
   SPARC_OPCODE_ARCH_V6 = 0,
   SPARC_OPCODE_ARCH_V7,
   SPARC_OPCODE_ARCH_V8,
+  SPARC_OPCODE_ARCH_LEON,
   SPARC_OPCODE_ARCH_SPARCLET,
   SPARC_OPCODE_ARCH_SPARCLITE,
   /* V9 variants must appear last.  */
diff --git a/opcodes/sparc-opc.c b/opcodes/sparc-opc.c
index a2096c5..f467588 100644
--- a/opcodes/sparc-opc.c
+++ b/opcodes/sparc-opc.c
@@ -33,6 +33,7 @@
 #define MASK_V6    SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_V6)
 #define MASK_V7    SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_V7)
 #define MASK_V8    SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_V8)
+#define MASK_LEON  SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_LEON)
 #define MASK_SPARCLET  SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_SPARCLET)
 #define MASK_SPARCLITE SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_SPARCLITE)
 #define MASK_V9    SPARC_OPCODE_ARCH_MASK (SPARC_OPCODE_ARCH_V9)
@@ -56,6 +57,7 @@
recognizes all v8 insns.  */
 #define v8 (MASK_V8 | MASK_SPARCLET | MASK_SPARCLITE \
 | MASK_V9 | MASK_V9A | MASK_V9B)
+#define leon   (MASK_LEON)
 #define sparclet   (MASK_SPARCLET)
 #define sparclite  (MASK_SPARCLITE)
 #define v9 (MASK_V9 | MASK_V9A | MASK_V9B)
@@ -76,6 +78,7 @@ const struct sparc_opcode_arch sparc_opcode_archs[] =
   { "v6", MASK_V6 },
   { "v7", MASK_V6 | MASK_V7 },
   { "v8", MASK_V6 | MASK_V7 | MASK_V8 },
+  { "leon", MASK_V6 | MASK_V7 | MASK_V8 | MASK_LEON },
   { "sparclet", MASK_V6 | MASK_V7 | MASK_V8 | MASK_SPARCLET },
   { "sparclite", MASK_V6 | MASK_V7 | MASK_V8 | MASK_SPARCLITE },
   /* ??? Don't some v8 priviledged insns conflict with v9?  */
@@ -1668,9 +1671,8 @@ EFPOP2_2 ("efcmpes",  0x055, "e,f"),
 { "cpop2", F3(2, 0x37, 0), F3(~2, ~0x37, ~1), "[1+2],d", F_ALIAS, v6notv9 },
 
 /* sparclet specific insns */
-
-COMMUTEOP ("umac", 0x3e, sparclet),
-COMMUTEOP ("smac", 0x3f, sparclet),
+COMMUTEOP ("umac", 0x3e, sparclet|leon),
+COMMUTEOP ("smac", 0x3f, sparclet|leon),
 COMMUTEOP ("umacd", 0x2e, sparclet),
 COMMUTEOP ("smacd", 0x2f, sparclet),
 COMMUTEOP ("umuld", 0x09, sparclet),
@@ -1721,8 +1723,8 @@ SLCBCC("cbnefr", 15),
 #undef SLCBCC2
 #undef SLCBCC
 
-{ "casa",  F3(3, 0x3c, 0), F3(~3, ~0x3c, ~0), "[1]A,2,d", 0, v9 },
-{ "casa",  F3(3, 0x3c, 1), F3(~3, ~0x3c, ~1), "[1]o,2,d", 0, v9 },
+{ "casa",  F3(3, 0x3c, 0), F3(~3, ~0x3c, ~0), "[1]A,2,d", 0, v9|leon },
+{ "casa",  F3(3, 0x3c, 1), F3(~3, ~0x3c, ~1), "[1]o,2,d", 0, v9|leon },
 { "casxa", F3(3, 0x3e, 0), F3(~3, ~0x3e, ~0), "[1]A,2,d", 0, v9 },
 { "casxa", F3(3, 0x3e, 1), F3(~3, ~0x3e, ~1), "[1]o,2,d", 0, v9 },
 
@@ -1732,8 +1734,8 @@ SLCBCC("cbnefr", 15),
 { "signx", F3(2, 0x27, 0), F3(~2, ~0x27, ~0)|(1<<12)|ASI(~0)|RS2_G0, "r", F_ALIAS, v9 }, /* sra rd,%g0,rd */
 { "clruw", F3(2, 0x26, 0), F3(~2, ~0x26, ~0)|(1<<12)|ASI(~0)|RS2_G0, "1,d", F_ALIAS, v9 }, /* srl rs1,%g0,rd */
 { "clruw", F3(2, 0x26, 0), F3(~2, ~0x26, ~0)|(1<<12)|ASI(~0)|RS2_G0, "r", F_ALIAS, v9 }, /* srl rd,%g0,rd */
-{ "cas",   F3(3, 0x3c, 0)|ASI(0x80), F3(~3, ~0x3c, ~0)|ASI(~0x80), "[1],2,d", F_ALIAS, v9 }, /* casa [rs1]ASI_P,rs2,rd */
-{ "casl",  F3(3, 0x3c, 0)|ASI(0x88), F3(~3, ~0x3c, ~0)|ASI(~0x88), "[1],2,d", F_ALIAS, v9 },

[PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch

2011-11-01 Thread Konrad Eisele

Use -Aleon to select the binutils sparc-leon architecture. With -Aleon,
GAS has umac/smac and casa enabled.

Signed-off-by: Konrad Eisele 
---
 gcc/config/sparc/sparc.h |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 65b4527..bbadeb2 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -236,7 +236,7 @@ extern enum cmodel sparc_cmodel;
 
 #if TARGET_CPU_DEFAULT == TARGET_CPU_leon
 #define CPP_CPU32_DEFAULT_SPEC "-D__leon__ -D__sparc_v8__"
-#define ASM_CPU32_DEFAULT_SPEC ""
+#define ASM_CPU32_DEFAULT_SPEC "-Aleon"
 #endif
 
 #endif
@@ -324,7 +324,7 @@ extern enum cmodel sparc_cmodel;
 
 /* Override in target specific files.  */
 #define ASM_CPU_SPEC "\
-%{mcpu=sparclet:-Asparclet} %{mcpu=tsc701:-Asparclet} \
+%{mcpu=sparclet:-Asparclet} %{mcpu=leon:-Aleon} %{mcpu=tsc701:-Asparclet} \
 %{mcpu=sparclite:-Asparclite} \
 %{mcpu=sparclite86x:-Asparclite} \
 %{mcpu=f930:-Asparclite} %{mcpu=f934:-Asparclite} \
-- 
1.6.4.1
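
As an illustration of what the ASM_CPU_SPEC entries in the hunk above amount to, here is a small lookup sketch in C. It assumes that each %{mcpu=X:-AY} entry simply maps one -mcpu value to one assembler switch, and it only covers the entries visible in the hunk; the table and function names are hypothetical, and the rest of the spec string (not shown in the patch context) is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mirroring the %{mcpu=X:-AY} entries visible in the hunk.  */
static const struct
{
  const char *cpu;
  const char *asm_flag;
} cpu_to_asm[] = {
  { "sparclet",     "-Asparclet"  },
  { "leon",         "-Aleon"      },   /* the entry added by the patch */
  { "tsc701",       "-Asparclet"  },
  { "sparclite",    "-Asparclite" },
  { "sparclite86x", "-Asparclite" },
  { "f930",         "-Asparclite" },
  { "f934",         "-Asparclite" },
};

/* Returns the assembler switch for CPU, or "" if the sketch has no entry.  */
const char *asm_flag_for_cpu (const char *cpu)
{
  assert (cpu != NULL);
  for (size_t i = 0; i < sizeof cpu_to_asm / sizeof cpu_to_asm[0]; i++)
    if (strcmp (cpu, cpu_to_asm[i].cpu) == 0)
      return cpu_to_asm[i].asm_flag;
  return "";
}
```

With this table, asm_flag_for_cpu ("leon") yields "-Aleon", which is the behaviour the second hunk adds for -mcpu=leon.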

Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS

2011-11-01 Thread David Miller


Please post binutils patches with the binutils development list CC:'d.

Re: [PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch

2011-11-01 Thread David Miller


GCC patches are to be posted to gcc-patches, not gcc.

Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS

2011-11-01 Thread Konrad Eisele

David Miller wrote:
> 
> Please post binutils patches with the binutils development list CC:'d.
> 
> 

Is the binutils development list bug-binut...@gnu.org ?

Re: [PATCH 1/1] sparc leon: Use -Aleon assembler switch for -mcpu=leon arch

2011-11-01 Thread Konrad Eisele

David Miller wrote:
> 
> GCC patches are to be posted to gcc-patches, not gcc.
> 
> 
I have sent it there.

Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS

2011-11-01 Thread David Miller

From: Konrad Eisele 
Date: Tue, 01 Nov 2011 10:19:04 +0100

> David Miller wrote:
>> 
>> Please post binutils patches with the binutils development list CC:'d.
>> 
>> 
> 
> Is the binutils development list bug-binut...@gnu.org ?

No, it's binut...@sourceware.org

Re: [PATCH 1/1] sparc leon: add -Aleon architecture to GAS

2011-11-01 Thread Konrad Eisele

David Miller wrote:
> From: Konrad Eisele 
> Date: Tue, 01 Nov 2011 10:19:04 +0100
> 
>> David Miller wrote:
>>>
>>> Please post binutils patches with the binutils development list CC:'d.
>>>
>>>
>>
>> Is the binutils development list bug-binut...@gnu.org ?
> 
> No, it's binut...@sourceware.org
> 
> 

Ok, I've sent it there.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Richard Guenther

On Mon, Oct 31, 2011 at 11:33 PM, Aldy Hernandez  wrote:
> This is somewhat of a me-too message for the transactional-memory work.  We
> would also like it to be considered for merging with mainline before the end
> of stage1.
>
> We have kept a wiki here:
>
> http://gcc.gnu.org/wiki/TransactionalMemory
>
> What it is
> ==========
>
> From the wiki...
>
> Transactional memory is intended to make programming with threads simpler.
> As with databases, a transaction is a unit of work that either completes in
> its entirety or has no effect at all. Further, transactions are isolated
> from each other such that each transaction sees a consistent view of memory.
>
> Transactional memory comes in two forms: a Software Transactional Memory
> (STM) system uses locks or other standard atomic instructions to do its job.
> A Hardware Transactional Memory (HTM) system uses features of the cpu to
> implement the requirements of the transaction directly (e.g. Rock
> processor). Most HTM systems are best effort, which means that the
> transaction can fail for unrelated reasons. Thus almost all systems that
> incorporate HTM also have a STM component and are thus termed Hybrid
> Transactional Memory systems.
>
> The transactional memory system to be implemented in GCC provides single
> lock atomicity semantics. That is, a program behaves as if a single global
> lock guards each transaction.
>
> What it involves
> ================
>
> We have implemented the latest spec from the multi-vendor transactional
> memory group that includes AMD, Intel, Oracle, and others.  The last
> official spec is what is in the wiki above, yet there are some minor changes
> to the keywords that are currently being finalized in the final document
> (but have already been agreed upon), and will be published shortly.
>
> It is my understanding (Torvald, correct me if I'm wrong), that the current
> implementation is what has been agreed to by the committee, and has been
> given a favorable nod by various members of the C++ standardization
> committee.  Most importantly, the keywords are agreed upon.
>
> There are changes to the C and C++ front-end, and a software library
> (libitm) to go along with it.  The library works on x86-64, x86-32, and
> Richard's favorite, Alpha :-).  Porting to other architectures should be a
> straightforward affair.
>
> Status
> ======
>
> The current implementation runs the common TM benchmarks correctly, albeit
> there is still work to be done to improve performance.
>
> There are a handful of failed compiler tests on the included transactional
> memory testsuite (g*.dg/tm/*), but they are all missed optimizations, which
> we hope to have fixed after the merge.
>
> What's left
> ===========
>
> Torvald is working on some recent changes to noexcept, and we should have
> this working in a few days.
>
> I will be removing the cancel-throw construct which didn't make it in the
> final spec.  I should have that done tomorrow.
>
> The final word
> ==============
> Seeing that a global maintainer has been lead on this for a while, I suspect
> there isn't much to review formally.  I believe the only bits that Richard
> isn't directly responsible for are the C++ front-end changes.
>
> So what is the opinion/consensus on merging the branch?  It would be nice to
> get this infrastructure in place as well so we can get people to start using
> it, and then we can work out any issues that arise.
>
> I have no idea how this happened, but apparently I'm on the hook for merging
> both the cxx-mem-model and this branch (if/when one/both get approved).  If
> this gets approved, I'd prefer to get the cxx-mem-model branch merged first,
> and the transactional-memory branch later during the week.  I will be
> partially available during the weekend, and definitely during next week.

Given that you only recently merged with trunk again, are you really
sure this is a great idea at this point in time?  Does the GCC 4.7 user
community benefit from this in any way (or rather, what percentage of
it does)?

Thus, please consider merging early during GCC 4.8 stage1 instead.

Thanks,
Richard.
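
For readers unfamiliar with the "single lock atomicity" wording in the quoted proposal, here is a rough C11 model of the semantics: every transaction behaves as if one global lock guarded it, and a transaction either completes entirely or has no effect. This is only a sketch of the observable behaviour under my own assumptions; it uses no transactional-memory language constructs, the names are hypothetical, and it says nothing about how libitm is implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One global lock stands in for "every transaction".  */
static atomic_flag global_txn_lock = ATOMIC_FLAG_INIT;

static void txn_begin (void)
{
  while (atomic_flag_test_and_set_explicit (&global_txn_lock,
                                            memory_order_acquire))
    ;   /* spin until the lock is free */
}

static void txn_end (void)
{
  atomic_flag_clear_explicit (&global_txn_lock, memory_order_release);
}

/* Moves AMOUNT from *FROM to *TO, or does nothing at all.  */
bool transfer (long *from, long *to, long amount)
{
  bool ok = false;

  assert (from != NULL && to != NULL);

  txn_begin ();
  if (amount >= 0 && *from >= amount)
    {
      *from -= amount;
      *to += amount;
      ok = true;
    }
  txn_end ();
  return ok;
}
```

A failed transfer leaves both balances untouched, and no other "transaction" can observe the intermediate state where only one side has been updated.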

Re: IVopts bug?

2011-11-01 Thread Richard Guenther

2011/11/1 杜越海 :
> [snip]
> need file a bug?

Yes, it definitely should not do this kind of stupid (and invalid) thing.

Richard.

>
> Yuehai Du
>

SLP vectorizer on non-loop?

2011-11-01 Thread Bingfeng Mei

Hello,
I have one example with two very similar loops. The cunrolli pass unrolls
one loop completely but not the other, based on slightly different cost
estimates. The loop that is not unrolled gets SLP-vectorized and is then
unrolled by the "cunroll" pass, whereas the unrolled loop cannot be
vectorized since it is no longer a loop. In the end, there is a big
performance difference between the two loops.

My question is why SLP vectorization has to be performed on loops (it is a
sub-pass under pass_tree_loop). Conceptually, can't it be done on any basic
block? Our port is still stuck at 4.5, but I checked 4.7 and it seems to be
the same. I also checked the functions in tree-vect-slp.c. They use
loop_vinfo structures a lot, but in some places the code checks whether
loop_vinfo exists and otherwise uses an alternative. I tried to add an extra
SLP pass after pass_tree_loop, but it didn't work. I wonder how easy it
would be to make SLP work for non-loop code.

Thanks,
Bingfeng Mei

Broadcom UK

void foo (int *__restrict__ temp_hist_buffer, 
  int * __restrict__ p_hist_buff, 
  int *__restrict__ p_input)
{
  int i;
  for(i=0;i<4;i++)
 temp_hist_buffer[i]=p_hist_buff[i];

  for(i=0;i<4;i++)
 temp_hist_buffer[i+4]=p_input[i];

}

Re: SLP vectorizer on non-loop?

2011-11-01 Thread Ira Rosen



gcc-ow...@gcc.gnu.org wrote on 01/11/2011 12:41:32 PM:

> Hello,
> I have one example with two very similar loops. cunrolli pass
> unrolls one loop completely
> but not the other based on slightly different cost estimations. The
> not-unrolled loop
> get SLP-vectorized, then unrolled by "cunroll" pass, whereas the
> other unrolled loop cannot
> be vectorized since it is not a loop any more.  In the end, there is
> big difference of
> performance between two loops.
>

Here is what I see with the current trunk on x86_64 with -O3 (with the two
loops split into different functions):

The first loop, the one that doesn't get unrolled by cunrolli, gets loop
vectorized with -fno-vect-cost-model. With the cost model, the vectorization
fails because the number of iterations is not sufficient (the vectorizer
tries to apply loop peeling in order to align the accesses); the loop is
later unrolled by cunroll and the basic block gets vectorized by SLP.

The second loop, unrolled by cunrolli, also gets vectorized by SLP.
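
For anyone who wants to repeat the experiment, the split into two functions could look like the sketch below. The function names are my own invention and the const qualifiers are added for clarity; the loop bodies are taken from the original foo. Which loop cunrolli unrolls, and what ends up vectorized, will depend on the GCC version and target, so inspect the dumps (for example with -fdump-tree-optimized) rather than relying on this description.

```c
#include <assert.h>
#include <stddef.h>

/* First loop of the original foo, as its own function.  */
void copy_hist (int *__restrict__ temp_hist_buffer,
                const int *__restrict__ p_hist_buff)
{
  assert (temp_hist_buffer != NULL && p_hist_buff != NULL);
  for (int i = 0; i < 4; i++)
    temp_hist_buffer[i] = p_hist_buff[i];
}

/* Second loop of the original foo: same copy, destination offset by 4.  */
void copy_input (int *__restrict__ temp_hist_buffer,
                 const int *__restrict__ p_input)
{
  assert (temp_hist_buffer != NULL && p_input != NULL);
  for (int i = 0; i < 4; i++)
    temp_hist_buffer[i + 4] = p_input[i];
}
```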

The *.optimized dumps look similar:


:
  vect_var_.14_48 = MEM[(int *)p_hist_buff_9(D)];
  MEM[(int *)temp_hist_buffer_5(D)] = vect_var_.14_48;
  return;


:
  vect_var_.7_57 = MEM[(int *)p_input_10(D)];
  MEM[(int *)temp_hist_buffer_6(D) + 16B] = vect_var_.7_57;
  return;


> My question is why SLP vectorization has to be performed on loop (it
> is a sub-pass under
> pass_tree_loop). Conceptually, cannot it be done on any basic block?
> Our port are still
> stuck at 4.5. But I checked 4.7, it seems still the same. I also
> checked functions in
> tree-vect-slp.c. They use a lot of loop_vinfo structures. But in
> some places it checks
> whether loop_vinfo exists to use it or other alternative. I tried to
> add an extra SLP
> pass after pass_tree_loop, but it didn't work. I wonder how easy to
> make SLP works for
> non-loop.

SLP vectorization works both on loops (in vectorize pass) and on basic
blocks (in slp-vectorize pass).
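
To make the basic-block case concrete, here is a small sketch of straight-line code with no loop at all: four adjacent, independent copies, which is the kind of statement group basic-block SLP looks for. The function name is hypothetical. Whether it is actually vectorized depends on the GCC version, the target, and how unknown alignment is handled; GCC documents a -ftree-slp-vectorize option that controls basic-block vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* Four adjacent, independent stores in straight-line code: no loop is
   left for the loop vectorizer, only a group of isomorphic statements.  */
void copy4_straight_line (int *__restrict__ dst, const int *__restrict__ src)
{
  assert (dst != NULL && src != NULL);
  dst[0] = src[0];
  dst[1] = src[1];
  dst[2] = src[2];
  dst[3] = src[3];
}
```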

Ira


RE: SLP vectorizer on non-loop?

2011-11-01 Thread Bingfeng Mei

Ira,
Thank you very much for the quick answer. I will check 4.7 on x86-64
to see how it differs from our port. Are there significant changes
between 4.5 and 4.7 regarding SLP?

Cheers,
Bingfeng


RE: SLP vectorizer on non-loop?

2011-11-01 Thread Ira Rosen



"Bingfeng Mei"  wrote on 01/11/2011 01:25:14 PM:

> Ira,
> Thank you very much for quick answer. I will check 4.7 x86-64
> to see difference from our port. Is there significant change
> between 4.5 & 4.7 regarding SLP?

Yes, I think so. 4.5 can't SLP data accesses with unknown alignment, which
is what you have here.

Ira

>
> Cheers,
> Bingfeng
>
> > -Original Message-
> > From: Ira Rosen [mailto:i...@il.ibm.com]
> > Sent: 01 November 2011 11:13
> > To: Bingfeng Mei
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: SLP vectorizer on non-loop?
> >
> >
> >
> > gcc-ow...@gcc.gnu.org wrote on 01/11/2011 12:41:32 PM:
> >
> > > Hello,
> > > I have one example with two very similar loops. cunrolli pass
> > > unrolls one loop completely
> > > but not the other based on slightly different cost estimations. The
> > > not-unrolled loop
> > > get SLP-vectorized, then unrolled by "cunroll" pass, whereas the
> > > other unrolled loop cannot
> > > be vectorized since it is not a loop any more.  In the end, there is
> > > big difference of
> > > performance between two loops.
> > >
> >
> > Here what I see with the current trunk on x86_64 with -O3 (with the two
> > loops split into different functions):
> >
> > The first loop, the one that doesn't get unrolled by cunrolli, gets
> > loop
> > vectorized with -fno-vect-cost-model. With the cost model the
> > vectorization
> > fails because the number of iterations is not sufficient (the
> > vectorizer
> > tries to apply loop peeling in order to align the accesses), the loop
> > gets
> > later unrolled by cunroll and the basic block gets vectorized by SLP.
> >
> > The second loop, unrolled by cunrolli, also gets vectorized by SLP.
> >
> > The *.optimized dumps look similar:
> >
> >
> > :
> >   vect_var_.14_48 = MEM[(int *)p_hist_buff_9(D)];
> >   MEM[(int *)temp_hist_buffer_5(D)] = vect_var_.14_48;
> >   return;
> >
> >
> > :
> >   vect_var_.7_57 = MEM[(int *)p_input_10(D)];
> >   MEM[(int *)temp_hist_buffer_6(D) + 16B] = vect_var_.7_57;
> >   return;
> >
> >
> > > My question is why SLP vectorization has to be performed on loop (it
> > > is a sub-pass under
> > > pass_tree_loop). Conceptually, cannot it be done on any basic block?
> > > Our port are still
> > > stuck at 4.5. But I checked 4.7, it seems still the same. I also
> > > checked functions in
> > > tree-vect-slp.c. They use a lot of loop_vinfo structures. But in
> > > some places it checks
> > > whether loop_vinfo exists to use it or other alternative. I tried to
> > > add an extra SLP
> > > pass after pass_tree_loop, but it didn't work. I wonder how easy to
> > > make SLP works for
> > > non-loop.
> >
> > SLP vectorization works both on loops (in vectorize pass) and on basic
> > blocks (in slp-vectorize pass).
> >
> > Ira
> >
> > >
> > > Thanks,
> > > Bingfeng Mei
> > >
> > > Broadcom UK
> > >
> > > void foo (int *__restrict__ temp_hist_buffer,
> > >   int * __restrict__ p_hist_buff,
> > >   int *__restrict__ p_input)
> > > {
> > >   int i;
> > >   for(i=0;i<4;i++)
> > >  temp_hist_buffer[i]=p_hist_buff[i];
> > >
> > >   for(i=0;i<4;i++)
> > >  temp_hist_buffer[i+4]=p_input[i];
> > >
> > > }
> > >
> > >
> >
>
>
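
For reference, a minimal sketch of what the first loop of foo looks like once it has been completely unrolled, i.e. the straight-line form that basic-block SLP has to work on. The function name copy4_unrolled is made up for illustration, and whether a particular GCC version turns this into a single vector load and store depends on the target, the flags and the cost model.

```c
/* Straight-line equivalent of
     for (i = 0; i < 4; i++) temp_hist_buffer[i] = p_hist_buff[i];
   after complete unrolling.  The four adjacent loads and stores sit in
   one basic block, which is the shape basic-block SLP looks for.
   copy4_unrolled is a hypothetical name, not taken from the thread.  */
static void copy4_unrolled(int *restrict dst, const int *restrict src)
{
  dst[0] = src[0];
  dst[1] = src[1];
  dst[2] = src[2];
  dst[3] = src[3];
}
```

Compiling this with -O3 and inspecting the *.optimized dump is one way to check whether the block was vectorized on a given port.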

Re: approaches to carry-flag modelling in RTL

2011-11-01 Thread Paulo J. Matos


On 01/11/11 02:43, Hans-Peter Nilsson wrote:


> Not obvious or maybe I was unclear as to what I alluded?
> In the below insn-bodies, "sub" is the insn that sets cc0 as a
> side-effect.
>
> Supposed canonical form:
>
> (parallel
>   [(set cc_reg (compare ...))
>    (set destreg (sub ...))])
> and:
> (parallel
>   [(set destreg (sub ...))
>    (clobber cc_reg)])
>
> But IMHO it'd be easier (for most values of "easier") to combine
> both patterns with that non-existing mechanism (and no, I don't
> count match_parallel) if we instead canonicalized on the CC_REG
> set being the same as the clobber position:
>
> (parallel
>   [(set destreg (sub ...))
>    (set cc_reg (compare ...))])
> with:
> (parallel
>   [(set destreg (sub ...))
>    (clobber cc_reg)])
>
> brgds, H-P



That is very strange because if you look into RX or MN10300, they all 
have the set REG_CC as the last in the parallel. I wonder if it has 
anything to do with the fact that in these backends the set of the 
REG_CC only shows up after reload.



--
PMatos

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Torvald Riegel

On Tue, 2011-11-01 at 10:49 +0100, Richard Guenther wrote:
> On Mon, Oct 31, 2011 at 11:33 PM, Aldy Hernandez  wrote:
> > This is somewhat of a me-too message for the transactional-memory work.  We
> > would also like it to be considered for merging with mainline before the end
> > of stage1.

[snip]

> > The final word
> > ==
> > Seeing that a global maintainer has been lead on this for a while, I suspect
> > there isn't much to review formally.  I believe the only bits that Richard
> > isn't directly responsible for are the C++ front-end changes.
> >
> > So what is the opinion/consensus on merging the branch?  It would be nice to
> > get this infrastructure in place as well so we can get people to start using
> > it, and then we can work out any issues that arise.
> >
> > I have no idea how this happened, but apparently I'm on the hook for merging
> > both the cxx-mem-model and this branch (if/when one/both get approved).  If
> > this gets approved, I'd prefer to get the cxx-mem-model branch merged first,
> > and the transactional-memory branch later during the week.  I will be
> > partially available during the weekend, and definitely during next week.
> 
> Given that you only recently merged with trunk again are you really
> sure this is a great
> idea at this point in time?

Yes, for the reasons outlined below.

> Does the GCC 4.7 user community benefit from this
> in any way (or rather how much percentage of it)?

Yes, we think so. Transactional Memory (TM) is a very easy-to-use
synchronization mechanism, which does not burden the programmer with
having to consider issues such as deadlocks or having to rely on
conventions regarding which locks cover which data. This complements the
recent efforts for low-level synchronization in GCC (ie, C++11 atomics)
and other threading-related features in C++11.

It is a new feature that isn't yet available in other mainstream
compiler products, but there is wide industry interest in TM. The TM
language specification for C++ that we have implemented in the branch is
the output of a cross-industry working group consisting of people from
HP, IBM, Intel, Oracle, and Red Hat. This group, including C++11 and
synchronization experts such as Hans Boehm, has been working since at
least 2009 on this specification, and we are pretty confident that we
have a good understanding of the matter, and which programming
abstractions we can and should offer. We have presented it to and
discussed it with several other affected parties (e.g., Boost folks,
academia, ...), and we hope to present it to the C++ standard community
in February.
On the hardware side, there clearly is interest too. For example, IBM
BlueGene/Q chips have hardware support for TM, and AMD released a
proposal for such support for x86 (AMD's Advanced Synchronization
Facility).

Thus, we are not investing in some wild and crazy idea here. On the
contrary, because other mainstream compilers do not have this feature
yet (but do have it in preview versions, e.g., in an ICC what-if
prototype), it is actually an opportunity for GCC to offer improvements
to its users before other compilers do.

Parallelization and synchronization are of interest to many GCC users I
would argue, so giving them another reason to use GCC is definitely
good. Also, this is an area that will become even more important in the
future.

> Thus, please consider merging early during GCC 4.8 stage1 instead.

I do think that merging now is definitely better than waiting another
cycle:
- It does improve GCC for programmers that have to build concurrent
code. We know that the percentage of these programmers will increase.
- TM does not negatively affect any other features of GCC from the
perspective of users, because TM as we have implemented it smoothly
embeds into the C++11 memory model (but without actually being dependent
on its implementation or presence) and does not create other
dependencies. Do you see any examples for negative effects on other
features?
- The TM code in GCC is also pretty isolated from anything else; while
there is front-end support, most of the implementation and logic is in
an isolated runtime library (libitm).
- We (here meaning my colleagues and myself) definitely have the
expertise to maintain this, and we are willing to invest time in this in
the future.

Overall, this looks to me like much benefit at very little cost. The
sooner we make this available in mainline, the earlier users can benefit
from it.

Torvald

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Andrew Haley

On 11/01/2011 01:52 PM, Torvald Riegel wrote:
> Yes, we think so. Transactional Memory (TM) is a very easy-to-use
> synchronization mechanism, which does not burden the programmer with
> having to consider issues such as deadlocks or having to rely on
> conventions regarding which locks cover which data. This complements the
> recent efforts for low-level synchronization in GCC (ie, C++11 atomics)
> and other threading-related features in C++11.

Speaking as someone not involved in the project, I have to agree with
this.  TM is something that's been kicking around in academe for a
while now, and exposure in gcc is potentially a significant benefit
for both people who want to experiment with TM and the gcc community.
The promise of TM for scalability is so great that we'd be fools not
to include it.

Andrew.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Jeff Law


On 11/01/11 03:49, Richard Guenther wrote:

> 
> Given that you only recently merged with trunk again are you
> really sure this is a great idea at this point in time?  Does the
> GCC 4.7 user community benefit from this in any way (or rather how
> much percentage of it)?
> 
> Thus, please consider merging early during GCC 4.8 stage1 instead.
This stuff is fairly isolated in terms of what it touches and I'm sure
if anything goes wrong, Aldy, Richard & Torvald will be available to
fix it.

The request to merge came in before the end of stage1, I don't see a
reason to delay things another 6-9 months.  This isn't like asking to
pull in a whole new register allocator at the end of stage1 :-)

Additionally, I believe we have a small window where we can position
GCC to be the compiler of choice for those working with TM; waiting
6-9 months for GCC 4.8 will miss that window.

Jeff

Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread niXman

> Er, the macro _GLIBCXX_NPROCS already handles
> the case sysconf(_SC_NPROCESSORS_ONLN).
> It looks like you actually want to remove the macro
> _GLIBCXX_NPROCS completely.

Fixed.

diff --git a/libstdc++-v3/src/thread.cc b/libstdc++-v3/src/thread.cc
index 09e7fc5..6feda4d 100644
--- a/libstdc++-v3/src/thread.cc
+++ b/libstdc++-v3/src/thread.cc
@@ -112,10 +112,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   unsigned int
   thread::hardware_concurrency() noexcept
   {
-int __n = _GLIBCXX_NPROCS;
-if (__n < 0)
-  __n = 0;
-return __n;
+int count = 0;
+#if defined(PTW32_VERSION) || \
+   (defined(__MINGW64_VERSION_MAJOR) && defined(_POSIX_THREADS)) || \
+   defined(__hpux)
+count = pthread_num_processors_np();
+#elif defined(__APPLE__) || defined(__FreeBSD__)
+size_t size = sizeof(count);
+sysctlbyname("hw.ncpu", &count, &size, NULL, 0);
+#elif defined(_GLIBCXX_USE_GET_NPROCS) || \
+   defined(_GLIBCXX_USE_SC_NPROCESSORS_ONLN)
+count = _GLIBCXX_NPROCS;
+#endif
+return (count > 0) ? count : 0;
   }

 _GLIBCXX_END_NAMESPACE_VERSION




> Do you have already a Copyright assignment in place?

No. For public domain.

Re: implementation of std::thread::hardware_concurrency()

2011-11-01 Thread Jonathan Wakely

I've put gcc-patches@ back in the CC list and removed gcc@

On 1 November 2011 15:35, niXman wrote:
>> Er, the macro _GLIBCXX_NPROCS already handles
>> the case sysconf(_SC_NPROCESSORS_ONLN).
>> It looks like you actually want to remove the macro
>> _GLIBCXX_NPROCS completely.
>
> Fixed.

No, this still isn't acceptable.

I do not want to see preprocessor tests like

+#elif defined(__APPLE__) || defined(__FreeBSD__)

in the body of thread::hardware_concurrency(), the configure
script should determine what is available on the platform and set an
appropriate macro.

Look at the definition of _GLIBCXX_NPROCS and adjust that to do

#define _GLIBCXX_NPROCS pthread_num_processors_np()

for the relevant platforms.

For the platforms using sysctlbyname there could be an inline function
that calls it, and _GLIBCXX_NPROCS could be defined to call that, so
that thread::hardware_concurrency() can still be defined as it is
today.

Please read the code you're changing and understand how it works today
before making changes.
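
A minimal sketch of the structure described above, written in plain C for illustration rather than as actual libstdc++ code. HYPOTHETICAL_HAVE_SYSCTL_HW_NCPU, SKETCH_NPROCS, nprocs_sysctl and hardware_concurrency_sketch are all made-up names; the real macro is _GLIBCXX_NPROCS and the configure check that would set such a macro is not shown. The point is only that the platform choice lives in the macro definition, so the body of the function stays as it is today.

```c
#include <stddef.h>
#include <unistd.h>

/* HYPOTHETICAL_HAVE_SYSCTL_HW_NCPU stands in for a macro that the
   configure script would define after detecting sysctlbyname; it is an
   assumption, not an existing libstdc++ macro.  */
#if defined(HYPOTHETICAL_HAVE_SYSCTL_HW_NCPU)
# include <sys/types.h>
# include <sys/sysctl.h>
/* Inline helper wrapping sysctlbyname; returns 0 if the query fails.  */
static inline int nprocs_sysctl(void)
{
  int count = 0;
  size_t size = sizeof(count);
  if (sysctlbyname("hw.ncpu", &count, &size, NULL, 0) != 0)
    return 0;
  return count;
}
# define SKETCH_NPROCS nprocs_sysctl()
#elif defined(_SC_NPROCESSORS_ONLN)
# define SKETCH_NPROCS ((int) sysconf(_SC_NPROCESSORS_ONLN))
#else
# define SKETCH_NPROCS 0
#endif

/* Same shape as the existing function body: no platform tests here,
   only the macro and the clamp of negative results to 0.  */
static unsigned int hardware_concurrency_sketch(void)
{
  int n = SKETCH_NPROCS;
  if (n < 0)
    n = 0;
  return (unsigned int) n;
}
```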

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Diego Novillo


On 11-11-01 11:23 , Jeff Law wrote:


> This stuff is fairly isolated in terms of what it touches and I'm sure
> if anything goes wrong, Aldy, Richard & Torvald will be available to
> fix it.
>
> The request to merge came in before the end of stage1, I don't see a
> reason to delay things another 6-9 months.  This isn't like asking to
> pull in a whole new register allocator at the end of stage1 :-)
>
> Additionally, I believe we have a small window where we can position
> GCC to be the compiler of choice for those working with TM; waiting
> 6-9 months for GCC 4.8 will miss that window.


I agree as well.  I have not been following the TM work very closely, 
but I've been interested in TM for a Long Time.  Having it available in 
4.7 would be a huge benefit to GCC.


I don't think it really is a risk from a release standpoint.  TM is an 
optional component and should not affect standard codegen paths. 
Additionally, given that Aldy and Richard H. are involved in it, I'm 
sure they will be very quick in addressing any problems that crop up.


Aldy, Richard, is there a patchset or master patch I could read?


Diego.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread David Edelsohn

On Tue, Nov 1, 2011 at 5:49 AM, Richard Guenther
 wrote:

> Given that you only recently merged with trunk again are you really
> sure this is a great
> idea at this point in time?  Does the GCC 4.7 user community benefit from this
> in any way (or rather how much percentage of it)?

GCC has a history of merging and exposing technology previews.  Why
should the bar be placed higher for this feature?  The feature is
isolated and does not appear that it will interfere with other parts
of GCC.

Aldy, RTH, Torvald and Red Hat appear ready to address any problems promptly.

- David

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Robert Dewar


On 11/1/2011 12:59 PM, David Edelsohn wrote:

> On Tue, Nov 1, 2011 at 5:49 AM, Richard Guenther
>  wrote:
>
> > Given that you only recently merged with trunk again are you really
> > sure this is a great
> > idea at this point in time?  Does the GCC 4.7 user community benefit from this
> > in any way (or rather how much percentage of it)?
>
> GCC has a history of merging and exposing technology previews.  Why
> should the bar be placed higher for this feature?  The feature is
> isolated and does not appear that it will interfere with other parts
> of GCC.
>
> Aldy, RTH, Torvald and Red Hat appear ready to address any problems promptly.


I think this is an important feature, and support Richard's viewpoint on 
this.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Diego Novillo

On Tue, Nov 1, 2011 at 17:19, Robert Dewar  wrote:
> On 11/1/2011 12:59 PM, David Edelsohn wrote:
>>
>> On Tue, Nov 1, 2011 at 5:49 AM, Richard Guenther
>>   wrote:
>>
>>> Given that you only recently merged with trunk again are you really
>>> sure this is a great
>>> idea at this point in time?  Does the GCC 4.7 user community benefit from
>>> this
>>> in any way (or rather how much percentage of it)?
>>
>> GCC has a history of merging and exposing technology previews.  Why
>> should the bar be placed higher for this feature?  The feature is
>> isolated and does not appear that it will interfere with other parts
>> of GCC.
>>
>> Aldy, RTH, Torvald and Red Hat appear ready to address any problems
>> promptly.
>
> I think this is an important feature, and support Richard's viewpoint on
> this.

Richard who?  There are two Richards in this thread, and they seem to
have opposing views.


Diego.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Robert Dewar




> Richard who?  There are two Richards in this thread, and they seem to
> have opposing views.


I am confused by the multiple levels of quotes I think (the feature
in mailers of easily allowing you to include an entire earlier thread
is evil! :-)

Anyway, I support merging this in ...



> Diego.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Aldy Hernandez




> Aldy, Richard, is there a patchset or master patch I could read?


I have made current diff as of today:

http://quesejoda.com/tm-branch-latest.bz2

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Diego Novillo


On 11-11-01 14:44 , Aldy Hernandez wrote:



> > Aldy, Richard, is there a patchset or master patch I could read?
>
> I have made current diff as of today:
>
> http://quesejoda.com/tm-branch-latest.bz2


Thanks.


Diego.

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Jeff Law


On 11/01/11 12:44, Aldy Hernandez wrote:
> 
>> Aldy, Richard, is there a patchset or master patch I could read?
> 
> I have made current diff as of today:
> 
> http://quesejoda.com/tm-branch-latest.bz2

Umm,

Have you looked at those diffs, there's a fair amount of unrelated
crud in there...  It might help to break the blob into more easily
understood hunks for actual submissions.  ie, runtime bits (libitm),
changes to existing runtime stuff, compiler proper, testsuite bits, etc.

Obviously folks will want to look at changes to existing runtime and
the compiler proper bits the closest.

Jeff

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Aldy Hernandez





> Have you looked at those diffs, there's a fair amount of unrelated

Will clean up.

> crud in there...  It might help to break the blob into more easily
> understood hunks for actual submissions.  ie, runtime bits (libitm),
> changes to existing runtime stuff, compiler proper, testsuite bits, etc.

Will do.

_mm{,256}_i{32,64}gather_{ps,pd,epi32,epi64} intrinsics semantics

2011-11-01 Thread Jakub Jelinek

Hi!

As the vgather* insns are designed to support both
unconditional and conditional gather loads, the current
patterns consume the previous content of the destination
register, so we end up with code like:
vmovaps .LC0(%rip), %ymm0
vmovdqa .LC1(%rip), %ymm5
vmovdqa .LC2(%rip), %ymm4
.p2align 4,,10
.p2align 3
.L6:
vmovdqa k(%rax,%rax), %ymm1
vmovaps %ymm0, %ymm6
vmovaps %ymm0, %ymm2
vmovdqa k+32(%rax,%rax), %ymm3
vgatherdps  %ymm6, vf1(,%ymm1,4), %ymm2
vmovaps %ymm0, %ymm1
vmovaps %ymm0, %ymm6
vcvttps2dq  %ymm2, %ymm2
vpshufb %ymm5, %ymm2, %ymm2
vgatherdps  %ymm6, vf1(,%ymm3,4), %ymm1
...
Note that each vgather* is usually preceded by two vmovaps: one
copies the all-ones mask, which is usually computed or loaded before
the loop, and the other initializes the destination register.
But with mask of all ones the whole destination register is
overwritten unless there is a segfault, so IMNSHO at least for
autovectorization it would be nice to just leave the content
of the destination register undefined in case of a segfault.
The only way users can see a difference is if a segfault happens
and in a segfault handler they inspect the destination register
or transfer control to the next insn from the segfault handler.

My question is about the avx2intrin.h intrinsics: in the AVX2
manual the insns are well documented, but there are no details
about the intrinsics.  There are two kinds of intrinsics for
gather, one without mask/src operands and one with them.

So, my question is, for the intrinsics without mask/src
operands, is it supposed to be well defined what dest register will
contain after a segfault?  Currently we load zeros into src,
but would it be a valid optimization to just leave that register
undefined in case of segfault?  And, what about the other intrinsics
if mask is known to be all ones?  Can the compiler optimize this
and assume the destination is just overwritten rather than
being in/out operand?
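
To make the in/out question concrete, here is a simplified scalar model in C of an 8-lane float gather with dword indices. It is only a sketch under assumptions: gather_ps_model and GATHER_LANES are made-up names, the scale is fixed to the element size, and faults and the mask clearing done by the real insn are ignored. It shows that src only matters for lanes whose mask high bit is clear, so with an all-ones mask the result does not depend on src at all.

```c
#include <stdint.h>

#define GATHER_LANES 8

/* Scalar model of a masked gather: a lane is loaded from base[index[i]]
   when the high bit of mask[i] is set (mask[i] < 0 as a signed value),
   otherwise the lane keeps the value passed in src[i].
   Hypothetical helper for illustration, not an avx2intrin.h function.  */
static void gather_ps_model(float dst[GATHER_LANES],
                            const float src[GATHER_LANES],
                            const float *base,
                            const int32_t index[GATHER_LANES],
                            const int32_t mask[GATHER_LANES])
{
  for (int i = 0; i < GATHER_LANES; i++)
    dst[i] = (mask[i] < 0) ? base[index[i]] : src[i];
}
```

In this model the unmasked form corresponds to calling the helper with every mask element set to -1, in which case any src value gives the same dst.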

What could be done is during expansion check if mask has all high
bits set and if so, just use different insn patterns that wouldn't
consume the register with "0" constraint.  Or have second set
of compiler builtins that wouldn't have src/mask arguments.

On large testcases (like Toon's weather forecast routine which has
over 260 vgather* insns) this would allow us to get rid of one
extra insn per vgather* insn.

Jakub

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Richard Guenther

On Tue, Nov 1, 2011 at 5:59 PM, David Edelsohn  wrote:
> On Tue, Nov 1, 2011 at 5:49 AM, Richard Guenther
>  wrote:
>
>> Given that you only recently merged with trunk again are you really
>> sure this is a great
>> idea at this point in time?  Does the GCC 4.7 user community benefit from 
>> this
>> in any way (or rather how much percentage of it)?
>
> GCC has a history of merging and exposing technology previews.  Why
> should the bar be placed higher for this feature?  The feature is
> isolated and does not appear that it will interfere with other parts
> of GCC.

I remember at least seeing middle-end pieces in alias analysis.

> Aldy, RTH, Torvald and Red Hat appear ready to address any problems promptly.

Sure, I was just asking for a good reason to merge it now, given that I had the
impression the desire to merge for 4.7 is a bit rushed (given that the
branch wasn't
kept up-to-date with trunk until very recently and trunk regressions
were still being
fixed).

I'd like to see some breakdown into subsystem patches.  Can someone provide
those together with changelog entries?

Thanks,
Richard.

> - David
>

Re: approaches to carry-flag modelling in RTL

2011-11-01 Thread Hans-Peter Nilsson

Please, when replying, also send to me, not just the list.

On Tue, 1 Nov 2011, Paulo J. Matos wrote:
> On 01/11/11 02:43, Hans-Peter Nilsson wrote:
> >
> > Not obvious or maybe I was unclear as to what I alluded?
> > In the below insn-bodies, "sub" is the insn that sets cc0 as a
> > side-effect.
> >
> > Supposed canonical form:
> >
> > (parallel
> >   [(set cc_reg (compare ...))
> >    (set destreg (sub ...))])
> > and:
> > (parallel
> >   [(set destreg (sub ...))
> >    (clobber cc_reg)])

> That is very strange because if you look into RX or MN10300, they all have the
> set REG_CC as the last in the parallel.

That'd be a good reason to flip the default...except that the
i386 has it the other way round i.e. as shown above.  I think
the main reason is that it just seemed right to those port
authors.

> I wonder if it has anything to do with
> the fact that in these backends the set of the REG_CC only shows up after
> reload.

Right, it'd only matter where (also) GCC cooks up combinations
(which IIRC it doesn't if the register is only exposed
post-reload), not where only the port emits them.  N.B., it
*could* very well be that I misremember about the canonical
form, but it seems neither of us bother to search the archives,
so never mind. ;)
...oh wait, see the comments at combine.c:2824 and 3030 r180744.
I can't find anything in the docs, but that might just be my
grep-fu failing.

I'm still thinking of a generic md iterator mechanism (one that
doesn't restrict the form of the expansion in ways getting in
the way with expanding to both a clobber and a set, and in
swapped locations as above), to make the troubles go away...
But maybe expanding them by a pass through e.g. m4 would be
better than cooking up something new there.

brgds, H-P

Re: Potentially merging the transactional-memory branch into mainline.

2011-11-01 Thread Aldy Hernandez




> I'd like to see some breakdown into subsystem patches.  Can someone provide
> those together with changelog entries?


I am doing another merge from trunk->branch, and will post a series of 
patches by subsystem.  I will do so after the merge is complete and tested.

gcc-4.4-20111101 is now available

2011-11-01 Thread gccadmin

Snapshot gcc-4.4-20111101 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20111101/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 180747

You'll find:

 gcc-4.4-20111101.tar.bz2 Complete GCC

  MD5=ada84cede36790f97da8a772e17dd211
  SHA1=962a08327a57dec5e3205821de336bd320707b4d

Diffs from 4.4-20111025 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: Need help resolving PR target/50906

2011-11-01 Thread Alan Modra

On Mon, Oct 31, 2011 at 10:58:03AM -0500, Moffett, Kyle D wrote:
> I have not yet been able to figure out if it's a libgcc issue or an
> actual compiler issue.

It is a gcc bug.  I've added a comment to the PR.

-- 
Alan Modra
Australia Development Lab, IBM

printed versions of GCC Internals book?

2011-11-01 Thread Alan Lehotsky

While I really like machine-readable (and searchable) text online for the GCC 
internals, there's still an atavistic streak in me that wants hard copy that I 
can put post-it notes on, run a highlighter over relevant passages or read when 
I'm not near a computer screen.

I have two bound hard-copies (but the newer one is GCC 2.95) and laser-printed 
newer editions, but I've decided I really miss the bound-book format.

Anybody have any experience with using one of the print-on-demand services to 
produce a recent version of the gccint manual?  I was actually kind of 
surprised that the FSF hasn't taken advantage of this as a fund-raising 
opportunity.

After the initial setup costs, it looks like the per-book price for the 700-page 
gccint would be about $20, but the setup fees (at least here 
http://www.harvard.com/on_our_shelves/in_store_book_printing/books_on_demand/ ) 
would be ~$100.

So, unless someone has already done this, is there anyone else who'd want to 
buy a printed copy at a price that would recover my investment in the setup 
costs and postage?  I'd be happy to turn over the whole project to the FSF so 
they could end up with an ongoing revenue stream once I break even on the 
deal.

I'd guess that with 10 copies, we'd be looking at ~$35/copy, which is about as 
high a price as I'd be willing to pay if I were reading this email instead of 
writing it.

So, are there 10 people out there who'd like a reasonably current version of 
the Internals book, or is there someone else who'd like to drive?

-- Al Lehotsky

Re: IVopts bug?

2011-11-01 Thread Yuehai Du

2011/11/1 Richard Guenther :
> 2011/11/1 杜越海 :
>> Hi all
>>
>>  I found IVopts rewrite a memory access with a weird iv candidate,
>> which make it lost its original memory attribute.
>>  a non-local memory access' base pointer was rewrite into a local one,
>> and  it was deleted in pass_cd_dce since
>> it was recognized as a local memory access.
>>
>> here is the case i simplified from a decoder source
>>
>> foo1(unsigned char* pSrcLeft,
>> unsigned char* pSrcAbove,
>> unsigned char* pSrcAboveLeft,
>> unsigned char* pDst,
>> int dstStep,
>> int leftStep)
>> {
>>  signed int x, y, s;
>>  unsigned char  p1[5], p2[5],  p3;
>>
>>  p1[0] = *pSrcAboveLeft;
>>  p2[0] = p1[0];
>>  p2[1] = pSrcLeft[0];
>>  pSrcLeft += leftStep;
>>  p2[2] = pSrcLeft[0];
>>  pSrcLeft += leftStep;
>>  p2[3] = pSrcLeft[0];
>>  pSrcLeft += leftStep;
>>  p2[4] = pSrcLeft[0];
>>
>>  p1[1] = pSrcAbove[0];
>>  p1[2] = pSrcAbove[1];
>>  p1[3] = pSrcAbove[2];
>>  p1[4] = pSrcAbove[3];
>>
>>  p3 = (unsigned char)(((signed int)p1[1] + (signed int)p2[1] +
>> (signed int)p1[0]
>>+(signed int)p1[0] + 2 ) >> 2 );
>>
>>  for( y=0; y<4; y++, pDst += dstStep ) {
>>for( x=y+1; x<4; x++ ) {
>>s = ( p1[x-y-1] + p1[x-y] + p1[x-y] + p1[x-y+1] + 2 ) >> 
>> 2;
>>pDst[x] = (unsigned char)s;
>>}
>>
>>pDst[y] = p3; -This memory access
>>  }
>> }
>>
>> before IVopts
>>
>>  D.6508_65 = pDst_88 + y.6_64;
>>  *D.6508_65 = p3_37;
>>
>> after IVopts
>> it was rewrite to
>> MEM[symbol: p1, index: ivtmp.161_200, offset: 0B] = p3_37 ,
>>
>> by
>> candidate 15
>>  depends on 3
>>  var_before ivtmp.161
>>  var_after ivtmp.161
>>  incremented before exit test
>>  type unsigned int
>>  base (unsigned int) pDst_39(D) - (unsigned int) &p1
>>  step (unsigned int) (pretmp.28_118 + 1)
>>
>> so it still is &p1+ pDst - &p1 + step = pDst + step,
>> and in pass_cd_dce, is_hidden_global_store () return false for this memory
>> since it think this stmt only access local array p1.
>>
>>
>>
>> gcc version r180694
>>
>> Configured with: /home/croseadu/android/_src/src/gcc-src/configure
>> --host=i486-linux-gnu --build=i486-linux-gnu
>> --target=arm-none-linux-gnueabi
>> --prefix=/home/croseadu/android/_src/install/arm-none-linux-gnueabi
>> --enable-threads --disable-libmudflap --disable-libssp
>> --disable-libstdcxx-pch --with-gnu-as --with-gnu-ld
>> --enable-languages=c,c++ --enable-shared --enable-symvers=gnu
>> --enable-__cxa_atexit
>> --with-specs='%{funwind-tables|fno-unwind-tables|mabi=*|ffreestanding|nostdlib:;:-funwind-tables}'
>> --disable-nls --enable-lto
>> --with-sysroot=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/libc
>> --with-build-sysroot=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/libc
>> --with-gmp=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
>> --with-mpfr=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
>> --with-ppl=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
>> --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic
>> -lm' 
>> --with-cloog=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
>> --enable-cloog-backend=isl
>> --with-mpc=/home/croseadu/android/_src/objs/arm-none-linux-gnueabi/obj/host-libs-/usr
>> --enable-poison-system-directories --disable-libquadmath --enable-lto
>> --enable-libgomp
>> --with-build-time-tools=/home/croseadu/android/_src/install/arm-none-linux-gnueabi/arm-none-linux-gnueabi/bin
>> --with-cpu=cortex-a8 --with-float=soft
>>
>> compile flags:
>> -O3 -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-double
>>
>> need file a bug?
>
> Yes, it definitely should not do this kind of stupid (and invalid) thing.
>
> Richard.
>
>>
>> Yuehai Du
>>
>
I filed a bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50955.  Could
somebody help me fix this PR? Thank you very much.

Yuehai Du
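
To illustrate the arithmetic behind the report, here is a small C sketch of why the rewritten address still points into pDst even though it is expressed relative to the local array p1. The function name rewritten_address_matches and the variable names are made up for illustration; the code only models the address computation with unsigned integers and is not what IVopts emits.

```c
#include <stdint.h>

/* Models the candidate from the dump: the induction variable starts at
   (pDst - &p1), so "&p1 + iv" is numerically the original pDst-based
   address, although the expression names only the local array p1.
   Returns 1 when both forms give the same address.
   Hypothetical helper, assuming unsigned wraparound arithmetic.  */
static int rewritten_address_matches(unsigned char *pDst, uintptr_t offset)
{
  unsigned char p1[5] = {0};
  uintptr_t iv_base = (uintptr_t) pDst - (uintptr_t) p1; /* candidate base */
  uintptr_t iv = iv_base + offset;                       /* after stepping */
  uintptr_t rewritten = (uintptr_t) p1 + iv;             /* MEM[symbol: p1, index: iv] */
  return rewritten == (uintptr_t) pDst + offset;
}
```

The two addresses always agree, which is why treating the store as an access to the local p1 and deleting it changes the behaviour of the program.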

Re: scalar vector shift expansion problem on 64-bit

2011-11-01 Thread David Miller

From: David Miller 
Date: Fri, 28 Oct 2011 01:05:54 -0400 (EDT)

> So should expand_vector_broadcast() really provide this invariant to
> the vec_init expander, or does the vec_init expander need to tidy
> things up with gen_lowpart() etc. calls?

Richard, I don't know if you had a chance to look into this at all yet,
but I wanted to make a comment about vec_init in general.

I've come to find that I want the compiler to do as little as possible
with the expressions that get put into vector initializers.

I don't want it to modify the mode of the individual inner elements in
the assignment. I also don't want it to force mems into registers.

In fact, the less it does the better.

I want to make use of the special VIS load instructions that can take
a QImode or HImode value in memory and load it zero extended into a
64-bit float register.

For example:

int x;

__v8qi test_v8qi(void)
{
  __v8qi ret = { x, x, x, x, x, x, x, x };

  return ret;
}

I want to be able to generate:

test_v8qi:
sethi %hi(x + 3), %g1
or    %g1, %lo(x + 3), %g1
ldda  [%g1] ASI_FL8_P, %f2
sethi %hi(0x), %g2
or    %g2, %lo(0x), %g2
bmask %g2, %g0, %g0
retl
 bshuffle  %f2, %f2, %f0

but I can't because the vec_init expander sees:

(parallel:V8QI [
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
(reg:QI 110 [ D.2249 ])
])

in operands[1].

Re: # of unexpected failures 768 ?

2011-11-01 Thread Michael Haubenwallner

On 10/31/11 19:20, Jonathan Wakely wrote:
> On 31 October 2011 17:38, Rainer Orth wrote:
>> Dennis Clarke  writes:
>>
> I'm uncertain if Solaris 8/x86 still supports bare i386 machines, so it
> might be better to keep the default of pentiumpro instead.

> > Solaris 8 won't run on anything less than pentium, I recently
> > convinced someone else to stop building GCC for i386 on Solaris:
> >
> > http://gcc.gnu.org/ml/gcc-help/2011-10/msg5.html
> 
> Quite.  In fact there are *very* good reasons not to configure for
> 80386: libstdc++'s configure uses the default arch being configured
> for, and disables a number of features on i386 because it doesn't
> support the required atomic ops.
> 
> So by configuring for i386 you will distribute a GCC package that is
> missing useful features, but supports an ancient architecture that
> Solaris doesn't even run on.
> 
> You should configure for pentium-pc-solaris2.8 or use --with-arch-32=pentium

When not configuring with '--host=i386-pc-solaris2.8', it is config.guess
that detects 'i386-pc-solaris2.8'.  I just tried here with the most recent
config.guess on i86pc Solaris 2.10, and the result is 'i386-pc-solaris2.10'.

Actually, it is uname showing the 'i386' on Solaris:
  $ uname -p   # Prints the current host's ISA or processor type.
  i386
  $ uname -i   # Prints the name of the platform.
  i86pc

So I'd wonder if '--host=i386-pc-solaris2.8' actually does make any difference 
here.

Just my 2 cents.

/haubi/
