RE: option -mprfchw on 2 different Opteron cpus

2016-05-02 Thread Kumar, Venkataramanan
Hi,

> -----Original Message-----
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
> NightStrike
> Sent: Monday, May 2, 2016 1:55 AM
> To: gcc@gcc.gnu.org
> Cc: Jan Hubicka ; Jakub Jelinek 
> Subject: option -mprfchw on 2 different Opteron cpus  
> 
> Reposting from here:
> https://gcc.gnu.org/ml/gcc-help/2016-05/msg00003.html
> 
> Not sure if this applies:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54210
> 
> If I compile on a k8 Opteron 248 with -march=native, I do not see -mprfchw
> listed in the options in -fverbose-asm.  In the assembly, I see this:
> 
> prefetcht0  (%rax)  # ivtmp.1160
> prefetcht0  304(%rcx)   #
> prefetcht0  (%rax)  # ivtmp.1160

On AMD processors, the -mprfchw flag is used to enable "3dnowprefetch" ISA support.

(Snip)
CPUID Fn8000_0001_ECX Feature Identifiers
Bit 8 
3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See “PREFETCH” and
“PREFETCHW” in APM3
Ref: http://support.amd.com/TechDocs/25481.pdf
(Snip)

Can you please confirm what this CPUID flag returns on your k8 machine?
I believe this ISA is not available on the k8 machine, so with -march=native
you don't see -mprfchw in the -fverbose-asm output.

> 
> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying to
> target the older system), I do see it listed in the options in -fverbose-asm.
> In the assembly, I see this:

K8 has 3DNow! support, and there is a patch that replaced 3DNow! with prefetchw
(3DNowPrefetch):
https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
So when you use -march=k8, you see -mprfchw listed in the -fverbose-asm output.

> 
> prefetcht0  (%rax)  # ivtmp.1160
> prefetcht0  304(%rcx)   #
> prefetchw   (%rax)  # ivtmp.1160
> 
> (The third line is the only difference)
> 

This is my guess without seeing the test case: when write prefetching is
requested, "prefetchw" is generated. The 3DNow! (TARGET_3DNOW) ISA supports it.

(Snip)
Support for the PREFETCH and PREFETCHW instructions is indicated by CPUID
Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
Fn8000_0001_EDX[3DNow] = 1.
(Snip)
Ref: http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
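To make the two CPUID readings above concrete, here is a minimal C sketch that decodes leaf 0x80000001 both ways: the 3DNowPrefetch bit on its own, and the broader APM rule for PREFETCH/PREFETCHW. The bit positions for LM (EDX bit 29) and 3DNow (EDX bit 31) come from the AMD CPUID specification rather than from this thread, and the helper names are hypothetical. The sample register values are the ones from the Opteron 248 cpuid dump posted later in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_0001 feature bits.  ECX bit 8 is quoted above; the EDX
   positions of LM and 3DNow are from the AMD CPUID specification.  */
#define AMD_ECX_3DNOWPREFETCH (UINT32_C (1) << 8)
#define AMD_EDX_LM            (UINT32_C (1) << 29)
#define AMD_EDX_3DNOW         (UINT32_C (1) << 31)

/* The 3DNowPrefetch feature bit alone (the ISA that -mprfchw names).  */
static bool
has_3dnowprefetch_bit (uint32_t ecx)
{
  return (ecx & AMD_ECX_3DNOWPREFETCH) != 0;
}

/* The APM vol. 3 rule quoted above: PREFETCH/PREFETCHW are supported if
   3DNowPrefetch OR LM OR 3DNow is set.  */
static bool
prefetchw_supported (uint32_t ecx, uint32_t edx)
{
  return (ecx & AMD_ECX_3DNOWPREFETCH) != 0
         || (edx & AMD_EDX_LM) != 0
         || (edx & AMD_EDX_3DNOW) != 0;
}

static void
check_opteron_248 (void)
{
  /* Leaf 0x80000001 ECX/EDX from the Opteron 248 cpuid dump.  */
  const uint32_t ecx = 0x00000000u, edx = 0xe1d3fbffu;

  /* Bit clear: consistent with -march=native not enabling -mprfchw.  */
  assert (!has_3dnowprefetch_bit (ecx));
  /* LM and 3DNow are set, so the APM rule still reports PREFETCHW.  */
  assert (prefetchw_supported (ecx, edx));
}
```

On a live x86 machine the same ECX/EDX values can be read with __get_cpuid (0x80000001, &eax, &ebx, &ecx, &edx) from GCC's <cpuid.h>; the sketch takes them as parameters so it also builds on other hosts.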

> In both cases, I'm using gcc 4.9.3.  Which is correct for a k8 Opteron 248?
> 
> Also, FWIW:
> 
> 1) The march=native version that uses prefetcht0 is very repeatably faster by
> about 15% in the particular test case I'm looking at.
> 
> 2) The compilers in both instances are not just the same version, they are the
> same compiler binary installed on an NFS mount and shared to both
> computers.

As per the GCC 4.9.3 source:

(Snip)
(define_expand "prefetch"
  [(prefetch (match_operand 0 "address_operand")
             (match_operand:SI 1 "const_int_operand")
             (match_operand:SI 2 "const_int_operand"))]
  "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
{
  bool write = INTVAL (operands[1]) != 0;
  int locality = INTVAL (operands[2]);

  gcc_assert (IN_RANGE (locality, 0, 3));

  /* Use 3dNOW prefetch in case we are asking for write prefetch not
     supported by SSE counterpart or the SSE prefetch is not available
     (K6 machines).  Otherwise use SSE prefetch as it allows specifying
     of locality.  */
  if (TARGET_PREFETCHWT1 && write && locality <= 2)
    operands[2] = const2_rtx;
  else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
    operands[2] = GEN_INT (3);
  else
    operands[1] = const0_rtx;
})
(Snip)
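The selection logic above can be modeled outside GCC as a small C function. This is a sketch, not GCC code: the booleans stand in for the TARGET_* macros, and the mnemonic each branch prints (prefetchnta/t2/t1/t0 for the SSE form, prefetch/prefetchw for the 3DNow! form) is an assumption based on the i386.md insn patterns, which are not quoted here. It also assumes, as guessed above, that the differing third line comes from a write prefetch with locality 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the "prefetch" expander: given the ISA flags and
   the (write, locality) request, return the mnemonic that would be
   emitted.  prefetchwt1/prfchw/prefetch_sse stand in for
   TARGET_PREFETCHWT1, TARGET_PRFCHW and TARGET_PREFETCH_SSE.  */
static const char *
model_prefetch_mnemonic (bool prefetchwt1, bool prfchw, bool prefetch_sse,
                         bool write, int locality)
{
  /* SSE prefetches encode locality 0..3 in the mnemonic.  */
  static const char *const sse[4] = {
    "prefetchnta", "prefetcht2", "prefetcht1", "prefetcht0"
  };

  if (locality < 0 || locality > 3)
    return NULL;                  /* the expander asserts this range */

  if (prefetchwt1 && write && locality <= 2)
    return "prefetchwt1";
  else if (prfchw && (write || !prefetch_sse))
    /* 3DNow!-style form: keeps the write hint, drops locality.  */
    return write ? "prefetchw" : "prefetch";
  else
    /* SSE form (assumed available here): keeps locality, drops the
       write hint, so a write request becomes a read prefetch.  */
    return sse[locality];
}

/* Both builds from the thread target a K8 (SSE prefetch available,
   no PREFETCHWT1).  */
static void
check_thread_observations (void)
{
  /* -march=native on the Opteron 248: TARGET_PRFCHW off.  */
  assert (strcmp (model_prefetch_mnemonic (false, false, true, true, 3),
                  "prefetcht0") == 0);
  /* -march=k8: TARGET_PRFCHW on.  */
  assert (strcmp (model_prefetch_mnemonic (false, true, true, true, 3),
                  "prefetchw") == 0);
  /* Read prefetches come out the same in both builds.  */
  assert (strcmp (model_prefetch_mnemonic (false, true, true, false, 3),
                  "prefetcht0") == 0);
}
```

This matches the observation that only the write-prefetch line differs between the two builds: with TARGET_PRFCHW off, the write hint is silently dropped in favour of prefetcht0.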

A write prefetch may be requested (either by the auto-prefetcher or by
builtins), but with -march=native the check below could have evaluated to false:
   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
TARGET_PRFCHW is off with -march=native.

So there are two issues here. 

(1) The ISA flags enabled with -march=k8 are different from those enabled by
-march=native on a k8 machine.
(2) We need to check why the GCC middle end requested a write prefetch for the
test case with -march=k8.

Regards,
Venkat.



GCC 6.1 Hard-coded C++ header paths and relocation problem on Windows

2016-05-02 Thread lh_mouse
This is a cross-post from gcc-help, as there have been no replies on
gcc-help in the two days since I posted. I hope someone can help.

I have built GCC from gcc-6-branch in MSYS2 with mingw-w64 CRT on Windows today.
Now I have a relocation problem:

Assume the mingw-w64 headers are located in the following directory, which is
the native_system_header_dir:
> C:/MinGW/MSYS2/mingw32/lib/gcc/i686-w64-mingw32/6.1.1/include
I have built GCC and it has that hard-coded path.
When I compile something using g++ -v, the headers are searched in the 
following paths:
```
ignoring nonexistent directory "/mingw32/include"
ignoring duplicate directory "C:/MinGW/MSYS2/mingw32/i686-w64-mingw32/include"
#include "..." search starts here:
#include <...> search starts here:
 C:/MinGW/MSYS2/mingw32/include/c++/6.1.1
 C:/MinGW/MSYS2/mingw32/include/c++/6.1.1/i686-w64-mingw32
 C:/MinGW/MSYS2/mingw32/include/c++/6.1.1/backward
 C:/MinGW/MSYS2/mingw32/lib/gcc/i686-w64-mingw32/6.1.1/include
 C:/MinGW/MSYS2/mingw32/lib/gcc/i686-w64-mingw32/6.1.1/../../../../include
 C:/MinGW/MSYS2/mingw32/lib/gcc/i686-w64-mingw32/6.1.1/include-fixed
 C:/MinGW/MSYS2/mingw32/lib/gcc/i686-w64-mingw32/6.1.1/../../../../i686-w64-mingw32/include
End of search list.
```
The C++ headers are searched before any mingw-w64 headers, which is just fine.

However, if I move gcc to another directory, let's say, 
C:/this_is_a_new_directory/mingw32/,
then re-compile the same program with g++ -v, the headers are searched in the 
following paths:
```
ignoring duplicate directory "C:/this_is_a_new_directory/mingw32/lib/gcc/../../lib/gcc/i686-w64-mingw32/6.1.1/include"
ignoring nonexistent directory "C:/MinGW/MSYS2/mingw32/include"
ignoring nonexistent directory "/mingw32/include"
ignoring duplicate directory "C:/this_is_a_new_directory/mingw32/lib/gcc/../../lib/gcc/i686-w64-mingw32/6.1.1/include-fixed"
ignoring duplicate directory "C:/this_is_a_new_directory/mingw32/lib/gcc/../../lib/gcc/i686-w64-mingw32/6.1.1/../../../../i686-w64-mingw32/include"
ignoring nonexistent directory "C:/MinGW/MSYS2/mingw32/i686-w64-mingw32/include"
#include "..." search starts here:
#include <...> search starts here:
 C:/this_is_a_new_directory/mingw32/bin/../lib/gcc/i686-w64-mingw32/6.1.1/include
 C:/this_is_a_new_directory/mingw32/bin/../lib/gcc/i686-w64-mingw32/6.1.1/../../../../include
 C:/this_is_a_new_directory/mingw32/bin/../lib/gcc/i686-w64-mingw32/6.1.1/include-fixed
 C:/this_is_a_new_directory/mingw32/bin/../lib/gcc/i686-w64-mingw32/6.1.1/../../../../i686-w64-mingw32/include
 C:/this_is_a_new_directory/mingw32/lib/gcc/../../include/c++/6.1.1
 C:/this_is_a_new_directory/mingw32/lib/gcc/../../include/c++/6.1.1/i686-w64-mingw32
 C:/this_is_a_new_directory/mingw32/lib/gcc/../../include/c++/6.1.1/backward
End of search list.
```
This time the C++ headers are searched after mingw-w64 headers, which causes 
the following error:
```
In file included from C:/MinGW/mingw32/include/c++/6.1.1/ext/string_conversions.h:41:0,
                 from C:/MinGW/mingw32/include/c++/6.1.1/bits/basic_string.h:5402,
                 from C:/MinGW/mingw32/include/c++/6.1.1/string:52,
                 from C:/MinGW/mingw32/include/c++/6.1.1/bits/locale_classes.h:40,
                 from C:/MinGW/mingw32/include/c++/6.1.1/bits/ios_base.h:41,
                 from C:/MinGW/mingw32/include/c++/6.1.1/ios:42,
                 from C:/MinGW/mingw32/include/c++/6.1.1/ostream:38,
                 from C:/MinGW/mingw32/include/c++/6.1.1/iostream:39,
                 from test.cpp:1:
C:/MinGW/mingw32/include/c++/6.1.1/cstdlib:75:25: fatal error: stdlib.h: No such file or directory
 #include_next <stdlib.h>
                         ^
compilation terminated.
```

Do you know how to solve this problem (modifications to gcc source code are 
expected)?
Thanks in advance.



--
Best regards,
lh_mouse
2016-05-02



Re: GCC 6.1 Hard-coded C++ header paths and relocation problem on Windows

2016-05-02 Thread lh_mouse
I did some investigating yesterday, and here is the result:

Diff'ing gcc/libstdc++-v3/include/c_global/cstdlib from gcc-5-branch and 
gcc-6-branch gives the following result:
(git diff gcc-5-branch gcc-6-branch -- libstdc++-v3/include/c_global/cstdlib)
```
@@ -69,7 +69,11 @@ namespace std
 
 #else
 
-#include <stdlib.h>
+// Need to ensure this finds the C library's <stdlib.h> not a libstdc++
+// wrapper that might already be installed later in the include search path.
+#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS
+#include_next <stdlib.h>
+#undef _GLIBCXX_INCLUDE_NEXT_C_HEADERS
 
 // Get rid of those macros defined in <stdlib.h> in lieu of real functions.
 #undef abort
```
Replacing #include_next with #include fixes the problem.

However, I am not entirely sure whether these headers (currently cstdlib and
cmath; there might be more) are the real problem.
In my view, the problem is the inversion of the C and C++ header search paths.
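The failure is consistent with how #include_next works: it resumes the search in the directory after the one where the current header was found, whereas a plain #include <...> starts from the top of the list. A minimal C model of that rule, using shortened, hypothetical labels for the directories in the two search lists above, shows why cstdlib finds the C library's stdlib.h in the installed layout but nothing in the relocated one.

```c
#include <assert.h>

/* Hypothetical model of one #include <...> search list: each entry is a
   shortened directory label and whether a stdlib.h exists there.  */
struct search_dir
{
  const char *label;
  int has_stdlib_h;
};

/* Plain #include <stdlib.h>: the first directory in the list that has it.  */
static int
find_include (const struct search_dir *dirs, int n)
{
  for (int i = 0; i < n; ++i)
    if (dirs[i].has_stdlib_h)
      return i;
  return -1;
}

/* #include_next <stdlib.h> from a header found in dirs[current]: the
   search resumes with the directory after that one.  */
static int
find_include_next (const struct search_dir *dirs, int n, int current)
{
  for (int i = current + 1; i < n; ++i)
    if (dirs[i].has_stdlib_h)
      return i;
  return -1;
}

static void
check_both_orders (void)
{
  /* Installed order: C++ directories first.  c++/6.1.1 also holds the
     libstdc++ stdlib.h wrapper, which #include_next skips.  */
  const struct search_dir installed[] = {
    { "c++/6.1.1", 1 },
    { "c++/6.1.1/i686-w64-mingw32", 0 },
    { "c++/6.1.1/backward", 0 },
    { "gcc/include", 0 },
    { "mingw-w64 C headers", 1 },
  };
  /* Relocated order: C directories first, C++ directories last.  */
  const struct search_dir relocated[] = {
    { "gcc/include", 0 },
    { "mingw-w64 C headers", 1 },
    { "c++/6.1.1", 1 },
    { "c++/6.1.1/i686-w64-mingw32", 0 },
    { "c++/6.1.1/backward", 0 },
  };

  /* cstdlib lives in c++/6.1.1: index 0 when installed, 2 when relocated.  */
  assert (find_include_next (installed, 5, 0) == 4);   /* C library found */
  assert (find_include_next (relocated, 5, 2) == -1);  /* "No such file" */
  assert (find_include (relocated, 5) == 1);           /* plain #include works */
}
```

The last check mirrors the observation that replacing #include_next with #include hides the problem once the C directories precede the C++ ones.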



--   
Best regards,
lh_mouse
2016-05-02



Re: option -mprfchw on 2 different Opteron cpus

2016-05-02 Thread NightStrike
On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
 wrote:
>> If I compile on a k8 Opteron 248 with -march=native, I do not see -mprfchw
>> listed in the options in -fverbose-asm.  In the assembly, I see this:
>>
>> prefetcht0  (%rax)  # ivtmp.1160
>> prefetcht0  304(%rcx)   #
>> prefetcht0  (%rax)  # ivtmp.1160
>
> In AMD processors -mprfchw flag  is used to enable "3dnowprefetch" ISA 
> support.
>
> (Snip)
> CPUID Fn8000_0001_ECX Feature Identifiers
> Bit 8
> 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See “PREFETCH” and
> “PREFETCHW” in APM3
> Ref: http://support.amd.com/TechDocs/25481.pdf
> (Snip)
>
> Can you please confirm what this CPUID flag returns on your k8 machine ?.
> I believe this ISA is not available on k8 machine so when -march=native is 
> added you don’t see  -mprfchw in verbose.

Looks like zero?  This was generated with the cpuid program from
http://www.etallen.com/cpuid.html

CPU 0:
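   0x00000000 0x00: eax=0x00000001 ebx=0x68747541 ecx=0x444d4163 edx=0x69746e65
   0x00000001 0x00: eax=0x00000f58 ebx=0x00000800 ecx=0x00000000 edx=0x078bfbff
   0x80000000 0x00: eax=0x80000018 ebx=0x68747541 ecx=0x444d4163 edx=0x69746e65
   0x80000001 0x00: eax=0x00000f58 ebx=0x00000405 ecx=0x00000000 edx=0xe1d3fbff
   0x80000002 0x00: eax=0x20444d41 ebx=0x6574704f ecx=0x286e6f72 edx=0x20296d74
   0x80000003 0x00: eax=0x636f7250 ebx=0x6f737365 ecx=0x34322072 edx=0x00000038
   0x80000004 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000005 0x00: eax=0xff08ff08 ebx=0xff20ff20 ecx=0x40020140 edx=0x40020140
   0x80000006 0x00: eax=0x00000000 ebx=0x42004200 ecx=0x04008140 edx=0x00000000
   0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000009
   0x80000008 0x00: eax=0x00003028 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000a 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000b 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000d 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x8000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000012 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000014 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000015 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000016 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000017 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80000018 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0x80860000 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
   0xc0000000 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000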

CPU:
   vendor_id = "AuthenticAMD"
   version information (1/eax):
  processor type  = primary processor (0)
  family  = Intel Pentium 4/Pentium D/Pentium Extreme Edition/Celeron/Xeon/Xeon MP/Itanium2, AMD Athlon 64/Athlon XP-M/Opteron/Sempron/Turion (15)
  model   = 0x5 (5)
  stepping id = 0x8 (8)
  extended family = 0x0 (0)
  extended model  = 0x0 (0)
  (simple synth)  = AMD Opteron (DP SledgeHammer SH7-C0) / Athlon 64 FX (DP SledgeHammer SH7-C0), 940-pin, .13um
   miscellaneous (1/ebx):
  process local APIC physical ID = 0x0 (0)
  cpu count  = 0x0 (0)
  CLFLUSH line size  = 0x8 (8)
  brand index= 0x0 (0)
   brand id = 0x00 (0): unknown
   feature information (1/edx):
  x87 FPU on chip= true
  virtual-8086 mode enhancement  = true
  debugging extensions   = true
  page size extensions   = true
  time stamp counter = true
  RDMSR and WRMSR support= true
  physical address extensions= true
  machine check exception= true
  CMPXCHG8B inst.= true
  APIC on chip   = true
  SYSENTER and SYSEXIT   = true
  memory type range registers= true
  PTE global bit = true
  machine check architecture = true
  conditional move/compare instruction   = true
  page attribute table   = true
  page size extension= true
  processor serial number= f

Re: (R5900) Implementing Vector Support

2016-05-02 Thread Richard Henderson

On 04/29/2016 07:54 AM, Liu Woon Yung wrote:

> I've done something like that, but GCC still doesn't select the pattern to use:
> (define_insn "vec_cmp"


Because you've used the wrong name.  The patterns are:

OPTAB_CD(vec_cmp_optab, "vec_cmp$a$b")
OPTAB_CD(vec_cmpu_optab, "vec_cmpu$a$b")
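A minimal sketch of how such a template turns into a pattern name, assuming $a and $b are each replaced by the lower-case name of a machine mode (consistent with the vec_cmpv2div2di and vec_cmpuv2div2di expanders listed below). expand_optab_name is a hypothetical helper for illustration, not genopinit's code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand an OPTAB_CD-style template such as
   "vec_cmp$a$b", replacing $a and $b with lower-cased mode names.  */
static void
expand_optab_name (const char *tmpl, const char *mode_a, const char *mode_b,
                   char *out, size_t outlen)
{
  size_t n = 0;

  if (outlen == 0)
    return;
  for (const char *p = tmpl; *p != '\0' && n + 1 < outlen; ++p)
    {
      const char *mode = NULL;

      if (p[0] == '$' && p[1] == 'a')
        mode = mode_a;
      else if (p[0] == '$' && p[1] == 'b')
        mode = mode_b;

      if (mode != NULL)
        {
          for (; *mode != '\0' && n + 1 < outlen; ++mode)
            out[n++] = (char) tolower ((unsigned char) *mode);
          ++p;   /* also skip the 'a' or 'b' after '$' */
        }
      else
        out[n++] = *p;
    }
  out[n] = '\0';
}

static void
check_sse_md_names (void)
{
  char buf[64];

  expand_optab_name ("vec_cmp$a$b", "V2DI", "V2DI", buf, sizeof buf);
  assert (strcmp (buf, "vec_cmpv2div2di") == 0);

  expand_optab_name ("vec_cmpu$a$b", "V2DI", "V2DI", buf, sizeof buf);
  assert (strcmp (buf, "vec_cmpuv2div2di") == 0);
}
```

In other words, a bare define_insn named "vec_cmp" carries no mode suffix, so it cannot match either optab.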

I see where the confusion is though.  These:

i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmpv2div2di"
i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmp"
i386/sse.md:(define_expand "vec_cmpu"
i386/sse.md:(define_expand "vec_cmpu"
i386/sse.md:(define_expand "vec_cmpu"
i386/sse.md:(define_expand "vec_cmpu"
i386/sse.md:(define_expand "vec_cmpuv2div2di"

are the only usage examples within the gcc tree.

All of the other "vec_cmp" stuff that you're seeing are internal to the 
rs6000 and s390 ports, for implementing builtins and/or vcond.



>> rs6000 doesn't implement bare comparisons, but only implements the "vcond"
>> conditional move, which uses the comparison.  Many of the other targets
>> do the same thing.


> Is there a reason why implementing only vcond is preferred?


I believe that's just history.  IIRC, only vcond was present originally.

Amusingly, I believe that was because vcond was designed to handle one of the 
other MIPS vector extensions (MDMX?) wherein the comparison results are placed 
in (a set of) condition code registers, and thus producing a per-element {0,-1} 
vector result requires extra instructions.



r~


r235766 incomplete?

2016-05-02 Thread Martin Sebor

Hi Jan,

I just noticed the compilation errors in the attached file with
the latest trunk.  It seems as though your recent patch below may
be incomplete:

  commit 46e5dccc6f188bd0fd5af4e9778f547ab63c9cae
  Author: hubicka 
  Date: Mon May 2 16:55:56 2016 +0000

The following change causes compilation errors due to
ipa_find_agg_cst_for_param taking just three arguments, while it
is being called with four.  (I haven't looked into the other error.)

Regards
Martin

--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -850,7 +850,8 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
           if (known_aggs.exists ())
             {
               agg = known_aggs[c->operand_num];
-              val = ipa_find_agg_cst_for_param (agg, c->offset, c->by_ref);
+              val = ipa_find_agg_cst_for_param (agg, known_vals[c->operand_num],
+                                                c->offset, c->by_ref);
/src/gcc/66561/gcc/ipa-inline-analysis.c: In function ‘clause_t evaluate_conditions_for_known_args(cgraph_node*, bool, vec, vec)’:
/src/gcc/66561/gcc/ipa-inline-analysis.c:854:27: error: invalid conversion from ‘tree_node*’ to ‘long int’ [-fpermissive]
   c->offset, c->by_ref);
   ^
/src/gcc/66561/gcc/ipa-inline-analysis.c:854:27: error: too many arguments to function ‘tree_node* ipa_find_agg_cst_for_param(ipa_agg_jump_function*, long int, bool)’
In file included from /src/gcc/66561/gcc/ipa-inline-analysis.c:90:0:
/src/gcc/66561/gcc/ipa-prop.h:639:6: note: declared here
 tree ipa_find_agg_cst_for_param (struct ipa_agg_jump_function *, HOST_WIDE_INT,
  ^
/src/gcc/66561/gcc/ipa-inline.c: In function ‘bool can_inline_edge_p(cgraph_edge*, bool, bool, bool)’:
/src/gcc/66561/gcc/ipa-inline.c:341:55: error: ‘CIF_THUNK’ was not declared in this scope
 e->inline_failed = e->caller->thunk.thunk_p ? CIF_THUNK : CIF_MISMATCHED_ARGUMENTS;
   ^


Re: r235766 incomplete?

2016-05-02 Thread David Malcolm
On Mon, 2016-05-02 at 11:50 -0600, Martin Sebor wrote:
> Hi Jan,
> 
> I just noticed the compilation errors in the attached file with
> the latest trunk.  It seems as though your recent patch below may
> be incomplete:
> 
>commit 46e5dccc6f188bd0fd5af4e9778f547ab63c9cae
>Author: hubicka 
>Date: Mon May 2 16:55:56 2016 +0000
> 
> The following change causes compilation errors due to
> ipa_find_agg_cst_for_param taking just three arguments, while it
> is being called with four.  (I haven't looked into the other error.)
> 
> Regards
> Martin
> 
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -850,7 +850,8 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
>            if (known_aggs.exists ())
>              {
>                agg = known_aggs[c->operand_num];
> -              val = ipa_find_agg_cst_for_param (agg, c->offset, c->by_ref);
> +              val = ipa_find_agg_cst_for_param (agg, known_vals[c->operand_num],
> +                                                c->offset, c->by_ref);

I saw this too (with r235766).  I believe it's fixed by r235770 and
r235771:

 2016-05-02  Jan Hubicka  
 
* cif-code.def (CIF_THUNK): Add.
* ipa-inline-analsysis.c (evaluate_conditions_for_known_args): Revert
accidental change.

(albeit with a typo in that second filename)


r235771 works for me, FWIW.


Dave


determining reassociation width

2016-05-02 Thread Aaron Sawdey
So, my first cut at the function to select reassociation width for
power was modeled after what I saw i386 and aarch64 doing, which is to
return something based on the number of that kind of op we can do at
the same time:

static int
rs6000_reassociation_width (unsigned int opc, enum machine_mode mode)
{
  switch (rs6000_cpu)
    {
    case PROCESSOR_POWER8:
    case PROCESSOR_POWER9:
      if (VECTOR_MODE_P (mode))
        return 2;
      if (INTEGRAL_MODE_P (mode))
        {
          if (opc == MULT_EXPR)
            return 2;
          return 6; /* correct for all integral modes? */
        }
      if (FLOAT_MODE_P (mode))
        return 2;
      /* decimal float gets default 1 */
      break;
    default:
      break;
    }

  return 1;
}

However, the reality of the situation is a bit more complicated I
think.

* If we want maximum parallelism, we should really base this on the
number of units times the latency. I.e. for float on p8 we have 2 units
and 6 cycles latency so we would want to issue up to 12 fadd or fmul in
parallel, then the result from the first one would be ready for the
next series of dependent ops.
* Of course this may cause massive register spills and so we can't
really make things that wide. So, reassociation ought to be aware of
how much register pressure it is creating and how much has been created
by things that want to be live across this bb. 
* Ideally we would also be aware of whether we are reassociating a tree
of fp additions whose terms are fp multiplies, because then we have
fused multiply-adds to consider. See PR 70912 for more on this.
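The arithmetic in the first two points can be sketched as follows. The free_regs input is hypothetical: the target hook only receives the opcode and mode, as in rs6000_reassociation_width above, so this illustrates the trade-off rather than offering a drop-in hook.

```c
#include <assert.h>

/* Hypothetical estimate: keep every unit busy for a full latency
   (units * latency independent chains), but never create more parallel
   partial results than there are free registers to hold them.  */
static int
estimate_reassoc_width (int units, int latency, int free_regs)
{
  int width = units * latency;   /* POWER8 float: 2 units * 6 cycles = 12 */

  if (width > free_regs)
    width = free_regs;           /* crude stand-in for register pressure */
  if (width < 1)
    width = 1;                   /* 1 keeps the original serial chain */
  return width;
}

static void
check_reassoc_examples (void)
{
  assert (estimate_reassoc_width (2, 6, 32) == 12);  /* latency-bound */
  assert (estimate_reassoc_width (2, 6, 8) == 8);    /* register-bound */
  assert (estimate_reassoc_width (2, 6, 0) == 1);
}
```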

Suggestions?

Thanks, 
   Aaron

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain




Re: r235766 incomplete?

2016-05-02 Thread Jan Hubicka
> On Mon, 2016-05-02 at 11:50 -0600, Martin Sebor wrote:
> > Hi Jan,
> > 
> > I just noticed the compilation errors in the attached file with
> > the latest trunk.  It seems as though your recent patch below may
> > be incomplete:
> > 
> >commit 46e5dccc6f188bd0fd5af4e9778f547ab63c9cae
> >Author: hubicka 
> >Date: Mon May 2 16:55:56 2016 +0000
> > 
> > The following change causes compilation errors due to
> > ipa_find_agg_cst_for_param taking just three arguments, while it
> > is being called with four.  (I haven't looked into the other error.)
> > 
> > Regards
> > Martin
> > 
> > --- a/gcc/ipa-inline-analysis.c
> > +++ b/gcc/ipa-inline-analysis.c
> > @@ -850,7 +850,8 @@ evaluate_conditions_for_known_args (struct cgraph_node *node,
> >            if (known_aggs.exists ())
> >              {
> >                agg = known_aggs[c->operand_num];
> > -              val = ipa_find_agg_cst_for_param (agg, c->offset, c->by_ref);
> > +              val = ipa_find_agg_cst_for_param (agg, known_vals[c->operand_num],
> > +                                                c->offset, c->by_ref);
> 
> I saw this too (with r235766).  I believe it's fixed by r235770 and
> r235771:
> 
>  2016-05-02  Jan Hubicka  
>  
>   * cif-code.def (CIF_THUNK): Add.
>   * ipa-inline-analsysis.c (evaluate_conditions_for_known_args): Revert
>   accidental change.
> 
> (albeit with a typo in that second filename)
Uh, thanks. Will fix that with next commit.

I managed to accidentally bundle unrelated changes into the patch. My apologies
for that. I will try to keep my commit tree clean.
Honza


Re: Is MODES_TIEABLE_P transitive?

2016-05-02 Thread Michael Meissner
On Mon, Apr 25, 2016 at 11:04:01AM -0600, Jeff Law wrote:
> On 04/21/2016 01:53 PM, Michael Meissner wrote:
> >As I start to allow integer modes into vector registers, I need to revisit
> >MODES_TIEABLE_P. I'm wondering if MODES_TIEABLE_P is transitive?
> I don't recall a need for it to be transitive.  The only really
> special thing I remember about MODES_TIEABLE_P was its relation to
> HARD_REGNO_MODE_OK and the need for them to be consistent.
> 
> >
> >What I'd like to do, in a Power8 context, is to allow these to return true
> >(after allowing SImode to go in VSX registers):
> >
> > MODES_TIEABLE_P (SImode, DFmode)
> > MODES_TIEABLE_P (SImode, QImode)
> >
> >but, the following would return false for power8:
> >
> > MODES_TIEABLE_P (QImode, DFmode)
> >
> >In a power9 context, since there are loads/stores of 8/16-bit items it would
> >return true
> So what this kind of setup would allow would be subregs more
> aggressively between SImode/DFmode and SImode/QImode, but would
> restrict QImode/DFmode.
> 
> You may need to twiddle CANNOT_CHANGE_MODE_CLASS along the way.
> 
> 
> >So the question is whether we might need to support MODES_TIEABLE_P being
> >transitive (i.e. it would return true a lot less of the time). I would prefer
> >to not have to worry about odd corner cases where another type is tieable with
> >one of the arguments, but not tieable with the other. And does it matter
> >whether we are using RELOAD or IRA?
> IIRC MODES_TIEABLE_P was largely related to reload's [in]ability to
> handle certain subreg extractions -- like trying to extract a QImode
> subreg out of FP hard register and the like.

Yeah, this is getting rather complex. I recall trying to change MODES_TIEABLE_P
in the past, and back then having all sorts of reload issues.

I kind of want to have my cake and eat it too. On one hand, I want the primary
integer types to be tieable with each other; on the other hand, I want 32/64-bit
ints to be tieable with floating point (things like IBM extended double and
vectors can never be tieable, due to the extended double using 2 64-bit parts
in 2 separate registers, and vectors using a single 128-bit part in 1 register).

In particular, in power9 there are various instructions for packing and
unpacking floating point types, and it would be natural to want to use them for
unions to help speed up the math library (as well as the ability to do byte and
half-word memory operations). I will put it on the back burner for now.
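The power8 relation proposed above (SImode tieable with DFmode and with QImode, but QImode not tieable with DFmode) is not transitive. A toy C sketch with hypothetical mode names, not the rs6000 implementation, checks that by brute force.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for QImode, SImode and DFmode.  */
enum toy_mode { TOY_QI, TOY_SI, TOY_DF, TOY_NUM_MODES };

/* Power8-style relation as described in this thread (illustrative only). */
static bool
toy_power8_tieable (enum toy_mode a, enum toy_mode b)
{
  if (a == b)
    return true;
  /* SImode ties with both QImode and DFmode.  */
  if (a == TOY_SI || b == TOY_SI)
    return true;
  /* QImode and DFmode do not tie on power8.  */
  return false;
}

/* True if tieable (a, b) && tieable (b, c) always implies tieable (a, c).  */
static bool
is_transitive (bool (*tieable) (enum toy_mode, enum toy_mode))
{
  for (int a = 0; a < TOY_NUM_MODES; ++a)
    for (int b = 0; b < TOY_NUM_MODES; ++b)
      for (int c = 0; c < TOY_NUM_MODES; ++c)
        if (tieable ((enum toy_mode) a, (enum toy_mode) b)
            && tieable ((enum toy_mode) b, (enum toy_mode) c)
            && !tieable ((enum toy_mode) a, (enum toy_mode) c))
          return false;
  return true;
}

static void
check_power8_relation (void)
{
  assert (toy_power8_tieable (TOY_SI, TOY_DF));
  assert (toy_power8_tieable (TOY_SI, TOY_QI));
  assert (!toy_power8_tieable (TOY_QI, TOY_DF));
  /* QI~SI and SI~DF, but not QI~DF: the relation is not transitive.  */
  assert (!is_transitive (toy_power8_tieable));
}
```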

Thanks.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



RE: option -mprfchw on 2 different Opteron cpus

2016-05-02 Thread Kumar, Venkataramanan
Hi 

> -----Original Message-----
> From: NightStrike [mailto:nightstr...@gmail.com]
> Sent: Monday, May 2, 2016 10:31 PM
> To: Kumar, Venkataramanan 
> Cc: Uros Bizjak (ubiz...@gmail.com) ;
> lopeziba...@gmail.com; Jan Hubicka ; Jakub Jelinek
> ; gcc@gcc.gnu.org
> Subject: Re: option -mprfchw on 2 different Opteron cpus
> 
> On Mon, May 2, 2016 at 5:55 AM, Kumar, Venkataramanan
>  wrote:
> >> If I compile on a k8 Opteron 248 with -march=native, I do not see
> >> -mprfchw listed in the options in -fverbose-asm.  In the assembly, I see
> this:
> >>
> >> prefetcht0  (%rax)  # ivtmp.1160
> >> prefetcht0  304(%rcx)   #
> >> prefetcht0  (%rax)  # ivtmp.1160
> >
> > In AMD processors -mprfchw flag  is used to enable "3dnowprefetch" ISA
> support.
> >
> > (Snip)
> > CPUID Fn8000_0001_ECX Feature Identifiers Bit 8
> > 3DNowPrefetch: PREFETCH and PREFETCHW instruction support. See
> > “PREFETCH” and “PREFETCHW” in APM3
> > Ref: http://support.amd.com/TechDocs/25481.pdf
> > (Snip)
> >
> > Can you please confirm what this CPUID flag returns on your k8 machine ?.
> > I believe this ISA is not available on k8 machine so when -march=native is
> added you don’t see  -mprfchw in verbose.
> 
> Looks like zero?  This was generated with the cpuid program from
> http://www.etallen.com/cpuid.html
> 
>   3DNow! instruction extensions = true
>   3DNow! instructions   = true

It has 3DNow! support, and 3DNow! provides "prefetchw".
 
>   misaligned SSE mode= false
>   3DNow! PREFETCH/PREFETCHW instructions = false

It does not have 3DNowPrefetch, so enabling the ISA flag -mprfchw is not
correct for -march=k8.

>   OS visible workaround  = false
>   instruction based sampling = false
> >> If I compile on a bdver2 Opteron 6386 SE with -march=k8 (thus trying
> >> to target the older system), I do see it listed in the options in
> >> -fverbose-asm.  In the assembly, I see this:
> >
> > K8 has 3dnow support and there is a patch that replaced 3dnow with
> prefetchw (3DNowPrefetch).
> > https://gcc.gnu.org/ml/gcc-patches/2013-05/msg00866.html
> > So when you add -march=k8 you see -mprfchw  getting listed in verbose.
> >
> >>
> >> prefetcht0  (%rax)  # ivtmp.1160
> >> prefetcht0  304(%rcx)   #
> >> prefetchw   (%rax)  # ivtmp.1160
> >>
> >> (The third line is the only difference)
> >>
> >
> > This is my guess without seeing the test case, when write  prefetching is
> requested "prefetchw" is generated.
> > 3dnow (TARGET_3DNOW) ISA has support for it.
> >
> > (Snip)
> > Support for the PREFETCH and PREFETCHW instructions is indicated by
> > CPUID Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR
> > Fn8000_0001_EDX[3DNow] = 1.
> > (Snip)
> > Ref:
> http://developer.amd.com/wordpress/media/2008/10/24594_APM_v3.pdf
> >
> >> In both cases, I'm using gcc 4.9.3.  Which is correct for a k8 Opteron 248?
> >>
> >> Also, FWIW:
> >>
> >> 1) The march=native version that uses prefetcht0 is very repeatably
> >> faster by about 15% in the particular test case I'm looking at.
> >>
> >> 2) The compilers in both instances are not just the same version,
> >> they are the same compiler binary installed on an NFS mount and
> >> shared to both computers.
> >
> > As per GCC4.9.3 source.
> >
> > (Snip)
> > (define_expand "prefetch"
> >   [(prefetch (match_operand 0 "address_operand")
> >  (match_operand:SI 1 "const_int_operand")
> >  (match_operand:SI 2 "const_int_operand"))]
> >   "TARGET_PREFETCH_SSE || TARGET_PRFCHW || TARGET_PREFETCHWT1"
> > {
> >   bool write = INTVAL (operands[1]) != 0;
> >   int locality = INTVAL (operands[2]);
> >
> >   gcc_assert (IN_RANGE (locality, 0, 3));
> >
> >   /* Use 3dNOW prefetch in case we are asking for write prefetch not
> >  supported by SSE counterpart or the SSE prefetch is not available
> >  (K6 machines).  Otherwise use SSE prefetch as it allows specifying
> >  of locality.  */
> >   if (TARGET_PREFETCHWT1 && write && locality <= 2)
> > operands[2] = const2_rtx;
> >   else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
> > operands[2] = GEN_INT (3);
> >   else
> > operands[1] = const0_rtx;
> > })
> > (Snip)
> >
> > Write prefetch may be requested (either by auto prefetcher or builtins) but
> on -march=native, the below check could have become false.
> >else if (TARGET_PRFCHW && (write || !TARGET_PREFETCH_SSE))
> > TARGET_PRFCHW is off on native.
> >
> > So there are two issues here.
> >
> > (1) ISA flags enabled with -march=k8 is different from -march=native on k8
> machine.

I think we need to file a bug for this. We need to check with Uros why the
-mprfchw flag is shared with 3DNow!.
To work around this issue, you can use -mno-prfchw when building with -march=k8.
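A minimal reproducer along these lines may help when filing the bug. It is a sketch, not NightStrike's test case: it requests write prefetches explicitly with __builtin_prefetch (addr, 1, 3), and PREFETCH_DISTANCE is an arbitrary, hypothetical value. Based on the expander quoted above, compiling it with gcc -O2 -march=k8 -S should produce prefetchw, while adding -mno-prfchw should fall back to prefetcht0; the generated assembly is what to check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements.  */
#define PREFETCH_DISTANCE 16

/* Scale an array in place and return the sum, requesting a write
   prefetch (rw = 1) with high locality (3) a few elements ahead.  */
static long
scale_and_sum (long *a, size_t n, long factor)
{
  long sum = 0;

  for (size_t i = 0; i < n; ++i)
    {
      if (i + PREFETCH_DISTANCE < n)   /* stay inside the array */
        __builtin_prefetch (&a[i + PREFETCH_DISTANCE], 1, 3);
      a[i] *= factor;
      sum += a[i];
    }
  return sum;
}

static void
check_scale_and_sum (void)
{
  long a[100];

  for (size_t i = 0; i < 100; ++i)
    a[i] = (long) i + 1;
  assert (scale_and_sum (a, 100, 2) == 10100);  /* 2 * (1 + ... + 100) */
  assert (a[99] == 200);
}
```

The prefetch is only a hint, so the computed result is the same whichever instruction (if any) the compiler emits; only the assembly differs between the two option sets.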

> > (2) Need to check why GCC middle end requested write prefetch for the
> test case with -march=k8 .
On "prefetchw" generation it may be the case that GCC auto prefet