Re: Using known data structure hierarchy in GC and PCH?

2012-12-12 Thread Richard Biener
On Tue, Dec 11, 2012 at 8:49 PM, Steven Bosscher  wrote:
> On Tue, Dec 11, 2012 at 6:55 PM, Martin Jambor wrote:
>> some IPA passes do have on-the side vectors with their information
>> about each cgraph node or edge and those are independent GC roots.
>> Not all, but many (e.g. inline_summary_vec or ipa_edge_args_vector) do
>> have pointers to other GC data, usually trees, and thus are managed
>> by GC too.  Many of those trees (e.g. constants) might not be
>> reachable at least in LTO WPA phase.  Sure enough, inventing something
>> more clever for them might be a good idea.
>
> I wasn't really thinking of 'tree' anyway; 'tree' is way too complex for this.
>
> I'm more thinking of pointers from on-the-side data to cgraph
> nodes/edges, and of pointers within the cgraph objects that are
> "redundant" from a GC marking point of view. Those pointers should
> only point to reachable cgraph nodes/edges, so it shouldn't be
> necessary to mark them separately when walking the on-the-side data.
>
> Take, for instance, "struct cgraph_edge" where all pointer fields are
> reachable via other pointers already:
>
> struct cgraph_edge {
> ...
>   struct cgraph_node *caller; // reachable via symtab_nodes
>   struct cgraph_node *callee; // reachable via symtab_nodes
>   struct cgraph_edge *prev_caller; // reachable via symtab_nodes
>   struct cgraph_edge *next_caller; // reachable via symtab_nodes
>   struct cgraph_edge *prev_callee; // reachable via symtab_nodes
>   struct cgraph_edge *next_callee; // reachable via symtab_nodes
>   gimple call_stmt; // reachable via the CFG that contains the call
> ...
> }
>
> so, at least in theory, it shouldn't be necessary to do anything for a
> cgraph_edge.
> Unless I'm missing something...

No, you are correct.  All cgraph nodes are reachable from the global
symtab_nodes list head.  Thus _no_ pointer to a symtab_node (or its
derived kinds) requires GTY tracking.  If that breaks, it's a bug ;)
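For illustration, a minimal sketch of what that could look like, assuming
the usual gengtype "skip" field option (this is not the actual cgraph.h
definition, just an illustration):

/* Illustration only: with the gengtype "skip" option, fields known to be
   kept live elsewhere (via the global symtab list and the function's CFG)
   would not be walked by the GC marker.  */
struct GTY(()) cgraph_edge
{
  struct cgraph_node * GTY((skip)) caller;      /* live via symtab_nodes */
  struct cgraph_node * GTY((skip)) callee;      /* live via symtab_nodes */
  struct cgraph_edge * GTY((skip)) next_callee; /* live via symtab_nodes */
  gimple GTY((skip)) call_stmt;                 /* live via the containing CFG */
};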

> PS: shouldn't "struct symtab_node" have GTY next/prev markers?

Yes.

Richard.


Broken link in gcc-4.8/changes.html

2012-12-12 Thread Paolo Carlini

Hi,

I just noticed a broken link (in case the issue is trivial I may get 
around to fixing it myself, but at the moment I don't know):


http://gcc.gnu.org/onlinedocs/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions

(it appears twice)

Cheers,
Paolo.


Re: RFC: [ARM] Disable peeling

2012-12-12 Thread Christophe Lyon
On 11 December 2012 13:26, Tim Prince  wrote:
> On 12/11/2012 5:14 AM, Richard Earnshaw wrote:
>>
>> On 11/12/12 09:56, Richard Biener wrote:
>>>
>>> On Tue, Dec 11, 2012 at 10:48 AM, Richard Earnshaw 
>>> wrote:

 On 11/12/12 09:45, Richard Biener wrote:
>
>
> On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen 
> wrote:
>>
>>
>> Jan Hubicka  writes:
>>
>>> Note that I think Core has similar characteristics - at least for
>>> string operations it fares well with unaligned accesses.
>>
>>
>>
>> Nehalem and later have very fast unaligned vector loads. There's still
>> some penalty when they cross cache lines, however.
>>
>> iirc the rule of thumb is to do unaligned for 128-bit vectors,
>> but avoid it for 256-bit vectors because the cache-line-crossing
>> penalty is larger on Sandy Bridge and more likely with the larger
>> vectors.
>
>
>
> Yes, I think the rule was that using the unaligned instruction variants
> carries no penalty when the actual access is aligned, but that aligned
> accesses are still faster than unaligned accesses.  Thus peeling for
> alignment _is_ a win.
> I also seem to remember that the story for unaligned stores vs.
> unaligned loads is usually different.



 Yes, it's generally the case that unaligned loads are slightly more
 expensive than unaligned stores, since the stores can often merge in a
 store buffer with little or no penalty.
>>>
>>>
>>> It was the other way around on AMD CPUs AFAIK - unaligned stores forced
>>> flushes of the store buffers.  Which is why the vectorizer first and
>>> foremost tries to align stores.
>>>
>>
>> In which case, which to align should be a question that the ME asks the
>> BE.
>>
>> R.
>>
>>
> I see that this thread is no longer about ARM.
> Yes, when peeling for alignment, aligned stores should take precedence over
> aligned loads.
> "ivy bridge" corei7-3 is supposed to have corrected the situation on "sandy
> bridge" corei7-2 where unaligned 256-bit load is more expensive than
> explicitly split (128-bit) loads.  There aren't yet any production
> multi-socket corei7-3 platforms.
> It seems difficult to make the best decision between 128-bit unaligned
> access without peeling and 256-bit access with peeling for alignment (unless
> the loop count is known to be too small for the latter to come up to speed).
> Facilities afforded by various compilers to allow the programmer to guide
> this choice are rather strange and probably not to be counted on.
> In my experience, "westmere" unaligned 128-bit loads are more expensive than
> explicitly split (64-bit) loads, but the architecture manuals disagree with
> this finding.  gcc already does a good job for corei7[-1] in such
> situations.
>
> --
> Tim Prince
>

Since this thread is also about x86 now, I have tried to look at how
things are implemented on this target.
People have mentioned nehalem, sandy bridge, ivy bridge and westmere;
I have searched for occurrences of these strings in GCC, and I
couldn't find anything that would imply a different behavior wrt
unaligned loads on 128/256-bit vectors. Is it still unimplemented?

Thanks,

Christophe.


Re: RFC: [ARM] Disable peeling

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 9:06 AM, Christophe Lyon
 wrote:
> On 11 December 2012 13:26, Tim Prince  wrote:
>> On 12/11/2012 5:14 AM, Richard Earnshaw wrote:
>>>
>>> On 11/12/12 09:56, Richard Biener wrote:

 On Tue, Dec 11, 2012 at 10:48 AM, Richard Earnshaw 
 wrote:
>
> On 11/12/12 09:45, Richard Biener wrote:
>>
>>
>> On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen 
>> wrote:
>>>
>>>
>>> Jan Hubicka  writes:
>>>
 Note that I think Core has similar characteristics - at least for
 string operations it fares well with unaligned accesses.
>>>
>>>
>>>
>>> Nehalem and later have very fast unaligned vector loads. There's still
>>> some penalty when they cross cache lines, however.
>>>
>>> iirc the rule of thumb is to do unaligned for 128-bit vectors,
>>> but avoid it for 256-bit vectors because the cache-line-crossing
>>> penalty is larger on Sandy Bridge and more likely with the larger
>>> vectors.
>>
>>
>>
>> Yes, I think the rule was that using the unaligned instruction variants
>> carries no penalty when the actual access is aligned, but that aligned
>> accesses are still faster than unaligned accesses.  Thus peeling for
>> alignment _is_ a win.
>> I also seem to remember that the story for unaligned stores vs.
>> unaligned loads is usually different.
>
>
>
> Yes, it's generally the case that unaligned loads are slightly more
> expensive than unaligned stores, since the stores can often merge in a
> store buffer with little or no penalty.


 It was the other way around on AMD CPUs AFAIK - unaligned stores forced
 flushes of the store buffers.  Which is why the vectorizer first and
 foremost tries to align stores.

>>>
>>> In which case, which to align should be a question that the ME asks the
>>> BE.
>>>
>>> R.
>>>
>>>
>> I see that this thread is no longer about ARM.
>> Yes, when peeling for alignment, aligned stores should take precedence over
>> aligned loads.
>> "ivy bridge" corei7-3 is supposed to have corrected the situation on "sandy
>> bridge" corei7-2 where unaligned 256-bit load is more expensive than
>> explicitly split (128-bit) loads.  There aren't yet any production
>> multi-socket corei7-3 platforms.
>> It seems difficult to make the best decision between 128-bit unaligned
>> access without peeling and 256-bit access with peeling for alignment (unless
>> the loop count is known to be too small for the latter to come up to speed).
>> Facilities afforded by various compilers to allow the programmer to guide
>> this choice are rather strange and probably not to be counted on.
>> In my experience, "westmere" unaligned 128-bit loads are more expensive than
>> explicitly split (64-bit) loads, but the architecture manuals disagree with
>> this finding.  gcc already does a good job for corei7[-1] in such
>> situations.
>>
>> --
>> Tim Prince
>>
>
> Since this thread is also about x86 now, I have tried to look at how
> things are implemented on this target.
> People have mentioned nehalem, sandy bridge, ivy bridge and westmere;
> I have searched for occurrences of these strings in GCC, and I
> couldn't find anything that would imply a different behavior wrt
> unaligned loads on 128/256-bit vectors. Is it still unimplemented?
>

i386.c has

    {
      /* When not optimize for size, enable vzeroupper optimization for
         TARGET_AVX with -fexpensive-optimizations and split 32-byte
         AVX unaligned load/store.  */
      if (!optimize_size)
        {
          if (flag_expensive_optimizations
              && !(target_flags_explicit & MASK_VZEROUPPER))
            target_flags |= MASK_VZEROUPPER;
          if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
              && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
            target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
          if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
              && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
            target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
          /* Enable 128-bit AVX instruction generation
             for the auto-vectorizer.  */
          if (TARGET_AVX128_OPTIMAL
              && !(target_flags_explicit & MASK_PREFER_AVX128))
            target_flags |= MASK_PREFER_AVX128;
        }
    }
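For what it's worth, MASK_AVX256_SPLIT_UNALIGNED_LOAD corresponds to the
-mavx256-split-unaligned-load option; roughly speaking, its effect is what
the following source-level sketch shows (the function name is made up, and
the real split happens on RTL, not in source):

#include <immintrin.h>

/* Compile with -mavx.  Illustration of splitting a 256-bit unaligned load
   into two 128-bit halves plus an insert.  */
static inline __m256
load_256_split (const float *p)
{
  __m128 lo = _mm_loadu_ps (p);      /* low 128 bits */
  __m128 hi = _mm_loadu_ps (p + 4);  /* high 128 bits */
  return _mm256_insertf128_ps (_mm256_castps128_ps256 (lo), hi, 1);
}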


-- 
H.J.


Re: RFC: [ARM] Disable peeling

2012-12-12 Thread Andi Kleen
"H.J. Lu"  writes:
>
> i386.c has
>
>    {
>      /* When not optimize for size, enable vzeroupper optimization for
>         TARGET_AVX with -fexpensive-optimizations and split 32-byte
>         AVX unaligned load/store.  */

This is only for the load, not for deciding whether peeling is
worthwhile or not.

I believe it's unimplemented for x86 at this point. There isn't even a
hook for it. Any hook that is added should ideally work for both ARM64
and x86. This would imply it would need to handle different vector
sizes.
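Purely as a sketch of what such a hook might look like (nothing like it
exists in target.def today, so the name and signature below are invented):

/* Hypothetical target hook, illustration only: let the vectorizer ask
   whether peeling so that accesses of KIND in MODE become aligned is
   expected to be profitable on this target.  */
enum vect_access_kind { VECT_ACCESS_LOAD, VECT_ACCESS_STORE };

bool (*vectorize_prefer_peeling_for_alignment) (enum machine_mode mode,
                                                enum vect_access_kind kind);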

-Andi


-- 
a...@linux.intel.com -- Speaking for myself only


Deprecate i386 for GCC 4.8?

2012-12-12 Thread Steven Bosscher
Hello,

Linux support for i386 has been removed. Should we do the same for GCC?
The "oldest" ix86 variant that'd be supported would be i486.

The benefit would be a few good cleanups:

* PROCESSOR_I386 / TARGET_386 can be removed

* X86_TUNE_DOUBLE_WITH_ADD can be removed (always true)
* X86_ARCH_CMPXCHG/X86_ARCH_XADD/X86_ARCH_BSWAP can be removed.

* The only ix86 variant without lock-free atomic int support goes
away, which probably allows for a few more cleanups in some of the
libraries (?).

Not much, but something is more than nothing :-)

Ciao!
Steven


Re: Deprecate i386 for GCC 4.8?

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 10:01 AM, Steven Bosscher  wrote:
> Hello,
>
> Linux support for i386 has been removed. Should we do the same for GCC?
> The "oldest" ix86 variant that'd be supported would be i486.
>
> The benefit would be a few good cleanups:
>
> * PROCESSOR_I386 / TARGET_386 can be removed
>
> * X86_TUNE_DOUBLE_WITH_ADD can be removed (always true)
> * X86_ARCH_CMPXCHG/X86_ARCH_XADD/X86_ARCH_BSWAP can be removed.
>
> * The only ix86 variant without lock-free atomic int support goes
> away, which probably allows for a few more cleanups in some of the
> libraries (?).
>
> Not much, but something is more than nothing :-)
>

I am for it.



-- 
H.J.


Re: Deprecate i386 for GCC 4.8?

2012-12-12 Thread Robert Dewar

On 12/12/2012 1:01 PM, Steven Bosscher wrote:

Hello,

Linux support for i386 has been removed. Should we do the same for GCC?
The "oldest" ix86 variant that'd be supported would be i486.


Are there any embedded chips that still use the 386 instruction set?



Re: Deprecate i386 for GCC 4.8?

2012-12-12 Thread Steven Bosscher
On Wed, Dec 12, 2012 at 8:39 PM, Robert Dewar wrote:
> On 12/12/2012 1:01 PM, Steven Bosscher wrote:
>>
>> Hello,
>>
>> Linux support for i386 has been removed. Should we do the same for GCC?
>> The "oldest" ix86 variant that'd be supported would be i486.
>
>
> Are there any embedded chips that still use the 386 instruction set?

Hard to be sure, but doubtful. A search with Google doesn't give
anything that suggests someone's selling 386-ISA embedded chips. I'd
also expect even the most low-end embedded chip would use the 486 ISA,
which is a small but significant extension to the 386.

And as usual: if you use an almost 30-year-old architecture, why
would you need the latest-and-greatest compiler technology?
Seriously...

Ciao!
Steven


Re: Deprecate i386 for GCC 4.8?

2012-12-12 Thread Robert Dewar

On 12/12/2012 2:52 PM, Steven Bosscher wrote:


And as usual: if you use an almost 30-year-old architecture, why
would you need the latest-and-greatest compiler technology?
Seriously...


Well the embedded folk often end up with precisely this dichotomy :-)
But if there's no sign of 386 embedded chips, then I agree it's
reasonable to deprecate.


Ciao!
Steven





x86-64 medium memory model

2012-12-12 Thread Leif Ekblad
I'm working on OS adaptations for an OS that would use x86-64 applications
located above 4G, but not in the upper area. Binutils provides a way to set
the start of text above 4G, but there are problems with GCC when using this
memory model.


The first issue has to do with creating a cross-compiler that defaults to the
medium memory model using PIC. While there are switches to achieve this on the
command line (-mcmodel=medium -fpic), this is inconvenient since everything
ported must be changed to add these switches, including libgcc and newlib. The
cross-compiler should instead default to this memory model.


One possibility to achieve this is to add a new .h file in the
gcc/config/i386 directory. However, further inspection of the source
indicates there is no macro that can be redefined to achieve this. Another
possibility is to add such a macro in gcc/config/i386/i386.h and implement
it in gcc/config/i386/i386.c.


Here is a simple patch to do this:

diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.h gcc-work/gcc/config/i386/i386.h
--- gcc-4.8-20121202/gcc/config/i386/i386.h 2012-11-23 17:02:10.0 +0100
+++ gcc-work/gcc/config/i386/i386.h 2012-12-08 12:17:40.0 +0100
@@ -86,6 +86,8 @@
 #define TARGET_LP64 TARGET_ABI_64
 #define TARGET_X32 TARGET_ABI_X32
 
+#define TARGET_MEDIUM_PIC   0
+
 /* SSE4.1 defines round instructions */
 #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
 #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)

diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.c gcc-work/gcc/config/i386/i386.c
--- gcc-4.8-20121202/gcc/config/i386/i386.c 2012-12-02 00:43:52.0 +0100
+++ gcc-work/gcc/config/i386/i386.c 2012-12-11 21:43:48.0 +0100
@@ -3235,6 +3235,8 @@
      DLL, and is essentially just as efficient as direct addressing.  */
   if (TARGET_64BIT && DEFAULT_ABI == MS_ABI)
     ix86_cmodel = CM_SMALL_PIC, flag_pic = 1;
+  else if (TARGET_64BIT && TARGET_MEDIUM_PIC)
+    ix86_cmodel = CM_MEDIUM_PIC, flag_pic = 1;
   else if (TARGET_64BIT)
     ix86_cmodel = flag_pic ? CM_SMALL_PIC : CM_SMALL;
   else

It can be used like this:

+#undef TARGET_MEDIUM_PIC
+#define TARGET_MEDIUM_PIC 1
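For concreteness, a minimal sketch of where such an override could live; the
file name is hypothetical and only illustrates the idea of an OS-specific
target header:

/* Hypothetical gcc/config/i386/myos.h fragment (illustration only, not part
   of the patch above): make the 64-bit compiler default to the medium PIC
   code model for this OS.  */
#undef  TARGET_MEDIUM_PIC
#define TARGET_MEDIUM_PIC 1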

Next, with this issue fixed, there is still another problem in libgcc when 
using a cross-compiler compiled with this memory model:


../../gcc-work/libgcc/. -I../../../gcc-work/libgcc/../gcc -I../../../gcc-work/libgcc/../include 
-DHAVE_CC_TLS -o cpuinfo.o -MT cpuinfo.o -MD -MP -MF cpuinfo.dep -c 
../../../gcc-work/libgcc/config/i386/cpuinfo.c -fvisibility=hidden -DHIDE_EXPORTS

In file included from ../../../gcc-work/libgcc/config/i386/cpuinfo.c:21:0:
../../../gcc-work/libgcc/config/i386/cpuinfo.c: In function 
'get_available_features':
../../../gcc-work/libgcc/config/i386/cpuinfo.c:236:7: error: inconsistent 
operand constraints in an 'asm'

__cpuid_count (7, 0, eax, ebx, ecx, edx);
^
../../../gcc-work/libgcc/static-object.mk:17: recipe for target `cpuinfo.o' 
failed

make[1]: *** [cpuinfo.o] Error 1
make[1]: Leaving directory "/usr/src/build-gcc-noheader/rdos/libgcc"
Makefile:10619: recipe for target `all-target-libgcc' failed
make: *** [all-target-libgcc] Error 2

My guess is that __cpuid_count uses 32-bit addressing when it should be 
using 64-bit addressing. I have no patch for this as I don't understand what 
is going on here well enough, but __cpuid_count is defined in 
gcc/config/i386/cpuid.h.


In order to be able to continue to test the medium memory model I'd need 
patches to be applied to fix these issues.


Regards,
Leif Ekblad
RDOS Development



Re: x86-64 medium memory model

2012-12-12 Thread H.J. Lu
On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad  wrote:
> I'm working on OS adaptations for an OS that would use x86-64 applications
> located above 4G, but not in the upper area. Binutils provides a way to set
> the start of text above 4G, but there are problems with GCC when using this
> memory model.
>

Have you tried PIE with small model?  You can place your
binaries above 4G with better performance.

-- 
H.J.


Re: Deprecate i386 for GCC 4.8?

2012-12-12 Thread David Brown

On 12/12/12 20:54, Robert Dewar wrote:

On 12/12/2012 2:52 PM, Steven Bosscher wrote:


And as usual: if you use an almost 30-year-old architecture, why
would you need the latest-and-greatest compiler technology?
Seriously...


Well the embedded folk often end up with precisely this dichotomy :-)


True enough.


But if there's no sign of 386 embedded chips, then I agree it's
reasonable to deprecate.


I believe it has been a very long time since any manufacturers made a 
pure 386 chip.  While I've never used x86 devices in any of my embedded 
systems, I believe there are two main classes of x86 embedded systems - 
those that use DOS (these still exist!), and those that aim to be a 
small PC with more modern x86 OS's.  For the DOS systems, gcc does not 
matter, because it is not used - compilers like OpenWatcom are far more 
common (ref. the FreeDOS website).  And for people looking for "embedded 
PC's", the processor is always going to be a lot more modern than the 
386 - otherwise they are not going to be able to run any current OS.


The only people I can think of that still actively compile for 386 as 
the lowest common denominator are the BSD folks.  Some of them still 
like to compile with compatibility for 386 chips.  But I have no idea if 
they need 386 support in future gcc versions.




Ciao!
Steven








Re: x86-64 medium memory model

2012-12-12 Thread Leif Ekblad
The small memory model will not do since I want to put data at other
distinct addresses above 4G. I also want to place the heap at yet another
address interval. This way it becomes easy to separate out code, data and
heap references, and to make sure that pointers are valid. The primary reason
for not using the area below 2G or the last 2G is that such numbers are formed
naturally when doing 32-bit arithmetic, and thus could be executed by chance
from corrupt data.


If I understand it correctly, the PIE option is very similar to the PIC 
option, and will not make it possible to use any address for both code and 
data.


Additionally, when I tried the small memory model with a start address of 
text above 4G, the linker complains about 32-bit fixups overflowing.


Leif Ekblad


- Original Message - 
From: "H.J. Lu" 

To: "Leif Ekblad" 
Cc: "GCC Patches" ; "GCC Mailing List" 


Sent: Wednesday, December 12, 2012 9:59 PM
Subject: Re: x86-64 medium memory model



On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad  wrote:
I'm working on OS adaptations for an OS that would use x86-64 applications
located above 4G, but not in the upper area. Binutils provides a way to set
the start of text above 4G, but there are problems with GCC when using this
memory model.



Have you tried PIE with small model?  You can place your
binaries above 4G with better performance.

--
H.J. 




GNU Tools Cauldron 2013 - Call for Abstracts

2012-12-12 Thread Diego Novillo
==

 GNU Tools Cauldron 2013
  http://gcc.gnu.org/wiki/cauldron2013


   Call for Abstracts

 12-14 July 2013
   Google Headquarters
1600 Amphitheatre Parkway
 Mountain View, California, USA


   Organized by Google


 Abstract submission deadline: 28 February 2013 

==

We are pleased to announce another gathering of GNU tools
developers. The basic format of this meeting will be similar to
the last one at Charles University in Prague
(http://gcc.gnu.org/wiki/cauldron2012).

The purpose of this workshop is to gather all GNU tools
developers, discuss current/future work, coordinate efforts,
exchange reports on ongoing efforts, discuss development plans
for the next 12 months, hold developer tutorials, and have any
other related discussions.

This time we will meet at Google Headquarters in Mountain View,
California from 12/Jul/2013 to 14/Jul/2013.

We are inviting every developer working in the GNU toolchain:
GCC, GDB, binutils, runtimes, etc.  In addition to discussion
topics selected at the conference, we are looking for advance
submissions.

If you have a topic that you would like to present, please submit
an abstract describing what you plan to present.  We are
accepting three types of submissions:

- Prepared presentations: demos, project reports, etc.
- BoFs: coordination meetings with other developers.
- Tutorials for developers.  No user tutorials, please.

Note that we will not be doing in-depth reviews of the
presentations.  Mainly we are looking for applicability and to
decide scheduling.  There will be time at the conference to add
other topics of discussion, similarly to what we did at the
previous meetings.

To register your abstract, send e-mail to
tools-cauldron-ad...@googlegroups.com.

Your submission should contain the following information:

Title:
Authors:
Abstract:

If you intend to participate, but not necessarily present, please
let us know as well.  Send a message to
tools-cauldron-ad...@googlegroups.com stating your intent to
participate.