Re: Using known data structure hierarchy in GC and PCH?
On Tue, Dec 11, 2012 at 8:49 PM, Steven Bosscher wrote:
> On Tue, Dec 11, 2012 at 6:55 PM, Martin Jambor wrote:
>> some IPA passes do have on-the-side vectors with their information
>> about each cgraph node or edge and those are independent GC roots.
>> Not all, but many (e.g. inline_summary_vec or ipa_edge_args_vector) do
>> have pointers to other GC data, usually trees, and thus are managed
>> by GC too. Many of those trees (e.g. constants) might not be
>> reachable at least in LTO WPA phase. Sure enough, inventing something
>> more clever for them might be a good idea.
>
> I wasn't really thinking of 'tree' anyway; 'tree' is way too complex for this.
>
> I'm more thinking of pointers from on-the-side data to cgraph
> nodes/edges, and of pointers within the cgraph objects that are
> "redundant" from a GC marking point of view. Those pointers should
> only point to reachable cgraph nodes/edges, so it shouldn't be
> necessary to mark them separately when walking the on-the-side data.
>
> Take, for instance, "struct cgraph_edge" where all pointer fields are
> reachable via other pointers already:
>
> struct cgraph_edge {
>   ...
>   struct cgraph_node *caller; // reachable via symtab_nodes
>   struct cgraph_node *callee; // reachable via symtab_nodes
>   struct cgraph_edge *prev_caller; // reachable via symtab_nodes
>   struct cgraph_edge *next_caller; // reachable via symtab_nodes
>   struct cgraph_edge *prev_callee; // reachable via symtab_nodes
>   struct cgraph_edge *next_callee; // reachable via symtab_nodes
>   gimple call_stmt; // reachable via the CFG that contains the call
>   ...
> }
>
> so, at least in theory, it shouldn't be necessary to do anything for a
> cgraph_edge.
> Unless I'm missing something...

No, you are correct. All cgraph nodes are reachable from the global
symtab_node list head. Thus _no_ pointer to a symtab_node (or its
derived kinds) requires GTY tracking. If that breaks it's a bug ;)

> PS: shouldn't "struct symtab_node" have GTY next/prev markers?

Yes.

Richard.
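
The reachability argument in this message can be illustrated with a toy mark phase. This is only a sketch under stated assumptions, not GCC's actual GGC/gengtype machinery: the types toy_node and toy_edge and the function toy_mark_all are hypothetical stand-ins for cgraph nodes, cgraph edges and the marker. The idea it tries to show is that when every node is on one global list and every edge is on its caller's list, walking those lists alone reaches every object, so the back pointers never need to be followed.

```c
#include <stddef.h>

/* Toy stand-ins for cgraph_node / cgraph_edge; not GCC's real types. */
struct toy_node;

struct toy_edge {
    struct toy_node *caller;      /* redundant for marking: node is on the global list */
    struct toy_node *callee;      /* redundant for marking: node is on the global list */
    struct toy_edge *next_callee; /* the only edge link the marker follows */
    int marked;
};

struct toy_node {
    struct toy_node *next;        /* global list link, followed by the marker */
    struct toy_edge *callees;     /* head of this node's outgoing edge list */
    int marked;
};

/* Walk the global node list and each node's callee list exactly once.
   Returns the number of objects marked. */
size_t toy_mark_all(struct toy_node *head)
{
    size_t count = 0;
    for (struct toy_node *n = head; n != NULL; n = n->next) {
        n->marked = 1;
        count++;
        for (struct toy_edge *e = n->callees; e != NULL; e = e->next_callee) {
            e->marked = 1;        /* caller/callee are deliberately not followed */
            count++;
        }
    }
    return count;
}
```

With two nodes and two edges hanging off the first node, toy_mark_all marks all four objects without ever dereferencing caller or callee, which is the sense in which those fields are "redundant" for marking.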
Broken link in gcc-4.8/changes.html
Hi,

I just noticed a broken link (in case the issue is trivial I may get
around to fixing it myself, but at the moment I don't know):

http://gcc.gnu.org/onlinedocs/gcc/X86-Built-in-Functions.html#X86-Built-in-Functions

(it appears twice)

Cheers,
Paolo.

Re: RFC: [ARM] Disable peeling
On 11 December 2012 13:26, Tim Prince wrote:
> On 12/11/2012 5:14 AM, Richard Earnshaw wrote:
>> On 11/12/12 09:56, Richard Biener wrote:
>>> On Tue, Dec 11, 2012 at 10:48 AM, Richard Earnshaw wrote:
>>>> On 11/12/12 09:45, Richard Biener wrote:
>>>>> On Mon, Dec 10, 2012 at 10:07 PM, Andi Kleen wrote:
>>>>>> Jan Hubicka writes:
>>>>>>
>>>>>>> Note that I think Core has similar characteristics - at least
>>>>>>> for string operations it fares well with unaligned accesses.
>>>>>>
>>>>>> Nehalem and later has very fast unaligned vector loads. There's
>>>>>> still some penalty when they cross cache lines however.
>>>>>>
>>>>>> iirc the rule of thumb is to do unaligned for 128 bit vectors,
>>>>>> but avoid it for 256bit vectors because the cache line cross
>>>>>> penalty is larger on Sandy Bridge and more likely with the larger
>>>>>> vectors.
>>>>>
>>>>> Yes, I think the rule was that using the unaligned instruction
>>>>> variants carries no penalty when the actual access is aligned but
>>>>> that aligned accesses are still faster than unaligned accesses.
>>>>> Thus peeling for alignment _is_ a win.
>>>>> I also seem to remember that the story for unaligned stores vs.
>>>>> unaligned loads is usually different.
>>>>
>>>> Yes, it's generally the case that unaligned loads are slightly more
>>>> expensive than unaligned stores, since the stores can often merge
>>>> in a store buffer with little or no penalty.
>>>
>>> It was the other way around on AMD CPUs AFAIK - unaligned stores
>>> forced flushes of the store buffers. Which is why the vectorizer
>>> first and foremost tries to align stores.
>>
>> In which case, which to align should be a question that the ME asks
>> the BE.
>>
>> R.
>
> I see that this thread is no longer about ARM.
> Yes, when peeling for alignment, aligned stores should take precedence
> over aligned loads.
> "ivy bridge" corei7-3 is supposed to have corrected the situation on
> "sandy bridge" corei7-2 where unaligned 256-bit load is more expensive
> than explicitly split (128-bit) loads. There aren't yet any production
> multi-socket corei7-3 platforms.
> It seems difficult to make the best decision between 128-bit unaligned
> access without peeling and 256-bit access with peeling for alignment
> (unless the loop count is known to be too small for the latter to come
> up to speed).
> Facilities afforded by various compilers to allow the programmer to
> guide this choice are rather strange and probably not to be counted on.
> In my experience, "westmere" unaligned 128-bit loads are more expensive
> than explicitly split (64-bit) loads, but the architecture manuals
> disagree with this finding. gcc already does a good job for
> corei7[-1] in such situations.
>
> --
> Tim Prince

Since this thread is also about x86 now, I have tried to look at how
things are implemented on this target.
People have mentioned nehalem, sandy bridge, ivy bridge and westmere;
I have searched for occurrences of these strings in GCC, and I
couldn't find anything that would imply a different behavior wrt
unaligned loads on 128/256 bits vectors. Is it still unimplemented?

Thanks,

Christophe.

Re: RFC: [ARM] Disable peeling
On Wed, Dec 12, 2012 at 9:06 AM, Christophe Lyon wrote:
> Since this thread is also about x86 now, I have tried to look at how
> things are implemented on this target.
> People have mentioned nehalem, sandy bridge, ivy bridge and westmere;
> I have searched for occurrences of these strings in GCC, and I
> couldn't find anything that would imply a different behavior wrt
> unaligned loads on 128/256 bits vectors. Is it still unimplemented?
>

i386.c has

  {
    /* When not optimize for size, enable vzeroupper optimization for
       TARGET_AVX with -fexpensive-optimizations and split 32-byte
       AVX unaligned load/store.  */
    if (!optimize_size)
      {
        if (flag_expensive_optimizations
            && !(target_flags_explicit & MASK_VZEROUPPER))
          target_flags |= MASK_VZEROUPPER;
        if ((x86_avx256_split_unaligned_load & ix86_tune_mask)
            && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_LOAD))
          target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD;
        if ((x86_avx256_split_unaligned_store & ix86_tune_mask)
            && !(target_flags_explicit & MASK_AVX256_SPLIT_UNALIGNED_STORE))
          target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE;
        /* Enable 128-bit AVX instruction generation for the
           auto-vectorizer.  */
        if (TARGET_AVX128_OPTIMAL
            && !(target_flags_explicit & MASK_PREFER_AVX128))
          target_flags |= MASK_PREFER_AVX128;
      }
  }

--
H.J.

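
For readers unfamiliar with the term, "peeling for alignment" as discussed in this thread can be sketched in plain C. This is a hand-written illustration, not what the GCC vectorizer emits: the names VEC_BYTES, peel_count and scale are hypothetical, a 128-bit vector width and 4-byte float are assumed, and the inner loop merely stands in for one vector operation. A few leading iterations are run one element at a time until the store pointer is aligned, then the main loop works on whole vectors, and a scalar epilogue handles the remainder. The store side is the one being aligned, matching the remark that the vectorizer first tries to align stores.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 16u  /* assumed vector width: 128 bits */

/* Number of leading float elements to handle one at a time so that
   dst + result is VEC_BYTES-aligned (capped at n).
   Assumes dst is at least float-aligned and sizeof(float) == 4. */
size_t peel_count(const float *dst, size_t n)
{
    uintptr_t misalign = (uintptr_t)dst % VEC_BYTES;
    size_t peel = misalign ? (VEC_BYTES - misalign) / sizeof(float) : 0;
    return peel < n ? peel : n;
}

/* dst[i] = src[i] * k, structured as prologue / aligned main loop / epilogue. */
void scale(float *dst, const float *src, size_t n, float k)
{
    const size_t lanes = VEC_BYTES / sizeof(float);
    size_t peel = peel_count(dst, n);
    size_t i = 0;

    for (; i < peel; i++)               /* peeled scalar prologue */
        dst[i] = src[i] * k;

    for (; i + lanes <= n; i += lanes)  /* stores to dst start on aligned addresses */
        for (size_t j = 0; j < lanes; j++)
            dst[i + j] = src[i + j] * k;

    for (; i < n; i++)                  /* scalar epilogue */
        dst[i] = src[i] * k;
}
```

Note that only dst is aligned by the peel; src may still be misaligned in the main loop, which is exactly the load-versus-store trade-off the thread is debating.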
Re: RFC: [ARM] Disable peeling
"H.J. Lu" writes:
>
> i386.c has
>
>{
> /* When not optimize for size, enable vzeroupper optimization for
> TARGET_AVX with -fexpensive-optimizations and split 32-byte
> AVX unaligned load/store. */

This is only for the load, not for deciding whether peeling is
worthwhile or not.

I believe it's unimplemented for x86 at this point. There isn't even a
hook for it. Any hook that is added should ideally work for both ARM64
and x86. This would imply it would need to handle different vector
sizes.

-Andi

--
a...@linux.intel.com -- Speaking for myself only

Deprecate i386 for GCC 4.8?
Hello,

Linux support for i386 has been removed. Should we do the same for GCC?
The "oldest" ix86 variant that'd be supported would be i486.

The benefit would be a few good cleanups:

* PROCESSOR_I386 / TARGET_386 can be removed

* X86_TUNE_DOUBLE_WITH_ADD can be removed (always true)

* X86_ARCH_CMPXCHG/X86_ARCH_XADD/X86_ARCH_BSWAP can be removed.

* The only ix86 variant without lock-free atomic int support goes
  away, which probably allows for a few more cleanups in some of the
  libraries (?).

Not much, but something is more than nothing :-)

Ciao!
Steven

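
To make the cleanup list concrete: BSWAP, XADD and CMPXCHG are instructions that first appeared with the i486, which is why a plain i386 target cannot do lock-free atomics on int. The sketch below uses GCC's generic __builtin_bswap32 and __atomic builtins; the wrapper names (swap_bytes, fetch_add, compare_and_swap) are made up for illustration, and how the compiler actually lowers each builtin depends on the target and the options in use.

```c
#include <stdint.h>

/* Byte swap; on i486 and later this can be a single BSWAP. */
uint32_t swap_bytes(uint32_t x)
{
    return __builtin_bswap32(x);
}

/* Atomic fetch-and-add; maps naturally onto LOCK XADD.
   Returns the value *p held before the addition. */
int fetch_add(int *p, int v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}

/* Atomic compare-and-swap; maps naturally onto LOCK CMPXCHG.
   Returns 1 if *p was equal to expected and has been set to desired. */
int compare_and_swap(int *p, int expected, int desired)
{
    return __atomic_compare_exchange_n(p, &expected, desired, 0,
                                       __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}
```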
Re: Deprecate i386 for GCC 4.8?
On Wed, Dec 12, 2012 at 10:01 AM, Steven Bosscher wrote:
> Hello,
>
> Linux support for i386 has been removed. Should we do the same for GCC?
> The "oldest" ix86 variant that'd be supported would be i486.
>
> The benefit would be a few good cleanups:
>
> * PROCESSOR_I386 / TARGET_386 can be removed
>
> * X86_TUNE_DOUBLE_WITH_ADD can be removed (always true)
>
> * X86_ARCH_CMPXCHG/X86_ARCH_XADD/X86_ARCH_BSWAP can be removed.
>
> * The only ix86 variant without lock-free atomic int support goes
> away, which probably allows for a few more cleanups in some of the
> libraries (?).
>
> Not much, but something is more than nothing :-)
>

I am for it.

--
H.J.

Re: Deprecate i386 for GCC 4.8?
On 12/12/2012 1:01 PM, Steven Bosscher wrote:
> Hello,
>
> Linux support for i386 has been removed. Should we do the same for GCC?
> The "oldest" ix86 variant that'd be supported would be i486.

Are there any embedded chips that still use the 386 instruction set?

Re: Deprecate i386 for GCC 4.8?
On Wed, Dec 12, 2012 at 8:39 PM, Robert Dewar wrote:
> On 12/12/2012 1:01 PM, Steven Bosscher wrote:
>>
>> Hello,
>>
>> Linux support for i386 has been removed. Should we do the same for GCC?
>> The "oldest" ix86 variant that'd be supported would be i486.
>
>
> Are there any embedded chips that still use the 386 instruction set?

Hard to be sure, but doubtful. A search with Google doesn't give
anything that suggests someone's selling 386-ISA embedded chips. I'd
also expect even the most low-end embedded chip would use the 486 ISA,
which is a small but significant extension to 386.

And as usual: If you use an almost 30 years old architecture, why
would you need the latest-and-greatest compiler technology?
Seriously...

Ciao!
Steven

Re: Deprecate i386 for GCC 4.8?
On 12/12/2012 2:52 PM, Steven Bosscher wrote:
> And as usual: If you use an almost 30 years old architecture, why
> would you need the latest-and-greatest compiler technology?
> Seriously...

Well the embedded folk often end up with precisely this dichotomy :-)

But if no sign of 386 embedded chips, then reasonable to deprecate
I agree.

> Ciao!
> Steven

x86-64 medium memory model
I'm working on OS-adaptations for an OS that would use x86-64 applications
that are located above 4G, but not in the upper area. Binutils provide a
function to be able to set the start of text to above 4G, but there are
problems with GCC when using this memory model.

The first issue has to do with creating a cross-compiler that defaults to
medium memory model using PIC. While there are switches to achieve this on
command line (-mcmodel=medium -fpic), this is inconvenient since everything
ported must be changed to add these switches, including libgcc and newlib.
The cross-compiler instead should default to this memory model. One
possibility to achieve this is to add a new .h-file in the gcc/config/i386
directory. However, further inspection of the source indicates there is no
macro that can be redefined to achieve this. One possibility is to add such
a macro in gcc/config/i386/i386.h, and implement it in
gcc/config/i386/i386.c.

Here is a simple patch to do this:

diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.h gcc-work/gcc/config/i386/i386.h
--- gcc-4.8-20121202/gcc/config/i386/i386.h 2012-11-23 17:02:10.0 +0100
+++ gcc-work/gcc/config/i386/i386.h 2012-12-08 12:17:40.0 +0100
@@ -86,6 +86,8 @@
 #define TARGET_LP64 TARGET_ABI_64
 #define TARGET_X32 TARGET_ABI_X32
 
+#define TARGET_MEDIUM_PIC 0
+
 /* SSE4.1 defines round instructions */
 #define OPTION_MASK_ISA_ROUND OPTION_MASK_ISA_SSE4_1
 #define TARGET_ISA_ROUND ((ix86_isa_flags & OPTION_MASK_ISA_ROUND) != 0)
diff -u -r -N gcc-4.8-20121202/gcc/config/i386/i386.c gcc-work/gcc/config/i386/i386.c
--- gcc-4.8-20121202/gcc/config/i386/i386.c 2012-12-02 00:43:52.0 +0100
+++ gcc-work/gcc/config/i386/i386.c 2012-12-11 21:43:48.0 +0100
@@ -3235,6 +3235,8 @@
   DLL, and is essentially just as efficient as direct addressing. */
   if (TARGET_64BIT && DEFAULT_ABI == MS_ABI)
     ix86_cmodel = CM_SMALL_PIC, flag_pic = 1;
+  else if (TARGET_64BIT && TARGET_MEDIUM_PIC)
+    ix86_cmodel = CM_MEDIUM_PIC, flag_pic = 1;
   else if (TARGET_64BIT)
     ix86_cmodel = flag_pic ? CM_SMALL_PIC : CM_SMALL;
   else

It can be used like this:

+#undef TARGET_MEDIUM_PIC
+#define TARGET_MEDIUM_PIC 1

Next, with this issue fixed, there is still another problem in libgcc when
using a cross-compiler compiled with this memory model:

../../gcc-work/libgcc/. -I../../../gcc-work/libgcc/../gcc -I../../../gcc-work/libgcc/../include -DHAVE_CC_TLS -o cpuinfo.o -MT cpuinfo.o -MD -MP -MF cpuinfo.dep -c ../../../gcc-work/libgcc/config/i386/cpuinfo.c -fvisibility=hidden -DHIDE_EXPORTS
In file included from ../../../gcc-work/libgcc/config/i386/cpuinfo.c:21:0:
../../../gcc-work/libgcc/config/i386/cpuinfo.c: In function 'get_available_features':
../../../gcc-work/libgcc/config/i386/cpuinfo.c:236:7: error: inconsistent operand constraints in an 'asm'
 __cpuid_count (7, 0, eax, ebx, ecx, edx);
 ^
../../gcc-work/libgcc/static-object.mk:17: recipe for target `cpuinfo.o' failed
make[1]: *** [cpuinfo.o] Error 1
make[1]: Leaving directory "/usr/src/build-gcc-noheader/rdos/libgcc"
Makefile:10619: recipe for target `all-target-libgcc' failed
make: *** [all-target-libgcc] Error 2

My guess is that __cpuid_count uses 32-bit addressing when it should be
using 64-bit addressing. I have no patch for this as I don't understand
what is going on here well enough, but __cpuid_count is defined in
gcc/config/i386/cpuid.h.

In order to be able to continue to test the medium memory model I'd need
patches to be applied to fix these issues.

Regards,
Leif Ekblad
RDOS Development

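
The effect of the proposed macro on the default code model can be mimicked in a few lines of standalone C. This is only a toy model of the if/else chain in the patch: the enum toy_cmodel, the struct toy_target and the function toy_default_cmodel are hypothetical stand-ins for ix86_cmodel, the TARGET_* macros and the real option-override code, and the final branch is a placeholder because the patch context does not show what follows the last "else".

```c
/* Hypothetical stand-ins; GCC's real logic lives in gcc/config/i386/i386.c. */
enum toy_cmodel { TOY_SMALL, TOY_SMALL_PIC, TOY_MEDIUM_PIC, TOY_32 };

struct toy_target {
    int is_64bit;    /* stands in for TARGET_64BIT */
    int ms_abi;      /* stands in for DEFAULT_ABI == MS_ABI */
    int medium_pic;  /* stands in for the proposed TARGET_MEDIUM_PIC */
    int flag_pic;    /* stands in for flag_pic; may be forced on */
};

/* Mirrors the order of the tests in the patched chain. */
enum toy_cmodel toy_default_cmodel(struct toy_target *t)
{
    if (t->is_64bit && t->ms_abi) {
        t->flag_pic = 1;
        return TOY_SMALL_PIC;
    } else if (t->is_64bit && t->medium_pic) {
        t->flag_pic = 1;          /* new branch added by the patch */
        return TOY_MEDIUM_PIC;
    } else if (t->is_64bit) {
        return t->flag_pic ? TOY_SMALL_PIC : TOY_SMALL;
    } else {
        return TOY_32;            /* placeholder: this branch is elided in the patch */
    }
}
```

One consequence visible in the sketch is ordering: because the MS ABI test comes first, a target that sets both would still get the small PIC model.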
Re: x86-64 medium memory model
On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad wrote:
> I'm working on OS-adaptations for an OS that would use x86-64 applications
> that are located above 4G, but not in the upper area. Binutils provide a
> function to be able to set the start of text to above 4G, but there are
> problems with GCC when using this memory model.
>

Have you tried PIE with small model? You can place your binaries above
4G with better performance.

--
H.J.

Re: Deprecate i386 for GCC 4.8?
On 12/12/12 20:54, Robert Dewar wrote:
> On 12/12/2012 2:52 PM, Steven Bosscher wrote:
>> And as usual: If you use an almost 30 years old architecture, why
>> would you need the latest-and-greatest compiler technology?
>> Seriously...
>
> Well the embedded folk often end up with precisely this dichotomy :-)

True enough.

> But if no sign of 386 embedded chips, then reasonable to deprecate
> I agree.

I believe it has been a very long time since any manufacturers made a
pure 386 chip. While I've never used x86 devices in any of my embedded
systems, I believe there are two main classes of x86 embedded systems -
those that use DOS (these still exist!), and those that aim to be a
small PC with more modern x86 OS's. For the DOS systems, gcc does not
matter, because it is not used - compilers like OpenWatcom are far more
common (ref. the FreeDOS website). And for people looking for "embedded
PC's", the processor is always going to be a lot more modern than the
386 - otherwise they are not going to be able to run any current OS.

The only people I can think of that still actively compile for 386 as
the lowest common denominator are the BSD folks. Some of them still like
to compile with compatibility for 386 chips. But I have no idea if they
need 386 support in future gcc versions.

>> Ciao!
>> Steven

Re: x86-64 medium memory model
The small memory model will not do since I want to put data at other
distinct addresses above 4G. I also want to place the heap at yet another
address interval. This way it becomes easy to separate out code, data and
heap references, and to make sure that pointers are valid. The primary
reason for not using the area below 2G or the last 2G is that such numbers
are formed naturally when doing 32-bit arithmetic, and thus could be
executed by chance from corrupt data.

If I understand it correctly, the PIE option is very similar to the PIC
option, and will not make it possible to use any address for both code
and data. Additionally, when I tried the small memory model with a start
address of text above 4G, the linker complains about 32-bit fixups
overflowing.

Leif Ekblad

----- Original Message -----
From: "H.J. Lu"
To: "Leif Ekblad"
Cc: "GCC Patches"; "GCC Mailing List"
Sent: Wednesday, December 12, 2012 9:59 PM
Subject: Re: x86-64 medium memory model

On Wed, Dec 12, 2012 at 12:56 PM, Leif Ekblad wrote:
> I'm working on OS-adaptations for an OS that would use x86-64 applications
> that are located above 4G, but not in the upper area. Binutils provide a
> function to be able to set the start of text to above 4G, but there are
> problems with GCC when using this memory model.

Have you tried PIE with small model? You can place your binaries above
4G with better performance.

--
H.J.

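
One possible reading of the remark about 32-bit arithmetic, sketched in C: when a signed 32-bit result is widened to a 64-bit address, sign extension can only produce values in the lowest 2 GiB or the highest 2 GiB of the address space, so stray 32-bit values cannot land in a region placed elsewhere. The function names widen_signed and in_low_or_high_2g are made up for this illustration, and this is an interpretation of the argument rather than something stated in the message.

```c
#include <stdint.h>

/* Widen a 32-bit value to 64 bits by sign extension, as happens when a
   signed 32-bit result is used as a 64-bit address.  The conversion to
   int32_t of values above INT32_MAX is implementation-defined in C;
   GCC and Clang wrap modulo 2^32. */
uint64_t widen_signed(uint32_t v)
{
    return (uint64_t)(int64_t)(int32_t)v;
}

/* 1 if addr lies in the lowest 2 GiB or the highest 2 GiB of the
   64-bit address space, 0 otherwise. */
int in_low_or_high_2g(uint64_t addr)
{
    return addr < UINT64_C(0x80000000) ||
           addr >= UINT64_C(0xFFFFFFFF80000000);
}
```

Every value returned by widen_signed satisfies in_low_or_high_2g, whereas an address such as 0x100000000 (exactly 4G) does not, which is the kind of separation the message is after.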
GNU Tools Cauldron 2013 - Call for Abstracts
==
GNU Tools Cauldron 2013
http://gcc.gnu.org/wiki/cauldron2013

Call for Abstracts

12-14 July 2013
Google Headquarters
1600 Amphitheatre Parkway
Mountain View, California, USA

Organized by Google

Abstract submission deadline: 28 February 2013
==

We are pleased to announce another gathering of GNU tools developers.
The basic format of this meeting will be similar to the last one at
Charles University in Prague (http://gcc.gnu.org/wiki/cauldron2012).

The purpose of this workshop is to gather all GNU tools developers,
discuss current/future work, coordinate efforts, exchange reports on
ongoing efforts, discuss development plans for the next 12 months,
developer tutorials and any other related discussions.

This time we will meet at Google Headquarters in Mountain View,
California from 12/Jul/2013 to 14/Jul/2013.

We are inviting every developer working in the GNU toolchain: GCC, GDB,
binutils, runtimes, etc. In addition to discussion topics selected at
the conference, we are looking for advance submissions.

If you have a topic that you would like to present, please submit an
abstract describing what you plan to present. We are accepting three
types of submissions:

- Prepared presentations: demos, project reports, etc.
- BoFs: coordination meetings with other developers.
- Tutorials for developers. No user tutorials, please.

Note that we will not be doing in-depth reviews of the presentations.
Mainly we are looking for applicability and to decide scheduling.
There will be time at the conference to add other topics of discussion,
similarly to what we did at the previous meetings.

To register your abstract, send e-mail to
tools-cauldron-ad...@googlegroups.com.

Your submission should contain the following information:

Title:
Authors:
Abstract:

If you intend to participate, but not necessarily present, please let
us know as well. Send a message to
tools-cauldron-ad...@googlegroups.com stating your intent to
participate.