Re: Adding Leon processor to the SPARC list of processors
Eric Botcazou wrote:
>> Following the recent comments by Eric, the patch now sketches the
>> following setup:
>>
>> If multi-lib is wanted:
>> configure --with-cpu=leon ... : creates multilib-dir soft|v8 combinations
>> using [-msoft-float|-mcpu=sparcleonv8]
>> (MULTILIB_OPTIONS = msoft-float mcpu=sparcleonv8)
>>
>> If single-lib is wanted:
>> configure --with-cpu=sparcleonv7 --with-float=soft --disable-multilib ... : (v7 | soft | no-multilib)
>> configure --with-cpu=sparcleonv8 --with-float=soft --disable-multilib ... : (v8 | soft | no-multilib)
>> configure --with-cpu=sparcleonv7 --with-float=hard --disable-multilib ... : (v7 | hard | no-multilib)
>> configure --with-cpu=sparcleonv8 --with-float=hard --disable-multilib ... : (v8 | hard | no-multilib)
>>
>> Using --with-cpu=leon|sparcleonv7|sparcleonv8, sparc_cpu is switched
>> to PROCESSOR_LEON.
>
> I'm mostly OK, but I don't think we need sparcleonv7 or sparcleonv8.

You are right.

> Attached is another proposal, which:
>
> 1. Adds -mtune/--with-tune=leon for all SPARC targets. In particular, this
> means that if you configure --target=sparc-{elf,rtems} --with-tune=leon, you
> get a multilib-ed compiler defaulting to V7/FPU and -mtune=leon, with V8 and
> NO-FPU libraries.

Ok, this scheme seems best.

> 2. Adds new targets sparc-leon-{elf,linux}: multilib-ed compiler defaulting
> to V8/FPU and -mtune=leon, with V7 and NO-FPU libraries.

Ok.

> 3. Adds new targets sparc-leon3-{elf,linux}: multilib-ed compiler defaulting
> to V8/FPU and -mtune=leon, with NO-FPU libraries.
>
> Singlelib-ed compilers are available through --disable-multilib and
> --with-cpu={v7,v8} --with-float={soft,hard} --with-tune=leon
> for sparc-{elf,rtems}, or just
> --with-cpu={v7,v8} --with-float={soft,hard}
> for sparc-leon*-*.
>
> The rationale is that --with-cpu shouldn't change the set of multilibs; it is
> only the configure-time equivalent of -mcpu.
> This set of multilibs should only depend on the target and the presence
> of --disable-multilib.

Ok, understood.

> * config.gcc (sparc-*-elf*): Deal with sparc-leon specifically.
> (sparc-*-linux*): Likewise.
> (sparc*-*-*): Remove obsolete sparc86x setting.
> (sparc-leon*): Default to --with-cpu=v8 and --with-tune=leon.
> * doc/invoke.texi (SPARC Options): Document -mcpu/-mtune=leon.
> * config/sparc/sparc.h (TARGET_CPU_leon): Define.
> (TARGET_CPU_sparc86x): Delete.
> (TARGET_CPU_cypress): Define as alias to TARGET_CPU_v7.
> (TARGET_CPU_f930): Define as alias to TARGET_CPU_sparclite.
> (TARGET_CPU_f934): Likewise.
> (TARGET_CPU_tsc701): Define as alias to TARGET_CPU_sparclet.
> (CPP_CPU_SPEC): Add entry for -mcpu=leon.
> (enum processor_type): Add PROCESSOR_LEON.
> * config/sparc/sparc.c (leon_costs): New cost array.
> (sparc_option_override): Add entry for TARGET_CPU_leon and -mcpu=leon.
> Initialize cost array to leon_costs if -mtune=leon.
> * config/sparc/sparc.md (cpu attribute): Add leon.
> Include leon.md scheduling description.
> * config/sparc/leon.md: New file.
> * config/sparc/t-elf: Do not assemble Solaris startup files.
> * config/sparc/t-leon: New file.
> * config/sparc/t-leon3: Likewise.

Is the list above an indication that you are already finished with the modifications? :-) Can you give me a note? Otherwise I'll create a new patch that implements the scheme you suggested.

-- Greetings Konrad
Re: Adding Leon processor to the SPARC list of processors
> Is the list above an indication that you are already finished with
> the modifications? :-)
> Can you give me a note, otherwise I'll create a new patch that implements
> the scheme you suggested.

Sorry, I didn't notice that the attachment already is your implementation; I thought it was the old diff. So: I'm OK with all. Thanks for the effort.

-- Greetings Konrad
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Tue, Nov 23, 2010 at 9:09 PM, Joern Rennecke wrote:
> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
> would not only cost a data load from the target vector, but would also
> inhibit optimizations that replace division / modulo / multiply with shift
> or mask operations.
> So maybe we should look into having a few functional hooks that do common
> operations, i.e.:
>
> bits_in_units        x / BITS_PER_UNIT
> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
> bit_unit_remainder   x % BITS_PER_UNIT
> units_in_bits        x * BITS_PER_UNIT
>
> Although we currently have some HOST_WIDE_INT uses, I hope using
> unsigned HOST_WIDE_INT as the argument / return type will generally work.
>
> tree.h also defines BITS_PER_UNIT_LOG, which (or its hook equivalent)
> should probably be used in all the places that use
> exact_log2 (BITS_PER_UNIT), and, if it could be relied upon to exist, we
> could also use it as a substitute for the above hooks. However, this seems
> a bit iffy - we'd permanently forgo the possibility to have 6 / 7 / 36
> bit etc. units.
>
> Similar arrangements could be made for BITS_PER_WORD and UNITS_PER_WORD,
> although these macros seem not quite so prevalent in the tree optimizers.

Well. Some things really ought to stay as macros. You can always error out if a multi-target compiler would have conflicts there at configure time.

Richard.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Nov 24, 2010, at 6:45 AM, Richard Guenther wrote:
> On Tue, Nov 23, 2010 at 9:09 PM, Joern Rennecke wrote:
>> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
>> would not only cost a data load from the target vector, but would also
>> inhibit optimizations that replace division / modulo / multiply with shift
>> or mask operations.
>> So maybe we should look into having a few functional hooks that do common
>> operations, i.e.:
>>
>> bits_in_units        x / BITS_PER_UNIT
>> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
>> bit_unit_remainder   x % BITS_PER_UNIT
>> units_in_bits        x * BITS_PER_UNIT
>>
>> Although we currently have some HOST_WIDE_INT uses, I hope using
>> unsigned HOST_WIDE_INT as the argument / return type will generally work.
>>
>> tree.h also defines BITS_PER_UNIT_LOG, which (or its hook equivalent)
>> should probably be used in all the places that use
>> exact_log2 (BITS_PER_UNIT), and, if it could be relied upon to exist, we
>> could also use it as a substitute for the above hooks. However, this seems
>> a bit iffy - we'd permanently forgo the possibility to have 6 / 7 / 36
>> bit etc. units.
>>
>> Similar arrangements could be made for BITS_PER_WORD and UNITS_PER_WORD,
>> although these macros seem not quite so prevalent in the tree optimizers.
>
> Well. Some things really ought to stay as macros. You can always
> error out if a multi-target compiler would have conflicts there at
> configure time.

That seems reasonable, especially since BITS_PER_UNIT is likely to be consistent (and 8) in any multi-target compiler.

paul
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Richard Guenther:

> Well. Some things really ought to stay as macros. You can always
> error out if a multi-target compiler would have conflicts there at
> configure time.

So what are we going to do about all the tree optimizers and frontends that use BITS_PER_UNIT? Should they all include tm.h, with the hazard that more specific macros creep in? Or do we want to put this in a separate header file?
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:
> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
> would not only cost a data load from the target vector, but would also
> inhibit optimizations that replace division / modulo / multiply with shift
> or mask operations.

Have you done any sort of measurement, to see if what is lost is actually noticeable in practice?

> So maybe we should look into having a few functional hooks that do
> common operations, i.e.:
>
> bits_in_units        x / BITS_PER_UNIT
> bits_in_units_ceil   (x + BITS_PER_UNIT - 1) / BITS_PER_UNIT
> bit_unit_remainder   x % BITS_PER_UNIT
> units_in_bits        x * BITS_PER_UNIT

-- Pedro Alves
Re: Help with reloading FP + offset addressing mode
On 30 October 2010 05:45, Joern Rennecke wrote:
> Quoting Mohamed Shafi :
>> On 29 October 2010 00:06, Joern Rennecke wrote:
>>> Quoting Mohamed Shafi :
>>>> Hi,
>>>> I am doing a port in GCC 4.5.1. For the port:
>>>> 1. there is only (reg + offset) addressing mode, and only when reg is SP.
>>>> Other base registers are not allowed.
>>>> 2. FP cannot be used as a base register. (FP-based addressing is done by
>>>> copying it into a base register.)
>>>> In order to take advantage of FP elimination (this will create SP + offset
>>>> addressing), I did the following:
>>>> 1. Created a new register class (address registers + FP) and used this
>>>> new class as the BASE_REG_CLASS
>>>
>>> Stop right there. You need to distinguish between FRAME_POINTER_REGNUM
>>> and HARD_FRAME_POINTER_REGNUM.
>>
>> From the description given in the internals, I am not able to
>> understand why you suggested this. Could you please explain this?
>
> In order to trigger reloading of the address, you have to have a register
> elimination, even if the stack pointer is not a suitable destination
> for the elimination. Also, if you want reload to do the work for you,
> you must not lie to it about the addressing capabilities of an actual hard
> register. Hence, you need separate hard and soft frame pointers.
>
> If you have them, but conflate them when you describe what you are doing
> in your port, you are not only likely to confuse the listener/reader,
> but also your documentation, your code, and ultimately yourself.

Having a FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM will trigger reloading of the address.
But for the following pattern:

(insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 35 SFP)
            (const_int 1 [0x1])) [0 c+0 S1 A32])
        (reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil))

where SFP is FRAME_POINTER_REGNUM, an elimination will result in

(insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 27 as15)
            (const_int 1 [0x1])) [0 c+0 S1 A32])
        (reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil))

where as15 is the HARD_FRAME_POINTER_REGNUM. But remember this new address is not valid (only SP is allowed in this addressing mode). When the above pattern is reloaded I get:

(insn 28 27 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg:QI 28 a0)
            (const_int 1 [0x1])) [0 c+0 S1 A32])
        (reg:QI 3 g3)) -1 (nil))

I get an unrecognizable insn ICE, because this addressing mode is not valid. I believe this happens because when the reload pass gets an address of the form (reg + off), it assumes that the address is invalid due to one of the following:
1. 'reg' is not a suitable base register
2. the offset is out of range
3. the address has an eliminable register as its base register.

Is there any way to overcome this? Any help is appreciated.

Shafi
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Pedro Alves:

> On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:
>> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
>> would not only cost a data load from the target vector, but would also
>> inhibit optimizations that replace division / modulo / multiply with shift
>> or mask operations.
>
> Have you done any sort of measurement, to see if what is lost
> is actually noticeable in practice?

No, I haven't. On an i686 it's probably not measurable. On a host with a slow software divide it might be, if the code paths that require these operations are exercised a lot - that would also depend on the source code being compiled.

Also, these separate hooks for common operations can make the code more readable, particularly in the bits_in_units_ceil case. I.e.:

  foo_var = ((bitsize + targetm.bits_per_unit () - 1)
             / targetm.bits_per_unit ());

vs.

  foo_var = targetm.bits_in_units_ceil (bitsize);
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke wrote:
> Quoting Richard Guenther:
>
>> Well. Some things really ought to stay as macros. You can always
>> error out if a multi-target compiler would have conflicts there at
>> configure time.
>
> So what are we going to do about all the tree optimizers and frontends that
> use BITS_PER_UNIT?

Tree optimizers are fine to use target macros/hooks, and I expect use will grow, not shrink.

> Should they all include tm.h, with the hazard that more specific
> macros creep in?
> Or do we want to put this in a separate header file?

I don't have a very clear picture of where we want to go with all the hookization. And I've decided to postpone any investigation until more macros are converted (where it makes sense to).

Richard.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Richard Guenther:

> On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke wrote:
>> So what are we going to do about all the tree optimizers and frontends
>> that use BITS_PER_UNIT?
>
> Tree optimizers are fine to use target macros/hooks, and I expect
> use will grow, not shrink.

Hooks are fine, as long as we can make the target vector type target independent (see PR46500). However, macro use means the tree optimizer / frontend is compiled for a particular target. That prevents both multi-target compilers and target-independent frontend plugins from working properly.

>> Should they all include tm.h, with the hazard that more specific
>> macros creep in?
>> Or do we want to put this in a separate header file?
>
> I don't have a very clear picture of where we want to go with all the
> hookization. And I've decided to postpone any investigation until
> more macros are converted (where it makes sense to).

I'm fine with the RTL optimizers using target macros, but I'd like the frontends and tree optimizers to cease to use tm.h. That means all macro uses there have to be converted. That does not necessarily involve target port code - a wrapper hook could be provided in targhooks.c that uses the target macro.

Target libraries should also not use tm.h, but predefined macros or built-in functions. I'm not currently working on that, though.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke wrote:
> Quoting Richard Guenther:
>
>> On Wed, Nov 24, 2010 at 1:56 PM, Joern Rennecke wrote:
>>> So what are we going to do about all the tree optimizers and frontends
>>> that use BITS_PER_UNIT?
>>
>> Tree optimizers are fine to use target macros/hooks, and I expect
>> use will grow, not shrink.
>
> Hooks are fine, as long as we can make the target vector type target
> independent (see PR46500). However, macro use means the tree
> optimizer / frontend is compiled for a particular target. That prevents
> both multi-target compilers and target-independent frontend plugins
> from working properly.
>
>>> Should they all include tm.h, with the hazard that more specific
>>> macros creep in?
>>> Or do we want to put this in a separate header file?
>>
>> I don't have a very clear picture of where we want to go with all the
>> hookization. And I've decided to postpone any investigation until
>> more macros are converted (where it makes sense to).
>
> I'm fine with the RTL optimizers using target macros, but I'd like the
> frontends and tree optimizers to cease to use tm.h. That means
> all macro uses there have to be converted. That does not necessarily
> involve target port code - a wrapper hook could be provided in targhooks.c
> that uses the target macro.

I don't see why RTL optimizers should be different from tree optimizers. And we don't want to pay the overhead of hookizing every target-dependent constant just for the odd guys who want multi-target compilers that have those constants differing.

Richard.

> Target libraries should also not use tm.h, but predefined macros or built-in
> functions. I'm not currently working on that, though.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 09:17, Richard Guenther wrote:
> And we don't want to pay the overhead of hookization every target
> dependent constant just for the odd guys who want multi-target
> compilers that have those constants differing.

I would like to know how much this overhead really amounts to. Long term, I would like to see back ends become shared objects that can be selected with a -fbackend=... flag or some such. Removing configure/compile-time macros and other hardwired data is instrumental to that.

Diego.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 3:33 PM, Diego Novillo wrote:
> On Wed, Nov 24, 2010 at 09:17, Richard Guenther wrote:
>
>> And we don't want to pay the overhead of hookization every target
>> dependent constant just for the odd guys who want multi-target
>> compilers that have those constants differing.
>
> I would like to know how much this overhead really amounts to. Long
> term, I would like to see back ends become shared objects that can be
> selected with a -fbackend=... flag or some such. Removing
> configure/compile-time macros and other hardwired data is instrumental
> to that.

Well. Long term. Hookizing constants is easy - before proceeding with those (seemingly expensive) ones I'd like to see all the _hard_ target macros converted into hooks. If there are only things like BITS_PER_UNIT left we can talk again.

Richard.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 09:37, Richard Guenther wrote:
> Well. Long term. Hookizing constants is easy - before proceeding
> with those (seemingly expensive) ones I'd like to see all the _hard_
> target macros converted into hooks. If there are only things like
> BITS_PER_UNIT left we can talk again.

Sure. I mostly wanted to check whether my long term view was compatible with yours.

Diego.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wednesday 24 November 2010 13:45:40, Joern Rennecke wrote:
> Quoting Pedro Alves:
>
>> On Tuesday 23 November 2010 20:09:52, Joern Rennecke wrote:
>>> If we changed BITS_PER_UNIT into an ordinary piece-of-data 'hook', this
>>> would not only cost a data load from the target vector, but would also
>>> inhibit optimizations that replace division / modulo / multiply with shift
>>> or mask operations.
>>
>> Have you done any sort of measurement, to see if what is lost
>> is actually noticeable in practice?
>
> No, I haven't.
> On an i686 it's probably not measurable. On a host with a slow software
> divide it might be, if the code paths that require these operations are
> exercised a lot - that would also depend on the source code being compiled.

And I imagine that it should be possible to factor many of the slow divides out of hot loops, if the compiler doesn't manage to do that already.

> Also, these separate hooks for common operations can make the code more
> readable, particularly in the bits_in_units_ceil case. I.e.:
>
>   foo_var = ((bitsize + targetm.bits_per_unit () - 1)
>              / targetm.bits_per_unit ());
>
> vs.
>
>   foo_var = targetm.bits_in_units_ceil (bitsize);

bits_in_units_ceil could well be a macro or helper function implemented on top of targetm.bits_per_unit (which itself could be a data field instead of a function call), that only accessed bits_per_unit once. It could even be implemented as a helper macro / function today, on top of BITS_PER_UNIT.

Making design decisions like this based on supposedly missed optimizations _alone_, without knowing how much overhead we're talking about, is really the wrong way to do things.

-- Pedro Alves
Possible GCC bug.
I think I may have hit a bug where an explicitly-defaulted copy-assignment operator can't copy an array of a subobject type that has a user-defined copy-assignment operator. I can't see any hits searching for "invalid array assignment" on the bug repository. I messed up submitting my last bug, so I thought I'd ask here first for confirmation.

§12.8.28 states:
"A copy/move assignment operator that is defaulted and not defined as deleted is implicitly defined when [...] or when it is explicitly defaulted after its first declaration."

§12.8.30 (implicitly-defined copy assignment) states:
"The implicitly-defined copy assignment operator for a non-union class X performs memberwise copy assignment of its subobjects [...] Each subobject is assigned in the manner appropriate to its type: [...]
-- if the subobject is an array, each element is assigned, in the manner appropriate to the element type;"

I'm assuming that "the manner appropriate to the element type" means use copy assignment. At least, that's what seems to happen if the main object's copy-assignment operator is implicitly defined.

Yet the above doesn't seem able to compile if:
- The main object contains an array of the subobject.
- The main object's copy-assignment operator IS explicitly defaulted (§12.8.28).
- The subobject's copy-assignment operator isn't implicitly or default defined.

TEST SOURCE (attached):
1) I created the most trivial type (named SFoo) that contains a non-default copy-assignment operator.
2) I created the most trivial type (named SBar) that contains:
   - an array of SFoo.
   - an explicitly defaulted copy-assignment operator.
3) I created a function that:
   - creates two copies of SBar.
   - assigns one copy to the other.

TEST:
I compiled using the -std=c++0x option. GCC refuses to compile (11:8: error: invalid array assignment).
- If I remove the explicit defaulting of SBar's copy-assignment, it works.
- If I default SFoo's copy-assignment, it works.
SPECS:
GCC: 4.6.0 20101106 (experimental) (GCC)
- Using Pedro Lamarão's delegating constructors patch:
  http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00620.html
  (I can't see this having any effect here).
TARGET: x86_64-unknown-linux-gnu
SYSTEM: Core2Duo(64), Ubuntu(64) 10.4.

TL/DR: (§12.8.28) & (§12.8.30) seem to say the attached code should compile. It doesn't.

struct SFoo
{
    SFoo& operator = (SFoo const&) { return *this; } // <--(1) FAILS.
    // =default;                                     // <--(2) WORKS.
    //void operator = (SFoo const&) {}               // <--(3) ALSO FAILS.
};

struct SBar
{
    SBar& operator = (SBar const&) =default; // <--(4): WORKS if removed.
    SFoo M_data[1];
};

int main()
{
    SBar x;
    SBar y;
    y = x;
}
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, 24 Nov 2010, Richard Guenther wrote:
> Well. Long term. Hookizing constants is easy - before proceeding
> with those (seemingly expensive) ones I'd like to see all the _hard_
> target macros converted into hooks. If there are only things like
> BITS_PER_UNIT left we can talk again.

I think doing easy ones first is natural - the hard ones are those affecting enum values, #if conditionals etc. (which includes a lot of constants), and if you convert the easy ones you can then see what's left.

I think good priorities for moving away from target macros include:

* Anything in code built for the target (use predefined macros, built-in functions or, if appropriate, macros defined in headers under libgcc/config/).

* Anything that may expand to a function call for some targets and so requires tm_p.h to be included.

* Anything clearly not performance-critical - for example, things used only at startup.

But Joern has a different set of priorities:

* Anything used in front ends.

* Anything used in tree optimizers.

And that's also fine. What's important is:

* Do the conversion rather than spending ages talking about it.

* *Think* about the appropriate conversion for a macro or set of macros rather than blindly mirroring the macro semantics in a hook. See for example my recent elimination of HANDLE_SYSV_PRAGMA and HANDLE_PRAGMA_PACK_PUSH_POP by enabling features unconditionally - not everything that is presently configurable by a target necessarily has a good reason for being configurable by a target, and sometimes the existing set of macros may not be a good way of describing what actually does need to be configured in a particular area. Or, in the BITS_PER_UNIT case, making sure to use TYPE_PRECISION (char_type_node) where that seems more appropriate.

-- Joseph S. Myers
jos...@codesourcery.com
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 02:48:01PM +, Pedro Alves wrote:
> On Wednesday 24 November 2010 13:45:40, Joern Rennecke wrote:
>> Quoting Pedro Alves:
>> Also, these separate hooks for common operations can make the code more
>> readable, particularly in the bits_in_units_ceil case. I.e.:
>>
>>   foo_var = ((bitsize + targetm.bits_per_unit () - 1)
>>              / targetm.bits_per_unit ());
>>
>> vs.
>>
>>   foo_var = targetm.bits_in_units_ceil (bitsize);
>
> bits_in_units_ceil could well be a macro or helper function
> implemented on top of targetm.bits_per_unit (which itself could
> be a data field instead of a function call), that only accessed
> bits_per_unit once. It could even be implemented as a helper
> macro / function today, on top of BITS_PER_UNIT.

I think adding the functions as inline functions somewhere and using them in the appropriate places would be a reasonable standalone cleanup. It'd be easy to move towards something more general later.

Writing:

  int bits = ...;
  ... (X + bits - 1) / bits;

also generates ever-so-slightly smaller code than:

  ... (X + BITS_PER_UNIT - 1) / BITS_PER_UNIT;

on targets where BITS_PER_UNIT is not constant.

I personally am not a fan of the X_in_Y naming, though; I think X_to_Y is a little clearer.

-Nathan
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Richard Guenther:

> On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke wrote:
>> I'm fine with the RTL optimizers using target macros, but I'd like the
>> frontends and tree optimizers to cease to use tm.h. That means
>> all macro uses there have to be converted. That does not necessarily
>> involve target port code - a wrapper hook could be provided in targhooks.c
>> that uses the target macro.
>
> I don't see why RTL optimizers should be different from tree optimizers.

RTL optimizers tend to have a lot of target dependencies; hookizing them all is likely impractical, and also likely to have a performance impact.

Also, by making the tree optimizers target independent, you can make optimizations that consider more than one target. Because RTL optimizers work on highly target-dependent program representations, the decision on what target's code to work on has already been fixed by the time the RTL optimizers run.

> And we don't want to pay the overhead of hookization every target
> dependent constant just for the odd guys who want multi-target
> compilers that have those constants differing.

As compared to... having a multi-year unfinished hookization process that hasn't provided any new functionality yet.

I don't think hookizing the frontends and tree optimizers will have a noticeable performance impact. And if you must have the absolute fastest compiler, LTO should eventually be able to inline the hooks if they are really only returning a constant.

With regards to BITS_PER_UNIT, the issue is not so much that I really need it hookized for a multi-target compiler - ultimately there have to be consistent structure layout rules for an input program.
The issue is that BITS_PER_UNIT is defined in tm.h, and if every file that wants to know BITS_PER_UNIT includes tm.h for that purpose, we'll continue to have hard-to-predict interactions between target, middle-end and front-end headers in the frontends and tree optimizers, and other macros can creep in unnoticed which work on one target but not on some other target.

Hookizing and poisoning individual macros is only patchwork, and it can actually incur higher performance penalties when you hookize the macro even in files that are tightly coupled with the target definitions - as many RTL optimizers are.

The only watertight way to make sure frontends do not use macros from tm.h is for them not to include that header, and neither should any of the headers they need include that header; make this a written policy, and poison TM_H for IN_GCC_FRONTEND.

> Well. Long term. Hookizing constants is easy - before proceeding
> with those (seemingly expensive) ones I'd like to see all the _hard_
> target macros converted into hooks. If there are only things like
> BITS_PER_UNIT left we can talk again.

The hard parts certainly include target.h and function.h. But these are necessary to get a proper overview of the actual problem, and to stop us from sliding back. When I fix these, a number of files suddenly become exposed as using tm.h without including it themselves. Should I change all these files to explicitly include "tm.h" then, even if it's only for BITS_PER_UNIT?
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 4:22 PM, Joern Rennecke wrote:
> Quoting Richard Guenther:
>
>> I don't see why RTL optimizers should be different from tree optimizers.
>
> RTL optimizers tend to have a lot of target dependencies; hookizing them
> all is likely impractical, and also likely to have a performance impact.
>
> Also, by making the tree optimizers target independent, you can make
> optimizations that consider more than one target.
>
> Because RTL optimizers work on highly target-dependent program
> representations, the decision on what target's code to work on has already
> been fixed by the time the RTL optimizers run.

As we are moving towards doing more target-dependent optimizations on the tree level, this doesn't sound like a sustainable opinion. GIMPLE is just a representation - whether it is target dependent or not isn't related to whether it is GIMPLE or RTL.

>> And we don't want to pay the overhead of hookization every target
>> dependent constant just for the odd guys who want multi-target
>> compilers that have those constants differing.
>
> As compared to... having a multi-year unfinished hookization process that
> hasn't provided any new functionality yet.

And hookizing BITS_PER_UNIT brings you closer exactly how much? Tackle the hard ones. Because if you can't solve those you won't succeed ever, and there's no reason to pay the price for BITS_PER_UNIT then.

> I don't think hookizing the frontends and tree optimizers will have a
> noticeable performance impact.
> And if you must have the absolute fastest compiler, LTO should eventually be
> able to inline the hooks if they are really only returning a constant.

Not for a multi-target compiler where the hooks are in shared loadable modules like Diego envisions. Maybe we should at least have a way to specify that indirect function calls are 'const' or 'pure'; I don't know if that works right now, but I doubt it (decl vs. type attributes, etc.).

> With regards to BITS_PER_UNIT, the issue is not so much that I really need
> it hookized for a multi-target compiler - ultimately there have to be
> consistent structure layout rules for an input program.
>
> The issue is that BITS_PER_UNIT is defined in tm.h, and if every
> file that wants to know BITS_PER_UNIT includes tm.h for that purpose,
> we'll continue to have hard-to-predict interactions between target,
> middle-end and front-end headers in the frontends and tree optimizers,
> and other macros can creep in unnoticed which work on one target but
> not on some other target.
>
> Hookizing and poisoning individual macros is only patchwork, and it
> can actually incur higher performance penalties when you hookize
> the macro even in files that are tightly coupled with the target
> definitions - as many RTL optimizers are.
>
> The only watertight way to make sure frontends do not use macros from tm.h
> is for them not to include that header, and neither should any of the
> headers they need include that header; make this a written policy, and
> poison TM_H for IN_GCC_FRONTEND.

Well, it was already said that maybe the FEs should use the type precision of char_type_node. I don't know if splitting tm.h into good-for-tree and not-good-for-tree is the way to go, but it's certainly a possibility if your short-term goal is to avoid accidental use of target information.

Richard.
Bootstrap failures on Solaris at gcc/toplev.c stage2 compilation
Hi.

This morning's build attempts on both i386-pc-solaris2.10 and sparc-sun-solaris2.10 failed with the following error:

/export/home/arth/gnu/gcc-1124/./prev-gcc/xgcc -B/export/home/arth/gnu/gcc-1124/./prev-gcc/ -B/export/home/arth/local/i386-pc-solaris2.10/bin/ -B/export/home/arth/local/i386-pc-solaris2.10/bin/ -B/export/home/arth/local/i386-pc-solaris2.10/lib/ -isystem /export/home/arth/local/i386-pc-solaris2.10/include -isystem /export/home/arth/local/i386-pc-solaris2.10/sys-include -c -g -O2 -gtoggle -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition -Wc++-compat -DHAVE_CONFIG_H -I. -I. -I/home/ahaas/gnu/gcc.git/gcc -I/home/ahaas/gnu/gcc.git/gcc/. -I/home/ahaas/gnu/gcc.git/gcc/../include -I/home/ahaas/gnu/gcc.git/gcc/../libcpp/include -I/export/home/arth/local/include -I/export/home/arth/local/include -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber/dpd -I../libdecnumber/! home/ahaas/gnu/gcc.git/gcc/tree-call-cdce.c -o tree-call-cdce.o

/home/ahaas/gnu/gcc.git/gcc/toplev.c: In function 'crash_signal':
/home/ahaas/gnu/gcc.git/gcc/toplev.c:445:3: error: implicit declaration of function 'signal' [-Werror=implicit-function-declaration]
cc1: all warnings being treated as errors

The likely cause is this patch applied yesterday:

2010-11-23  Joseph Myers

	{ ...snip ... }
	* toplev.c: Don't include <signal.h> or <sys/resource.h>.
	(setup_core_dumping, strip_off_ending, decode_d_option): Move to
	opts.c.

With the issues involving libquadmath I'm not expecting the i386 build to succeed even once this snag is resolved, but my sparc builds have been working, and I had a successful build yesterday morning:

$ gcc -v
Using built-in specs. 
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/export/home/arth/local/libexec/gcc/sparc-sun-solaris2.10/4.6.0/lto-wrapper
Target: sparc-sun-solaris2.10
Configured with: /export/home/arth/src/gcc.git/configure --prefix=/export/home/arth/local --enable-languages=c,c++,objc --disable-nls --with-gmp=/export/home/arth/local --with-mpfr=/export/home/arth/local --with-mpc=/export/home/arth/local --enable-checking=release --enable-threads --with-gnu-as --with-as=/export/home/arth/local/bin/as --with-gnu-ld --with-ld=/export/home/arth/local/bin/ld --enable-libstdcxx-pch=no --with-cpu=ultrasparc3 --with-tune=ultrasparc3
Thread model: posix
gcc version 4.6.0 20101123 (experimental) [master revision 66b86a7:d759d44:e71ec76db59f8a20d013503e7192680f92872796] (GCC)

Art Haas
Re: Bootstrap failures on Solaris at gcc/toplev.c stage2 compilation
On Wed, 24 Nov 2010, Art Haas wrote:

> This morning's build attempts on both i386-pc-solaris2.10 and
> sparc-sun-solaris2.10 failed with the following error:
>
> /export/home/arth/gnu/gcc-1124/./prev-gcc/xgcc -B/export/home/arth/gnu/gcc-1124/./prev-gcc/ -B/export/home/arth/local/i386-pc-solaris2.10/bin/ -B/export/home/arth/local/i386-pc-solaris2.10/bin/ -B/export/home/arth/local/i386-pc-solaris2.10/lib/ -isystem /export/home/arth/local/i386-pc-solaris2.10/include -isystem /export/home/arth/local/i386-pc-solaris2.10/sys-include -c -g -O2 -gtoggle -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -Wold-style-definition -Wc++-compat -DHAVE_CONFIG_H -I. -I. -I/home/ahaas/gnu/gcc.git/gcc -I/home/ahaas/gnu/gcc.git/gcc/. -I/home/ahaas/gnu/gcc.git/gcc/../include -I/home/ahaas/gnu/gcc.git/gcc/../libcpp/include -I/export/home/arth/local/include -I/export/home/arth/local/include -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber -I/home/ahaas/gnu/gcc.git/gcc/../libdecnumber/dpd -I../libdecnumber/! home/ahaas/gnu/gcc.git/gcc/tree-call-cdce.c -o tree-call-cdce.o
>
> /home/ahaas/gnu/gcc.git/gcc/toplev.c: In function 'crash_signal':
> /home/ahaas/gnu/gcc.git/gcc/toplev.c:445:3: error: implicit declaration of function 'signal' [-Werror=implicit-function-declaration]
> cc1: all warnings being treated as errors
>
> The likely cause is this patch applied yesterday:
>
> 2010-11-23  Joseph Myers
> 	{ ...snip ... }
> 	* toplev.c: Don't include <signal.h> or <sys/resource.h>.
> 	(setup_core_dumping, strip_off_ending, decode_d_option): Move to
> 	opts.c.

I've committed this patch as obvious to fix this. (With glibc, <signal.h> ends up being included indirectly, as POSIX permits but does not require, which explains why I didn't see this in my testing.) 
I do wonder if it really makes sense for includes to go in individual source files or whether it would be better to put more headers in system.h. There may be cases where including a system header means you need to link in extra libraries - in all programs, not just the compilers proper - if it has inline functions (gmp.h and mpfr.h might be like that). But otherwise I think more host-side code should avoid including more system headers itself. Particular headers in point: . There are also several cases of host-side code including headers already included in system.h.

Index: toplev.c
===================================================================
--- toplev.c	(revision 167122)
+++ toplev.c	(working copy)
@@ -28,6 +28,7 @@
 #include "system.h"
 #include "coretypes.h"
 #include "tm.h"
+#include <signal.h>
 #ifdef HAVE_SYS_TIMES_H
 # include <sys/times.h>

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 167122)
+++ ChangeLog	(working copy)
@@ -1,3 +1,7 @@
+2010-11-24  Joseph Myers
+
+	* toplev.c: Include <signal.h>.
+
 2010-11-24  Richard Guenther

	PR lto/43218

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: Method to test all sse2 calls?
Ian Lance Taylor wrote:

> Tests that directly invoke __builtin functions are not appropriate for
> your replacement for emmintrin.h.

Clearly. However, I do not see why these are in the test routines in the first place. They seem not to be needed. I made the changes below my signature, eliminating all of the vector builtins, and the programs still worked with both -msse2 and -mno-sse2 plus my software SSE2. If anything, the test programs are much easier to understand without the builtins.

There is also a (big) problem with sse2-vec-2.c (and -2a, which is empty other than an #include of sse2-vec-2.c). There are no explicit sse2 operations within this test program. Moreover, the code within the tests does not work. Finally, if one puts a print statement anywhere in the test and compiles it with:

gcc -msse -msse2

there will be no warnings, and the run will appear to show a valid test, but in actuality the test will never execute! This shows part of the problem:

gcc -Wall -msse -msse2 -o foo sse2-vec-2.c
sse-os-support.h:27: warning: 'sse_os_support' defined but not used
sse2-check.h:10: warning: 'do_test' defined but not used

(also for -m64) There must be some sort of main in there, but no test; it does nothing and returns a valid status. When stuffed with debug statements:

for (i = 0; i < 2; i++)
  masks[i] = i;

printf("DEBUG res[0] %llX\n",res[0]);
printf("DEBUG res[1] %llX\n",res[1]);
printf("DEBUG val1.ll[0] %llX\n",val1.ll[0]);
printf("DEBUG val1.ll[1] %llX\n",val1.ll[1]);
for (i = 0; i < 2; i++)
  if (res[i] != val1.ll [masks[i]]){
    printf("DEBUG i %d\n",i);
    printf("DEBUG masks[i] %d\n",masks[i]);
    printf("DEBUG val1.ll [masks[i]] %llX\n", val1.ll [masks[i]]);
    abort ();
  }

and compiled with my software SSE2:

gcc -Wall -msse -mno-sse2 -I. -O0 -m32 -lm -DSOFT_SSE2 -DEMMSOFTDBG -o foo sse2-vec-2.c

It emits:

DEBUG res[0] 3020100
DEBUG res[1] 7060504
DEBUG val1.ll[0] 706050403020100
DEBUG val1.ll[1] F0E0D0C0B0A0908
DEBUG i 0
DEBUG masks[i] 0
DEBUG val1.ll [masks[i]] 706050403020100
Aborted

True enough, 3020100 != 706050403020100, but what kind of test is that???

Regards,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

Changes to the sse2-vec-*.c routines to eliminate all of the __builtin calls:

ls -1 sse2-vec*dist | grep -v vec-2 | extract -cols 'diff --context=0 [1,-6] [1,]' | execinput

*** sse2-vec-1.c	2010-11-24 09:06:46.0 -0800
--- sse2-vec-1.c.dist	2010-11-24 09:06:39.0 -0800
***************
*** 27,28 ****
! res[0] = val1.d[msk0];
! res[1] = val1.d[msk1];
--- 27,28 ----
! res[0] = __builtin_ia32_vec_ext_v2df ((__v2df)val1.x, msk0);
! res[1] = __builtin_ia32_vec_ext_v2df ((__v2df)val1.x, msk1);

*** sse2-vec-3.c	2010-11-24 09:09:13.0 -0800
--- sse2-vec-3.c.dist	2010-11-24 09:07:48.0 -0800
***************
*** 27,30 ****
! res[0] = val1.i[0];
! res[1] = val1.i[1];
! res[2] = val1.i[2];
! res[3] = val1.i[3];
--- 27,30 ----
! res[0] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 0);
! res[1] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 1);
! res[2] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 2);
! res[3] = __builtin_ia32_vec_ext_v4si ((__v4si)val1.x, 3);

*** sse2-vec-4.c	2010-11-24 09:10:00.0 -0800
--- sse2-vec-4.c.dist	2010-11-24 09:07:48.0 -0800
***************
*** 27,34 ****
! res[0] = val1.s[0];
! res[1] = val1.s[1];
! res[2] = val1.s[2];
! res[3] = val1.s[3];
! res[4] = val1.s[4];
! res[5] = val1.s[5];
! res[6] = val1.s[6];
! res[7] = val1.s[7];
--- 27,34 ----
! res[0] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 0);
! res[1] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 1);
! res[2] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 2);
! res[3] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 3);
! res[4] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 4);
! res[5] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 5);
! res[6] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 6);
! res[7] = __builtin_ia32_vec_ext_v8hi ((__v8hi)val1.x, 7);

*** sse2-vec-5.c	2010-11-24 09:11:09.0 -0800
--- sse2-vec-5.c.dist	2010-11-24 09:07:48.0 -0800
***************
*** 27,42 ****
! res[0] = val1.c[0];
! res[1] = val1.c[1];
! res[2] = val1.c[2];
! res[3] = val1.c[3];
! res[4] = val1.c[4];
! res[5] = val1.c[5];
! res[6] = val1.c[6];
! res[7] = val1.c[7];
! res[8] = val1.c[8];
! res[9] = val1.c[9];
! res[10] = val1.c[10];
! res[11] = val1.c[11];
! res[12] = val1.c[12];
! res[13] = val1.c[13];
! res[14] = val1.c[14];
! res[15] = val1.c[15];
--- 27,42 ----
! res[0] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 0);
! res[1] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 1);
! res[2] = __builtin_ia32_vec_ext_v16qi ((__v16qi)val1.x, 2);
! res[3] = __builtin_ia32_vec_ext_v16qi
Re: Method to test all sse2 calls?
"David Mathog" writes: > Ian Lance Taylor , wrote: > >> Tests that directly invoke __builtin functions are not appropriate for >> your replacement for emmintrin.h. > > Clearly. However, I do not see why these are in the test routines in > the first place. They seem not to be needed. I made the changes below > my signature, eliminating all of the vector builtins, and the programs > still worked with both -msse2 and -mno-sse2 plus my software SSE2. If > anything the test programs are much easier to understand without the > builtins. Your changes are relying on a gcc extension which was only recently added, more recently than those tests were added to the testsuite. Only recently did gcc acquire the ability to use [] to access elements in a vector. I agree that your changes look good, although we rarely change existing tests unless there is a very good reason. Avoiding __builtin functions in the gcc testsuite is not in itself a good reason. These tests were written for gcc; they were not written as general purpose SSE tests. > There is also a (big) problem with sse2-vec-2.c (and -2a, which is empty > other than an #include sse2-vec-2.c). There are no explicit sse2 > operations within this test program. Moreover, the code within the > tests does not work. Finally, if one puts a print statement anywhere in > the test that is there, compiles it with: > > gcc -msse -msse2 > > there will be no warnings, and the run will appear to show a valid test, > but in actuality the test will never execute! This shows part of the > problem: > > gcc -Wall -msse -msse2 -o foo sse2-vec-2.c > sse-os-support.h:27: warning: 'sse_os_support' defined but not used > sse2-check.h:10: warning: 'do_test' defined but not used > > (also for -m64) There must be some sort of main in there, but no test, > it does nothing and returns a valid status. The main function is in sse2-check.h. As you can see in that file, the test is only run if the CPU includes SSE2 support. 
That is fine for gcc's purposes, but I can see that it is problematic for yours. > When stuffed with debug statements: > > for (i = 0; i < 2; i++) > masks[i] = i; > > printf("DEBUG res[0] %llX\n",res[0]); > printf("DEBUG res[1] %llX\n",res[1]); > printf("DEBUG val1.ll[0] %llX\n",val1.ll[0]); > printf("DEBUG val1.ll[1] %llX\n",val1.ll[1]); > for (i = 0; i < 2; i++) > if (res[i] != val1.ll [masks[i]]){ > printf("DEBUG i %d\n",i); > printf("DEBUG masks[i] %d\n",masks[i]); > printf("DEBUG val1.ll [masks[i]] %llX\n", val1.ll [masks[i]]); > abort (); > } > > and compiled with my software SSE2 > > gcc -Wall -msse -mno-sse2 -I. -O0 -m32 -lm -DSOFT_SSE2 -DEMMSOFTDBG -o > foo sse2-vec-2.c > > It emits: > > DEBUG res[0] 3020100 > DEBUG res[1] 7060504 > DEBUG val1.ll[0] 706050403020100 > DEBUG val1.ll[1] F0E0D0C0B0A0908 > DEBUG i 0 > DEBUG masks[i] 0 > DEBUG val1.ll [masks[i]] 706050403020100 > Aborted > > True enough 3020100 != 706050403020100, but what kind of test > is that??? When I run the unmodified test on my system, which has SSE2 support in hardware, I see that res[0] == 0x706050403020100 res[1] == 0xf0e0d0c0b0a0908 So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di builtin function. That function treats the vector as containing two 8-byte integers, and pulls out one or the other depending on the second argument. Your dumps of res[0] and res[1] suggest that you are treating the vector as four 4-byte integers and pulling out specific ones. Ian
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 4:32 PM, Richard Guenther wrote: > On Wed, Nov 24, 2010 at 4:22 PM, Joern Rennecke wrote: >> Quoting Richard Guenther : >> >>> On Wed, Nov 24, 2010 at 3:12 PM, Joern Rennecke >>> wrote: I'm fine with the RTL optimizers using target macros, but I'd like the frontends and tree optimizers to cease to use tm.h. That means all macro uses there have to be converted. That does not necessarily involve target port code - a wrapper hook could be provided in targhooks.c that uses the target macro. >>> >>> I don't see why RTL optimizers should be different from tree optimizers. >> >> RTL optimizers tend to have a lot of target dependencies; hookizing them >> all is likely impractical, and also likely to have a performance impact. >> >> Also, by making the tree optimizers target independent, you can make >> optimizations that consider more than one target. >> >> Because RTL optimizers work on highly target-dependent program >> representations, the decision on what target's code to work on has already >> been fixed by the time the RTL optimizers run. > > As we are moving towards doing more target dependent optimizations > on the tree level this doesn't sound like a sustainable opinion. GIMPLE > is just a representation - whether it is target dependent or not isn't > related to whether it is GIMPLE or RTL. > >>> And we don't want to pay the overhead of hookizing every target >>> dependent constant just for the odd guys who want multi-target >>> compilers that have those constants differing. >> >> As compared to... having a multi-year unfinished hookization process that >> hasn't provided any new functionality yet. > > And hookizing BITS_PER_UNIT brings you closer exactly how much? > > Tackle the hard ones. Because if you can't solve those you won't > succeed ever and there's no reason to pay the price for BITS_PER_UNIT > then. Btw, I don't remember what your reason was for hookization, Joern. 
But I can't see why things like BITS_PER_UNIT cannot be part of the ABI/API between the middle-end and a possible target shared object. If the goal is to emit code for different targets from a single compilation (thus basically making the IL re-targetable), then hookization of BITS_PER_UNIT brings you exactly nothing, as values derived from it are stored all over the IL, so you'd need to fix up all types and decls and possibly re-layout things at the time you switch to a different target. So, Joern, maybe you can clarify what the benefit is in hookizing BITS_PER_UNIT? Thanks, Richard.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Richard Guenther : So, Joern, maybe you can clarify what the benefit is in hookizing BITS_PER_UNIT? The point is that I want to eliminate all tm.h macro uses from the tree optimizer and frontend files, so that they can stop including tm.h. When I first tried putting all the patches to eliminate tm.h includes from target.h, function.h and gimple.h together, a missing definition of BITS_PER_UNIT was the first problem that popped up. Also, even if the count of files where BITS_PER_UNIT is the only tm.h macro is low right now, if we don't have a strategy for how to deal with it, it'll remain the last macro standing and block all efforts to get rid of the tm.h includes. With our current supported target set, we can actually define BITS_PER_UNIT as constant 8 in system.h - that'd get it out of the way. If we actually get different BITS_PER_UNIT values again, we can generate a header file to define the appropriate value, but with our current target set there would be little point and little test coverage for doing that.
Re: Method to test all sse2 calls?
Ian Lance Taylor wrote:

> Your changes are relying on a gcc extension which was only recently
> added, more recently than those tests were added to the testsuite. Only
> recently did gcc acquire the ability to use [] to access elements in a
> vector.

That isn't what my changes did. The array accesses are to the arrays in the union - nothing cutting edge there. The data is accessed through the array specified by .d (or .s etc.), not through name.x[index].

> So I think you may have misinterpreted the __builtin_ia32_vec_ext_v2di
> builtin function. That function treats the vector as containing two
> 8-byte integers, and pulls out one or the other depending on the second
> argument. Your dumps of res[0] and res[1] suggest that you are treating
> the vector as four 4-byte integers and pulling out specific ones.

Yup, my bad, put in d where it should have been ll. Also fixed the problem I induced in sse2-check.h, where too large a chunk was commented out; that was causing the gcc -Wall -msse2 problem. The changed part in the original source was:

if ((edx & bit_SSE2) && sse_os_support ())

and is now:

#if !defined(SOFT_SSE2)
  if ((edx & bit_SSE2) && sse_os_support ())
#else
  if (sse_os_support ())
#endif /*SOFT_SSE2*/

My software SSE2 passes all 165 of the sse2 tests that are complete programs. However, there is a problem in the real world. While the sse2 programs in the testsuite do exercise the _mm* functions, they do so one at a time. I have found that in real code, which makes multiple _mm* calls, if -O0 is not used, the wrong results (may) come out.

% gcc -std=gnu99 -g -pg -pthread -O0 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
% ./msvfilter_utest

(no output, it ran correctly)

% gcc -std=gnu99 -g -pg -pthread -O1 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
% ./msvfilter_utest
msv filter unit test failed: scores differ (-50.37, -10.86)

Going to higher optimization levels brings even bigger issues, like not compiling at all (even with gcc 4.4.1):

% gcc -std=gnu99 -g -pg -pthread -O2 -msse -mno-sse2 -DSOFT_SSE2 -m32 -g -pg -DHAVE_CONFIG_H -L../../easel -L.. -L. -I../../easel -I../../easel -I. -I.. -I. -I../../src -Dp7MSVFILTER_TESTDRIVE -o msvfilter_utest ./msvfilter.c -Wl,--start-group -lhmmer -lhmmerimpl -Wall -Wl,--end-group -leasel -lm
../../easel/emmintrin.h:2178: warning: dereferencing pointer '({anonymous})' does break strict-aliasing rules
../../easel/emmintrin.h:2178: note: initialized from here
.
.
(same sort of message many many times)
.
./msvfilter.c:208: error: unable to find a register to spill in class 'GENERAL_REGS'
./msvfilter.c:208: error: this is the insn:
(insn 1944 1943 1945 46 ../../easel/emmintrin.h:2348 (set (strict_low_part (subreg:HI (reg:TI 1239) 0))
        (mem:HI (reg/f:SI 96 [ pretmp.1031 ]) [13 S2 A16])) 47 {*movstricthi_1} (nil))
./msvfilter.c:208: confused by earlier errors, bailing out

Would changing the use of inlined functions to defines let the compiler digest it better? For instance:

static __inline __m128i __attribute__((__always_inline__))
_mm_andnot_si128 (__m128i __A, __m128i __B)
{
  return (~__A) & __B;
}

becomes:

#define _mm_andnot_si128(A,B) (~A & B)

That approach will get really messy for the more complicated _mm*. 
In general terms, can somebody give me a hint as to the sorts of things that if found in inlined functions might cause the compiler to optimize to invalid code? Thanks, David Mathog mat...@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Nov 24, 2010, at 4:04 PM, Joern Rennecke wrote: > Quoting Richard Guenther : > >> So, Joern, maybe you can clarify what the benefit is in hookizing >> BITS_PER_UNIT? > > The point is that I want to eliminate all tm.h macro uses from the > tree optimizer and frontend files, so that they can stop including > tm.h. When I first tried putting all the patches to eliminate tm.h includes > from target.h, function.h and gimple.h together, a missing definition of > BITS_PER_UNIT was the first problem that popped up. Also, even if the > count of files where BITS_PER_UNIT is the only tm.h macro is low right now, > if we don't have a strategy for how to deal with it, it'll remain the last > macro standing and block all efforts to get rid of the tm.h includes. > With our current supported target set, we can actually > define BITS_PER_UNIT as constant 8 in system.h - that'd get it out > of the way. > If we actually get different BITS_PER_UNIT values again, > we can generate a header file to define the appropriate value, but > with our current target set there would be little point and little test > coverage for doing that. If BITS_PER_UNIT is all that's left, could you use some genxxx.c to extract that from tm.h and drop it into a tm-bits.h in the build directory? Then you could include that one instead of tm.h. paul
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Paul Koning : If BITS_PER_UNIT is all that's left, could you use some genxxx.c to extract that from tm.h and drop it into a tm-bits.h in the build directory? Then you could include that one instead of tm.h. Yes, that's what I said. Only there is little point in writing the generator program right now if all it ever does is spit out #define BITS_PER_UNIT 8 We can add the generator program when we (re-) add a word addressed target, or add a bit addressed one.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On Wed, Nov 24, 2010 at 10:04 PM, Joern Rennecke wrote: > Quoting Richard Guenther : > >> So, Joern, maybe you can clarify what the benefit is in hookizing >> BITS_PER_UNIT? > > The point is that I want to eliminate all tm.h macro uses from the > tree optimizer and frontend files, so that they can stop including > tm.h. When I first tried putting all the patches to eliminate tm.h includes > from target.h, function.h and gimple.h together, a missing definition of > BITS_PER_UNIT was the first problem that popped up. Also, even if the > count of files where BITS_PER_UNIT is the only tm.h macro is low right now, > if we don't have a strategy for how to deal with it, it'll remain the last > macro standing and block all efforts to get rid of the tm.h includes. > With our current supported target set, we can actually > define BITS_PER_UNIT as constant 8 in system.h - that'd get it out > of the way. > If we actually get different BITS_PER_UNIT values again, > we can generate a header file to define the appropriate value, but > with our current target set there would be little point and little test > coverage for doing that. What's the benefit of not including tm.h in the tree optimizers and frontend files to our users? Richard.
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
Quoting Richard Guenther : What's the benefit of not including tm.h in the tree optimizers and frontend files to our users? We should see less instability of frontends and tree optimizers for less-often tested targets. This can prevent release cycles from getting longer, and/or allow more work to be accomplished in a release cycle. These files should compile the same for different target configurations (assuming we don't have a BITS_PER_UNIT discrepancy). With tm.h included, you can never quite tell what's going on (unless you want to analyze every source file for every target configuration - that can be done in finite time, but not necessarily before the code is obsolete). So you should be able to build the frontend once and use it in multiple compilers, e.g. a native one and a cross-compiler to a netbook which uses a different processor. More importantly, CPU-GPU programming is certainly coming, and a multi-target compiler should eventually provide a tool to use such a heterogeneous system without having to do all the partitioning and interworking by hand.
Re: Method to test all sse2 calls?
"David Mathog" writes: > Ian Lance Taylor wrote: > >> Your changes are relying on a gcc extension which was only recently >> added, more recently than those tests were added to the testsuite. Only >> recently did gcc acquire the ability to use [] to access elements in a >> vector. > > That isn't what my changes did. The array accesses are to the arrays in > the union - nothing cutting edge there. The data is accessed through > the array specified by .d (or .s etc.) not to name.x[index]. Oh, sorry, completely misunderstood. In that case, it seems to me that your changes are causing the tests to no longer test what they should: the code generation resulting from the specific gcc builtins, now available as a gcc extension. > Would changing the use of inlined functions to defines let the compiler > digest it better? For instance: > > static __inline __m128i __attribute__((__always_inline__)) > _mm_andnot_si128 (__m128i __A, __m128i __B) > { > return (~__A) & __B; > } > > becomes > > #define _mm_andnot_si128(A,B) (~A & B) > > That approach will get really messy for the more complicated _mm*. I can't think of any reason why that would help. > In general terms, can somebody give me a hint as to the sorts of things > that if found in inlined functions might cause the compiler to optimize > to invalid code? The usual issue is invalid aliasing; see the docs for the -fstrict-aliasing option. I don't know if that is the problem here. Ian
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On 24/11/2010 14:17, Richard Guenther wrote: > I don't see why RTL optimizers should be different from tree optimizers. I thought half the point of tree-ssa in the first place was to separate optimisation out from target-specific stuff and do it on an independent level? On 24/11/2010 15:32, Richard Guenther wrote: > As we are moving towards doing more target dependent optimizations > on the tree level this doesn't sound like a sustainable opinion. Wait, we're doing that? Isn't that the same mistake we made earlier? On 24/11/2010 14:17, Richard Guenther wrote: > And we don't want to pay the overhead of hookizing every target > dependent constant just for the odd guys who want multi-target > compilers that have those constants differing. Why not? Precisely how big is this cost? Back in the old days we all used to want to avoid virtual functions, because of the cost of a function call through a pointer, but that certainly isn't justified any more and may not even have been then. > a multi-target compiler where the hooks are in shared loadable > modules It's not just Diego who envisions that; I think it would be an excellent long-term goal too. And I thought that was what motivated all the work to hookize macros in the first place. cheers, DaveK
Re: RFD: hookizing BITS_PER_UNIT in tree optimizers / frontends
On 24/11/2010 21:31, Joern Rennecke wrote: > Quoting Paul Koning : > >> If BITS_PER_UNIT is all that's left, could you use some genxxx.c to >> extract that from tm.h and drop it into a tm-bits.h in the build >> directory? Then you could include that one instead of tm.h. > > Yes, that's what I said. Only there is little point in writing > the generator program right now if all it ever does is spit out > #define BITS_PER_UNIT 8 > > We can add the generator program when we (re-) add a word addressed > target, or add a bit addressed one. I do think this goal is close enough that we should not encourage new code to break it. I built gcc 4.5.0 based on a 24-bit word size recently, just in order to get the driver to work and the actual compiler itself to successfully init itself and compile an empty file without crashing, and that proved entirely practical, so we might not be as far off as one might assume. That shows that the core is already substantially independent from the target, I think, and that we could go further with that independence. cheers, DaveK