how to distinguish patched GCCs
Hi,

Abstract :)
===
A means to distinguish a patched GCC release from a vanilla GCC release should be added. This would enable developers to work around incompatibilities between GCC releases in public header files. One macro, defined only by the respective distributor, could uniquely identify the distribution and its version.

Problem
===
As you're surely aware, most people don't actually use GCC as released by the FSF (aka. "vanilla GCC") but use the versions as packaged by their distribution (whether that be Linux, MacOS, or whatever else) (aka. "patched GCC"). Often these patched GCCs include all patches from the SVN branch available at the time of packaging + some distribution specific changes.

This leads to incompatibilities between GCC releases that all identify themselves with the same version number. The most recent issue, which prompts this mail, was the AVX maskstore interface change between 4.5.2 and 4.5.3 [1]. There are other conceivable examples; e.g. I experienced compilation errors (failed inlining) with Fedora GCC, while the same versions of vanilla GCC compiled my code just fine.

Ideally, a software project would be able to use configure checks and feature-macros to work around those issues, and thus support all (vanilla and patched) GCC releases. But it's not always that "easy". If the code in question resides in a public header of a library, the feature-macros would have to be correctly defined by every project that makes use of them, which is a major problem.

Suggested solution
==
GCC should provide (an) additional predefined macro(s) to distinguish a patched GCC from vanilla GCC. This/These macro(s) should be sufficient to uniquely identify every released GCC from each other. This must also include updates to distribution packages, which could fix or introduce a problem.

Idea:
add the following macro:
  __GNUC_DISTRIBUTOR___
This macro is defined in releases of GCC that are prepared by entities other than the FSF. The actual name of the macro depends on the value set by the packager. A list of known names can be found at .
This macro expands to a number that uniquely identifies the package. The actual format of the number is defined by the distributor, but it is recommended that distributors define the value like this:
  * 0x1 + * 0x100 +

(or call it __GNUC__VERSION__ ?)

and the value of the macro would be set by a configure switch to GCC and would have values like "REDHAT", "UBUNTU", "SUSE", ...

Rationale
=
- We can't expect distributors to only ship vanilla GCC packages (even if I'd prefer that).
- We can't expect that incompatibilities between GCC releases with the exact same version number will never occur again.
- We can't expect software developers to correctly define compiler-specific feature-macros for the header files of the libraries they use.
- A means to distinguish different releases of a given GCC version is currently not available.

=> The suggested macro would make it possible for library headers to work with all released GCCs, without additional work for the library user.

How to go forward
=
I'd look into implementing this for GCC 4.7, if you like the idea. Unless, of course, somebody else prefers to do it instead. :-)

[1] https://bugs.launchpad.net/bugs/780551

Cheers,
Matthias
-- 
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc
Re: how to distinguish patched GCCs
On Thu, May 26, 2011 at 12:06 PM, Matthias Kretz wrote:
> Hi,
>
> Abstract :)
> ===
> A means to distinguish a patched GCC release from a vanilla GCC release should be added. This would enable developers to work around incompatibilities between GCC releases in public header files. One macro, defined only by the respective distributor, could uniquely identify the distribution and its version.
>
> Problem
> ===
> As you're surely aware, most people don't actually use GCC as released by the FSF (aka. "vanilla GCC") but use the versions as packaged by their distribution (whether that be Linux, MacOS, or whatever else) (aka. "patched GCC"). Often these patched GCCs include all patches from the SVN branch available at the time of packaging + some distribution specific changes.
>
> This leads to incompatibilities between GCC releases that all identify themselves with the same version number. The most recent issue, which prompts this mail, was the AVX maskstore interface change between 4.5.2 and 4.5.3 [1]. There are other conceivable examples; e.g. I experienced compilation errors (failed inlining) with Fedora GCC, while the same versions of vanilla GCC compiled my code just fine.
>
> Ideally, a software project would be able to use configure checks and feature-macros to work around those issues, and thus support all (vanilla and patched) GCC releases. But it's not always that "easy". If the code in question resides in a public header of a library, the feature-macros would have to be correctly defined by every project that makes use of them, which is a major problem.
>
> suggested solution
> ==
> GCC should provide (an) additional predefined macro(s) to distinguish a patched GCC from vanilla GCC. This/These macro(s) should be sufficient to uniquely identify every released GCC from each other. This must also include updates to distribution packages, which could fix or introduce a problem.
>
> Idea:
> add the following macro:
>   __GNUC_DISTRIBUTOR___
> This macro is defined in releases of GCC that are prepared by entities other than the FSF. The actual name of the macro depends on the value set by the packager. A list of known names can be found at .
> This macro expands to a number that uniquely identifies the package. The actual format of the number is defined by the distributor, but it is recommended that distributors define the value like this:
>   * 0x1 + * 0x100 +
>
> (or call it __GNUC__VERSION__ ?)
>
> and the value of the macro would be set by a configure switch to GCC and would have values like "REDHAT", "UBUNTU", "SUSE", ...

How would that help with vendors releasing updates with fixes?

So no, I don't like the idea at all. Use configure-time checks instead. The cases where you have to work around compiler issues in a _header_ file should be very rare.

> Rationale
> =
> - We can't expect distributors to only ship vanilla GCC packages (even if I'd prefer that).
> - We can't expect that incompatibilities between GCC releases with the exact same version number will never occur again.
> - We can't expect software developers to correctly define compiler-specific feature-macros for the header files of the libraries they use.
> - A means to distinguish different releases of a given GCC version is currently not available.

It is. Vendors use (or should use) --with-pkgversion, so you get

> gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and yes, we (SUSE) even adjust the patchlevel version of the compiler down to the last released version instead of keeping the next-to-be released version when picking from a release branch (vanilla GCC would say 'gcc 4.3.5 20100927 [gcc-4_3-branch revision 152973]' or similar).

> => The suggested macro would make it possible for library headers to work with all released GCCs, without additional work for the library user.

Only if all vendors use it that way. A better solution is to guard those cases in library headers via a library specific define, defaulting to a "safe" version. That then even works for vendors the library implementor does not know about (you can't even enumerate all vendors, so it's really a pointless approach).

Richard.

> How to go forward
> =
> I'd look into implementing this for GCC 4.7, if you like the idea. Unless, of course, somebody else prefers to do it instead. :-)
>
> [1] https://bugs.launchpad.net/bugs/780551
>
> Cheers,
> Matthias
> --
> Dipl.-Phys. Matthias Kretz
> http://compeng.uni-frankfurt.de/?mkretz
>
> SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc
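
Richard's suggestion (a library-specific define that defaults to a "safe" version) could look roughly like the sketch below. All MYLIB_* names are hypothetical and not taken from any real library. The opt-in branch assumes the AVX intrinsics with the `__m256i` mask argument and would need -mavx; the default branch is plain C and should build with any compiler.

```c
#include <stdint.h>

/* Hypothetical switch: the user (or the library's configure script) may
   predefine MYLIB_USE_AVX_MASKSTORE to 1. When nothing is defined we fall
   back to the "safe" variant that does not depend on the intrinsic's
   interface at all. */
#ifndef MYLIB_USE_AVX_MASKSTORE
#define MYLIB_USE_AVX_MASKSTORE 0
#endif

#if MYLIB_USE_AVX_MASKSTORE
#include <immintrin.h>
/* Opt-in variant: stores src[i] to dst[i] where the sign bit of mask[i]
   is set. Assumes the __m256i-mask form of _mm256_maskstore_ps. */
static inline void mylib_maskstore8(float *dst, const int32_t *mask,
                                    const float *src)
{
    __m256i m = _mm256_loadu_si256((const __m256i *)mask);
    _mm256_maskstore_ps(dst, m, _mm256_loadu_ps(src));
}
#else
/* Safe default: same semantics, written as a scalar loop. */
static inline void mylib_maskstore8(float *dst, const int32_t *mask,
                                    const float *src)
{
    for (int i = 0; i < 8; ++i)
        if (mask[i] < 0)          /* sign bit set: store this element */
            dst[i] = src[i];
}
#endif
```

The point of this design is that a compiler or vendor the library author has never heard of still gets working code; only users who know their compiler is fine opt into the faster path.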
Re: how to distinguish patched GCCs
On Thu, May 26, 2011 at 12:06:18PM +0200, Matthias Kretz wrote:
> suggested solution
> ==
> GCC should provide (an) additional predefined macro(s) to distinguish a patched GCC from vanilla GCC. This/These macro(s) should be sufficient to uniquely identify every released GCC from each other. This must also include updates to distribution packages, which could fix or introduce a problem.

We (Fedora/RHEL) already use something like that, in particular

#define __GNUC__ 4
#define __GNUC_MINOR__ 6
#define __GNUC_PATCHLEVEL__ 0
#define __GNUC_RH_RELEASE__ 7

means GCC 4.6-RH 4.6.0-7 (like SUSE, we decrease patchlevel version to the last released version if any, so 4.6.0 with non-zero __GNUC_RH_RELEASE__ means based on 4.6 branch after 4.6.0 release (which normally presents itself as 4.6.1 prerelease)).

We've used it a couple of times e.g. in our glibc headers, so that we could start earlier using backported features like _FORTIFY_SOURCE, warning/error attributes, gnu_inline including C++, __builtin_va_arg_pack etc.

Jakub
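
A header could test the macros Jakub lists along the following lines. This is only a sketch: the MYLIB_* names are made up, and the thresholds (vanilla 4.6.1, or Fedora/RHEL 4.6.0 with release 7 or later) are chosen purely for illustration, not taken from any real backport.

```c
/* Fold major/minor/patchlevel into one comparable number,
   e.g. 4.6.0 becomes 40600. */
#define MYLIB_GCC_VERSION \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)

/* __GNUC_RH_RELEASE__ is only predefined by Fedora/RHEL builds (per the
   mail above); treat every other compiler as release 0. */
#ifdef __GNUC_RH_RELEASE__
#define MYLIB_GCC_RH_RELEASE __GNUC_RH_RELEASE__
#else
#define MYLIB_GCC_RH_RELEASE 0
#endif

/* Hypothetical rule: the feature is in vanilla >= 4.6.1, or in a
   Fedora/RHEL 4.6.0 package with release number >= 7. */
#define MYLIB_FIX_PRESENT(ver, rh) \
    ((ver) >= 40601 || ((ver) == 40600 && (rh) >= 7))

#if MYLIB_FIX_PRESENT(MYLIB_GCC_VERSION, MYLIB_GCC_RH_RELEASE)
#define MYLIB_HAVE_BACKPORT 1
#else
#define MYLIB_HAVE_BACKPORT 0
#endif
```

Note that this only covers one vendor; every additional distributor would need its own macro and its own branch, which is the enumeration problem Richard raises.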
Help with specifying processor pipeline GCC4.5.1
Hello All,

I need some help with setting the pipeline hazard recognizer (I am working with gcc v4.5.1 for a private target).

A brief pipeline description of my target:

We have 2 functional units
1) For multiplication.
2) For all other instructions.

a) Multiply instructions are not pipelined.
b) It takes 4 cycles to execute a multiply instruction.
c) The result of a multiply instruction will be available after 4 cycles.

So there should be a 4 cycle gap between 2 multiply instructions (independent/dependent) and also between a multiply and its dependent instructions (other than multiply).

e.g.1:
mult R3, R4, R5   -- (A)
add  R0, R1, R2
mult R7, R8, R9   -- (B)

(A) & (B) are independent. This is a pipeline error. Need to add 2 NOP's or schedule 2 other independent instructions before (B).

e.g.2:
mult R3, R4, R5   -- (A)
add  R7, R8, R9
add  R5, R1, R2   -- (B)

(A) & (B) are dependent. Even though there is no pipeline error, the value of "R5" used will not be the updated one as 'mult' takes 4 cycles for the result to be available. Need to add 2 NOP's or schedule 2 other independent instructions before (B).

I have done the following, but I am not sure if this will take care of:
A) 4 cycle gap between 2 independent multiply instructions
B) 4 cycle gap between multiply and any other dependent instruction.

(define_automaton "pipeline")
(define_cpu_unit "simple" "pipeline")
(define_cpu_unit "mult" "pipeline")

(define_insn_reservation "any_insn" 1 (eq_attr "type" "!mul") "simple")
(define_insn_reservation "mult" 4 (eq_attr "type" "mul") "mult*4")

In case other independent instructions are not available to be scheduled for this latency, I will be inserting NOP's from the backend. But I want to make sure the correct info is passed to the scheduler.

Any comments/suggestions?

Thanks,
Rohit
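
The two constraints in this mail can be checked against the examples with a small model. The sketch below is not GCC's scheduler and says nothing about what the automaton above actually does; it only encodes the stated rules under some assumptions: in-order issue of one instruction per cycle, the last operand of each instruction is the destination (as e.g.2 implies for R5), and register numbers are below 32. The struct and function names are invented.

```c
#include <stddef.h>

enum { MULT_LATENCY = 4, NUM_REGS = 32 };

struct insn {
    int is_mult;     /* 1 for mult, 0 for any other instruction */
    int dst;         /* destination register number */
    int src1, src2;  /* source register numbers */
};

/* Count the NOPs needed so that (a) two mults issue at least MULT_LATENCY
   cycles apart and (b) no instruction reads a mult result earlier than
   MULT_LATENCY cycles after that mult issued. */
static int count_nops(const struct insn *code, size_t n)
{
    int cycle = 0;              /* cycle in which the next insn would issue */
    int nops = 0;
    int mult_free = 0;          /* first cycle the multiplier is free again */
    int ready[NUM_REGS] = {0};  /* first cycle each register value is valid */

    for (size_t i = 0; i < n; ++i) {
        int earliest = cycle;
        if (code[i].is_mult && mult_free > earliest)
            earliest = mult_free;
        if (ready[code[i].src1] > earliest)
            earliest = ready[code[i].src1];
        if (ready[code[i].src2] > earliest)
            earliest = ready[code[i].src2];

        nops += earliest - cycle;   /* stall cycles become NOPs */
        cycle = earliest;

        if (code[i].is_mult) {
            mult_free = cycle + MULT_LATENCY;
            ready[code[i].dst] = cycle + MULT_LATENCY;
        } else {
            ready[code[i].dst] = cycle + 1;
        }
        ++cycle;
    }
    return nops;
}
```

Feeding it the three instructions of e.g.1 or of e.g.2 gives 2 in both cases, matching the "2 NOP's" stated in the mail.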
Re: Deprecating mips-openbsd
On 05/24/2011 10:40 AM, Andrew Haley wrote:
> On 23/05/11 19:35, Richard Sandiford wrote:
>> According to:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47110
>>
>> mips-openbsd does not build in 4.6. I haven't seen any activity
>> on this port for years. Would anyone object to its deprecation?
>
> I'm going to forward this to openbsd. Please wait a bit for a reply.

Assuming this is the old 32-bit a.out port, they seem to be ok with it being deprecated.

Andrew.
Wrong code: missing input reload
Trying to track faulty code generation because of a missing input reload, I got lost in reload and need some help.

The insn to reload (insn 7) is

(set (subreg:QI (reg:HI 28) 0)
     (const_int 0))

This insn generates one output reload (.ira dump)

Reloads for insn # 7
Reload 0: reload_out (HI) = (reg/v:HI 28 r28 [orig:43 y ] [43])
	GENERAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
	reload_out_reg: (reg/v:HI 28 r28 [orig:43 y ] [43])
	reload_reg_rtx: (reg:HI 24 r24)

which eventually generates code

(insn 7 6 17 2 (set (reg:QI 24 r24)
        (const_int 0 [0])) pr46779-1.c:34 4 {*movqi}
     (nil))

(insn 17 7 8 2 (set (reg/v:HI 28 r28 [orig:43 y ] [43])
        (reg:HI 24 r24)) pr46779-1.c:34 10 {*movhi}
     (nil))

so there is a missing input reload, i.e. prior to insn 7 there must be something like

(set (reg:HI 28)
     (reg:HI 24))

find_reloads for insn 7 calls push_reload just once with

in = 0
out = (subreg:QI (reg:HI 28) 0)
outmode = QImode
strict_low = 0
optional = 0
type = RELOAD_FOR_OUTPUT

Is that correct so far? Or should push_reload also be called for type = RELOAD_FOR_INPUT or with another reload type?

find_reloads has a comment that push_reload will do some fixes:

  /* Any constants that aren't allowed and can't be reloaded
     into registers are here changed into memory references.  */
  for (i = 0; i < noperands; i++)
    if (! goal_alternative_win[i])
      {
        rtx op = recog_data.operand[i];
        rtx subreg = NULL_RTX;
        rtx plus = NULL_RTX;
        enum machine_mode mode = operand_mode[i];

        /* Reloads of SUBREGs of CONSTANT RTXs are handled later in
           push_reload so we have to let them pass here.  */
        if (GET_CODE (op) == SUBREG)
          {
            subreg = op;
            op = SUBREG_REG (op);
            mode = GET_MODE (op);
          }

the code actually enters the last SUBREG block for insn 7

=
gcc 4.7 configured
../../gcc.gnu.org/trunk/configure --target=avr --prefix=/local/gnu/install/gcc-4.7 --disable-nls --disable-shared --enable-languages=c,c++

=
gcc called with
-Os -dp -save-temps -mmcu=atmega128 -S -v

=
The C source is:

struct S
{
    unsigned char a, b;
} ab;

void yoo (struct S y)
{
    asm volatile ("ldi %B0, 56" : "+y" (y));
    y.a = 0;
    asm ("; y = %0" : "+y" (y));
    ab = y;
}

=
insn 7 is generated from y.a = 0.

The C source does some asm hacking to produce this bug. This is because otherwise the bug is hard to reproduce and the source would be rather lengthy and much harder to debug.

Register y=r29:r28 is the frame pointer, which is not needed for this function. The avr backend has always had problems here because the frame pointer spans two registers but many places in GCC just do their tests like

if (x == FRAME_POINTER_REGNUM)

The current implementation of HARD_REGNO_MODE_OK allows HI for r28 but denies QI for r28 and r29. Older versions of avr-gcc tried several other approaches to work around all of this like checking for frame_pointer_needed, reload_completed, reload_in_progress etc. in avr.c:avr_hard_regno_mode_ok. Each approach led to some problems like faulty code or spill failures.

The problem disappears when QI is allowed in r28/r29, but I do not know enough of reload and fp elimination to know if that would be an appropriate fix.

Anyway, all this shows that there is something going wrong in reload, in 4.7, 4.6 and 4.5 alike.

Moreover, the first asm needs a "volatile" in order not to get thrown away. This indicates that there is an error in life analysis because some part fails to detect that y.a = 0 changes only a part of the register. The life analysis is correct for the QI part, i.e. for r24 where r28 gets spilled to, but it is wrong for the overall action.
Re: Wrong code: missing input reload
Georg-Johann Lay wrote:
> so there is a missing input reload, i.e. prior to insn 7 there must be
> something like
>
> (set (reg:HI 28)
>      (reg:HI 24))

Typo, that should read:

(set (reg:HI 24)
     (reg:HI 28))

prior to insn 7.
Re: finding the induction variable after graphite (before ivcanon pass)?
On 05/24/2011 10:09 PM, Sebastian Pop wrote:
> One change that I introduced sometime in February is that some reductions
> are not translated to a zero dim array to make the dependence test work
> on some of the interchange testcases. With this change, are we going to
> also create privatized copies for the reduction variables that are not
> translated into zero dim arrays?

Hi Sebastian,

Could you please provide some more details about these reductions? An example of a loop nest or a testcase with such reductions would be very helpful.

In the current graphite-opencl implementation only zero dim arrays can be privatized (but local scalar variables are always private).

Currently we are not able to handle a reduction between OpenCL kernels, so loops like this cannot be transformed into a kernel:

for (i = 0; i < N; i++)
  sum += A[i];

We are only able to handle a reduction inside the kernel's body by privatizing the local reduction variable:

for (j = 0; j < M; j++)
  {
    int sum = 0;
    for (i = 0; i < N; i++)
      sum += B[i];
  }

In this case we privatize sum[0] (zero dim array created from the scalar variable) inside the kernel's body (the outer loop will be replaced by a kernel launch).

-- 
Alexey Kravets
kayr...@ispras.ru
Re: finding the induction variable after graphite (before ivcanon pass)?
On Thu, May 26, 2011 at 09:58, Alexey Kravets wrote:
> On 05/24/2011 10:09 PM, Sebastian Pop wrote:
>>
>> One change that I introduced sometime in February is that some reductions
>> are not translated to a zero dim array to make the dependence test work
>> on some of the interchange testcases. With this change, are we going to
>> also create privatized copies for the reduction variables that are not
>> translated into zero dim arrays?
>>
>
> Hi Sebastian,
> Could you please provide some more details about these reductions?
> An example of loop nest or testcase with such reductions would be very
> helpful.

I think that most of the interchange-*.c files contain such reductions that were not previously handled by the data dependence analysis, and thus not interchanged. I can see interchange-1.c and interchange-7.c have reductions.

> In current graphite-opencl implementation only zero dim arrays can be
> privatized (but local scalar variables are always private).
>
> Currently we are not able to handle reduction between OpenCL kernels,
> so loops like this can not be transformed into kernel:
>
> for (i = 0; i < N; i++)
>   sum += A[i];
>
> We are only able to handle reduction inside kernel's body by privatizing
> local reduction variable:
>
> for (j = 0; j < M; j++)
>   {
>     int sum = 0;
>     for (i = 0; i < N; i++)
>       sum += B[i];
>   }
>
> In this case we privatize sum[0] (zero dim array created from scalar
> variable) inside kernel's body (outer loop will be replaced by kernel
> launch).

I see. Would it be possible to strip mine the reduction loop that you say is not handled yet, and then translate the partial sums to OpenCL?

Thanks,
Sebastian
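
Sebastian's strip-mining idea can be pictured in plain C roughly as follows. This is a hand-written sketch, not output of graphite or graphite-opencl; the strip length STRIP, the function name and the caller-provided `partial` buffer are assumptions made for the example. The first loop nest has the shape Alexey describes as supported (a private accumulator inside the outer loop body); the second loop combines the partial sums sequentially.

```c
#include <stddef.h>

enum { STRIP = 4 };  /* hypothetical strip length */

/* Sum a[0..n-1] via per-strip partial sums.
   'partial' must have room for (n + STRIP - 1) / STRIP ints. */
static int sum_strip_mined(const int *a, size_t n, int *partial)
{
    size_t nstrips = (n + STRIP - 1) / STRIP;

    /* Outer loop: each iteration is independent of the others, so it is
       the candidate for becoming a kernel launch. */
    for (size_t s = 0; s < nstrips; ++s) {
        int sum = 0;                       /* private to this strip */
        size_t begin = s * STRIP;
        size_t end = begin + STRIP < n ? begin + STRIP : n;
        for (size_t i = begin; i < end; ++i)
            sum += a[i];
        partial[s] = sum;
    }

    /* Remaining (much shorter) reduction over the partial sums. */
    int total = 0;
    for (size_t s = 0; s < nstrips; ++s)
        total += partial[s];
    return total;
}
```

The cross-iteration reduction does not disappear; it shrinks from n elements to n/STRIP partial sums, which can then be combined outside the kernel.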
Re: Wrong code: missing input reload
On 05/26/2011 06:53 AM, Georg-Johann Lay wrote:
> Trying to track faulty code generation because of a missing input
> reload, I got lost in reload and need some help.
>
> The insn to reload (insn 7) is
>
> (set (subreg:QI (reg:HI 28) 0)
>      (const_int 0))
>
> This insn generates one output reload (.ira dump)
>
> Reloads for insn # 7
> Reload 0: reload_out (HI) = (reg/v:HI 28 r28 [orig:43 y ] [43])
> 	GENERAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
> 	reload_out_reg: (reg/v:HI 28 r28 [orig:43 y ] [43])
> 	reload_reg_rtx: (reg:HI 24 r24)
>
> which eventually generates code
>
> (insn 7 6 17 2 (set (reg:QI 24 r24)
>         (const_int 0 [0])) pr46779-1.c:34 4 {*movqi}
>      (nil))
>
> (insn 17 7 8 2 (set (reg/v:HI 28 r28 [orig:43 y ] [43])
>         (reg:HI 24 r24)) pr46779-1.c:34 10 {*movhi}
>      (nil))
>
> so there is a missing input reload...

Don't see a strict-low-part here. Why do you believe that this should have an input reload?

> i.e. prior to insn 7 there must be
> something like
>
> (set (reg:HI 28)
>      (reg:HI 24))

Why do you believe that? It looks to me that we've done exactly as requested, namely, set the low BITS_PER_UNIT of r28 to zero while leaving the high BITS_PER_UNIT of r28 undefined.

Perhaps the original subreg shouldn't have been there?

r~
Re: [RFC] alpha/ev6: model 1-cycle cross-cluster delay
On 05/24/2011 08:52 PM, Matt Turner wrote:
> Alpha EV6 and newer can execute four instructions per cycle if correctly
> scheduled. The architecture has two clusters {0, 1}, each with its own
> register file. In each cluster, there are two slots {upper, lower}. Some
> instructions only execute from either upper or lower slots.
>
> Register values produced in one cluster take 1 cycle to appear in the
> other cluster, so improperly scheduled instructions may incur a
> cross-cluster delay.

Given the lack of control of how insns are dispatched to clusters, this is essentially an intractable problem.

One can manage clusters only in extremely rare situations in hand-tuned assembly. Namely:

(1) One has to start with an empty re-order queue. Such as on transition to/from PALcode, at the beginning of an align 16 block of code.

(2) One has to pad with lots of nearly-nops in order to keep the dispatch to the various pipelines aligned with the programmer's idea of how dispatch is occurring.

> - The CWG lists the latency of unconditional branches and jsr/call
>   instructions as 3, whereas we have 1. I guess this latency value is
>   only meaningful if the instruction produces a value? I'm a bit
>   confused by this value in the CWG since it lists the latency of
>   conditional branches as N/A, while these other types of branches as
>   3, although none produce a register value.

They produce a value -- the return address. It's $31 in most unconditional branches, but it's still there.

> - I also see that fadd/fcmov/fmul instructions take an extra two cycles
>   when the consumer is fst/ftoi, so something similar should be added
>   for them. Can a (define_bypass ...) function specify a latency value
>   greater than the default latency?

Yes.

r~
Re: Wrong code: missing input reload
> Don't see a strict-low-part here. Why do you believe that this
> should have an input reload?

This is AVR so QImode is the word mode and the strict-low-part is implicit.

> Perhaps the original subreg shouldn't have been there?

Yes, I'd think that everything in the RTL middle-end expects word-mode subregs of double-word-mode hard regs to be simplifiable.

-- 
Eric Botcazou
Re: Wrong code: missing input reload
Eric Botcazou wrote:
>> Perhaps the original subreg shouldn't have been there?
>
> Yes, I'd think that everything in the RTL middle-end expects word-mode
> subregs of double-word-mode hard regs to be simplifiable.

You are right, I was staring at the wrong place. A subreg of a hard reg should not be there.
Re: Wrong code: missing input reload
> You are right, I was staring at the wrong place. subreg of hardreg
> should not be there.

You can take a look at PR target/48830, this is a related problem for the SPARC where reload generates:

(set (reg:SI 708 [ D.2989+4 ])
     (subreg:SI (reg:DI 72 %f40) 4))

and (subreg:SI (reg:DI 72 %f40) 4) isn't simplifiable either. H.P. wrote a tentative patch for the subreg machinery to forbid this. Other references are:

http://gcc.gnu.org/ml/gcc-patches/2008-07/msg01688.html
http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01743.html

-- 
Eric Botcazou
gcc-4.5-20110526 is now available
Snapshot gcc-4.5-20110526 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110526/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 174310 You'll find: gcc-4.5-20110526.tar.bz2 Complete GCC MD5=f6d04cf21c7ffe7e15378458d8d86332 SHA1=a904dc3bbad7a573ac848a413557c290c2ba2359 Diffs from 4.5-20110519 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.