date:20110526

how to distinguish patched GCCs

2011-05-26 Thread Matthias Kretz

Hi,

Abstract :)
===
 A means to distinguish a patched GCC release from a vanilla GCC
 release should be added.   This would enable developers to work
 around incompatibilities between  GCC releases in public header
 files.  One macro, defined  only by the respective distributor,
 could uniquely identify the distribution and its version.


Problem
===
As you're surely aware, most people don't actually use GCC as released by the 
FSF (aka. "vanilla GCC") but use the versions as packaged by their 
distribution (whether that be Linux, MacOS, or whatever else) (aka. "patched 
GCC"). Often these patched GCCs include all patches from the SVN branch 
available at the time of packaging + some distribution specific changes.

This leads to incompatibilities between GCC releases that all identify 
themselves with the same version number. The most recent issue, which prompts 
this mail, was the AVX maskstore interface change between 4.5.2 and 4.5.3 [1]. 
There are other conceivable examples; e.g. I experienced compilation errors 
(failed inlining) with Fedora GCC, while the same versions of vanilla GCC 
compiled my code just fine.

Ideally, a software project would be able to use configure checks and feature-
macros to work around those issues, and thus support all (vanilla and patched) 
GCC releases. But it's not always that "easy". If the code in question resides 
in a public header of a library, the feature-macros would have to be correctly 
defined by every project that makes use of them, which is a major problem.

Suggested solution
==
GCC should provide (an) additional predefined macro(s) to distinguish a 
patched GCC from vanilla GCC. This/These macro(s) should be sufficient to 
uniquely identify every released GCC from each other. This must also include 
updates to distribution packages, which could fix or introduce a problem.

Idea:
add the following macro:
__GNUC_DISTRIBUTOR_<name>__
  This macro is defined in releases of GCC that are prepared by entities other 
than the FSF. The actual name of the macro depends on the value set by the 
packager. A list of known names can be found at .
  This macro expands to a number that uniquely identifies the package. The 
actual format of the number is defined by the distributor, but it is 
recommended that distributors define the value like this:
 * 0x1 +  * 0x100 
+ 

(or call it __GNUC_<name>_VERSION__ ?)

<name> and the value of the macro would be set by a configure switch to GCC 
and would have values like "REDHAT", "UBUNTU", "SUSE", ...

Rationale
=
- We can't expect distributors to only ship vanilla GCC packages (even if I'd
  prefer that).
- We can't expect that incompatibilities between GCC releases with the exact
  same version number will never occur again.
- We can't expect software developers to correctly define compiler-specific
  feature-macros for the header files of the libraries they use.
- A means to distinguish different releases of a given GCC version is
  currently not available.

=> The suggested macro would make it possible for library headers to work with
   all released GCCs, without additional work for the library user.

How to go forward
=
I'd look into implementing this for GCC 4.7, if you like the idea. Unless, of 
course, somebody else prefers to do it instead. :-)

[1] https://bugs.launchpad.net/bugs/780551

Cheers,
Matthias
-- 
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz

SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc

Re: how to distinguish patched GCCs

2011-05-26 Thread Richard Guenther

On Thu, May 26, 2011 at 12:06 PM, Matthias Kretz wrote:
> Hi,
>
> Abstract :)
> ===
>     A means to distinguish a patched GCC release from a vanilla GCC
>     release should be added.   This would enable developers to work
>     around incompatibilities between  GCC releases in public header
>     files.  One macro, defined  only by the respective distributor,
>     could uniquely identify the distribution and its version.
>
>
> Problem
> ===
> As you're surely aware, most people don't actually use GCC as released by the
> FSF (aka. "vanilla GCC") but use the versions as packaged by their
> distribution (whether that be Linux, MacOS, or whatever else) (aka. "patched
> GCC"). Often these patched GCCs include all patches from the SVN branch
> available at the time of packaging + some distribution specific changes.
>
> This leads to incompatibilities between GCC releases that all identify
> themselves with the same version number. The most recent issue, which prompts
> this mail, was the AVX maskstore interface change between 4.5.2 and 4.5.3 [1].
> There are other conceivable examples; e.g. I experienced compilation errors
> (failed inlining) with Fedora GCC, while the same versions of vanilla GCC
> compiled my code just fine.
>
> Ideally, a software project would be able to use configure checks and feature-
> macros to work around those issues, and thus support all (vanilla and patched)
> GCC releases. But it's not always that "easy". If the code in question resides
> in a public header of a library, the feature-macros would have to be correctly
> defined by every project that makes use of them, which is a major problem.
>
> suggested solution
> ==
> GCC should provide (an) additional predefined macro(s) to distinguish a
> patched GCC from vanilla GCC. This/These macro(s) should be sufficient to
> uniquely identify every released GCC from each other. This must also include
> updates to distribution packages, which could fix or introduce a problem.
>
> Idea:
> add the following macro:
> __GNUC_DISTRIBUTOR_<name>__
>  This macro is defined in releases of GCC that are prepared by entities other
> than the FSF. The actual name of the macro depends on the value set by the
> packager. A list of known names can be found at .
>  This macro expands to a number that uniquely identifies the package. The
> actual format of the number is defined by the distributor, but it is
> recommended that distributors define the value like this:
>  * 0x1 +  * 0x100
> + 
>
> (or call it __GNUC_<name>_VERSION__ ?)
>
> <name> and the value of the macro would be set by a configure switch to GCC
> and would have values like "REDHAT", "UBUNTU", "SUSE", ...

How would that help with vendors releasing updates with fixes?

So no, I don't like the idea at all.  Use configure-time checks instead.

The cases where you have to work around compiler issues in a
_header_ file should be very rare.

> Rationale
> =
> - We can't expect distributors to only ship vanilla GCC packages (even if I'd
>  prefer that).
> - We can't expect that incompatibilities between GCC releases with the exact
>  same version number will never occur again.
> - We can't expect software developers to correctly define compiler-specific
>  feature-macros for the header files of the libraries they use.
> - A means to distinguish different releases of a given GCC version is
>  currently not available.

It is.  Vendors use (or should use) --with-pkgversion, so you get

> gcc --version
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973]
Copyright (C) 2008 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and yes, we (SUSE) even adjust the patchlevel version of the compiler
down to the last released version instead of keeping the next-to-be
released version when picking from a release branch (vanilla GCC
would say 'gcc 4.3.5 20100927 [gcc-4_3-branch revision 152973]'
or similar).

> => The suggested macro would make it possible for library headers to work with
>   all released GCCs, without additional work for the library user.

Only if all vendors use it that way.

A better solution is to guard those cases in library headers via a
library specific define, defaulting to a "safe" version.  That then
even works for vendors the library implementor does not know about
(you can't even enumerate all vendors, so it's really a pointless approach).
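
A minimal sketch of such a library-specific guard, in C. The names MYLIB_USE_NEW_MASKSTORE and mylib_maskstore_variant are hypothetical and do not belong to any real library; the only point is that the header defaults to the conservative path, and a build system may override it with -D after a configure-time check.

```c
#include <assert.h>

/* Hypothetical library-specific switch.  A configure check (or the user)
   may pass -DMYLIB_USE_NEW_MASKSTORE=1; otherwise the header falls back
   to the variant assumed to work with every compiler.  */
#ifndef MYLIB_USE_NEW_MASKSTORE
#define MYLIB_USE_NEW_MASKSTORE 0
#endif

/* Reports which variant the header selected: 1 = new interface,
   0 = safe fallback.  A real header would choose between two
   implementations here instead of returning a flag.  */
static inline int mylib_maskstore_variant(void)
{
#if MYLIB_USE_NEW_MASKSTORE
    return 1;
#else
    return 0;
#endif
}
```

Because the default is the safe variant, a compiler from a vendor the library author has never heard of still gets working code; only builds that have verified the new interface opt in.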

Richard.

> How to go forward
> =
> I'd look into implementing this for GCC 4.7, if you like the idea. Unless, of
> course, somebody else prefers to do it instead. :-)
>
> [1] https://bugs.launchpad.net/bugs/780551
>
> Cheers,
>        Matthias
> --
> Dipl.-Phys. Matthias Kretz
> http://compeng.uni-frankfurt.de/?mkretz
>
> SIMD easy and portable: http://compeng.uni-frankfurt.de/?vc
>

Re: how to distinguish patched GCCs

2011-05-26 Thread Jakub Jelinek

On Thu, May 26, 2011 at 12:06:18PM +0200, Matthias Kretz wrote:
> suggested solution
> ==
> GCC should provide (an) additional predefined macro(s) to distinguish a 
> patched GCC from vanilla GCC. This/These macro(s) should be sufficient to 
> uniquely identify every released GCC from each other. This must also include 
> updates to distribution packages, which could fix or introduce a problem.

We (Fedora/RHEL) already use something like that, in particular
#define __GNUC__ 4
#define __GNUC_MINOR__ 6
#define __GNUC_PATCHLEVEL__ 0
#define __GNUC_RH_RELEASE__ 7
means GCC 4.6-RH 4.6.0-7. Like SUSE, we decrease the patchlevel version to the
last released version, if any, so 4.6.0 with a non-zero __GNUC_RH_RELEASE__
means a compiler based on the 4.6 branch after the 4.6.0 release (which would
normally present itself as a 4.6.1 prerelease).
We've used it a couple of times e.g. in our glibc headers, so that we
could start earlier using backported features like _FORTIFY_SOURCE,
warning/error attributes, gnu_inline including C++, __builtin_va_arg_pack etc.
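
A hedged sketch of how a header might consume such a macro, in C. Only __GNUC_RH_RELEASE__ and the standard __GNUC__ macros come from the message above; every MYLIB_* name is hypothetical, and the rule "available from vanilla 4.6.1, backported to the Red Hat 4.6.0 packages from release 7" is an invented example, not a statement about any real feature.

```c
#include <assert.h>

/* Encode a GCC version as one comparable number, e.g. 4.6.0 -> 40600. */
#define MYLIB_VERSION_NUM(maj, min, patch) \
    ((maj) * 10000 + (min) * 100 + (patch))

#ifdef __GNUC__
#define MYLIB_GCC_VERSION \
    MYLIB_VERSION_NUM(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__)
#else
#define MYLIB_GCC_VERSION 0
#endif

/* __GNUC_RH_RELEASE__ is only defined by Fedora/RHEL compilers;
   treat every other compiler as release 0.  */
#ifdef __GNUC_RH_RELEASE__
#define MYLIB_RH_RELEASE __GNUC_RH_RELEASE__
#else
#define MYLIB_RH_RELEASE 0
#endif

/* Hypothetical rule: the feature exists in vanilla GCC from 4.6.1 on and
   was backported to the Red Hat 4.6.0 packages starting with release 7.  */
#define MYLIB_HAS_FEATURE(ver, rh) \
    ((ver) >= MYLIB_VERSION_NUM(4, 6, 1) || \
     ((ver) == MYLIB_VERSION_NUM(4, 6, 0) && (rh) >= 7))

#if MYLIB_HAS_FEATURE(MYLIB_GCC_VERSION, MYLIB_RH_RELEASE)
#define MYLIB_USE_FEATURE 1
#else
#define MYLIB_USE_FEATURE 0
#endif
```

The vendor release number only matters when the upstream version number alone is ambiguous, which is exactly the "4.6.0 with non-zero release" case described above.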

Jakub

Help with specifying processor pipeline GCC4.5.1

2011-05-26 Thread Rohit Arul Raj

Hello All,

I need some help with setting the pipeline hazard recognizer (I am
working with gcc v4.5.1 for a private target).

A brief pipeline description of my target:

We have 2 functional units
1) For multiplication.
2) For All other instructions.

a) Multiply instructions are not pipelined.
b) It takes 4 cycles to execute a multiply instruction.
c) The result of multiply instruction will be available after 4 cycles.

So there should be a 4 cycle gap between 2 multiply instructions
(independent or dependent), and also between a multiply and any
instruction that depends on it (other than a multiply).

e.g.1:
mult R3, R4, R5 -- (A)
add  R0, R1, R2
mult R7, R8, R9 -- (B)

(A) & (B) are independent.
This is a pipeline error.
Need to add 2 NOPs or schedule 2 other independent instructions before (B).

e.g.2:
mult R3, R4, R5  --(A)
add  R7, R8, R9
add  R5, R1, R2  --(B)

(A) & (B) are dependent.
Even though there is no pipeline error, the value of "R5" used will
not be the updated one as 'mult' takes 4 cycles for the result to be
available.
Need to add 2 NOPs or schedule 2 other independent instructions before (B).

I have done the following, but I am not sure whether it takes care of:

A) a 4 cycle gap between 2 independent multiply instructions
B) a 4 cycle gap between a multiply and any other dependent instruction.

(define_automaton "pipeline")
(define_cpu_unit "simple" "pipeline")
(define_cpu_unit "mult" "pipeline")

(define_insn_reservation "any_insn" 1 (eq_attr "type" "!mul") "simple")
(define_insn_reservation "mult" 4 (eq_attr "type" "mul") "mult*4")
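
The two rules (A) and (B) can be written down as a small stand-alone C model, which may help when checking what the backend NOP insertion has to guarantee. This is only a sketch of the hazard rules as described in this mail, not of what the GCC scheduler does. The operand order (src1, src2, dst) is inferred from e.g.2, and the names Insn, count_nops and NUM_REGS are made up for illustration.

```c
#include <assert.h>

enum { MULT_CYCLES = 4, NUM_REGS = 32 };  /* NUM_REGS is an assumption */

typedef struct {
    int is_mult;         /* 1 for a multiply, 0 for any other instruction */
    int src1, src2, dst; /* register numbers, in "op src1, src2, dst" order */
} Insn;

static int max_int(int a, int b) { return a > b ? a : b; }

/* Number of NOPs an in-order padder would need for the sequence. */
static int count_nops(const Insn *code, int n)
{
    int ready[NUM_REGS] = {0}; /* first cycle each register can be read */
    int mult_free = 0;         /* first cycle the multiplier is free again */
    int cycle = 0;             /* cycle at which the next insn would issue */
    int nops = 0;

    for (int i = 0; i < n; i++) {
        const Insn *in = &code[i];
        assert(in->src1 >= 0 && in->src1 < NUM_REGS);
        assert(in->src2 >= 0 && in->src2 < NUM_REGS);
        assert(in->dst >= 0 && in->dst < NUM_REGS);

        int issue = cycle;
        if (in->is_mult)                          /* rule A: unit is busy */
            issue = max_int(issue, mult_free);
        issue = max_int(issue, ready[in->src1]);  /* rule B: operand not */
        issue = max_int(issue, ready[in->src2]);  /* yet available       */
        nops += issue - cycle;

        if (in->is_mult) {
            mult_free = issue + MULT_CYCLES;
            ready[in->dst] = issue + MULT_CYCLES;
        } else {
            ready[in->dst] = issue + 1;
        }
        cycle = issue + 1;
    }
    return nops;
}
```

Fed with the two examples above, the model asks for 2 NOPs in each case, matching the counts given in the text.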


In case other independent instructions are not available to be
scheduled for this latency, I will be inserting NOPs from the
backend. But I want to make sure the correct info is passed to the
scheduler.

Any comments/suggestions?

Thanks,
Rohit

Re: Deprecating mips-openbsd

2011-05-26 Thread Andrew Haley

On 05/24/2011 10:40 AM, Andrew Haley wrote:
> On 23/05/11 19:35, Richard Sandiford wrote:
>> According to:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47110
>>
>> mips-openbsd does not build in 4.6.  I haven't seen any activity
>> on this port for years.  Would anyone object to its deprecation?
> 
> I'm going to forward this to openbsd.  Please wait a bit for a reply.

Assuming this is the old 32-bit a.out port, they seem to be ok with it
being deprecated.

Andrew.

Wrong code: missing input reload

2011-05-26 Thread Georg-Johann Lay

Trying to track faulty code generation because of a missing input
reload, I got lost in reload and need some help.

The insn to reload (insn 7) is

(set (subreg:QI (reg:HI 28) 0)
 (const_int 0))

This insn generates one output reload (.ira dump)

Reloads for insn # 7
Reload 0: reload_out (HI) = (reg/v:HI 28 r28 [orig:43 y ] [43])
  GENERAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
  reload_out_reg: (reg/v:HI 28 r28 [orig:43 y ] [43])
  reload_reg_rtx: (reg:HI 24 r24)

which eventually generates code

(insn 7 6 17 2 (set (reg:QI 24 r24)
        (const_int 0 [0])) pr46779-1.c:34 4 {*movqi}
     (nil))

(insn 17 7 8 2 (set (reg/v:HI 28 r28 [orig:43 y ] [43])
        (reg:HI 24 r24)) pr46779-1.c:34 10 {*movhi}
     (nil))

so there is a missing input reload, i.e. prior to insn 7 there must be
something like

(set (reg:HI 28)
 (reg:HI 24))

find_reloads for insn 7 calls push_reload just once with

in = 0
out = (subreg:QI (reg:HI 28) 0)
outmode = QImode
strict_low = 0
optional = 0
type = RELOAD_FOR_OUTPUT

Is that correct so far?

Or should push_reload also be called for type = RELOAD_FOR_INPUT or
with another reload type?

find_reloads has a comment that push_reload will do some fixes:

  /* Any constants that aren't allowed and can't be reloaded
     into registers are here changed into memory references.  */
  for (i = 0; i < noperands; i++)
    if (! goal_alternative_win[i])
      {
        rtx op = recog_data.operand[i];
        rtx subreg = NULL_RTX;
        rtx plus = NULL_RTX;
        enum machine_mode mode = operand_mode[i];

        /* Reloads of SUBREGs of CONSTANT RTXs are handled later in
           push_reload so we have to let them pass here.  */
        if (GET_CODE (op) == SUBREG)
          {
            subreg = op;
            op = SUBREG_REG (op);
            mode = GET_MODE (op);
          }

the code actually enters the last SUBREG block for insn 7

=

gcc 4.7 configured

../../gcc.gnu.org/trunk/configure --target=avr
--prefix=/local/gnu/install/gcc-4.7 --disable-nls --disable-shared
--enable-languages=c,c++

=

gcc called with

-Os -dp -save-temps -mmcu=atmega128 -S -v

=

The C source is:

struct S
{
    unsigned char a, b;
} ab;

void yoo (struct S y)
{
    asm volatile ("ldi %B0, 56" : "+y" (y));
    y.a = 0;
    asm ("; y = %0" : "+y" (y));
    ab = y;
}

=

insn 7 is generated from y.a = 0.

The C source does some asm hacking to produce this bug. This is
because otherwise the bug is hard to reproduce and the source would be
rather lengthy and much harder to debug.

Register y=r29:r28 is the frame pointer, which is not needed for this
function.

The avr backend has always had problems because the frame pointer
spans two registers, but many places in GCC just do their tests like
if (x == FRAME_POINTER_REGNUM).

The current implementation of HARD_REGNO_MODE_OK allows HI for r28 but
denies QI for r28 and r29. Older versions of avr-gcc tried several
other approaches to work around all of this like checking for
frame_pointer_needed, reload_completed, reload_in_progress etc. in
avr.c:avr_hard_regno_mode_ok. Each approach led to some problems like
faulty code or spill failures.

The problem disappears when QI is allowed in r28/r29, but I do not
know enough about reload and fp elimination to know if that would be an
appropriate fix.

Anyway, all this shows that there is something going wrong in reload,
in 4.7, 4.6 and 4.5 alike.

Moreover, the first asm needs a "volatile" in order not to get thrown
away. This indicates that there is an error in life analysis because
some part fails to detect that y.a = 0 changes only a part of the
register. The life analysis is correct for the QI part, i.e. for r24
where r28 gets spilled to, but it is wrong for the overall action.

Re: Wrong code: missing input reload

2011-05-26 Thread Georg-Johann Lay

Georg-Johann Lay wrote:

> so there is a missing input reload, i.e. prior to insn 7 there must be
> something like
> 
> (set (reg:HI 28)
>  (reg:HI 24))
> 

Typo, that should read:

(set (reg:HI 24)
 (reg:HI 28))

prior to insn 7.
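
The effect being described can be modelled outside the compiler with a few lines of C. This is only an illustration of the reading given in this message (a partial store that is expected to preserve the other byte), not a statement about what reload is required to do. Pair, store_low_as_generated and store_low_with_input_reload are hypothetical names; y stands for r29:r28 and scratch for the reload register r25:r24.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 16-bit register pair: lo is the QI subreg at offset 0. */
typedef struct { uint8_t lo, hi; } Pair;

/* Sequence as generated: the high byte of y ends up with whatever
   happened to be in the scratch pair.  */
static Pair store_low_as_generated(Pair y, Pair scratch)
{
    scratch.lo = 0; /* insn 7:  (set (reg:QI 24) (const_int 0)) */
    y = scratch;    /* insn 17: (set (reg:HI 28) (reg:HI 24))   */
    return y;
}

/* Same sequence with the copy this message expects before insn 7. */
static Pair store_low_with_input_reload(Pair y, Pair scratch)
{
    scratch = y;    /* (set (reg:HI 24) (reg:HI 28)) */
    scratch.lo = 0; /* insn 7  */
    y = scratch;    /* insn 17 */
    return y;
}
```

With y.hi set to 56 (as the first asm in the test case does) the first function loses that value, while the second keeps it.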

Re: finding the induction variable after graphite (before ivcanon pass)?

2011-05-26 Thread Alexey Kravets


On 05/24/2011 10:09 PM, Sebastian Pop wrote:

> One change that I introduced sometime in February is that some reductions
> are not translated to a zero dim array to make the dependence test work
> on some of the interchange testcases.  With this change, are we going to
> also create privatized copies for the reduction variables that are not
> translated into zero dim arrays?



Hi Sebastian,
Could you please provide some more details about these reductions?
An example of a loop nest or a testcase with such reductions would be very
helpful.
In the current graphite-opencl implementation only zero dim arrays can be
privatized (but local scalar variables are always private).

Currently we are not able to handle reductions between OpenCL kernels,
so loops like this cannot be transformed into a kernel:

for (i = 0; i < N; i++)
        sum += A[i];

We are only able to handle a reduction inside a kernel's body by privatizing
the local reduction variable:

for (j = 0; j < M; j++)
{
        int sum = 0;
        for (i = 0; i < N; i++)
                sum += B[i];

}

In this case we privatize sum[0] (the zero dim array created from the scalar
variable) inside the kernel's body (the outer loop will be replaced by a
kernel launch).

--
Alexey Kravets
kayr...@ispras.ru

Re: finding the induction variable after graphite (before ivcanon pass)?

2011-05-26 Thread Sebastian Pop

On Thu, May 26, 2011 at 09:58, Alexey Kravets wrote:
> On 05/24/2011 10:09 PM, Sebastian Pop wrote:
>>
>> One change that I introduced sometime in February is that some reductions
>> are not translated to a zero dim array to make the dependence test work
>> on some of the interchange testcases.  With this change, are we going to
>> also create privatized copies for the reduction variables that are not
>> translated into zero dim arrays?
>>
>
> Hi Sebastian,
> Could you please provide some more details about these reductions?
> An example of loop nest or testcase with such reductions would be very
> helpful.

I think that most of the interchange-*.c files contain such reductions that
were not previously handled by the data dependence analysis, and thus
not interchanged.  I can see interchange-1.c and interchange-7.c have
reductions.

> In current graphite-opencl implementation only zero dim arrays can be
> privatized (but local scalar variables are always private).
>
> Currently we are not able to handle reduction between OpenCL kernels,
> so loops like this can not be transformed into kernel:
>
> for (i = 0; i < N; i++)
> sum += A[i];
>
> We are only able to handle reduction inside kernel's body by privatizing
> local reduction variable:
>
> for (j = 0; j < M; j++)
> {
>        int sum = 0;
>        for (i = 0; i < N; i++)
>                sum+= B[i];
>
> }
>
> In this case we privatize sum[0] (zero dim array created from scalar
> variable) inside kernel's body (outer loop will be replaced by kernel
> launch).

I see.  Would it be possible to strip mine the reduction loop that you say
is not handled yet, and then translate the partial sums to OpenCL?
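
For illustration, a strip-mined version of the "sum += A[i]" loop could look like the C sketch below. It is plain host code, not graphite or OpenCL output. STRIP, MAX_STRIPS and strip_mined_sum are made-up names, and the strip length of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

enum { STRIP = 4, MAX_STRIPS = 64 }; /* hypothetical sizes */

static int strip_mined_sum(const int *a, size_t n)
{
    int partial[MAX_STRIPS] = {0};
    size_t nstrips = (n + STRIP - 1) / STRIP;
    assert(nstrips <= MAX_STRIPS);

    /* Outer loop over strips: iterations are independent of each other,
       so this is the loop that could become a kernel launch.  */
    for (size_t s = 0; s < nstrips; s++) {
        int sum = 0; /* private to one strip, like the privatized scalar */
        size_t end = (s + 1) * STRIP < n ? (s + 1) * STRIP : n;
        for (size_t i = s * STRIP; i < end; i++)
            sum += a[i];
        partial[s] = sum;
    }

    /* Final reduction over the partial sums stays sequential. */
    int total = 0;
    for (size_t s = 0; s < nstrips; s++)
        total += partial[s];
    return total;
}
```

After the transformation the inner loop has the same shape as the "reduction inside kernel's body" case quoted above, and only the short loop over the partial sums remains as a cross-iteration reduction.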

Thanks,
Sebastian

Re: Wrong code: missing input reload

2011-05-26 Thread Richard Henderson

On 05/26/2011 06:53 AM, Georg-Johann Lay wrote:
> Trying to track faulty code generation because of a missing input
> reload, I got lost in reload and need some help.
> 
> The insn to reload (insn 7) is
> 
> (set (subreg:QI (reg:HI 28) 0)
>  (const_int 0))
> 
> This insn generates one output reload (.ira dump)
> 
> Reloads for insn # 7
> Reload 0: reload_out (HI) = (reg/v:HI 28 r28 [orig:43 y ] [43])
>   GENERAL_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
>   reload_out_reg: (reg/v:HI 28 r28 [orig:43 y ] [43])
>   reload_reg_rtx: (reg:HI 24 r24)
> 
> which eventually generates code
> 
> (insn 7 6 17 2 (set (reg:QI 24 r24)
> (const_int 0 [0])) pr46779-1.c:34 4 {*movqi}
>  (nil))
> 
> (insn 17 7 8 2 (set (reg/v:HI 28 r28 [orig:43 y ] [43])
> (reg:HI 24 r24)) pr46779-1.c:34 10 {*movhi}
>  (nil))
> 
> so there is a missing input reload...

Don't see a strict-low-part here.  Why do you believe that this
should have an input reload?

> i.e. prior to insn 7 there must be
> something like
> 
> (set (reg:HI 28)
>  (reg:HI 24))

Why do you believe that?  It looks to me that we've done exactly
as requested, namely, set the low BITS_PER_UNIT of r28 to zero
while leaving the high BITS_PER_UNIT of r28 undefined.

Perhaps the original subreg shouldn't have been there?


r~

Re: [RFC] alpha/ev6: model 1-cycle cross-cluster delay

2011-05-26 Thread Richard Henderson

On 05/24/2011 08:52 PM, Matt Turner wrote:
> Alpha EV6 and newer can execute four instructions per cycle if correctly
> scheduled. The architecture has two clusters {0, 1}, each with its own
> register file. In each cluster, there are two slots {upper, lower}. Some
> instructions only execute from either upper or lower slots.
> 
> Register values produced in one cluster take 1 cycle to appear in the
> other cluster, so improperly scheduled instructions may incur a cross-
> cluster delay.

Given the lack of control of how insns are dispatched to clusters, this
is essentially an intractable problem.  One can manage clusters only in
extremely rare situations in hand-tuned assembly.  Namely:

(1) One has to start with an empty re-order queue, such as on transition
    to/from PALcode, at the beginning of an align 16 block of code.
(2) One has to pad with lots of nearly-nops in order to keep the dispatch
    to the various pipelines aligned with the programmer's idea of how
    dispatch is occurring.

>  - The CWG lists the latency of unconditional branches and jsr/call
>    instructions as 3, whereas we have 1. I guess this latency value is
>    only meaningful if the instruction produces a value? I'm a bit
>    confused by this value in the CWG since it lists the latency of
>    conditional branches as N/A, while these other types of branches as
>    3, although none produce a register value.

They produce a value -- the return address.  It's $31 in most
unconditional branches, but it's still there.

>  - I also see that fadd/fcmov/fmul instructions take an extra two cycles
>    when the consumer is fst/ftoi, so something similar should be added
>    for them. Can a (define_bypass ...) function specify a latency value
>    greater than the default latency?

Yes.

r~

Re: Wrong code: missing input reload

2011-05-26 Thread Eric Botcazou

> Don't see a strict-low-part here.  Why do you believe that this
> should have an input reload?

This is AVR so QImode is the word mode and the strict-low-part is implicit.

> Perhaps the original subreg shouldn't have been there?

Yes, I'd think that everything in the RTL middle-end expects word-mode subregs 
of double-word-mode hard regs to be simplifiable.

-- 
Eric Botcazou

Re: Wrong code: missing input reload

2011-05-26 Thread Georg-Johann Lay


Eric Botcazou wrote:

>> Perhaps the original subreg shouldn't have been there?
>
> Yes, I'd think that everything in the RTL middle-end expects word-mode subregs
> of double-word-mode hard regs to be simplifiable.

You are right, I was staring at the wrong place. subreg of hardreg
should not be there.

Re: Wrong code: missing input reload

2011-05-26 Thread Eric Botcazou

> You are right, I was staring at the wrong place. subreg of hardreg
> should not be there.

You can take a look at PR target/48830, this is a related problem for the 
SPARC where reload generates:

(set (reg:SI 708 [ D.2989+4 ])
(subreg:SI (reg:DI 72 %f40) 4))

and (subreg:SI (reg:DI 72 %f40) 4) isn't simplifiable either.  H.P. wrote a 
tentative patch for the subreg machinery to forbid this.  Other references are:
  http://gcc.gnu.org/ml/gcc-patches/2008-07/msg01688.html
  http://gcc.gnu.org/ml/gcc-patches/2008-08/msg01743.html

-- 
Eric Botcazou

gcc-4.5-20110526 is now available

2011-05-26 Thread gccadmin

Snapshot gcc-4.5-20110526 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110526/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 174310

You'll find:

 gcc-4.5-20110526.tar.bz2 Complete GCC

  MD5=f6d04cf21c7ffe7e15378458d8d86332
  SHA1=a904dc3bbad7a573ac848a413557c290c2ba2359

Diffs from 4.5-20110519 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.
