Re: define_split

2010-11-10 Thread Paolo Bonzini

On 11/10/2010 12:47 AM, Joern Rennecke wrote:




I remember that it has been there even before the GNU GCC project started
using cvs.  Fortunately, we still have the translated history from RCS
going back even further... but the earliest mention of find_split_point
in combine.c is shown as having been added in 'revision' 357 - the
same one that combine.c was brought under RCS control, in February 1992.


find_split_point is something different, it is used to find a place 
where it should make sense to split a single insn into two.  You're 
thinking of split_insns which is just a little bit younger (r727, April 
1992).


Paolo



software pipelining

2010-11-10 Thread roy rosen
Hi,

I was wondering if gcc has software pipelining.
I saw options -fsel-sched-pipelining -fselective-scheduling
-fselective-scheduling2 but I don't see any pipelining happening
(tried with ia64).
Is there a gcc VLIW port in which I can see it working?

For an example function like

int nor(char* __restrict__ c, char* __restrict__ d)
{
int i, sum = 0;
for (i = 0; i < 256; i++)
d[i] = c[i] << 3;
return sum;
}

with no pipelining a code like

r1 = 0
r2 = c
r3 = d
_startloop
if r1 == 256 jmp _end
r4 = [r2]+
r4 >>= r4
[r3]+ = r4
r1++
jmp _startloop
_end

here inside the loop there is a data dependency between all 3 insns
(only the r1++ is independent) which does not permit any parallelism

with pipelining I expect a code like

r1 = 2
r2 = c
r3 = d
// peel first iteration
r4 = [r2]+
r4 >>= r4
r5 = [r2]+
_startloop
if r1 == 256 jmp _end
[r3]+ = r4 ; r4 >>= r5 ; r5 = [r2]+
r1++
jmp _startloop
_end

Now the data dependency is broken and parallelism is possible.
As I said I could not see that happening.
Can someone please tell me on which port and with what options can I
get such a result?
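In C terms, the transformation sketched above corresponds roughly to the following (a hedged sketch of manual software pipelining; note the pseudo-code writes `r4 >>= r4` where the C source shifts left by 3, so the C form below follows the source):

```c
#include <string.h>

/* Naive loop: each iteration's load, shift and store form one
   dependency chain, as in the first pseudo-code above. */
static void shift_naive(const char *c, char *d)
{
    for (int i = 0; i < 256; i++)
        d[i] = c[i] << 3;
}

/* Software-pipelined form: the work of the first two iterations is
   partially peeled, so in the steady-state body the store, the shift
   and the next load touch different elements and could issue in
   parallel on a VLIW. */
static void shift_pipelined(const char *c, char *d)
{
    char shifted = c[0] << 3;   /* shifted value for iteration 0 */
    char loaded  = c[1];        /* loaded value for iteration 1  */
    for (int i = 0; i < 254; i++) {
        d[i] = shifted;          /* store iteration i      */
        shifted = loaded << 3;   /* shift for iteration i+1 */
        loaded = c[i + 2];       /* load  for iteration i+2 */
    }
    d[254] = shifted;            /* drain the pipeline */
    d[255] = loaded << 3;
}
```

Both functions compute the same result; only the dependency structure inside the loop body differs.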

Thanks, Roy.


Re: peephole2: dead regs not marked as dead

2010-11-10 Thread Paolo Bonzini

On 11/10/2010 11:58 AM, Georg Lay wrote:

In the old 3.4.x (private port) I introduced a target hook in combine,
just prior to where recog_for_combine gets called. The hook did some
canonicalization of rtx and thereby considerably reduced the number of
patterns that would have been necessary without that hook. It
transformed some unspecs.


How is that related to delegitimize_address?


A second use case of such a hook could look as follows: Imagine a
combined pattern that does not match but would be legal if the backend
knew that some register contains some specific value like, e.g.,
non-negative, zero-or-storeflagvalue, combiner has proved that some
bits will always be set or always be cleared;


You can use nonzero_bits or num_sign_bit_copies in define_splits.  In 
_this_ case, a define_insn_or_split doesn't really help, I agree with 
Joern on this. :)



II.
Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2.

Combine combines A+B and sees costs f(A+B) = 5. This makes combine
reject the pattern, not looking deeper.


Does it really do that?  I see nothing in combine_instructions which 
would prevent going deeper.


Paolo


Re: peephole2: dead regs not marked as dead

2010-11-10 Thread Georg Lay
Michael Meissner schrieb:

> In particular, go to pages 5-7 of my tutorial (chapter 6) where I talk about
> scratch registers and allocating a new pseudo in split1 which is now always
> run.

Ah great! That's the missing link. The pseudo hatches in split1 from
scratch.

This makes combine even more powerful, and split1 can be seen as expand2 :-)

imho, combine could be slightly improved by some minor modifications:

I.
In the old 3.4.x (private port) I introduced a target hook in combine,
just prior to where recog_for_combine gets called. The hook did some
canonicalization of rtx and thereby considerably reduced the number of
patterns that would have been necessary without that hook. It
transformed some unspecs.

A second use case of such a hook could look as follows: Imagine a
combined pattern that does not match but would be legal if the backend
knew that some register contains some specific value like, e.g.,
non-negative, zero-or-storeflagvalue, combiner has proved that some
bits will always be set or always be cleared; that kind of things. The
hook could accept the pattern (provided it passes recog_for_combine;
in that setup the hook dispatches to recog_for_combine, so it has the
opportunity to canonicalize the pattern or take some decision depending
on the insn matched) if the backend is allowed to see the proof
(pattern ok), or reject it if there is no such proof (pattern rejected).

In the current setup you could write more complex patterns to make the
combiner look deeper into things. That implies writing an insn that
has the information explicit in the pattern. Most probably you are
going to split the stuff in split1 again. A little bit more explicit:

Suppose f(x) is the pattern in question, just cooked up by combine.
You have an insn that can do f(x) under the condition that x \in {0,1}
but not f(x) generally, and combine doesn't look deeper into things
for some reason. In a case where combine can prove that x will
always be 0 or 1 it is legal to accept f(x), otherwise not.

The current situation, however, goes like this: in the case where
the source happens to be like x &= 1; f(x) and there is a combine
bridge that can do f(x & 1), combine might take a try. f(x & 1)
matches, and in split1 things get split as x &= 1; f(x). Now you have
the knowledge that x is either 0 or 1 and you can replace (or directly
split it as) x &= 1; g(x) with an insn g(x) that is cheaper than f(x).

Note that the first case is more general and needs no combine bridge.
The only thing that has to be done is replacing f(x) with g(x), which
is legal under the assumption x \in {0,1}.
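A small stand-in for the point above: if x is known to be 0 or 1, a generally valid f(x) can be replaced by a cheaper g(x) that is only correct under that assumption. Here f and g are hypothetical examples, not real insns:

```c
/* With x in {0,1}, -x is either 0 or all-ones, so (0u - x) & k
   computes the same value as x * k without a multiply.  g is only
   legal under the proof that x \in {0,1}; f works for any x. */
static unsigned f(unsigned x, unsigned k) { return x * k; }         /* general case     */
static unsigned g(unsigned x, unsigned k) { return (0u - x) & k; }  /* needs x in {0,1} */
```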


II.
Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2.

Combine combines A+B and sees costs f(A+B) = 5. This makes combine
reject the pattern, not looking deeper. If f(A+B+C) = 2 you either
will never see A+B+C, or you have to lie about cost and say f(A+B) =
2. Ok. As combine is myopic (I do not blame combine here, it's complex
enough) you go and write the combine bridge A+B. But in situations
where A+B matches and there is nothing like C, we get a pessimization.
split1 could fix that by splitting A+B back into A, B, but that
clutters up the machine description. Combine could instead go like
this:

a) f(A+B) = 5 is more expensive than f(A)+f(B) = 4
b) do the combine A+B
c) look deeper
d1) there is match A+B+C cost f(A+B+C) = 2 => accept
d2) there is match A+B+C cost f(A+B+C) = 12 => costly => reject
d3) there is no match A+B+C => A+B is costly => reject
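The steps a)-d3) above, spelled out as a decision procedure (a hypothetical sketch, not combine.c API: when A+B looks too expensive, peek one insn deeper before rejecting):

```c
#include <stdbool.h>

/* Decide whether to keep a two-insn combination, optionally looking
   one insn further before rejecting on cost grounds. */
static bool accept_combination(int cost_a, int cost_b, int cost_c,
                               int cost_ab,
                               bool abc_matches, int cost_abc)
{
    if (cost_ab <= cost_a + cost_b)
        return true;                 /* cheap enough: accept as today */
    /* a) A+B is costly -- b), c) combine anyway and look deeper.     */
    if (abc_matches)
        return cost_abc <= cost_a + cost_b + cost_c;  /* d1) vs d2)   */
    return false;                    /* d3) no A+B+C match: reject    */
}
```

With the example costs f(A) = f(B) = f(C) = 2 and f(A+B) = 5, the call accepts when f(A+B+C) = 2 and rejects when f(A+B+C) = 12 or when no A+B+C match exists.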

Skimming combine dumps, I see that most patterns get rejected because
they do not match; just a few insns get rejected because of higher cost.
So looking deeper even when the costs are higher may give better code
while increasing compile time only a little.

Case II is similar to tunneling known from physics.

Georg





Re: software pipelining

2010-11-10 Thread Andrey Belevantsev

Hi,

On 10.11.2010 12:32, roy rosen wrote:

Hi,

I was wondering if gcc has software pipelining.
I saw options -fsel-sched-pipelining -fselective-scheduling
-fselective-scheduling2 but I don't see any pipelining happening
(tried with ia64).
Is there a gcc VLIW port in which I can see it working?
You need to try -fmodulo-sched.  Selective scheduling works by default on 
ia64 with -O3, otherwise you need -fselective-scheduling2 
-fsel-sched-pipelining.  Note that selective scheduling disables autoinc 
generation for the pipelining to work, and modulo scheduling will likely 
refuse to pipeline a loop with autoincs.


Modulo scheduling implementation in GCC may be improved, but that's a 
different topic.


Andrey







Re: define_split

2010-11-10 Thread Andrew Stubbs

On 09/11/10 22:54, Michael Meissner wrote:

The split pass would then break this back into three insns:

(insn ... (set (reg:SF ACC_REGISTER)
   (mult:SF (reg:SF 124)
(reg:SF 125

(insn ... (set (reg:SF ACC_REGISTER)
   (plus:SF (reg:SF ACC_REGISTER)
(reg:SF 127

(insn ... (set (reg:SF 126)
   (reg:SF ACC_REGISTER)))

Now, if you just had the split and no define_insn, combine would try and form
the (plus (mult ...)) and not find an insn to match, so while it had the
temporary insn created, it would try to immediately split the insn, so at the
end of the combine pass, you would have:

(insn ... (set (reg:SF ACC_REGISTER)
   (mult:SF (reg:SF 124)
(reg:SF 125

(insn ... (set (reg:SF ACC_REGISTER)
   (plus:SF (reg:SF ACC_REGISTER)
(reg:SF 127

(insn ... (set (reg:SF 126)
   (reg:SF ACC_REGISTER)))


I'm trying to follow this example for my own education, but these two 
example results appear to be identical.


Presumably this isn't deliberate?

Andrew


Idea - big and little endian data areas using named address spaces

2010-11-10 Thread David Brown
Would it be possible to use the named address space syntax to implement 
reverse-endian data?  Conversion between little-endian and big-endian 
data structures is something that turns up regularly in embedded 
systems, where you might well be using two different architectures with 
different endianness.  Some compilers offer direct support for endian 
swapping, but gcc has no neat solution.  You can use the 
__builtin_bswap32 (but no __builtin_bswap16?) function in recent 
versions of gcc, but you still need to handle the swapping explicitly.
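The explicit style described above looks something like this (a sketch; `__builtin_bswap32` is the real GCC builtin mentioned in the text, while the 16-bit swap is written out by hand since `__builtin_bswap16` was missing, and the struct layout and accessor names are just examples):

```c
#include <stdint.h>

/* Every access to a foreign-endian field goes through a swap. */
struct wire { uint32_t a; uint16_t b; };

/* Hand-written 16-bit swap, standing in for the missing builtin. */
static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x >> 8) | (x << 8));
}

/* Accessors that hide the swapping -- exactly the boilerplate that a
   __swapendian qualifier would make unnecessary. */
static uint32_t get_a(const struct wire *w) { return __builtin_bswap32(w->a); }
static uint16_t get_b(const struct wire *w) { return bswap16(w->b); }
```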


Named address spaces would give a very neat syntax for using such 
byte-swapped areas.  Ideally you'd be able to write something like:


__swapendian struct { int a; short b; } data;

and every access to data.a and data.b would be endian-swapped.  You 
could also have __bigendian and __littleendian defined to __swapendian or 
blank depending on the native ordering of the target.



I've started reading a little about how named address spaces work, but I 
don't know enough to see whether this is feasible or not.



Another addition in a similar vein would be __nonaligned, for targets 
which cannot directly access non-aligned data.  The loads and stores 
would be done byte-wise for slower but correct functionality.




Re: Dedicated logical instructions

2010-11-10 Thread Radu Hobincu
Thank you, that worked out eventually.

However, now I have another problem. I have 2 instructions in the ISA:
'where' and 'endwhere' which modify the behavior of the instructions put
in between them. I made a macro with inline assembly for each of them. The
problem is that since `endwhere` doesn't have any operands and doesn't
clobber any registers, the GCC optimization reorders it and places the
`endwhere` immediately after `where` leaving all the instructions outside
the block.

A hack-solution came in mind, and that is specifying that the asm inline
uses all the registers as operands without actually placing them in the
instruction mnemonic. The problem is I don't know how to write that
especially when I don't know the variables names (I want to use the same
macro in more than one place).


These are the macros:

#define WHERE(_condition)   \
__asm__ __volatile__("move %0 %0, wherenz 0xf"  \
:   \
: "v" (_condition)  \
);

#define ENDWHERE\
__asm__ __volatile__("nop, endwhere");



This is the C code:
vector doSmth(vector a, vector b){
WHERE(LT(a, b))
  a++;
ENDWHERE
return a;
}

And this is what cc1 -O3 outputs:

;# 113
"/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/include/connex.h"
1
lt R31 R16 R17
;# 4 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
move R31 R31, wherenz 0xf
;# 6 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
nop, endwhere
iadd R31 R16 1

You can see that the `nop,endwhere` and the `iadd ...` insns are inverted.
I think this is similar to having instructions for enabling and disabling
interrupts: the instructions have no operands, but the compiler shouldn't
move the block in between them for optimization.

Thank you, and please, if I waste too much of your time with random
questions, tell me and I will stop. :)

Regards,
R.

> "Radu Hobincu"  writes:
>
>> I have another, quick question: I have dedicated logical instructions
in
>> my RISC machine (lt - less than, gt - greater than, ult - unsigned less
than, etc.). I'm also working on adding instructions for logical OR, AND,
>> NOT, XOR. While reading GCC internals, I've stumbled on this:
>> "Except when they appear in the condition operand of a COND_EXPR,
logical
>> `and` and `or` operators are simplified as follows: a = b && c
>> becomes
>> T1 = (bool)b;
>> if (T1)
>>   T1 = (bool)c;
>> a = T1;"
>> I really, really don't want this. Is there any way I can define the
instructions in the .md file so the compiler generates code for
computing
>> a boolean expression without using branches (using these dedicated
insns)?
>
> That is the only correct way to implement && and || in C, C++, and other
similar languages.  The question you should be asking is whether gcc will
be able to put simple cases without side effects back together again.  The
answer is that, yes, it should be able to do that.
>
> You should not worry about this level of things when it comes to writing
your backend port.  Language level details like this are handled by the
frontend, not the backend.  When your port is working, come back to this
and make sure that you get the kind of code you want.
>
> Ian
>







[gimplefe] Merge trunk r166509

2010-11-10 Thread Diego Novillo
Two merges, actually.  The first one at r165633, the second at
r166509.

Bootstrapped and tested on x86_64.


Diego.


Re: Idea - big and little endian data areas using named address spaces

2010-11-10 Thread Andreas Krebbel
David,

for s390 we would also be interested in a solution like that. s390 is big endian, and
endianness conflicts often get in the way when porting an application from 'other'
platforms.  Linus Torvalds made a quite similar suggestion back in 2001, using a type
attribute:

http://gcc.gnu.org/ml/gcc/2001-12/msg00932.html

It would probably also make sense to distinguish between byte and bit endianness.

Bye,

-Andreas-


Re: Dedicated logical instructions

2010-11-10 Thread Ian Lance Taylor
"Radu Hobincu"  writes:

> However, now I have another problem. I have 2 instructions in the ISA:
> 'where' and 'endwhere' which modify the behavior of the instructions put
> in between them. I made a macro with inline assembly for each of them. The
> problem is that since `endwhere` doesn't have any operands and doesn't
> clobber any registers, the GCC optimization reorders it and places the
> `endwhere` immediately after `where` leaving all the instructions outside
> the block.

That's tricky in general.  You want an absolute barrier, but gcc doesn't
really provide one that can be used in inline asm.  The closest you can
come is by adding a clobber of "memory":
  asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory");
That will block all instructions that load or store from memory from
moving across the barrier.  However, it does not currently block
register changes from moving across the barrier.  I don't know whether
that matters to you.
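One common workaround for the register-move limitation is to tie the value computed between the two asms into the asms' operand lists, so the compiler sees a data dependency it must respect. The sketch below uses an empty asm template instead of the real `where`/`endwhere` mnemonics so that it assembles on any GCC target; only the constraint technique is the point, and the masking behaviour itself is not modeled:

```c
/* "g"(cond) makes the first asm consume the condition; "+g"(v) makes
   the second asm both read and write v, so computations feeding v
   cannot be sunk past it. */
#define WHERE(cond)  __asm__ __volatile__("" : : "g" (cond) : "memory")
#define ENDWHERE(v)  __asm__ __volatile__("" : "+g" (v) : : "memory")

static int guarded_increment(int a, int b)
{
    WHERE(a < b);
    a++;          /* the "+g"(a) below depends on this increment,  */
    ENDWHERE(a);  /* so the compiler cannot move it past the asm   */
    return a;
}
```

This keeps the ordering but, as noted, it is not an absolute barrier; anything not named in the operand lists can still move.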

You didn't really describe what these instructions do, but they sound
like looping instructions which ideally gcc would generate itself.  They
have some similarity to the existing doloop pattern, q.v.  If you can
get gcc to generate the instructions itself, then it seems to me that
you will get better code in general and you won't have to worry about
this issue.

Ian


Re: peephole2: dead regs not marked as dead

2010-11-10 Thread Georg Lay
Paolo Bonzini schrieb:
> On 11/10/2010 11:58 AM, Georg Lay wrote:
>> In the old 3.4.x (private port) I introduced a target hook in combine,
>> just prior to where recog_for_combine gets called. The hook did some
>> canonicalization of rtx and thereby considerably reduced the number of
>> patterns that would have been necessary without that hook. It
>> transformed some unspecs.
> 
> How is that related to delegitimize_address?

It has nothing to do with delegitimize_address.

Ok, let me tell the story and go into the mess. Some users wanted to
have a bit data type in C and to write down variables that represent a
bit in memory or in auto or as parameter. (I strongly recommended not
to do such hacking in a compiler because it is not within C and for
reasons you all know. For reasons you also know, the project was
agreed ...).

There was already some hacking in gcc when I started and users
complained about superfluous zero-extract here and there. No bloat,
just a trade-off compiler vs. asm. tricore is capable of loading a
byte from absolute address space (LD.BU) and has instructions to
operate on any bit (and, or, xor, jmp, compare-accumulate-bitop,
extract, insert, etc.). New type _bit was mapped to BImode. To see
where the inconveniences are, let's have a look at a comment

;;   The TriCore instruction set has many instructions that deal with
;;   bit operands. These bits are then given in two operands: one operand
;;   is the register that contains the bit, the other operand specifies
;;   the bit's position within that register as an immediate, i.e. the
;;   bit's position must be a runtime constant.
;;
;;   A new relocation type `bpos' for the bit's position within a byte had
;;   been introduced, enabling bits to be packed in memory without the need
;;   to pack them into bitfields.
;;
;;   Such an instruction sequence may look like
;;
;; ld.bu   %d0, the_bit # insn A
;; jz.t%d0, bpos:the_bit, .some_label   # insn A
;;
;;   The compiler could, of course, emit such instruction sequences.
;;   However, emitting such sequences "en bloc" has several drawbacks:
;;
;;   -- every time we need a bit the bit must be read
;;   -- emitting instruction en bloc knocks out the scheduler
;;   -- when writing all the cases as combiner patterns the number of
;;  insn patterns would explode
;;
;;   Therefore, a bit could have been described as BImode, which would
;;   lead to the following instruction sequence
;;
;; ld.bu   %d0, the_bit# insn A
;; extr.u  %d0, %d0,bpos:the_bit, 1# insn A
;; jz.t%d0, 0, .some_label # insn B
;;
;;   I.e. a bit is represented as BImode and therefore always resides
;;   in the LSB. The advantage is that the bit now lives in a register
;;   and can be reused. The disadvantage is that
;;
;;   -- often, there is a superfluous instruction (extr.u)
;;   -- the scheduler cannot schedule ld.bu away from extr.u
;;   -- for combiner patterns the same as above applies
;;
;;   BPOS representation
;;   ===
;;
;;   Thus, we need a way to tag a register with the bit's position.
;;   I use the new LOADBI and BPOS rtx_code, here in an assembler dump
;;   from gcc which shows the 1-to-1 correspondence between insn and
;;   machine instruction:
;;
;; # (set (reg:SI 0)
;; #  (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit")))
;; # (symbol_ref:SI ("?the_bit"
;; ld.bu   %d0, the_bit# insn A
;;
;; # (set (pc)
;; #  (if_then_else (eq (bpos:BI (reg:SI 0)
;; # (symbol_ref:SI ("?the_bit")))
;; #(const_int 0))
;; #(label_ref 42)
;; #(pc)))
;; jz.t%d0, bpos:the_bit, .some_label  # insn B
;;
;;   The sole SYMBOL_REFs are the tags.
;;   The reason to use own RTX code and not UNSPEC is because the combiner
;;   does not like UNSPEC and the code would not be as dense as could be.
;;
;;   Using ZERO_EXTRACT for LOADBI does not work either because a
;;   ZERO_EXTRACT may not be used as lvalue, which is needed when loading a bit.
;;   The ZERO_EXTRACT on the left side would assume that the register that
;;   is to be loaded already lives, because the ZERO_EXTRACT affects
;;   just some bits, not all.
;;
;;   Note that a LOADBI in this context is a load together with a shift
;;   left, and a BPOS is a shift right, i.e. equivalent to some ZERO_EXTRACT.
;;
;;   The tag in a BPOS may also be a CONST_INT
;;   in which case the bit's position is a compile time constant.
;;
;;   The output mode of a BPOS (M1) need not be BI; it may also be
;;   some wider mode like QI or SI. A wider mode represents a BPOS that is
;;   zero extended to that mode. The input mode's bit size must be at most
;;   one plus the bit's position (8 for a SYMBOL_REF).
;;
;;   BPOS Operands
;;   =
;;
;;   Next, I introduced BPOS operands, i.e. a 

Re: Dedicated logical instructions

2010-11-10 Thread Radu Hobincu
> "Radu Hobincu"  writes:
>
>> However, now I have another problem. I have 2 instructions in the ISA:
>> 'where' and 'endwhere' which modify the behavior of the instructions put
>> in between them. I made a macro with inline assembly for each of them.
>> The
>> problem is that since `endwhere` doesn't have any operands and doesn't
>> clobber any registers, the GCC optimization reorders it and places the
>> `endwhere` immediately after `where` leaving all the instructions
>> outside
>> the block.
>
> That's tricky in general.  You want an absolute barrier, but gcc doesn't
> really provide one that can be used in inline asm.  The closest you can
> come is by adding a clobber of "memory":
>   asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory");
> That will block all instructions that load or store from memory from
> moving across the barrier.  However, it does not currently block
> register changes from moving across the barrier.  I don't know whether
> that matters to you.

It does matter, unfortunately. I've tried a memory clobber with the same
result (the addition in the example doesn't do any memory loads/stores).

> You didn't really describe what these instructions do, but they sound
> like looping instructions which ideally gcc would generate itself.  They
> have some similarity to the existing doloop pattern, q.v.  If you can
> get gcc to generate the instructions itself, then it seems to me that
> you will get better code in general and you won't have to worry about
> this issue.
>
> Ian
>

I have 16 vector registers in the machine, R16-R31, each with 128
cells of 16 bits. These support ALU operations and loads/stores just
like normal registers, but in one clock. So an

add R16 R17 R18

will add the whole R17 array with R18 (corresponding cells) and place the
result in R16. The 'where' instruction places a mask on the array so the
operation is done only where a certain condition is met. In the example in
the previous e-mail, where `a` is less than `b`. I've read the description
of doloop and I don't think I can use it in this case. I'll have to dig
more or settle with -O0 and cry.
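A plain-C model of the masked semantics described above (hypothetical, for illustration only; the cell count and register behaviour follow the description, while the function names are made up):

```c
#include <stdint.h>

#define CELLS 128   /* per the description: 128 cells of 16 bits */

typedef struct { int16_t v[CELLS]; } vec;

/* `where` computes a per-cell mask from a condition... */
static void where_lt(const vec *a, const vec *b, uint8_t mask[CELLS])
{
    for (int i = 0; i < CELLS; i++)
        mask[i] = a->v[i] < b->v[i];
}

/* ...and operations issued before `endwhere` only write the cells
   whose mask bit is set, like `a++` inside WHERE(LT(a, b)). */
static void masked_inc(vec *a, const uint8_t mask[CELLS])
{
    for (int i = 0; i < CELLS; i++)
        if (mask[i])
            a->v[i]++;
}
```

This is essentially the conditional-execution behaviour that GCC models with cond_exec, as suggested later in the thread.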

Thank you, anyway!
R.



Re: Dedicated logical instructions

2010-11-10 Thread Uday P. Khedker





I have 16 vectorial registers in the machine R16-R31 which all have 128
cells of 16 bits each. These support ALU operations and load/stores just
as normal registers, but in one clock. So an

add R16 R17 R18

will add the whole R17 array with R18 (corresponding cells) and place the
result in R16. The 'where' instruction places a mask on the array so the
operation is done only where a certain condition is met. In the example in
the previous e-mail, where `a` is less than `b`. I've read the description
of doloop and I don't think I can use it in this case. I'll have to dig
more or settle with -O0 and cry.


Is it possible to abstract out such pieces of code in the input program in an
independent function whose prologue and epilogue have the necessary setting?

Just curious.

Uday.


Re: Idea - big and little endian data areas using named address spaces

2010-11-10 Thread Chris Lattner

On Nov 10, 2010, at 4:00 AM, David Brown wrote:

> Would it be possible to use the named address space syntax to implement 
> reverse-endian data?  Conversion between little-endian and big-endian data 
> structures is something that turns up regularly in embedded systems, where 
> you might well be using two different architectures with different 
> endianness.  Some compilers offer direct support for endian swapping, but gcc 
> has no neat solution.  You can use the __builtin_bswap32 (but no 
> __builtin_bswap16?) function in recent versions of gcc, but you still need to 
> handle the swapping explicitly.
> 
> Named address spaces would give a very neat syntax for using such 
> byte-swapped areas.  Ideally you'd be able to write something like:
> 
> __swapendian stuct { int a; short b; } data;
> 
> and every access to data.a and data.b would be endian-swapped.  You could 
> also have __bigendian and __litteendian defined to __swapendian or blank 
> depending on the native ordering of the target.
> 
> 
> I've started reading a little about how named address spaces work, but I 
> don't know enough to see whether this is feasible or not.
> 
> 
> Another addition in a similar vein would be __nonaligned, for targets which 
> cannot directly access non-aligned data.  The loads and stores would be done 
> byte-wise for slower but correct functionality.

Why not just handle this in the frontend during gimplification?

-Chris


Re: Idea - big and little endian data areas using named address spaces

2010-11-10 Thread Paul Koning
C++ lets you define explicit-order integer types and hide all the conversions.  
I used that a couple of jobs ago, back around 1996 or so -- worked nicely, and 
should work even better now that C++ is more mature.

paul

On Nov 10, 2010, at 7:00 AM, David Brown wrote:

> Would it be possible to use the named address space syntax to implement 
> reverse-endian data?  Conversion between little-endian and big-endian data 
> structures is something that turns up regularly in embedded systems, where 
> you might well be using two different architectures with different 
> endianness.  Some compilers offer direct support for endian swapping, but gcc 
> has no neat solution.  You can use the __builtin_bswap32 (but no 
> __builtin_bswap16?) function in recent versions of gcc, but you still need to 
> handle the swapping explicitly.
> 
> Named address spaces would give a very neat syntax for using such 
> byte-swapped areas.  Ideally you'd be able to write something like:
> 
> __swapendian stuct { int a; short b; } data;
> 
> and every access to data.a and data.b would be endian-swapped.  You could 
> also have __bigendian and __litteendian defined to __swapendian or blank 
> depending on the native ordering of the target.
> 
> 
> I've started reading a little about how named address spaces work, but I 
> don't know enough to see whether this is feasible or not.
> 
> 
> Another addition in a similar vein would be __nonaligned, for targets which 
> cannot directly access non-aligned data.  The loads and stores would be done 
> byte-wise for slower but correct functionality.
> 



Re: Dedicated logical instructions

2010-11-10 Thread Ian Lance Taylor
"Radu Hobincu"  writes:

> I have 16 vectorial registers in the machine R16-R31 which all have 128
> cells of 16 bits each. These support ALU operations and load/stores just
> as normal registers, but in one clock. So an
>
> add R16 R17 R18
>
> will add the whole R17 array with R18 (corresponding cells) and place the
> result in R16. The 'where' instruction places a mask on the array so the
> operation is done only where a certain condition is met. In the example in
> the previous e-mail, where `a` is less than `b`. I've read the description
> of doloop and I don't think I can use it in this case. I'll have to dig
> more or settle with -O0 and cry.

Ah, that sounds like a straight conditional execution model, which gcc
implements via cond_exec.  Look for define_cond_exec in the manual.

Ian


Re: Idea - big and little endian data areas using named address spaces

2010-11-10 Thread David Brown

On 10/11/10 17:55, Chris Lattner wrote:


On Nov 10, 2010, at 4:00 AM, David Brown wrote:


Would it be possible to use the named address space syntax to
implement reverse-endian data?  Conversion between little-endian
and big-endian data structures is something that turns up regularly
in embedded systems, where you might well be using two different
architectures with different endianness.  Some compilers offer
direct support for endian swapping, but gcc has no neat solution.
You can use the __builtin_bswap32 (but no __builtin_bswap16?)
function in recent versions of gcc, but you still need to handle
the swapping explicitly.

Named address spaces would give a very neat syntax for using such
byte-swapped areas.  Ideally you'd be able to write something
like:

__swapendian stuct { int a; short b; } data;

and every access to data.a and data.b would be endian-swapped.  You
could also have __bigendian and __litteendian defined to
__swapendian or blank depending on the native ordering of the
target.


I've started reading a little about how named address spaces work,
but I don't know enough to see whether this is feasible or not.


Another addition in a similar vein would be __nonaligned, for
targets which cannot directly access non-aligned data.  The loads
and stores would be done byte-wise for slower but correct
functionality.


Why not just handle this in the frontend during gimplification?



I don't know if this is possible or not - I'm just making a suggestion 
that occurred to me after another recent thread about named address 
spaces, and since I recently worked on a program that involved endian 
swapping.


The other natural way to handle endian swapping would be a variable 
attribute (this was Linus's suggestion, following the link in a previous 
reply).


If you think it would be hard or inefficient to implement endian 
swapping as a memory space, then that's a good enough answer for me.




Re: Idea - big and little endian data areas using named address spaces

2010-11-10 Thread David Brown

On 10/11/10 18:05, Paul Koning wrote:

C++ lets you define explicit-order integer types and hide all the
conversions.  I used that a couple of jobs ago, back around 1996 or
so -- worked nicely, and should work even better now that C++ is more
mature.



Yes, a lot of such things can be done with C++ classes.  But I see 
several advantages in implementing it in the compiler rather than classes.


First, it will be available in C as well as C++.  For lots of reasons, 
many programs are written in C and not C++.


Secondly, the best way to implement endian re-ordering varies from 
target to target.  Some targets have assembly instructions that can be 
used, requiring inline assembly.  Some work best using shifts and masks, 
others using union types.  You can't make a C++ class that will give the 
best code on many targets without making it very messy, so typically it 
will be re-implemented for each target.


It's easy to make mistakes with a C++ class using casting operators to do 
the endian swapping.  Miss out some details, and you might find the 
compiler omitting the endian swap somewhere.


Finally, if you want a struct with endian-swapped members, then each 
member must be explicitly declared as the appropriate class.  You can't 
swap the entire struct at once, as you could do with a memory space (or 
attribute) solution.


mvh.,

David



paul

On Nov 10, 2010, at 7:00 AM, David Brown wrote:


Would it be possible to use the named address space syntax to
implement reverse-endian data?  Conversion between little-endian
and big-endian data structures is something that turns up regularly
in embedded systems, where you might well be using two different
architectures with different endianness.  Some compilers offer
direct support for endian swapping, but gcc has no neat solution.
You can use the __builtin_bswap32 (but no __builtin_bswap16?)
function in recent versions of gcc, but you still need to handle
the swapping explicitly.

Named address spaces would give a very neat syntax for using such
byte-swapped areas.  Ideally you'd be able to write something
like:

__swapendian struct { int a; short b; } data;

and every access to data.a and data.b would be endian-swapped.  You
could also have __bigendian and __littleendian defined to
__swapendian or blank depending on the native ordering of the
target.


I've started reading a little about how named address spaces work,
but I don't know enough to see whether this is feasible or not.


Another addition in a similar vein would be __nonaligned, for
targets which cannot directly access non-aligned data.  The loads
and stores would be done byte-wise for slower but correct
functionality.









[CFP] Reminder: GCC Research Opportunities Workshop 2011

2010-11-10 Thread David Edelsohn

 CALL FOR PAPERS

 3rd Workshop on
GCC Research Opportunities
   (GROW 2011)

http://grow2011.inria.fr

  2/3 April 2011, Chamonix, France

   (co-located with CGO 2011)


The GROW workshop focuses on current challenges in research and development of
compiler analyses and optimizations based on the free GNU Compiler
Collection (GCC). The goal of this workshop is to bring together people from
industry and academia that are interested in conducting research based on
GCC and enhancing this compiler suite for research needs. The workshop will
promote and disseminate compiler research (recent, ongoing or planned) with
GCC, as a robust industrial-strength vehicle that supports free and
collaborative research. The program will include an invited talk and a
discussion panel on future research and development directions of GCC.

 Topics of interest 

 Any issue related to innovative program analysis, optimizations and run-time
 adaptation with GCC including but not limited to:

 * Classical compiler analyses, transformations and optimizations
 * Power-aware analyses and optimizations
 * Language/Compiler/HW cooperation
 * Optimizing compilation tools for heterogeneous/reconfigurable/
   multicore systems
 * Tools to improve compiler configurability and retargetability
 * Profiling, program instrumentation and dynamic analysis
 * Iterative and collective feedback-directed optimization
 * Case studies and performance evaluations
 * Techniques and tools to improve usability and quality of GCC
 * Plugins to enhance research capabilities of GCC

 Paper Submission Guidelines 

Submitted papers should be original and not published or submitted for
publication elsewhere; papers similar to published or submitted work must
include an explicit explanation. Papers should use the LNCS format and
should be 12 pages maximum. The submission procedure will be posted on the
GROW 2011 website in due time.

Papers will be refereed by the Program Committee and, if accepted and if the
authors wish, will be made available on the workshop web site.

 Important Dates 

Deadline for submission: 31 January 2011
Decision notification:  28 February 2011
Workshop: 2/3 April 2011 full-day

 Organizers 

 David Edelsohn, IBM, USA
 Erven Rohou, INRIA, France

 Program Committee 

 Zbigniew Chamski, Infrasoft IT Solutions, Poland
 Albert Cohen, INRIA, France
 David Edelsohn, IBM, USA
 Björn Franke, University of Edinburgh, UK
 Grigori Fursin, EXATEC Lab, France
 Benedict Gaster, AMD, USA
 Jan Hubicka, SUSE
 Paul H.J. Kelly, Imperial College of London, UK
 Ondrej Lhotak, University of Waterloo, Canada
 Hans-Peter Nilsson, Axis Communications, Sweden
 Diego Novillo, Google, Canada
 Dorit Nuzman, IBM, Israel
 Andrea Ornstein, STMicroelectronics, Italy
 Sebastian Pop, AMD, USA
 Erven Rohou, INRIA, France
 Ian Lance Taylor, Google, USA
 Chengyong Wu, ICT, China
 Kenneth Zadeck, NaturalBridge, USA
 Ayal Zaks, IBM, Israel