Re: define_split
On 11/10/2010 12:47 AM, Joern Rennecke wrote: I remember that it has been there even before the GNU GCC project started using cvs. Fortunately, we still have the translated history from RCS going back even further... but the earliest mention of find_split_point in combine.c is shown as having been added in 'revision' 357 - the same one in which combine.c was brought under RCS control, in February 1992. find_split_point is something different: it is used to find a place where it should make sense to split a single insn into two. You're thinking of split_insns, which is just a little bit younger (r727, April 1992). Paolo
software pipelining
Hi, I was wondering if gcc has software pipelining. I saw options -fsel-sched-pipelining -fselective-scheduling -fselective-scheduling2 but I don't see any pipelining happening (tried with ia64). Is there a gcc VLIW port in which I can see it working? For an example function like

int nor(char* __restrict__ c, char* __restrict__ d)
{
    int i, sum = 0;
    for (i = 0; i < 256; i++)
        d[i] = c[i] << 3;
    return sum;
}

with no pipelining a code like

    r1 = 0
    r2 = c
    r3 = d
_startloop
    if r1 == 256 jmp _end
    r4 = [r2]+
    r4 >>= r4
    [r3]+ = r4
    r1++
    jmp _startloop
_end

here inside the loop there is a data dependency between all 3 insns (only the r1++ is independent), which does not permit any parallelism. With pipelining I expect code like

    r1 = 2
    r2 = c
    r3 = d
    // peel first iteration
    r4 = [r2]+
    r4 >>= r4
    r5 = [r2]+
_startloop
    if r1 == 256 jmp _end
    [r3]+ = r4 ; r4 >>= r5 ; r5 = [r2]+
    r1++
    jmp _startloop
_end

Now the data dependency is broken and parallelism is possible. As I said, I could not see that happening. Can someone please tell me on which port and with what options I can get such a result? Thanks, Roy.
Re: peephole2: dead regs not marked as dead
On 11/10/2010 11:58 AM, Georg Lay wrote: In the old 3.4.x (private port) I introduced a target hook in combine, just prior to where recog_for_combine gets called. The hook did some canonicalization of rtx and thereby considerably reduced the number of patterns that would have been necessary without that hook. It transformed some unspecs. How is that related to delegitimize_address? A second use case of such a hook could look as follows: Imagine a combined pattern that does not match but would be legal if the backend knew that some register contains some specific value like, e.g., non-negative, zero-or-storeflagvalue, combiner has proved that some bits will always be set or always be cleared; You can use nonzero_bits or num_sign_bit_copies in define_splits. In _this_ case, a define_insn_or_split doesn't really help, I agree with Joern on this. :) II. Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2. Combine combines A+B and sees costs f(A+B) = 5. This makes combine reject the pattern, not looking deeper. Does it really do that? I see nothing in combine_instructions which would prevent going deeper. Paolo
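For reference, a define_split whose condition consults nonzero_bits might look like this (a minimal sketch; the unspec names and the f/g operations are purely illustrative, not from any real port):

```
;; Hypothetical: replace an expensive operation f(x) with a cheaper g(x)
;; only when dataflow has proved that at most bit 0 of x can be set,
;; i.e. x is 0 or 1.
(define_split
  [(set (match_operand:SI 0 "register_operand" "")
        (unspec:SI [(match_operand:SI 1 "register_operand" "")] UNSPEC_F))]
  "nonzero_bits (operands[1], SImode) <= 1"
  [(set (match_dup 0)
        (unspec:SI [(match_dup 1)] UNSPEC_G))]
  "")
```

nonzero_bits returns a mask of the bits that might be nonzero, so a value <= 1 means x is known to be 0 or 1 at split time.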
Re: peephole2: dead regs not marked as dead
Michael Meissner schrieb: > In particular, go to pages 5-7 of my tutorial (chapter 6) where I talk about > scratch registers and allocating a new pseudo in split1 which is now always > run. Ah great! That's the missing link. The pseudo hatches in split1 from scratch. This makes combine even more powerful, and split1 can be seen as expand2 :-) IMHO, combine could be slightly improved by some minor modifications: I. In the old 3.4.x (private port) I introduced a target hook in combine, just prior to where recog_for_combine gets called. The hook did some canonicalization of rtx and thereby considerably reduced the number of patterns that would have been necessary without that hook. It transformed some unspecs. A second use case of such a hook could look as follows: Imagine a combined pattern that does not match but would be legal if the backend knew that some register contains some specific value, e.g. non-negative or zero-or-storeflagvalue, or that the combiner has proved that some bits will always be set or always be cleared; that kind of thing. The hook could accept the pattern if the backend is allowed to see the proof (pattern ok), or reject it if there is no such proof (provided it passes recog_for_combine; in that setup the hook dispatches to recog_for_combine, so it has the opportunity to canonicalize the pattern or take some decision depending on the insn matched). In the current setup you could write more complex patterns to make the combiner look deeper into things. That implies writing an insn that has the information explicit in the pattern. Most probably you are going to split the stuff in split1 again. A little bit more explicit: Suppose f(x) is the pattern in question, just cooked up by combine. You have an insn that can do f(x) under the condition that x \in {0,1} but not f(x) generally, and combine doesn't look deeper into things for some reason. In a case where combine can prove that x will always be 0 or 1 it is legal to accept f(x), otherwise not.
The current situation, however, would go like this: In the case where the source happens to be like x &= 1, f(x) and there is a combine bridge that can do f(x & 1), combine might take a try. f(x & 1) matches, and in split1 things get split as x &= 1, f(x). Now you have the knowledge that x is either 0 or 1 and you can replace (or directly split it as) x &= 1, g(x) with an insn g(x) that is cheaper than f(x). Note that the first case is more general and needs no combine bridge. The only thing that has to be done is replacing f(x) with g(x), which is legal under the assumption x \in {0,1}. II. Suppose insns A, B and C with costs f(A) = f(B) = f(C) = 2. Combine combines A+B and sees cost f(A+B) = 5. This makes combine reject the pattern, not looking deeper. If f(A+B+C) = 2 you either will never see A+B+C, or you have to lie about the cost and say f(A+B) = 2. Ok. As combine is myopic (I do not blame combine here, it's complex enough) you go and write the combine bridge A+B. But: In situations where A+B matches but there is no thing like C we see a dis-optimization. split1 could fix that by splitting A+B into A, B again, but that clutters up the machine description. Combine could go like this:

a) f(A+B) = 5 is more expensive than f(A)+f(B) = 4
b) do the combine A+B anyway
c) look deeper
d1) there is a match A+B+C with cost f(A+B+C) = 2 => accept
d2) there is a match A+B+C with cost f(A+B+C) = 12 => costly => reject
d3) there is no match A+B+C => A+B is costly => reject

Skimming combine dumps, I see most patterns get rejected because they do not match; just a few insns get rejected because of higher cost. So looking deeper even when the costs are higher may give better code while increasing compile time just a little bit. Case II is similar to tunneling, known from physics. Georg
Re: software pipelining
Hi, On 10.11.2010 12:32, roy rosen wrote: Hi, I was wondering if gcc has software pipelining. I saw options -fsel-sched-pipelining -fselective-scheduling -fselective-scheduling2 but I don't see any pipelining happening (tried with ia64). Is there a gcc VLIW port in which I can see it working? You need to try -fmodulo-sched. Selective scheduling works by default on ia64 with -O3; otherwise you need -fselective-scheduling2 -fsel-sched-pipelining. Note that selective scheduling disables autoinc generation for the pipelining to work, and modulo scheduling will likely refuse to pipeline a loop with autoincs. The modulo scheduling implementation in GCC could be improved, but that's a different topic. Andrey
Re: define_split
On 09/11/10 22:54, Michael Meissner wrote: The split pass would then break this back into three insns:

(insn ... (set (reg:SF ACC_REGISTER)
               (mult:SF (reg:SF 124) (reg:SF 125))))
(insn ... (set (reg:SF ACC_REGISTER)
               (plus:SF (reg:SF ACC_REGISTER) (reg:SF 127))))
(insn ... (set (reg:SF 126) (reg:SF ACC_REGISTER)))

Now, if you just had the split and no define_insn, combine would try to form the (plus (mult ...)) and not find an insn to match, so while it had the temporary insn created, it would try to immediately split the insn, so at the end of the combine pass, you would have:

(insn ... (set (reg:SF ACC_REGISTER)
               (mult:SF (reg:SF 124) (reg:SF 125))))
(insn ... (set (reg:SF ACC_REGISTER)
               (plus:SF (reg:SF ACC_REGISTER) (reg:SF 127))))
(insn ... (set (reg:SF 126) (reg:SF ACC_REGISTER)))

I'm trying to follow this example for my own education, but these two example results appear to be identical. Presumably this isn't deliberate? Andrew
Idea - big and little endian data areas using named address spaces
Would it be possible to use the named address space syntax to implement reverse-endian data? Conversion between little-endian and big-endian data structures is something that turns up regularly in embedded systems, where you might well be using two different architectures with different endianness. Some compilers offer direct support for endian swapping, but gcc has no neat solution. You can use the __builtin_bswap32 (but no __builtin_bswap16?) function in recent versions of gcc, but you still need to handle the swapping explicitly. Named address spaces would give a very neat syntax for using such byte-swapped areas. Ideally you'd be able to write something like:

__swapendian struct { int a; short b; } data;

and every access to data.a and data.b would be endian-swapped. You could also have __bigendian and __littleendian defined to __swapendian or blank depending on the native ordering of the target. I've started reading a little about how named address spaces work, but I don't know enough to see whether this is feasible or not. Another addition in a similar vein would be __nonaligned, for targets which cannot directly access non-aligned data. The loads and stores would be done byte-wise, for slower but correct functionality.
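For comparison, the explicit swapping that gcc supports today — and that a __swapendian qualifier would make implicit — can be sketched like this (the struct, field and helper names are illustrative only; __builtin_bswap16 is open-coded since, as noted above, gcc lacks it):

```cpp
#include <cstdint>

// Hypothetical wire-format struct whose fields are stored big-endian;
// on a little-endian host every access must be swapped by hand today.
struct wire_data { uint32_t a; uint16_t b; };

// The swap a __swapendian qualifier would insert automatically:
static inline uint32_t swap32(uint32_t v) { return __builtin_bswap32(v); }

// gcc offers no __builtin_bswap16 (as the mail notes), so open-code it:
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

// Explicit store/load in wire order -- this boilerplate is exactly what
// the proposal would hide behind a plain member access:
static inline void put_a(wire_data *d, uint32_t v) { d->a = swap32(v); }
static inline uint32_t get_a(const wire_data *d) { return swap32(d->a); }
```

With __swapendian, put_a/get_a would reduce to the plain assignments `data.a = v` and `v = data.a`.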
Re: Dedicated logical instructions
Thank you, that worked out eventually. However, now I have another problem. I have 2 instructions in the ISA: 'where' and 'endwhere' which modify the behavior of the instructions put in between them. I made a macro with inline assembly for each of them. The problem is that since `endwhere` doesn't have any operands and doesn't clobber any registers, the GCC optimization reorders it and places the `endwhere` immediately after `where`, leaving all the instructions outside the block. A hack solution came to mind: specifying that the inline asm uses all the registers as operands without actually placing them in the instruction mnemonic. The problem is I don't know how to write that, especially when I don't know the variable names (I want to use the same macro in more than one place). These are the macros:

#define WHERE(_condition) \
    __asm__ __volatile__("move %0 %0, wherenz 0xf" \
                         : \
                         : "v" (_condition) \
                         );

#define ENDWHERE \
    __asm__ __volatile__("nop, endwhere");

This is the C code:

vector doSmth(vector a, vector b)
{
    WHERE(LT(a, b))
    a++;
    ENDWHERE
    return a;
}

And this is what cc1 -O3 outputs:

;# 113 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/include/connex.h" 1
    lt R31 R16 R17
;# 4 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
    move R31 R31, wherenz 0xf
;# 6 "/home/rhobincu/svnroot/connex/trunk/software/gcc/examples/test0.c" 1
    nop, endwhere
    iadd R31 R16 1

You can see that the `nop, endwhere` and the `iadd ...` insns are inverted. I think this is similar to having instructions for enabling and disabling interrupts: the instructions have no operands, but the compiler shouldn't move the block in between them for optimization. Thank you, and please, if I waste too much of your time with random questions, tell me and I will stop. :) Regards, R.
> "Radu Hobincu" writes: > >> I have another, quick question: I have dedicated logical instructions in >> my RISC machine (lt - less than, gt - greater than, ult - unsigned less than, etc.). I'm also working on adding instructions for logical OR, AND, >> NOT, XOR. While reading GCC internals, I've stumbled on this: >> "Except when they appear in the condition operand of a COND_EXPR, logical >> `and` and `or` operators are simplified as follows: a = b && c >> becomes >> T1 = (bool)b; >> if (T1) >> T1 = (bool)c; >> a = T1;" >> I really, really don't want this. Is there any way I can define the instructions in the .md file so the compiler generates code for computing >> a boolean expression without using branches (using these dedicated insns)? > > That is the only correct way to implement && and || in C, C++, and other similar languages. The question you should be asking is whether gcc will be able to put simple cases without side effects back together again. The answer is that, yes, it should be able to do that. > > You should not worry about this level of things when it comes to writing your backend port. Language level details like this are handled by the frontend, not the backend. When your port is working, come back to this and make sure that you get the kind of code you want. > > Ian >
[gimplefe] Merge trunk r166509
Two merges, actually. The first one at r165633, the second at r166509. Bootstrapped and tested on x86_64. Diego.
Re: Idea - big and little endian data areas using named address spaces
David, for s390 we would also be interested in a solution like that. s390 is big endian, and endianness conflicts often get in the way when porting an application from 'other' platforms. Linus Torvalds made a quite similar suggestion back in 2001, using a type attribute: http://gcc.gnu.org/ml/gcc/2001-12/msg00932.html It probably also would make sense to distinguish between byte and bit endianness. Bye, -Andreas-
Re: Dedicated logical instructions
"Radu Hobincu" writes: > However, now I have another problem. I have 2 instructions in the ISA: > 'where' and 'endwhere' which modify the behavior of the instructions put > in between them. I made a macro with inline assembly for each of them. The > problem is that since `endwhere` doesn't have any operands and doesn't > clobber any registers, the GCC optimization reorders it and places the > `endwhere` immediately after `where` leaving all the instructions outside > the block. That's tricky in general. You want an absolute barrier, but gcc doesn't really provide one that can be used in inline asm. The closest you can come is by adding a clobber of "memory": asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory"); That will block all instructions that load or store from memory from moving across the barrier. However, it does not currently block register changes from moving across the barrier. I don't know whether that matters to you. You didn't really describe what these instructions do, but they sound like looping instructions which ideally gcc would generate itself. They have some similarity to the existing doloop pattern, q.v. If you can get gcc to generate the instructions itself, then it seems to me that you will get better code in general and you won't have to worry about this issue. Ian
Re: peephole2: dead regs not marked as dead
Paolo Bonzini schrieb: > On 11/10/2010 11:58 AM, Georg Lay wrote: >> In the old 3.4.x (private port) I introduced a target hook in combine, >> just prior to where recog_for_combine gets called. The hook did some >> canonicalization of rtx and thereby considerably reduced the number of >> patterns that would have been necessary without that hook. It >> transformed some unspecs. > > How is that related to delegitimize_address? It's nothing to do with delegitimize_address. Ok, let me tell the story and go into the mess. Some users wanted to have a bit data type in C and to write down variables that represent a bit in memory or in auto or as parameter. (I strongly recommended not to do such hacking in a compiler because it is not within C and for reasons you all know. For reasons you also know, the project was agreed ...). There was already some hacking in gcc when I started, and users complained about superfluous zero-extracts here and there. No bloat, just a trade-off compiler vs. asm. tricore is capable of loading a byte from absolute address space (LD.BU) and has instructions to operate on any bit (and, or, xor, jmp, compare-accumulate-bitop, extract, insert, etc.). The new type _bit was mapped to BImode. To see where the inconveniences are, let's have a look at a comment:

;; The TriCore instruction set has many instructions that deal with
;; bit operands. These bits are then given in two operands: one operand
;; is the register that contains the bit, the other operand specifies
;; the bit's position within that register as an immediate, i.e. the
;; bit's position must be a runtime constant.
;;
;; A new relocation type `bpos' for the bit's position within a byte had
;; been introduced, enabling to pack bits in memory without the need to
;; pack them in bitfields.
;;
;; Such an instruction sequence may look like
;;
;;     ld.bu   %d0, the_bit                    # insn A
;;     jz.t    %d0, bpos:the_bit, .some_label  # insn A
;;
;; The compiler could, of course, emit such instruction sequences.
;; However, emitting such sequences "en bloc" has several drawbacks:
;;
;; -- every time we need a bit the bit must be read
;; -- emitting instructions en bloc knocks out the scheduler
;; -- when writing all the cases as combiner patterns the number of
;;    insn patterns would explode
;;
;; Therefore, a bit could have been described as BImode, which would
;; lead to the following instruction sequence
;;
;;     ld.bu   %d0, the_bit                    # insn A
;;     extr.u  %d0, %d0, bpos:the_bit, 1       # insn A
;;     jz.t    %d0, 0, .some_label             # insn B
;;
;; I.e. a bit is represented as BImode and therefore always resides
;; in the LSB. The advantage is that the bit now lives in a register
;; and can be reused. The disadvantage is that
;;
;; -- often, there is a superfluous instruction (extr.u)
;; -- the scheduler cannot schedule ld.bu away from extr.u
;; -- for combiner patterns the same as above applies
;;
;; BPOS representation
;; ===================
;;
;; Thus, we need a way to tag a register with the bit's position.
;; I use the new LOADBI and BPOS rtx codes, here in an assembler dump from gcc
;; which shows the 1-to-1 correspondence between insn and machine instruction:
;;
;; # (set (reg:SI 0)
;; #      (loadbi:SI (mem:BI (symbol_ref:SI ("?the_bit")))
;; #                 (symbol_ref:SI ("?the_bit"))))
;;     ld.bu   %d0, the_bit                    # insn A
;;
;; # (set (pc)
;; #      (if_then_else (eq (bpos:BI (reg:SI 0)
;; #                                 (symbol_ref:SI ("?the_bit")))
;; #                        (const_int 0))
;; #                    (label_ref 42)
;; #                    (pc)))
;;     jz.t    %d0, bpos:the_bit, .some_label  # insn B
;;
;; The sole SYMBOL_REFs are the tags.
;; The reason to use an own RTX code and not UNSPEC is because the combiner
;; does not like UNSPEC and the code would not be as dense as could be.
;;
;; Using ZERO_EXTRACT for LOADBI does not work either because a ZERO_EXTRACT
;; may not be used as lvalue, which is needed when loading a bit.
;; The ZERO_EXTRACT on the left side would assume that the register that
;; is to be loaded already lives, because the ZERO_EXTRACT affects
;; just some bits, not all.
;;
;; Note that a LOADBI in this context is a load together with a shift left,
;; and a BPOS is a shift right, i.e. equivalent to some ZERO_EXTRACT.
;;
;; The tag in a BPOS may also be a CONST_INT,
;; in which case the bit's position is a compile time constant.
;;
;; The output mode of a BPOS (M1) need not be BI; it may also be
;; some wider mode like QI or SI. A wider mode represents a BPOS that is
;; zero extended to that mode. The input mode's bit size must be at most
;; one plus the bit's position (8 for a SYMBOL_REF).
;;
;; BPOS Operands
;; =============
;;
;; Next, I introduced BPOS operands, i.e. a
Re: Dedicated logical instructions
> "Radu Hobincu" writes: > >> However, now I have another problem. I have 2 instructions in the ISA: >> 'where' and 'endwhere' which modify the behavior of the instructions put >> in between them. I made a macro with inline assembly for each of them. >> The >> problem is that since `endwhere` doesn't have any operands and doesn't >> clobber any registers, the GCC optimization reorders it and places the >> `endwhere` immediately after `where` leaving all the instructions >> outside >> the block. > > That's tricky in general. You want an absolute barrier, but gcc doesn't > really provide one that can be used in inline asm. The closest you can > come is by adding a clobber of "memory": > asm volatile ("xxx" : /* outputs */ : /* inputs */ : "memory"); > That will block all instructions that load or store from memory from > moving across the barrier. However, it does not currently block > register changes from moving across the barrier. I don't know whether > that matters to you. It does matter unfortunately. I've tried with memory clobber with the same result (the addition in the example doesn't do any memory loads/stores). > You didn't really describe what these instructions do, but they sound > like looping instructions which ideally gcc would generate itself. They > have some similarity to the existing doloop pattern, q.v. If you can > get gcc to generate the instructions itself, then it seems to me that > you will get better code in general and you won't have to worry about > this issue. > > Ian > I have 16 vectorial registers in the machine R16-R31 which all have 128 cells of 16 bits each. These support ALU operations and load/stores just as normal registers, but in one clock. So an add R16 R17 R18 will add the whole R17 array with R18 (corresponding cells) and place the result in R16. The 'where' instruction places a mask on the array so the operation is done only where a certain condition is met. In the example in the previous e-mail, where `a` is less than `b`. 
I've read the description of doloop and I don't think I can use it in this case. I'll have to dig more or settle with -O0 and cry. Thank you, anyway! R.
Re: Dedicated logical instructions
I have 16 vectorial registers in the machine R16-R31 which all have 128 cells of 16 bits each. These support ALU operations and load/stores just as normal registers, but in one clock. So an add R16 R17 R18 will add the whole R17 array with R18 (corresponding cells) and place the result in R16. The 'where' instruction places a mask on the array so the operation is done only where a certain condition is met. In the example in the previous e-mail, where `a` is less than `b`. I've read the description of doloop and I don't think I can use it in this case. I'll have to dig more or settle with -O0 and cry. Is it possible to abstract out such pieces of code in the input program in an independent function whose prologue and epilogue have the necessary setting? Just curious. Uday.
Re: Idea - big and little endian data areas using named address spaces
On Nov 10, 2010, at 4:00 AM, David Brown wrote: > Would it be possible to use the named address space syntax to implement > reverse-endian data? Conversion between little-endian and big-endian data > structures is something that turns up regularly in embedded systems, where > you might well be using two different architectures with different > endianness. Some compilers offer direct support for endian swapping, but gcc > has no neat solution. You can use the __builtin_bswap32 (but no > __builtin_bswap16?) function in recent versions of gcc, but you still need to > handle the swapping explicitly. > > Named address spaces would give a very neat syntax for using such > byte-swapped areas. Ideally you'd be able to write something like: > > __swapendian stuct { int a; short b; } data; > > and every access to data.a and data.b would be endian-swapped. You could > also have __bigendian and __litteendian defined to __swapendian or blank > depending on the native ordering of the target. > > > I've started reading a little about how named address spaces work, but I > don't know enough to see whether this is feasible or not. > > > Another addition in a similar vein would be __nonaligned, for targets which > cannot directly access non-aligned data. The loads and stores would be done > byte-wise for slower but correct functionality. Why not just handle this in the frontend during gimplification? -Chris
Re: Idea - big and little endian data areas using named address spaces
C++ lets you define explicit-order integer types and hide all the conversions. I used that a couple of jobs ago, back around 1996 or so -- worked nicely, and should work even better now that C++ is more mature. paul On Nov 10, 2010, at 7:00 AM, David Brown wrote: > Would it be possible to use the named address space syntax to implement > reverse-endian data? Conversion between little-endian and big-endian data > structures is something that turns up regularly in embedded systems, where > you might well be using two different architectures with different > endianness. Some compilers offer direct support for endian swapping, but gcc > has no neat solution. You can use the __builtin_bswap32 (but no > __builtin_bswap16?) function in recent versions of gcc, but you still need to > handle the swapping explicitly. > > Named address spaces would give a very neat syntax for using such > byte-swapped areas. Ideally you'd be able to write something like: > > __swapendian stuct { int a; short b; } data; > > and every access to data.a and data.b would be endian-swapped. You could > also have __bigendian and __litteendian defined to __swapendian or blank > depending on the native ordering of the target. > > > I've started reading a little about how named address spaces work, but I > don't know enough to see whether this is feasible or not. > > > Another addition in a similar vein would be __nonaligned, for targets which > cannot directly access non-aligned data. The loads and stores would be done > byte-wise for slower but correct functionality. >
Re: Dedicated logical instructions
"Radu Hobincu" writes: > I have 16 vectorial registers in the machine R16-R31 which all have 128 > cells of 16 bits each. These support ALU operations and load/stores just > as normal registers, but in one clock. So an > > add R16 R17 R18 > > will add the whole R17 array with R18 (corresponding cells) and place the > result in R16. The 'where' instruction places a mask on the array so the > operation is done only where a certain condition is met. In the example in > the previous e-mail, where `a` is less than `b`. I've read the description > of doloop and I don't think I can use it in this case. I'll have to dig > more or settle with -O0 and cry. Ah, that sounds like a straight conditional execution model, which gcc implements via cond_exec. Look for define_cond_exec in the manual. Ian
Re: Idea - big and little endian data areas using named address spaces
On 10/11/10 17:55, Chris Lattner wrote: On Nov 10, 2010, at 4:00 AM, David Brown wrote: Would it be possible to use the named address space syntax to implement reverse-endian data? Conversion between little-endian and big-endian data structures is something that turns up regularly in embedded systems, where you might well be using two different architectures with different endianness. Some compilers offer direct support for endian swapping, but gcc has no neat solution. You can use the __builtin_bswap32 (but no __builtin_bswap16?) function in recent versions of gcc, but you still need to handle the swapping explicitly. Named address spaces would give a very neat syntax for using such byte-swapped areas. Ideally you'd be able to write something like: __swapendian stuct { int a; short b; } data; and every access to data.a and data.b would be endian-swapped. You could also have __bigendian and __litteendian defined to __swapendian or blank depending on the native ordering of the target. I've started reading a little about how named address spaces work, but I don't know enough to see whether this is feasible or not. Another addition in a similar vein would be __nonaligned, for targets which cannot directly access non-aligned data. The loads and stores would be done byte-wise for slower but correct functionality. Why not just handle this in the frontend during gimplification? I don't know if this is possible or not - I'm just making a suggestion that occurred to me after another recent thread about named address spaces, and since I recently worked on a program that involved endian swapping. The other natural way to handle endian swapping would be a variable attribute (this was Linus's suggestion, following the link in a previous reply). If you think it would be hard or inefficient to implement endian swapping as a memory space, then that's a good enough answer for me.
Re: Idea - big and little endian data areas using named address spaces
On 10/11/10 18:05, Paul Koning wrote: C++ lets you define explicit-order integer types and hide all the conversions. I used that a couple of jobs ago, back around 1996 or so -- worked nicely, and should work even better now that C++ is more mature. Yes, a lot of such things can be done with C++ classes. But I see several advantages in implementing it in the compiler rather than in classes. First, it will be available in C as well as C++. For lots of reasons, many programs are written in C and not C++. Secondly, the best way to implement endian re-ordering varies from target to target. Some targets have assembly instructions that can be used, requiring inline assembly. Some work best using shifts and masks, others using union types. You can't make a C++ class that will give the best code on many targets without making it very messy, so typically it will be re-implemented for each target. It's also easy to make mistakes with a C++ class that uses casting operators to do the endian swapping: miss out some detail, and you might find the compiler omitting the endian swap somewhere. Finally, if you want a struct with endian-swapped members, then each member must be explicitly declared with the appropriate class. You can't swap the entire struct at once, as you could do with a memory space (or attribute) solution. mvh., David paul On Nov 10, 2010, at 7:00 AM, David Brown wrote: Would it be possible to use the named address space syntax to implement reverse-endian data? Conversion between little-endian and big-endian data structures is something that turns up regularly in embedded systems, where you might well be using two different architectures with different endianness. Some compilers offer direct support for endian swapping, but gcc has no neat solution. You can use the __builtin_bswap32 (but no __builtin_bswap16?) function in recent versions of gcc, but you still need to handle the swapping explicitly.
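The class-based approach Paul describes might look roughly like this (a minimal sketch — the class name is invented, and the use of gcc's predefined byte-order macros and __builtin_bswap32 is an assumption about the target compiler, not code from any existing library):

```cpp
#include <cstdint>

// A 32-bit integer stored big-endian regardless of host byte order.
// Every read and write goes through a swap on little-endian hosts.
class be_uint32 {
    uint32_t raw;  // big-endian representation in memory
    static uint32_t fix(uint32_t v)
    {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
        return __builtin_bswap32(v);  // swap on little-endian hosts
#else
        return v;                     // host is already big-endian
#endif
    }
public:
    be_uint32(uint32_t v = 0) : raw(fix(v)) {}
    operator uint32_t() const { return fix(raw); }           // read access
    be_uint32& operator=(uint32_t v) { raw = fix(v); return *this; }
};
```

As the mail argues, each struct member must then be declared with such a class type; a qualifier or attribute could instead reinterpret a whole existing struct in one place.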
Named address spaces would give a very neat syntax for using such byte-swapped areas. Ideally you'd be able to write something like: __swapendian stuct { int a; short b; } data; and every access to data.a and data.b would be endian-swapped. You could also have __bigendian and __litteendian defined to __swapendian or blank depending on the native ordering of the target. I've started reading a little about how named address spaces work, but I don't know enough to see whether this is feasible or not. Another addition in a similar vein would be __nonaligned, for targets which cannot directly access non-aligned data. The loads and stores would be done byte-wise for slower but correct functionality.
[CFP] Reminder: GCC Research Opportunities Workshop 2011
CALL FOR PAPERS

3rd Workshop on GCC Research Opportunities (GROW 2011)
http://grow2011.inria.fr
2/3 April 2011, Chamonix, France (co-located with CGO 2011)

The GROW workshop focuses on current challenges in research and development of compiler analyses and optimizations based on the free GNU Compiler Collection (GCC). The goal of this workshop is to bring together people from industry and academia that are interested in conducting research based on GCC and enhancing this compiler suite for research needs. The workshop will promote and disseminate compiler research (recent, ongoing or planned) with GCC, as a robust industrial-strength vehicle that supports free and collaborative research. The program will include an invited talk and a discussion panel on future research and development directions of GCC.

Topics of interest

Any issue related to innovative program analysis, optimizations and run-time adaptation with GCC, including but not limited to:

* Classical compiler analyses, transformations and optimizations
* Power-aware analyses and optimizations
* Language/Compiler/HW cooperation
* Optimizing compilation tools for heterogeneous/reconfigurable/multicore systems
* Tools to improve compiler configurability and retargetability
* Profiling, program instrumentation and dynamic analysis
* Iterative and collective feedback-directed optimization
* Case studies and performance evaluations
* Techniques and tools to improve usability and quality of GCC
* Plugins to enhance research capabilities of GCC

Paper Submission Guidelines

Submitted papers should be original and not published or submitted for publication elsewhere; papers similar to published or submitted work must include an explicit explanation. Papers should use the LNCS format and should be 12 pages maximum. The submission procedure will be posted on the GROW 2011 website in due time.
Papers will be refereed by the Program Committee and, if accepted and if the authors wish, will be made available on the workshop web site.

Important Dates

Deadline for submission: 31 January 2011
Decision notification: 28 February 2011
Workshop: 2/3 April 2011 (full day)

Organizers

David Edelsohn, IBM, USA
Erven Rohou, INRIA, France

Program Committee

Zbigniew Chamski, Infrasoft IT Solutions, Poland
Albert Cohen, INRIA, France
David Edelsohn, IBM, USA
Björn Franke, University of Edinburgh, UK
Grigori Fursin, EXATEC Lab, France
Benedict Gaster, AMD, USA
Jan Hubicka, SUSE
Paul H.J. Kelly, Imperial College London, UK
Ondrej Lhotak, University of Waterloo, Canada
Hans-Peter Nilsson, Axis Communications, Sweden
Diego Novillo, Google, Canada
Dorit Nuzman, IBM, Israel
Andrea Ornstein, STMicroelectronics, Italy
Sebastian Pop, AMD, USA
Erven Rohou, INRIA, France
Ian Lance Taylor, Google, USA
Chengyong Wu, ICT, China
Kenneth Zadeck, NaturalBridge, USA
Ayal Zaks, IBM, Israel