Re: How to support 40bit GP register - Take two
Sorry if I misunderstood, but...

On Fri, 18 Dec 2009, Mohamed Shafi wrote:
> 2009/12/18 Hans-Peter Nilsson:
> > On Fri, 20 Nov 2009, Mohamed Shafi wrote:
> >> I tried implementing the suggestion given by Richard, but got into
> >> issues. The GCC framework is written assuming that there are no modes
> >> with HOST_BITS_PER_WIDE_INT < GET_MODE_BITSIZE (mode) < 2 *
> >> HOST_BITS_PER_WIDE_INT.
> >
> > (Not seeing a reply regarding this issue, so here's mine, belated:)
> >
> > Perhaps a wart, but with a 64-bit HOST_BITS_PER_WIDE_INT, would
> > that affect your port? It's not? Just set need_64bit_hwint=yes
> > in config.gcc. And send a patch for the introductory comment in
> > that file, unless your port already matches the
> > "BITS_PER_WORD > 32 bits" condition.
> >
> Thanks Hans for your reply.
> I have already tried that. What you are suggesting is the first
> solution that I got from Richard Henderson.

Or rather, a prerequisite to that solution, working around the
wart you noted.

> I have mentioned the
> issues I faced with this in my mail. The GCC framework is written
> assuming that there are no modes with HOST_BITS_PER_WIDE_INT <
> GET_MODE_BITSIZE (mode) < 2 * HOST_BITS_PER_WIDE_INT. So I had to hack
> at places to get things working. For my target the BITS_PER_WORD ==
> 32. The mode that I am using is RImode (5 bytes).

You already mentioned all that in the message to which I replied
and quoted above (twice now :-)

Since you mention that equation again, I have to think you misread
and missed the point (sorry if instead it's me). In config.gcc, for
the line matching your port, set:

  need_64bit_hwint=yes

Then GET_MODE_BITSIZE (mode) < HOST_BITS_PER_WIDE_INT for your
40-bit mode, as the equation above yields false for
64 < 40 < 64*2.

HTH, good luck and happy holidays.

brgds, H-P
Re: GCC presentation targeted to users (43 slides in english)
On Thu, 2009-12-17 at 21:00 +0100, Laurent GUERBY wrote:
> On Thu, 2009-12-17 at 21:02 +0100, Eric Botcazou wrote:
> > > The 43 slides presentation in english is available here
> > > in PDF and openoffice format:
> > >
> > > http://guerby.org/ftp/gcc-toulibre-20091216.pdf
> > > http://guerby.org/ftp/gcc-toulibre-20091216.odp
> >
> > A small nit: you don't need to do 'make bootstrap' anymore, 'make' is
> > enough.
>
> Yes I explained it during the presentation (on native vs cross) but I
> couldn't remember in what version the change was made so I erred on the
> safe side :).

Another nit. Your description of -m options is somewhat (entirely?)
i386 centric.

R
Re: GCC presentation targeted to users (43 slides in english)
On Fri, 2009-12-18 at 10:42 +0000, Richard Earnshaw wrote:
> On Thu, 2009-12-17 at 21:00 +0100, Laurent GUERBY wrote:
> > On Thu, 2009-12-17 at 21:02 +0100, Eric Botcazou wrote:
> > > > The 43 slides presentation in english is available here
> > > > in PDF and openoffice format:
> > > >
> > > > http://guerby.org/ftp/gcc-toulibre-20091216.pdf
> > > > http://guerby.org/ftp/gcc-toulibre-20091216.odp
> > >
> > > A small nit: you don't need to do 'make bootstrap' anymore, 'make' is
> > > enough.
> >
> > Yes I explained it during the presentation (on native vs cross) but I
> > couldn't remember in what version the change was made so I erred on the
> > safe side :).
>
> Another nit. Your description of -m options is somewhat (entirely?)
> i386 centric.

Yes I said orally that I ran the command on a regular PC but that each
target will have its own list (so "target specific") of -m reported and
also of course described in the manual (tersely :).

I used a PC because not many people know that you can use "-m32" there.

Sincerely,

Laurent
Re: GMP and GCC 4.3.2
myarch64 is my port. Then perhaps this is a back-end problem on my
port. I'll try to determine this today and keep you posted.

Jc

On Thu, Dec 17, 2009 at 9:29 PM, Jie Zhang wrote:
> On 12/18/2009 06:27 AM, Jean Christophe Beyler wrote:
>>
>> Actually, I just finished updating my 4.3.2 to 4.3.3 and tested it and
>> I still have the same issue.
>>
>> This seems to be a problem more than "just" 4.3.2.
>>
>> Here is the test program:
>> #include <stdio.h>
>> #include <gmp.h>
>>
>> int main() {
>>     mpz_t a,b;
>>     mpz_init_set_str(a, "100", 10); // program works with 10^9,
>>                                     // but not with 10^10
>>                                     // (10^20 > 2^64)
>>     mpz_init_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("first, in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
>>     mpz_set(b, a);
>>     mpz_mul(a, a, a);
>>     gmp_printf("second, in GMP mpz_mul(a,a,a) with a=%Zd gives %Zd \n", b, a);
>>     return 0;
>> }
>>
>> We obtain:
>> first, in GMP mpz_mul(a,a,a) with a=100 gives
>> 1
>> second, in GMP mpz_mul(a,a,a) with a=1 gives
>>
>> 2254536013160540992915637663717291581095979492475463214517286840718852096
>>
>> Which clearly is wrong for the second output.
>>
>> This was tested with a 64 bit architecture. I know that with a 4.1.1
>> port of the compiler, I do not see this issue.
>>
>> I will see if I can port it forward to see if I still see the problem
>> but it might be difficult to port from 4.3.2 to 4.4.2, I'm not sure
>> how many things have changed but I'm sure quite a bit !
>>
>> This is the -v of my GCC, version 4.3.3:
>> Using built-in specs.
>> Target: myarch64-linux-elf
>> Configured with:
>> /home/beyler/myarch64/src/myarch64-gcc-4.3.2/configure
>> --target=myarch64-linux-elf
>> --with-headers=/home/beyler/myarch64/src/newlib-1.16.0/newlib/libc/include
>> --prefix=/home/beyler/myarch64/local --disable-nls
>> --enable-languages=c --with-newlib --disable-libssp
>> --with-mpfr=/home/beyler/myarch64/local
>> Thread model: single
>> gcc version 4.3.3 (GCC)
>>
> What's myarch64? I got the correct result for your test with vanilla
> gcc-4.3.2 and gmp-4.3.1 on Debian unstable AMD64.
>
>
> Jie
>
Possible IRA improvements for irregular register architectures
Let's assume I have two sub-classes of ALL_REGS: BOTTOM_REGS (c0-c15)
and TOP_CREGS (c16-c31).

Let's also assume I have two main types of instruction: A-type
Instructions, which can use ALL 32 registers, and B-type Instructions,
which can only use the 16 BOTTOM_REGS.

IRA will correctly calculate costs (in ira-costs.c) for allocnos
appearing in B-type instructions, such that TOP_CREGS has a higher cost
than BOTTOM_REGS. It will also calculate costs for the A-type
instructions such that TOP_CREGS and BOTTOM_REGS have the same cost.

The order of coloring will be determined by the algorithm chosen:
Priority or Chaitin-Briggs. As each allocno is colored, the costs will
be inspected and the best available hard register will be chosen, mainly
based on the register class costs mentioned above, so allocnos in B-type
Instructions will usually be assigned a BOTTOM_REG if one is free. If
two or more hard registers share the same cost then whichever one
appears first in the REG_ALLOC_ORDER will be assigned. (If no hard
register can be found, the allocno is assigned memory and will require a
"reload" in a later pass to get a hard register.)

I do not wish to alter the coloring algorithms or the coloring order. I
believe they are good at determining the order to color allocnos, which
dictates the likelihood of being assigned a hard register. What I wish
to change is the hard register that is assigned, given that the coloring
order has determined that this allocno should get one next.

Why do I need to do this? Since the BOTTOM_REGS can be used by all
instructions, it makes sense to put them first in the REG_ALLOC_ORDER,
so we minimise the number of registers consumed by a low-pressure
function. But it also makes sense, in high-pressure functions, to steer
A-type Instructions away from using BOTTOM_REGS so that they are free
for B-type Instructions to use.
To achieve this, I tried altering the costs calculated in ira-costs.c,
either explicitly with various hacks or by altering operand constraints.
The problem with this approach was that it is static and independent,
occurring before any coloring order has been determined and without any
awareness of the needs of other allocnos. I believe I require a dynamic
way to alter the costs, based on which allocnos conflict with the
allocno currently being colored and which hard registers are still
available at this point.

The patch I have attached here is my first reasonably successful attempt
at this dynamic approach, which has led to performance improvements on
some of our benchmarks and no significant regressions.

I am hoping it will be useful to others, but I post it more as a talking
point or perhaps to inspire others to come up with better solutions and
share them with me :-)

Attachment: bolton_ira_subset_experiments.diff
CSE & compare/branch template problem
Hi --

I'm working on creating the cstore and cbranch templates for the Xilinx
MicroBlaze processor. I've run into some problems and an unexpected
interaction with CSE processing. Possibly, the insns I'm generating for
comparisons are not what the CSE optimization expects.

MicroBlaze has a somewhat unusual compare and branch architecture. There
are no condition flags; comparison results are stored in a result
register. There's one branch instruction, which compares a register with
zero and branches based on condition (eq, ne, lt, gt, etc.).

There are a number of different instructions which can be used for
comparisons. Regular instructions like sub or xor can be used in some
cases. There are also cmp/cmpu instructions which do a modified
subtraction of signed and unsigned integers. There are other comparison
instructions for eq/ne which give a 0 or 1 result. The FP comparisons
are all coupled with a condition: fcmp.eq, fcmp.lt, etc.

To generate the correct comparison instruction, I need to know both the
operands and the test condition. It doesn't look like the
(compare:CC ...) template can be used. Compare takes two operands and
there is no place to save the condition.

For comparisons, I'm generating insns like:

  (set (reg:CC rD) (eq:CC (reg:SI rA) (reg:SI rB)))

followed by a branch:

  (set (pc)
       (if_then_else (eq:CC (reg:CC rD) (const_int 0))
                     (label_ref xx)
                     (pc)))

This looks OK and it appears to work reasonably well. At least without
optimization.

CSE appears to be misinterpreting the comparison insn and the code gets
trashed. This is taken from strchr.c in Newlib. The preprocessed code
snippet is

  unsigned char c = i;

  if (!c)
    {
      while (((long)s & (sizeof (long) - 1)))
        {
          if (!*s)
            return (char *) s;
          s++;
        }
  . . .
Before CSE, this is the insn sequence:

(insn 90 89 91 2 .../strchr.c:66 (set (reg/v:SI 137 [ c+-3 ])
        (zero_extend:SI (subreg:QI (reg/v:SI 244 [ i ]) 3))) 33

(insn 92 91 93 2 .../strchr.c:73 (set (reg:SI 245)
        (const_int 0 [0x0])) 41

(insn 93 92 94 2 .../strchr.c:73 (set (reg:CC 246)
        (eq:CC (reg/v:SI 137 [ c+-3 ])
            (reg:SI 245))) 71

(jump_insn 94 93 95 2 .../strchr.c:73 (set (pc)
        (if_then_else (eq:CC (reg:CC 246)
                (const_int 0 [0x0]))
            (label_ref 112)
            (pc)))

This is the first if expression. Note that reg 137 [c] is set to the
masked value of the search character i. Reg 245 is set to zero, which is
used in the compare instruction in insn 93.

The while expression is

(insn 97 96 98 3 .../strchr.c:93 (set (reg:SI 247)
        (and:SI (reg/v/f:SI 140 [ s ])
            (const_int 3 [0x3]))) 25

(insn 98 97 99 3 .../strchr.c:93 (set (reg:SI 248)
        (const_int 0 [0x0])) 41

(insn 99 98 100 3 .../strchr.c:93 (set (reg:CC 249)
        (eq:CC (reg:SI 247)
            (reg:SI 248))) 71

(jump_insn 100 99 101 3 .../strchr.c:93 (set (pc)
        (if_then_else (eq:CC (reg:CC 249)
                (const_int 0 [0x0]))
            (label_ref 225)
            (pc))) 75

Reg 247 is set to the masked value of s, the string pointer. Reg 248 is
set to zero and is used in the compare in insn 99.

All of this looks OK to me.

There is an intermediate step where reg 248 is replaced by reg 245,
since both are set to zero.

Somehow, CSE is deciding that reg 137 is equal to zero and translates
insn 99 to

(insn 99 98 100 3 .../strchr.c:93 (set (reg:CC 249)
        (eq:CC (reg:SI 247)
            (reg/v:SI 137 [ c+-3 ]))) 71

There is a REG_EQUAL note on this insn saying that it is equal to const
zero. At this point, the code is hosed.

I've been stepping through CSE to figure out why it decides that reg 137
is equal to zero, but haven't found this yet.

(Yes, I do recognize that a comparison with zero can be simplified into
just the branch insn. That's on my to do list.)

Questions:

Is this a reasonable way to represent the comparisons?
Are there other targets which save comparison results in registers and
require the condition?

Any suggestions on better ways to model the MicroBlaze comparison
operations?

Are there any restrictions on using eq/ne/lt/... the way I am?

Any suggestions on how to fix the problem in CSE?

-- 
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077
Re: CSE & compare/branch template problem
Quoting Michael Eager:

> Hi --
>
> I'm working on creating the cstore and cbranch templates for the Xilinx
> MicroBlaze processor.

In theory cstore / cbranch should be the future, but the last time I
tried to use them, they didn't quite work right yet, particularly if you
have an incomplete comparison set. Because of delays with the Copyright
assignment, fixing the middle-end was not an efficient option, so I went
for the old hack with separate compare and condjump / cstore patterns.

> Somehow, CSE is deciding that reg 137 is equal to zero and translates
> insn 99 to
>
> (insn 99 98 100 3 .../strchr.c:93 (set (reg:CC 249)
>         (eq:CC (reg:SI 247)
>             (reg/v:SI 137 [ c+-3 ]))) 71

That is correct; after the jump_insn 94 is not taken, you can conclude
that reg 137 is zero.

> There is a REG_EQUAL note on this insn saying that it is equal to const
> zero.

I would say that is a problem; find out why the note gets there.

> Are there other targets which save comparison results in registers and
> require the condition?

Yes. One example is SH64.

> Any suggestions on better ways to model the MicroBlaze comparison
> operations?
>
> Are there some restriction on using eq/ne/lt/... the way I am?
>
> Any suggestions on how to fix the problem in CSE?

There are some problems if comparisons can't be reversed. I think it's a
bug in the generic code, but for any given port, it's easier - and gives
better code - to make sure that all conditions can be reversed, rather
than fix the generic code. I ran into this with the MXP port.