Re: --disable-multilib broken on x86_64
On Sat, Mar 24, 2007 at 12:27:32PM -0700, Lu, Hongjiu wrote:
> I can't duplicate the problem. It works fine for me.
>
> H.J.
> [EMAIL PROTECTED]
>
> >-Original Message-
> >From: Martin Michlmayr [mailto:[EMAIL PROTECTED]
> >Sent: Saturday, March 24, 2007 12:03 PM
> >To: Michael Meissner; Lu, Hongjiu
> >Cc: gcc@gcc.gnu.org
> >Subject: Re: --disable-multilib broken on x86_64
> >
> >* Martin Michlmayr <[EMAIL PROTECTED]> [2007-03-24 19:01]:
> >> The following change broke --disable-multilib:
> >
> >Actually, it also fails without that option.
> >--
> >Martin Michlmayr
> >http://www.cyrius.com/

Same here.  I won't be able to look at it in more detail until I get back to
the office on Monday.

--
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]
Re: --disable-multilib broken on x86_64
On Sat, Mar 24, 2007 at 07:01:40PM +0000, Martin Michlmayr wrote:
> The following change broke --disable-multilib:
>
> 2007-03-23  Michael Meissner  <[EMAIL PROTECTED]>
>             H.J. Lu  <[EMAIL PROTECTED]>
>
> ../src/configure --enable-languages=c --disable-multilib x86_64-linux-gnu
>
> /home/tbm/build/gcc-snapshot-20070324/build/./gcc/xgcc
> -B/home/tbm/build/gcc-snapshot-20070324/build/./gcc/
> -B/usr/lib/gcc-snapshot/x86_64-linux-gnu/bin/
> -B/usr/lib/gcc-snapshot/x86_64-linux-gnu/lib/ -isystem
> /usr/lib/gcc-snapshot/x86_64-linux-gnu/include -isystem
> /usr/lib/gcc-snapshot/x86_64-linux-gnu/sys-include -g -fkeep-inline-functions
> -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes
> -Wmissing-prototypes -Wold-style-definition -isystem ./include -fPIC -g
> -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -I. -I.
> -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/.
> -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include
> -I../../../src/libgcc/../libdecnumber/no
> -I../../../src/libgcc/../libdecnumber -I../../libdecnumber -o decLibrary.o
> -MT decLibrary.o -MD -MP -MF decLibrary.dep -c
> ../../../src/libgcc/../libdecnumber/decLibrary.c
> ../../../src/libgcc/../libdecnumber/decLibrary.c:32:24: error: decimal128.h: No such file or directory
> ../../../src/libgcc/../libdecnumber/decLibrary.c:33:23: error: decimal64.h: No such file or directory
> ../../../src/libgcc/../libdecnumber/decLibrary.c:34:23: error: decimal32.h: No such file or directory
> ../../../src/libgcc/../libdecnumber/decLibrary.c:36: error: expected declaration specifiers or '...' before 'decimal32'
> ../../../src/libgcc/../libdecnumber/decLibrary.c:37: error: expected declaration specifiers or '...' before 'decimal64'
> ../../../src/libgcc/../libdecnumber/decLibrary.c:38: error: expected declaration specifiers or '...' before 'decimal128'
> ../../../src/libgcc/../libdecnumber/decLibrary.c: In function 'isinfd32':
> ...
>
> --
> Martin Michlmayr
> http://www.cyrius.com/

Note, the gcc mailing list is perhaps not the best place to post bug reports
(following up to the mail with the patch in gcc-patches or posting in gcc-bugs
may be better places to post this).

As I replied in gcc-patches, I think the following patch fixes the problem,
simpler than using AC_CANONICAL.  Sorry, I always build with canonical names,
and didn't notice that x86_64-linux-gnu wouldn't build:

libgcc/
2007-03-26  Michael Meissner  <[EMAIL PROTECTED]>

	* configure.ac (--enable-decimal-float): Change x86 targets to just
	x86_64*-linux* and i?86*-linux* so that a non-canonical target of
	x86_64-linux will be handled.
	* configure: Regenerate.
--- libgcc/configure.ac.~1~	2007-03-24 13:25:28.593802000 -0400
+++ libgcc/configure.ac	2007-03-26 13:17:46.577488000 -0400
@@ -121,7 +121,7 @@ Valid choices are 'yes', 'bid', 'dpd', a
 ], [
 	case $target in
-	  powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux*)
+	  powerpc*-*-linux* | i?86*-linux* | x86_64*-linux*)
 	    enable_decimal_float=yes
 	    ;;
 	  *)
@@ -133,7 +133,7 @@ Valid choices are 'yes', 'bid', 'dpd', a
 # x86's use BID format instead of DPD
 if test x$enable_decimal_float = xyes; then
   case $target in
-    i?86*-*-linux* | x86_64*-*-linux*)
+    i?86*-linux* | x86_64*-linux*)
       enable_decimal_float=bid
       ;;
     *)
--- libgcc/configure.~1~	2007-03-26 13:17:14.691407000 -0400
+++ libgcc/configure	2007-03-26 13:17:55.282777000 -0400
@@ -3306,7 +3306,7 @@ Valid choices are 'yes', 'bid', 'dpd', a
 else
   case $target in
-    powerpc*-*-linux* | i?86*-*-linux* | x86_64*-*-linux*)
+    powerpc*-*-linux* | i?86*-linux* | x86_64*-linux*)
       enable_decimal_float=yes
       ;;
     *)
@@ -3319,7 +3319,7 @@ fi;
 # x86's use BID format instead of DPD
 if test x$enable_decimal_float = xyes; then
   case $target in
-    i?86*-*-linux* | x86_64*-*-linux*)
+    i?86*-linux* | x86_64*-linux*)
      enable_decimal_float=bid
      ;;
    *)

--
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]
Re: error: "no newline at end of file"
On Tue, Mar 27, 2007 at 09:47:35AM -0700, Joe Buck wrote:
> On Tue, Mar 27, 2007 at 02:11:21PM +0100, Manuel López-Ibáñez wrote:
> > On 27/03/07, Martin Michlmayr <[EMAIL PROTECTED]> wrote:
> > >* Manuel López-Ibáñez <[EMAIL PROTECTED]> [2007-03-27 14:01]:
> > >> >Thanks for the explanation - this explains what I'm seeing.  Is there
> > >> >a good reason against changing this particular warning from
> > >> >CPP_DL_PEDWARN to CPP_DL_WARNING?  Quite a few packages in Debian fail
> > >> >to build because of this and it seems overly strict to me.  However, if
> > >> >it'll remain an error with C++ code, I'll start filing bugs on these
> > >> >packages.
> > >>
> > >> I cannot answer why this is a pedwarn or why C++ emits pedantic errors
> > >> by default.
> > >
> > >Sorry for not being clearer; this question wasn't specifically
> > >directed at you.  Hopefully someone else, maybe Joseph, can answer it.
> >
> > Nevertheless, if this is indeed a diagnostic required by the standard,
> > perhaps we should just conditionalize it on the presence of -pedantic
> > in the command-line (rather than being unconditional as it is now).
>
> It is not required by the standard to diagnose all undefined behavior.
> I've used operating systems in the ancient past (e.g. VMS) where it might
> not even be possible to issue the diagnostic, because there are file types
> for which the preprocessor would not even "see" the partial line.

Yes.  In the original C89 standard discussions, one of the problems was how to
describe the input in the face of record oriented systems, including systems
that had virtual card readers, etc.

The problem comes out of the phases of translation (5.1.1.2 in the C99
standard, should be close in the C90 standard).  In particular, in paragraph 3,
it says:

	A source file shall not end in a partial preprocessing token or a
	partial comment...

If there is no trailing (logical) newline, then the file would end in a
partial preprocessing token.

> (I knew the acronym "RMS" as "Record Management Services" before I ever
> heard of a certain person who started a certain compiler).

--
Michael Meissner, AMD
90 Central Street, MS 83-29, Boxborough, MA, 01719, USA
[EMAIL PROTECTED]
An old timer returns to the fold
For those of you who I've worked with in the past on various GCC issues, I
have returned to GCC land after a long sojourn in other compilation systems.
I will start work at AMD on Monday, April 18th, 2005, but I suspect it will be
some time before I'm back up to speed.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
Re: i387 control word register definition is missing
On Mon, May 23, 2005 at 10:25:26AM +0200, Uros Bizjak wrote:
> Hello!
>
> It looks that i387 control word register definition is missing from register
> definitions for i386 processor.  Inside i386.h, we have:
>
> #define FIXED_REGISTERS						\
> /*ax,dx,cx,bx,si,di,bp,sp,st,st1,st2,st3,st4,st5,st6,st7*/	\
> {  0, 0, 0, 0, 0, 0, 0, 1, 0,  0,  0,  0,  0,  0,  0,  0,	\
> /*arg,flags,fpsr,dir,frame*/					\
>     1,    1,   1,  1,    1,					\
> /*xmm0,xmm1,xmm2,xmm3,xmm4,xmm5,xmm6,xmm7*/			\
>      0,   0,   0,   0,   0,   0,   0,   0,			\
> /*mmx0,mmx1,mmx2,mmx3,mmx4,mmx5,mmx6,mmx7*/			\
>      0,   0,   0,   0,   0,   0,   0,   0,			\
> /* r8, r9, r10, r11, r12, r13, r14, r15*/			\
>      2,  2,   2,   2,   2,   2,   2,   2,			\
> /*xmm8,xmm9,xmm10,xmm11,xmm12,xmm13,xmm14,xmm15*/		\
>      2,   2,    2,    2,    2,    2,    2,    2 }
>
> However, there should be another register defined, i387 control word
> register, 'fpcr' (Please look at chapter 11.2.1.2 and 11.2.1.3 in
> http://webster.cs.ucr.edu/AoA/Windows/HTML/RealArithmetic.html).  There are
> two instructions in i386.md that actually use fpcr:

Well you really want both the fpcr and the mxcsr registers, since the fpcr
only controls the x87 and the mxcsr controls the xmm registers.

Note, in adding these registers, you are going to have to go through all of
the floating point patterns to add (use:HI FPCR_REG) and (use:SI MXCSR_REG) to
each and every pattern so that the optimizer can be told not to move a
floating point operation past the setting of the control word.  Then you need
to make sure nothing got broken by adding these USE instructions.  It is
certainly doable, but it is not a simple fix.  I suspect there is then more
information needed to be able to move redundant mode switching operations out
of a loop.

If you are going to tackle it, be sure to have your paperwork in place so that
your code changes can be used.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
Re: Visual C++ style inline asms
On Tue, Jun 14, 2005 at 10:08:16PM -0700, Richard Henderson wrote:
> I suspect that one could get quite a lot of milage out of parsing
> the assembly code and turning most of it into straight GIMPLE, rather
> than into ASM_EXPRs.  A great many examples of VC++ inline asms that
> I've seen were completely and utterly trivial; the compiler could have
> done a better job.  Of course there will be cases that you either
> can't understand, or use instructions that don't map to an EXPR or a
> builtin.  But I expect that more often than not, you can reduce the
> inline asm block to one such insn, and expose all the rest of the
> data movement to the compiler.

My previous employer (Metrowerks) did this when parsing GCC style asm
instructions, and it caused lots of complaints from users.  The main complaint
was that users did not want the compiler mucking about with their precious
hand scheduled insns, and it made the code much harder to debug.

Another problem is when things like status registers or flags fields aren't
adequately handled by the compiler.  For example on the x86, if you change the
rounding mode, we currently don't have a (USE) of the flags field, so the
compiler doesn't know that it can't move the set rounding mode instruction
before other floating point instructions.  Obviously, the right solution is to
encode all of this hidden state information into the RTL, GIMPLE, etc., but it
is a large and mind numbing process.

Finally, for the x86/x86_64, there are a lot of instructions that need to be
handled, and it is an ongoing maintenance problem in that you now need to
modify GCC as well as binutils to add new instructions.

--
Michael Meissner
email: [EMAIL PROTECTED]
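To make the rounding-mode hazard concrete, here is a minimal illustrative C
sketch (assuming only standard C99 <fenv.h>); if the compiler does not model
the control/status registers, nothing stops it from scheduling the divide
across the fesetround calls:

	#include <fenv.h>

	double
	div_round_up (double a, double b)
	{
	  fesetround (FE_UPWARD);      /* writes the x87 control word / mxcsr */
	  double q = a / b;            /* must not move across the mode change */
	  fesetround (FE_TONEAREST);   /* restore the default rounding mode */
	  return q;
	}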
Re: How to replace -O1 with corresponding -f's?
On Mon, Jun 20, 2005 at 07:57:17PM +0400, Sergei Organov wrote:
> Andrew Pinski <[EMAIL PROTECTED]> writes:
>
> > On Jun 20, 2005, at 11:28 AM, Sergei Organov wrote:
> >
> > > Andrew Pinski <[EMAIL PROTECTED]> writes:
> > >
> > >> On Jun 20, 2005, at 10:54 AM, Sergei Organov wrote:
> > >>
> > >>> so SYMBOL_FLAG_SMALL (flags 0x6 vs 0x2) is somehow being missed when -O1
> > >>> is turned on.  Seems to be something at tree-to-RTX conversion time.
> > >>> Constant folding?
> > >>
> > >> No, it would mean that the target says that this is not a small data.
> > >> Also try it with the following code and you will see there is no
> > >> difference:
> > >>
> > >> double osvf() { return 314314314; }
> > >
> > > There is no difference in the sense that here both -O0 and -O1 behave
> > > roughly the same.  So the problem is with detecting "smallness" for true
> > > constants by the target, right?
> >
> > I think the bug is in rs6000_elf_in_small_data_p but since I have not
> > debugged it yet I don't know for sure.
> >
> > Could you file a bug?  This is a target bug.
>
> Yeah, and I've reported it rather long ago against gcc-3.3 (PR 9571).
> That time there were 3 problems reported in the PR of which only the
> first one seems to be fixed (or are the rest just re-appeared in 4.0?).
>
> I think PR 9571 is in fact regression with respect to 2.95.x despite the
> [wrong] comments:
>
> --- Additional Comment #5 From Franz Sirl 2003-06-17 15:31 [reply] ---
>
> r0 is used as a pointer to sdata2, this is a bug, it should be r2.  And
> since only r2 is initialized in the ecrt*.o files, how can this work?
> Besides that, even if you initialize r0 manually, it is practically
> clobbered in about every function.

It's been a long time since I've hacked the PowerPC, but IIRC the instruction
set, a base register of '0' does not mean r0, but instead it means use 0 as
the base address.  Every place that uses a base register should use the
register class 'b' (BASE_REGS) instead of 'r' (GENERAL_REGS), which excludes
r0 from being considered.

Under the 32-bit eABI calling sequence, you have three small data areas:

	The small data area that r13 points to (.sdata/.sbss).

	The small data area that r2 points to (.sdata2/.sbss2).

	The small data area centered around location 0 (ie, small positive
	addresses, and the most negative addresses).  I don't recall that we
	had special sections for this, since for many embedded apps, they
	couldn't allocate to those addresses.

For these relocations, you should use R_PPC_EMB_SDA21, which the linker will
fill in both the offset and the appropriate base register into the
instruction.

> --- Additional Comment #6 From Mark Mitchell 2003-07-20 00:52 [reply] ---
>
> Based on Franz's comments, this bug is not really a regression at all.
> I've therefore removed the regression tags.
>
> that I've tried to explain in my comment #7.
>
> I don't think I need to file yet another PR in this situation, right?
>
> --
> Sergei.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
Re: AMD 64 Problem with assembling
On Sat, Jul 09, 2005 at 10:23:14PM +0200, Florian Michel wrote:
>
> Hello,
>
> I have a question concerning successfully assembling and linking the
> following assembly program on a linux AMD 64 machine:
>
> #cpuid2.s View the CPUID Vendor ID string using C library calls
> .section .data
> output:
>    .asciz "The processor Vendor ID is '%s'\n"
> .section .bss
>    .lcomm buffer, 12
> .section .text
> .globl main
> main:
>    movl $0, %eax
>    cpuid
>    movl $buffer, %edi
>    movl %ebx, (%edi)
>    movl %edx, 4(%edi)
>    movl %ecx, 8(%edi)
>    push $buffer
>    push $output
>    call printf
>    addl $8, %esp
>    push $0
>    call exit
>
> This is part of a book on assembly programming I am reading.
>
> Compile and Link: gcc -o cpuid2 cpuid2.s
> When running cpuid2 it crashes with a segmentation fault.
> Which switches do I have to add to call gcc?
>
> Thanks a lot!

If you are running on a Linux AMD 64 machine, the default compiler is 64 bit,
which uses a different calling sequence than the traditional 32-bit x86 world.
The above code however is 32 bit.  Use the -m32 option.

However, rather than write in assembler, you should use the asm extension to
get the asm code you need.  This works in both 32 and 64 bit modes:

	#include <stdio.h>

	int
	main (int argc, char *argv[])
	{
	  int edx;
	  int ebx;
	  int ecx;
	  union {
	    int i[4];
	    char c[16];
	  } u;

	  __asm__ ("cpuid" : "=b" (ebx), "=d" (edx), "=c" (ecx) : "a" (0));
	  u.i[0] = ebx;
	  u.i[1] = edx;
	  u.i[2] = ecx;
	  u.i[3] = 0;
	  printf ("The processor Vendor ID is '%s'\n", u.c);
	  return 0;
	}

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
Re: Where does the C standard describe overflow of signed integers?
On Mon, Jul 11, 2005 at 09:58:36AM -0500, Nicholas Nethercote wrote:
> Hi,
>
> There was recently a very long thread about the overflow behaviour of
> signed integers in C.  Apparently this is undefined according to the C
> standard.  I searched the standard on this matter, and while I did find
> some paragraphs that described how unsigned integers must wrap around upon
> overflow, I couldn't find anything explicit about signed integers.  Can
> someone point me to the relevant part(s) of the standard?

I don't have time to dig out all of the relevant sections, but I was on the
ANSI X3J11 committee that defined the C standard from its beginning through
the release of the C90 international standard (and some of the C99 work,
though I left the committee before a lot of the changes were made).  It did
come up for discussion, but the committee did decide to leave it undefined,
since there were C compilers for some different machines that did not just
silently truncate.

From memory, there was one vendor with a machine that had signed magnitude
integers.  There was a vendor with a machine that had one's complement
integers.  I suspect at least one vendor used instructions that caused an
overflow trap for signed arithmetic.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
Re: Pointers in comparison expressions
On Tue, Jul 12, 2005 at 06:25:45PM +0200, Mirco Lorenzoni wrote:
> Can a pointer appear in a C/C++ relational expression which doesn't test the
> equality (or the inequality) of that pointer with respect to another pointer?
> For example, are the comparisons in the following program legal code?
>
> /* test.c */
> #include <stdio.h>
>
> int main(int argc, char* argv[])
> {
>   void *a, *b;
>   int aa, bb;
>
>   a = &aa;
>   b = &bb;
>
>   printf("a: %p, b: %p\n", a, b);
>   if (a < b)
>     printf("a < b\n");
>   else
>     printf("a >= b\n");
>
>   if (b < a)
>     printf("b < a\n");
>   else
>     printf("b >= a\n");
>   return 0;
> }

No, this is not legal.  Relational tests between pointers are only allowed by
the ISO C standard if the two pointers point into the same array, or if a
pointer points to exactly one byte beyond the array.

> P.S.
> I'm not a list subscriber.  Send me a copy of your reply, please.

Ummm, I don't understand how you expect to get replies if you don't monitor
the list.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
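To make the same-array rule concrete, a small illustrative fragment (plain ISO
C, not taken from the original exchange):

	void
	example (void)
	{
	  int arr[10];
	  int *p   = &arr[2];
	  int *q   = &arr[7];
	  int *end = &arr[10];    /* one past the end: comparison still defined */

	  int ok1 = (p < q);      /* defined: both point into arr */
	  int ok2 = (q <= end);   /* defined: one-past-the-end case */

	  int aa, bb;
	  int bad = (&aa < &bb);  /* undefined: two distinct objects */
	}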
Re: Pointers in comparison expressions
On Wed, Jul 13, 2005 at 12:38:11AM +0200, Falk Hueffner wrote:
> Erik Trulsson <[EMAIL PROTECTED]> writes:
>
> > On Tue, Jul 12, 2005 at 03:08:54PM -0700, Joe Buck wrote:
> >> On Tue, Jul 12, 2005 at 11:42:23PM +0200, Erik Trulsson wrote:
> >> > Pointer subtraction is only well defined if both pointers point to
> >> > elements in the same array (or one past the end of the array).
> >> > Otherwise the behaviour is undefined.
> >>
> >> While this is correct, there are certain cases that the standard leaves
> >> undefined but that nevertheless can be useful; for example, pointer
> >> subtraction can be used to estimate the amount of stack in use (of
> >> course, it is necessary to know if the stack grows upward or downward).
> >>
> >> Similarly, in an OS kernel, comparing pointers may make sense.
> >
> > Yes, there are some situations where it can be useful to compare pointers
> > to different objects, but then you need to make sure that the compiler you
> > use actually supports that.
> >
> > I believe most C compilers support it in practice, but few, if any, have
> > actually documented it as a supported extension to C.
>
> I don't think we should, either.  People who want to do this can just
> cast both pointers to size_t.

Note, it is not guaranteed that size_t is big enough to hold a pointer.  In
particular, cast your minds back to when we had 16 and 32-bit x86 code, and
some compilers had different memory models like near/far (which was the case
when the C90 standard was defined).  Size_t could be 16-bits, since the
compiler would not allow any item bigger than 32K to be created, but pointers
could have the segment pointer as well as the bottom 16-bits.

If a compiler knew size_t was restricted to 16 bits, it could eliminate code
to normalize the pointer before doing the comparison, and just do a simple
subtraction.

--
Michael Meissner
email: [EMAIL PROTECTED]
http://www.the-meissners.org
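If you do need to order unrelated pointers, a sketch using C99's uintptr_t
(optional in the standard, but specified to round-trip a pointer when the
implementation provides it) avoids relying on size_t; this is an illustration,
not something proposed in the thread:

	#include <stdint.h>

	/* The resulting order is still implementation-defined, but
	   uintptr_t, unlike size_t, is defined so that converting a valid
	   void * to it and back yields the original pointer.  */
	int
	ptr_less (const void *a, const void *b)
	{
	  return (uintptr_t) a < (uintptr_t) b;
	}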
Re: Question of the DFA scheduler
Ling-hua Tseng wrote:
> I'm porting gcc 4.0.1 to a new VLIW architecture.
> Some of its function units don't have internal hardware pipeline
> forwarding, so I need to insert "nop" instructions in order to resolve
> the data hazard.
>
> I used the automata based pipeline description for my ports,
> I described the data latency time by `define_insn_reservation',
> and I'm trying to insert the "nop" in the hook
> TARGET_MACHINE_DEPENDENT_REORG.
>
> The implementation of this hook is simple.
> I just run the DFA scheduler again manually,
> and I just let the insns be issued as well as the 2nd sched pass.
>
> I figured out that state_transition() returns -1 when I issue
> the `max' instruction, and I figured out it only returns > 0 when a
> "hardware structural hazard" occurred.
>
> Are there any solutions for me to insert 4 nops between the 2 insns?

In the past, when working on such machines (MIPS, possibly the FR-V, and doing
the planning for a VLIW machine where the contract fell through), I couldn't
guarantee that the scheduler would run.  Even if it ran, there was always the
possibility that something else after the scheduler rearranged things.

Generally my solution was to make sure the assembler would put in the NOPs if
needed (and also have the assembler set the stop bits, etc. after the compiler
did its best to fill the VLIW buckets).
Re: ppc eabi float arguments
On Tue, Sep 22, 2015 at 01:43:55PM -0400, David Edelsohn wrote:
> On Tue, Sep 22, 2015 at 1:39 PM, Bernhard Schommer wrote:
> > Hi,
> >
> > I've been working with the windriver Diab C compiler for 32bit ppc and
> > encountered an incompatibility with the eabi version of gcc 4.8.3.  When
> > calling functions with more than 8 float arguments, gcc stores the 9th
> > float argument (and so on) as a float whereas the diab compiler stores
> > the argument as a double using 8 bytes.
> >
> > I checked the EABI document and it seems to support the way the diab
> > compiler passes the arguments:
> >
> > "Arguments not otherwise handled above [i.e. not passed in registers]
> > are passed in the parameter words of the caller's stack frame. [...]
> > float, long long (where implemented), and double arguments are
> > considered to have 8-byte size and alignment, *with float arguments
> > converted to double representation*."
> >
> > Does anyone know the reason why gcc passes the argument as a single
> > float?
>
> Hi, Bernhard
>
> First, are you certain that you have the final version of the 32 bit
> PPC eABI?  There were a few versions in circulation.
>
> Mike may remember the history of this.

Well I worked on it around 1980 or so.  I don't remember the details (nor do I
have the original manuals I was working from).  From this distance, it sure
looks like a bug, but I'm not sure whether it should be fixed or
grand-fathered in (and updating the stdargs.h support, if this is the official
calling sequence).

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: ppc eabi float arguments
On Thu, Sep 24, 2015 at 01:21:07AM +0200, Gabriel Paubert wrote:
> You worked on PPC 10 years before the first Power systems were
> announced?
>
> Amazing foresight :-)

Ok, I was mis-remembering the dates when I started at Cygnus Solutions
(forgetting the Open Software Foundation period).  Lets see, I started at
Cygnus in December of 1994, which would mean my initial work on Power to
convert the AIX abi to System V.4/eabi would have started in early 1995.
Looking at eabi.h, which was added when I started the System V/eABI work, it
was added in 1995.

It looks like the date listed that Richard Kenner contributed the rs6000.c
file was 1991.  So, while I wasn't involved with Power/PowerPC at the time, it
is fairly close to the 1990 date I mentioned.  According to wikipedia, the
first Power1 (i.e. rs/6000) were introduced in February 1990.

	https://en.wikipedia.org/wiki/RS/6000

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: indirect load store on POWER8 and extra dress computation
On Mon, Nov 02, 2015 at 02:59:37PM +0000, Ewart Timothée wrote:
> Hello All,
>
> I have a question about performance on Power8 (little-endian, GCC 4.9.1),
> specifically with load/store.
> I evaluate all possibilities to evaluate a polynomial; to simplify the
> thread I provide a basic example of meta-programming about polynomial
> evaluation with the Horner method.  I have:

...

> The code of XLC is more compact due to direct loads (with the offset of
> the address), contrary to GCC where the address is computed with addis.
> Moreover XLC privileges VMX computation but I think it should change
> nothing.  For this kind of computation (I measure the latency with an
> external program) XLC is faster by +- 30% on other (similar) test cases.
>
> Does this address computation cost extra time (I would say yes, and it
> introduces a data hazard in the pipeline), or does it use the instruction
> fusion process described in « IBM POWER8 processor core microarchitecture »
> at run time and so merge addis + ld into ld X(r3)?

The power8 machine fusion does not cover fusing addis with the lfd/lfs
instructions.  Presently, it has 2 cases where instructions are fused:

   1) If you have an addis instruction that sets a register, followed by a
      zero-extending load to the same general purpose register;

   2) If you have an or immediate instruction that loads up an integer
      constant followed by a vector load instruction that uses the constant
      as an index register.

Future machines may expand upon the list of fusable instructions.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Is MODES_TIEABLE_P transitive?
As I start to allow integer modes into vector registers, I need to revisit
MODES_TIEABLE_P.  I'm wondering if MODES_TIEABLE_P is transitive?

For example, with the current Power8 system:

1) Anything can go in GPRs.

2) Currently, SFmode/DFmode/KFmode, and vectors can go in VSX registers (the
   first 32 are overlaid on top of the traditional floating point registers,
   and the remaining 32 are overlaid on top of the traditional Altivec
   registers).

3) At the moment, DImode can only go in the traditional floating point
   registers, but I have patches to allow it in the traditional Altivec
   registers as well.

4) The decimal types and the IBM extended double (SDmode, DDmode, TDmode, and
   IFmode/TFmode) can only go in the traditional floating point registers, and
   not the Altivec registers.  In addition, the 128-bit types (TDmode, IFmode,
   TFmode) take two registers and not one, even though the VSX registers are
   now 128-bits.

5) The current machines have the ability to load SImode into VSX registers and
   store them.  However, GCC has never allowed SImode in the floating point or
   vector registers.  Where we use it in the context of conversion to/from
   floating point, we wrap it in UNSPECs to hide the usage.

6) The next machine will have the ability to load and store 8/16-bit values
   (QImode/HImode) to/from VSX registers.

What I'd like to do, in a Power8 context, is to allow these to return true
(after allowing SImode to go in VSX registers):

	MODES_TIEABLE_P (SImode, DFmode)
	MODES_TIEABLE_P (SImode, QImode)

but, the following would return false for power8:

	MODES_TIEABLE_P (QImode, DFmode)

In a power9 context, since there are loads/stores of 8/16-bit items, it would
return true.

If it matters, we would be switching to LRA in the GCC 7.0 time frame once PR
69847 is fixed (this is the SPEC 2006 403.gcc benchmark that slows down
because LRA is generating unnecessary address reloads because it doesn't use
LEGITIMIZE_RELOAD_ADDRESS).

So the question is whether we might need to support MODES_TIEABLE_P being
transitive (i.e. it would return true a lot less of the time).  I would prefer
to not have to worry about odd corner cases where another type is tieable with
one of the arguments, but not tieable with the other.  And does it matter
whether we are using RELOAD or IRA?

Thanks in advance.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
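To make the non-transitive case above concrete, a hedged sketch (a hypothetical
helper, not the actual rs6000 implementation) of the tieability being asked
about might look like:

	static bool
	example_modes_tieable_p (machine_mode m1, machine_mode m2)
	{
	  if (m1 == m2)
	    return true;

	  /* 32/64-bit integers and double share VSX load/store patterns.  */
	  if ((m1 == SImode && m2 == DFmode) || (m1 == DFmode && m2 == SImode))
	    return true;

	  /* The integer modes tie with each other...  */
	  if ((m1 == SImode && m2 == QImode) || (m1 == QImode && m2 == SImode))
	    return true;

	  /* ...but QImode does not tie with DFmode until power9 adds
	     byte/halfword VSX loads and stores.  */
	  return false;
	}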
Re: Is MODES_TIEABLE_P transitive?
On Mon, Apr 25, 2016 at 11:04:01AM -0600, Jeff Law wrote:
> On 04/21/2016 01:53 PM, Michael Meissner wrote:
> > As I start to allow integer modes into vector registers, I need to revisit
> > MODES_TIEABLE_P.  I'm wondering if MODES_TIEABLE_P is transitive?
> I don't recall a need for it to be transitive.  The only really
> special thing I remember about MODES_TIEABLE_P was its relation to
> HARD_REGNO_MODE_OK and the need for them to be consistent.
>
> > What I'd like to do, in a Power8 context, is to allow these to return true
> > (after allowing SImode to go in VSX registers):
> >
> > 	MODES_TIEABLE_P (SImode, DFmode)
> > 	MODES_TIEABLE_P (SImode, QImode)
> >
> > but, the following would return false for power8:
> >
> > 	MODES_TIEABLE_P (QImode, DFmode)
> >
> > In a power9 context, since there are loads/stores of 8/16-bit items it
> > would return true
> So what this kind of setup would allow would be subregs more
> aggressively between SImode/DFmode and SImode/QImode, but would
> restrict QImode/DFmode.
>
> You may need to twiddle CANNOT_CHANGE_MODE_CLASS along the way.
>
> > So the question is whether we might need to support MODES_TIEABLE_P being
> > transitive (i.e. it would return true a lot less of the time).  I would
> > prefer to not have to worry about odd corner cases where another type is
> > tieable with one of the arguments, but not tieable with the other.  And
> > does it matter whether we are using RELOAD or IRA?
> IIRC MODES_TIEABLE_P was largely related to reload's [in]ability to
> handle certain subreg extractions -- like trying to extract a QImode
> subreg out of an FP hard register and the like.

Yeah, this is getting rather complex.  I recall trying to change
MODES_TIEABLE_P in the past, and back then having all sorts of reload issues.

I kind of want my cake and eat it too.  On one hand, I want the primary
integer types to be tieable, and on the other hand, I want 32/64-bit ints
tieable with floating point (things like IBM extended double and vectors can
never be tieable due to the extended double using 2 64-bit parts in 2 separate
registers, and vectors using a single 128-bit part in 1 register).

In particular, in power9 there are various instructions for packing and
unpacking floating point types, and it would be natural to want to use them
for unions to help speed up the math library (as well as the ability to do
byte and half-word memory operations).

I will put it on the back burner for now.  Thanks.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [buildrobot] sparc64-linux broken
On Mon, Apr 21, 2014 at 08:02:39PM +0200, Jakub Jelinek wrote:
> Sure, we could change this to use mode_size_inline ((enum machine_mode) (MODE))
> in the macro instead, but I'd say for GCC codebase it is better if we fix
> the few users of these macros that pass an int rather than enum machine_mode
> to these macros.

I fixed the powerpc (PR 60876), and while it would have been nice to have
tested this against more backends, it was fairly simple to go through the
GET_MODE_SIZE's and make them type correct.  For the PowerPC, it tended to be
in code of the form:

	for (m = 0; m < NUM_MACHINE_MODES; ++m)
	  {
	    // ...
	    if (GET_MODE_SIZE (m)) ...
	  }

and the fix was to do something like:

	for (m = 0; m < NUM_MACHINE_MODES; ++m)
	  {
	    enum machine_mode m2 = (enum machine_mode) m;
	    // ...
	    if (GET_MODE_SIZE (m2)) ...
	  }

It reminds me when I was in the original ANSI C committee that made the 1989
ANSI and 1990 ISO C standards, we debated making enum's more first class
citizens, so you could do ++/-- on them, but the committee tended to be
divided into 3 camps: one that wanted to delete enums altogether, one that
wanted them as int constants, and one that wanted more type checking.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
PowerPC IEEE 128-bit floating point: Meta discussion
I'm sorry for a wide angle shotgun approach, but I wanted to discuss with all
of the stakeholders how to phase in IEEE 128-bit floating point to the PowerPC
toolchain.  For those of you who are not on the gcc@gcc.gnu.org mailing list,
this thread will be archived at:

	https://gcc.gnu.org/ml/gcc/

What I'm going to do is break this into several followups that each cover one
topic, so that it is easier to address issues without having to requote the
whole article.

What I want to do in the GCC 4.10 time frame is add the ability for users to
use IEEE 128-bit extended precision support, with as minimal disruption as
possible to existing code.  It is unfortunate that we could not have squeezed
the support into the PowerPC little endian support with ELF v2, so that we
would not have backwards compatibility issues on that platform, but alas it
did not happen.

Before IEEE 128-bit can be fully implemented, we will need to modify the
compiler, libgcc, glibc, the debugger, and possibly other tools as well.
Because different teams work on different schedules, it will likely be some
time before all of the pieces are in place.  However, we likely will need to
make sure the compiler piece is in place before work can start on the
libraries and debuggers.

In terms of user impact, I really don't know how many people are using long
double, or want to use IEEE 128-bit floating point.  I would hope that if a
user program does not use long double, there is not a flag day where they have
to move to a compiler/library combination for a feature they don't use.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PowerPC IEEE 128-bit floating point: Where we are
I'm going to try and summarize the current state of 128-bit floating point on
the PowerPC here.

There are 2 switches that control long double support in the compiler, but
without supporting libraries, it isn't useful to users:

-mabi=ieeelongdouble vs. -mabi=ibmlongdouble
	These switches control which 128-bit format to use.  If you use either
	switch, you get two warning messages (one from the gcc driver, one
	from the compiler proper).

-mlong-double-128 vs. -mlong-double-64
	These switches control whether long double is 128-bits (either
	ibm/ieee format) or 64-bits.

AIX and Darwin hardwire the choice to -mabi=ibmlongdouble, and you cannot use
the switch to override things.  Linux and Freebsd set the default to
-mabi=ibmlongdouble.  Any PowerPC system that is not AIX, Darwin, Linux, nor
Freebsd appears to default to IEEE 128-bit (vxworks?).

In terms of places where TFmode is mentioned in GCC, it is the following
files:

	predicates.md, rs6000.c, rs6000.h, rs6000.md, rs6000-modes.def, spe.md

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PowerPC IEEE 128-bit floating point: Emulation functions
Right now the rs6000 backend of the GCC compiler is rather inconsistent in the
names used for the IBM extended double floating point.  For the basic
operations it uses the __gcc_q prefix:

	__gcc_qadd
	__gcc_qsub
	__gcc_qmul
	__gcc_qdiv
	__gcc_qneg
	__gcc_qne
	__gcc_qgt
	__gcc_qlt
	__gcc_qle

In theory the conversions also use the __gcc_ format, but while the names are
set in rs6000.c, the compiler ignores them and uses:

	__dpd_extendddtf
	__dpd_extendsdtf
	__dpd_extendtftd
	__dpd_trunctdtf
	__dpd_trunctfdd
	__dpd_trunctfsd
	__fixtfdi
	__fixtfti
	__fixunstfdi
	__fixunstfti
	__floatditf
	__floattitf
	__floatunditf
	__floatuntitf
	__powitf2

This means that if we have a flag day and change all of long double to IEEE
128-bit, we run the risk of the user inadvertently calling the wrong function.
As I see it, we have a choice to have something like multilibs where you
select which library to use, or we have to use alternate names for all of the
IEEE 128-bit emulation functions.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PowerPC IEEE 128-bit floating point: Two 128-bit floating point types
I assume we do not want a flag day, where the user flips a switch, and long
double is now IEEE 128-bit.  That would involve having 2 sets of libraries,
etc. to work with existing programs and new programs.  Even if the user does
not directly use long double, it still may be visible with the switch, since
many of the math libraries use long double internally to get more precision.
I assume that the current IBM extended double format gives better performance
than the emulation functions.

I assume the way forward is to initially have a __float128 type that gives
IEEE 128-bit support for those that need/want it, and keep long double in the
current format.  When all the bits and pieces are in place, we can think about
flipping the switch.  However, there we have to think about appropriate times
for distributions to change over.

In terms of calling sequence, there are 2 ways to go: either pass/return the
IEEE 128-bit value in 2 registers (like long double is now), or treat it like
a 128-bit vector.  The v2 ELF ABI explicitly says that it is treated like a
vector object, and I would prefer that the v1 ELF on big endian server
PowerPCs also treat it like a vector.

If we are building a compiler for a server target, to prevent confusion, I
think it should be a requirement that you must have -mvsx (or at least
-maltivec -mabi=altivec) to use __float128.  Or do I need to implement two
sets of conversion functions, one if the user builds his/her programs with
-mcpu=power5 and the other for more recent customers?

I don't have a handle on the need for IEEE 128-bit floating point in
non-server platforms.  I assume in these environments, if we need IEEE
128-bit, it will be passed as two floating point values.  Do we need this
support?

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
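As a hedged sketch of the intended user view (the literal suffix and exact
spelling here are assumptions, not settled decisions), the two formats would
simply coexist in one source file:

	__float128  q = 1.0Q;      /* explicitly IEEE binary128 */
	long double l = 1.0L;      /* stays in whatever the ABI default is */

	/* Arithmetic on q would resolve to the IEEE 128-bit routines (or
	   hardware, when available) without perturbing the existing long
	   double calling sequence.  */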
Re: PowerPC IEEE 128-bit floating point: Internal GCC types
Currently within the GCC rs6000 backend, the type TFmode is the default
128-bit floating point type.  Depending on TARGET_IEEEQUAD (the internal name
for the -mabi=ieeelongdouble option), this gets mapped into either the IEEE
128-bit format or the IBM extended double format.  In theory, any place that
checks for TFmode should also check TARGET_IEEEQUAD to determine which type of
floating point format is used.  Most places do do the check, but I have to
assume that a few places might not do it properly.

One question for GCC folk is whether we want to simplify this, and make TFmode
always be IEEE 128-bit, and add another floating point mode for the IBM
extended double format.  This would involve fixing up both the compiler and
libgcc.  I tried to do this in one set of changes, and the compiler itself was
fairly straight forward.  However, it broke libgcc, due to the use of the mode
attribute to get to TFmode instead of using long double.  While this too can
be fixed (as well as uses in glibc), I wonder whether users in general use the
mode attribute declaration and explicitly use TFmode, assuming they will get
the IBM extended double support.  For now, I think it is better to not change
TFmode, and just invent a new mode for IEEE 128-bit support.

I've looked at code that adds two new modes, such as JFmode for explicitly IBM
extended double and KFmode for the IEEE 128-bit support, where TFmode would be
the same as either JFmode or KFmode.  These changes were getting a little
complex, so I've also looked at using a single mode for IEEE 128-bit (JFmode),
and leaving most of the TFmode support alone.  Do people have a preference for
names?  I think using XFmode may cause confusion with the x86.

One issue is that with the current mode setup, when you create new floating
point types, the widening system kicks in and the compiler will generate all
sorts of widenings from one 128-bit floating point format to another (because
internally the precision for IBM extended double is less than the precision of
IEEE 128-bit, due to the size of the mantissas).  Ideally we need a different
way to create an alternate floating point mode than FRACTIONAL_FLOAT_MODE that
does no automatic widening.  If there is a way under the current system, I am
not aware of it.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PowerPC IEEE 128-bit floating point: Language standards
I have not been following the language standards recently.  For any of the
recent language standards, or the standards that are being worked on, will
there be requirements for an IEEE 128-bit binary floating point type?  For the
C/C++ languages, is this type long double, or is another type being proposed
(such as __float128)?

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: target attributes/pragmas changing vector instruction availability and custom types
On Tue, May 19, 2015 at 03:17:26PM +0100, Kyrill Tkachov wrote:
> Hi all,

Sorry, it took me awhile to comment.

> I'm working on enabling target attributes and pragmas on aarch64 and I'm
> stuck on a particular issue.
> I want to be able to use a target pragma to enable SIMD support in a SIMD
> intrinsics header file.
> So it will look like this:
>
> $ cat simd_header.h
> #pragma GCC push_options
> #pragma GCC target ("arch=armv8-a+simd")
>
> #pragma GCC pop_options
>
> I would then include it in a file with a function tagged with a simd target
> attribute:
>
> $ cat foo.c
> #include "simd_header.h"
>
> __attribute__((target("arch=armv8-a+simd")))
> uint32x4_t
> foo (uint32x4_t a)
> {
>   return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and
>                              //implemented by a target builtin
> }
>
> This works fine for me.  But if I try to compile this without SIMD support,
> say: aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c
> I get an ICE during builtin expansion time.
>
> I think I've tracked it down to the problem that the type uint32x4_t is a
> builtin type that we define in the backend (with add_builtin_type) during
> the target builtins initialisation code.

We ran into this on the PowerPC.  If you compile for 32-bit, the Altivec ABI
is not enabled by default.  The compiler won't let you add VSX/Altivec target
options unless you compiled it with -mabi=altivec.

> From what I can see, this code gets called early on after the command line
> options have been processed, but before target pragmas or attributes are
> processed, so the builtin types are laid out assuming that no SIMD is
> available, as per the command line option -march=armv8-a+nosimd, but later
> while expanding the builtin in simd_intrinsic with SIMD available the ICE
> occurs.  I think that is because the types were not re-laid out.
>
> I'm somewhat stumped on ideas to work around this issue.
> I notice that rs6000 also defines custom builtin vector types.
> Michael, did you notice any issue similar to what I described above?
>
> Would re-laying the builtin vector type on target attribute changes be a
> valid way to go forward here?

You might look at the x86_64.  When I was updating the target support for the
last global change affecting target support, I was pointed to mods that the
x86_64 folk had done.

One of their mods is that they don't define all of the built-in functions at
compiler startup.  Instead, they only add the built-in functions that are
allowed for the current set of switches.  When a target attribute or pragma
adds new features, they go through the built-ins and add any functions that
were previously not added, but can now be added due to having the appropriate
target options enabled.  It helps avoid the big memory sink of creating all
ten million tree nodes for the built-in functions for programs that don't use
target attributes or pragmas.  It is something I thought about doing when I
was working at AMD, and it looks like they've done it now.

So if you are going to define all of the functions at compiler startup time
(like the PowerPC does), you have to define all of the types used, even if the
current switches don't allow use of the types.  Or you can only define what
you need, and when you change options, you go through and define any stuff
that wasn't previously defined that you can now use (like the current x86_64).

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: IEEE 128-bit floating point support for PowerPC RTEMS
On Tue, Jan 24, 2017 at 11:21:27AM +0100, Sebastian Huber wrote:
> Hello,
>
> some time ago IEEE 128-bit floating point support for PowerPC was
> added to GCC:
>
> https://gcc.gnu.org/wiki/Ieee128PowerPC
>
> I noticed some issues for RTEMS in this area.  Firstly, RTEMS had no
> __powerpc__ builtin define, so some source files were effectively
> disabled, e.g. ibm-ldouble.c.  With __powerpc__ defined, the
> ibm-ldouble.c didn't compile due to:
>
> In file included from
> /home/EB/sebastian_h/archive/gcc-git/libgcc/config/rs6000/ibm-ldouble.c:374:0:
> /home/EB/sebastian_h/archive/gcc-git/libgcc/soft-fp/quad.h:72:1:
> error: unable to emulate 'TF'
>  typedef float TFtype __attribute__ ((mode (TF)));
>  ^~~
>
> I added
>
> #define RS6000_DEFAULT_LONG_DOUBLE_SIZE 128
>
> #undef TARGET_IEEEQUAD
> #define TARGET_IEEEQUAD 1
>
> This fixed the problem above and changes the long double type from 8
> bytes to 16 bytes.
>
> The new compiler defines now (powerpc-rtems target):
>
> #define __LONG_DOUBLE_128__ 1
> #define __LONGDOUBLE128 1
> #define __LONG_DOUBLE_IEEE128__ 1
>
> However, the libgcc multilib build fails due to several ICEs.  See
> attached errors.log.
>
> Is this supposed to work for 32-bit PowerPC?  Did I miss some magic
> configuration switch?

Please check this patch:

	https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00861.html

It has fixes for building on 32-bit systems and non-VSX systems.

Now, when I did the initial float128 work, as far as I could tell, long double
was not enabled to be 128-bits on RTEMS.  If it is now enabled, there may be
gotchas, since RTEMS would be the only PowerPC system to use 128-bit floating
point and use IEEE 128-bit instead of the IBM double-double that the other
PowerPC systems currently use.  So it may be your call whether you want to
enable it and get it to work, or default back to long double == double.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
GCC target_clone support
I'm in the middle of adding support for the target_clone attribute to the
PowerPC.  At the moment, the x86_64/i386 port is the only port that supports
target_clone.  I added the arm/s390 port maintainers to this query, because
those ports might also consider supporting target_clones in the future.

In doing the work, I noticed that some of the functions called by the target
hooks were fairly generic, and we might want to move these functions into
common code.  Would people object if I put out a patch to move these functions
to common code?  I also have a question on the functionality of target_clone
that I will address in a separate message.

So far, the list of generic code that could be moved to common code to allow
other ports to add target_clone support includes:

make_resolver_func:
	Make the resolver function decl to dispatch the versions of a
	multi-versioned function, DEFAULT_DECL.  Create an empty basic block
	in the resolver and store the pointer in EMPTY_BB.  Return the decl of
	the resolver function.

make_dispatcher_decl:
	Make a dispatcher declaration for the multi-versioned function DECL.
	Calls to DECL function will be replaced with calls to the dispatcher
	by the front-end.  Return the decl created.

is_function_default_version:
	Returns true if decl is multi-versioned and DECL is the default
	function, that is it is not tagged with target specific optimization.

attr_strcmp:
	Comparator function to be used in qsort routine to sort attribute
	specification strings to "target".

sorted_attr_string:
	ARGLIST is the argument to target attribute.  This function tokenizes
	the comma separated arguments, sorts them and returns a string which
	is a unique identifier for the comma separated arguments.  It also
	replaces non-identifier characters "=,-" with "_".

_function_versions:
	This function returns true if FN1 and FN2 are versions of the same
	function, that is, the target strings of the function decls are
	different.  This assumes that FN1 and FN2 have the same signature.
	This is the TARGET_OPTION_FUNCTION_VERSIONS target hook, and it is the
	same between the x86 and ppc.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support (functionality question)
This message is separated from the question about moving code, as it is a
question about the functionality of target_clone support.

Right now it looks like target_clone only generates the ifunc handler if there
is a call to the function in the object file.  It does not generate the ifunc
handler if there is no call.

For the default function, it generates the normal name.  This means that any
function that calls the function from a different object module will only get
the standard function.  From a library and user perspective, I think this is
wrong.  Instead the default function should be generated with a different
name, and the ifunc function should list the standard name.  Then you don't
have to change all of the other calls in the object file, the normal ifunc
handling will handle it.  It also means you can more easily put this function
in a library and automatically call the appropriate version.

Do people agree with this assessment, and should I change the code to do this
(for both x86 and ppc targets)?  If not, what is the reason for the objection?

Consider mvc5.c from the testsuite:

	/* { dg-do compile } */
	/* { dg-require-ifunc "" } */
	/* { dg-options "-fno-inline" } */
	/* { dg-final { scan-assembler-times "foo.ifunc" 6 } } */

	__attribute__((target_clones("default","avx","avx2")))
	int
	foo ()
	{
	  return 10;
	}

	__attribute__((target_clones("default","avx","avx2")))
	int
	bar ()
	{
	  return -foo ();
	}

It generates:

	.file	"mvc5.c"
	.text
	.p2align 4,,15
	.globl	foo
	.type	foo, @function
foo:
	movl	$10, %eax
	ret

	.type	foo.avx.2, @function
foo.avx.2:
	movl	$10, %eax
	ret

	.type	foo.avx2.3, @function
foo.avx2.3:
	movl	$10, %eax
	ret

	.weak	foo.resolver
	.type	foo.resolver, @function
foo.resolver:
	subq	$8, %rsp
	call	__cpu_indicator_init
	movl	__cpu_model+12(%rip), %eax
	testb	$4, %ah
	je	.L8
	movl	$foo.avx2.3, %eax
	addq	$8, %rsp
	ret
.L8:
	testb	$2, %ah
	movl	$foo.avx.2, %edx
	movl	$foo, %eax
	cmovne	%rdx, %rax
	addq	$8, %rsp
	ret

	.type	foo.ifunc, @gnu_indirect_function
	.set	foo.ifunc,foo.resolver

	// Note these functions are not referenced
	.type	bar.avx2.1, @function
bar.avx2.1:
	subq	$8, %rsp
	xorl	%eax, %eax
	call	foo.ifunc
	addq	$8, %rsp
	negl	%eax
	ret

	.type	bar.avx.0, @function
bar.avx.0:
	subq	$8, %rsp
	xorl	%eax, %eax
	call	foo.ifunc
	addq	$8, %rsp
	negl	%eax
	ret

	.type	bar, @function
	// Note how it calls foo.ifunc instead of foo.
bar:
	subq	$8, %rsp
	xorl	%eax, %eax
	call	foo.ifunc
	addq	$8, %rsp
	negl	%eax
	ret

Now, if I remove the bar call, and just leave foo, it generates:

	.type	foo, @function
foo:
	movl	$10, %eax
	ret

	.type	foo.avx.0, @function
foo.avx.0:
	movl	$10, %eax
	ret

	.type	foo.avx2.1, @function
foo.avx2.1:
	movl	$10, %eax
	ret

Note, it does not generate the resolver at all.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support
A minor feature that also should be considered: if you have two clone
functions, one that calls the other, we should optimize the call to avoid
using the indirect call set up by ifunc.  I.e.:

	extern __attribute__((target_clones("default","avx","avx2")))
	int caller ();

	extern __attribute__((target_clones("default","avx","avx2")))
	int callee ();

	__attribute__((target_clones("default","avx","avx2")))
	int
	caller (void)
	{
	  return -callee ();
	}

	__attribute__((target_clones("default","avx","avx2")))
	int
	callee (void)
	{
	  return 10;
	}

I.e. caller.avx should call callee.avx, not callee (or callee.ifunc), and
caller.avx2 should call callee.avx2.

Do people think this is useful?

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support (functionality question)
On Fri, May 05, 2017 at 12:32:03PM -0700, Evgeny Stupachenko wrote:
> Hi Michael,
>
> On Fri, May 5, 2017 at 11:45 AM, Michael Meissner wrote:
> > This message is separated from the question about moving code, as it is a
> > question about the functionality of target_clone support.
> >
> > Right now it looks like target_clone only generates the ifunc handler if
> > there is a call to the function in the object file.  It does not generate
> > the ifunc handler if there is no call.
> >
> > For the default function, it generates the normal name.  This means that
> > any function that calls the function from a different object module will
> > only get the standard function.  From a library and user perspective, I
> > think this is wrong.  Instead the default function should be generated
> > with a different name, and the ifunc function should list the standard
> > name.  Then you don't have to change all of the other calls in the object
> > file, the normal ifunc handling will handle it.  It also means you can
> > more easily put this function in a library and automatically call the
> > appropriate version.
>
> I think library issue could be resolved using special headers.  You can
> look into GLIBC, there should be similar technique.
> When you call resolver from initial (default) function, you make an
> assumption that ifunc is supported.

Yes, ifunc should be required before you even consider using target_clone.

What I'm trying to do is make it easier for people to add target clones to
their code, without having to go through using special headers or other gunk.
I want them to be able to add just the target clone line, and everything will
work.  That means in the current case, the default case should be renamed to
<name>.default, and what is now <name>.ifunc should become <name>.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: GCC target_clone support (functionality question)
On Fri, May 05, 2017 at 01:38:03PM -0700, Evgeny Stupachenko wrote:
> On Fri, May 5, 2017 at 12:48 PM, Michael Meissner wrote:
> > On Fri, May 05, 2017 at 12:32:03PM -0700, Evgeny Stupachenko wrote:
> >> Hi Michael,
> >>
> >> On Fri, May 5, 2017 at 11:45 AM, Michael Meissner wrote:
> >> > This message is separated from the question about moving code, as it is
> >> > a question about the functionality of target_clone support.
> >> >
> >> > Right now it looks like target_clone only generates the ifunc handler
> >> > if there is a call to the function in the object file.  It does not
> >> > generate the ifunc handler if there is no call.
> >> >
> >> > For the default function, it generates the normal name.  This means
> >> > that any function that calls the function from a different object
> >> > module will only get the standard function.  From a library and user
> >> > perspective, I think this is wrong.  Instead the default function
> >> > should be generated with a different name, and the ifunc function
> >> > should list the standard name.  Then you don't have to change all of
> >> > the other calls in the object file, the normal ifunc handling will
> >> > handle it.  It also means you can more easily put this function in a
> >> > library and automatically call the appropriate version.
> >>
> >> I think library issue could be resolved using special headers.  You can
> >> look into GLIBC, there should be similar technique.
> >> When you call resolver from initial (default) function, you make an
> >> assumption that ifunc is supported.
> >
> > Yes, ifunc should be required before you even consider using target_clone.
>
> If function foo() called using regular name it usually means that
> there is no information about ifunc support.

The whole point of ifunc is that the caller should not need to know whether a
function is a normal function or an ifunc function.  The linker handles it
all, making it go through the PLT like it does for shared library functions.

I don't understand what you mean by this:

> Another object file which
> is built with target_clones will have foo() definition with
> foo.resolver inside.

And for users outside of GLIBC, it is more cumbersome to have to have the
whole infrastructure to build separate versions of the file, and then write a
separate ifunc declaration.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
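To spell out what target_clones would be automating for such users, here is a
hedged, self-contained C sketch of the hand-written ifunc setup being
discussed (the names foo_default, foo_avx2, and resolve_foo are illustrative):

	static int
	foo_default (void)
	{
	  return 10;
	}

	static int __attribute__ ((target ("avx2")))
	foo_avx2 (void)
	{
	  return 10;
	}

	/* Hand-written resolver: runs once at load time to pick an
	   implementation.  */
	static int (*resolve_foo (void)) (void)
	{
	  __builtin_cpu_init ();
	  return __builtin_cpu_supports ("avx2") ? foo_avx2 : foo_default;
	}

	/* Callers simply call foo (); the dynamic linker routes the call
	   through the PLT to whichever version the resolver returned.  */
	int foo (void) __attribute__ ((ifunc ("resolve_foo")));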
Re: Byte swapping support
ge_order("little-endian"))) struct { > uint32_t id; > uint16_t command; > uint16_t param1; > ... > } > } This definately requires support at the higher levels of the compiler. > and then I could access data in little ordering in the structures, then > in 16-bit big-endian lumps via the "protocol" array. One of the things you have to do is be prepared to do a full sweep of your backend to make sure you only used the named address memory functions and don't use the traditional functions that pass 0 for the named address. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
[RFC] Creating builtin functions on demand
I am looking at finishing up the PowerPC support for functions compiled with target specific options, and the PowerPC will have the same problem that the x86 has, namely in order to support target functions, you need to have all of the machine specific builtins created, even if the user did not say -m, because they might use the appropriate pragma or attribute to compile code for a different machine. As far as I could tell with a quick search:

    x86 has 818 machine specific builtins
    ppc has 958 machine specific builtins
    arm has 90 machine specific builtins
    mips has 262 machine specific builtins

Ideally, these should all be created on the first usage. Not only is the space wasted for the tree decl itself, but there is the space for representing the argument list. We should put some infrastructure into GCC 4.7 that helps automate this process. I'm wondering what the various maintainers feel we need in terms of infrastructure for these create-on-demand functions? Should we put the majority of non-machine dependent builtins into create on demand as well? Do we want to have the ability to make the enums for MD builtin functions be in the same space as the non-MD builtins? I have had to track down several bugs where code was not checking if the builtin class was BUILT_IN_NORMAL before indexing into the built_in_decl arrays. Obviously, we need to look at built_in_decl and implicit_built_in_decl uses. I would probably make new macros to replace those two arrays (with GET and SET versions), and poison the old usage. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
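As a rough standalone illustration of the create-on-demand idea (this is not GCC code; the names and types below are invented for the sketch), the accessor would build the decl lazily the first time it is asked for:

    #include <stddef.h>

    #define N_MD_BUILTINS 1000

    typedef struct builtin_decl builtin_decl;           /* stand-in for GCC's tree */
    extern builtin_decl *build_md_builtin (int fncode);

    static builtin_decl *md_builtin_cache[N_MD_BUILTINS];

    static builtin_decl *
    md_builtin_decl (int fncode)
    {
      /* Only construct the builtin the first time something asks for it.  */
      if (md_builtin_cache[fncode] == NULL)
        md_builtin_cache[fncode] = build_md_builtin (fncode);
      return md_builtin_cache[fncode];
    }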
Hardware stream prefetching
Does any processor besides the PowerPC support variants of the prefetch instruction where you can tell the processor at the beginning of the loop that there are hardware streams with a given stride, rather than doing a prefetch inside the loop for a future cache entry? I'm just starting to look at adding support for the PowerPC's advanced forms of the dcbt instruction that set up these hardware streams, and was wondering whether there were other architectures that could use this. I couldn't find anything in the AMD and Intel architecture manuals. The IBM XL compiler supports this, but I would like to add similar support in GCC. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: Supporting multiple pointer sizes in GCC
On Thu, Mar 31, 2011 at 01:32:24PM +0100, Paulo J. Matos wrote: > On 30/03/11 08:57, Claudiu Zissulescu wrote: > >Hi, > > > >I would try using the named address space for your issue (see > >TARGET_ADDR_SPACE_POINTER_MODE). Please check the SPU target for an > >implementation example. > > > > Hummm, I haven't noticed this hook before. Could this be used to > implement cases where function pointers have a different size than > data pointers (as in harvard archs). It looks like it but can > someone confirm? The SPU uses named address space to deal with local memory that the processor deals with normally, and global memory that the host powerpc can see, and it is moved in/out of the local memory. In -mea64 mode, the external memory is 64-bits, so it has two different sizes of pointer. The backend controls whether one named address is a subset of another, and whether you can convert between the named address spaces. The technical report (N1169) of the ISO C committee that the address space support is based on does not allow for function pointers to be different sizes, just the data pointers. The 2009 GCC summit proceedings has my paper describing adding the named address space support to the compiler. Note I have moved groups within IBM, and no longer work on the SPU compiler, so I haven't touched the named address space support since the early part of 2009. http://gcc.gnu.org/wiki/HomePage?action=AttachFile&do=view&target=2009-GCC-Summit-Proceedings.pdf -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
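A simplified illustration of the SPU case described above (requires an SPU compiler with the __ea extension; the pointer width follows -mea32/-mea64):

    __ea char *host_buf;    /* pointer into the host (PPU) effective-address space */
    char      *local_buf;   /* ordinary local-store pointer */

    char
    get_first (__ea char *p)
    {
      return *p;            /* the compiler emits the software-cache access */
    }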
Re: IRA: matches insn even though !reload_in_progress
On Mon, Jul 11, 2011 at 12:38:34PM +0200, Georg-Johann Lay wrote: > How do I write a pre-reload combine + pre-reload split correctly? > I'd like to avoid clobber reg. > > Thanks much for any hint. The move patterns are always kind of funny, particularly during register allocation. Let's see, given your pattern is:

(define_insn_and_split "*mulsqihi3.const"
  [(set (match_operand:HI 0 "register_operand" "=&r")
        (mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "a"))
                 (match_operand:HI 2 "u8_operand" "n")))]
  "AVR_HAVE_MUL && !reload_completed && !reload_in_progress"
  { gcc_unreachable(); }
  "&& 1"
  [(set (match_dup 3)
        (match_dup 2))                                        ; *mulsu
   (set (match_dup 0)
        (mult:HI (sign_extend:HI (match_dup 1))
                 (zero_extend:HI (match_dup 3))))]
  { operands[3] = gen_reg_rtx (QImode); })

I would probably rewrite it as:

(define_insn_and_split "*mulsqihi3.const"
  [(set (match_operand:HI 0 "register_operand" "=&r")
        (mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "a"))
                 (match_operand:HI 2 "u8_operand" "n")))]
  "AVR_HAVE_MUL && !reload_completed && !reload_in_progress"
  { gcc_unreachable(); }
  "&& 1"
  [(set (match_dup 3)
        (unspec:QI [(match_dup 2)] WRAPPER))                  ; *mulsu
   (set (match_dup 0)
        (mult:HI (sign_extend:HI (match_dup 1))
                 (zero_extend:HI (match_dup 3))))]
  { operands[3] = gen_reg_rtx (QImode); })

(define_insn "*wrapper"
  [(set (match_operand:QI 0 "register_operand" "=&r")
        (unspec:QI [(match_operand:QI 1 "u8_operand" "n")] WRAPPER))]
  "AVR_HAVE_MUL"
  "...")

That way you are using the unspec to make the move not look like a generic move. The other way to do it would be to split it to another pattern that combines the move and the HI multiply, which you then split after reload. Something like:

(define_insn_and_split "*mulsqihi3_const"
  [(set (match_operand:HI 0 "register_operand" "=&r")
        (mult:HI (sign_extend:HI (match_operand:QI 1 "register_operand" "a"))
                 (match_operand:HI 2 "u8_operand" "n")))]
  "AVR_HAVE_MUL && !reload_completed && !reload_in_progress"
  { gcc_unreachable(); }
  "&& 1"
  [(parallel [(set (match_dup 3)
                   (match_dup 2))                             ; *mulsu
              (set (match_dup 0)
                   (mult:HI (sign_extend:HI (match_dup 1))
                            (zero_extend:HI (match_dup 3))))])]
  { operands[3] = gen_reg_rtx (QImode); })

(define_insn_and_split "*mulsqihi3_const2"
  [(set (match_operand:QI 0 "register_operand" "r")
        (match_operand:QI 1 "u8_operand" "n"))
   (set (match_operand:HI 2 "register_operand" "r")
        (mult:HI (sign_extend:HI (match_operand:QI 3 "register_operand" "a"))
                 (zero_extend:HI (match_dup 0))))]
  "AVR_HAVE_MUL"
  "#"
  "&& reload_completed"
  [(set (match_dup 0)
        (match_dup 1))
   (set (match_dup 2)
        (mult:HI (sign_extend:HI (match_dup 3))
                 (zero_extend:HI (match_dup 0))))]
  {})

-- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: IRA: matches insn even though !reload_in_progress
On Wed, Jul 13, 2011 at 01:42:29PM +0200, Georg-Johann Lay wrote: > All the trouble arises because there is no straight forward way to > write the right insn condition, doesn't it? > > Working around like that will work but it is obfuscating the code, IMHO. Given I don't know the AVR port, I don't know what the right condition is. I was just guessing from the patterns you provided. > Is there a specific reason for early-clobber ind *wrapper? You could instead put (clobber (match_scratch)) in the insn, and not split it until after reload. > As *wrapper is not a proper move, could this produce move-move-sequences? > These would have to be fixed in peep2 or so. In theory the register allocator will eliminate the normal move, and just use the wrapper. > This is the missing part, and if gcc learns something like > !ira_in_progress or !split1_completed in the future, the cleanup will > be minimal and straight forward. The code is obvious and without > obfuscation: I tend to think a few more _in_progress and _completed flags would be helpful, particularly ira_in_progress or make reload_in_progress be set by ira. You are probably the first person to run into this. The question is how many do we want and need? Note, a few years ago, this type of splitting was not possible. You had to have all of the match_scratch'es that you would need allocated in the RTL generation pass. Being able to allocate new psuedos before the split1 pass certainly makes things easier to do, but there are always things that could be done better. > bool > avr_gate_split1 (void) > { > if (current_pass->static_pass_number > < pass_match_asm_constraints.pass.static_pass_number) > return true; > > return false; > } > > > I choose .asmcons because it runs between IRA and split1, > and because I observed that pass numbers are fuzzy; > presumably because sub-passes like df etc. I'm not a big fan of this. I think it would be better to just add ira_in_progress and a few others as needed. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: libgcc/static-object.mk weird error on powerpc-rtems
On Fri, Nov 04, 2011 at 12:33:35PM -0500, Joel Sherrill wrote: > Hi, > > I am testing powerpc-rtems on the head and > have gotten a weird error compiling libgcc. > It is definitely a regression from 4.6. > I have no idea who might be the best person > to help resolve this. > > /home2/joel/build/b-powerpc-gcc/powerpc-rtems4.11/libgcc > [joel@rtbf64a libgcc]$ make > /users/joel/test-gcc/gcc-svn/libgcc/static-object.mk:18: *** > Unsupported file type: . Stop. > > Playing with the error message, it looks like > the "$o" in the message is empty. > > What can I pass along to help get this debugged? I've seen this in the past. Typically what happened is the compiler did not build, but in some cases it still tried to build libgcc. First make sure the compiler built (going into the gcc subdirectory and saying make). When I'm doing this, I tend to prefer eliminating any -j options so that it is clearer what is going on. To simplify things that break in libgcc, I often times just configure for C only, just to save the build time. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: cc1: warning: unrecognized command line option "-Wno-narrowing"
On Sun, Nov 06, 2011 at 07:18:52AM +0100, Ralf Corsepius wrote: > Hi, > > Since recently, I am facing several of the warnings above when > building GCC-trunk cross for RTEMS targets. > > So far, not much clues about what is going on, except that I see > -Wno-narrowing were recently added to > gcc/configure.ac and libcpp/configure.ac. FWIW, I'm seeing it also when I'm building on x86_64 RHEL 6.1 targeting powerpc64-linux, so I suspect it is a cross compiler issue, but I haven't checked it in detail. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: powerpc rs6000_explicit_options change help request
On Mon, Nov 07, 2011 at 09:44:25PM -0600, Joel Sherrill wrote: > Hi, > > powerpc-rtems does not compile on the head due > to what appear to be changes in the way CPU > features are represented for the arguments. > > The compilation error is: > > /users/joel/test-gcc/gcc-svn/gcc/config/rs6000/rs6000.c -o rs6000.o > /users/joel/test-gcc/gcc-svn/gcc/config/rs6000/rs6000.c: In function > ‘rs6000_option_override_internal’: > /users/joel/test-gcc/gcc-svn/gcc/config/rs6000/rs6000.c:2826:3: > error: ‘rs6000_explicit_options’ undeclared (first use in this > function) > /users/joel/test-gcc/gcc-svn/gcc/config/rs6000/rs6000.c:2826:3: > note: each undeclared identifier is reported only once for each > function it appears in > At top level: > > The code is in rtems.h is currently: > > if (TARGET_E500) \ > { \ > if (TARGET_HARD_FLOAT && !rs6000_explicit_options.float_gprs) \ > rs6000_float_gprs = 1; \ > if (rs6000_float_gprs != 0 && !rs6000_explicit_options.spe) \ > rs6000_spe = 1; \ > if (rs6000_spe && !rs6000_explicit_options.spe_abi) \ > rs6000_spe_abi = 1; \ > } \ > > I think that changes to something like: > > if (TARGET_E500) \ > { \ > if (!global_options_set.x_rs6000_float_gprs) \ > rs6000_float_gprs = 1; \ > if (!global_options_set.x_rs6000_spe) \ > rs6000_spe = 1; \ > if (!global_options_set.x_rs6000_spe_abi) \ > rs6000_spe_abi = 1; \ > } \ > > That compiles but I wanted a sanity check that it is the right > transformation. Yes, this is the right transformation. Here is an untested patch that fixes it: 2011-11-08 Michael Meissner * config/rs6000/rtems.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Use global_options_set instead of rs6000_explicit_options, which was removed on May 5th. Index: gcc/config/rs6000/rtems.h === --- gcc/config/rs6000/rtems.h (revision 181174) +++ gcc/config/rs6000/rtems.h (working copy) @@ -61,11 +61,11 @@ do { \ if (TARGET_E500) \ { \ -if (TARGET_HARD_FLOAT && !rs6000_explicit_options.float_gprs) \ +if (TARGET_HARD_FLOAT && !global_options_set.x_float_gprs) \ rs6000_float_gprs = 1; \ -if (rs6000_float_gprs != 0 && !rs6000_explicit_options.spe)\ +if (rs6000_float_gprs != 0 && !global_options_set.x_spe) \ rs6000_spe = 1; \ -if (rs6000_spe && !rs6000_explicit_options.spe_abi)\ +if (rs6000_spe && !global_options_set.x_spe_abi) \ rs6000_spe_abi = 1; \ } \ } while(0) -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899
Re: Why no strings in error messages?
On Wed, Aug 26, 2009 at 06:30:44AM -0700, Ian Lance Taylor wrote: > Bradley Lucier writes: > > > I've never seen the answer to the following question: Why do some > > versions of gcc that I build not have string substitutions in error > > messages? > > Perhaps you configured with --disable-intl? > > > > So, is -fschedule-insns an option to be avoided? > > -fschedule-insns should be avoided on Intel architectures. In general > -fschedule-insns tends to increase register lifespan, and because of the > small number of available registers this can sometimes befuddle the > register allocator, especially when asm instructions as used. When I worked at AMD, I was starting to suspect that it may be more beneficial to re-enable the first schedule insns pass if you were compiling in 64-bit mode, since you have more registers available, and the new registers do not have hard wired uses, which in the past always meant a lot of spills (also, the default floating point unit is SSE instead of the x87 stack). I never got around to testing this before AMD and I parted company. > On PPC -fschedule-insns is normally beneficial, and it is turned on by > -O2. > > Ian -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Overly-keen format string warning?
On Tue, Sep 15, 2009 at 05:20:06PM +0100, Dave Korn wrote: > > I added some debugging printfs, and ... > > > cc1: warnings being treated as errors > > /gnu/gcc/gcc/gcc/dwarf2out.c: In function > > 'add_location_or_const_value_attribute > > ': > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 3 has type 'struct var_loc_list *' > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 4 has type 'struct var_loc_node *' > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 5 has type 'rtx' > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 3 has type 'struct var_loc_list *' > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 4 has type 'struct var_loc_node *' > > /gnu/gcc/gcc/gcc/dwarf2out.c:13532:1: error: format '%p' expects type 'void > > *', > > but argument 5 has type 'rtx' > > make: *** [dwarf2out.o] Error 1 > > > > dkad...@ubik /gnu/gcc/obj.libstdc.enabled/gcc > > Should the format string warnings really be complaining about this on a > platform (i686-pc-cygwin) where there's only one kind of pointer? I don't get > the rationale, if this is intentional. Yes. It still is a type violation, even if it will work in most cases. Just for the record, before working on GCC, I wrote the front end of a C compiler for a machine (Data General MV/Eclipse) that had different types of pointers, and it was definately a hassle, particularly in the era before prototypes were added to the language. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: apple blocks extension
On Tue, Sep 15, 2009 at 04:03:35PM +0200, Tristan Gingold wrote: > > On Sep 15, 2009, at 3:46 PM, Steven Bosscher wrote: > > > >Hi, > > > >The status is that there is no status, unfortunately (it's an > >interesting extension...). > > > >This extension is not presently implemented in the FSF GCC. > >AFAIU there is no reason to believe Apple will contribute patches to > >implement it. > > I think you may (are allowed to) port it from Apple GCC sources. If you are not the copyright owner of the sources, I doubt the FSF would allow it to be checked into GCC. However, a reimplementation from scratch would be doable. Like everything else in the free software world, it depends on the motivations of the people contributing the code. To paraphrase President Kennedy, ask not what GCC can do for you, ask what you can do for GCC. So roll up your sleeves and get coding, or convince other people (and their managers) that it is a good thing to do. I suspect there are people starting to think about it, and perhaps the interested parties need to organize to scope out the work. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: [LTO] Request for testing: Last merge from trunk before final merge
On Wed, Sep 30, 2009 at 10:41:26AM +0200, Richard Guenther wrote: > I think the vmx testcases fail on me because I don't have a > POWER7 machine to test on. But it would be nice if ppc > people would look at this. Mike? I or the others in my group should look at the failures. Note, VMX is the altivec instruction set, not the new power7 (VSX) instruction set. Power6 machines should have altivec support. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: __attribute__((optimize)) and fast-math related oddities
On Tue, Oct 20, 2009 at 11:13:48AM -0700, H.J. Lu wrote: > On Tue, Oct 20, 2009 at 2:19 AM, Richard Guenther > wrote: > > On Tue, Oct 20, 2009 at 1:54 AM, tbp wrote: > >> On Mon, Oct 19, 2009 at 7:34 PM, Ian Lance Taylor wrote: > >>> Please file a bug report. > >>> __attribute__((optimize())) is definitely only half-baked. > >> Apparently the code i've posted is just a variation around that 1 year > >> old PR 37565 and if that doesn't work, worrying about the rest is > >> entirely futile. > >> Half baked you say? It's comforting to see that much optimism but > >> couldn't the doc be adjusted a bit to reflect the fact that the baker > >> got hit by a bus or something? > > > > I would rather suggest to rip out the half-baked code again. > > > > I agree. The idea is good. But the design and implementation > are incomplete. As the original author I do tend to agree, we've been finding all of these places, particularly as we go to more and more tree optimizations. However, the question is what do we do at this particular point given it has been in the compiler for 2 years now. There were a number of people that did ask me for it when I presented the initial thoughts a few years ago. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
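For reference, the attribute being debated is used roughly like this; whether the option string reliably takes effect is exactly what PR 37565 is about ("no-fast-math" is just an example option string):

    __attribute__ ((optimize ("no-fast-math")))
    double
    careful_add (double a, double b)
    {
      return a + b;
    }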
Re: About behavior of -save-temps=obj option on GCC 4.5
On Thu, Mar 25, 2010 at 11:33:17PM +0900, Tadashi Koike wrote: > Hi Richard > (* I am weak in English, so pleas forgive my English mistake.) > > Thank you for your reply, and I'm sorry to be late a reply. > > > On Sat, Mar 20, 2010 at 5:40 PM, Tadashi Koike wrote: > >> [ summary ] > >> compiling is failed when more than two source file are > >>specified with both -save-temps=obj and -o options. > : > >> [[ operations being error ]] > >> % gcc -save-temps=obj -o hellow main.c func.c > >> % gcc -save-temps=obj -o obj_dir/hellow main.c func.c > > 2010/3/21 Richard Guenther : > > > > It should be an error to use -save-temps=obj with multiple > > input files. Mike, can you look at this? > > > > Thanks, > > Richard. > > > > After I received your reply, I confirm an information about "-save-temps=obj". I tend to agree with Richard, that if there are multiple source inputs, it should be an error to use -save-temps=obj without the -c/-S option. Though it might be friendlier to allow: gcc -o dir/exe a.c b.c c.c and put the temp files in: dir/a.{s,i} dir/b.{s,i} dir/c.{s,i} But it would fail for doing: gcc -o exe dir1/a.c dir2/a.c where the names overlap. Let me look at it. > In this URL(http://gcc.gnu.org/gcc-4.5/changes.html), "-save-temps=obj" > option are introduced as follow: > > | The -save-temps=obj switch will write files into the directory specified > | with the -o option, and the intermediate filenames are based on the > | output file. > > If above is a specification of "-save-temps=obj" option, found behaviors > in my report about "-save-temps=obj" are true behaviors (but cannot > deal in case of multiple input files). > > I considered. And I read a purpose of this option in above URL : > > | This will allow the user to get the compiler intermediate files when > | doing parallel builds without two builds of the same filename located in > | different directories from interfering with each other. > > My recognition is below: > - True purpose is a failure of compiling under parallel builds. In detail, > problem is a compiling failure under parallel builds caused by > interfering > intermediate file each other. To solve this problem, developer thought > that the intermediate files have to be created in different directory. > So a directory specified with the -o option are choose as these purpose. > > (I named this problem as "Compiling failure by interfering" for following > discussion.) Yes, the motivation was for doing large builds (notably spec 2006) where several of the files have the same base name but are in different subdirectories. In particular, I could not build 436.cactusADM with -j8 without getting conflicts (bugzilla 39293). Even if I built it with -j1 so they wouldn't interfere, I would lose the asm files for some of the builds, which would mean my static analysis programs would miss some objects. I have also seen conflicts with -save-temps if you use have -j numbers when doing a compiler bootstrap. > - If two builds of the same file name located in different directories are > done as serial, each build is successful fine, but intermediate file is > overwritten by a latest compiling. > (I named this problem as "Overwritten by interfering" for following > discussion.) Yes. > - A purpose of "-save-temps=obj" option is to solve the "Compiling > failure by interfering", not to solve the "Overwritten by interfering". Well the real purpose is to allow me to better debug stuff when doing large builds, so that I can go back without having to do the build by hand. 
Just like when I added the original -save-temps option around 1988 or so. > From my consideration, I reached some understanding as follow: > - True solution for "Compiling failure by interferng" is using a > true independent file name which is assured by System (such as > filename returned by tmpnam() function in stdio.h) for each > intermediate file. After compiling are successful, filename of > intermediate files need to change as true filenames based on > source/object files. Well, if this is a big problem for you, when 4.6 opens up, feel free to add a new variant of -save-temps=. The current implementation of -save-temps=obj meets the needs that I had in doing large builds. > - "-save-temps=obj" option is useful to solve some "Overwritten > by interfering" problems. Trouble will decrease caused this > problem. (so I hope/want this option very much) > > - I hope that the filename of intermediate file by "-save-temps=obj" > is also based on source files. > > I hope to hear someone's opinion else. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: GSoC 2010 Project Idea
On Sun, Mar 28, 2010 at 10:37:07PM +0100, Артем Шинкаров wrote: > Hi, > > I have a project in mind which I'm going to propose to the GCC in terms of > Google Summer of Code. My project is not on the list of project ideas > (http://gcc.gnu.org/wiki/SummerOfCode) that is why it would be very > interesting > for me to hear any opinions and maybe even to find a mentor. > > > 1. Project idea > > A brief project idea is to create an abstract layer for vectorized > computations. This would allow to write a portable vectorized code. > > > 2. State of the art > > Nowadays most of processors have a support for SIMD computations. However, the > problem is that each hardware has a different set of SIMD instructions: Intel > MMX+SSE+AVX, PowerPC Altivec, ARM iWMMXt, and so on. GCC supports most of > architecture-specific instructions providing built-in functions. It is > considerably convenient to use these functions when you want to optimize some > piece of code. The problem starts when you want to make this code portable. > It is not a very common task, and of course GCC has a vectorizer. > Unfortunately, there are many examples which show that it is relatively simple > for a human to find a right place in the code and vectorize it, but it is > extremely hard for the compiler to do the same. As a result we end up with the > code which is not using the capabilities of the architecture. > It would be much easier for the programmer to use an abstract layer to > implement a vectorized code. A compiler should deal with the portability > issues > dispatching the code from the abstract layer to the particular architecture. > My > experience shows that there are no such a library for C/C++ that could solve > the problem. There are some attempts like: http://libsimd.sourceforge.net/ but > it is only a small part of the idea, and unfortunately the development is > suspended. Or maybe I am wrong and everything is already written? Note, the powerpc and cell compilers have the notion of a vector keyword that is followed by a type (powerpc needs -maltivec and/or -mvsx to enable it). So you can write: vector float sum (vector float a, vector float b) { return a+b; } Now, ideally, it would be useful to have sytax so you could change the vector size, and the compiler would do the conversion to/from hw types to abstract types. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
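The target-independent analogue of the Altivec-style example, written with GCC's generic vector extension; the vector width is still fixed by the typedef, which is the part that would need new syntax to abstract:

    /* Portable GNU C vector example; how it is lowered depends on the
       target's SIMD support.  */
    typedef float v4sf __attribute__ ((vector_size (16)));

    v4sf
    sum (v4sf a, v4sf b)
    {
      return a + b;
    }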
Re: Defining a common plugin machinery
On Thu, Oct 09, 2008 at 06:02:40PM +0200, Grigori Fursin wrote: > Well, I see the point and I am fine with that. And as I mentioned I can > continue > using some patches for my projects that currently use environment variables > or later > move to the GCC wrappers if the majority decides not to support this mode ;) > > Cheers, > Grigori Well, your gcc wrapper scripts can always use the environment variables you want, and create the appropriate command lines. I tend to be on the side of no environment variables, because it makes it one fewer thing that a user could forget to include on their bug report. With explicit command lines, it is more likely that the GCC community would be able to debug problems. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: RFC: overloading constraints
On Fri, Oct 17, 2008 at 05:51:03PM +0100, Joern Rennecke wrote: > I want some constriants to be register constraints when compiling with > a particular set of compiler flags, and extra constraints when compiling > with another. > (In the first case, alternatives that use this constraint can be selected by > reload for reloading non-matching insns, whereas in the second case, they > will only be used if they already match. The register constraint then > yields NO_REG to allow the other definition to have effect). > That was straightforward with the old constraint target macros, but when > you define this in constraints.md, you get an error. > > Are there any objections to implement constraint overloading in genpreds.c ? > > Is it OK to just accept multiple definitions for disjoint categories, > or would you like to see msome extra syntactic sugar to affirm that > the overloading is intentional? Why do you need new syntax? The current constraints file already supports target specific constraints. For example, from the i386: (define_register_constraint "x" "TARGET_SSE ? SSE_REGS : NO_REGS" "Any SSE register.") I could imagine that you could have more complex cases than just something or NO_REGS. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Including free compiler in a comercial software
On Thu, Oct 30, 2008 at 06:09:28PM -0400, Robert Dewar wrote: > Cristian Harja wrote: > >Hello, > > > >I'm a young developer, with very strong knowledge about programming, > >especially C++. I am working to one of my greatest projects. I'm building > >up a framework that can create native Win32 applications, console > >applications, and provides intelligent and efficient classes to manage > >different types of data. > > > >Since this framework is a set of classes, templates and functions, I need > >a compiler to "bring it to life". I want to create my own programming > >environment,including a class browser (for the framework), a header > >manager (to include / exclude components), and detailed documentation. I > >want to SELL this software, so there's going to be a problem. I can't make > >my own compiler, and free compilers, usually should not be comercialized. > > There is absolutely nothing to stop you selling a package that contains > GPL'ed software, lots of companies do this. The idea that "free > [software] compilers should not be commercialized" is misguided. > If it makes sense to sell a package that includes gcc or other > such components, you are free to do so. Of course you will have > to include the sources of the compiler, but that's no big deal. > It makes perfect sense to sell a package like this, of course > we would prefer to see you sell your framework as Free Software > as well, and encourage you to consider this possibility. Remember > the Free in Free Software is about non-restrictive user-friendly > licensing, not about free in dollars! Note, it is somewhat more complicated than just include the source files. See the file COPYING3 for the rules and regulations for current compiler sources (GNU public license version 3, or gplv3), and COPYING for the rules and regulations for 4.1 or earlier releases (GNU public license version 2, or gplv2). For example, you must not restrict the rights of people who get the compiler from you, and your changes to the compiler itself must be compatible with the appropriate gpl. > >I want an advice, or an approvement to include GCC in my program (of > >course, mentioning it). I would also like to know if there are several > >compilers that can be "comercialized" this way. I do not intend to sell > >any free compiler. I am not going to provide any source code from my > >framework (only ".lib" and ".h"). I can't provide just the headers and a > >lib, or something, because I don't know what differences may exist between > >the compilers. > > > >Please answer me at <[EMAIL PROTECTED]> If you are including GCC in your program, your program needs to be licensed such to allow this (typically gplv3). If you are just providing the compiler as a separate package without modifications, then this is a much simpler matter (this is allowed under section 4 of the gplv3). You really would need to talk to a lawyer to get an understanding of the rules and regulations for anything complicated (hint, most of the people reading this mailing list are not lawyers). -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Passing attributes to RTX
On Mon, Nov 10, 2008 at 03:40:13PM -0500, [EMAIL PROTECTED] wrote: > > Hello all, > > While prototyping a port of gcc I think that the RTX is lacking some > information needed to generate machine dependent files. The expression > trees have the correct information and I can likely hack in a quick fix to > pass that information down to the backend. However, I just want to make > sure I am not doing something completely wrong. > > Basically given the following code, > > void main(T x) > { > T * pt; //pointer to type T > *pt = x > } > > now I need the RTX to know about type T (specifically its qualifiers as I > am doing this for the named address space branch), I changed the backend to > encode these into the RTX generated by the VAR_DECL and PARM_DECL nodes, > however, I am not sure if it is suppose to be encoded in the INDIRECT_REF > node as pt is theoretically a pointer to T. > > I just quickly hacked this information into expand_expr_real_1 in expr.c, > however this may be the incorrect approach, is there any specific location > where I should be modifying RTX attributes and when expand_expr_real_1 is > done should the RTX returned have all the attributes (that can be deduced > from the expression tree) set? I am picking up the named address patches from Ben Elliston, and trying to make them acceptable for inclusion in the 4.5 GCC. I wasn't aware that anybody outside of Ben and I were using the branch. What you discovered is in fact a bug, and I just started looking into it. Thanks for alterting me to the fact that it doesn't completely work at present. I suspect it is a recent regression, and it shouldn't be too hard to get going again, maybe even some of my recent cleanups caused it. On the spu, I discovered that it works as intended for -O2, but does not work at -O and no optimization levels. For example on the spu, this program: #include vec_float4 get (__ea vec_float4 *ptr) { return *ptr; } Generates the code at -O2 which shows that the cache functions are being called to move the data to/from the host address space: .file "foo2.c" .global __cache_fetch .text .align 3 .global get .type get, @function get: ila $17,__cache_tag_array_size-128 stqd$lr,16($sp) ila $16,__cache_tag_array stqd$sp,-32($sp) and $15,$3,$17 ila $14,66051 a $12,$16,$15 lqx $7,$16,$15 andi$2,$3,127 shufb $11,$3,$3,$14 ai $sp,$sp,-32 lqd $8,32($12) andi$10,$11,-128 ceq $9,$10,$7 gbb $5,$9 clz $6,$5 nop 127 rotqby $4,$8,$6 a $4,$4,$2 lnop brz $5,.L6 lnop .L3: ai $sp,$sp,32 lqd $3,0($4) nop 127 lqd $lr,16($sp) bi $lr .L6: brsl$lr,__cache_fetch ori $4,$3,0 br .L3 .size get, .-get .ident "GCC: (GNU) 4.4.0 20081104 (experimental)" But if I compile it at -O1, it doesn't set up the appropriate fields and just does a simple load: .file "foo2.c" .text .align 3 .global get .type get, @function get: lqd $3,0($3) bi $lr .size get, .-get .ident "GCC: (GNU) 4.4.0 20081104 (experimental)" In terms of the RTL level, the MEM_ADDR_SPACE macro is the accessor that gives you the address space for that particular memory reference. Zero is the default address space, and 1-254 are available for other uses. Here is the declaration from rtl.h: /* For a MEM rtx, the address space. If 0, the MEM belongs to the generic address space. */ #define MEM_ADDR_SPACE(RTX) (MEM_ATTRS (RTX) == 0 ? 0 : MEM_ATTRS (RTX)->addrspace) Similarly, at the tree level the TYPE_ADDR_SPACE macro is the accessor macro for the type node. Here is the declaration from tree.h: /* If nonzero, this type is in the extended address space. 
*/ #define TYPE_ADDR_SPACE(NODE) (TYPE_CHECK (NODE)->type.address_space) Out of curiosity, could you email me a short summary of how you plan to use the named address space support in your port? -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Passing attributes to RTX
Ok, I just checked in a fix for this into the branch. The problem was the function that was testing to see if one memory had default attributes was not looking at the address space field. Let me know if this helps your port. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Endianess attributes
On Thu, Nov 13, 2008 at 09:14:06PM +0100, Paul Chavent wrote: > Hi. > > I wonder why there aren't any endianess attributes ? > > For example we could write this : > int x __attribute__ ((endianess (lil))) = 0; > > Is it a silly suggestion ? > No one need it ? > Is it a bad design need ? > > Could someone give me some information ? Some people, particularly those who use embedded processors, need it. Historically, the problem has been it is a lot of work to add the support, and often times when potential customers hear how much it would cost to add the support (either in terms of developer time and skill level needed, or cost if they are paying somebody else to do the work) they decide they don't need it as much. For many users, the place where they need the support is limited to a few places that deal with network packets or memory based hardware devices, and usually you can define a macro abstraction layer that does the right stuff (such as the ntoh/hton macros). In addition to endianess, you often times need to make sure the compiler's structure layout matches the device/network definition. I have seen this come up multiple times over the last 20 years I have been associated with GCC. It would be nice to have it in, but unless enough people dig in and add the support, it won't get done. One of the problems has been you often times lose the endianess information in the rtl phases, and when you recreate a MEM, it might be for the wrong endianess. The memory attributes that keep track of const, volatile (and soon named address support) are probably sufficient so that may not be as much of a problem nowadays. Another problem is you need to do it at all levels (gimple, rtl, etc.) and often times people specialize in one level or another. It can also be challenging to get something like this through the approval process. However, those are all reasons why it hasn't been done in the past. Things do change, so don't read this as meaning it can never be done. Often times, things like this can be done if people are motivated enough, so the fact it hasn't been done before shouldn't deter you or others from trying to do it, by working in a branch and doing merges, etc. I would make sure you take some time to understand the process for contributing changes if you aren't familiar with them already. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
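A sketch of the kind of accessor abstraction layer mentioned above, for pulling a big-endian field out of a packet regardless of host byte order (names are illustrative):

    #include <stdint.h>

    /* Read a 32-bit big-endian field from a buffer, independent of the
       host's byte order.  */
    static inline uint32_t
    get_be32 (const unsigned char *p)
    {
      return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16)
           | ((uint32_t) p[2] << 8)  |  (uint32_t) p[3];
    }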
Re: A question regarding emitting additional info to an insn
On Thu, Nov 13, 2008 at 06:01:27PM +0200, Revital1 Eres wrote: > > Hello, > > I want print additional information for each branch insn which will be > used by the linker (for the SPU software i-cache), for example: > > brsl $lr,[EMAIL PROTECTED] > > and I wonder what is the best way to implement it in GCC. > > I defined a new note (in reg-notes.def) which can be attached to each > branch (using add_reg_note ()) and holds that info. I now want to print > the information of that note when emitting the branch, but I am not > sure exactly how to do that. Should I change the instruction template > to use the reg-notes? Unfortunately, it is harder than it should be. The approach I would take is define FINAL_SCAN_INSN to set a global variable to the loop level, based on the notes being present or 0 if the note is not present. At present, FINAL_SCAN_INSN is not defined for the SPU. Then I would define PRINT_OPERAND_PUNCT_VALID_P to add a new punctuation character, such as '*' which will print the thing you want if the note is there, or nothing if it isn't. For example, here is the i386 version which adds 4 special formats: #define PRINT_OPERAND_PUNCT_VALID_P(CODE) \ ((CODE) == '*' || (CODE) == '+' || (CODE) == '&' || (CODE) == ';') Then in print_operand, add a new case statement for '*' to do what you want. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Endianess attributes
On Thu, Nov 13, 2008 at 11:46:04PM +, Paul Brook wrote: > On Thursday 13 November 2008, Michael Meissner wrote: > > On Thu, Nov 13, 2008 at 09:14:06PM +0100, Paul Chavent wrote: > > > Hi. > > > > > > I wonder why there aren't any endianess attributes ? > > > > > > For example we could write this : > > > int x __attribute__ ((endianess (lil))) = 0; > > > > > > Is it a silly suggestion ? > > > No one need it ? > > > Is it a bad design need ? > > > > > > Could someone give me some information ? > > > > One of the problems has been you often times lose the endianess information > > in the rtl phases, and when you recreate a MEM, it might be for the wrong > > endianess. The memory attributes that keep track of const, volatile (and > > soon named address support) are probably sufficient so that may not be as > > much of a problem nowadays. > > The biggest problem is probably that you really what the attribute to apply > to > pointers and types, not variables. In the example above you want to be able > to do p = &x; *p = 42; and have it DTRT. It's the same problem that you have > with taking the address of __attribute__((packed)) stricture members[1]. Yes, that has been the problem the last couple of times an endianess attribute has been proposed. > Last time I checked gcc didn't have any real support for different types of > pointer, so it's quite a bit more work than you might expect to implement > this feature. I'm currently working on finishing patches that Ben Elliston wrote to add support for named address spaces to GCC. Named addresses come from a draft proposal from the ISO WG14 committee (the C standard committee). For example, on the SPU the __ea keyword would allow you to declare pointers and data items in the host address space as opposed to the local store; more generally, the named address space support allows you to add new attributes that bind like const and volatile. So while named address space support solves a somewhat different problem, it touches a lot of the places in the RTL backend that would need to be modified for endianess. I do think if endianess is added, it really needs to be understood at the gimple level, so that optimizations can take it into account. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA [EMAIL PROTECTED]
Re: Various memory sizes
On Mon, Dec 15, 2008 at 04:17:34PM -0500, gobaan_raveend...@amis.com wrote: > > Hello Everyone, > > I am working on an architecture with multiple types of memory and I am > wondering about memory allocation. For the purpose of this explaination, > we'll assume I am working with an embedded processor that has both 32 bit > (named X) and 64 bit memory (named Y), 64 bit longs, and uses word > addressing for both. Now given the following code > > int main () > { > long array [5] = {1,2,3,4,5}; > return 0; > } > > the compiler should determine increment based on the length of a long, > however, if the data is in Y memory this is one, while if the data is in X > memory the increment is two. The same problem holds for the offset of local > variables and parameters that are longs etc. > > Given that I spent the last little while implementing a prototype that > dealt with the named address space branch, I was able to quickly hack in > some language hooks that allowed me to perform corrections to the increment > for arrays based on their address space, and I may be able to extend this > for other types as well. However, I think a correct & general > implementation may change BITS_PER_UNIT and BITS_PER_WORD in order to > generalize it for each address space (or at the very least introduce new > macros that extend the behavior). The problem as I've been finding in the named address branch, is at the end of the day, you have to find all of the assumptions about Pmode, ptr_mode, POINTER_SIZE, copy_to_addr_register, etc. In terms of your code fragment, the named address spec also only allows named address spaces for static memory. It assumes that stack based objects are all in the generic address space. The compiler really believes there is only one stack and that it can push/pop/reload at will. One thing that I've thought about in terms of named address spaces is whether you could use it to put in __little and __big to allow a user to encode that a particular data item is little endian or big. However, you get back to the stack issue again. > I am just wondering about any other similar work, the feasibilty of either > extension(my work is mainly a proof of concept), any input on how I should > continue or comments on the mistakes of my assumptions. In an ideal world, there should be a hook that allows a compiler to place each variable in a particular named address space. However in doing so, you would need to make sure that void */char * generally can point to any address space. Otherwise it breaks a lot of C that requires void * be a universal pointer. You might want to look at what other (non-GCC) compilers do for different address spaces. I suspect that by and large, they just put the named address stuff as special types of static, and use generic address spaces for auto variables. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Options of fixing biggest alignment in PR target/38736
On Thu, Jan 08, 2009 at 03:35:29PM -0800, Ian Lance Taylor wrote: > "H.J. Lu" writes: > > > For 4.4, we can apply my patch: > > > > http://gcc.gnu.org/ml/gcc-patches/2009-01/msg00367.html > > > > and update document with > > > > As in the preceding examples, you can explicitly specify the alignment > > (in bytes) that you wish the compiler to use for a given variable or > > structure field. Alternatively, you can leave out the alignment factor > > and just ask the compiler to align a variable or field to the maximum > > useful alignment for the target ABI you are compiling for. For > > example, you could write: > > > > @smallexample > > short array[3] __attribute__ ((aligned)); > > @end smallexample > > > > Whenever you leave out the alignment factor in an @code{aligned} attribute > > specification, the compiler automatically sets the alignment for the > > declared > > variable or field to the largest alignment for the target ABI you are > > compiling for. Doing this can often make copy operations more efficient, > > because the compiler can use whatever instructions copy the biggest > > chunks of memory when performing copies to or from the variables or > > fields that you have aligned this way. > > Shouldn't the docs say something about memory allocation? As in, > malloc should return memory aligned to that boundary? > > For that matter, don't we have a problem on x86 GNU/Linux, where > malloc returns an 8-byte alignment but attribute((aligned)) is a 16 > byte alignment? In that case, malloc should be changed to return items that are 16-byte aligned if any type needs 16-byte alignment for any possible ISA. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
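The mismatch in code form (a hedged example, not from the thread): a type that uses the bare aligned attribute can demand more alignment than a 32-bit glibc malloc guarantees:

    #include <stdlib.h>

    struct sample
    {
      short data[3] __attribute__ ((aligned));  /* maximum useful alignment, e.g. 16 */
    };

    struct sample *
    make_sample (void)
    {
      /* On ia32 GNU/Linux, malloc only promises 8-byte alignment, so this
         allocation may be under-aligned for the type above.  */
      return malloc (sizeof (struct sample));
    }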
Re: GCC & OpenCL ?
On Fri, Jan 30, 2009 at 03:08:47PM +0100, Basile STARYNKEVITCH wrote: > Hello All, > > OpenCL http://www.khronos.org/opencl/ is a new standard proposing an API > for GPUs (targetting vector processing on heterogenous systems, like GPU > + CPU). It suggests some restricted & specialized C dialect to code > "kernel" functions (kernel in OpenCL means running on the GPU). > > Is there any branch or experimental code for adding OpenCL into GCC? > > I suppose that some people are already working on that, but I didn't > find easily precise references. > > Any clues, insights, contacts on these items? I am just starting to think about adding OpenCL support into future versions of GCC, as it looks like a useful way of programming highly parallel systems, particularly with heterogeneous processors. At this point, I am wondering what kind of interest people have in working together on OpenCL in the GCC compiler? I realize Apple is putting a lot of work into OpenCL on the LLVM compiler, but I think a parallel implementation in the classic GCC compiler would reach a lot of users. I'm curious how many users would consider using OpenCL if it was available in GCC, and whether there are other people interested in working on it. Note, at present, I have not scoped out the work -- I am trying to gauge what the interest level would be. Offhand, I think the first stage is to get OpenCL to work in a homogeneous multi-core system before diving into the heterogeneous systems. It is obvious to me that a GCC-based OpenCL system should support various systems in development, including homogeneous systems, PCs with add-on graphics processors, combination systems like the Cell systems, and be flexible enough for new systems coming down the pike. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: GCC & OpenCL ?
On Tue, Feb 03, 2009 at 04:37:55PM -0500, Ross Ridge wrote: > > Basile STARYNKEVITCH writes: > >It seems to me that some specifications > >seems to be available. I am not a GPU expert, but > >http://developer.amd.com/documentation/guides/Pages/default.aspx > >contains a R8xx Family Instruction Set Archictectire document at > >http://developer.amd.com/gpu_assets/r600isa.pdf and at a very quick > >first glance (perhaps wrongly) I feel that it could be enough to design & > >write a code generator for it. > > Oh, ok, that makes a world of difference. Even with just AMD GPU > support a GCC-based OpenCL implementation becomes a lot more practical. And bear in mind that x86's with GPUs are not the only platform of interest. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: GCC & OpenCL ?
On Tue, Feb 03, 2009 at 09:40:20AM -0800, Chris Lattner wrote: > > On Feb 3, 2009, at 8:48 AM, Mark Mitchell wrote: > > >Andrey Belevantsev wrote: > > > >>Obviously, a library is not enough for a heterogeneous system, or > >>am I missing anything from your description? As I know, e.g. there > >>is > >>no device-independent bytecode in the OpenCL standard which such a > >>backend could generate. > > > >That's correct. I was envisioning a proper compiler that would take > >OpenCL input and generate binary output, for a particular target, just > >as with all other GCC input languages. That target might be a GPU, or > >it might be a multi-core CPU, or it might be a single-core CPU. > > > >Of course, OpenCL also specifies some library functionality; that > >could > >be provided in a library that included hand-written assembly code, or > >was generated by some non-GCC tool. > > That's an interesting and very reasonable approach to start with. > However, realize that while this will give you ability to run programs > that use some of the OpenCL language extensions, you won't be able to > run any real OpenCL apps. The OpenCL programming model fundamentally > requires runtime compilation and the kernel-manipulation library APIs > are a key part of that. Yes, but you need to get to the basic level before you get to the runtime compilation and kernel manipulation. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Machine description question
On Sat, Feb 07, 2009 at 03:54:51PM -0500, Jean Christophe Beyler wrote: > Dear all, > > I have a question about the way the machine description works and how > it affects the different passes of the compiler. I was reading the GNU > Compiler Collection Internals and I found this part (in section > 14.8.1): > > (define_insn "" >[(set (match_operand:SI 0 "general_operand" "=r") > (plus:SI (match_dup 0) > (match_operand:SI 1 "general_operand" "r")))] >"" >"...") > > which has two operands, one of which must appear in two places, and > > (define_insn "" >[(set (match_operand:SI 0 "general_operand" "=r") > (plus:SI (match_operand:SI 1 "general_operand" "0") > (match_operand:SI 2 "general_operand" "r")))] >"" >"...") > > which has three operands, two of which are required by a constraint to > be identical. > > > My question is : > - How do the constraints work for match_dup or for when you put "0" as > a constraint? Since we want the same register but as input and not > output, how does the compiler set up its dependencies? Does it just > forget the "=" or "+" for the match_dup or "0" ? Note, the two are not the same thing before register allocation. Before register allocation, in the first example, the argument to the plus must point to the exact RTL that the result is. Given that normally a different pseudo register is used for the output, the first example won't match a lot of insns that are created. The two general_operands are distinct and the normal live/death information will track the insns. So you are saying that the first general operand is a write only value, and the second is a read only value. During register allocation, the register allocator notices the "0" and then reuses the register. In that example, the value of operand1 dies in that insn, replaced by the value in operand0. For example, consider a 2 address machine like the 386 (and ignoring things like LEA). If the value of operand1 is live after the insn, and operand0 wants to be %eax, operand1 is in %ebx, and operand2 is in %edx, then the register allocation typically generates code like: movl %ebx, %eax addl %edx, %eax If operand1's value was not needed after the insn, then the compiler would possibly allocate the output to %ebx, and do: addl %edx, %ebx > > - I also read that it is possible to assign a letter with a number for > the constraint. Does this mean I can do "r0" to say, it's going to be > an input register but it must look like the one you'll find in > position 0? I'm not sure what you are asking here. If you are talking about the multi-letter constraints in define_register_constraint, that is basically a way to extend the machine description for more constraints when you run out of single letters. > - My third question is: why not put "+r" as the constraint instead of > "=r" since we are going to be reading and writing into r0. Because the output value is not an input to the add in the second example. It is an input to the add in the first, due to the use of match_dup. > - Finally, I have a similar instruction definition, if I use the first > solution (match_dup), the instruction that will calculate the > input/output operand 0 sometimes gets removed by the web pass. But if > I use the second solution (put a "0" constraint), then it is no longer > removed. Any idea why these two definitions would be treated > differently in the web pass? I don't know much about the web pass, however as I said, the two definitions are fundamentally different. 
-- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Native support for vector shift
On Tue, Feb 24, 2009 at 06:15:37AM -0800, Bingfeng Mei wrote: > Hello, > For the targets that support vectors, we can write the following code: > > typedef short V4H __attribute__ ((vector_size (8))); > > V4H tst(V4H a, V4H b){ > return a + b; > } > > Other operators such as -, *, |, &, ^ etc are also supported. However, > vector shift > is not supported by frontend, including both scalar and vector second > operands. > > V4H tst(V4H a, V4H b){ > return a << 3; > } > > V4H tst(V4H a, V4H b){ > return a << b; > } > > Currently, we have to use intrinsics to support such shift. Isn't syntax of > vector > shift intuitive enough to be supported natively? Someone may argue it breaks > the > C language. But vector is a GCC extension anyway. Support for vector > add/sub/etc > already break C syntax. Any thought? Sorry if this issue had been raised in > past. Note, internally there are two different types of vector shift. Some machines support a vector shift by a scalar, some machines support a vector shift by a vector. One future machine (x86_64 with -msse5) can support both types of vector shifts. The auto vectorizer now can deal with both types:

    for (i = 0; i < n; i++) a[i] = b[i] << c

will generate a vector shift by a scalar on machines with that support, and splat the scalar into a vector for the second set of machines. If the machine only has vector shift by a scalar, the auto vectorizer will not generate a vector shift for:

    for (i = 0; i < n; i++) a[i] = b[i] << c[i]

Internally, the compiler uses the standard shift names for vector shift by a scalar (i.e. ashl, ashr, lshr), and a v prefix for the vector by vector shifts (i.e. vashl, vashr, vlshr). The rotate patterns are also similar. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Native support for vector shift
On Tue, Feb 24, 2009 at 01:19:44PM -0800, Bingfeng Mei wrote: > Yes, I am aware of both types of vector shift. Our target VLIW > actually supports both and I have implemented all related patterns > in our porting. But it would be still nice to allow programmer > explicitly use vector shift, preferably both types. > > Bingfeng It shouldn't be too hard to add the support. I suspect the person who did the initial support may have been on a machine without vector shifts. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Matrix multiplication: performance drop
On Tue, Mar 03, 2009 at 03:44:35PM +0300, Yury Serdyuk wrote: > So for N multiple of 512 there is very strong drop of performance. > The question is - why and how to avoid it ? > > In fact, given effect is present for any platforms ( Intel Xeon, > Pentium, AMD Athlon, IBM Power PC) > and for gcc 4.1.2, 4.3.0. > Moreover, that effect is present for Intel icc compiler also, > but only till to -O2 option. For -O3, there is good smooth performance. > Trying to turn on -ftree-vectorize do nothing: Basically at higher N's you are thrashing the cache. Computers tend to have prefetching for sequential access, but accessing one matrix on a column basis does not fit into that prefetching. Given caches are a fixed size, sooner or later the whole matrix will not fit in the cache. If you have to go out to main memory, it can cause the processor to take a long time as it waits for the memory to be fetched. It may be the Intel compiler has better support for handling matrix multiply. The usual way is to recode your multiply so that it is more cache friendly. This is an active research topic, so using google or another search engine is your friend. For instance, this was one of the first links I found when looking for 'matrix multiply cache' http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/index.html -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
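To make the cache-blocking suggestion concrete, here is a minimal sketch of a tiled multiply. It is not from the original report; N and the block size BS are arbitrary, N is assumed to be a multiple of BS, and c[][] is assumed to start out zeroed.

    #define N  1024
    #define BS 64          /* block size; tune to the cache being targeted */

    void
    matmul_blocked (double a[N][N], double b[N][N], double c[N][N])
    {
      int ii, kk, jj, i, k, j;

      /* Work on BS x BS sub-blocks so the pieces of a, b and c being
         touched stay resident in the cache instead of streaming through.  */
      for (ii = 0; ii < N; ii += BS)
        for (kk = 0; kk < N; kk += BS)
          for (jj = 0; jj < N; jj += BS)
            for (i = ii; i < ii + BS; i++)
              for (k = kk; k < kk + BS; k++)
                {
                  double a_ik = a[i][k];
                  for (j = jj; j < jj + BS; j++)
                    c[i][j] += a_ik * b[k][j];
                }
    }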
Re: help for arm avr bfin cris frv h8300 m68k mcore mmix pdp11 rs6000 sh vax
On Fri, Mar 13, 2009 at 12:34:49PM +0100, Paolo Bonzini wrote: > These are all the !SHIFT_COUNT_TRUNCATED targets. > > For 4.5 I would like to improve our RTL canonicalization so that no > out-of-range shifts are ever in the RTL representation. > > This in turn means that the description given by SHIFT_COUNT_TRUNCATED > must be exact. Right now !SHIFT_COUNT_TRUNCATED means "I don't know", > I want it to mean "it is never truncated". > > I would like to know whether for avr,bfin,cris,frv,h8300,pdp11,rs6000 > (which define SHIFT_COUNT_TRUNCATED as 0) and for mcore,sh,vax (which > do not define it at all) it is right that shift counts are never > truncated. > > In addition, for arm and m68k I'd like to know whether bitfield > instructions truncate the bit position the same as shifts (8 bits for > arm, 6 bits for m68k). > > This information is particularly important for targets that do not > have a simulator in src. Note, one thing I encountered when doing the SSE5 work at AMD, is SHIFT_COUNT_TRUNCATED really needs a mode argument (and ideally should be moved into the gcc_target structure). In particular, shifts done in the general purpose registers were not truncated, but vector shifts done in the SSE5 instructions were truncated (or vice versa). -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
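For what it's worth, there is already a per-mode hook in this area, targetm.shift_truncation_mask, which can express that per-mode distinction even though it does not replace every use of SHIFT_COUNT_TRUNCATED. A hypothetical backend fragment (sketch only; the foo_ names are made up, and which class of shifts truncates is just an assumption for the example) might look like:

    /* Return the mask applied to shift counts in MODE, or 0 if shift
       counts are not truncated for MODE.  Hypothetically, vector shifts
       truncate the count to the element size while GPR shifts do not.  */
    static unsigned HOST_WIDE_INT
    foo_shift_truncation_mask (enum machine_mode mode)
    {
      if (VECTOR_MODE_P (mode))
        return GET_MODE_BITSIZE (GET_MODE_INNER (mode)) - 1;
      return 0;
    }

    #undef TARGET_SHIFT_TRUNCATION_MASK
    #define TARGET_SHIFT_TRUNCATION_MASK foo_shift_truncation_mask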
Re: question on 16 bit registers with 32 bit pointers
On Sat, Apr 11, 2009 at 01:40:57AM +0100, Dave Korn wrote: > Stelian Pop wrote: > > >>> Do I need to define movsi3(), addsi3() etc. patterns manually or should > >>> GCC > >>> figure those by itself ? > >> Not sure I understand you. You always need to define movMM3 etc. GCC > >> will > >> correctly select between movhi3 and movsi3 based on your Pmode macro when > >> handling pointers, but you still need to write the patterns. > > > > The thing is that this CPU does not have any real 32 bit registers, or > > instructions to do assignments/additions/etc to 32 bit registers. So the 32 > > bit > > operations (on pointers) need to be emulated using the 16 bit components, > > and I > > thought that GCC can do this automatically for me ... > > Ah. In theory GCC should move everything by pieces. In practice, you have > to define mov patterns for SI and DI because rtl-level CSE isn't as smart as > it should be. You can use expanders for these. Though if you use expanders that need temporary registers, you may have problems in reload, and need to delve into the mysteries of secondary reload. I would imagine that for pointer-sized things, if you do need to implement them as multiple instructions, it is best to hold off on splitting until after reload is completed. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Please help me test the power7-meissner branch before I submit the changes
The power7 stuff should be fairly stable at this point, and I feel it is time to submit it to the mainline GCC. If other powerpc users can try out the branch to see if there are regressions, it would be helpful. Note with the recent gimple type validation errors, the mainline now has more failures than it had previously, so be sure to compare it to the mainline branch of the same vintage (the branch is currently synced to svn version 147021). I have fixed some of the vectorizer failures in this branch, but there are still failures for both mainline and this branch. In particular, I would like to know if I caused any regressions in the embedded powerpc (spe, e500, etc.). Some of the E500/SPE stuff would fit in with the abstractions I put in for distinguishing between altivec and vsx. The branch is at: svn+ssh://@gcc.gnu.org/svn/gcc/branches/ibm/power7-meissner Thanks in advance. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: opaque vector types?
On Wed, May 06, 2009 at 02:29:46AM -0400, DJ Delorie wrote: > > Andrew Pinski writes: > > You could do what the rs6000 back-end does for the altivec builtins > > and resolve them while the parser is run (the SPU back-end does the > > same thing too). Yes there are opaque vector types, you just use > > build_opaque_vector_type instead of build_vector_type. > > Thanks, I'll look at those. Any way to prototype such functions in C ? As Andrew says the rs6000/spu have the notion of overloaded builtins. I've been working in this area somewhat for the power7 port, and you might want to look at my power7-branch. In rs6000.h it uses REGISTER_TARGET_PRAGMAS to set the resolve_overloaded_builtin target hook: /* Target pragma. */ #define REGISTER_TARGET_PRAGMAS() do { \ c_register_pragma (0, "longcall", rs6000_pragma_longcall);\ targetm.resolve_overloaded_builtin = altivec_resolve_overloaded_builtin; \ } while (0) In rs6000-c.c you have the function that tries to resolve the builtin given the argument types: tree altivec_resolve_overloaded_builtin (tree fndecl, void *passed_arglist) { ... } It returns a tree of the builtin function with the appropriate types. In the code, there is a giant table (altivec_overloaded_builtins) that maps the generic builtin functions to the specific ones based on the argument types. For example, altivec has a builtin function that does absolute value for any vector type, and internally it converts this to the appropriate builtin for each type: const struct altivec_builtin_types altivec_overloaded_builtins[] = { /* Unary AltiVec/VSX builtins. */ { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI, 0, 0 }, { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI, 0, 0 }, { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI, 0, 0 }, { ALTIVEC_BUILTIN_VEC_ABS, ALTIVEC_BUILTIN_ABS_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0, 0 }, ... }; -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: New GCC releases comparison and comparison of GCC4.4 and LLVM2.5 on SPEC2000
On Wed, May 13, 2009 at 05:42:03PM +0200, Jan Hubicka wrote: > > Paolo Bonzini wrote: > > > > > >Rather, we should seriously understand what caused the compilation time > > >jump in 4.2, and whether those are still a problem. We made a good job > > >in 4.0 and 4.3 offsetting the slowdowns from infrastructure changes with > > >speedups from other changes; and 4.4 while slower than 4.3 at least > > >stays below 4.2. But, 4.2 was a disaster for compilation time. > > > > > > > > It was interesting for me that GCC4.4 is faster than GCC4.3 on Intel > > Core I7. This is not true for most other processors (at least for P4, > > Core2, Power4/5/6 etc). Intel Core I7 has 3rd level big and fast cache > > and memory controller on the chip. I guess that slowdown in GCC is > > mostly because of data and/or code locality. > > Note that one of important reasons for slowdowns is the startup cost. > Binarry is bigger, we need more libraries and we do more initialization. > With per-function flags via attributes, we are now initializing all the > builtins and amount of builtins in i386 backend has grown to quite > extreme sizes. In 4.3 timeframe I did some work on optimizing startup > costs that translated quite noticeably to kernel compilation time. It > might be interesting to consider some scheme for initializing them > lazilly. For C and C++ on x86/x86_64, the compiler does not create the decls for the builtin functions until it is compiling an ISA that supports the builtin. So on x86_64, it will create the sse and sse2 builtins by default, but not the sse4, sse5, avx builtins until you use the appropriate -m flag or specify a target specific option in the attributes or pragmas. Yes, an initial version of my patches did create all of the nodes, but the August 29th checkin did restrict building the builtins to the ISA supported. Now, it does create the builtins all of the time for other languages, since they don't register the appropriate create a builtin function in external scope hook. However, that still leaves the compiler creating a lot of builtins that mostly aren't used. It may be useful to register names, and have a call back to create the builtin if the name is actually used. It certainly would eliminate a lot of memory space, and time in the compiler. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
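To sketch what that lazy registration might look like: nothing below is existing GCC API, the types and names are made up for illustration. The idea is that the front end records only a name and a creation callback at startup, and builds the actual decl the first time the identifier is looked up.

    #include <stddef.h>
    #include <string.h>

    typedef void *decl_t;                     /* stand-in for GCC's tree      */
    typedef decl_t (*builtin_maker_fn) (const char *name);

    struct lazy_builtin
    {
      const char *name;                       /* e.g. a __builtin_... name    */
      builtin_maker_fn make;                  /* builds the decl on demand    */
      decl_t decl;                            /* cached after the first use   */
    };

    /* Registered at startup; cheap because no decls are built yet.  */
    static struct lazy_builtin lazy_builtins[1024];
    static size_t n_lazy_builtins;

    void
    register_lazy_builtin (const char *name, builtin_maker_fn make)
    {
      lazy_builtins[n_lazy_builtins].name = name;
      lazy_builtins[n_lazy_builtins].make = make;
      lazy_builtins[n_lazy_builtins].decl = NULL;
      n_lazy_builtins++;
    }

    /* Called from identifier lookup; creates the decl only when needed.  */
    decl_t
    lookup_lazy_builtin (const char *name)
    {
      size_t i;
      for (i = 0; i < n_lazy_builtins; i++)
        if (strcmp (lazy_builtins[i].name, name) == 0)
          {
            if (lazy_builtins[i].decl == NULL)
              lazy_builtins[i].decl = lazy_builtins[i].make (name);
            return lazy_builtins[i].decl;
          }
      return NULL;
    }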
Re: Extending constraints using register subclasses
On Mon, May 11, 2009 at 04:45:26PM -0700, Jamie Prescott wrote: > > Hi! > I wanted to add finer (one per) register subclasses, so that I can more > finely control > the register placement inside the inline assembly. > These are the relevant definitions inside my include file: > > enum reg_class > { > NO_REGS = 0, > GENERAL_REGS, > X_REGS, > R0_REG, R1_REG, R2_REG, R3_REG, > R4_REG, R5_REG, R6_REG, R7_REG, > X0_REG, X1_REG, X2_REG, X3_REG, > X4_REG, X5_REG, X6_REG, X7_REG, > ALL_REGS, > LIM_REG_CLASSES > }; > > #define N_REG_CLASSES ((int) LIM_REG_CLASSES) > > /* Give names of register classes as strings for dump file. */ > > #define REG_CLASS_NAMES \ > { \ > "NO_REGS", \ > "GENERAL_REGS", \ > "X_REGS", \ > "R0_REG", "R1_REG", "R2_REG", "R3_REG", \ > "R4_REG", "R5_REG", "R6_REG", "R7_REG", \ > "X0_REG", "X1_REG", "X2_REG", "X3_REG", \ > "X4_REG", "X5_REG", "X6_REG", "X7_REG", \ > "ALL_REGS", \ > "LIM_REGS" \ > } > > /* Define which registers fit in which classes. >This is an initializer for a vector of HARD_REG_SET >of length N_REG_CLASSES. */ > > #define REG_CLASS_CONTENTS \ > { \ > { 0x, 0x, 0x, 0x }, /* > NO_REGS */ \ > { 0x, 0x007f, 0x, 0x }, /* > GENERAL_REGS */ \ > { 0x, 0xff80, 0x, 0x007f }, /* X_REGS > */ \ > { 0x0001, 0x, 0x, 0x }, /* R0_REG > */ \ > { 0x0002, 0x, 0x, 0x }, /* R1_REG > */ \ > { 0x0004, 0x, 0x, 0x }, /* R2_REG > */ \ > { 0x0008, 0x, 0x, 0x }, /* R3_REG > */ \ > { 0x0010, 0x, 0x, 0x }, /* R4_REG > */ \ > { 0x0020, 0x, 0x, 0x }, /* R5_REG > */ \ > { 0x0040, 0x, 0x, 0x }, /* R6_REG > */ \ > { 0x0080, 0x, 0x, 0x }, /* R7_REG > */ \ > { 0x, 0x0080, 0x, 0x }, /* X0_REG > */ \ > { 0x, 0x0100, 0x, 0x }, /* X1_REG > */ \ > { 0x, 0x0200, 0x, 0x }, /* X2_REG > */ \ > { 0x, 0x0400, 0x, 0x }, /* X3_REG > */ \ > { 0x, 0x0800, 0x, 0x }, /* X4_REG > */ \ > { 0x, 0x1000, 0x, 0x }, /* X5_REG > */ \ > { 0x, 0x2000, 0x, 0x }, /* X6_REG > */ \ > { 0x, 0x4000, 0x, 0x }, /* X7_REG > */ \ > { 0x, 0x, 0x, 0x007f }, /* > ALL_REGS */ \ > } In addition to the ordering GENERAL_REGS after the R/X_REGS, you didn't mention defining IRA_COVER_CLASSES. With the new IRA register allocator, you need to define a cover class for all registers that can be moved back and forth. For your setup, I would imagine the following would work: #define IRA_COVER_CLASS { GENERAL_REGS } If you don't define an IRA_COVERT_CLASS, then the compiler assumes each register class is unique, and it can't copy between them. I ran into this on the power7 support. On previous power machines, you have two classes of registers FLOAT_REGS and ALTIVEC_REGS (in addition to the GPRs and other registers), but the new VSX instruction set has a merged register set that both the traditional floating point registers and the altivec vector registers are a set of. I found I needed to have a different cover class for VSX using the VSX_REGS class which is the union of the two, and FLOAT_REGS, ALTIVEC_REGS for the pre-vsx code, and switch which is used in the ira_cover_classes target hook. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
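For reference, the macro is spelled IRA_COVER_CLASSES and its initializer list is terminated with LIM_REG_CLASSES. A sketch following the suggestion above might look like the fragment below; whether the X registers need their own entry, or a union class covering both files, depends on how freely values can be copied between the R and X registers on this machine.

    /* Sketch only: classes covering the allocatable registers, terminated
       by LIM_REG_CLASSES as IRA requires.  Every allocatable hard register
       must fall into exactly one of the listed classes.  */
    #define IRA_COVER_CLASSES          \
    {                                  \
      GENERAL_REGS, LIM_REG_CLASSES    \
    }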
Re: Extending constraints using register subclasses
On Thu, May 14, 2009 at 04:22:13PM -0700, Jamie Prescott wrote: > OK, I tried reordering the classes by putting smaller ones first. Did not > work. > As far as IRA_COVER_CLASS, this should be a new thing, isn't it? I'm currently > on 4.3.3 and I found no mention of it anywhere. Yes, it was added in GCC 4.4. It might make sense to start the rebase to the current mainline (you will need to do it sooner or later, and it becomes more painful the further away the code is). -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Seeking suggestion
On Fri, May 22, 2009 at 05:04:22PM -0700, Jamie Prescott wrote: > > > From: Jamie Prescott > > To: gcc@gcc.gnu.org > > Sent: Friday, May 22, 2009 10:36:47 AM > > Subject: Seeking suggestion > > > > > > Suppose you're writing the backend for a VM supporting two architectures, > > in > > which > > one of them clobbers the CC registers for certain instructions, while the > > other > > does not. > > The instructions themselves are exactly the same. > > What is the best/shortest/more-elegant way to write this, possibly w/out > > duplicating the > > instructions? > > I know I can write a define_expand and redirect, based on the TARGET, to > > two > > different > > instructions (one with "clobber", the other w/out), but that's basically > > three > > declarations > > for each insns. Is there a shorter way? > > I ended up doing something like this (long way, but the only one I know of). > Example, for addsi3: > > (define_insn "addsi3_xxx2" > [(set (match_operand:SI 0 "fullreg_operand" "=r,r") > (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r") > (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn")))] > "" > "@ >add\t%0,%2,%0 >add\t%1,%2,%0" > ) > > (define_insn "addsi3_xxx" > [(set (match_operand:SI 0 "fullreg_operand" "=r,r") > (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r") > (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn"))) >(clobber (reg:CC CC_REG))] > "" > "@ >add\t%0,%2,%0 >add\t%1,%2,%0" > ) > > (define_expand "addsi3" > [(set (match_operand:SI 0 "fullreg_operand" "=r,r") > (plus:SI (match_operand:SI 1 "fullreg_operand" "0,r") > (match_operand:SI 2 "fullreg_or_imm_operand" "rn,rn")))] > "" > { > if (!TARGET_XXX2) > emit_insn(gen_addsi3_xxx(operands[0], operands[1], operands[2])); > else > emit_insn(gen_addsi3_xxx2(operands[0], operands[1], operands[2])); > DONE; > } > ) One way is to use match_scratch, and different register classes for the two cases. (define_insn "add3" [(set (match_operand:SI 0 "register_operand" "=x,y") (plus:SI (match_operand:SI 1 "register_operand" "%x,y") (match_operand:SI 2 "register_operand" "x,y"))) (clobber (match_scratch:CC 3 "=X,z"))] "" "add %0,%1,%2") (define_register_constraint "x" "TARGET_MACHINE ? GENERAL_REGS : NO_REGS" "@internal") (define_register_constraint "y" "!TARGET_MACHINE ? GENERAL_REGS : NO_REGS" "@internal") (define_register_constraint "z" CR_REGS "@interal") This assumes you have a register class for the condition code register. Most machines however, use the normal define_expand with two different insns. In theory, you could define a second condition code register that doesn't actually exist in the machine, and change the clobber from the main CC to the fake one. > But now I get and invalid rtx sharing from the push/pop parallels: > > > .c: In function 'test_dashr': > .c:32: error: invalid rtl sharing found in the insn > (insn 26 3 28 2 .c:26 (parallel [ > (insn/f 25 0 0 (set (reg/f:SI 51 SP) > (minus:SI (reg/f:SI 51 SP) > (const_int 4 [0x4]))) -1 (nil)) > (set/f (mem:SI (reg/f:SI 51 SP) [0 S4 A8]) > (reg:SI 8 r8)) > ]) -1 (nil)) > .c:32: error: shared rtx > (insn/f 25 0 0 (set (reg/f:SI 51 SP) > (minus:SI (reg/f:SI 51 SP) > (const_int 4 [0x4]))) -1 (nil)) > .c:32: internal compiler error: internal consistency failure I suspect you don't have the proper guards on the push/pop insns, and the combiner is eliminating the clobber. You probably need to have parallel insns for the push and pop. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Limiting the use of pointer registers
On Sun, May 24, 2009 at 10:23:05PM +1200, Michael Hope wrote: > Hi there. I'm working on a port to an architecture where the pointer > registers X and Y are directly backed by small 128 byte caches. > Changing one of these registers to a different memory row causes a > cache load cycle, so using them for memory access is fine but using > them as general purpose registers is expensive. > > How can I prevent the register allocator from using these for anything > but memory access? I have a register class called ADDR_REGS that > contains just X and Y and one called DATA_REGS which contains the > general registers R10 to R1E. GENERAL_REGS is the same as DATA_REGS. > The order they appear in in reg_class is DATA_REGS, GENERAL_REGS, then > ADDR_REGS. I've defined the constrains for most of the patterns to > only take 'r' which prevents X or Y being used as operands for those > patterns. I have to allow X and Y to be used in movsi and addsi3 to > allow indirect memory addresses to be calculated. > > Unfortunately Pmode is SImode so I can't tell the difference between > pointer and normal values in PREFERRED_RELOAD_CLASS, > LIMIT_RELOAD_CLASS, or TARGET_SECONDARY_RELOAD. I tried setting > REGISTER_MOVE_COST and MEMORY_MOVE_COST to 100 when the source or > destination is ADDR_REGS but this didn't affect the output. > > I suspect that I'll have to do the same as the accumulator and hide X > and Y from the register allocator. Pretend that any general register > can access memory and then use post reload split to turn the patterns > into X based patterns for the later phases to tidy up. One thought is to make Pmode PSImode instead of SImode. You may have to define a bunch of duplicate insns that do arithmetic on PSImode values instead of SImode values, and you may run into some problems about PSImode because it isn't widely used. I am in the middle of writing a paper right now, where I mention that one of the things I've felt was wrong since I've been hacking GCC is that the RTL backend assumes pointers are just integers. > One more question. The backing caches aren't coherent so X and Y > can't read and write to the same 128 bytes of memory at the same time. > Does GCC have any other information about the location of a pointer > that I could use? Something like: > * Pointer is to text memory or read only data, so it is safe to read from > * Pointer 1 is in the stack and pointer 2 is in BSS, so they are > definitely far apart > * Pointer 1 is to to one on stack item and pointer 2 is to a stack > item at least 128 bytes apart > * The call stack is known and pointer 1 and pointer 2 point to different rows Look at the mem_attrs structure in rtl.h. However, it looks like you have a very challenging machine on your hands. > My fallback plan is to add a variable attribute so the programmer can > mark the pointer as non overlapping and push the problem onto them. > Something clever would be nice though :) Another place where the named address spaces stuff I worked on last year might be useful. > Sorry for all the questions - this is quite a difficult architecture. > I hope to collect all the answers and do a write up for others to use > when I'm done. > > -- Michael -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: RFC: Option handling and PR 37565
On Fri, May 29, 2009 at 09:02:23AM -0700, Steve Ellcey wrote: > While looking into PR 37565 I began to wonder if we need to modify how > we handle optimization flag handling. Currently we have > OVERRIDE_OPTIONS, C_COMMON_OVERRIDE_OPTIONS, and OPTIMIZATION_OPTIONS to > set or override the optimization flags a user gives. One proposal to > fix 37565 was to split OVERRIDE_OPTIONS into OVERRIDE_OPTIONS_ALWAYS and > OVERRIDE_OPTIONS_ONCE which would create two new macros. > > But I was wondering if a cleaner method to deal with these options would > be to get rid of all these macros and use target functions to access > flag values. > > My idea is that we would never change the values of the flags that are > set by the user but instead would have a target overridable function > that returns the value for a given flag by default but in which the > return value could be overridden for some flags on some targets. > > So instead of > if (flag_var_tracking) > we would have > if (targetm.get_optimization_flag_value(OPT_fvar_tracking)) As Ian says, we probably don't want to put a function where we check for each of the flags. This will likely be a major slowdown for the compiler. The trouble is as the compiler gets more complex, it is hard to find just where the right points to change things are, and what do you want to do about more global options. In my original thoughts about target specific support, I didn't include optimization because it was too hard, but quite a few potential users of it chimed in of the usefulness of marking a function as a cold function and compile with -Os instead of a hot function compiling with -O3. Unfortunately, I have been away from the code for about a year, that I don't remember all of the details of why particular choices were made. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Machine Description Template?
On Fri, Jun 05, 2009 at 05:11:06PM -0500, Graham Reitz wrote: > > Is there a machine description template in the gcc file source tree? > > If there is also template for the 'C header file of macro definitions' > that would be good to know too. > > I did a file search for '.md' and there are tons of examples. > Although, I was curious if there was a generic template. Many years ago, I wrote a generic machine that was intended to be a template for this, but it quickly became out of date and useless. I'm not aware of a more modern version. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Regressions with dwarf debugging
On Sat, Jun 13, 2009 at 08:14:36PM -0700, Steve Kargl wrote: > Someone has broken gfortran on FreeBSD with dwarf debugging. > This is a regression. Please fix! I just wrote a patch to fix this: http://gcc.gnu.org/ml/gcc-patches/2009-06/msg01097.html -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: GCC and boehm-gc
On Thu, Jun 18, 2009 at 12:10:53PM -0400, NightStrike wrote: > Given the recent issues with libffi being so drastically out of synch > with upstream, I was curious about boehm-gc and how that is handled. > In getting gcj to work on Win64, the next step is boehm-gc now that > libffi works just fine. However, the garbage collector is in terrible > shape and will need a bit of work. Do we send those fixes here to > GCC, or to some other project? Who handles it? How is the synching > done compared to other external projects? I have the following patch that is waiting for approval. I sent mail to the list Tom Tromey mentioned in the followup, but I haven't seen a reply yet. http://gcc.gnu.org/ml/gcc-patches/2009-06/msg01094.html -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Ping: New Toshiba Media Processor (mep-elf) port and maintainer
On Wed, Jun 17, 2009 at 12:52:33AM -0400, DJ Delorie wrote: > > > Pending initial (technical) approval > > Ping? Still waiting for technical approval from a global maintainer. > > http://people.redhat.com/dj/mep/ I thought I had sent you mail approving the changes 2 weeks ago. Perhaps I misremembered, or perhaps it got lost somewhere. All of my comments about the internals of the mep would be things to do if you desired, but as port maintainer, I don't see that we have to apply standards set for the rest of the compiler (if we did, probably half of the ports would be rejected for formatting in some of the files). I don't recall having any problems with the machine independent changes. -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Endianess attribute
On Thu, Jul 02, 2009 at 12:02:29PM +0200, Paul Chavent wrote: > Hi. > > I already have posted about the endianess attribute > (http://gcc.gnu.org/ml/gcc/2008-11/threads.html#00146). > > For some year, i really need this feature on c projects. > > Today i would like to go inside the internals of gcc, and i would like to > implement this feature as an exercise. > > You already prevent me that it would be a hard task (aliasing, etc.), but i > would like to begin with basic specs. Well actually, if we can ever get the named address space patches checked in, it provides the framework for different address spaces, where pointers might be different sizes or encodings from standard pointers. Non-native endian would be handled by a different named address space, and the compiler would not let you convert different endian pointers. I suspect there are still holes, but those will only be fixed when it gets more mainstream testing and use. Tree level aliasing might be one such case. During the recent GCC summit, I gave a talk about the named address space support that I had worked on last year before being transfered to a different group within IBM. Unfortunately all of my focus is getting the powerpc changes in the current release, and I no longer officially work on the named address space stuff. Anyway I had some time during the summit, and I decided to see how hard it would be to add explicit big/little endian support to the powerpc port. It only took a few hours to add the support for __little and __big qualifier keywords, and in fact more time to get the byte swap instructions nailed down (bear in mind, since I've written a lot of the named address space stuff, I knew exactly where to add stuff, so it might take somewhat longer for somebody else to add the support). So for example, with my patches: __little int foo; would declare foo to be little endian (there are restrictions that named address space variables can only be global/static or referenced through a pointer). Then you can declare: int *__little bar = &foo; would declare bar to be a normal pointer, which points to a little endian item. The following would be illegal, since bletch and bar point to different named address spaces, and the backend says you can't convert them. int *bletch = bar; -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
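For contrast, this is the kind of byte-order handling that has to be written by hand today, and that a __little qualifier on the data would make implicit. It is plain C, independent of the named address space patches.

    /* Read a 32-bit little-endian value from a byte buffer, regardless of
       the host byte order.  */
    static inline unsigned int
    read_le32 (const unsigned char *p)
    {
      return (unsigned int) p[0]
             | ((unsigned int) p[1] << 8)
             | ((unsigned int) p[2] << 16)
             | ((unsigned int) p[3] << 24);
    }

    /* Store a 32-bit value back in little-endian order the same way.  */
    static inline void
    write_le32 (unsigned char *p, unsigned int v)
    {
      p[0] = v & 0xff;
      p[1] = (v >> 8) & 0xff;
      p[2] = (v >> 16) & 0xff;
      p[3] = (v >> 24) & 0xff;
    }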
Re: Endianess attribute
On Thu, Jul 02, 2009 at 06:54:52PM -0400, Ken Raeburn wrote: > On Jul 2, 2009, at 16:44, Michael Meissner wrote: >> Anyway I had some time during the summit, and I decided to see how >> hard it >> would be to add explicit big/little endian support to the powerpc >> port. It >> only took a few hours to add the support for __little and __big >> qualifier >> keywords, and in fact more time to get the byte swap instructions >> nailed down > > That sounds great! > >> (there are restrictions that named >> address space variables can only be global/static or referenced >> through a >> pointer). > > That sounds like a potential problem, depending on the use cases. No > structure field members with explicit byte order? That could be > annoying for dealing with network protocols or file formats with > explicit byte ordering. The technical report that named address spaces is based on does not allow you to mix address spaces within a structure (pointers to different address spaces can be mixed, just not the data themselves). For example, stack variables are always in the generic address space. I do tend to think it would be better to have little/big be more offical than a target specific thing. However, most of the places you need to modify for named address spaces are the places you need to modify for mixed endian. > On the other hand, if we're talking about address spaces... I would > guess you could apply it to a structure? That would be good for > memory-mapped devices accepting only one byte order that may not be that > of the main CPU. For that use case, it would be unfortunate to have to > tag every integer field. > > I don't think Paul indicated what his use case was... -- Michael Meissner, IBM 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA meiss...@linux.vnet.ibm.com
Re: Target macros vs. target hooks - policy/goal is hooks, isn't it?
On Wed, May 26, 2010 at 10:16:22AM -0700, Mark Mitchell wrote: > Ulrich Weigand wrote: > > >> So the question is: The goal is to have hooks, not macros, right? If > >> so, can reviewers please take care to reject patches that introduce > >> new macros? > > > > I don't know to which extent this is a formal goal these days, but I > > personally agree that it would be nice to eliminate macros. > > Yes, the (informally agreed) policy is to have hooks, not macros. There > may be situations where that is technically impossible, but I'd expect > those to be very rare. For the target address space stuff, it is to the __ea keyword. You don't want a target hook that is called on every identifier, but instead you want a target hook that is called in c_parse_init (in c-parser.c) and init_reswords (in cp/lex.c) to set up the keywords. The target hook would have to duplicate the functionality of all of the setup that c_parse_init and init_reswords do, particularly if they have different semantics. -- Michael Meissner, IBM Until June 14: 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA After June 14: 5 Technology Place Drive, MS 2757, Westford, MA 01886, USA meiss...@linux.vnet.ibm.com
Re: [patch] Remove TARGET_ADDR_SPACE_KEYWORDS target macro
On Wed, May 26, 2010 at 08:09:57PM +0200, Ulrich Weigand wrote: > Mike Meissner wrote: > > On Wed, May 26, 2010 at 10:16:22AM -0700, Mark Mitchell wrote: > > > Ulrich Weigand wrote: > > > > > > >> So the question is: The goal is to have hooks, not macros, right? If > > > >> so, can reviewers please take care to reject patches that introduce > > > >> new macros? > > > > > > > > I don't know to which extent this is a formal goal these days, but I > > > > personally agree that it would be nice to eliminate macros. > > > > > > Yes, the (informally agreed) policy is to have hooks, not macros. There > > > may be situations where that is technically impossible, but I'd expect > > > those to be very rare. > > > > For the target address space stuff, it is to the __ea keyword. You don't > > want > > a target hook that is called on every identifier, but instead you want a > > target > > hook that is called in c_parse_init (in c-parser.c) and init_reswords (in > > cp/lex.c) to set up the keywords. The target hook would have to duplicate > > the > > functionality of all of the setup that c_parse_init and init_reswords do, > > particularly if they have different semantics. > > It looks like this may be simpler than I thought. The following patch > introduces a "c_register_addr_space" routine that the back-end may call > to register a named address space qualifier keyword. (This could be done > either statically at start-up, or even dynamically e.g. in respose to a > target-specific #pragma.) > > In the current patch, the SPU back-end uses somewhat of a hack to actually > perform that call: it is included in the REGISTER_TARGET_PRAGMAS macro. > Note that this macro is already being used by the SPU and several other > back-end for similar hacks. It seems that it might be a good idea to > replace this by something like a "register_target_extensions" callback > in targetcm, but that can probably be done in a separate patch. Note, many of the things done by REGISTER_TARGET_PRAGMAS deal with the preprocessor and keywords, which are used by the C-like front ends, but not used for Ada, Fortran, etc. Those front ends don't link in libcpp. So, you need something in -c.c that sets the hook, rather than moving it to .c. -- Michael Meissner, IBM Until June 14: 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA After June 14: 5 Technology Place Drive, MS 2757, Westford, MA 01886, USA meiss...@linux.vnet.ibm.com
Time to create wwwdocs/htdocs/gcc-4.6?
As I was about to check in the -mrecip changes for powerpc on GCC 4.6, I figured to get a start on documentation, and I was going to edit the gcc-4.6/changes.html file. I realize this is early in the cycle, but did we want to create the gcc-4.6 directory? -- Michael Meissner, IBM Until June 30: 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA After June 30: 5 Technology Place Drive, MS 2757, Westford, MA 01886, USA meiss...@linux.vnet.ibm.com
Re: Time to create wwwdocs/htdocs/gcc-4.6?
On Thu, Jun 03, 2010 at 03:31:30PM +0200, Gerald Pfeifer wrote: > On Thu, 3 Jun 2010, Richard Guenther wrote: > >> As I was about to check in the -mrecip changes for powerpc on GCC 4.6, > >> I figured to get a start on documentation, and I was going to edit the > >> gcc-4.6/changes.html file. I realize this is early in the cycle, but > >> did we want to create the gcc-4.6 directory? > > Hm? It's already there. (just no index.html) > > Michael, perhaps you need to run `cvs update -d` to see the new directory? > > (I'd _love_ to move wwwdocs under SVN, just have not done something like > this before and don't want to lose all history. Any help welcome.) You and Richard are correct. I've forgotten that I needed -d on cvs to add directories after using svn for a few years. I recall there were tools to convert cvs to svn (such as cvs2svn), but I haven't used them personally. http://svnbook.red-bean.com/en/1.0/apas11.html -- Michael Meissner, IBM Until June 30: 4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA After June 30: 5 Technology Place Drive, MS 2757, Westford, MA 01886, USA meiss...@linux.vnet.ibm.com
Re: CALL_USED_REGISTERS per function basis
On Thu, Aug 05, 2010 at 03:48:55PM +0200, Claudiu Zissulescu wrote: > Hi, > > I want to use a different CALL_USED_REGISTER set per individual > function. The issue here is to inform a caller about the callee > CALL_USED_REGISTERS and save/restore at caller level the possible > altered registers. This should reduce the number of saved/restored > operation in a function. > > Can someone give me some pointers in this direction? In the x86 they use TARGET_EXPAND_TO_RTL_HOOK which points to the ix86_maybe_switch_abi function. That function is: /* MS and SYSV ABI have different set of call used registers. Avoid expensive re-initialization of init_regs each time we switch function context since this is needed only during RTL expansion. */ static void ix86_maybe_switch_abi (void) { if (TARGET_64BIT && call_used_regs[SI_REG] == (cfun->machine->call_abi == MS_ABI)) reinit_regs (); } Going beyond the above, about 2 years ago, I and another programmer wrote the function specific support that allows you to use attributes and pragmas to say a particular function is compiled with non-standard options. My intention was that you could declare one function normally, another with SSE2 support, and a third with SSE4 support, and that at runtime you could switch to use the function that supported the instruction set. This was done via the TARGET_SET_CURRENT_FUNCTION callback, which in the x86 case is ix86_set_current_function. This does the target_reinit and eventually reinit_regs. Now, unfortunately, I've been away from the code for about 2 years, and I don't know whether it has bit-rotted or not. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
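As a usage-level illustration of that function-specific support, the fragment below compiles two copies of the same kernel for different ISAs in one translation unit using the x86 target attribute; the run-time dispatch code that would choose between them is omitted, and the function names are made up for the example.

    __attribute__ ((target ("sse2")))
    double
    dot_sse2 (const double *a, const double *b, int n)
    {
      double sum = 0.0;
      int i;
      for (i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }

    __attribute__ ((target ("sse4.2")))
    double
    dot_sse42 (const double *a, const double *b, int n)
    {
      double sum = 0.0;
      int i;
      for (i = 0; i < n; i++)
        sum += a[i] * b[i];
      return sum;
    }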
Re: CALL_USED_REGISTERS per function basis
On Wed, Aug 18, 2010 at 12:56:47PM -0700, Richard Henderson wrote: > On 08/18/2010 12:06 PM, Michael Meissner wrote: > > Now, unfortunately, I've been away from the code for about 2 years, and I > > don't > > know whether it has bit-rotted or not. > > It hasn't. In fact, the Vectorize _cpp_clean_line thread > contains a patch that uses it. Not quite ready to commit, > but nearly. Cool. I've been hoping to get the decks cleared, and add the power support as well. It would be nice to do the later stages that I had thought about for fat binaries. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: %pc relative addressing of string literals/const data
On Tue, Oct 05, 2010 at 11:56:55AM -0700, Richard Henderson wrote: > On 10/05/2010 06:54 AM, Joakim Tjernlund wrote: > > Ian Lance Taylor wrote on 2010/10/05 15:47:38: > >> Joakim Tjernlund writes: > >>> While doing relocation work on u-boot I often whish for strings/const data > >>> to be accessible through %pc relative address rather than and ABS address > >>> or through GOT. Has this feature ever been considered by gcc? > >> > >> The feature can only be supported on processors which are reasonably > >> able to support it. So, what target are you asking about? > > > > In my case PowerPC but I think most arch's would want this feature. > > Is there arch's where this cannot be support at all or just within > > some limits? I think within limits could work for small apps > > like u-boot. > > PowerPC doesn't really have the relocations for pc-relative offsets > within code -- primarily because it doesn't have pc-relative insns > to match. Nor, unfortunately, does it have got-relative relocations, > like some other targets. These are normally how we get around not > having pc-relative relocations and avoiding tables in memory. C.f. The new PowerPC medium code model for 64-bit now uses GOT-relative relocations for local data: The compiler issues (r2 has the TOC/GOT pointer): addis <reg>,r2,la...@toc@ha addi <reg>,<reg>,la...@toc@l to load the address in <reg>. If the data happens to be within 32K of the TOC pointer, the linker will transform this to: nop addi <reg>,r2,la...@toc@l For non-local loads or the large code model, the compiler issues: addis <reg>,r2,la...@got@ha ld <reg>,la...@got@l(<reg>) If the address happens to be defined in the current program unit (main program or shared library), the linker can transform this to: addis <reg>,r2,la...@toc@ha addi <reg>,<reg>,la...@toc@l or: nop addi <reg>,r2,la...@toc@l or: nop ld <reg>,la...@got@l(r2) -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: %pc relative addressing of string literals/const data
On Thu, Oct 07, 2010 at 04:50:50PM +0200, Joakim Tjernlund wrote: > Why not offer some of this on PowerPC32? mcmodel=small would probably be > enough. Well as they say, contributions are welcome. Note, 32-bit mode doesn't need this when compiling for the main program, since it does addis/addi already (or moves the addi into the offset of the memory reference instructions) to load up addresses (in 32-bit mode, the addis/addi will form any address, but in 64-bit mode, this can't be used, because addresses are loaded above the 32k limit). For pic code, it does need to use the GOT in traditional fashion. Note, the 64-bit ABI requires that r2 have the current function's GOT in it when the function is called, while the 32-bit ABI uses r2 as a small data pointer (and possibly r13 as a second small data pointer). So, in 32-bit mode you have to get the GOT address by doing call to the next instruction to get the address in the LR. So, it isn't as simple as moving the the 64-bit stuff in 32-bit, since there are different assumptions. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: %pc relative addressing of string literals/const data
On Thu, Oct 21, 2010 at 08:17:51PM +0200, Gunther Nikl wrote: > Michael Meissner wrote: > > Note, the 64-bit ABI requires that r2 have the current function's GOT in it > > when the function is called, while the 32-bit ABI uses r2 as a small data > > pointer (and possibly r13 as a second small data pointer). > > If the 32-bit ABI is the SYSV-ABI, then you got the register usage > wrong. r13 is the small data pointer with -msdata=sysv and r2 is a > second small data pointer when using -msdata=eabi. > > Joakim, did you try using -msdata=sysv together with "-G 64000"? Yes, I think I mentally swapped the registers. Sorry. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: define_split
On Thu, Oct 28, 2010 at 09:11:44AM +0200, roy rosen wrote: > Hi all, > > I am trying to use define_split, but it seems to me that I don't > understand how it is used. > It says in the gccint.pdf (which I use as my tutorial (is there > anything better or more up to date?)) that the combiner only uses the > define_split if it doesn't find any define_insn to match. This is not > clear to me: If there is no define_insn to match a pattern in the > combine stage, then how is this pattern there in the first place? The > only way I can think for it to happen is that such a pattern was > created by the combiner itself (but it seems unreasonable to me that > we now want to split what the combiner just combined). Can someone > please explain this to me? Basically combine.c works by trying to build a combined insn, and then it sees if it matches an insn. The canonical example is fused multiply/add (avoiding all of the issues with FMA's for the moment): (define_insn "addsf3" [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")))] "" "fadd %0,%1,%2") (define_insn "mulsf3" [(set (match_operand:SF 0 "f_operand" "=f") (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")))] "" "fmul %0,%1,%2") (define_insn "fmasf3" [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f")))] "" "fma %0,%1,%2,%3") What the documentation is trying to say is that instead of the fmasf3 define_insn, you could have a define_split. I personally would do a define_insn_and_split for the combined insn. Consider a weird machine where you need to do FMA in a special fixed register, but you need to do the multiply and add as separate instructions: (define_split [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f")))] "" [(set (match_dup 4) (mult:SF (match_dup 1) (match_dup 2))) (set (match_dup 4) (plus:SF (match_dup 4) (match_dup 3))) (set (match_dup 0) (match_dup 4))] "operands[4] = gen_rtx_REG (SFmode, ACC_REGISTER);") I would probably write it as: (define_insn_and_split "fmasf3" [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f"))) (clobber (reg:SF ACC_REGISTER))] "" "#" "" [(set (match_dup 4) (mult:SF (match_dup 1) (match_dup 2))) (set (match_dup 4) (plus:SF (match_dup 4) (match_dup 3))) (set (match_dup 0) (match_dup 4))] "operands[4] = gen_rtx_REG (SFmode, ACC_REGISTER);") In the old days, define_split was only processed before each of the two scheduling passes if they were run and at the very end if final finds an insn string "#". Now, it is always run several times. Looking at passes.c, we see it is run: Before the 2nd lower subreg pass (which is before the 1st scheduling pass) After gcse2 and reload Just before the 2nd scheduling pass Before regstack elimination on x86 Before the exception handling/short branch passes I talked about this in part of my tutorial on using modern features in your MD file that I gave at this year's GCC summit: http://gcc.gnu.org/wiki/summit2010 -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: peephole2: dead regs not marked as dead
On Tue, Nov 02, 2010 at 10:41:49AM +0100, Georg Lay wrote: > This solution works: > > Generating a named insn in andsi3-expander as > > (define_insn_and_split "..." > [(set (match_operand:SI 0 "register_operand" "") > (and:SI (match_operand:SI 1 "register_operand" "") > (match_operand:SI 2 "const_int_operand" ""))) >(clobber (match_operand:SI 3 "register_operand" ""))] > "... >&& !reload_completed >&& !reload_in_progress" > { > gcc_unreachable(); > } > "&& 1" > [(set (match_dup 3) > (and:SI (match_dup 1) > (match_dup 4))) >(set (match_dup 0) > (xor:SI (match_dup 3) > (match_dup 1)))] > { > ... > }) > > The insn passes combine unchanged and gets split in split1 as expected :-) > > What I do not understand is *why* this works. > The internals "16.16 How to Split Instructions" mention two flavours of insn > splitting: One after reload for the scheduler and one during combine stage, > the > latter just doing single_set --> 2 * single_set splits for insns that do *not* > match during combine stage. As I just wrote for another reply: In the old days, define_split was only processed before each of the two scheduling passes if they were run and at the very end if final finds an insn string "#". Now, it is always run several times. Looking at passes.c, we see it is run: Before the 2nd lower subreg pass (which is before the 1st scheduling pass) After gcse2 and reload Just before the 2nd scheduling pass Before regstack elimination on x86 Before the execption handling/short branch passes I talked about this in part of my tutorial on using modern features in your MD file that I gave at this year's GCC summit: http://gcc.gnu.org/wiki/summit2010 In particular, go to pages 5-7 of my tutorial (chapter 6) where I talk about scratch registers and allocating a new pseudo in split1 which is now always run. > However, this insn matches. So the combiner does'n split in accordance with > internals doc. But the opportunity to use split1 is neither mentioned nor > described in the internals. > > I observed that the insn gets split in split1 even if the splitter produces > more > than two insns and also if optimization is off. > > That's nice but can I rely on that behaviour? (As far as -O0 is concerned, I > will emit the special insn just if optimization is turned on, but nevertheless > it would be interesting to know why this works smooth with -O0 as I expected > to > run in unrecognizable insn or something like that during reload). In gcc/passes.c the split passes are always run and do not depend on the optimization flags, so yes, you can now rely on it with -O0. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: how much is the effort required to retarget gcc?
On Mon, Nov 01, 2010 at 09:21:14PM +0800, Hui Yan Cheah wrote: > Hi, > > We are working on a new project which requires a retargetting a > compiler to a small cpu on FPGA. > The cpu is hand-coded and it supports only a limited number of instruction > sets. > > My questions are: > > 1. Since I have very limited experience with compilers (this is my > first compiler project), is it wise to begin with gcc? I have > googled-up smaller compilers like pcc, lcc and small-c and they seem > like very good candidates. However, I would like to listen to the > opinions of programmers who have worked with gcc or retargetted gcc. > > 2. How much is the effort in retargetting compilers? I heard it took > months but it all depends on the level of experience of the > programmer. So if an experienced programmer took 2 month, it might > have taken a beginner 6 months or so. It really depends on the complexity of the machine and how much you know about the RTL interface to the compiler. It also depends on whether you also have to do the assembler, linker, library and debugger, or if it is just the compiler work. Back in the day when I was doing new ports, I tended to estimate 3 months for an initial port of a normal machine generating reasonable code that is passing all of the test suites, etc. However, generally within 1 month, I would be generating code, but not all of the features targetted. That assumes somebody else did the assembler/linker/debugger/library. However, note that I have been working on GCC for quite some time, and have done at least 5 ports from scratch, so you probably don't want to use my time estimates :-) The more irregular/limited the machine is, the more it takes to get reasonable code generation. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: UNITS_PER_SIMD_WORD
On Mon, Nov 01, 2010 at 04:52:28PM +0200, roy rosen wrote: > Hi All, > > Is it possible to define UNITS_PER_SIMD_WORD as a global variable and > to set this varibale using a pragma (even once for a compilation) and > that way to be able to compile one file with UNITS_PER_SIMD_WORD = 8 > and another file with UNITS_PER_SIMD_WORD = 16? The general way to do this would be to have an appropriate -m option to control this. Typically it is based on the instructions available to the back end. If you have an -m option, you could presumably define target attributes or pragmas to change this for particular functions. At the moment, the i386 port is the only one to support target attributes/pragmas, though I have just submitted patches for the rs6000 port to add them there, so you can see what types of changes are needed (note, I need to do another patch after the first is approved for builtin functions). -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: define_split
On Tue, Nov 09, 2010 at 09:38:28AM +0200, roy rosen wrote: > I still don't understand the difference between your two examples: > If you write a define_split then whenever during combine it gets into > a pattern which matches the define_split then it splits. > > What is the difference when writing define_insn_and_split? > From what I understood from the docs then if there is such an insn > then the split does not occur so it would simply match it as an insn > without splitting and at the end would print the #? > Can you please elaborate? In the case where you have a define and split, such as: (define_insn_and_split "fmasf3" [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f"))) (clobber (reg:SF ACC_REGISTER))] "" "#" "" [(set (match_dup 4) (mult:SF (match_dup 1) (match_dup 2))) (set (match_dup 4) (plus:SF (match_dup 4) (match_dup 3))) (set (match_dup 0) (match_dup 4))] "operands[4] = gen_rtx_REG (SFmode, ACC_REGISTER);") At the end of the combine pass will be an insn with the plus and mult. This insn will exist for at least the if_after_combine, partition_blocks, regmove, and outof_cfg_layout_mode passes before it is potentially split in the split_all_insns pass, unless you have a "&& reload_completed" in the test for the split (second "" in the example above). The define_insn_and_split is just syntactic sugar. You could have written this as: (define_insn "fmasf3" [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f"))) (clobber (reg:SF ACC_REGISTER))] "" "#") (define_split [(set (match_operand:SF 0 "f_operand" "=f") (plus:SF (mult:SF (match_operand:SF 1 "f_operand" "f") (match_operand:SF 2 "f_operand" "f")) (match_operand:SF 3 "f_operand" "f"))) (clobber (reg:SF ACC_REGISTER))] "" [(set (match_dup 4) (mult:SF (match_dup 1) (match_dup 2))) (set (match_dup 4) (plus:SF (match_dup 4) (match_dup 3))) (set (match_dup 0) (match_dup 4))] "operands[4] = gen_rtx_REG (SFmode, ACC_REGISTER);") Thus, if you had the following insns before combine: (insn ... (set (reg:SF 123) (mult:SF (reg:SF 124) (reg:SF 125 (insn ... (set (reg:SF 126) (plus:SF (reg:SF 123) (reg:SF 127 After combine, the combined insn would be (note register 123 disappears after combine): (insn ... (set (reg:SF 126) (plus:SF (mult:SF (reg:SF 124) (reg:SF 125)) (reg:SF 127 The split pass would then break this back into three insns: (insn ... (set (reg:SF ACC_REGISTER) (mult:SF (reg:SF 124) (reg:SF 125 (insn ... (set (reg:SF ACC_REGISTER) (plus:SF (reg:SF ACC_REGISTER) (reg:SF 127 (insn ... (set (reg:SF 126) (reg:SF ACC_REGISTER))) Now, if you just had the split and no define_insn, combine would try and form the (plus (mult ...)) and not find an insn to match, so while it had the temporary insn created, it would try to immediately split the insn, so at the end of the combine pass, you would have: (insn ... (set (reg:SF ACC_REGISTER) (mult:SF (reg:SF 124) (reg:SF 125 (insn ... (set (reg:SF ACC_REGISTER) (plus:SF (reg:SF ACC_REGISTER) (reg:SF 127 (insn ... (set (reg:SF 126) (reg:SF ACC_REGISTER))) So whether the passes in between combine and the split pass care, is a different question. I didn't recall that combine had this split feature. As I said, my preference is to create the insn, and then split it later. 
Note, you can only allocate new registers (either hard registers or pseudo registers) in the split passes before register allocation. You will get an error if you create a new register after reload. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: define_split
On Tue, Nov 09, 2010 at 01:38:17PM -0500, Joern Rennecke wrote: > Quoting Paolo Bonzini : > > >On 11/09/2010 05:38 PM, Joern Rennecke wrote: > >>A define_insn will be recognized in all contexts. > >>Having an insn pattern for an insn that does not actually exist can cause > >>all kinds of unintended consequences as the optimizers try to generate > >>and recognize 'optimized' patterns, or when reload does its stuff. > > > >Before reload that is not a problem at all, I think, actually it can be > >an advantage. > > No, it is a code quality problem. And yes, I have seen actual > SH patterns being recognized that were not wanted, and lead to worse > code overall. Generally you need to tighten the pattern conditions to make sure it doesn't match. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: Idea - big and little endian data areas using named address spaces
On Wed, Nov 10, 2010 at 01:00:49PM +0100, David Brown wrote: > Would it be possible to use the named address space syntax to > implement reverse-endian data? Conversion between little-endian and > big-endian data structures is something that turns up regularly in > embedded systems, where you might well be using two different > architectures with different endianness. Some compilers offer > direct support for endian swapping, but gcc has no neat solution. > You can use the __builtin_bswap32 (but no __builtin_bswap16?) > function in recent versions of gcc, but you still need to handle the > swapping explicitly. > > Named address spaces would give a very neat syntax for using such > byte-swapped areas. Ideally you'd be able to write something like: > > __swapendian stuct { int a; short b; } data; > > and every access to data.a and data.b would be endian-swapped. You > could also have __bigendian and __litteendian defined to > __swapendian or blank depending on the native ordering of the > target. > > > I've started reading a little about how named address spaces work, > but I don't know enough to see whether this is feasible or not. > > > Another addition in a similar vein would be __nonaligned, for > targets which cannot directly access non-aligned data. The loads > and stores would be done byte-wise for slower but correct > functionality. Yes. In fact when I gave the talk on named address spaces I mentioned this, and during the summit last year, I made a toy demonstration set of patches in the PowerPC to add cross endian support. -- Michael Meissner, IBM 5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA meiss...@linux.vnet.ibm.com
Re: Method to disable code SSE2 generation but still use -msse2
On Mon, Nov 22, 2010 at 02:33:32PM -0800, David Mathog wrote:
> My software implementation of SSE2 now passes all the testsuite
> programs.  In case anybody else ever needs this, it is here:
>
> http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h
>
> I compiled that with a target program and gprof showed
> all the time in the resulting binary in the inlined functions.  It ran about
> 4X slower than the SSE2 hardware version, which is about what I
> expected.  So, so far so good.  What I am worried about now is that
> since it was invoked with "-msse2" the compiler may still be generating
> SSE2 calls within the inlined functions.  Is there a way to definitively
> disable this but still retain -msse2 on the command line?
>
> For instance, here is one of the software version inline functions:
>
> /* vector subtract the two doubles in an __m128d */
> static __inline __m128d __attribute__((__always_inline__))
> _mm_sub_pd (__m128d __A, __m128d __B)
> {
>   return (__m128d)((__v2df)__A - (__v2df)__B);
> }

Use target attributes (or pragmas):

static __inline __m128d
__attribute__((__always_inline__, __target__("no-sse2")))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}

or:

#pragma GCC push_options
#pragma GCC target ("no-sse2")

static __inline __m128d __attribute__((__always_inline__))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}

#pragma GCC pop_options

> In the original gcc emmintrin.h that called a builtin _explicitly_.  I
> also want to avoid having the compiler use the same builtin
> _implicitly_.  If it uses SSE, 3DNOW or MMX implicitly, in this example,
> that would be fine, it just cannot use any SSE2 hardware.
>
> Actually, one thing I was never very clear on, do -msse2 -m3dnow
> etc. only provide access to the corresponding machine operations through
> the _mm* (or whatever) definitions in the header file, or does the
> compiler also figure out vector operations by itself during the
> optimization phase of compilation?

If -msse2 is used on the command line or inside of a target attribute/pragma,
the compiler feels free to use the SSE2 instructions in any fashion, including
when vectorizing.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com
fax +1 (978) 399-6899
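[On the last point, a minimal sketch, not from the original thread, of code
where the compiler may introduce SSE2 instructions on its own once -msse2 and
sufficient optimization are enabled, even though no intrinsic appears; the
function and array names are invented.]

/* Compile with, e.g., "gcc -O3 -msse2 -S implicit.c" and inspect the
   assembly: the loop below is typically vectorized into SSE2 (subpd)
   instructions even though the source never mentions an intrinsic.  */
#define N 1024

double a[N], b[N], c[N];

void
sub_arrays (void)
{
  for (int i = 0; i < N; i++)
    c[i] = a[i] - b[i];
}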
Re: [RS6000] strict alignment for little-endian
On Fri, Jun 07, 2013 at 10:54:39AM +0930, Alan Modra wrote:
> I'd like to remove -mstrict-align for little-endian powerpc, because
> the assumption that mis-aligned accesses are massively slow isn't true
> for current powerpc processors.  However I also don't want to break
> old machines, so probably should add -mstrict-align back for some set
> of cpus.  Can anyone tell me the set?

I don't know the set.  I recall the original PPCs (604, 603, etc.) had the
strict alignment rule in little-endian mode, and the 750 relaxed it for GPRs
(but not FPRs), but I don't know about later machines.

> Index: gcc/config/rs6000/sysv4.h
> ===
> --- gcc/config/rs6000/sysv4.h	(revision 199718)
> +++ gcc/config/rs6000/sysv4.h	(working copy)
> @@ -538,12 +538,7 @@
>  
>  #define CC1_ENDIAN_BIG_SPEC ""
>  
> -#define CC1_ENDIAN_LITTLE_SPEC "\
> -%{!mstrict-align: %{!mno-strict-align: \
> -%{!mcall-i960-old: \
> -	-mstrict-align \
> -} \
> -}}"
> +#define CC1_ENDIAN_LITTLE_SPEC ""
>  
>  #define CC1_ENDIAN_DEFAULT_SPEC "%(cc1_endian_big)"
>
> --
> Alan Modra
> Australia Development Lab, IBM

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
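[For context, a small sketch, not part of the thread, of the kind of access
that -mstrict-align affects; the struct is invented for the example, and the
exact instruction sequences depend on the target and GCC version.]

#include <stdint.h>

/* The packed attribute puts the 32-bit field at an odd byte offset, so a
   plain load of 'value' is a misaligned access.  */
struct __attribute__((packed)) msg {
  uint8_t  tag;
  uint32_t value;   /* at offset 1, not naturally aligned */
};

uint32_t
read_value (const struct msg *m)
{
  /* Without strict alignment the compiler can emit a single (possibly
     misaligned) word load here; with -mstrict-align it must instead
     assemble the value from smaller, aligned accesses.  */
  return m->value;
}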
Re: Why high vsr registers [vsr32 - vsr63] are not used when -mvsx is specified on powerpc?
On Fri, Jul 19, 2013 at 04:22:48PM -0700, Carrot Wei wrote:
> On Wed, Jul 17, 2013 at 6:01 PM, David Edelsohn wrote:
> > On Wed, Jul 17, 2013 at 7:27 PM, Carrot Wei wrote:
> >> Hi
> >>
> >> When I tried to build 444.namd with options "-O2 -m64 -mvsx
> >> -mcpu=power7", I can see vsx instructions are actually used, there are
> >> many xs- started instructions, but none of them use high registers
> >> [vsr32 - vsr63], does anybody know the reason?
> >>
> >> One example is function calc_pair_energy_fullelect in file
> >> ComputeNonbondedUtil.o, there are many vsr register spills but high
> >> vsr registers are never used.
> >
> > For scalar floating point, not vector floating point, GCC currently
> > uses only the lower VSRs because the upper registers only allow
> > indexed addressing modes (register + register) and not displacement
> > forms.  It's one register class with different valid addressing forms
> > depending on the register number, which is difficult for GCC.  We did
> > not want to disable the displacement address form in the initial support.
>
> In insn patterns the register class is usually not directly used; instead
> different predicates and constraints are used.  So can we use different
> predicates and constraints in memory access instructions and floating
> point arithmetic instructions?

Unfortunately, move patterns are special.  At the point of register
allocation, you need to have only one pattern that does a move of a given
type.  Predicates can only work on a single operand, so a predicate operating
on a memory operand has no idea what register is being loaded or stored.
Constraints will allow you to select between the different alternatives, and
those are used.  Then, in the secondary reload stage, you are given the move
and have various ways to 'fix' it.

Adding those fixes to the secondary reload hooks is a way to deal with
machines that have different address formats for different registers.
Unfortunately, the first machines GCC originally targeted (68k, VAX) had
general addressing formats that worked everywhere, and on later machines you
knew for a given type what kind of addressing could be used.  On the PowerPC
you don't want register+offset if you are targeting VSX registers, while you
don't want register+register if you are loading the value into a GPR and it
is larger than a single register.

--
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: making sizeof(void*) different from sizeof(void(*)())
On Sat, Apr 28, 2012 at 02:32:18PM -0500, Peter Bigot wrote:
> The MSP430's split address space and ISA make it expensive to place
> data above the 64 kB boundary, but cheap to place code there.  So I'm
> looking for a way to use HImode for data pointers, but PSImode for
> function pointers.  If gcc supports this, it's not obvious how.
>
> I get partway there with FUNCTION_MODE and some hacks for the case
> where the called object is a symbol, but not when it's a
> pointer-to-function data object.  As an example, bootstrapping fails
> in libgcc/unwind-dw2-fde.c because the fde_compare pointer-to-function
> object is stored in a HImode register instead of a PSImode register.
>
> The only candidate solution I've seen (and haven't yet tried) is to
> somehow assign all functions to be in a special named address space
> and use TARGET_ADDR_SPACE_POINTER_MODE to override the normal use of
> ptr_mode in build_pointer_type.
>
> I haven't been able to identify an existing back end that has such a
> case, though I'd swear I've seen memory models like this for some
> other processor in the past.

In the original 1989 ANSI C standard (which became the 1990 ISO C standard),
function pointers could be a different size than void * pointers.  In that
standard, void * and char * had to have the same underlying size and
representation.  I don't recall if C99 changed this in any way.

The D10V port sort of cheated in that it had 16-bit pointers that were byte
pointers for data, and 16-bit pointers that were word pointers for functions,
and that caused a number of issues back when I did the D10V many years ago.
However, since the D10V port was removed quite some time ago, it is no longer
relevant.

Another way to go is what we do on 64-bit PowerPC -- function pointers are
actually pointers to a 3-word descriptor that contains the real function
address, the value to load into the GOT pointer, and the value to load into
the register holding the static chain.

--
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com
fax +1 (978) 399-6899
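[As a quick illustration, not from the original mail: the following standard C
prints the pointer sizes in question.  On most hosts they come out equal, but
on targets like the MSP430 configuration discussed above, a function pointer
can legitimately be a different size from a data pointer.]

#include <stdio.h>

/* The C standard only requires that void * and char * share a size and
   representation; it does not require function pointers to match them,
   nor does it guarantee that a function pointer converts to void *.  */
int
main (void)
{
  printf ("sizeof (void *)         = %zu\n", sizeof (void *));
  printf ("sizeof (char *)         = %zu\n", sizeof (char *));
  printf ("sizeof (void (*)(void)) = %zu\n", sizeof (void (*)(void)));
  return 0;
}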