Re: Live range shrinkage in pre-reload scheduling
On 15/05/14 09:52, Ramana Radhakrishnan wrote:
On Thu, May 15, 2014 at 8:36 AM, Maxim Kuvyrkov wrote:
On May 15, 2014, at 6:46 PM, Ramana Radhakrishnan wrote:

I'm not claiming it's a great heuristic or anything. There's bound to be room for improvement. But it was based on "reality" and real results. Of course, if it turns out not to be a win for ARM or s390x any more then it should be disabled.

The current situation that Kyrill is investigating is a case where we notice the first scheduler pass being a bit too aggressive in creating ILP opportunities with the A15 scheduler, which causes performance differences between not turning on the first scheduler pass and using the defaults.

Charles has a work-in-progress patch that fixes a bug in SCHED_PRESSURE_MODEL that causes the above symptoms. The bug causes the 1st scheduler to unnecessarily increase live ranges of pseudo registers when there are a lot of instructions in the ready list.

Is this something that you've seen show up in general integer code as well? Do you or Charles have an example for us to look at? I'm not sure what "lot of instructions in the ready list" really means here. The specific case Kyrill's been looking into is Dhrystone Proc_8 when tuned for a Cortex-A15 with neon and float-abi=hard, but I am not sure if that has "too many instructions" :) . Kyrill, could you also look into the other cases we have from SPEC2k where we see this as well and come back with any specific testcases that Charles / Richard could also take a look into.

Hi all,

From what I can see the most significant regression from this pre-regalloc scheduling on SPEC2k is in 171.swim. It seems to suffer from similar symptoms to Proc_8 (lots of extra spills on the stack).

Looking forward to the patch :). Let me know if I can help with any testing/validation.

Kyrill

Charles, can you finish your patch in the next several days and post it for review?

I think we'll await this but we'll go look into some of the benchmarks.

Ramana

Thank you,

--
Maxim Kuvyrkov
www.linaro.org
Using particular register class (like floating point registers) as spill register class
I would like to know if there is any way we can use registers from a particular register class just as spill registers (in places where the register allocator would normally spill to stack and nothing more), when it can be useful.

In AArch64, in some cases, compiling with -mgeneral-regs-only produces better performance compared to not using it. The difference here is that when -mgeneral-regs-only is not used, floating point registers are also used in register allocation. Then IRA/LRA has to move them to core registers before performing operations as shown below.

.
        fmov    s1, w8          <--
        mov     w21, 49622
        movk    w21, 0xca62, lsl 16
        add     w21, w16, w21
        add     w21, w21, w2
        eor     w10, w0, w10
        add     w10, w21, w10
        ror     w8, w7, 27
        add     w7, w10, w8
        ror     w7, w7, 27
        fmov    w0, s1          <--
        add     w7, w0, w7
        add     w13, w13, w7
        fmov    w0, s4          <--
        add     w0, w0, w20
        fmov    s4, w0          <--
        ror     w18, w18, 2
        fmov    w0, s2          <--
        add     w0, w0, w18
        fmov    s2, w0          <--
        add     w12, w12, w27
        add     w14, w14, w15
        mov     w15, w24
        fmov    x0, d3          <--
        subs    x0, x0, #1
        fmov    d3, x0          <--
        bne     .L2
        fmov    x0, d0          <--
.

In this case, the costs for allocnos calculated by IRA, based on the cost model supplied by the back-end, look like:

a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960 FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960

Thus, changing the cost of the floating point register class is not going to help. If I increase it further, the register allocator will just spill these live ranges to memory and will ignore floating point registers in this case.

Is there any other back-end in gcc that does anything to improve cases like this, that I can refer to?

Thanks in advance,
Kugan
Re: Using particular register class (like floating point registers) as spill register class
> On May 16, 2014, at 3:23 AM, Kugan wrote:
>
> I would like to know if there is any way we can use registers from a
> particular register class just as spill registers (in places where the
> register allocator would normally spill to stack and nothing more), when
> it can be useful.
>
> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
> better performance compared to not using it. The difference here is that
> when -mgeneral-regs-only is not used, floating point registers are also
> used in register allocation. Then IRA/LRA has to move them to core
> registers before performing operations as shown below.

Can you show the code with fp register disabled? Does it use the stack to spill? Normally this is due to register to register class costs compared to register to memory move cost. Also I think it depends on the processor rather than the target. For thunder, using the fp registers might actually be better than using the stack depending on whether the stack was in L1.

Thanks,
Andrew

>
> .
>         fmov    s1, w8          <--
>         mov     w21, 49622
>         movk    w21, 0xca62, lsl 16
>         add     w21, w16, w21
>         add     w21, w21, w2
>         eor     w10, w0, w10
>         add     w10, w21, w10
>         ror     w8, w7, 27
>         add     w7, w10, w8
>         ror     w7, w7, 27
>         fmov    w0, s1          <--
>         add     w7, w0, w7
>         add     w13, w13, w7
>         fmov    w0, s4          <--
>         add     w0, w0, w20
>         fmov    s4, w0          <--
>         ror     w18, w18, 2
>         fmov    w0, s2          <--
>         add     w0, w0, w18
>         fmov    s2, w0          <--
>         add     w12, w12, w27
>         add     w14, w14, w15
>         mov     w15, w24
>         fmov    x0, d3          <--
>         subs    x0, x0, #1
>         fmov    d3, x0          <--
>         bne     .L2
>         fmov    x0, d0          <--
> .
>
> In this case, costs for allocnos calculated by IRA based on the cost
> model supplied by the back-end is like:
> a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960
> FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960
>
> Thus, changing the cost of floating point register class is not going to
> help. If I increase further, register allocated will just spill these
> live ranges to memory and will ignore floating point register in this case.
>
> Is there any other back-end in gcc that does anything to improve cases
> like this, that I can refer to?
>
> Thanks in advance,
> Kugan
Re: Using particular register class (like floating point registers) as spill register class
On 16/05/14 20:40, pins...@gmail.com wrote:
>
>> On May 16, 2014, at 3:23 AM, Kugan wrote:
>>
>> I would like to know if there is any way we can use registers from a
>> particular register class just as spill registers (in places where the
>> register allocator would normally spill to stack and nothing more), when
>> it can be useful.
>>
>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>> better performance compared to not using it. The difference here is that
>> when -mgeneral-regs-only is not used, floating point registers are also
>> used in register allocation. Then IRA/LRA has to move them to core
>> registers before performing operations as shown below.
>
> Can you show the code with fp register disabled? Does it use the stack to
> spill? Normally this is due to register to register class costs compared to
> register to memory move cost. Also I think it depends on the processor
> rather than the target. For thunder, using the fp registers might actually be
> better than using the stack depending on whether the stack was in L1.

Not all the LDR/STR combinations match to fmov. In the testcase I have,

aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
grep -c "ldr" sha_dgst.s
50
grep -c "str" sha_dgst.s
42
grep -c "fmov" sha_dgst.s
0

aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
grep -c "ldr" sha_dgst.s
42
grep -c "str" sha_dgst.s
31
grep -c "fmov" sha_dgst.s
105

I am not saying that we shouldn’t use floating point registers here. But from the above, it seems like the register allocator is using them more like core registers (even though the cost model gives them a higher cost) and then moving the values to core registers before operations. If that is the case, my question is, how do we make this just a spill register class, so that we replace ldr/str with an equal number of fmov when it is possible?

Thanks,
Kugan
Offload Library
Dear steering committee,

To support the offloading features for Intel's Xeon Phi cards we need to add a foreign library (liboffload) into the gcc repository. README with build instructions is attached.

I am also copy-pasting the header comment from one of the liboffload files. The header shown below will be in all the source files in liboffload. Sources can be downloaded from [1].

In addition to those sources, we are going to add a few headers (released under GPL v2.1 license) and a couple of new sources (license at the bottom of the message).

Does this look OK?

[1] - https://www.openmprtl.org/sites/default/files/liboffload_oss.tgz

--
Thanks, K

/*
Copyright (c) 2014 Intel Corporation. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  * Neither the name of Intel Corporation nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

README for Intel(R) Offload Runtime Library
===========================================

How to Build Documentation
==========================

The main documentation is in Doxygen* format, and this distribution should come with pre-built PDF documentation in doc/Reference.pdf. However, an HTML version can be built by executing:

% doxygen doc/doxygen/config

in this directory.

That will produce HTML documentation in the doc/doxygen/generated directory, which can be accessed by pointing a web browser at the index.html file there.

If you don't have Doxygen installed, you can download it from www.doxygen.org.

How to Build the Intel(R) Offload Runtime Library
=================================================

The Makefile at the top-level will attempt to detect what it needs to build the Intel(R) Offload Runtime Library. To see the default settings, type:

make info

You can change the Makefile's behavior with the following options:

root_dir:        The path to the top-level directory containing the top-level Makefile. By default, this will take on the value of the current working directory.

build_dir:       The path to the build directory. By default, this will take on value [root_dir]/build.

mpss_dir:        The path to the Intel(R) Manycore Platform Software Stack install directory. By default, this will take on the value of operating system's root directory.

compiler_host:   Which compiler to use for the build of the host part. Defaults to "gcc"*. Also supports "icc" and "clang"*. You should provide the full path to the compiler or it should be in the user's path.

compiler_target: Which compiler to use for the build of the target part. Defaults to "gcc"*. Also supports "icc" and "clang"*. You should provide the full path to the compiler or it should be in the user's path.

options_host:    Additional options for the host compiler.

options_target:  Additional options for the target compiler.

To use any of the options above, simply add <option>=<value>. For example, if you want to build with icc instead of gcc, type:

make compiler_host=icc compiler_target=icc

Supported RTL Build Configurations
==================================

Supported Architectures: Intel(R) 64, and Intel(R) Many Integrated Core Architecture

   | icc/icl | gcc | clang |
 --|---------|-----|-------|
Re: Using particular register class (like floating point registers) as spill register class
On 05/16/2014 12:05 PM, Kugan wrote:
>
> On 16/05/14 20:40, pins...@gmail.com wrote:
>>
>>> On May 16, 2014, at 3:23 AM, Kugan wrote:
>>>
>>> I would like to know if there is any way we can use registers from a
>>> particular register class just as spill registers (in places where the
>>> register allocator would normally spill to stack and nothing more), when
>>> it can be useful.
>>>
>>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
>>> better performance compared to not using it. The difference here is that
>>> when -mgeneral-regs-only is not used, floating point registers are also
>>> used in register allocation. Then IRA/LRA has to move them to core
>>> registers before performing operations as shown below.
>>
>> Can you show the code with fp register disabled? Does it use the stack to
>> spill? Normally this is due to register to register class costs compared to
>> register to memory move cost. Also I think it depends on the processor
>> rather than the target. For thunder, using the fp registers might actually be
>> better than using the stack depending on whether the stack was in L1.
>
> Not all the LDR/STR combinations match to fmov. In the testcase I have,
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
> grep -c "ldr" sha_dgst.s
> 50
> grep -c "str" sha_dgst.s
> 42
> grep -c "fmov" sha_dgst.s
> 0
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
> grep -c "ldr" sha_dgst.s
> 42
> grep -c "str" sha_dgst.s
> 31
> grep -c "fmov" sha_dgst.s
> 105
>
> I am not saying that we shouldn’t use floating point registers here. But
> from the above, it seems like the register allocator is using them more
> like core registers (even though the cost model gives them a higher cost)
> and then moving the values to core registers before operations. If that is
> the case, my question is, how do we make this just a spill register class,
> so that we replace ldr/str with an equal number of fmov when it is
> possible?

I'm also seeing stuff like this:

=> 0x7fb72a0928 : add x21, x4, x21, lsl #3
=> 0x7fb72a092c : fmov w2, s8
=> 0x7fb72a0930 : str w2, [x21,#88]

I guess GCC doesn't know how to store an SImode value in an FP register into memory? This is 4.8.1.

Andrew.
soft-fp functions support without using libgcc
Hi all,

I am trying to provide soft-fp support for an 18-bit soft-core processor architecture at my university. The problem is that libgcc has not been cross-compiled for my target architecture and some functions are missing, so I cannot build libgcc. I believe soft-fp is compiled into libgcc, so I am unable to invoke soft-fp functions from libgcc.

Is it possible for me to provide soft-fp support without using libgcc? How should I proceed in defining the functions? Any ideas? And does any architecture provide floating point support without using libgcc?

Regards
Sheheryar
Re: Offload Library
On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin wrote: > > To support the offloading features for Intel's Xeon Phi cards > we need to add a foreign library (liboffload) into the gcc repository. > README with build instructions is attached. Can you explain why this library should be part of GCC, and how GCC would use it? I'm sure it's obvious to you but it's not obvious to me. Ian
Re: soft-fp functions support without using libgcc
On Fri, May 16, 2014 at 6:34 AM, Sheheryar Zahoor Qazi wrote: > > I am trying to provide soft-fp support to a an 18-bit soft-core > processor architecture at my university. But the problem is that > libgcc has not been cross-compiled for my target architecture and some > functions are missing so i cannot build libgcc.I believe soft-fp is > compiled in libgcc so i am usable to invoke soft-fp functions from > libgcc. > It is possible for me to provide soft-fp support without using libgcc. > How should i proceed in defining the functions? Any idea? And does any > archoitecture provide floating point support withoput using libgcc? I'm sorry, I don't understand the premise of your question. It is not necessary to build libgcc before building libgcc. That would not make sense. If you have a working compiler that is missing some functions provided by libgcc, that should be sufficient to build libgcc. Ian
RE: Using particular register class (like floating point registers) as spill register class
> On 05/16/2014 12:05 PM, Kugan wrote:
> >
> > On 16/05/14 20:40, pins...@gmail.com wrote:
> >>
> >>> On May 16, 2014, at 3:23 AM, Kugan wrote:
> >>>
> >>> I would like to know if there is any way we can use registers from a
> >>> particular register class just as spill registers (in places where the
> >>> register allocator would normally spill to stack and nothing more),
> >>> when it can be useful.
> >>>
> >>> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
> >>> better performance compared to not using it. The difference here is
> >>> that when -mgeneral-regs-only is not used, floating point registers are
> >>> also used in register allocation. Then IRA/LRA has to move them to core
> >>> registers before performing operations as shown below.
> >>
> >> Can you show the code with fp register disabled? Does it use the
> >> stack to spill? Normally this is due to register to register class
> >> costs compared to register to memory move cost. Also I think it
> >> depends on the processor rather than the target. For thunder, using the
> >> fp registers might actually be better than using the stack depending on
> >> whether the stack was in L1.
> >
> > Not all the LDR/STR combinations match to fmov. In the testcase I have,
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
> > grep -c "ldr" sha_dgst.s
> > 50
> > grep -c "str" sha_dgst.s
> > 42
> > grep -c "fmov" sha_dgst.s
> > 0
> >
> > aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
> > grep -c "ldr" sha_dgst.s
> > 42
> > grep -c "str" sha_dgst.s
> > 31
> > grep -c "fmov" sha_dgst.s
> > 105
> >
> > I am not saying that we shouldn't use floating point registers here.
> > But from the above, it seems like the register allocator is using them
> > more like core registers (even though the cost model gives them a higher
> > cost) and then moving the values to core registers before operations.
> > If that is the case, my question is, how do we make this just a spill
> > register class, so that we replace ldr/str with an equal number of fmov
> > when it is possible?
>
> I'm also seeing stuff like this:
>
> => 0x7fb72a0928 Thread*)+2500>: add x21, x4, x21, lsl #3
> => 0x7fb72a092c Thread*)+2504>: fmov w2, s8
> => 0x7fb72a0930 Thread*)+2508>: str w2, [x21,#88]
>
> I guess GCC doesn't know how to store an SImode value in an FP register
> into memory? This is 4.8.1.

Please can you try that on trunk and report back.

Thanks,
Ian
RE: soft-fp functions support without using libgcc
> On Fri, May 16, 2014 at 6:34 AM, Sheheryar Zahoor Qazi wrote:
> >
> > I am trying to provide soft-fp support to a an 18-bit soft-core
> > processor architecture at my university. But the problem is that
> > libgcc has not been cross-compiled for my target architecture and some
> > functions are missing so i cannot build libgcc.I believe soft-fp is
> > compiled in libgcc so i am usable to invoke soft-fp functions from
> > libgcc.
> > It is possible for me to provide soft-fp support without using libgcc.
> > How should i proceed in defining the functions? Any idea? And does any
> > archoitecture provide floating point support withoput using libgcc?
>
> I'm sorry, I don't understand the premise of your question. It is not
> necessary to build libgcc before building libgcc. That would not make
> sense. If you have a working compiler that is missing some functions
> provided by libgcc, that should be sufficient to build libgcc.

If you replace "cross-compiled" with "ported", I think it makes sense. Can one provide soft-fp support without porting libgcc for their architecture?

Cheers,
Ian
Re: soft-fp functions support without using libgcc
On May 16, 2014, at 12:25 PM, Ian Bolton wrote: >> On Fri, May 16, 2014 at 6:34 AM, Sheheryar Zahoor Qazi >> wrote: >>> >>> I am trying to provide soft-fp support to a an 18-bit soft-core >>> processor architecture at my university. But the problem is that >>> libgcc has not been cross-compiled for my target architecture and >> some >>> functions are missing so i cannot build libgcc.I believe soft-fp is >>> compiled in libgcc so i am usable to invoke soft-fp functions from >>> libgcc. >>> It is possible for me to provide soft-fp support without using >> libgcc. >>> How should i proceed in defining the functions? Any idea? And does >> any >>> archoitecture provide floating point support withoput using libgcc? >> >> I'm sorry, I don't understand the premise of your question. It is not >> necessary to build libgcc before building libgcc. That would not make >> sense. If you have a working compiler that is missing some functions >> provided by libgcc, that should be sufficient to build libgcc. > > If you replace "cross-compiled" with "ported", I think it makes senses. > Can one provide soft-fp support without porting libgcc for their > architecture? By definition, in soft-fp you have to implement the FP operations in software. That’s not quite the same as porting libgcc to the target architecture. It should translate to porting libgcc (the FP emulation part) to the floating point format being used. In other words, if you want soft-fp for IEEE float, the job should be very simple because that has already been done. If you want soft-fp for CDC 6000 float, you have to do a full implementation of that. paul
Re: we are starting the wide int merge
On Sat, 10 May 2014, Gerald Pfeifer wrote: > Since (at least) 16:40 UTC that day my i386-unknown-freebsd10.0 builds > fail as follows: > > Comparing stages 2 and 3 > warning: gcc/cc1obj-checksum.o differs > warning: gcc/cc1-checksum.o differs > warning: gcc/cc1plus-checksum.o differs > Bootstrap comparison failure! > gcc/fold-const.o differs > gcc/simplify-rtx.o differs > gcc/tree-ssa-ccp.o differs > > (FreeBSD/i386 really builds for i486, but retains the original name; > I'm traveling with limited access, but would not be surprised for this > to also show up for i386-*-linux-gnu or i486-*-linux-gnu.) Is anybody able to reproduce this, for example on a GNU/Linux system? This tester of mine hasn't been able to bootstrap for nearly a week, and timing-wise it would be really a coincidence were this not due to wide-int. Gerald
Re: RFC: Doc update for attribute
On 05/12/2014 11:13 PM, David Wohlferd wrote:
> After updating gcc's docs about inline asm, I'm trying to improve
> some of the related sections. One that I feel has problems with
> clarity is __attribute__ naked.
>
> I have attached my proposed update. Comments/corrections are
> welcome.
>
> In a related question:
>
> To better understand how this attribute is used, I looked at the
> Linux kernel. While the existing docs say "only ... asm statements
> that do not have operands" can safely be used, Linux routinely uses
> asm WITH operands.

That's a bug. Period. You must not use naked with an asm that has operands. Any kind of operand might inadvertently cause the compiler to generate code and that would violate the requirements of the attribute and potentially generate an ICE.

The correct solution, and we've talked about this in the past, is to have the compiler generate a hard error if you use an asm statement with operands and naked. I don't know that anyone ever got around to it.

> Some examples:
>
> memory clobber operand:
> http://lxr.free-electrons.com/source/arch/arm/kernel/kprobes.c#L377

Is this needed?

> Input arguments:
> http://lxr.free-electrons.com/source/arch/arm/mm/copypage-feroceon.c#L17

This is a bug and it's wrong. The naked asm can just assume the use of first and second arguments as per AAPCS.

> Since I don't know why "asm with operands" was excluded from the
> existing docs, I'm not sure whether what Linux does here is supported
> or not (maybe with some limitations?). If someone can clarify, I'll
> add it to this text.

The "asm with operands" was excluded because to allow them in the implementation would require gcc to potentially copy the arguments to temporary storage depending on their type. There is no prologue so the compiler has no stack in which to place the arguments, therefore the result is an impossible to satisfy constraint which usually results in an ICE or compiler error.
Even if you said it was OK to use the incoming arguments with "r" type operands, the optimization level of the compile might inadvertently try to force those values to the stack and that again is an impossible to satisfy condition with a naked function.

> Even without discussing "asm with operands," I believe this text is
> an improvement.
> Thanks in advance,
> dw
>
> extend.texi.patch
>
>
> Index: extend.texi
> ===
> --- extend.texi (revision 210349)
> +++ extend.texi (working copy)
> @@ -3330,16 +3330,15 @@
>
>  @item naked
>  @cindex function without a prologue/epilogue code
> -Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX and SPU
> -ports to indicate that the specified function does not need prologue/epilogue
> -sequences generated by the compiler.
> -It is up to the programmer to provide these sequences. The
> -only statements that can be safely included in naked functions are
> -@code{asm} statements that do not have operands. All other statements,
> -including declarations of local variables, @code{if} statements, and so
> -forth, should be avoided. Naked functions should be used to implement the
> -body of an assembly function, while allowing the compiler to construct
> -the requisite function declaration for the assembler.
> +This attribute is available on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> +and SPU ports. It allows the compiler to construct the requisite function
> +declaration, while allowing the body of the function to be assembly code.
> +The specified function will not have prologue/epilogue sequences generated
> +by the compiler; it is up to the programmer to provide these sequences if
> +the function requires them. The expectation is that only Basic @code{asm}
> +statements will be included in naked functions (@pxref{Basic Asm}). While it
> +is discouraged, it is possible to write your own prologue/epilogue code
> +using asm and use ``C'' code in the middle.
I wouldn't remove the last sentence since IMO it's not the intent of the feature to ever support that and the compiler doesn't guarantee it and may result in wrong code given that `naked' is a fragile low-level feature.

>
>  @item near
>  @cindex functions that do not handle memory bank switching on 68HC11/68HC12

Cheers,
Carlos.
Re: Offload Library
Hi! On Fri, 16 May 2014 15:47:58 +0400, Kirill Yukhin wrote: > To support the offloading features for Intel's Xeon Phi cards > we need to add a foreign library (liboffload) into the gcc repository. As written in the README, this library currently is specific to Intel hardware (understandably, of course), and I assume also in the future is to remain that way (?) -- should it thus get a more specific name in GCC, than the generic liboffload? > Additionally to that sources we going to add few headers [...] > and couple of new sources For interfacing with GCC, presumably. You haven't stated it explicitly, but do I assume right that this work will be going onto the gomp-4_0-branch, integrated with the offloading work developed there, as a plugin for libgomp? Grüße, Thomas
Re: Using particular register class (like floating point registers) as spill register class
On 2014-05-16, 6:23 AM, Kugan wrote:
> I would like to know if there is any way we can use registers from a
> particular register class just as spill registers (in places where the
> register allocator would normally spill to stack and nothing more), when
> it can be useful.
>
> In AArch64, in some cases, compiling with -mgeneral-regs-only produces
> better performance compared to not using it. The difference here is that
> when -mgeneral-regs-only is not used, floating point registers are also
> used in register allocation. Then IRA/LRA has to move them to core
> registers before performing operations as shown below.
>
> .
>         fmov    s1, w8          <--
>         mov     w21, 49622
>         movk    w21, 0xca62, lsl 16
>         add     w21, w16, w21
>         add     w21, w21, w2
>         eor     w10, w0, w10
>         add     w10, w21, w10
>         ror     w8, w7, 27
>         add     w7, w10, w8
>         ror     w7, w7, 27
>         fmov    w0, s1          <--
>         add     w7, w0, w7
>         add     w13, w13, w7
>         fmov    w0, s4          <--
>         add     w0, w0, w20
>         fmov    s4, w0          <--
>         ror     w18, w18, 2
>         fmov    w0, s2          <--
>         add     w0, w0, w18
>         fmov    s2, w0          <--
>         add     w12, w12, w27
>         add     w14, w14, w15
>         mov     w15, w24
>         fmov    x0, d3          <--
>         subs    x0, x0, #1
>         fmov    d3, x0          <--
>         bne     .L2
>         fmov    x0, d0          <--
> .
>
> In this case, costs for allocnos calculated by IRA based on the cost
> model supplied by the back-end is like:
> a0(r667,l0) costs: GENERAL_REGS:0,0 FP_LO_REGS:3960,3960
> FP_REGS:3960,3960 ALL_REGS:3960,3960 MEM:3960,3960
>
> Thus, changing the cost of floating point register class is not going to
> help. If I increase further, register allocated will just spill these
> live ranges to memory and will ignore floating point register in this case.
>
> Is there any other back-end in gcc that does anything to improve cases
> like this, that I can refer to?

There is a target hook spill_class. You can see how it can be defined in i386.c. Instead of memory, the pseudos are stored in vector regs. It is profitable for modern Intel processors which have a fast path between general regs and SSE regs. It also results in smaller code, as movd is shorter than ld/st insns.

So you can increase costs of fp regs and define the hook, then fp regs will be used for pseudos not getting general regs and fmov will be generated instead of ld/st.

I am working on improving spilling general regs into vector ones. So I hope there will be more cases when GCC does it.
Re: [GSoC] a wiki page on the gcc wiki
Thank you! -- Cheers, Roman Gareev
Re: [GSoC] How to get started with the isl code generation
Hi Tobias,

> what is the difference you see between ISL AST generation and code
> generation?

By “ISL AST generation”, I mean ISL AST generation without generation of GIMPLE code.

> What are your plans to separate the ISL AST generation? Do you foresee any
> difficulties/problems?

According to the plan mentioned in my proposal, I wanted to get more familiar with ISL AST generation by generation of ISL AST in a file, which is separate from the GCC sources. This could help to avoid problems with interpretation and verification of results, because I worked with my own input to ISL AST generator instead of the input built by Graphite from GIMPLE code. This could also help to avoid rebuilding of GCC in the process of debugging.

However, I've come to the conclusion that the way you advised me is better, because it helps to save the time of integration of ISL AST generation in GCC.

I've set up a second code generation in parallel that generates ISL AST and can be enabled by a command line flag. Could you please advise me how to verify the results of this generation? Below is the code of this generation.

--
Cheers, Roman Gareev

Attachment: code (binary data)
[GSoC] Status - 20140516
Hi Community, The community bonding period is coming to a close, students can officially start coding on Monday, May 19th. In the past month the student should have applied for FSF copyright assignment and, hopefully, executed on a couple of test tasks to get a feel for GCC development. The GSoC Reunion (an unconference to discuss results of concluded GSoC) will be held in San Jose, CA, on 23-26 October 2014. GCC gets to send 2 delegates on Google's dime (airfare, hotel, food), but more can attend via a registration lottery and covering their own expenses. If you are interested in going to GSoC Reunion, please let me know. Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [GSoC] Status - 20140516
On 17/05/2014 00:27, Maxim Kuvyrkov wrote: Hi Community, The community bonding period is coming to a close, students can officially start coding on Monday, May 19th. In the past month the student should have applied for FSF copyright assignment and, hopefully, executed on a couple of test tasks to get a feel for GCC development. In the last mail, I got the impression that you will keep track of the copyright assignments. Is this the case? Cheers, Tobias
Re: [GSoC] Status - 20140516
On May 17, 2014, at 10:41 AM, Tobias Grosser wrote: > > > On 17/05/2014 00:27, Maxim Kuvyrkov wrote: >> Hi Community, >> >> The community bonding period is coming to a close, students can officially >> start coding on Monday, May 19th. >> >> In the past month the student should have applied for FSF copyright >> assignment and, hopefully, executed on a couple of test tasks to get a feel >> for GCC development. > > In the last mail, I got the impression that you will keep track of the > copyright assignments. Is this the case? Yes. Two of the students already have copyright assignment in place, and I have asked the other 3 about their assignment progress today. Thank you, -- Maxim Kuvyrkov www.linaro.org
Re: [GSoC] Status - 20140516
On 17/05/2014 00:43, Maxim Kuvyrkov wrote: On May 17, 2014, at 10:41 AM, Tobias Grosser wrote: On 17/05/2014 00:27, Maxim Kuvyrkov wrote: Hi Community, The community bonding period is coming to a close, students can officially start coding on Monday, May 19th. In the past month the student should have applied for FSF copyright assignment and, hopefully, executed on a couple of test tasks to get a feel for GCC development. In the last mail, I got the impression that you will keep track of the copyright assignments. Is this the case? Yes. Two of the students already have copyright assignment in place, and I have asked the other 3 about their assignment progress today. Great. Could you let me know when Roman's copyright assignment is in? Thanks, Tobias
Re: RFC: Doc update for attribute
Thank you for your response. This is exactly what I wanted to know. One last question:

> > +While it
> > +is discouraged, it is possible to write your own prologue/epilogue code
> > +using asm and use ``C'' code in the middle.
>
> I wouldn't remove the last sentence since IMO it's not the intent of the
> feature to ever support that and the compiler doesn't guarantee it and
> may result in wrong code given that `naked' is a fragile low-level
> feature.

I'm assuming you meant "would remove."

I wasn't comfortable including that sentence, but I was following the existing docs. Since they said you could "only" use basic asm, following that with a warning to "avoid" locals/if/etc was really confusing without this text.

Also, as ugly as this is, apparently some people really do this (comment 6): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43404#c6

We don't have to doc every crazy thing people try to do with gcc. But since it's out there, maybe we should this time? If only to discourage it.

I'm *slightly* more in favor of keeping it. But if you still feel it should go, it's gone.

Thanks,
dw