Re: GCC support for PowerPC VLE
On 21/03/13 18:03, David Edelsohn wrote: > On Thu, Mar 21, 2013 at 4:58 AM, Will wrote: >> James Lemke codesourcery.com> writes: >> >>> I have completed the binutils submission for VLE. >>> I am working on the gcc submission. The test results are looking good >>> now. Patches will be posted very soon. >> >> Do you have any update on the work on VLE-support? >> >> Thanks for any feedback you can provide! > > The problem is the changes are very invasive and significantly > complicate the common parts of the rs6000 port. A lot of people may > use applications built for PPC VLE on embedded systems using Freescale > parts, but there are few developers who need to build and use the > compiler. Most, if not all, of those developers will receive a > pre-built SDK. > > I am happy to work with Jim to merge some of the VLE patches into GCC > to reduce divergence and simplify maintenance, but merging in all > support is too disruptive to the general powerpc port. I have not > heard a lot of advantage or need for most developers to be able to > build GCC for PPC VLE from the FSF sources, other than a few, vocal > users. Merging in some of the less disruptive pieces and obtaining > patches or an SDK from Freescale does not seem overly burdensome for > the few people who need that support. > > Thanks, David > I am not currently a user of gcc on the PPC, much less a developer - so my opinion here may be well off mark. And I don't expect to be helping with the work here, or paying for it (except perhaps as a customer of CodeSourcery or similar suppliers, and of Freescale). But perhaps my comments will be of interest anyway. I use Freescale PPC devices with VLE, and I use Freescale's CodeWarrior to do so. At the start of the project, I looked at CodeSourcery's PPC-EABI tools (I have used CodeSourcery's gcc tools for other targets) - but without VLE support, I had to pick something else. 
VLE can make a very big difference to performance and code space, and is particularly relevant for the smaller PPC microcontrollers (smaller code size means better use of caches, internal buses, etc., for significantly faster code).

I am sure the "big" users of these sorts of devices have no problem. They can deal with Freescale directly for gcc with VLE support - or more likely, they will pay vast sums to Green Hills for their tools. But for us "small" users, we need to be able to get the tools through more publicly available sources. The ideal for small developers is vendors such as CodeSourcery - you pay a very reasonable fee, and get a pre-compiled, pre-packaged integrated toolchain.

Is it worth the time and money for companies like Freescale and CodeSourcery (Mentor) to pay for VLE integration and better PPC/MPC support in gcc? I have no idea - I don't know the numbers at all. But I do know that one of the reasons that ARM Cortex devices are so popular in embedded systems is the easy availability of good quality tools, mostly based on gcc. If Freescale wants more small developers to buy their MPC parts (and that seems to be an aim), they should be doing what they can here for gcc support.

As for out-of-tree support, I have seen it done with other targets. The msp430 port of gcc has had a lot of trouble because it was developed out-of-tree - this has led to a great deal of extra work for the maintainers (or "maintainer", since a high proportion of the work was done by a single person in recent years). Texas Instruments saw the benefit of having good gcc support for these devices, and has hired Red Hat to get the msp430 support moved into the mainline - and this appears to be more effort than originally envisioned. Long-term, there are apparently lots of benefits to keeping everything in-tree, especially for maintenance.
Finally, and I ask this as someone with no idea about the gcc internals here, is it perhaps worth splitting the Power and the PowerPC architectures? As far as I can see, despite their common ancestry there are significant differences in many of the details, and they are diverging with each new generation of device. They also seem to be aimed at very different uses - Power is used on big systems, while PowerPC is very much for embedded systems. I can't imagine many people have need of a single gcc build that supports both families. Splitting the target would mean changes to the PPC support would not affect the RS6000 support.

(Of course, it may cause more problems - as I say, I don't know about the internals here. I'm just throwing around some ideas - if they are worthless, feel free to throw them away!)

mvh.,

David
Re: GCC support for PowerPC VLE
On Thu, Mar 21, 2013 at 6:03 PM, David Edelsohn wrote: > On Thu, Mar 21, 2013 at 4:58 AM, Will wrote: >> James Lemke codesourcery.com> writes: >> >>> I have completed the binutils submission for VLE. >>> I am working on the gcc submission. The test results are looking good >>> now. Patches will be posted very soon. >> >> Do you have any update on the work on VLE-support? >> >> Thanks for any feedback you can provide! > > The problem is the changes are very invasive and significantly > complicate the common parts of the rs6000 port. A lot of people may > use applications built for PPC VLE on embedded systems using Freescale > parts, but there are few developers who need to build and use the > compiler. Most, if not all, of those developers will receive a > pre-built SDK. > > I am happy to work with Jim to merge some of the VLE patches into GCC > to reduce divergence and simplify maintenance, but merging in all > support is too disruptive to the general powerpc port. I have not > heard a lot of advantage or need for most developers to be able to > build GCC for PPC VLE from the FSF sources, other than a few, vocal > users. Merging in some of the less disruptive pieces and obtaining > patches or an SDK from Freescale does not seem overly burdensome for > the few people who need that support. Maybe it's also possible to refactor some of the powerpc port to make adding VLE support less invasive or disruptive (disclaimer: I have not looked at the patches). Richard. > Thanks, David
Re: GCC support for PowerPC VLE
On Fri, Mar 22, 2013 at 6:28 AM, David Brown wrote: > I use Freescale PPC devices with VLE, and I use Freescale's CodeWarrior > to do so. At the start of the project, I looked at CodeSourcery's > PPC-EABI tools (I have used CodeSourcery's gcc tools for other targets) > - but without VLE support, I had to pick something else. VLE can make a > very big difference to performance and code space, and is particularly > relevant for the smaller PPC microcontrollers (smaller code size means > better use of caches, internal buses, etc., for significantly faster code). I fully appreciate that VLE can provide a lot of benefit and advantages. I want to encourage use of and adoption of PowerPC, including VLE. > Is it worth the time and money for companies like Freescale and > CodeSourcery (Mentor) to pay for VLE integration and better PPC/MPC > support in gcc? I have no idea - I don't know the numbers at all. But > I do know that one of the reasons that ARM Cortex devices are so popular > in embedded systems is the easy availability of good quality tools, > mostly based on gcc. If Freescale wants more small developers to buy > their MPC parts (and that seems to be an aim), they should be doing what > they can here for gcc support. The problem is parts of the VLE patches are very invasive and disrupt the common parts of the PowerPC port, including making it more difficult to maintain the common and non-VLE parts of the PowerPC port. I was very accommodating of the patches to support Freescale e500, SPE, and FP in GPRs, despite their impact on the PowerPC port. That did not lead to greater GCC community engagement or contributions from Freescale or the e500 community -- neither maintaining the support for Freescale processors nor improving any common GCC features to benefit and exploit PowerPC in general or e500/SPE specifically. The organizations who contributed the patches have not maintained them and the burden has fallen to me and the other non-e500 PowerPC developers. 
The only communication we receive is that we broke e500. It is my understanding that Freescale has made no commitment to maintaining the VLE support patches in the GCC repository, neither themselves nor funding another organization. ARM has a very successful ecosystem, but ARM Ltd has an interest in all of the ARM ISAs (ARM, Thumb, Thumb2, AArch64) and has contributed to maintaining GCC support -- either directly or funding organizations to perform the work. > Finally, and I ask this as someone with no idea about the gcc internals > here, is it perhaps worth splitting the Power and the PowerPC > architectures? As far as I can see, despite their common ancestry there > are significant differences in many of the details, and they are > diverging with each new generation of device. They also seem to be > aimed at very different uses - Power is used on big systems, while > PowerPC is very much for embedded systems. I can't imagine many people > have need of a single gcc build that supports both families. Splitting > the target would mean changes to the PPC support would not affect the > RS6000 support. (Of course, it may cause more problems - as I say, I > don't know about the internals here. I'm just throwing around some > ideas - if they are worthless, feel free to throw them away!.) This option has been discussed. Much of the port is common and duplicating the port creates its own maintenance issue where someone needs to merge the changes into the corresponding port. Whether a single port or split into two ports, someone needs to take responsibility for the maintenance. A port without a maintainer will not be accepted, and if the port is not maintained, it will be deprecated and removed. If Freescale and/or developers for its processors want VLE support merged into GCC, they need to make a larger, visible, long-term commitment to contribute to GCC and support for their ISA differences. Thanks, David
Re: GCC support for PowerPC VLE
On 22/03/13 16:18, David Edelsohn wrote: > On Fri, Mar 22, 2013 at 6:28 AM, David Brown wrote: > >> I use Freescale PPC devices with VLE, and I use Freescale's CodeWarrior >> to do so. At the start of the project, I looked at CodeSourcery's >> PPC-EABI tools (I have used CodeSourcery's gcc tools for other targets) >> - but without VLE support, I had to pick something else. VLE can make a >> very big difference to performance and code space, and is particularly >> relevant for the smaller PPC microcontrollers (smaller code size means >> better use of caches, internal buses, etc., for significantly faster code). > > I fully appreciate that VLE can provide a lot of benefit and > advantages. I want to encourage use of and adoption of PowerPC, > including VLE. > >> Is it worth the time and money for companies like Freescale and >> CodeSourcery (Mentor) to pay for VLE integration and better PPC/MPC >> support in gcc? I have no idea - I don't know the numbers at all. But >> I do know that one of the reasons that ARM Cortex devices are so popular >> in embedded systems is the easy availability of good quality tools, >> mostly based on gcc. If Freescale wants more small developers to buy >> their MPC parts (and that seems to be an aim), they should be doing what >> they can here for gcc support. > > The problem is parts of the VLE patches are very invasive and disrupt > the common parts of the PowerPC port, including making it more > difficult to maintain the common and non-VLE parts of the PowerPC > port. > > I was very accommodating of the patches to support Freescale e500, > SPE, and FP in GPRs, despite their impact on the PowerPC port. That > did not lead to greater GCC community engagement or contributions from > Freescale or the e500 community -- neither maintaining the support for > Freescale processors nor improving any common GCC features to benefit > and exploit PowerPC in general or e500/SPE specifically. 
The > organizations who contributed the patches have not maintained them and > the burden has fallen to me and the other non-e500 PowerPC developers. > The only communication we receive is that we broke e500. > > It is my understanding that Freescale has made no commitment to > maintaining the VLE support patches in the GCC repository, neither > themselves nor funding another organization. > > ARM has a very successful ecosystem, but ARM Ltd has an interest in > all of the ARM ISAs (ARM, Thumb, Thumb2, AArch64) and has contributed > to maintaining GCC support -- either directly or funding organizations > to perform the work. > >> Finally, and I ask this as someone with no idea about the gcc internals >> here, is it perhaps worth splitting the Power and the PowerPC >> architectures? As far as I can see, despite their common ancestry there >> are significant differences in many of the details, and they are >> diverging with each new generation of device. They also seem to be >> aimed at very different uses - Power is used on big systems, while >> PowerPC is very much for embedded systems. I can't imagine many people >> have need of a single gcc build that supports both families. Splitting >> the target would mean changes to the PPC support would not affect the >> RS6000 support. (Of course, it may cause more problems - as I say, I >> don't know about the internals here. I'm just throwing around some >> ideas - if they are worthless, feel free to throw them away!.) > > This option has been discussed. Much of the port is common and > duplicating the port creates its own maintenance issue where someone > needs to merge the changes into the corresponding port. Whether a > single port or split into two ports, someone needs to take > responsibility for the maintenance. A port without a maintainer will > not be accepted, and if the port is not maintained, it will be > deprecated and removed. 
> > If Freescale and/or developers for its processors want VLE support > merged into GCC, they need to make a larger, visible, long-term > commitment to contribute to GCC and support for their ISA differences. > > Thanks, David > Thanks for that explanation. I can well understand the problems you have here - it is not uncommon in the open source world for people to contribute code then leave it for others to maintain, and also not uncommon for the commercial beneficiaries (Freescale in this case) to fail to pull their weight. My understanding is that Freescale contributes/has contributed to gcc, and they certainly make use of gcc as they encourage the use of embedded Linux and Android. Could it be a matter of contacting the right people or pulling the right strings to persuade them to help with the code, the time, or the money? Perhaps CodeSourcery, with the weight of Mentor Graphics behind them, can be a useful influence here - and they have plenty to gain from the best possible gcc support of these devices since they sell toolchains for them? But I suppose the CodeSourcery people have a
Switch optimization idea
I am looking at implementing a GCC optimization pass based on constant propagation into a switch statement.

Given:

    if (expr)
        s = 1;
    codeX;    (code that allows the definition of s to propagate through)
    switch (s) {
        case 1: code1; break;
        case 2: code2; break;
        default: code3; break;
    }

I would like to replace this with:

    if (expr) {
        s = 1;
        codeX;
        go directly to label 1 of switch statement
    }
    codeX;
    switch (s) {
        case 1: code1; break;
        case 2: code2; break;
        default: code3; break;
    }

Obviously this optimization would only make sense if 'codeX' were a reasonably small chunk of code.

This idea comes from the blog

http://blogs.arm.com/embedded/895-coremark-and-compiler-performance/

and is obviously geared towards CoreMark, but it seems like it could be a generally useful optimization for any program that uses a switch statement to implement a finite state machine.

Looking at the GCC SSA form, I can easily find a switch statement that uses a variable as its controlling expression, and I can find out if any constant values are propagated into that use of the variable via PHI nodes. The problem is that the next step is to find the path(s) that lead from the block where 's' was set to a constant to the switch statement. Once I have that path, I believe I can use copy_bbs to copy the basic blocks in that path, replace the edge leaving the block where 's' is set with an edge to this new set of blocks, and change the final 'switch' edges in the new blocks to a simple goto edge leading to one of the switch labels.

Right now the only way I can see in GCC to find this path is to do a recursive search through the edges with some cutoff based on the length of the path. It seems I also need to examine each block on this path to make sure it doesn't change the value of 's', since there may be paths that change 's' as well as paths that don't.

Does anyone have any ideas on a more efficient way to find the paths I am looking for, or have any other comments on this optimization?

Steve Ellcey
sell...@imgtec.com
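The bounded recursive search described above can be sketched on a toy graph. This is only an illustration under stated assumptions: ToyCfg, modifies_s, search and find_paths are hypothetical names invented here, and none of this uses GCC's real basic block or edge data structures. The length cutoff is also what keeps the recursion finite when the graph contains loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG (not GCC's representation): block i has the
// successors listed in succs[i]; modifies_s[i] is true if block i assigns s.
struct ToyCfg {
    std::vector<std::vector<int>> succs;
    std::vector<bool> modifies_s;
};

// Depth-first walk that records every path from the current block to
// `target` containing at most max_len blocks. Intermediate blocks that
// modify s are skipped, so every recorded path preserves the constant.
static void search(const ToyCfg& g, int cur, int target, std::size_t max_len,
                   std::vector<int>& path,
                   std::vector<std::vector<int>>& out) {
    path.push_back(cur);
    if (cur == target) {
        out.push_back(path);
    } else if (path.size() < max_len) {
        for (int next : g.succs[cur]) {
            if (next != target && g.modifies_s[next])
                continue;  // s would no longer be the known constant
            search(g, next, target, max_len, path, out);
        }
    }
    path.pop_back();
}

// Paths from the block that sets s to a constant (def_block) to the block
// holding the switch (switch_block), limited to max_len blocks each.
std::vector<std::vector<int>> find_paths(const ToyCfg& g, int def_block,
                                         int switch_block,
                                         std::size_t max_len) {
    assert(max_len > 0);
    std::vector<int> path;
    std::vector<std::vector<int>> out;
    search(g, def_block, switch_block, max_len, path, out);
    return out;
}
```

With blocks 0 -> {1, 2}, 1 -> 3, 2 -> 3, where block 2 assigns s and block 3 holds the switch, a cutoff of 4 yields the single path 0, 1, 3, and a cutoff of 2 yields none.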
Re: Switch optimization idea
How about finding the single-entry/single exit region that dominates the switch and post-dominates the s assignment. You can then examine if s is modified in the region. David On Fri, Mar 22, 2013 at 10:17 AM, Steve Ellcey wrote: > I am looking at implementing a GCC optimization pass based on constant > propagation into a switch statement. > > Given: > > if (expr) > s = 1; > codeX; (code that allows definition of s to propogate through) > switch (s) { > 1: code1; break; > 2: code2; break; > default: code3; break; > } > > > I would like to replace this with: > > > if (expr) > s = 1; > codeX; > go directly to label 1 of switch statement > codeX > switch (s) { > 1: code1; break; > 2: code2; break; > default: code3; break; > } > > Obviously this optimization would only make sense if 'codeX' were > reasonably small chunk of code. > > This idea comes from the blog > > http://blogs.arm.com/embedded/895-coremark-and-compiler-performance/ > > and is obviously geared towards CoreMark but it seems like it could > be a generally useful optimization for any program that uses a switch > statement to implement a finite state machine. > > Looking at the GCC SSA form I can easily find a switch statement that > uses a variable as its controlling expression and I can find out if any > constant values are propagated into that use of the variable via PHI > nodes. The problem is that the next step is to find the path(s) > that lead from the block where 's' was set to a constant to the switch > statement. Once I have that path I believe I can use copy_bbs to copy the > basic blocks in that path and replace the edge leaving the block where 's' > is set with an edge to this new set of blocks and change the final 'switch' > edges in the new blocks to a simple goto edge leading to one of the switch > labels. > > Righ now the only way I can see in GCC to find this path is to do a > recursive search through the edges with some cutoff based on the length > of the path. 
And it seems I also need to examine each block on this path > to make sure it doesn't change the value of 's' since there may be paths > that change 's' as well as paths that don't. > > Does anyone have any ideas on a more efficient way to find the paths I am > looking for or have any other comments on this optimization? > > Steve Ellcey > sell...@imgtec.com >
Re: Switch optimization idea
On Fri, Mar 22, 2013 at 10:17 AM, Steve Ellcey wrote: > I am looking at implementing a GCC optimization pass based on constant > propagation into a switch statement. > > Given: > > if (expr) > s = 1; > codeX; (code that allows definition of s to propogate through) > switch (s) { > 1: code1; break; > 2: code2; break; > default: code3; break; > } > > > I would like to replace this with: > > > if (expr) > s = 1; > codeX; > go directly to label 1 of switch statement > codeX > switch (s) { > 1: code1; break; > 2: code2; break; > default: code3; break; > } > > Obviously this optimization would only make sense if 'codeX' were > reasonably small chunk of code. > > This idea comes from the blog > > http://blogs.arm.com/embedded/895-coremark-and-compiler-performance/ > > and is obviously geared towards CoreMark but it seems like it could > be a generally useful optimization for any program that uses a switch > statement to implement a finite state machine. > > Looking at the GCC SSA form I can easily find a switch statement that > uses a variable as its controlling expression and I can find out if any > constant values are propagated into that use of the variable via PHI > nodes. The problem is that the next step is to find the path(s) > that lead from the block where 's' was set to a constant to the switch > statement. Once I have that path I believe I can use copy_bbs to copy the > basic blocks in that path and replace the edge leaving the block where 's' > is set with an edge to this new set of blocks and change the final 'switch' > edges in the new blocks to a simple goto edge leading to one of the switch > labels. > > Righ now the only way I can see in GCC to find this path is to do a > recursive search through the edges with some cutoff based on the length > of the path. And it seems I also need to examine each block on this path > to make sure it doesn't change the value of 's' since there may be paths > that change 's' as well as paths that don't. 
> > Does anyone have any ideas on a more efficient way to find the paths I am > looking for or have any other comments on this optimization? This sounds exactly like what jump threading does. I don't know why it does not happen in this case though. Thanks, Andrew > > Steve Ellcey > sell...@imgtec.com >
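To make the requested transformation concrete, here is a hand-written sketch of the before and after shapes from the original message. It only illustrates the effect at the source level and makes no claim about how GCC's jump threading pass works internally. The names code_x, original, threaded and check_equivalent are hypothetical, and the arithmetic merely stands in for codeX and code1 through code3.

```cpp
#include <cassert>

// Hypothetical stand-in for codeX: a small computation that never touches s.
static int code_x(int acc) { return acc + 10; }

// Shape from the original message: s may be set to 1, codeX runs, and
// the switch then dispatches on s.
int original(bool expr, int s) {
    if (expr)
        s = 1;
    int acc = code_x(0);
    switch (s) {
    case 1:  acc += 1; break;
    case 2:  acc += 2; break;
    default: acc += 3; break;
    }
    return acc;
}

// Hand-threaded shape: on the path where s is known to be 1, codeX is
// duplicated and control jumps straight to the body of case 1.
int threaded(bool expr, int s) {
    int acc;
    if (expr) {
        s = 1;
        acc = code_x(0);  // duplicated copy of codeX
        goto case_1;      // skip the switch dispatch entirely
    }
    acc = code_x(0);
    switch (s) {
    case 1:
    case_1:
        acc += 1;
        break;
    case 2:
        acc += 2;
        break;
    default:
        acc += 3;
        break;
    }
    return acc;
}

// Both shapes must agree for every input.
void check_equivalent() {
    for (int s = 0; s < 4; ++s) {
        assert(original(false, s) == threaded(false, s));
        assert(original(true, s) == threaded(true, s));
    }
}
```

The cost is the duplicated copy of codeX, which is why the transformation only pays off when that code is small.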
Re: Switch optimization idea
On 03/22/2013 11:17 AM, Steve Ellcey wrote:
> I am looking at implementing a GCC optimization pass based on constant
> propagation into a switch statement.
>
> Given:
>
>     if (expr)
>         s = 1;
>     codeX;    (code that allows the definition of s to propagate through)
>     switch (s) {
>         case 1: code1; break;
>         case 2: code2; break;
>         default: code3; break;
>     }
>
> I would like to replace this with:
>
>     if (expr) {
>         s = 1;
>         codeX;
>         go directly to label 1 of switch statement
>     }
>     codeX;
>     switch (s) {
>         case 1: code1; break;
>         case 2: code2; break;
>         default: code3; break;
>     }
>
> Obviously this optimization would only make sense if 'codeX' were a
> reasonably small chunk of code.

As others have pointed out, this is jump threading.

The reason you're not seeing jump threading in the CoreMark test is that the switch is inside a loop, and threading a backedge is severely constrained.

There's a BZ for this issue with a bit more state.

jeff
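The loop case mentioned above can be pictured with a small state machine. This is a hypothetical sketch and not the CoreMark source: State, count_numbers and demo are names made up for illustration. The point is that each constant assignment to s only reaches the switch by going around the loop, so skipping the dispatch would mean threading through the loop's back edge.

```cpp
#include <cassert>
#include <string>

enum State { START, NUMBER, INVALID };

// Counts the space-separated tokens that begin with a digit.
int count_numbers(const std::string& text) {
    State s = START;
    int numbers = 0;
    for (char c : text) {
        // Every assignment to s below is a constant, but it reaches this
        // switch only through the next loop iteration (the back edge).
        switch (s) {
        case START:
            if (c >= '0' && c <= '9') {
                s = NUMBER;
                ++numbers;
            } else if (c != ' ') {
                s = INVALID;
            }
            break;
        case NUMBER:
            if (c == ' ')
                s = START;
            else if (c < '0' || c > '9')
                s = INVALID;
            break;
        case INVALID:
            if (c == ' ')
                s = START;
            break;
        }
    }
    return numbers;
}

// Small self-check: "12" and "7" start with a digit, "ab" does not.
void demo() {
    assert(count_numbers("12 ab 7") == 2);
}
```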
GCC 4.8.0 Released
Exactly one year after the last major GCC release has been announced, celebrating the 26th anniversary of the GNU Compiler Collection, the GCC development team announces a new major GCC release, 4.8.0. GCC 4.8.0 is a major release containing substantial new functionality not available in GCC 4.7.x or previous GCC releases. GCC 4.8 features a new Local Register Allocator which replaces the 26 years old reload pass and improves generated code quality on ia32 and x86-64 targets. The C++ frontend and standard library have been enhanced with various improvements for C++11 support not limited to C++11 attribute syntax, thread_local or inheriting constructors support. AddressSanitizer and ThreadSanitizer instrumentation have been added to detect heap, stack and global buffer overflows, uses after free and data races. Many scalability bottle-necks have been removed from GCC optimization passes, thus it is now possible to compile extremely large functions with smaller memory consumption in less time. Extending the widest support for hardware architectures in the industry, GCC 4.8 has gained support for the upcoming 64-bit ARM instruction set architecture, AArch64. GCC 4.8 also features support for Hardware Transactional Memory on the upcoming Intel Haswell CPU architecture. The S/390 target now supports the zEC12 architecture. The ARM 32-bit target has gained support for AArch32 ARM v8 ISA additions. See http://gcc.gnu.org/gcc-4.8/changes.html for more information about changes in GCC 4.8. This release is available from the FTP servers listed here: http://www.gnu.org/order/ftp.html The release is in gcc/gcc-4.8.0/ subdirectory. If you encounter difficulties using GCC 4.8, please do not contact me directly. Instead, please visit http://gcc.gnu.org for information about getting help. Driving a leading free software project such as GNU Compiler Collection would not be possible without support from its many contributors. 
Not only its developers deserve mention, but especially its regular testers and users, who contribute to its high quality. The list of individuals is too large to thank individually!
GCC 4.8.1 Status Report (2013-03-22)
Status
======

GCC 4.8.0 has been released, the branch is now open again under the usual release branch rules (regression fixes and documentation fixes only). The next release, 4.8.1, should be released in about two or three months from now, unless something very urgent forces us to release earlier.

Quality Data
============

Priority          #   Change from Last Report
--------        ---   -----------------------
P1                0     0
P2               65     0
P3                9   - 9
--------        ---   -----------------------
Total            74   - 9

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2013-03/msg00124.html

The next report will be sent by Joseph.
Re: Switch optimization idea
On Fri, 2013-03-22 at 13:00 -0600, Jeff Law wrote: > As others have pointed out, this is jump threading. > > The reason you're not seeing jump threading in the CoreMark test is the > switch is inside a loop and threading a backedge is severely constrained. > > There's a BZ for this issue with a bit more state for this issue. > > jeff It took me a minute to map BZ to bugzilla, but I assume you mean there is a bugzilla report with information on this issue. I'll look around to see what I can find. The example I gave was a simple case, and ideally the duplicated code (codeX) could have other edges leading into or out of it without that preventing the jump threading. But it sounds like it does prevent it with the current implementation of jump threading. Maybe it is just the presence of a back edge that is preventing it. I will look more into the GCC implementation, what constraints there are, and why they prevent its use in CoreMark. Steve Ellcey sell...@imgtec.com
Re: Switch optimization idea
On Fri, 2013-03-22 at 13:00 -0600, Jeff Law wrote: > There's a BZ for this issue with a bit more state for this issue. > > jeff Found it. http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54742 Steve Ellcey sell...@imgtec.com
gcc-4.6-20130322 is now available
Snapshot gcc-4.6-20130322 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.6-20130322/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.6 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch revision 196984 You'll find: gcc-4.6-20130322.tar.bz2 Complete GCC MD5=cc2a39cd42fddac497bf8471ff2aa0b8 SHA1=f056d93946470ecf606c72b38a7083a274d713ac Diffs from 4.6-20130319 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.6 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
gcc 4.8 and N3276
Does gcc 4.8 include the changes to decltype specified in N3276 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3276.pdf)? If not, can we expect these for 4.8.1? Thanks, Joe Gottman
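For readers unfamiliar with the paper, N3276 concerns a function call used as the operand of decltype: no temporary is introduced for the call result, so the return type does not have to be complete. A minimal sketch of what that permits follows. Incomplete, make_incomplete and Result are hypothetical names, and whether a given GCC release accepts this is exactly the question asked above.

```cpp
#include <type_traits>

struct Incomplete;             // declared but never defined
Incomplete make_incomplete();  // declared only; never called at run time

// Under the N3276 rules the call below is an unevaluated decltype operand,
// so Incomplete does not have to be a complete type here.
using Result = decltype(make_incomplete());

static_assert(std::is_same<Result, Incomplete>::value,
              "decltype yields the declared return type");
```

On a compiler without the change, the using declaration would be expected to fail with an incomplete-type error.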