Re: Remove sel-sched?
On 14.01.2016 20:26, Jeff Law wrote:
> On 01/14/2016 12:07 AM, Andrey Belevantsev wrote:
>> Hello Bernd,
>>
>> On 13.01.2016 21:25, Bernd Schmidt wrote:
>>> There are a few open PRs involving sel-sched, and I'd like to start a
>>> discussion about removing it. Having two separate schedulers isn't a
>>> very good idea in the first place IMO, and since ia64 is dead, sel-sched
>>> gets practically no testing despite being the more complex one. Thoughts?
>>
>> Out of the PRs we have, two are actually fixed but not marked as such.
>> This year's PRs are from Zdenek's recent Debian rebuild with GCC 6, and I
>> will be on them now. For the other two PRs from last year, it is my fault
>> for not fixing them in a timely manner. Frankly, 2015 was very tough for
>> me and my colleagues (we worked 6 days a week for most of the year), but
>> since January things are fine again and we'll catch up now. Sorry for that.
>>
>> You're also right that sel-sched now gets limited testing. We made it
>> work initially for ia64, x64, ppc and cell, and then added ARM, too.
>> Outside of the ia64 world, I had private reports of sel-sched being used
>> for cell with success, and we used it in our own contract work for
>> optimizing some ARM apps with GCC.
>>
>> In short, we're willing to maintain sel-sched, and I apologize for the
>> slow PR fixing last year; it should be no problem anymore as of now. If
>> there are any big plans for reorganizing the schedulers and sel-sched
>> stands in the way of those, let's discuss it and we'll be willing to help
>> in any way.
>
> FWIW, I've downgraded the sel-sched stuff to P4 for this release given how
> that scheduler is typically used (ia64, which is a dead platform).
>
> I think the bigger question Bernd is asking here is whether or not it
> makes sense to have multiple schedulers. In an ideal world we'd bake them
> off, select the best, and deprecate/remove the others.
> I didn't follow sel-sched development closely, so forgive me if the
> questions are simplistic/naive, but what are the main benefits of
> sel-sched and is it at a point (performance-wise) where it could
> conceivably replace the aging haifa scheduler infrastructure?

The main sel-sched points at the time of its inclusion were as follows: bookkeeping code support (moving an insn between any blocks in the scheduling region), insn transformation support (renaming, unification, substitution through register copies), scheduling at several points at once, and pipelining support. Together it paid off with something like 7-8% on SPEC at the time on ia64, but not so on the other archs, where we didn't spend much time on tuning and usually got both ups and downs compared to haifa. On ia64 the speedup was mostly because of pipelining with speculation, as far as I recall; for others, including ARM, renaming and substitution were useful.

Since then, Vlad and Bernd have put more improvements into the haifa scheduler, including sched pressure, predication and backtracking, so both schedulers now have features not present in the other one, and the initial feature advantage has somewhat worn off.

Also, the big problem of sel-sched is speed -- it is slow because the dependency lists are not maintained through the scheduler; most of the transformation stuff is implemented by moving an insn up the region and looking at what should happen to allow insn A to move up through insn B. I've done most of what I could imagine to speed it up, but haven't managed to make sel-sched the default at -O2.

So to sum this up, I don't think sel-sched can replace haifa in its current state. These days, to speed up the scheduler I'd add something like path-based dependency tracking with bit vectors, as is done in Intel's wavefront scheduling, though that is patented (Vlad may correct me here). Or we need to devise other means of keeping dependencies up to date. We've tried that but never got it working well enough.
The thing I would not like to lose is sel-sched pipelining. It can work on any loops, not only countable ones like modulo scheduling, and this can make a difference for some apps even outside of ia64. But if one basic scheduler is desired, maybe a better use of our resources would be to improve modulo scheduling instead, so as not to lose pipelining capabilities in gcc. It is completely unmaintained now; my colleague Roman Zhuykov had a couple of improvements ~4 years ago, but most of them never got into trunk due to lack of review. He can step up as a modulo-sched maintainer if needed, and the code is alive (see PR69252).

Sorry for a long mail :)

Andrey
Re: Remove sel-sched?
On Fri, Jan 15, 2016 at 10:48 AM, Andrey Belevantsev wrote:
> On 14.01.2016 20:26, Jeff Law wrote:
>> On 01/14/2016 12:07 AM, Andrey Belevantsev wrote:
>>> Hello Bernd,
>>>
>>> On 13.01.2016 21:25, Bernd Schmidt wrote:
>>>> There are a few open PRs involving sel-sched, and I'd like to start a
>>>> discussion about removing it. [...]
>>
>> [...]
>>
>> I think the bigger question Bernd is asking here is whether or not it
>> makes sense to have multiple schedulers. In an ideal world we'd bake them
>> off, select the best, and deprecate/remove the others.
>
> [... summary of sel-sched's features, performance history, and
> compile-time problems snipped; see Andrey's mail above ...]
>
> The thing I would not like to lose is sel-sched pipelining. It can work on
> any loops, not only countable ones like modulo scheduling, and this can
> make a difference for some apps even outside of ia64. But if one basic
> scheduler is desired, maybe a better use of our resources would be to
> improve modulo scheduling instead, so as not to lose pipelining
> capabilities in gcc. It is completely unmaintained now; my colleague Roman
> Zhuykov had a couple of improvements ~4 years ago, but most of them never
> got into trunk due to lack of review. He can step up as a modulo-sched
> maintainer if needed, and the code is alive (see PR69252).

Btw, I'd like people to start thinking about whether the scheduling algorithms working on loops (and sometimes requiring unrolling of loops) can be implemented in a way that applies that unrolling on the GIMPLE level (not the scheduling itself, of course). Thus have an analysis phase (for the unrolling compute) that can be shared across ILs.
Re: Remove sel-sched?
On 01/15/2016 11:13 AM, Richard Biener wrote:
> Btw, I'd like people to start thinking about whether the scheduling
> algorithms working on loops (and sometimes requiring unrolling of loops)
> can be implemented in a way that applies that unrolling on the GIMPLE
> level (not the scheduling itself, of course). Thus have an analysis phase
> (for the unrolling compute) that can be shared across ILs. Scheduling of
> loads / stores might still happen on GIMPLE if we have a good enough idea
> of register pressure - I remember we jumped through hoops in the past to
> get better dependence info on RTL for this (ddr export to RTL, never
> merged). Basically unrolling on RTL should go away.

I don't think that's such a great idea. For C6X modulo scheduling I actually wanted unrolling during the final sched phase. I don't expect to be working on anything like this in the foreseeable future, but IMO at the gimple stage we simply don't know enough to make reasonable decisions for certain targets.

Bernd
Re: Reorder/combine insns on superscalar arch
On 01/15/2016 07:05 AM, Jeff Law wrote:
> Well, you have to write the pattern and a splitter. But these days
> there's define_insn_and_split to help with that. Reusing Bernd's work may
> ultimately be easier though.

Maybe, but maybe also not in the way you think. I've always wanted the ability to combine 2->2 insns, for cases like this. The parallel would be split into two separate insns if it doesn't match. This would allow more complicated forms to be used if they are equally cheap, and can lead to elimination of instructions if it triggers more combinations.

I had a patch for this... 20 years ago. It was in pgcc for a while but apparently it had some bookkeeping problems. It would be nice to solve this at some point.

Bernd
Re: Remove sel-sched?
On Fri, 2016-01-15 at 11:13 +0100, Richard Biener wrote:
> Btw, I'd like people to start thinking if the scheduling algorithms
> working on loops (and sometimes requiring unrolling of loops) can be
> implemented in a way to apply that unrolling on the GIMPLE level
> (not the scheduling itself of course).

We've been underwhelmed with the RTL unroller on POWER, and I think we concur that a GIMPLE-level unroller would be interesting.

Peter
Source Code for Profile Guided Code Positioning
Hello GCC Developers,

Are the 'Profile Guided Code Positioning' algorithms from the Pettis and Hansen paper (http://dl.acm.org/citation.cfm?id=93550) implemented in gcc? If yes, kindly help me with the code file location in the gcc source tree.

Sincerely,
Vivek Pandya
Re: [RFC][AArch64] function prologue analyzer in linux kernel
On Wed, Jan 13, 2016 at 05:13:29PM +0900, AKASHI Takahiro wrote:
> On 01/13/2016 03:04 AM, Will Deacon wrote:
> > On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote:
> > > On 01/09/2016 12:53 AM, Will Deacon wrote:
> > > > I still don't understand why you can't use -fstack-usage. Can you
> > > > please tell me why that doesn't work? Am I missing something?
> > >
> > > I don't know how gcc calculates the usage here, but I guess it would
> > > be more robust than my analyzer.
> > >
> > > The issues that come to mind are:
> > > - -fstack-usage generates a separate output file, *.su, and so we have
> > >   to manage them to be incorporated in the kernel binary.
> >
> > That doesn't sound too bad to me. How much data are we talking about
> > here?
> >
> > >   This implies that (common) kernel makefiles might have to be changed
> > >   a bit.
> > > - worse, what about the kernel module case? We will have no way to let
> > >   the kernel know the stack usage without adding an extra step at
> > >   loading.
> >
> > We can easily add a new __init section to modules, which is a table
> > representing the module functions and their stack sizes (like we do
> > for other things like alternatives). We'd just then need to slurp this
> > information at load time and throw it into an rbtree or something.
>
> I found another issue. Let's think about the 'dynamic storage' case like:
>
> $ cat stack.c
> extern long fooX(long a);
> extern long fooY(long b[]);
>
> long foo1(long a) {
>     if (a > 1) {
>         long b[a];    /* <== Here */
>         return a + fooY(b);
>     } else {
>         return a + fooX(a);
>     }
> }
>
> Then, -fstack-usage returns 48 for foo1():
>
> $ aarch64-linux-gnu-gcc -fno-omit-frame-pointer -fstack-usage main.c stack.c \
>       -pg -O2 -fasynchronous-unwind-tables
> $ cat stack.su
> stack.c:4:6:foo1 48 dynamic
>
> This indicates that foo1() may use 48 bytes or more depending on a
> condition.
> But in my case (ftrace-based stack tracer), I always expect 32 whether
> we're backtracing from fooY() or from fooX() because my stack tracer
> estimates:
>     (stack pointer) = (callee's frame pointer) + (callee's stack usage)
> (in my previous e-mail, '-(minus)' was wrong.)
>
> where (callee's stack usage) is, as I described in my previous e-mail,
> the size of memory which is initially allocated on the stack in a
> function prologue, and should not contain the size of the dynamically
> allocated area.

According to whom? What's the use in reporting only the prologue size?

Will
Re: Source Code for Profile Guided Code Positioning
On 01/15/2016 06:53 PM, vivek pandya wrote:
> Hello GCC Developers,
>
> Are the 'Profile Guided Code Positioning' algorithms from the Pettis and
> Hansen paper (http://dl.acm.org/citation.cfm?id=93550) implemented in gcc?
> If yes, kindly help me with the code file location in the gcc source tree.

There's some stuff on the Google branch: https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html

-Y
Re: Source Code for Profile Guided Code Positioning
Thanks Yury for the link (https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html). It implements procedure reordering as a linker plugin. I have some questions:

1) Can you point me to some documentation on how to write a plugin for linkers? I have not seen docs for the structs with the 'ld_' prefix (i.e. those defined in plugin-api.h).

2) There is one more algorithm in the PH paper, for basic block ordering using execution frequency counts. Is there any implementation available for it?

Sincerely,
Vivek
Re: Source Code for Profile Guided Code Positioning
On 01/15/2016 08:44 PM, vivek pandya wrote:
> 1) Can you point me to some documentation on how to write a plugin for
> linkers? I have not seen docs for the structs with the 'ld_' prefix (i.e.
> those defined in plugin-api.h).
>
> 2) There is one more algorithm in the PH paper, for basic block ordering
> using execution frequency counts. Is there any implementation available
> for it?

Quite frankly - I don't know (I've only learned about the Google implementation recently). I've added Sriram, who may be able to comment.

-Y
Re: distro test rebuild using GCC 6
On Thu, Jan 14, 2016 at 05:15:29PM +, James Greenhalgh wrote:
> On Wed, Jan 13, 2016 at 02:17:16PM +0100, Matthias Klose wrote:
>> Here are some first results from a distro test rebuild using GCC 6. A
>> snapshot of the current Ubuntu development series was taken on 20151218
>> for all architectures (amd64, arm64, armhf, i386/i686, powerpc, ppc64el,
>> s390x), and rebuilt unmodified using the current GCC 5 branch, and using
>> GCC 6 20160101 (then updated to 20160109).
>>
>> I haven't yet looked into the build failures except for the ICEs. If
>> somebody wants to help please let me know so that work isn't duplicated.
>
> I've flicked through the 42 unique arm64 failures and given them a
> first-step triage. The majority of issues look to be source based and
> more than a few can be blamed on the move to C++14. Two of these I don't
> understand (qutecom_2.2.1+dfsg1-5.2ubuntu2, sitplus_1.0.3-4.1build1). The
> VLC one is strange, and I don't know how it has ever managed to build!

Hi,

Today I've looked at the 30 unique armhf failures and given them the same treatment as arm64. Some of the testsuite failures I can't find details or reports of online, so they may be indicative of wrong-code bugs. Other than that, there is a larger number of ICEs for the ARM port, but these are logged and actively being worked on. A number of packages have not made a clean transition to C++14, and there are some build failures I just don't understand how they could ever have worked.

As before, I generated the list of failures with:

  grep xenial-armhf 00list | cut -f 1 -d " " | sed "s@^@http://people.canonical.com/~doko/tmp/gcc6-regr/@" | xargs wget

Triage notes follow... Hope this helps. Please do let me know if there is something more useful I could do instead.
Thanks,
James

---

Eliminating first-order junk with "grep -v GPG", to remove error lines that looked like:

  W: GPG error: http://ppa.launchpad.net xenial InRelease: The following
  signatures couldn't be verified because the public key is not available:
  NO_PUBKEY 1E9377A2BA9EF27F

has no effect. Looking for testsuite failures and environment issues, I see:

guile-1.8_1.8.8+1-10ubuntu1

  [ Testsuite failures... ]

  ERROR: Value out of range -9223372036854775808 to 9223372036854775807: -9223372036854775808
  FAIL: test-num2integral
  fail: scm_is_signed_integer ((- (expt 2 63)), -9223372036854775808, 9223372036854775807) == 1
  FAIL: test-conversion
  == 2 of 16 tests failed

haskell-cipher-aes_0.2.11-1

  [ Testsuite failures, look bad... ]

  AE1: [Failed] expected: AuthTag "(Pd/\168\139\171!*g\SUB\151Hi\165l"
       but got: AuthTag "u:\252\SYN\141\165\&0\186S\191\GS\151\SYN\198E{"
  AD1: [Failed] expected: AuthTag "(Pd/\168\139\171!*g\SUB\151Hi\165l"
       but got: AuthTag "r\205\252\225Sz0e\EM\203\GS\227\228lE\209"
  { ... etc ... }

           Properties  Test Cases  Total
  Passed   34          146         180
  Failed   0           26          26
  Total    34          172         206

  Test suite test-cipher-aes: FAIL
  Test suite logged to: dist-ghc/test/cipher-aes-0.2.11-test-cipher-aes.log
  0 of 1 test suites (0 of 1 test cases) passed.

haskell-cryptohash_0.11.6-4build1

  [ Testsuite failures, look bad... ]

  SHA1 0 one-pass: FAIL
    expected: "da39a3ee5e6b4b0d3255bfef95601890afd80709"
    but got: "5a600060e4e200e24faa00aa71e400e456c400c4"
  0 inc 1: FAIL
    expected: "da39a3ee5e6b4b0d3255bfef95601890afd80709"
    but got: "5a600060e4e200e24faa00aa71e400e456c400c4"
  0 inc 2: FAIL
    expected: "da39a3ee5e6b4b0d3255bfef95601890afd80709"
    but got: "5a600060e4e200e24faa00aa71e400e456c400c4"
  { ... etc ... }

  176 out of 592 tests failed
  Test suite test-kat: FAIL
  0 of 1 test suites (0 of 1 test cases) passed.
kdelibs4support_5.15.0-0ubuntu2

  [ Symbol mismatch ]

  dpkg-gensymbols: warning: debian/libkf5kdelibs4support5/DEBIAN/symbols doesn't match completely debian/libkf5kdelibs4support5.symbols
  +#MISSING: 5.15.0-0ubuntu2# (arch=armhf ppc64el)_ZN3KDE4statERK7QStringP4stat@Base 5.13.0

kjsembed_5.15.0-0ubuntu2

  [ Symbol mismatch ]

  dpkg-gensymbols: warning: debian/libkf5jsembed5/DEBIAN/symbols doesn't match completely debian/libkf5jsembed5.symbols
  +#MISSING: 5.15.0-0ubuntu2# _ZN3KJS7JSValueD0Ev@Base 4.96.0
  +#MISSING: 5.15.0-0ubuntu2# _ZN3KJS7JSValueD1Ev@Base 4.96.0
  +#MISSING: 5.15.0-0ubuntu2# _ZN3KJS7JSValueD2Ev@Base 4.96.0

linux-flo_3.4.0-5.19
linux-hammerhead_3.4.0-1.9
linux-mako_3.4.0-7.41

  [ Mystery build failure (looks like: https://lkml.org/lkml/2012/11/18/159 ) ]

  Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at /«PKGBUILDDIR»/kernel/timeconst.pl line 373.
  make[3]: *** [kernel/timeconst.h] Error 255
  make[3]: *** Waiting fo
Re: distro test rebuild using GCC 6
On 14 January 2016 at 17:15, James Greenhalgh wrote: > Hope this helps, if it is useless, let me know what would be a better way > for me to help out with the AArch64 stuff. It's useful for me to get pointers to some of the C++-related failures, thanks. > --- > -Wnarrowing > > This is a mismatch between signed values and arm64's unsigned char. This > hits 21 packages. A typical failure looks like: > > s3m.cpp:29:90: error: narrowing conversion of '-1' from 'int' to 'char' > inside { } [-Wnarrowing] > > {-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,0,1,2,3,4,5,6,7,8,-1,-1,-1,-1,-1,-1,-1}; > >^ > The package builds broken by this warning promoted to an error are: > > adplug_2.2.1+dfsg3-0.2ubuntu2 > alsa-tools_1.0.29-1ubuntu1 > biosig4c++_1.3.0-2.1build1 > calligra_1 > edtsurf_0.2009-3 > garmindev_0.3.4+svn3432-3 > kelbt_0.15-1 > ksirk_4 > lrzip_0.621-1 > mimetic_0.9.8-2.1 > nootka_1.2.0-0ubuntu3 > opencollada_0.1.0~20140703.ddf8f47+dfsg1-2 > psi_0.15-2build1 > psi-plus_0.16.330-1build2 > qlandkartegt_1.8.1+ds-3build1 > qpxtool_0.7.2-4 > ruby-unf-ext_0.0.7.1-1build1 > sidplay-libs_2.1.1-14ubuntu2 > tennix_1.1-3 > ufraw_0.20-3build1 > xmlrpc-c_1.33.14-0.2ubuntu3 I was expecting to see several of these, narrowing conversions are usually the biggest cause of problems when moving from c++03 to c++11 or later. > [ Invalid code (as far as I know my c++11...) using both auto and a type > name won't work. Presumably worked before as a redundant storage > duration specifier but is broken now we are C++14 by default. ] > > > plugins/org.python.pydev/pysrc/pydevd_attach_to_process/linux/attach_linux.c: > In function 'int _PYDEVD_ExecWithGILSetSysStrace(bool, bool)': > > plugins/org.python.pydev/pysrc/pydevd_attach_to_process/linux/attach_linux.c:237:25: > error: expected initializer before 'pydevdTracingMod' > auto PyObjectHolder pydevdTracingMod = PyObjectHolder(isDebug, > pyImportModFunc("pydevd_tracing")); > ^~~~ Wow, people are actually using auto as a storage specifier! 
> qutecom_2.2.1+dfsg1-5.2ubuntu2
>
> [ No idea why this should start failing now. Possibly not C++11 clean? ]
>
> /usr/include/boost/bind/bind.hpp:392:35: error: no match for call to
> '(boost::_mfi::mf2 EnumSipLoginState::SipLoginState>) (UserProfile*&,
> const SipAccount&, const EnumSipLoginState::SipLoginState&)'
>   unwrapper::unwrap(f, 0)(a[base_type::a1_], a[base_type::a2_],
>   a[base_type::a3_]);

There's a const-mismatch there: it's trying to pass const SipAccount& to a function taking (non-const) SipAccount&. It's possible the relevant Boost component is stricter in C++11 mode, but it's a bit surprising.

> sitplus_1.0.3-4.1build1
>
> [ Uses boost::shared_ptr and the std:: namespace, gets into trouble now
>   that C++11 provides a std::shared_ptr ]
>
> In file included from /«PKGBUILDDIR»/src/mod_collage/Kernel/kernel.cpp:36:0:
> /«PKGBUILDDIR»/src/mod_collage/Kernel/delay.h:27:4: error: reference to
> 'shared_ptr' is ambiguous
>   shared_ptr m_img;

I'm surprised there aren't more like this.

> stardict_3.0.1-9.2ubuntu3
>
> [ No idea why this would start to fail now. Possibly not C++11 clean? ]
>
> stardict.cpp: In member function 'bool AppCore::SimpleLookupToFloat(const
> char*, bool)':
> stardict.cpp:656:100: error: no matching function for call to
> 'AppCore::BuildResultData(std::vector&, char*&, CurrentIndex*&, bool,
> size_t&, gchar***&, gchar&, bool&, int)'
>   BuildResultData(scan_dictmask, SearchWord, iIndex, false, iLib,
>   pppWord, WordData, bFound, 2);
>
> vlc_2.2.1-5
>
> [ Includes an atomic header from QtCore that seems to get into a lot of
>   trouble. Many errors, more than are worth reproducing here. ]
Re: distro test rebuild using GCC 6
On 15 January 2016 at 17:52, James Greenhalgh wrote:
> vbrfix_0.24-7
>
> [ Source error, not sure how this ever worked! ]
>
> In file included from vbrfix.h:22:0,
>                  from vbrfix.cpp:17:
> wputil.h: In static member function 'static bool wfile::copyFile(const
> char*, const char*, bool)':
> wputil.h:202:28: error: cannot convert 'std::basic_ostream' to
> 'bool' in return
>   return out << in.rdbuf();

In C++03 iostreams have an implicit conversion to void*, which can then be implicitly converted to bool. In C++11 they have an explicit conversion to bool, so the code needs to be changed to do the conversion explicitly:

  return bool(out << in.rdbuf());

or

  return static_cast<bool>(out << in.rdbuf());

or

  if (out << in.rdbuf()) return true;
  return false;

or

  return (out << in.rdbuf()) ? true : false;

or some similar variation.
Re: Reorder/combine insns on superscalar arch
On 01/15/2016 06:06 AM, Bernd Schmidt wrote:
> On 01/15/2016 07:05 AM, Jeff Law wrote:
>> Well, you have to write the pattern and a splitter. But these days
>> there's define_insn_and_split to help with that. Reusing Bernd's work
>> may ultimately be easier though.
>
> Maybe, but maybe also not in the way you think. I've always wanted the
> ability to combine 2->2 insns, for cases like this. The parallel would be
> split into two separate insns if it doesn't match. This would allow more
> complicated forms to be used if they are equally cheap, and can lead to
> elimination of instructions if it triggers more combinations.

Maybe it's the same thing in the end, but I was thinking more in terms of 2->1 or 3->1, but without deleting some of the input instructions if their outputs are still used.

r~