Re: core changes for mep port
Steven Bosscher wrote: > All of this feels (to me anyway) like adding a lot of code to the > middle end to support MEP specific arch features. I understand it is > in the mission statement that more ports is a goal for GCC, but I > wonder if this set of changes is worth the maintenance burden... FWIW, it sounds to me like this feature may also be useful for current iterations of the ARM NEON extension (which we're planning to submit support for quite soon). NEON supports various operations on DImode quantities, but we don't use them for normal code at present because moving values from NEON back to ARM core registers is relatively slow, so we want to avoid doing that as far as possible. So, if there was a way of specifying that a particular value should be kept in a NEON register, that'd be a good thing, I think. Cheers, Julian
Re: core changes for mep port
Steven Bosscher wrote: > On 3/28/07, Julian Brown <[EMAIL PROTECTED]> wrote: > > Steven Bosscher wrote: > > > All of this feels (to me anyway) like adding a lot of code to the > > > middle end to support MEP specific arch features. I understand it is > > > in the mission statement that more ports is a goal for GCC, but I > > > wonder if this set of changes is worth the maintenance burden... > > FWIW, it sounds to me like this feature may also be useful for current > > iterations of the ARM NEON extension (which we're planning to submit > > support for quite soon). NEON supports various operations on DImode > > quantities, but we don't use them for normal code at present because > > moving values from NEON back to ARM core registers is relatively slow, > > so we want to avoid doing that as far as possible. So, if there was a > > way of specifying that a particular value should be kept in a NEON > > register, that'd be a good thing, I think. > And if you use this coprocessor hackery, it will be exactly what Ian > opposed in his first reply: "As far as I can see you're using new modes > to drive register class preferences." Quite possibly. I don't really know enough about how any of this works to say much useful, it just seemed like another potential use for the feature (albeit a rather esoteric one) if it does go in. Cheers, Julian
Re: Different sized data and code pointers
On 2005-03-02, Thomas Gill <[EMAIL PROTECTED]> wrote: > Paul Schlie wrote: > >> With the arguable exception of function pointers (which need not be literal >> address) all pointers are presumed to point to data, not code; therefore >> may be simplest to define pointers as being 16-bits, and call functions >> indirectly through a lookup table constructed at link time from program >> memory, assuming it's readable via some mechanism; as the call penalty >> incurred would likely be insignificant relative to the potential complexity >> of attempting to support 24-bit code pointers in the rare circumstances >> they're typically used, on an otherwise native 16-bit machine. > > Thanks for the response. > > Suppose we don't have enough space to burn on a layer of indirection for > every function pointer. Do I take it that there's really not a clean way > to make GCC treat function pointers as 24 bit while still treating data > pointers as 16 bits? FWIW, a port I did used indirection for all function pointers, albeit for a different reason, and I can report that it seems to work OK in practice with a little linker magic. It wasn't really production-quality code though, I admit. Perhaps the indirection table can safely hold only those functions whose address is taken? (Or maybe that was assumed anyway?) Julian -- Julian Brown CodeSourcery, LLC
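A minimal sketch of the indirection-table idea discussed in this thread, in plain C. It assumes a 16-bit handle (`fn_handle`, a made-up name) that indexes a table of real code addresses. On an actual port the table would be constructed by the linker; here `fn_table` is written by hand, and every identifier is hypothetical. Only functions whose address is taken would need an entry.

```c
#include <stddef.h>
#include <stdint.h>

/* A 16-bit handle stands in for a code pointer that is wider than a
   data pointer (hypothetical type name). */
typedef uint16_t fn_handle;
typedef int (*int_fn)(int);

static int twice(int x)  { return 2 * x; }
static int negate(int x) { return -x; }

/* Stand-in for the table a linker would build; only functions whose
   address is taken need a slot here. */
static const int_fn fn_table[] = { twice, negate };

enum { FN_TWICE = 0, FN_NEGATE = 1 };

/* Number of entries in the table. */
size_t fn_table_size(void)
{
    return sizeof fn_table / sizeof fn_table[0];
}

/* Call a function through its 16-bit handle: one extra table load
   per indirect call. */
int call_indirect(fn_handle h, int arg)
{
    return fn_table[h](arg);
}
```

Direct calls are unaffected by this scheme; only calls through a handle pay for the extra lookup.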
Re: GCC 4.0 RC1 Available
On 2005-04-10, Mark Mitchell <[EMAIL PROTECTED]> wrote: > > * The DejaGNU testsuite has been run, and compared with a run of > the testsuite on the previous release of GCC, and no regressions are > observed. > > If you are willing to help, please download the release candidate, build > it on appropriate platforms, and post testresults by using > contrib/test_summary. Please use the release candidate itself, *not* > the CVS 4.0 release branch, as part of the goal is to ensure that the > packaging scripts are working. For arm-none-elf (cross from i686-pc-linux-gnu), with binutils and newlib from CVS: http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg00800.html And, for comparison, 3.4.3 tests: http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg00799.html Quite a few of the 4.0 RC1 tests FAIL, though I'm not sure how many of these are regressions, and how many are just new tests which fail. Julian
Re: GCC 4.0 RC1 Available
On 2005-04-11, Julian Brown <[EMAIL PROTECTED]> wrote: > On 2005-04-10, Mark Mitchell <[EMAIL PROTECTED]> wrote: >> >> * The DejaGNU testsuite has been run, and compared with a run of >> the testsuite on the previous release of GCC, and no regressions are >> observed. >> >> If you are willing to help, please download the release candidate, build >> it on appropriate platforms, and post testresults by using >> contrib/test_summary. Please use the release candidate itself, *not* >> the CVS 4.0 release branch, as part of the goal is to ensure that the >> packaging scripts are working. > > For arm-none-elf (cross from i686-pc-linux-gnu), with binutils and newlib > from CVS: > > http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg00800.html > > And, for comparison, 3.4.3 tests: > > http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg00799.html > > Quite a few of the 4.0 RC1 tests FAIL, though I'm not sure how many of > these are regressions, and how many are just new tests which fail.
In more detail, for gcc.sum: Tests that now fail, but worked before:
  gcc.c-torture/execute/bitfld-1.c execution, -O0
  gcc.c-torture/execute/bitfld-1.c execution, -O1
  gcc.c-torture/execute/bitfld-1.c execution, -O2
  gcc.c-torture/execute/bitfld-1.c execution, -O3 -fomit-frame-pointer
  gcc.c-torture/execute/bitfld-1.c execution, -O3 -g
  gcc.c-torture/execute/bitfld-1.c execution, -Os
  gcc.c-torture/execute/builtin-constant.c execution, -O1
  gcc.dg/array-5.c bad vla handling (test for bogus messages, line 40)
  gcc.dg/bitfld-2.c (test for warnings, line 14)
  gcc.dg/bitfld-2.c (test for warnings, line 15)
  gcc.dg/bitfld-2.c (test for warnings, line 20)
  gcc.dg/bitfld-2.c (test for warnings, line 21)
  gcc.dg/builtins-18.c (test for excess errors)
  gcc.dg/builtins-20.c (test for excess errors)
  gcc.dg/const-elim-1.c scan-assembler-not L\\$?C[^A-Z]
  gcc.dg/cpp/trad/include.c (test for excess errors)
  gcc.dg/redecl-1.c (test for errors, line 67)
  gcc.dg/sequence-pt-1.c sequence point warning (test for warnings, line 59)
  gcc.dg/uninit-1.c uninitialized variable warning (test for bogus messages, line 16)
  gcc.dg/uninit-2.c uninitialized variable warning (test for bogus messages, line 28)
  gcc.dg/uninit-3.c uninitialized variable warning (test for bogus messages, line 11)
  gcc.dg/uninit-8.c uninitialized variable warning (test for bogus messages, line 14)
  gcc.dg/Wunreachable-1.c (test for excess errors)
For g++.sum: Tests that now fail, but worked before:
  g++.dg/other/error8.C duplicate error messages (test for bogus messages, line 8)
  g++.dg/other/error8.C duplicate error messages (test for bogus messages, line 9)
  g++.dg/rtti/tinfo1.C scan-assembler-not .section[^\n\r]*_ZTIP9CTemplateIhE[^\n\r ]*
  g++.dg/template/nested3.C (test for errors, line 12)
  g++.dg/template/nested3.C (test for errors, line 14)
  g++.dg/template/nested3.C (test for errors, line 25)
  g++.dg/template/nested3.C (test for errors, line 8)
  g++.old-deja/g++.jason/cond.C (test for errors, line 20)
  g++.old-deja/g++.jason/cond.C (test for errors, line 22)
  g++.old-deja/g++.jason/cond.C (test for errors, line 25)
  g++.old-deja/g++.jason/cond.C (test for errors, line 27)
  g++.old-deja/g++.oliva/expr2.C execution test
  g++.old-deja/g++.oliva/template10.C (test for errors, line 22)
  g++.old-deja/g++.other/decl5.C (test for warnings, line 55)
  g++.old-deja/g++.other/decl5.C (test for warnings, line 56)
For libstdc++.sum: Tests that now fail, but worked before:
  27_io/basic_filebuf/open/char/9507.cc (test for excess errors)
That's a total of about 39 regressions, I think. I also got quite a few "Old tests that passed, that have disappeared" results. Is that expected? Julian
Re: GCC 4.0 RC2 Available
On 2005-04-18, Mark Mitchell <[EMAIL PROTECTED]> wrote: > > RC2 is available here: > > ftp://gcc.gnu.org/pub/gcc/prerelease-4.0.0-20050417/ > > As before, I'd very much appreciate it if people would test these bits > on primary and secondary platforms, post test results with the > contrib/test_summary script, and send me a message saying whether or > not there are any regressions, together with a pointer to the results. Results for arm-none-elf, cross-compiled from i686-pc-linux-gnu (Debian) for C and C++ are here: http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg01301.html Relative to RC1, there are several new tests which pass, and: g++.dg/warn/Wdtor1.C (test for excess errors) works whereas it didn't before. Julian
Re: Branch instructions that depend on target distance
On Mon, 24 Feb 2020 15:03:21 +0300 (MSK) Alexander Monakov wrote: > On Mon, 24 Feb 2020, Andreas Schwab wrote: > > > On Feb 24 2020, Petr Tesarik wrote: > > > > > On Mon, 24 Feb 2020 12:29:40 +0100 > > > Andreas Schwab wrote: > > > > > >> On Feb 24 2020, Petr Tesarik wrote: > > >> > > >> > This works great ... until there's some inline asm() > > >> > statement, for which gcc cannot keep track of the length > > >> > attribute, so it is probably taken as zero. > > >> > > >> GCC computes it by counting the number of asm insns. You can use > > >> ADJUST_INSN_LENGTH to adjust this as needed. > > > > > > Hmm, that's interesting, but does it work for inline asm() > > > statements? > > > > Yes, for a suitable definition of work. > > > > > The argument is essentially a free-form string (with some > > > substitution), and the compiler cannot know how many bytes they > > > occupy. > > > > That's why ADJUST_INSN_LENGTH can adjust it. > > I think Petr might be unaware of the fact that GCC counts the > **number of instructions in an inline asm statement** by counting > separators in the asm string. This may overcount when a separator > appears in a string literal for example, but triggering > under-counting is trickier. > > Petr, please see > https://gcc.gnu.org/onlinedocs/gcc/Size-of-an-asm.html for some more > discussion. VC4 instructions vary between 16 & 80 bits in length -- I guess you need to arrange things so that the maximum is used for inline asms (per instruction, counting by separators). That's not 100% ideal since most instructions will be much shorter, but at least it should give working code. Julian
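A rough model of the counting scheme described in this thread, as a sketch rather than GCC's actual code. It assumes `;` and newline are the only separators (the real separator set is target-specific) and takes 10 bytes (80 bits, the VC4 maximum mentioned above) as the worst-case size per instruction. The function names and `MAX_INSN_BYTES` are made up for illustration.

```c
#include <stddef.h>

/* Assumed worst-case instruction size in bytes: 80 bits. */
#define MAX_INSN_BYTES 10

/* Estimate the number of instructions in an asm template as
   (number of separators + 1).  A separator inside a string literal
   is still counted, so the estimate can be too high. */
size_t asm_insn_count(const char *templ)
{
    size_t count = 1;
    for (; *templ != '\0'; templ++)
        if (*templ == ';' || *templ == '\n')
            count++;
    return count;
}

/* Conservative upper bound on the encoded size of the asm, assuming
   every counted instruction takes the maximum length. */
size_t asm_max_length(const char *templ)
{
    return asm_insn_count(templ) * MAX_INSN_BYTES;
}
```

Overestimating only costs some branch range, whereas underestimating can produce a branch that does not reach its target, which is why the maximum length is the safe choice.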
Re: Repository for the conversion machinery
On Fri, 28 Aug 2015 17:50:53 + Joseph Myers wrote: > shinwell = Mark Shinwell > (Jane Street) Mark's current address is mshinw...@janestreet.com. Julian
Re: ivopts vs. garbage collection
On Mon, 11 Jan 2016 13:51:25 -0700 Tom Tromey wrote: > > "Michael" == Michael Matz writes: > > Michael> Well, that's a hack. A solution is to design something that > Michael> works generally for garbage collected languages with such > Michael> requirements instead of arbitrarily limiting transformations > Michael> here and there. It could be something like the notion of > Michael> derived pointers, where the base pointer needs to stay alive > Michael> as long as the derived pointers are. > > This was done once in GCC, for the Modula 3 compiler. > There was a paper about it, but I can't find it any more. > > The basic idea was to emit a description of the stack frame that their > GC could read. They had a moving GC that could use this information > to rewrite the frame when moving objects. This one perhaps? https://www.cs.purdue.edu/homes/hosking/papers/ismm06.pdf Julian
Re: GCC ARM: aligned access
On Mon, 1 Sep 2014 09:14:31 +0800 Peng Fan wrote: > On 09/01/2014 08:09 AM, Matt Thomas wrote: > > > > On Aug 31, 2014, at 11:32 AM, Joel Sherrill > > wrote: > >> I think this is totally expected. You were passed a u8 pointer > >> which is aligned for that type (no restrictions likely). You cast > >> it to a type with stricter alignment requirements. The code is > >> just flawed. Some CPUs handle unaligned accesses but not your ARM. > > > armv7 and armv6 arch except armv6-m support unaligned access. a u8 > pointer is casted to u32 pointer, should gcc take the align problem > into consideration to avoid possible errors? because > -mno-unaligned-access. Using -munaligned-access (or its inverse) isn't enough to make GCC generate code that can perform arbitrary unaligned accesses, because several instructions (e.g. VFP loads/stores or load/store multiple instructions IIRC) must still act on naturally-aligned data even when the hardware flag to enable unaligned accesses is on, and those instructions will still be generated by GCC when they are considered safe, i.e. when not doing explicitly-unaligned accesses in packed structures or similar. It would be *possible* to add an option to the backend to allow arbitrary alignment for any access, I think, but it's not at all clear that it's a good idea, and would certainly negatively affect performance. (If you need unaligned accesses, you can use e.g. memcpy, and that will probably generate good inline code.) Julian
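A short sketch of the memcpy approach suggested above, for reading or writing a 32-bit value at an address that may not be 4-byte aligned. The helper names `load_u32` and `store_u32` are made up for illustration; the value uses the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from a possibly unaligned address.  memcpy makes
   no alignment assumption, so this avoids casting the u8 pointer to a
   u32 pointer. */
uint32_t load_u32(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to a possibly unaligned address. */
void store_u32(uint8_t *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

With optimisation enabled, a fixed-size memcpy like this is typically expanded inline rather than emitted as a library call.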
Re: general_operand not validating "(const_int 65535 [0xffff])"
On Wed, 9 Oct 2019 14:40:42 +0100 Jozef Lawrynowicz wrote: > Constants generated for modes with fewer bits than in HOST_WIDE_INT > must be sign extended to full width (e.g., with gen_int_mode). For > constants for modes with more bits than in HOST_WIDE_INT the implied > high order bits of that constant are copies of the top bit. Note > however that values are neither inherently signed nor inherently > unsigned; where necessary, signedness is determined by the rtl > operation instead. > > Can anyone offer any further insight? Do I just need to track down > what is generating this const_int and fix that? I think you need to change a GEN_INT(x) into a gen_int_mode(x, HImode), somewhere. HTH, Julian
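To illustrate the sign-extension rule quoted above, here is a standalone model in C. It is not GCC code: `canonical_const` is a made-up name, and a 64-bit `int64_t` stands in for HOST_WIDE_INT. It shows why 65535 is not the canonical form of a 16-bit constant, while -1 is.

```c
#include <stdint.h>

/* Model of the canonical form of a constant in a mode that is `bits`
   wide (1..64): keep the low `bits` bits, then sign-extend from the
   top bit of the mode to the full 64-bit width. */
int64_t canonical_const(uint64_t value, unsigned bits)
{
    uint64_t mask = (bits >= 64) ? ~UINT64_C(0)
                                 : ((UINT64_C(1) << bits) - 1);
    uint64_t sign = UINT64_C(1) << (bits - 1);

    value &= mask;
    /* XOR then subtract the sign bit: the usual sign-extension idiom. */
    return (int64_t)((value ^ sign) - sign);
}
```

Under this model, `canonical_const(65535, 16)` gives -1, which is the form that a 16-bit (HImode) constant is expected to have.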
Re: Incidents in ARM-NEON-Intrinsics
On Wed, 05 Oct 2011 10:37:22 +0900 shiot...@rd.ten.fujitsu.com (塩谷晶彦) wrote: > Hi, Maintainer, > > I found some incidents in > http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html#ARM-NEON-Intrinsics > > Please check the following: > > |6.54.3.8 Comparison (less-than-or-equal-to) > | > | uint32x2_t vcle_u32 (uint32x2_t, uint32x2_t) > | Form of expected instruction(s): vcge.u32 d0, d0, d0 > (snip) > | uint32x4_t vcleq_f32 (float32x4_t, float32x4_t) > | Form of expected instruction(s): vcge.f32 q0, q0, q0 > > in above "vcge"s may be "vcle" or "vcgt". This is deliberate I think: the register/register forms of vcle, vclt, vacle etc. are pseudo-instructions, and assemble to vcge, vcgt, etc. with the operands reversed (a <= b === b >= a). Are you seeing incorrect code being generated? Julian
Re: [ARM] Neon / Ocaml question
On Mon, 11 Jan 2010 09:52:59 + Ramana Radhakrishnan wrote: > cam-bc3-b12:ramrad01 68 > ocamlc -c neon-schedgen.ml > File "neon-schedgen.ml", line 51, characters 0-10: > Unbound module Utils > > It sounds like a configuration issue but given my rather rusty ocaml > skills - I'm not sure where to look. Googling around doesn't show me > anything obvious. I see this both with v. 3.09.3 and v 3.11 (on > karmic). This is apparently due to a missing source file, "utils.ml", but unfortunately I have no idea what has happened to it (I'm not the original author). Luckily it only seems to be used for a single function definition (find_with_result), so we can just re-implement that. Another thing which might bite you is that recent OCaml versions don't like hyphens in filenames: replacing them with underscores works OK though (i.e. neon_schedgen.ml). Compiling neon-schedgen.ml with the attached patch, the parts of cortex-a8-neon.md below the line: ;; The remainder of this file is auto-generated by neon-schedgen. are generated identically. Would you like to try this out and see how you get on with it? Followups set to gcc-patches. Thanks, Julian
ChangeLog
    gcc/
    * config/arm/neon-schedgen.ml (Utils): Don't try to open missing module.
    (find_with_result): New.
Index: neon-schedgen.ml
===================================================================
--- neon-schedgen.ml (revision 155808)
+++ neon-schedgen.ml (working copy)
@@ -48,7 +48,14 @@
    and at present we do not emit specific guards.)
 *)
 
-open Utils
+let find_with_result fn lst =
+  let rec scan = function
+    [] -> raise Not_found
+  | l::ls ->
+      match fn l with
+        Some result -> result
+      | _ -> scan ls in
+    scan lst
 
 let n1 = 1 and n2 = 2 and n3 = 3 and n4 = 4 and n5 = 5 and n6 = 6
   and n7 = 7 and n8 = 8 and n9 = 9
Re: __builtin_return_address for ARM
On Thu, 26 Feb 2009 15:54:14 + Andrew Haley wrote: > Paul Brook wrote: > >> Well, but wouldn't it still be nice if > >> __builtin_return_address(N) was implemented for N>0 by libcalling > >> into the unwinder for you? Obviously this would still have to > >> return NULL at runtime when you're running on a DW2 target without > >> any EH frame data present in memory (and I guess it wouldn't work > >> on SjLj targets either), but wouldn't it still be a nice > >> convenience feature for users? > > > > There are sufficiently many caveats and system specific bits of > > weirdness that you probably just have to know what you're doing (or > > rely on backtrace(3) to do it for you). > > > > IMHO builtins are for things that you can't do in normal C. So > > __builtin_return_address(0) makes a lot of sense. Having it start > > guessing how to do N>0 much less so. > > I suggest we could contribute a version of backtrace.c for ARM to > glibc. An example to follow is libc/sysdeps/ia64/backtrace.c. GLIBC already knows how to do backtracing if the ARM-specific unwind tables are present (.ARM.exidx, etc.), using _Unwind_Backtrace. Unfortunately backtraces don't currently terminate cleanly if code without unwind data is reached: CodeSourcery are currently working on fixing the linker so that non-unwindable regions are marked properly, which we consider essential to making this feature usable. Of course, you'll need to compile all your code with -funwind-tables for this to work. We haven't measured the size impact of this yet: we're planning on optimising the unwind tables by merging duplicate entries whenever possible, so hopefully it won't be too bad. Just a heads-up to avoid duplicate effort! Cheers, Julian
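For reference, a small sketch of the two approaches contrasted in this thread: `__builtin_return_address(0)` for the immediate caller, and `backtrace(3)` for deeper frames. It assumes GCC or Clang and a glibc-style `<execinfo.h>`; the wrapper names are made up for illustration. How many frames `backtrace` returns depends on the unwind information available, as discussed above.

```c
#include <execinfo.h>
#include <stddef.h>

/* Return the address this function will return to.  Level 0 is the
   case the builtin handles reliably; noinline keeps the function from
   being merged into its caller. */
__attribute__((noinline)) void *caller_address(void)
{
    return __builtin_return_address(0);
}

/* Fill `frames` with up to `max_frames` return addresses and return
   how many were stored.  The library does the unwinding, so the
   caller does not have to guess at frame layouts. */
int capture_backtrace(void **frames, int max_frames)
{
    return backtrace(frames, max_frames);
}
```

The captured addresses can be passed to `backtrace_symbols` for printing, where the C library provides it.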
Re: __builtin_return_address for ARM
On Fri, 27 Feb 2009 13:32:11 + Julian Brown wrote: > GLIBC already knows how to do backtracing if the ARM-specific unwind > tables are present (.ARM.exidx, etc.), using _Unwind_Backtrace. I'm told this probably isn't true for upstream GLIBC -- but we definitely have a patch somewhere to make GLIBC backtrace use _Unwind_Backtrace, which we'll submit upstream in due course. Sorry for the misinformation! Cheers, Julian
Re: Using a umulhisi3
On Wed, 3 Jun 2009 21:39:34 +1200 Michael Hope wrote: > How does the combine stage work? It looks like it could get multiple > potential matches for a set of RTLs. Does it use some type of costing > function to pick between them? Can I tell combine that a umulhisi3 is > cheaper than a mulsi3? You could try defining TARGET_RTX_COSTS, if you haven't already. Julian
Re: Multiple types of load/store: how to create .md rules?
On Mon, 02 May 2022 19:10:41 -0700 Andras Tantos wrote: > To a previous problem I've asked, Andrew Pinski replied that I should > merge all *movsi patterns into a single one to avoid (in that case) > strange deletions in the generated assembly. Is that possible here? It > appears to me that I would need the ability to differentiate the > different patterns using constraints, but is there a way to define > custom versions of the 'm' pattern? I didn't find anything on that in > the documentation. Did I miss something? Check "define_memory_constraint" in existing ports, i.e.: https://gcc.gnu.org/onlinedocs/gccint/Define-Constraints.html#index-define_005fmemory_005fconstraint HTH, Julian
Re: I have questions regarding the 4.3 codebase...
On Wed, 22 Mar 2023 18:27:28 -0400 Sid Maxwell via Gcc wrote: > Is there anyone on the list with experience with the gcc 4.3 > codebase? I'm currently maintaining a fork of it, with a PDP10 code > generator. > > I've run into an issue involving the transformation of a movmemhi to a > single PDP10 instruction (an xblt, if you're curious). The > transformation appears to 'lose' its knowledge of being a store, > resulting in certain following stores being declared dead, and code > motion that shouldn't happen (e.g. a load moved before the xblt that > depends on the result of the xblt). > > I'm hoping to find someone who can help me diagnose the problem. We > want to use this instruction rather than the copy-word-loop currently > generated for struct assignments. I think we'd need a bit more info than that... what does the movmemhi instruction pattern look like, for example? Julian
Re: GNU C extension: Function Error vs. Success
On Mon, 10 Mar 2014 15:27:06 +0100 Shahbaz Youssefi wrote: > Feedback > > > Please let me know what you think. In particular, what would be the > limitations of such a syntax? Would you be interested in seeing this > extension to the GNU C language? What alternative symbols do you think > would better show the intention/simplify parsing/look more beautiful? I suggest you think about how this is better than C++ exceptions, and also consider alternatives like OCaml's option types that can be used to achieve similar ends. For your suggested syntax at function call sites, consider that functions can be called in more complicated ways than simply as "bar = foo();" statements, and the part following the "!!" in your examples appears to be a statement itself: in more complicated expressions, that interleaving of expressions and statements is going to get very ugly very quickly. E.g.: x = foo() + bar(); would need to become something like: x = (foo() !! goto label1) + (bar() !! goto label2); And there are all sorts of issues with that. Anyway, I quite like the idea of rationalising error-code returns in C code, but I don't think this is the right way of going about it. HTH, Julian
Re: Legitimize address after reload
On Fri, 14 Mar 2014 12:52:35 +0100 David Guillen wrote: > If I allow also a 'PLUS' expression to be a valid address (adding the > restriction that the two addends are a register and a constant) it > happens (sometimes) that gcc comes up with an expression like this > one: > > (plus:SI (plus:SI (reg:SI somereg) > (const_int 4)) > (const_int 8)) > > > After taking a look at the 386 backend (and others) I just discovered > that there is a function called LEGITIMIZE_RELOAD_ADDRESS which is > responsible for handling this case. My issue is that this function is > not being called and, from what I saw while debugging, it seems that > the offending RTX expression is created after the address_reload pass, > and thus impossible for this pass to legitimize the address. Look at how e.g. the ARM backend and others handle the "strict" parameter to the legitimate_address hook -- you need to use that to forbid pseudo registers being allowed in RTXs in the strict case. LEGITIMIZE_RELOAD_ADDRESS is probably a red herring (at least for the simple cases you're probably dealing with to start with), and isn't used for LRA anyway. Getting these bits right can be very fiddly! The (plus (reg) (const)) operands can arise before/during register elimination, IIRC. (You might need to get the register-elimination bits right, too...) Just a guess, anyway. (http://gcc.gnu.org/wiki/reload might be helpful if you've not read it.) Julian
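A toy model of the strict/non-strict distinction mentioned above, written as standalone C rather than against GCC's internal RTL API. Everything here is hypothetical: the `struct addr` representation, the `FIRST_PSEUDO` boundary of 16, and the function names. It also simplifies the real rule, since in GCC a pseudo that has already been assigned a hard register can be acceptable in the strict case.

```c
#include <stdbool.h>

/* Hypothetical target: registers 0..15 are hard registers, anything
   higher is a pseudo that has not been allocated yet. */
enum { FIRST_PSEUDO = 16 };

/* Only two address shapes are accepted in this model: (reg) and
   (plus (reg) (const_int)).  Anything else, such as a nested plus,
   is classified as ADDR_OTHER. */
enum addr_kind { ADDR_REG, ADDR_REG_PLUS_CONST, ADDR_OTHER };

struct addr {
    enum addr_kind kind;
    unsigned base_regno;
    long offset;
};

/* Hard registers are always usable as a base.  Pseudos are allowed
   only in the non-strict case, i.e. before register allocation. */
static bool base_reg_ok(unsigned regno, bool strict)
{
    if (regno < FIRST_PSEUDO)
        return true;
    return !strict;
}

/* Model of a legitimate-address check with a `strict` flag. */
bool address_ok(const struct addr *a, bool strict)
{
    switch (a->kind) {
    case ADDR_REG:
    case ADDR_REG_PLUS_CONST:
        return base_reg_ok(a->base_regno, strict);
    default:
        return false;
    }
}
```

In this model a nested `(plus (plus reg const) const)` is rejected in both modes, so it has to be simplified to a single register-plus-constant form before it can pass the check.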
Re: [Question, C6X] Under what situations should we disable DCE in sched2?
On Thu, 27 Mar 2014 09:41:28 -0600 Jeff Law wrote: > On 03/27/14 07:50, Felix Yang wrote: > > Hello, > > > > I find DCE in sched2 is disabled for C6X backend. Is this a > > performance consideration? Or a GCC BUG? > > And under what situations should we disable DCE in sched2? > > Can anyone explain this? Many thanks. > In general, if a port is disabling an optimization like this, then > there's something wrong with the port. > > As to this specific issue, git/svn blame ought to point you at > whomever wrote this code and you can ping them directly. IIRC, sched2 is run from md-reorg for C6X. The same trick is used by the ia-64 backend -- it is to avoid late passes moving instructions around after scheduling (breaking the arrangement of insns into dispatch packets). The md-reorg pass is one of the last things the compiler does before emitting assembly. HTH, Julian
Re: [Question, C6X] Under what situations should we disable DCE in sched2?
On Thu, 27 Mar 2014 16:02:49 + Julian Brown wrote: > On Thu, 27 Mar 2014 09:41:28 -0600 > Jeff Law wrote: > > > On 03/27/14 07:50, Felix Yang wrote: > > > Hello, > > > > > > I find DCE in sched2 is disabled for C6X backend. Is this a > > > performance consideration? Or a GCC BUG? > > > And under what situations should we disable DCE in sched2? > > > Can anyone explain this? Many thanks. > > In general, if a port is disabling an optimization like this, then > > there's something wrong with the port. > > > > As to this specific issue, git/svn blame ought to point you at > > whomever wrote this code and you can ping them directly. > > IIRC, sched2 is run from md-reorg for C6X. The same trick is used by > the ia-64 backend -- it is to avoid late passes moving instructions > around after scheduling (breaking the arrangement of insns into > dispatch packets). The md-reorg pass is one of the last things the > compiler does before emitting assembly. Oops, I missed the "DCE" part. Ignore me! Apologies, Julian
Re: How to implement conditional execution
On Fri, 27 Jun 2008 15:52:22 +0530 "Mohamed Shafi" <[EMAIL PROTECTED]> wrote: > If the condition in the 'if' instruction is satisfied the processor > will execute the next instruction or it will replace with a nop. So > this means that i can instructions similar to: > > if eq Rx, Ry > add Rx, Ry > add Rx, 2 > Will it be possible to implement this in the Gcc backend ? > Does any other targets have similar instructions? This is very much like (a simpler version of) the ARM Thumb-2 IT instruction. Look at how config/arm/thumb2.md handles that. I think the basic idea is that you should define conditional instruction patterns which emit assembly for both instructions simultaneously, e.g. (excuse my pseudocode): (define_insn "..." [(...)] "if eq Rx, Ry\;add Rx, Ry") then there's no possibility for scheduling or other optimisations to split the second instruction away from the first. Julian