Re: inline assembly question (memory side-effects)
> What is the proper way to tell gcc that an inline assembly statement
> either modifies a particular area of memory or needs it to be
> updated/in-sync because the assembly reads from it.

Maybe also related to: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32642
i.e. "=m" works for variables/structures in memory (i.e. declared
global) but not for variables/structures on the stack. The only way
around it I found is using the "=X" modifier, but that has not been seen
as a good solution - still works for my sourceforge project.

Etienne.
Re: bit_size_type - a data type?
On Mon, May 12, 2008 at 11:58 AM, Mohamed Shafi <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> While debugging in gimple dumps I found a term that is used along
> with other data types - bit_size_type.
>
> I am getting ICEs when the size of int is 32 bits and no errors when
> the size of int is 16. This is for a back-end whose native size is
> 16 bits.
>
> Is this an internal data type used for representation?
> What is the size of this data type? Will I be able to control its
> size as we can for other data types?

bitsizetype is a type that can hold any value of sizetype *
BITS_PER_UNIT, so we can safely do bit-size calculations without
overflow. This type shouldn't end up used in code, though.

Richard.
bit_size_type - a data type?
Hello all,

While debugging in gimple dumps I found a term that is used along with
other data types - bit_size_type.

I am getting ICEs when the size of int is 32 bits and no errors when
the size of int is 16. This is for a back-end whose native size is
16 bits.

Is this an internal data type used for representation? What is the size
of this data type? Will I be able to control its size as we can for
other data types?

Regards,
Shafi
Re: inline assembly question (memory side-effects)
Till Straumann wrote:
> What is the proper way to tell gcc that an inline assembly statement
> either modifies a particular area of memory or needs it to be
> updated/in-sync because the assembly reads from it.
>
> E.g., assume I have a
>
>   struct blah {
>       int sum;
>       ...
>   };
>
> which is accessed by my assembly code.
>
>   int my_chksum(struct blah *blah_p)
>   {
>       blah_p->sum = 0;
>       /* assembly code computes checksum, writes to blah_p->sum */
>       asm volatile("..."::"b"(blah_p));
>       return blah_p->sum;
>   }
>
> How can I tell gcc that the object *blah_p is accessed (read and/or
> written to) by the asm (without declaring the object volatile or
> using a general "memory" clobber)?
> (If I don't take any measures then gcc won't know that blah_p->sum is
> modified by the assembly and will optimize the load operation away
> and always return 0.)
>
> A while ago (see this thread
> http://gcc.gnu.org/ml/gcc/2008-03/msg00976.html)
> I was told that simply adding the affected memory area in question as
> memory ('m') output- or input-operands is not appropriate -- note
> that this is in contradiction to what the gcc info page suggests
> (section about 'extended asm').

This was pretty well explained in
http://gcc.gnu.org/ml/gcc/2008-03/msg01084.html

This is arguably a bug in gcc. However, you are making this unduly hard
for yourself, since the checksum is in a register, then written to
memory, then read from memory again.

Andrew.
Re: Division using FMAC, reciprocal estimates and Newton-Raphson - eg ia64, rs6000, SSE, ARM MaverickCrunch?
Paolo Bonzini wrote:
>> I'd like to implement something similar for MaverickCrunch, using
>> the integer 32-bit MAC functions, but there is no reciprocal
>> estimate function on the MaverickCrunch. I guess a lookup table
>> could be implemented, but how many entries will need to be
>> generated, and how accurate will it have to be to be IEEE754
>> compliant (in the swdiv routine)?

Getting a division that is always correctly rounded is going to be
hard. I can't prove it's impossible, though.

> I think sh does something like that. It is quite a mess, as it has
> half a dozen ways to implement division.
>
> The idea is to use integer arithmetic to compute the right exponent,
> and the lookup table to estimate the mantissa. I used something like
> this for square root:
>
> 1) shift the entire FP number by 1 to the right (logical right shift)
> 2) sum 0x2000 so that the exponent is still offset by 64
> 3) extract the 8 bits from 14 to 22 and look them up in a 256-entry,
>    32-bit table
> 4) sum the value (as a 32-bit integer!) with the content of the table
> 5) perform 2 Newton-Raphson iterations as necessary

You don't need the table. This is the "magic" approximation to
1/sqrt(x) described at
http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf:

  float invSqrt(float x)
  {
      float xhalf = 0.5f * x;
      union { float x; int i; } u;
      u.x = x;
      u.i = 0x5f3759df - (u.i >> 1);
      x = u.x * (1.5f - xhalf * u.x * u.x);
      return x;
  }

Square that, and you've got a good approximation to 1/x. Two goes
through Newton-Raphson get you float, three get you double. This may be
faster, depending on the target.

Andrew.
Re: bit_size_type - a data type?
> bitsizetype is a type that can hold any value of sizetype *
> BITS_PER_UNIT, so we can safely do bit-size calculations without
> overflow. This type shouldn't end up used in code, though.

Unfortunately, it does, though not in C. Ada's 'Size attribute returns
the size in bits, so the proper type for that computation is
bitsizetype, though many computations end up being constant-folded.
Re: bit_size_type - a data type?
On Mon, May 12, 2008 at 1:26 PM, Richard Kenner <[EMAIL PROTECTED]> wrote:
> > bitsizetype is a type that can hold any value of sizetype *
> > BITS_PER_UNIT, so we can safely do bit-size calculations without
> > overflow. This type shouldn't end up used in code, though.
>
> Unfortunately, it does, though not in C. Ada's 'Size attribute
> returns the size in bits, so the proper type for that computation is
> bitsizetype, though many computations end up being constant-folded.

I guess you'll get funny effects then, as for an N-bit sizetype there's
usually no valid mode for an (N+8)-bit bitsizetype. In fact, I believe
bitsizetype simply "inherits" the mode from sizetype.

Richard.
Re: bit_size_type - a data type?
> I guess you'll get funny effects then, as for an N-bit sizetype
> there's usually no valid mode for an (N+8)-bit bitsizetype. In fact,
> I believe bitsizetype simply "inherits" the mode from sizetype.

No. We pick the mode that's *at least* 3 bits wider than sizetype, so
it's usually the next larger integer mode.
Re: bit_size_type - a data type?
On Mon, May 12, 2008 at 6:54 PM, Richard Kenner <[EMAIL PROTECTED]> wrote:
>> I guess you'll get funny effects then, as for an N-bit sizetype
>> there's usually no valid mode for an (N+8)-bit bitsizetype. In
>> fact, I believe bitsizetype simply "inherits" the mode from
>> sizetype.
>
> No. We pick the mode that's *at least* 3 bits wider than sizetype,
> so it's usually the next larger integer mode.

Does this mean that if sizetype is 32 bits then bitsizetype is 64 bits?

Regards,
Shafi
IRA performance testing on Fortran
Here is my report on Fortran benchmarking. I compare the trunk dated
20080507 (no revision number, sorry) and the IRA branch rev. 135035.

I run the Polyhedron benchmark
(http://www.polyhedron.co.uk/polyhedron_benchmark_suite0html), which is
probably the most widely used benchmark in the Fortran community. I
don't have many data points, but they're very well converged (the
"standard" parameters were used, which means each test with each set of
compilation options is run between 10 and 100 times, until the timing
standard deviation becomes less than 0.1%). I compile with
-march=native -ffast-math -funroll-loops -O3 and run on a dual-core
biprocessor machine with 8GB RAM. /proc/cpuinfo says it's a Dual-Core
AMD Opteron(tm) Processor 2220 running around 2.8 GHz.

Full timings are below, with a summary here: overall (judged by
geometric mean exec time), IRA introduces a 2.2% regression in
execution time (and a 2.7% regression in compilation time, consistent
with my previous mail). Using the CB algorithm doesn't change this
significantly.

The performance regression is mainly due to one testcase, induct, which
is taking a 30% hit on IRA. If the performance of that one were the
same with IRA as with the old allocator, the switch would be (for this
benchmark) performance-neutral. So, I have investigated the case of
induct, and I found that with the IRA branch compiler without -fira,
it's already 30% slower than with trunk. So, is it an issue with the
IRA branch, or has it just not been merged recently and we had a recent
great improvement of induct on trunk? I'd appreciate it if you could
enlighten me on this point.

So, other than that small question, everything seems mostly good on the
Fortran performance front.
Cheers,
FX

Comparison of execution time (see in fixed-width font):

   Benchmark   Execution time, compared to mainline
   Name            IRA     IRA-CB
   ----------  -------    -------
   ac           +1.59%     +6.80%
   aermod       +5.87%     +3.14%
   air          -0.33%     -0.83%
   capacita     +5.17%     +2.58%
   channel      +0.30%      0.00%
   doduc        -3.61%     -3.61%
   fatigue      -0.93%     -2.67%
   gas_dyn      +0.99%     +2.48%
   induct      +30.28%    +29.64%
   linpk        -1.80%     -1.57%
   mdbx         -2.19%     -2.74%
   nf           +0.74%     -0.30%
   protein      +1.30%     +1.58%
   rnflow       +0.16%     +0.22%
   test_fpu     -0.83%     -0.39%
   tfft         -0.72%     +0.14%
   ----------  -------    -------
   geom. mean   +2.25%     +2.16%

Detailed timings for mainline:

   Benchmark   Compile  Executable  Ave Run   Number   Estim
   Name        (secs)    (bytes)    (secs)    Repeats  Err %
   ---------   -------  ----------  -------   -------  ------
   ac             7.36     1175251    11.32        15  0.0938
   aermod        90.03     2424785    38.82        14  0.0866
   air            6.83     1365405    12.04        19  0.0983
   capacita       2.54     1235764    78.93        23  0.0785
   channel        1.66     1254613    10.12        19  0.0885
   doduc         13.20     1416729    35.21        13  0.0870
   fatigue        6.20     1299862     8.60        12  0.0951
   gas_dyn        6.45     1269413    10.08       100  0.1026
   induct        19.57     1593762    34.38        10  0.0965
   linpk          1.43     1162116    26.17        77  0.2626
   mdbx           3.37     1192451    16.41        24  0.0939
   nf             7.65     1217536    29.72        68  0.1240
   protein       12.77     1342400    57.54        10  0.0942
   rnflow        12.81     1357019    31.42        12  0.0976
   test_fpu      11.78     1331485    18.07        24  0.0879
   tfft           1.13     1173880     6.91        24  0.0991

   Geometric Mean Execution Time = 20.85 seconds

Timing for IRA branch with -fira:

   Benchmark   Compile  Executable  Ave Run   Number   Estim
   Name        (secs)    (bytes)    (secs)    Repeats  Err %
   ---------   -------  ----------  -------   -------  ------
   ac             6.06     1158971    11.50        15  0.0979
   aermod        94.22     2421896    41.10        12  0.0725
   air            7.07     1352645    12.00        23  0.0899
   capacita       2.89     1221980    83.01        25  0.1860
   channel        1.81     1241539    10.15        31  0.0879
   doduc         15.20     1404025    33.94        10  0.0628
   fatigue        6.17     1273630     8.52        14  0.0966
   gas_dyn        7.79     1256267    10.18        32  0.0920
   induct        14.28     1567935    44.79        12  0.0772
   linpk          1.44     1145546    25.70        77  0.0920
   mdbx           3.54     1181755    16.05        15  0.0588
   nf             7.73     1205207    29.94        66  0.0890
   protein       12.89     1325392    58.29        10  0.0458
   rnflow        12.45     1340531    31.47        12  0.0570
   test_fpu      12.18     1312704    17.92        58  0.0797
   tfft           1.28     1158396     6.86
Re: IRA performance testing on Fortran
> FX writes:

FX> The performance regression is mainly due to one testcase, induct,
FX> which is taking a 30% hit on IRA. If the performance of that one
FX> were the same with IRA as with the old allocator, the switch would
FX> be (for this benchmark) performance-neutral. So, I have
FX> investigated the case of induct, and I found that with the IRA
FX> branch compiler without -fira, it's already 30% slower than with
FX> trunk. So, is it an issue with the IRA branch, or has it just not
FX> been merged recently and we had a recent great improvement of
FX> induct on trunk? I'd appreciate it if you could enlighten me on
FX> this point.

The IRA branch has not been merged recently and does not include many
of Richi's recent alias analysis improvements.

David
Re: bit_size_type - a data type?
On Mon, May 12, 2008 at 3:24 PM, Richard Kenner <[EMAIL PROTECTED]> wrote:
> > I guess you'll get funny effects then, as for an N-bit sizetype
> > there's usually no valid mode for an (N+8)-bit bitsizetype. In
> > fact, I believe bitsizetype simply "inherits" the mode from
> > sizetype.
>
> No. We pick the mode that's *at least* 3 bits wider than sizetype,
> so it's usually the next larger integer mode.

Yes, but for example when cross-compiling from a 32-bit host to a
64-bit target, that would be TImode, which most 32-bit hosts cannot
handle (in which case we'd ICE if set_sizetype didn't simply choose to
make bitsizetype the same precision as sizetype...).

So in summary, you cannot really rely on bitsizetype being capable of
doing what it is supposed to do.

Richard.
Re: bit_size_type - a data type?
> Does this mean that if sizetype is 32 bits then bitsizetype is
> 64 bits?

Yes.
Re: bit_size_type - a data type?
> Yes, but for example when cross-compiling from a 32-bit host to a
> 64-bit target

... which is something that has never worked completely properly ...
Re: RFH: Building and testing gimple-tuples-branch
Diego Novillo wrote:
> The tuples branch is at the point now that it should bootstrap all
> primary languages and targets. There are things that are still broken
> and being worked on (http://gcc.gnu.org/wiki/tuples), but by and
> large things should Just Work. I expect things like code generation
> to be sub-par because some optimizations are still not converted
> (notably, loop passes, PRE, and TER).
>
> So, for folks with free cycles to spare, could you build the branch
> on your favourite target and report bugs? Bugzilla and/or email
> reports are OK. If you are creating a bugzilla report, please add my
> address to the CC field.
>
> Other than obvious brokenness, we are interested in compile time
> slowdowns and increased memory utilization, both of which are
> possible because we have spent no effort tuning the data structures
> yet.
>
> To build the branch:
>
> $ svn co svn://gcc.gnu.org/svn/gcc/branches/gimple-tuples-branch
> $ mkdir bld && cd bld
> $ ../gimple-tuples-branch/configure --disable-libgomp --disable-libmudflap
> $ make && make -k check

For mipsel-linux:
http://gcc.gnu.org/ml/gcc-testresults/2008-05/msg01055.html

The complete bootstrap/test cycle seems to be about 10% faster than on
the trunk. I didn't try to figure out why, although I can speculate
that it is due to the fact that some optimization passes are still
disabled and that tests that ICE prevent the corresponding execution
tests from being run.

Comparing the test results to a recent trunk build shows many FAILures
that are only on the branch. Although I didn't investigate, FAILures
like the one in libjava/Array_3 usually indicate that exception
handling is broken in some way.

David Daney
Re: bit_size_type - a data type?
On Mon, May 12, 2008 at 12:11:26PM -0400, Richard Kenner wrote:
> > Yes, but for example when cross-compiling from a 32-bit host to a
> > 64-bit target
>
> ... which is something that has never worked completely properly ...

If it doesn't work today, then there are no tests covering the
problems. CodeSourcery does this all the time.

-- 
Daniel Jacobowitz
CodeSourcery
Re: bit_size_type - a data type?
> If it doesn't today, then there's no tests covering the problems.

It does if you have a 64-bit HOST_WIDE_INT on your 32-bit host, and
this is now enforced for most 64-bit targets.

-- 
Eric Botcazou
Re: US-CERT Vulnerability Note VU#162289
"Robert C. Seacord" <[EMAIL PROTECTED]> writes:
> Once a new version or patch is available that will warn users that
> this optimization is taking place, I will recommend that we change
> the work around from "Avoid newer versions of gcc" to "Avoid affected
> versions of gcc" and/or recommend that users download the patch /
> revision.

The behaviour of pointer overflow has now changed as of the following
(as yet unreleased) versions:

  gcc 4.2.4
  gcc 4.3.1
  gcc 4.4.0

and all subsequent versions (4.2.x where x >= 4, 4.3.y where y >= 1,
4.z where z >= 4).

The optimization under discussion is for comparisons between P + V1 and
P + V2, where P is the same pointer and V1 and V2 are variables of some
integer type. The C/C++ language standards permit this to be reduced to
a comparison between V1 and V2. However, if V1 or V2 is such that the
sum with P overflows, then the comparison of V1 and V2 will not yield
the same result as actually computing P + V1 and P + V2 and comparing
the sums.

The new behaviour as of the above releases is that this optimization is
performed by default at -O2 and above, including -Os. It is not
performed by default at -O1 or (of course) -O0. The optimization may be
enabled for -O1 with the -fstrict-overflow option. The optimization may
be disabled for -O2 and above with the -fno-strict-overflow option.

When the optimization is enabled, cases where it occurs may be detected
by using -Wstrict-overflow=N where N >= 3. Note that using this warning
option is likely to yield a number of false positive reports--cases
where this or other overflow optimizations are being applied, but where
there is no actual problem.

Please see the gcc manual for more information about these options.

Ian
[tuples] Merged with trunk @135126
This merge brought quite a few conflicts because of recent changes in trunk to SFTs and other middle-end changes. Bootstrapped and tested x86_64. Diego.
[lto] Merged trunk @135136
There were a couple of fixes needed due to the SFT removal.
Bootstrapped and tested on x86_64.

2008-05-12  Diego Novillo  <[EMAIL PROTECTED]>

	Mainline merge @135136
	* configure.ac (ACX_PKGVERSION): Update revision merge string.
	* configure: Regenerate.

2008-05-12  Diego Novillo  <[EMAIL PROTECTED]>

	* lto-tree-flags.def (STRUCT_FIELD_TAG): Remove.
	* treestruct.def (STRUCT_FIELD_TAG): Remove.
	Update all users.
	* lto-cgraph-out.c (output_node): Update references to inline
	summary fields.

lto/ChangeLog:

2008-05-12  Diego Novillo  <[EMAIL PROTECTED]>

	* lto-cgraph-in.c (overwrite_node): Update references to inline
	summary fields.
	* lto-function-in.c (input_expr_operand): Do not handle
	STRUCT_FIELD_TAG.
Re: US-CERT Vulnerability Note VU#162289
Ian,

Sounds great, thanks. I'll work with Chad to get the vul note updated
accordingly.

rCs

> "Robert C. Seacord" <[EMAIL PROTECTED]> writes:
>> Once a new version or patch is available that will warn users that
>> this optimization is taking place, I will recommend that we change
>> the work around from "Avoid newer versions of gcc" to "Avoid
>> affected versions of gcc" and/or recommend that users download the
>> patch / revision.
>
> The behaviour of pointer overflow has now changed as of the following
> (as yet unreleased) versions:
>
>   gcc 4.2.4
>   gcc 4.3.1
>   gcc 4.4.0
>
> and all subsequent versions (4.2.x where x >= 4, 4.3.y where y >= 1,
> 4.z where z >= 4).
>
> The optimization under discussion is for comparisons between P + V1
> and P + V2, where P is the same pointer and V1 and V2 are variables
> of some integer type. The C/C++ language standards permit this to be
> reduced to a comparison between V1 and V2. However, if V1 or V2 is
> such that the sum with P overflows, then the comparison of V1 and V2
> will not yield the same result as actually computing P + V1 and
> P + V2 and comparing the sums.
>
> The new behaviour as of the above releases is that this optimization
> is performed by default at -O2 and above, including -Os. It is not
> performed by default at -O1 or (of course) -O0. The optimization may
> be enabled for -O1 with the -fstrict-overflow option. The
> optimization may be disabled for -O2 and above with the
> -fno-strict-overflow option.
>
> When the optimization is enabled, cases where it occurs may be
> detected by using -Wstrict-overflow=N where N >= 3. Note that using
> this warning option is likely to yield a number of false positive
> reports--cases where this or other overflow optimizations are being
> applied, but where there is no actual problem.
>
> Please see the gcc manual for more information about these options.
>
> Ian
Re: IRA performance testing on Fortran
David Edelsohn wrote:
> FX writes:
> FX> The performance regression is mainly due to one testcase, induct,
> FX> which is taking a 30% hit on IRA. If the performance of that one
> FX> were the same with IRA as with the old allocator, the switch
> FX> would be (for this benchmark) performance-neutral. So, I have
> FX> investigated the case of induct, and I found that with the IRA
> FX> branch compiler without -fira, it's already 30% slower than with
> FX> trunk. So, is it an issue with the IRA branch, or has it just not
> FX> been merged recently and we had a recent great improvement of
> FX> induct on trunk? I'd appreciate it if you could enlighten me on
> FX> this point.
>
> The IRA branch has not been merged recently and does not include many
> of Richi's recent alias analysis improvements.

Thanks for the info, David. I think you are right. I've checked
polyhedron on my machine (Intel Core2) on the ira branch using IRA and
the old register allocator, and I don't see a difference between the
allocators. I should have merged mainline into the branch, but I am
afraid that will only happen next week because I have some patches I am
working on.
Re: IRA performance testing on Fortran
FX wrote:
> Here is my report on Fortran benchmarking. I compare the trunk dated
> 20080507 (no revision number, sorry) and the IRA branch rev. 135035.
>
> I run the Polyhedron benchmark
> (http://www.polyhedron.co.uk/polyhedron_benchmark_suite0html) which
> is probably the most widely used benchmark in the Fortran community.

Thanks for the testing, but I am sorry to say that the comparison is a
bit incorrect. As David Edelsohn wrote, there is a significant
difference between mainline and the ira branch. I would do this on
ira-branch, where the old register allocator is the allocator by
default and IRA is switched on by -fira.

I've checked induct on my machine (Core2) and I don't see the
difference.

I'll play with the benchmark too. It is interesting what programs it
contains (I am mostly interested in how memory-bound they are and
whether it is possible to decrease the data to fit the CPU caches to
see the RA effect).
GCC 4.2.4 Status Report (2008-05-12)
Status
======

GCC 4.2.4-rc1 is now available from
<ftp://gcc.gnu.org/pub/gcc/snapshots/4.2.4-RC-20080512/>.

All commits to the GCC 4.2 branch need to be approved by a release
manager until GCC 4.2.4 is released. All fixes going on that branch
should first have gone on trunk and the 4.3 branch.

GCC 4.2.4 will follow in about a week in the absence of any significant
problems with 4.2.4-rc1. If you believe a problem should delay the
release, please file a bug in Bugzilla, mark it as a regression, note
there the details of the previous 4.2 releases in which it worked, and
CC me on it. Anything that is not a regression from a previous 4.2
release is unlikely to delay this release.

Any further 4.2 releases after 4.2.4 may depend on whether there is
expressed user and developer interest in further releases from this
branch, or whether 4.3 has been widely adopted in place of 4.2. I
propose to leave the branch open for now after the 4.2.4 release, but
not to make any further regular status reports or start any planning
for 4.2.5.

Quality Data
============

Priority     #   Change from Last Report
--------   ---   -----------------------
P1          22   - 1
P2         138   + 1
P3          40   + 2
--------   ---   -----------------------
Total      200   + 2

Previous Report
===============

http://gcc.gnu.org/ml/gcc/2008-04/msg00243.html

I will send the next report for the 4.2 branch after 4.2.4 or 4.2.4-rc2
is released.

-- 
Joseph S. Myers
[EMAIL PROTECTED]
gcc-4.1-20080512 is now available
Snapshot gcc-4.1-20080512 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20080512/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for
details.

This snapshot has been generated from the GCC 4.1 SVN branch with the
following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch
revision 135235

You'll find:

gcc-4.1-20080512.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.1-20080512.tar.bz2       C front end and core compiler
gcc-ada-4.1-20080512.tar.bz2        Ada front end and runtime
gcc-fortran-4.1-20080512.tar.bz2    Fortran front end and runtime
gcc-g++-4.1-20080512.tar.bz2        C++ front end and runtime
gcc-java-4.1-20080512.tar.bz2       Java front end and runtime
gcc-objc-4.1-20080512.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.1-20080512.tar.bz2  The GCC testsuite

Diffs from 4.1-20080505 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the
LATEST-4.1 link is updated and a message is sent to the gcc list.
Please do not use a snapshot before it has been announced that way.
Reviving old port
Hi All,

BACKGROUND:
I downloaded the source code of the Xemics CoolRISC gcc port from
Raisonance's website
(ftp://www.raisonance.com/pub/CoolRISC/cool3_2.zip). I concluded that
the port is based on gcc 3.2 from looking at the NEWS file in the gcc
folder. I want to bring this gcc port up to the latest version of gcc.
The motivations for doing this are:

1- Gain experience on how to port gcc. This will be useful in the
   not-so-distant future.
2- Fix bugs in the current version of the port. For example: the
   #pragma interrupt does not work correctly, and the size of
   structures is limited to 255 (otherwise the generated code is
   broken).
3- Try to improve the generated code density.

CURRENT STATUS:
1- I downloaded the latest version of gcc. Then I took the
   target-specific files from the CoolRISC port (c816.md, c816.c and
   c816.h) and added them to the latest gcc code under the
   gcc/config/c816 folder.
2- Fixed multiple poisoned #defines in the c816.h file.
3- Changed references to gen_rtx (RTXCODE, ...) to gen_rtx_RTXCODE (...)
   in the c816.c file.
4- Currently, I am able to compile the updated files. Furthermore, I
   used cc1 to do test compiles and the output asm code looks
   promising.

NEXT STEPS:
1- Modify the code to make use of newer gcc techniques: i.e., make use
   of define_constraint instead of the old #define method.
2- Better describe the cpu using the md file's rtl templates (I want to
   move away from the current approach of relying heavily on c816.c to
   do the rtl->asm transformations).

QUESTIONS:
1- The c816 port relies heavily on c816.c instead of better describing
   the cpu in the c816.md file. For instance, looking at the
   define_insn movhi:

;; movhi
(define_insn "movhi"
  [(set (match_operand:HI 0 "general_operand" "=g")
        (match_operand:HI 1 "general_operand" "g"))]
  ""
  "*
{
  return output_move(insn, operands);
}")

I noticed that predicates and constraints are essentially disabled, and
the output_move function in c816.c does all the work.
Also, since the c816 can only move bytes, it looks to me like this
definition is incorrect, since it is basically lying to gcc and making
it believe that there is a two-byte move. Is a define_split (or a
define_expand) more appropriate in this situation?

2- I am trying to sort out define_expand from define_split. It looks
like they can be used to perform similar tasks. By looking at other
ports I came to the following conclusions (I gathered this from reading
multiple posts, please correct me if these statements are not true):

2.1 define_split should be used to break an operation into simpler
    operations *of a different mode*. That is, to describe a movhi in
    terms of two movqi I should use a define_split instead of a
    define_expand.
2.2 define_expand expects the mode of the matching rtl template and the
    output operations to be preserved. Otherwise gcc could get confused
    since the mode will be changed for no apparent reason.

Thanks for your help.

Best regards,
-Omar
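[Editor's note: a rough sketch, in the md notation used above, of how a
movhi could be split into two QImode moves with define_split. The
register_operand predicates, the reload_completed condition, and the
lowpart/highpart handling are guesses that would need adapting to the
real c816 port; this is not code from the port itself.]

```
;; Hypothetical sketch: split an HImode register-to-register move into
;; two QImode moves once registers are known (after reload).
(define_split
  [(set (match_operand:HI 0 "register_operand" "")
        (match_operand:HI 1 "register_operand" ""))]
  "reload_completed"
  [(set (match_dup 2) (match_dup 3))
   (set (match_dup 4) (match_dup 5))]
{
  operands[2] = gen_lowpart (QImode, operands[0]);
  operands[3] = gen_lowpart (QImode, operands[1]);
  operands[4] = gen_highpart (QImode, operands[0]);
  operands[5] = gen_highpart (QImode, operands[1]);
})
```

A define_expand, by contrast, runs at RTL generation time and is the
usual place to legitimize operands for a named pattern such as "movhi";
a define_split rewrites an insn the compiler has already matched.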