Trampoline implementation for MIPS
Hi, I'm having some trouble understanding the assembly code generated for trampolines by the MIPS back end. Following is the code for which I'm trying to understand the generated trampoline code:

int foo(int (*f)()){
	(*f)();
}
main(){
	int i;
	int g(){printf("hello,%d",i);}
	foo(g);
}

The parts of the generated assembly code which are confusing are:

$LTRAMP0:
	.word	0x03e00821	# move $1,$31
	.word	0x04110001	# bgezal $0,.+8
	.word	0x		# nop
	.word	0x8fe30014	# lw $3,20($31)
	.word	0x8fe20018	# lw $2,24($31)
	.word	0x0060c821	# move $25,$3 (abicalls)
	.word	0x0068		# jr $3
	.word	0x0020f821	# move $31,$1
	.word	0x		#
	.word	0x		#
	.globl	_flush_cache
main:
	- - - - - - - - - - - -
	addiu	$sp,$sp,-80
	sw	$31,72($sp)
	sw	$fp,68($sp)
	sw	$16,64($sp)
	move	$fp,$sp
	addiu	$2,$fp,20
	addiu	$16,$2,4
	lui	$2,%hi($LTRAMP0)
	move	$3,$16
	addiu	$2,$2,%lo($LTRAMP0)
	li	$5,40		# 0x28
	move	$4,$2
	jal	memcpy
	nop
	lui	$2,%hi(g.1238)
	addiu	$2,$2,%lo(g.1238)
	sw	$2,32($16)
	move	$3,$16
	li	$4,40		# 0x28
	li	$5,3		# 0x3
	jal	_flush_cache
	nop
	addiu	$2,$fp,20
	addiu	$2,$2,4
	move	$3,$2
	jal	foo
	nop

A bit of explanation of the above code would be helpful.
RE : Re: [RFC] Program Bounds Checking
--- Robert Dewar <[EMAIL PROTECTED]> wrote:
> Tzi-cker Chiueh wrote:
> > We have considered the bound instruction in the CASH project. But
> > we found that bound instruction is slower than the six normal
> > instructions it is meant to replace for range checking. For example, the
> > bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles
> > while the 6 equivalent instructions require 6-7 clock cycles. We have not
> > tested it on newer processors, though.
>
> Might still be appropriate to use it in -Os mode I would think ...

The important thing here is probably the jump prediction subsystem of some processors - i.e. some memory bits read/written at each conditional jump (for instance addressed by the 4 low-order address bits of the jcc instruction for a 16-bit jump prediction system, encoding the last result at this address). Very-long-pipeline processors like ia32 are very sensitive to the rate of failed predictions. I am only talking about ia32 here, because ia64 does not have "bound" and it has predict-taken/predict-not-taken encoded in the jcc (by using a segment prefix on the jcc assembly instruction).

I assume that the ia32 "bound" instruction is never predicted "taken" and does not modify the prediction subsystem. If you add a few tens of thousands of "bound" instructions to a program that has an 80% jump prediction success rate, you slow it down by 7-8 cycles times a few tens of thousands (a vague estimate assuming each assembly instruction has the same probability of being executed). If instead you add three times as many "test/jcc" instructions, you seriously reduce the 80% jump prediction success rate of your initial program because of the added noise. Basically, that means a real measurement of the "test/jcc" versus "bound" timing influence on a real application would be worth the time spent.

By the way, does anyone know how jump prediction behaves on the "jecxz" and "loop" assembly instructions - i.e. is the prediction based on the current content of the %ecx register?

Etienne.
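For reference, here is a minimal sketch in plain C of the range check that the x86 "bound" instruction performs: it compares a value against a lower and an upper bound kept in memory and raises a #BR trap (INT 5) when the value lies outside [lower, upper]. The explicit compare-and-branch form below is roughly what the "six normal instructions" mentioned above expand to; the struct and function names are illustrative only.

#include <stdlib.h>

struct bounds { long lower; long upper; };   /* laid out like bound's memory operand */

static inline void
check_bounds (long index, const struct bounds *b)
{
  /* Two compares and two conditional branches; "bound" folds this into a
     single instruction that traps instead of branching.  */
  if (index < b->lower || index > b->upper)
    abort ();                                /* stand-in for the #BR trap */
}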
Abt Writing a GCC front end
Hello all,

I am involved with a GCC port where I have to add fixed-point support to C based on the fixed-point extension of the DSP-C specification. Initially I thought I would typedef long or short to support the new data type, but on my hardware the fixed-point registers are 48 bits long, so the typedef idea is not good. I then thought I would use a structure or an array for this purpose, but that again has problems, for instance supporting the signed, unsigned, long and short variants of the new data type _fixed; this is not possible if I use structures. Can you suggest a solution for this? The format for my hardware is s7.40. Should I go for adding a new data type in the front end?

I am new to GCC internals, so any help that you can offer is of great value to me.

Regards,
Shafi.
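For readers unfamiliar with the s7.40 format mentioned above (1 sign bit, 7 integer bits, 40 fraction bits, 48 bits total), here is a minimal sketch of emulating such values in plain C with the 48-bit quantity kept sign-extended in an int64_t. It only illustrates the arithmetic of the format and is not a suggestion for how the port should represent the type; the use of __int128 for the widened multiply is a GCC extension on 64-bit hosts, and saturation and rounding are omitted.

#include <stdint.h>
#include <stdio.h>

typedef int64_t fx_s7_40;                 /* s7.40 value, sign-extended into 64 bits */
#define FX_FRAC_BITS 40

static fx_s7_40 fx_from_double (double d) { return (fx_s7_40) (d * (1LL << FX_FRAC_BITS)); }
static double fx_to_double (fx_s7_40 x)   { return (double) x / (1LL << FX_FRAC_BITS); }
static fx_s7_40 fx_add (fx_s7_40 a, fx_s7_40 b) { return a + b; }   /* no saturation */

static fx_s7_40
fx_mul (fx_s7_40 a, fx_s7_40 b)
{
  /* The full 48x48-bit product needs up to 96 bits, hence the __int128.  */
  return (fx_s7_40) (((__int128) a * b) >> FX_FRAC_BITS);
}

int
main (void)
{
  fx_s7_40 x = fx_from_double (1.5), y = fx_from_double (2.25);
  printf ("%f\n", fx_to_double (fx_mul (fx_add (x, x), y)));   /* prints 6.750000 */
  return 0;
}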
Re: Abt Writing a GCC front end
On Fri, 2006-09-29 at 02:58 -0700, Mohamed Shafi wrote:
> Hello all,
>
> I am involved with a GCC port where I have to add fixed-point support
> to C based on the fixed-point extension of the DSP-C specification.

I think you are in luck as there is a project to add that to 4.3:
http://gcc.gnu.org/wiki/FixedPointArithmetic

-- Pinski
Re: frame unwind issue with discontiguous code
On Thu, Sep 28, 2006 at 01:26:00PM +0200, Jan Beulich wrote:
> While I'm not certain whether gcc is able to split one function's code
> between different sections (if for nothing else, this might help reduce
> TLB pressure by moving code unlikely to be executed not just out of

Yes, and that caused major grief recently and is still unfixed, see
http://gcc.gnu.org/PR22313
http://gcc.gnu.org/PR29132
where we essentially traded off an assembly-time failure for generating invalid unwind info.

> the main function body), by way of inline assembly the Linux kernel
> certainly does in many places. Obviously, pure assembly makes use
> of such even more heavily.
>
> However, when frame unwind information is generated, one quickly
> becomes aware of a problem with this - the unwind information at a
> continuation point in other than the base section would need to
> replicate all unwind directives (note that DW_CFA_remember_state
> and DW_CFA_restore_state are not suitable here, as there need
> to be separate FDEs attached to the secondary code fragments).
> While this is generally possible (albeit tedious) in pure assembly code,
> doing so in inline assembly doesn't seem to be possible in any way
> (the compiler may not even use .cfi_* directives to emit frame
> unwind info).
>
> To cover all cases, it would basically appear to be necessary to
> add a referral op to the set of DW_CFA_* ops, which would
> indicate that the frame state at the given point is to be derived
> by assuming the location counter would in fact be at the origin
> of the control transfer.
>
> As I don't know how to approach requesting an addition like this
> to the Dwarf standard, I'm trying my luck here.
>
> Any pointers or suggestions are greatly appreciated.

I believe the way to go forward is for GCC to start using .cfi_* directives rather than emitting the .eh_frame section on its own; that's the only way inline assembly can add its stuff to what GCC generates for the C code. But first we need to extend gas .cfi_* support so that everything GCC now generates can be expressed with .cfi_* directives, and we also need to fix the .cfi_* directives so that they are local to the current section/subsection (if you use a .section or .subsection directive in assembly, you change the CFI op sequence, either to a newly created one, or to the one that has been populated previously for that section/subsection). One thing that can't be expressed currently is e.g. the LSDA pointer; all .cfi_*-created CIEs don't have 'P' at all. So we need some way to tell GAS that this .cfi_start wants an LSDA, what encoding it uses, and a block of data that will be stuck into the FDE.

Then I think it would be desirable to have a new .cfi to remember and restore state with some ID, a mailbox in which the CFA sequence (except intervening loc adjustments, but e.g. including the LSDA) would be stored. So, in inline assembly you could do something like:

asm (".cfi_remember_state .LRS%=; .subsection 25; .cfi_start; .cfi_restore_state .LRS%=; subq $128, %rsp; .cfi_...; ; .cfi_end; .previous");

and the .cfi_restore_state .LRS25301 would copy there whatever non-DW_CFA_adjust_loc* ops there are in the current section/subsection from its .cfi_start until now.

	Jakub
Re: paired register loads and stores
> Erich Plondke writes:

Erich> rs6000 and Sparc ports seem to use a peephole2 to get the ldd or lfq
Erich> instructions (respectively), but it looks like there's no reason for
Erich> the register allocator to allocate registers together. The peephole2
Erich> just picks up loads to adjacent memory locations if the allocator
Erich> happens to choose adjacent registers (is that correct?) or the
Erich> variables are specified as living in hard registers with the help
Erich> of an asm.

On both rs6000 and sparc, the code includes a test that the registers are adjacent. See registers_ok_for_quad_peep() in rs6000.md and rs6000.c, and the equivalent in sparc.

David
Re: paired register loads and stores
On 9/29/06, David Edelsohn <[EMAIL PROTECTED]> wrote:

> Erich Plondke writes:
Erich> rs6000 and Sparc ports seem to use a peephole2 to get the ldd or lfq
Erich> instructions (respectively), but it looks like there's no reason for
Erich> the register allocator to allocate registers together. The peephole2
Erich> just picks up loads to adjacent memory locations if the allocator
Erich> happens to choose adjacent registers (is that correct?) or the
Erich> variables are specified as living in hard registers with the help
Erich> of an asm.

On both rs6000 and sparc, the code includes a test that the registers are adjacent. See registers_ok_for_quad_peep() in rs6000.md and rs6000.c, and the equivalent in sparc.

Yes, but peephole2 runs after register allocation, so how does the target tell the register allocator that adjacent values loaded from memory should also be placed adjacent in the register file? It looks like if they /happen/ to be allocated together in the register file, the peephole pattern will catch them, but I don't see anything to help them be allocated in such a way.

--
Why are ``tolerant'' people so intolerant of intolerant people?
Re: paired register loads and stores
> Erich Plondke writes:

Erich> Yes, but peephole2 runs after register allocation, so how does the target
Erich> tell the register allocator that adjacent values loaded from memory should
Erich> also be placed adjacent in the register file?
Erich> It looks like if they /happen/ to be allocated together in the register
Erich> file, the peephole pattern will catch them, but I don't see anything to
Erich> help them be allocated in such a way.

The GCC register allocator allocates objects that span multiple registers in adjacent registers. For instance, a 64-bit doubleword integer (long long int) will be allocated in two adjacent hardware registers when GCC is targeted at a processor with 32-bit registers.

David
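To connect this point back to Erich's question: one way to hand the register allocator a multi-register object is to load the two adjacent words through a single 64-bit value, so the allocator treats the pair as one object in adjacent hard registers, which an ldd/lfq peephole could then match. A minimal sketch, assuming a 32-bit target and 8-byte-aligned data; the function name is illustrative only and this is not a claim about what any particular port currently generates.

#include <stdint.h>
#include <string.h>

/* Sum two adjacent 32-bit ints loaded via one 64-bit object.  */
int
sum_pair (const int32_t *p)
{
  uint64_t both;
  memcpy (&both, p, sizeof both);     /* intended as a single DImode load into a register pair */
  return (int32_t) both + (int32_t) (both >> 32);
}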
Re: representation of struct field offsets
On 9/28/06, Richard Kenner <[EMAIL PROTECTED]> wrote:
> The only trouble you'll probably run into is with fields whose offset
> from the start of a structure is variable.

> Exactly. That's the reason it's defined the way it is. There is no way to
> synthesize that field from any other in the FIELD_DECL in the most general
> case: it is unique information.

Unique, but uncommon. Thus, it would make sense to make it a union with the other information, with a discriminator. As a plus, you'd be able to tell variable-offset fields by checking a single bit instead of a load (TREE_CODE). I.e.:

  unsigned int offset_is_variable : 1;
  union {
    tree offset_when_variable;
    unsigned HOST_WIDE_INT offset_when_constant;
  };

Most of our optimizers just want to know "is this variable size/variable offset", not "what is the variable offset".
inappropriate 'comparison is always false due to limited range of data type' warning
Hello,

[EMAIL PROTECTED] ~ $ gcc -O0 test.c -o test
test.c: In function 'main':
test.c:5: warning: comparison is always false due to limited range of data type

This does not make sense to me in the following code:

#include <stdio.h>

int main (char *argv[], int argc) {
	unsigned short int number;

	for (number=0; number < ~(number&(~number)) ; number++) {
		printf("hello");
	}

	return 0;
}

I believe gcc is misinterpreting ~(number&(~number)).

Any comments or help welcome.
Re: paired register loads and stores
On 9/29/06, David Edelsohn <[EMAIL PROTECTED]> wrote:

The GCC register allocator allocates objects that span multiple registers in adjacent registers. For instance, a 64-bit doubleword integer (long long int) will be allocated in two adjacent hardware registers when GCC is targeted at a processor with 32-bit registers.

I guess I'm still not being clear. Let's say I have a function like:

typedef int aligned_int __attribute__((__aligned__(8)));

int foo(aligned_int *a, aligned_int *b, int count)
{
	int i;
	int sum = 0;
	for (i = 0; i < count; i+=2) {
		sum += a[0] + b[0];
		sum += a[1] + b[1];
		a += 2;
		b += 2;
	}
	return sum;
}

The loads of a[i] and a[i+1] could be done together, if the register allocator places them next to each other. Instead I get:

.LL5:
	ld	[%o0], %g3
	add	%g1, %g3, %g3
	ld	[%o1], %g1
	add	%g3, %g1, %g3
	ld	[%o1+4], %g1
	ld	[%o0+4], %g2
	add	%g4, 2, %g4
	add	%g2, %g1, %g2
	cmp	%o2, %g4
	add	%o0, 8, %o0
	add	%g3, %g2, %g1
	bg,pt	%icc, .LL5
	 add	%o1, 8, %o1

Now the peephole2 never really has a chance, because register allocation has already assigned registers that aren't paired. But if we could tell an earlier pass that two values next to each other in memory can be loaded together, we could have fused the loads, and the register allocator would probably have been just fine. And this can make a difference for microarchitectures that are limited by bandwidth in and out of the cache, which is not uncommon.

I guess in a way this is "autovectorization of random code snippets", so maybe it's too complex, but it seems within the realm of what combine could do...

--
Why are ``tolerant'' people so intolerant of intolerant people?
Re: inappropriate 'comparison is always false due to limited range of data type' warning
On Fri, Sep 29, 2006 at 10:55:27AM -0400, Rodolfo Hansen wrote:
> [EMAIL PROTECTED] ~ $ gcc -O0 test.c -o test
> test.c: In function 'main':
> test.c:5: warning: comparison is always false due to limited range of data type
>
> does not make sense to me in the following code:
>
> #include <stdio.h>
> int main (char *argv[], int argc) {
>   unsigned short int number;
>
>   for (number=0; number < ~(number&(~number)) ; number++) {
>     printf("hello");
>   }
>
>   return 0;
> }

This question is appropriate for the gcc-help mailing list, not here.

> I believe gcc is misinterpreting ~(number&(~number))
>
> Any comments or help welcome.

No, please read the C promotion rules. number&(~number) is always 0, and ~0 on two's complement arches is -1 (signed int). The number < ~(number&(~number)) comparison is done in the promoted mode - int - so it is (int) number < ~0, which is the same as (int) number < -1. As no value of unsigned short number promoted to int is negative (unless sizeof (int) == sizeof (short)), the compiler correctly warns that the condition is always false.

	Jakub
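A short illustration of the promotion described above, assuming a 32-bit int and a 16-bit unsigned short: both operands of < are promoted to int, so the right-hand side becomes -1; casting it back to unsigned short keeps the comparison in the intended 16-bit range. The program below only demonstrates the rule and is not a recommended coding style.

#include <stdio.h>

int
main (void)
{
  unsigned short number = 0;

  printf ("%d\n", ~(number & ~number));                    /* -1: computed in int  */
  printf ("%d\n", (unsigned short) ~(number & ~number));   /* 65535 after the cast */

  if (number < (unsigned short) ~(number & ~number))       /* no longer always false */
    printf ("comparison can now be true\n");
  return 0;
}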
RE: Re: Visibility=hidden for x86_64 Xen builds -- problems?
Jan,

> Xen gets compiled with -fPIC, and we recently added a global visibility
> pragma to avoid the cost of going through the GOT for all access to
> global data objects (PIC isn't really needed here, all we need is
> sufficient compiler support to get the final image located outside the
> +/-2Gb ranges, but large model support is neither there in older
> compilers nor do we really need all of it either).

Have you tried protected visibility? Internal accesses are direct, but they still use GOT slots...

HTH

___
Evandro Menezes    GNU Tools Team    512-602-9940    Advanced Micro Devices    [EMAIL PROTECTED]    Austin, TX
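For concreteness, a minimal sketch of the suggestion using GCC's visibility attribute and pragma (the symbol names here are illustrative only): protected visibility keeps the symbols exported, while references from within the module can bind locally instead of being preemptible.

/* Per-symbol form. */
__attribute__ ((visibility ("protected"))) int sample_global;

/* Region form, covering a group of declarations. */
#pragma GCC visibility push(protected)
extern int another_global;
extern void sample_helper (void);
#pragma GCC visibility pop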
S/390 as GCC 4.3 secondary platform?
Hello Mark,

sorry for tuning in so late to the GCC 4.3 primary/secondary platform discussion. As you probably expect, I'll give a vote for s390-ibm-linux-gnu and s390x-ibm-linux-gnu to be marked as GCC secondary platforms.

I think the s390 back end is in pretty good shape. The languages c, ada, c++, fortran, java, objc and obj-c++ bootstrap fine and are tested regularly. I've set up a daily GCC build posting the test results of the last 3 GCC branches every day. We have a competent maintainer - Ulrich Weigand - and 2 developers at IBM (Wolfgang Gellerich and myself) who are able to spend at least a good part of their time working on the S/390 back end, fixing bugs, developing new features and providing support for new CPUs.

In the criteria for primary platforms I've read that primary platforms have to be "popular systems". Reading this as "widely used", I think that is a requirement which mainframes are unlikely to meet in the near future, so I propose to make s390 and s390x secondary platforms for now. I think this can be important to show users that GCC works reliably on S/390 and that it can be expected to do so in the future as well.

Bye,
-Andreas-
Re: paired register loads and stores
> Erich Plondke writes:

Erich> I guess in a way this is "autovectorization of random code snippets", so maybe
Erich> it's too complex, but it seems within the realm of what combine could do...

Yes, this is more appropriately addressed by straight-line code vectorization, i.e., SLP. The peephole and the GCC register allocator are not trying to provide that functionality.

David
issue about posting testresults
Hi, All,

I tried to send my GCC testsuite results to [EMAIL PROTECTED] following the instructions on the GCC website last Sunday, but there has been no response for about 3 days. Before I sent my results, I had already sent mail to "[EMAIL PROTECTED]" and "[EMAIL PROTECTED]" as the GCC website says. I also consulted my ISP; they said they did not set up their mail system in any special way. I checked my dynamic IP on the "www.robtex.com" website and found it in some Real-time Blackhole Lists. But you know, I can do nothing about it.

Thanks in advance.

Sincerely,
Cui Weidong
[EMAIL PROTECTED]
Re: issue about posting testresults
On Fri, Sep 29, 2006 at 05:03:49PM +0800, Weidong Cui wrote:
> I checked my dynamic IP on the "www.robtex.com" website and found it in
> some Real-time Blackhole Lists. But you know, I can do nothing about
> it.

If you have a dynamic IP, you should not configure your system to make a direct SMTP connection to the destination site. Most people don't allow that any more, because this is what spammers do (take over thousands of machines with viruses, make them into "bots" and have them spam like crazy; since there isn't a fixed IP address it's hard to trace the culprit). You need to use your ISP's designated SMTP machine as a relay.
intermittent failures on Darwin using java.lang.Process.waitFor()
The intermittent failures on Darwin are due to a kernel bug tripped by java.lang.Process.waitFor(). The bug appears to be that if:

- the program is multithreaded
- it is blocking SIGCHLD
- it receives a SIGCHLD due to a process terminating
- later it calls sigsuspend (but not sigwait)

then the SIGCHLD may never be delivered, and so the process will wait for one forever. It's intermittent because it works fine if the sigsuspend starts before the SIGCHLD is sent. This also explains why it happens more often with gij. I've filed this as .

We could work around it by using a timeout of some kind; for example, creating a new thread which sends a SIGCHLD manually after some period of time. (Obviously, only on Darwin, and maybe only on versions with the bug.) Do we think this is a good idea?
Re: GCC 4.3 project to merge representation changes
> "Mark" == Mark Mitchell <[EMAIL PROTECTED]> writes: >> Kazu Hirata wrote: >> The Java frontend uses a >> flag within the TREE_LIST object that makes up TYPE_ARG_TYPEs, so it >> is blocking the propsed merge. (Java maintainers are planning to >> fix this in future.) Mark> Yes, I agree that Sandra's stuff is closer. I would hope that with Mark> the ECJ conversion (planned for Stage 1), the Java issue goes away. Yes, it does. I still haven't deleted the code that uses this, but it will never be invoked. Deleting it is on my to-do list, but I'm currently not considering this as an issue that would block the merge. Let me know if you think otherwise (or you can file a PR that blocks 28067). Tom
Re: RFC: deprecated functions calling deprecated functions
Eric Christopher wrote:

So, a testcase like this:

extern void foo() __attribute__((deprecated));
extern void bar() __attribute__((deprecated));

void foo() {}
void bar() { foo(); }

Should we warn on the invocation of foo() since it's also being called from within a deprecated function? We are today, but I've gotten a request for that to not warn. This seems reasonable, for example, if you deprecate an entire API or something, but still need to compile the library.

I think we should continue to warn. I can see the arguments on both sides, but I think warning makes sense. The person compiling the library should use -Wno-deprecated, and accept that they may be calling some other deprecated function they don't intend to call.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: RFC: deprecated functions calling deprecated functions
> I think we should continue to warn. I can see the arguments on both
> sides, but I think warning makes sense. The person compiling the
> library should use -Wno-deprecated, and accept that they may be calling
> some other deprecated function they don't intend to call.

How about suppressing nested warnings only with -Wno-deprecated-nested or something like that?

--
The Country Of The Blind, by H.G. Wells
http://cronos.advenge.com/pc/Wells/p528.html
Re: representation of struct field offsets
> Unique, but uncommon.

Right.

> Thus, it would make sense to make it a union with the other
> information, with a discriminator. As a plus, you'd be able to tell
> variable-offset fields by checking a single bit instead of a load
> (TREE_CODE). I.e.:
>
>   unsigned int offset_is_variable : 1;
>   union {
>     tree offset_when_variable;
>     unsigned HOST_WIDE_INT offset_when_constant;
>   };

I don't follow. The current representation of variable-position fields uses *both* fields, a variable offset in *bytes*, plus a constant offset in *bits*. Yes, you can use a narrower integer for the bit offset in the variable case, but I don't follow how that would save anything.

> Most of our optimizers just want to know "is this variable
> size/variable offset", not "what is the variable offset".

Testing TREE_CONSTANT of the offset says that right now.
gcc-4.1-20060929 is now available
Snapshot gcc-4.1-20060929 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.1-20060929/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.1 SVN branch with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_1-branch revision 117320

You'll find:

  gcc-4.1-20060929.tar.bz2              Complete GCC (includes all of below)
  gcc-core-4.1-20060929.tar.bz2         C front end and core compiler
  gcc-ada-4.1-20060929.tar.bz2          Ada front end and runtime
  gcc-fortran-4.1-20060929.tar.bz2      Fortran front end and runtime
  gcc-g++-4.1-20060929.tar.bz2          C++ front end and runtime
  gcc-java-4.1-20060929.tar.bz2         Java front end and runtime
  gcc-objc-4.1-20060929.tar.bz2         Objective-C front end and runtime
  gcc-testsuite-4.1-20060929.tar.bz2    The GCC testsuite

Diffs from 4.1-20060922 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.1 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: intermittent failures on Darwin using java.lang.Process.waitFor()
Geoffrey Keating wrote:

The intermittent failures on Darwin are due to a kernel bug tripped by java.lang.Process.waitFor(). The bug appears to be that if:

- the program is multithreaded
- it is blocking SIGCHLD
- it receives a SIGCHLD due to a process terminating
- later it calls sigsuspend (but not sigwait)

then the SIGCHLD may never be delivered, and so the process will wait for one forever. It's intermittent because it works fine if the sigsuspend starts before the SIGCHLD is sent. This also explains why it happens more often with gij. I've filed this as . We could work around it by using a timeout of some kind; for example, creating a new thread which sends a SIGCHLD manually after some period of time. (Obviously, only on Darwin, and maybe only on versions with the bug.) Do we think this is a good idea?

The obvious solution would be to use a kernel without the bug. But since you want to work around the bug, it seems you want a solution for kernels with the bug also. Your suggestion to have a thread that periodically sends SIGCHLD seems like it should work (but would be ugly). You could have a configure test to detect configurations where the workaround was needed. Then in natPosixProcess.cc you would add code to create said thread based on the configure check.

David Daney
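A rough sketch of the workaround discussed above, assuming POSIX threads: a detached watchdog thread periodically re-sends SIGCHLD to the process so a sigsuspend() that missed the real signal eventually wakes up and re-checks for exited children. The function names and the poll interval are illustrative only, not the actual libgcj/natPosixProcess.cc code.

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static void *
sigchld_watchdog (void *arg)
{
  (void) arg;
  for (;;)
    {
      sleep (2);                     /* arbitrary poll interval */
      kill (getpid (), SIGCHLD);     /* nudge any pending sigsuspend() */
    }
  return NULL;
}

static void
start_sigchld_watchdog (void)
{
  pthread_t tid;
  if (pthread_create (&tid, NULL, sigchld_watchdog, NULL) == 0)
    pthread_detach (tid);
}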
Re: RFC: deprecated functions calling deprecated functions
On Sep 29, 2006, at 2:04 PM, Mark Mitchell wrote:

Eric Christopher wrote:

So, a testcase like this:

extern void foo() __attribute__((deprecated));
extern void bar() __attribute__((deprecated));

void foo() {}
void bar() { foo(); }

Should we warn on the invocation of foo() since it's also being called from within a deprecated function? We are today, but I've gotten a request for that to not warn. This seems reasonable, for example, if you deprecate an entire API or something, but still need to compile the library.

I think we should continue to warn. I can see the arguments on both sides, but I think warning makes sense. The person compiling the library should use -Wno-deprecated, and accept that they may be calling some other deprecated function they don't intend to call.

I'd just note that in practice headers often want to make use of deprecated things to define other deprecated things, and for those uses of deprecated things, they don't want a deprecation warning:

static inline max() __attr__((deprecated)) { ... }
inline bar() __attr__((deprecated)) { max(); }

I think in the cases that we care about around here, these happen in the system header files, and with uses inside system headers, we can avoid giving a warning for the use of deprecated things.
Re: representation of struct field offsets
Richard Kenner wrote:

I don't follow. The current representation of variable-position fields uses *both* fields, a variable offset in *bytes*, plus a constant offset in *bits*.

That doesn't explain why the bit value isn't normalized to be smaller than BITS_PER_UNIT; any whole bytes could be incorporated into the variably sized offset.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: GCC 4.3 project to merge representation changes
Tom Tromey wrote:

Yes, it does. I still haven't deleted the code that uses this, but it will never be invoked.

I think that's fine; as long as there are no actual user-visible problems with Java, that shouldn't block merging Sandra/Kazu's changes.

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713
Re: Trampoline implementation for MIPS
"kernel coder" <[EMAIL PROTECTED]> writes: > Following is the code for which i'm trying to undertsand the generated > trampoline code. > > int foo(int (*f)()){ > (*f)(); > } > main(){ > int i; > int g(){printf("hello,%d",i);} > foo(g); > } > > Parts of generated assembly code which are confusing are I think you need to trim down your question. Which parts do you not understand? > $LTRAMP0: > .word 0x03e00821 # move $1,$31 > .word 0x04110001 # bgezal $0,.+8 > .word 0x # nop > .word 0x8fe30014 # lw $3,20($31) > .word 0x8fe20018 # lw $2,24($31) > .word 0x0060c821 # move $25,$3 (abicalls) > .word 0x0068 # jr $3 > .word 0x0020f821 # move $31,$1 > .word 0x # > .word 0x # This is the trampoline. Its purpose is to load the static chain pointer into $2 and then jump to the function. In your example, the static chain pointer is, conceptually, the address of the local variable i in the stack frame of main(). The function g() needs that address so that it can access the variable. When you take the address of g(), you get a pointer to a copy of the trampoline with the values of the static chain pointer and the real function address filled in. That way a call through the function pointer will pass the correct static chain pointer to g(). This has to be done dynamically because the address of i is not a fixed number at compile time. > lui $2,%hi($LTRAMP0) > move$3,$16 > addiu $2,$2,%lo($LTRAMP0) > li $5,40 # 0x28 > move$4,$2 > jal memcpy > nop Here the compiler copies the trampoline template onto the stack (except it is passing parameters in $3/$4/$5, which is not MIPS standard, but I assume that is your doing somehow). > lui $2,%hi(g.1238) > addiu $2,$2,%lo(g.1238) > sw $2,32($16) Here the compiler fills in the function address, but it apparently fails to fill in the static chain pointer--not sure what's up with that, but I again attribute it to your non-standard compiler. > move$3,$16 > li $4,40 # 0x28 > li $5,3# 0x3 > jal _flush_cache > nop Here the compiler tells the OS that it is using self-modifying code: the writes have gone out through the data cache and may not be visible in the instruction cache, depending on the CPU architecture. Hope this helps. Ian