Extracting Function Pointer Information??
Hi.

Given a function pointer in GIMPLE IR, is there any way to find the
address/offset of the function to which it points?

e.g. I have written this code:

/** C code **/
void foo()
{
  . . . .
}

void (*fptr)(void) = foo;

int main()
{
  . . . . .
}

The GIMPLE tree node for fptr would be:

  VAR_DECL --*--> POINTER_TYPE --*--> FUNCTION_TYPE --*-->

(The star indicates dereferencing a few more fields inside a tree_node.)

In the GIMPLE code, I won't see any assignment statement for fptr = foo,
as it is a global initialization.

I was trying to trace the GIMPLE data structures for the place where the
information that fptr points to foo would be stored. Is there any way to
find it out?

- Seema Ravandale

Note: I am working on GCC-4.3.0
Re: Extracting Function Pointer Information??
Seema Ravandale wrote:
> Given a function pointer in GIMPLE IR, is there any way to find the
> address/offset of the function to which it points?
>
> e.g. I have written this code:
>
> /** C code **/
> void foo()
> {
>   . . . .
> }
>
> void (*fptr)(void) = foo;
>
> int main()
> {
>   . . . . .
> }
>
> The GIMPLE tree node for fptr would be:
>
>   VAR_DECL --*--> POINTER_TYPE --*--> FUNCTION_TYPE --*-->
>
> (The star indicates dereferencing a few more fields inside a tree_node.)
>
> In the GIMPLE code, I won't see any assignment statement for fptr = foo,
> as it is a global initialization.
>
> I was trying to trace the GIMPLE data structures for the place where the
> information that fptr points to foo would be stored.

Isn't it in the DECL_INITIAL ?

Andrew.
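[Editor's note: a non-runnable sketch of how DECL_INITIAL might be walked
against GCC-4.3-era internals. The variable `var` is assumed to be the
VAR_DECL for fptr; for a file-scope initializer like this one,
DECL_INITIAL would typically hold an ADDR_EXPR wrapping the
FUNCTION_DECL.]

```c
/* Sketch only, not a drop-in patch: `var` is assumed to be the
   VAR_DECL for fptr, obtained elsewhere in the pass.  */
tree init = DECL_INITIAL (var);
if (init && TREE_CODE (init) == ADDR_EXPR)
  {
    tree fn = TREE_OPERAND (init, 0);
    if (TREE_CODE (fn) == FUNCTION_DECL)
      /* fn is the FUNCTION_DECL for foo; dump it for inspection.  */
      debug_tree (fn);
  }
```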
define_peephole2 insn
Hello,

I have ported gcc to a 16-bit target. The problem is that gcc generates
wrong code at -O1 and above optimization for move and load/store
instructions, by using 32-bit registers with 16-bit instructions. For
example:

  move r13, r1   // move bits 0-15 to register r1
  move r13, r0   // move bits 16-31 to register r0, but this move only
                 // moves bits 0-15 to r0, which is NOT INTENDED

To solve the above issue, can I use the "define_peephole2" insn pattern?
Like:

  movd r13, (r1,r0)   // move bits 0-31 of r13 to the r1,r0 registers
                      // (note: r1 and r0 are 16-bit registers)

Please advise. Any comments or suggestions are most welcome. Thanks in
advance.

Thanks
Swami
Matrix multiplication: performance drop
Hi!

I have a simple matrix multiplication code:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

main ( int argc, char *argv[] )
{
  int i, j, k;
  clock_t t1, t2;
  double elapsed_time;
  int N = 10;
  float *a, *b, *c;

  if ( argc > 1 )
    N = atoi ( argv [ 1 ] );

  a = (float *) malloc ( N * N * sizeof ( float ) );
  b = (float *) malloc ( N * N * sizeof ( float ) );
  c = (float *) malloc ( N * N * sizeof ( float ) );

  for ( i = 0; i < N; i++ )
    for ( j = 0; j < N; j++ ) {
      a [ i * N + j ] = 0.99;
      b [ i * N + j ] = 0.33;
      c [ i * N + j ] = 0.0;
    }

  t1 = clock();
  for ( i = 0; i < N; i++ )
    for ( j = 0; j < N; j++ )
      for ( k = 0; k < N; k++ )
        c [ i * N + j ] += a [ i * N + k ] * b [ k * N + j ];
  t2 = clock();
  elapsed_time = ( (double) ( t2 - t1 ) ) / CLOCKS_PER_SEC;

  printf ( "%f\n", c [ (N - 1) * N + ( N - 1 ) ] );
  printf ( "N = %d Elapsed time = %lf\n", N, elapsed_time );
}

I compile it as

  gcc -O3 ...

and then try it for different N (Intel Xeon, 3.00 GHz, 8 GB memory).
Here are the results:

  N      Time (secs.)
  ----   ------------
   500     0.25
   512     0.86
   520     0.29
  1000     4.46
  1024    12.5
  1100     6.48
  1500    20.42
  1536    30.43
  1600    21.04
  2000    46.75
  2048   446.61  ( !!! )
  2100    59.80

So for N a multiple of 512 there is a very strong drop in performance.
The question is: why, and how to avoid it?

In fact, the given effect is present on all platforms (Intel Xeon,
Pentium, AMD Athlon, IBM PowerPC) and for gcc 4.1.2 and 4.3.0.
Moreover, the effect is also present with the Intel icc compiler, but
only up to the -O2 option; at -O3 there is good, smooth performance.
Turning on -ftree-vectorize does nothing:

$ gcc -O3 -ftree-vectorize -ftree-vectorizer-verbose=5 -o mm_float2 mm_float2.c
mm_float2.c:25: note: not vectorized: number of iterations cannot be computed.
mm_float2.c:35: note: not vectorized: number of iterations cannot be computed.
mm_float2.c:35: note: vectorized 0 loops in function.

Please, help.

Yury
Re: define_peephole2 insn
> To solve the above issue, can I use the "define_peephole2" insn pattern?

No. At most you could abuse it to hide the issue some of the time.

You probably have one or more of your target macros / hooks wrong,
e.g. HARD_REGNO_NREGS.

> Any comments or suggestions most welcome.

Read and understand all the documentation on porting gcc in the doc
directory. Or work with someone who has.
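[Editor's note: for context, on a port with 16-bit hard registers a
SImode value must be reported as occupying two of them. A typical shape
for the macro is sketched below; the exact definition is an assumption
and depends entirely on the port.]

```c
/* In the target's .h file: the number of consecutive hard registers
   needed to hold a value of MODE starting at REGNO.  With 2-byte
   (16-bit) registers and UNITS_PER_WORD == 2, a 4-byte SImode value
   yields 2, telling the compiler a 32-bit move needs a register pair.  */
#define HARD_REGNO_NREGS(REGNO, MODE) \
  ((GET_MODE_SIZE (MODE) + UNITS_PER_WORD - 1) / UNITS_PER_WORD)
```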
Re: Matrix multiplication: performance drop
On Tue, Mar 03, 2009 at 03:44:35PM +0300, Yury Serdyuk wrote:
> So for N a multiple of 512 there is a very strong drop in performance.
> The question is: why, and how to avoid it?
>
> In fact, the given effect is present on all platforms (Intel Xeon,
> Pentium, AMD Athlon, IBM PowerPC) and for gcc 4.1.2 and 4.3.0.
> Moreover, the effect is also present with the Intel icc compiler, but
> only up to the -O2 option; at -O3 there is good, smooth performance.
> Turning on -ftree-vectorize does nothing:

Basically at higher N's you are thrashing the cache. Computers tend to
have prefetching for sequential access, but accessing one matrix on a
column basis does not fit into that prefetching. Given caches are a
fixed size, sooner or later the whole matrix will not fit in the cache.
If you have to go out to main memory, it can cause the processor to take
a long time as it waits for the memory to be fetched. It may be that the
Intel compiler has better support for handling matrix multiply.

The usual way is to recode your multiply so that it is more cache
friendly. This is an active research topic, so Google or another search
engine is your friend. For instance, this was one of the first links I
found when looking for 'matrix multiply cache':
http://www.cs.umd.edu/class/fall2001/cmsc411/proj01/cache/index.html

--
Michael Meissner, IBM
4 Technology Place Drive, MS 2203A, Westford, MA, 01886, USA
meiss...@linux.vnet.ibm.com
Re: Matrix multiplication: performance drop
GCC usage questions should go to the gcc-help mailing list. Questions
about computer architecture should go to another forum, like comp.arch,
but you should check first whether they explain caches, DRAM pages, and
memory hierarchies in general in an FAQ. If you had a constructive
proposal for how to improve loop optimizations, that would be on-topic
for this list.
Re: query automaton
Alex Turjan wrote:
>> Not really. There is no requirement for "the units part of the
>> alternatives of a reservation must belong to the same automaton".
>> Querying should also work in this case because function
>> cpu_unit_reservation_p checks all automata for a unit reservation.
>
> Indeed it checks all automata, but I'm afraid that according to my
> pipeline description this check is not enough to guarantee a correct
> scheduling decision. E.g., suppose the following insn reservation (*):
>
>   (define_reservation "move"
>     "(unit1_aut1, unit1_aut2) | (unit2_aut1, unit2_aut2)")
>
> where unitN_autM refers to unit N from automaton M. In this case there
> are 2 automata. Now suppose a scheduling state S made of the individual
> states of the two automata, S = <S_aut1, S_aut2>. According to what I
> see happening in insn-automata.c (and target.dfa), from S_aut1 there is
> a transition for unit1_aut1 and from S_aut2 there is a transition for
> unit2_aut2. It seems that the automata do not communicate with each
> other. As a consequence, a scheduling decision which results in the
> resource reservation (unit1_aut1, unit2_aut2) would not be rejected,
> while it should.
>
> In my opinion, the current implementation sees the reservation defined
> in (*) as equivalent to the following one:
>
>   (define_reservation "move"
>     "(unit1_aut1 | unit2_aut1), (unit1_aut2 | unit2_aut2)")
>
> which does not seem true to me. Is there a way for the automata to
> communicate, so that the alternative (unit1_aut1, unit2_aut2) would be
> rejected?

For the last two days, I've been working on this issue. I found that you
are right: genautomata permits generating incorrect automata, although I
did not find that this is the case for any major current description
which I checked. The issue is complicated. Genautomata already has a
check for correct automata generation in
check_regexp_units_distribution, but that is not enough. I am working on
a formulation of general rules for correct automata generation and an
implementation of their check.
I think it will take a week or two to do this and to check how it works
for all the current automata descriptions.

Alex, thanks for pointing out this issue to me.
Why are these two functions compiled differently?
Hello,
I came across the following example and its .final_cleanup files. To me,
both functions should produce the same code, but tst1 actually requires
two extra sign-extend instructions compared with tst2. Is this a C
semantics thing, or does GCC mis-compile (over-conservatively) in the
first case?

Cheers,
Bingfeng Mei
Broadcom UK

#define A 255

int tst1(short a, short b){
  if(a > (b - A))
    return 0;
  else
    return 1;
}

int tst2(short a, short b){
  short c = b - A;
  if(a > c)
    return 0;
  else
    return 1;
}

.final_cleanup:

;; Function tst1 (tst1)

tst1 (short int a, short int b)
{
  :
  return (int) b + -254 > (int) a;
}

;; Function tst2 (tst2)

tst2 (short int a, short int b)
{
  :
  return (short int) ((short unsigned int) b + 65281) >= a;
}
Re: Why are these two functions compiled differently?
On Tue, Mar 3, 2009 at 4:06 PM, Bingfeng Mei wrote:
> Hello,
> I came across the following example and its .final_cleanup files. To
> me, both functions should produce the same code, but tst1 actually
> requires two extra sign-extend instructions compared with tst2. Is
> this a C semantics thing, or does GCC mis-compile
> (over-conservatively) in the first case?

Both transformations are already done by the front end (or fold);
likely shorten_compare is guilty for tst1 and fold_unary for tst2
(which folds (short)((int)b - (int)A)).

Richard.

> Cheers,
> Bingfeng Mei
> Broadcom UK
>
> #define A 255
>
> int tst1(short a, short b){
>   if(a > (b - A))
>     return 0;
>   else
>     return 1;
> }
>
> int tst2(short a, short b){
>   short c = b - A;
>   if(a > c)
>     return 0;
>   else
>     return 1;
> }
>
> .final_cleanup
> ;; Function tst1 (tst1)
>
> tst1 (short int a, short int b)
> {
>   :
>   return (int) b + -254 > (int) a;
> }
>
> ;; Function tst2 (tst2)
>
> tst2 (short int a, short int b)
> {
>   :
>   return (short int) ((short unsigned int) b + 65281) >= a;
> }
RE: Why are these two functions compiled differently?
Should I file a bug report? If it is not a C semantics thing, GCC
certainly produces unnecessarily big code.

        .file   "tst.c"
        .text
        .p2align 4,,15
.globl tst1
        .type   tst1, @function
tst1:
.LFB0:
        .cfi_startproc
        movswl  %si, %esi
        movswl  %di, %edi
        xorl    %eax, %eax
        subl    $254, %esi
        cmpl    %edi, %esi
        setg    %al
        ret
        .cfi_endproc
.LFE0:
        .size   tst1, .-tst1
        .p2align 4,,15
.globl tst2
        .type   tst2, @function
tst2:
.LFB1:
        .cfi_startproc
        subw    $255, %si
        xorl    %eax, %eax
        cmpw    %di, %si
        setge   %al
        ret
        .cfi_endproc
.LFE1:
        .size   tst2, .-tst2
        .ident  "GCC: (GNU) 4.4.0 20090218 (experimental) [trunk revision 143368]"
        .section        .note.GNU-stack,"",@progbits

> -----Original Message-----
> From: Richard Guenther [mailto:richard.guent...@gmail.com]
> Sent: 03 March 2009 15:16
> To: Bingfeng Mei
> Cc: gcc@gcc.gnu.org; John Redford
> Subject: Re: Why are these two functions compiled differently?
>
> On Tue, Mar 3, 2009 at 4:06 PM, Bingfeng Mei wrote:
> > Hello,
> > I came across the following example and its .final_cleanup files. To
> > me, both functions should produce the same code, but tst1 actually
> > requires two extra sign-extend instructions compared with tst2. Is
> > this a C semantics thing, or does GCC mis-compile
> > (over-conservatively) in the first case?
>
> Both transformations are already done by the front end (or fold);
> likely shorten_compare is guilty for tst1 and fold_unary for tst2
> (which folds (short)((int)b - (int)A)).
>
> Richard.
> > Cheers,
> > Bingfeng Mei
> > Broadcom UK
> >
> > #define A 255
> >
> > int tst1(short a, short b){
> >   if(a > (b - A))
> >     return 0;
> >   else
> >     return 1;
> > }
> >
> > int tst2(short a, short b){
> >   short c = b - A;
> >   if(a > c)
> >     return 0;
> >   else
> >     return 1;
> > }
> >
> > .final_cleanup
> > ;; Function tst1 (tst1)
> >
> > tst1 (short int a, short int b)
> > {
> >   :
> >   return (int) b + -254 > (int) a;
> > }
> >
> > ;; Function tst2 (tst2)
> >
> > tst2 (short int a, short int b)
> > {
> >   :
> >   return (short int) ((short unsigned int) b + 65281) >= a;
> > }
Re: Why are these two functions compiled differently?
"Bingfeng Mei" writes:
> #define A 255
>
> int tst1(short a, short b){
>   if(a > (b - A))
>     return 0;
>   else
>     return 1;
> }
>
> int tst2(short a, short b){
>   short c = b - A;
>   if(a > c)
>     return 0;
>   else
>     return 1;
> }

These computations are different. Assume short is 16 bits and int is 32
bits. Consider the case of b == 0x8000.

The tst1 function will sign extend that to 0xffff8000, subtract 255,
sign extend a to int, and compare the sign extended a with 0xffff7f01.

The tst2 function will compute 0x8000 - 255 as a 16 bit value, getting
0x7f01. It will sign extend that to 0x00007f01, and compare the sign
extended a with 0x00007f01.

It may seem that there is an undefined signed overflow when using this
value, but there actually isn't. All computations in C are done in type
int, so the computations have no overflow. The truncation from int to
short is not undefined; it is implementation-defined.

Ian
Re: Constant folding and Constant propagation
Adam Nemet writes:
> I am actually looking at something similar for PR33699 for MIPS. My
> plan is to experiment extending cse.c with putting "anchor" constants
> to the available expressions along with the original constant and then
> querying those later for constant expressions.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00161.html.

Adam
Re: load large immediate
2009/3/2 daniel tian:
> 2009/2/27 daniel tian:
>> 2009/2/27 Dave Korn:
>>> daniel tian wrote:
>>>> That seems to be solving an address mode problem. My problem is
>>>> that while loading a large immediate or SYMBOL_REF, the destination
>>>> is a specific general register (register 0: R0). So I don't know
>>>> how to make the define_expand "movsi" pattern generate the
>>>> destination register in R0.
>>>
>>> Well, the RTL that you emit in your define_expand has to match an
>>> insn pattern in the end, so you could make an insn for it that uses
>>> a predicate and matching constraint to enforce only accepting r0.
>>> If you use a predicate that only accepts r0 you'll get better
>>> codegen than if you use a predicate that accepts general regs and
>>> use an r0-only constraint to instruct reload to place the operand
>>> in r0.
>>
>> Well, I have already done this. There is an insn pattern whose
>> predicate limits the operand to R0. But if the define_expand "movsi"
>> does not put the register in R0, the compiler crashes because of the
>> unrecognized RTL (load big immediate or symbol), like the one below:
>>
>> (define_insn "load_imm_big"
>>   [(set (match_operand:SI 0 "zero_register_operand" "=r")
>>         (match_operand:SI 1 "rice_imm32_operand" "i"))
>>    (clobber (reg:SI 0))]
>>   "TARGET_RICE"
>> {
>>   return rice_output_move (operands, SImode);
>> })
>>
>> PS: rice_output_move is the function that outputs the assembly code.
>>
>> Thanks.
>
> Hello, Dave, Rennecke:
> I defined the PREFERRED_RELOAD_CLASS macro, but cc1 doesn't go
> through it:
>
> #define PREFERRED_RELOAD_CLASS(X, CLASS) \
>   rice_preferred_reload_class(X, CLASS)
>
> And rice_preferred_reload_class(X, CLASS) is:
>
> enum reg_class rice_preferred_reload_class (rtx x, enum reg_class class)
> {
>   printf("Come to rice_preferred_reload_class! \n");
>
>   if((GET_CODE(x) == SYMBOL_REF) ||
>      (GET_CODE(x) == LABEL_REF) ||
>      (GET_CODE(x) == CONST) ||
>      ((GET_CODE(x) == CONST_INT) && (!Bit10_constant_operand(x, GET_MODE(x)))))
>   {
>     return R0_REG;
>   }
>   return class;
> }
>
> I run the cc1 file with the command "./cc1 test_reload.c -Os". But
> gcc never goes through this function. Did I miss something?
> Thank you very much.

Hello,
I resolved the problem. The key is the macro LEGITIMIZE_RELOAD_ADDRESS,
and GO_IF_LEGITIMATE_ADDRESS, and PREFERRED_RELOAD_CLASS, and the
predicate "zero_register_operand". Here is what I learnt. If I did
something wrong, let me know. :)

LEGITIMIZE_RELOAD_ADDRESS: in this macro, I should push an invalid rtx,
like a large immediate, SYMBOL_REF, or LABEL_REF, into the function
push_reload, which will call the macro PREFERRED_RELOAD_CLASS.

GO_IF_LEGITIMATE_ADDRESS: this macro has two modes, strict and
non-strict; the former is used during the reload process, the latter
before reload. I made a mistake in this macro: I defined the two modes
in exactly the same way (I defined large immediates, SYMBOL_REF,
LABEL_REF, and CONST as valid in both modes), so it never called the
macro LEGITIMIZE_RELOAD_ADDRESS. The function find_reloads_address in
reload.c calls GO_IF_LEGITIMATE_ADDRESS first; if the rtx is a valid
address, it will never get to the macro LEGITIMIZE_RELOAD_ADDRESS.

PREFERRED_RELOAD_CLASS: this macro is called from the function
push_reload, which was mentioned above.

As for the predicate "zero_register_operand": I had defined it to accept
only an rtx whose register number is 0, so cc1 would always abort with
"unrecognizable insn", because pseudo registers are allocated before
reload. I revised the predicate so that the register number can be 0 or
a pseudo. Now it is OK, but it still has some redundant code. I will
keep going.

You guys are so kind. Thank you very much. Thank you for your help
again!

Best Regards!
Daniel.Tian
Re: define_peephole2 insn
>> To solve the above issue, can I use the "define_peephole2" insn pattern?
>
> No. At most you could abuse it to hide the issue some of the time.
> You probably have one or more of your target macros / hooks wrong,
> e.g. HARD_REGNO_NREGS.

Thank you very much for your reply. In my case, code generation is
correct for all test cases without any optimization option. Wrong code
is generated (as mentioned in the previous mail) only in rare cases,
with optimization + PIC options (i.e. -O1 -fPIC) enabled.

Thanks
Swami
ANNOUNCEMENT: Generic Data Flow Analyzer for GCC
Announcement: gdfa - Generic data flow analyzer for GCC.

Developed by: GCC Resource Center, IIT Bombay.

The patch and the documentation can be found at the link below:
http://www.cse.iitb.ac.in/grc/gdfa.html

Ms. Seema S. Ravandale
Project Engg, GCC Resource Center
Department of Computer Science & Engg.
IIT Bombay, Powai, Mumbai 400 076, India.
email - se...@cse.iitb.ac.in