Re: libgcc-arch.ver details
On Thu, Mar 18, 2010 at 5:10 PM, Ian Lance Taylor wrote: > > Unlikely. The question here is whether your target uses HFmode. If > it does, you have to arrange to provide the HFmode libgcc functions. > That does not happen automatically. HFmode is a 16-bit floating point > mode; currently the only target which uses that mode is ARM. > > Ian > Thanks for the reply and the reference to ARM. I will look into how it is importing the HF functions. Cheers, -- PMatos
Re: libgcc-arch.ver details
On Thu, Mar 18, 2010 at 5:10 PM, Ian Lance Taylor wrote: > > Unlikely. The question here is whether your target uses HFmode. If > it does, you have to arrange to provide the HFmode libgcc functions. > That does not happen automatically. HFmode is a 16-bit floating point > mode; currently the only target which uses that mode is ARM. > > Ian > After looking into the ARM code I am quite confused: even though it uses HF (at least I found references to it in gcc 4.5, but not in gcc 4.3 or gcc 4.4), I can't see how it imports floatunsihf. In fact, I can't find any reference to a function called floatunsihf anywhere in GCC's source code, and the documentation doesn't contain any references to it either. Am I missing something? -- PMatos
Re: Hash Function for "switch statement"
Hi,

the discussion so far did omit one specific aspect. When comparing two implementations for a switch, you have to compare the full code. For the hash you have to include the code to calculate the hash function. This might be more code than a simple tree lookup. The example function:

>public int hash32shift(int key)
>{
>  key = ~key + (key << 15); // key = (key << 15) - key - 1;
>  key = key ^ (key >>> 12);
>  key = key + (key << 2);
>  key = key ^ (key >>> 4);
>  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>  key = key ^ (key >>> 16);
>  return key;
>}

has 12 operations. Add a table and verification, and you get to about 18. That is worse than a tree search with 9 levels. So for all switches with less than 512 elements, the hash is not faster.

Erwin
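To make the operation count above concrete, here is a sketch of the quoted Java function ported to C. It assumes 32-bit unsigned arithmetic, so Java's `>>>` becomes `>>` on a `uint32_t` and overflow wraps the same way. The second function, `hash32shift_alt` (a hypothetical name), uses the alternative forms given in the comments; the two should agree for every key.

```c
#include <stdint.h>

/* Port of the quoted Java hash32shift: uint32_t wraps like Java's int,
   and >> on an unsigned type is a logical shift like Java's >>>. */
static uint32_t hash32shift(uint32_t key)
{
  key = ~key + (key << 15);   /* 3 ops: ~, <<, + */
  key = key ^ (key >> 12);    /* 2 ops */
  key = key + (key << 2);     /* 2 ops */
  key = key ^ (key >> 4);     /* 2 ops */
  key = key * 2057u;          /* 1 op */
  key = key ^ (key >> 16);    /* 2 ops, 12 in total */
  return key;
}

/* Same hash using the commented alternatives:
   ~k == -k - 1 (mod 2^32) and 2057 == 1 + 8 + 2048. */
static uint32_t hash32shift_alt(uint32_t key)
{
  key = (key << 15) - key - 1;
  key = key ^ (key >> 12);
  key = key + (key << 2);
  key = key ^ (key >> 4);
  key = (key + (key << 3)) + (key << 11);
  key = key ^ (key >> 16);
  return key;
}
```

Note that the shift-and-add form of the multiply costs four operations instead of one, so the count depends on which form the target prefers.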
Re: Hash Function for "switch statement"
Unruh, Erwin wrote:
> Hi, the discussion so far did omit one specific aspect. When comparing two implementations for a switch, you have to compare the full code. For the hash you have to include the code to calculate the hash function. This might be more code than a simple tree lookup. The example function:
>
> public int hash32shift(int key)
> {
>   key = ~key + (key << 15); // key = (key << 15) - key - 1;
>   key = key ^ (key >>> 12);
>   key = key + (key << 2);
>   key = key ^ (key >>> 4);
>   key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>   key = key ^ (key >>> 16);
>   return key;
> }
>
> has 12 operations. Add a table and verification you get to about 18. That is worse than a tree search with 9 levels. So for all switches with less than 512 elements, the hash is not faster.
>
> Erwin

You can't just count operations on modern machines, so this conclusion is not valid. Yes, of course the code for computing the hash has to be taken into account, but nothing else than actual benchmarks will give an accurate comparison.
Help with gimple_build_call
Hi, I am writing a pass in which I need to insert a call to a library function. Specifically, I need to call the function "pthread_mutex_lock" from the "pthread" library. I am working on GIMPLE, specifically after the "build_cfg" pass. From the documentation, it looks like I will have to use the function gimple_build_call. My question is how to build the tree fn and the arguments that have to be passed to gimple_build_call. Do I have to use build_function_type_list and build_fn_decl to build tree fn? The function "pthread_mutex_lock" takes a pointer to a "pthread_mutex_t" variable as its argument; how can I declare that variable in the pass and use it as an argument to gimple_build_call? Thanks, Ashish
Re: Hash Function for "switch statement"
On 03/22/2010 12:43 PM, Robert Dewar wrote: > the code for computing the hash has to be taken into account, but > nothing else than actual benchmarks will give an accurate > comparison. I agree. I'd also like to point out that as it is not very difficult to do these benchmarks, the proponent(s) should produce some numbers before much more discussion takes place. Andrew.
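As a rough illustration that such a benchmark takes little code, here is a sketch. It is not a faithful model of what GCC emits for a switch: it assumes a hypothetical set of 64 sparse case values, uses a binary search to stand in for the tree lookup, and uses an open-addressing table keyed by the quoted hash32shift (plus the verification compare) for the hash dispatch. All names and sizes are illustrative assumptions. Calling through a function pointer adds overhead to both sides, so a fairer test would compile real switch statements both ways.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NCASES 64       /* hypothetical number of case labels */
#define TABLE_SIZE 128  /* power of two larger than NCASES, so probing always ends */

static uint32_t case_keys[NCASES];  /* sorted, sparse case values */
static int hash_slot[TABLE_SIZE];   /* index into case_keys, or -1 if empty */

/* The quoted hash32shift, in unsigned 32-bit arithmetic. */
static uint32_t hash32shift(uint32_t key)
{
  key = ~key + (key << 15);
  key = key ^ (key >> 12);
  key = key + (key << 2);
  key = key ^ (key >> 4);
  key = key * 2057u;
  key = key ^ (key >> 16);
  return key;
}

/* Binary search over the sorted case values: returns the case index,
   or -1 for the default label. */
static int tree_lookup(uint32_t key)
{
  int lo = 0, hi = NCASES - 1;
  while (lo <= hi) {
    int mid = lo + (hi - lo) / 2;
    if (case_keys[mid] == key)
      return mid;
    if (case_keys[mid] < key)
      lo = mid + 1;
    else
      hi = mid - 1;
  }
  return -1;
}

/* Hash lookup with linear probing; the key compare is the "verification". */
static int hash_lookup(uint32_t key)
{
  uint32_t h = hash32shift(key) & (TABLE_SIZE - 1);
  while (hash_slot[h] != -1) {
    if (case_keys[hash_slot[h]] == key)
      return hash_slot[h];
    h = (h + 1) & (TABLE_SIZE - 1);
  }
  return -1;
}

/* Fill in the case values and build the hash table. */
static void setup(void)
{
  for (int i = 0; i < NCASES; i++)
    case_keys[i] = (uint32_t) i * 37u + 5u;
  for (int i = 0; i < TABLE_SIZE; i++)
    hash_slot[i] = -1;
  for (int i = 0; i < NCASES; i++) {
    uint32_t h = hash32shift(case_keys[i]) & (TABLE_SIZE - 1);
    while (hash_slot[h] != -1)
      h = (h + 1) & (TABLE_SIZE - 1);
    hash_slot[h] = i;
  }
}

/* Time ITERS lookups over hits and misses; the checksum keeps the loop
   from being discarded and lets the two methods be cross-checked. */
static double time_lookups(int (*lookup)(uint32_t), long iters, long *checksum)
{
  long sum = 0;
  clock_t start = clock();
  for (long i = 0; i < iters; i++)
    sum += lookup((uint32_t) (i % (NCASES * 37)));
  *checksum = sum;
  return (double) (clock() - start) / CLOCKS_PER_SEC;
}

static void run_benchmark(long iters)
{
  long c_tree, c_hash;
  setup();
  double t_tree = time_lookups(tree_lookup, iters, &c_tree);
  double t_hash = time_lookups(hash_lookup, iters, &c_hash);
  printf("tree: %.3f s  hash: %.3f s  (checksums %ld / %ld)\n",
         t_tree, t_hash, c_tree, c_hash);
}
```

Calling `run_benchmark(100000000L)` from `main` in an optimized build prints the two timings. The numbers will vary with the machine, the compiler and the distribution of case values, which is why measuring is more informative than counting operations.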
Re: Coloring problem - Pass 0 for finding allocno costs
On 03/18/10 08:30, Frank Isamov wrote:
> From the h file:
> #define REG_CLASS_CONTENTS \
> { \
>   {0x, 0x, 0x}, /* NO_REGS*/ \
>   {0x, 0x, 0x}, /* D_REGS*/ \
>   {0x, 0x, 0x}, /* R_REGS*/ \
>
> ABI requires use of R registers for arguments and return value. Other than that all of these instructions are more or less symmetrical in sense of using D or R. So, an optimal choice would be use of R for this example. And if D register is chosen, it involves additional copy from R to D and back to R.

Define a union class which includes all the registers in D_REGS and R_REGS, then define IRA_COVER_CLASSES as that union class. When register classes are mostly symmetrical, except for things like argument passing, return values and the like, you usually get better code by defining IRA_COVER_CLASSES with a single union class rather than the component subclasses. In fact, if the only reason D and R are separate is the calling conventions, then I'd just drop those classes completely and define GENERAL_REGS. You would typically only define separate register classes if there are instructions which have to operate on specific subsets of the register file.

Jeff
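A minimal sketch of the union-class suggestion for a target header, for GCC releases that use IRA_COVER_CLASSES (the 4.4/4.5 series at the time of this thread). The register layout is a hypothetical assumption (hard registers 0-15 as D registers, 16-31 as R registers, one mask word per class) because the real masks in the header above are not shown; other macros a port needs are omitted.

```c
/* Hypothetical layout: hard regs 0-15 are D registers, 16-31 are R registers. */
enum reg_class
{
  NO_REGS,
  D_REGS,
  R_REGS,
  GENERAL_REGS,   /* union of D_REGS and R_REGS */
  ALL_REGS,
  LIM_REG_CLASSES
};

#define N_REG_CLASSES ((int) LIM_REG_CLASSES)

#define REG_CLASS_NAMES \
  { "NO_REGS", "D_REGS", "R_REGS", "GENERAL_REGS", "ALL_REGS" }

#define REG_CLASS_CONTENTS            \
{                                     \
  { 0x00000000 }, /* NO_REGS */       \
  { 0x0000ffff }, /* D_REGS */        \
  { 0xffff0000 }, /* R_REGS */        \
  { 0xffffffff }, /* GENERAL_REGS */  \
  { 0xffffffff }  /* ALL_REGS */      \
}

/* A single cover class spanning both register files, so IRA allocates
   D and R registers together and only the ABI constraints pull values
   into R registers. */
#define IRA_COVER_CLASSES { GENERAL_REGS, LIM_REG_CLASSES }
```

Keeping D_REGS and R_REGS as subclasses still lets constraints and the ABI macros refer to them; if nothing needs them, they could be dropped as Jeff suggests.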
RE: Understanding Scheduling
> Enabling BB-reorder only if profile info is available, is not the
> right way to go. The compiler really doesn't place blocks in sane
> places without it -- and it shouldn't have to, either. For example if
> you split an edge at some point, the last thing you want to worry
> about, is where the new basic block is going to end up.
>
> There are actually a few bugs in Bugzilla about BB-reorder, FWIW.

I've done a few searches in Bugzilla and am not sure whether I have found the BB-reorder bugs you are referring to. The ones I have found are:

16797: Opportunity to remove unnecessary load instructions
41396: missed space optimization related to basic block reorder
21002: RTL prologue and basic-block reordering pessimizes delay-slot filling.

(If you can recall any others, I'd appreciate hearing of them.)

Based on 41396, it looks like BB reorder is disabled for -Os. But you said in your post above that "the compiler really doesn't place blocks in sane places without it", so does that mean we could probably increase performance for -Os if BB reorder were improved and enabled for -Os?

Cheers,
Ian
Re: libgcc-arch.ver details
"Paulo J. Matos" writes: > After looking into the arm code I am quite confused since even though > it uses HF (at least I found references to it in gcc4.5, but not in > gcc4.3 or gcc4.4), I can't see how it's importing floatunsihf. In > fact, I can't find any reference to a function called floatunsihf > anywhere on gccs source code and documentation also doesn't contain > any references to it. See, e.g., LIB2FUNCS_STATIC_EXTRA in config/arm/t-bpabi, and config/arm/fp16.c, and the calls to set_conv_libfunc in arm_init_libfuncs in config/arm/arm.c. Ian
Re: libgcc-arch.ver details
On Mon, Mar 22, 2010 at 5:48 PM, Ian Lance Taylor wrote: > > See, e.g., LIB2FUNCS_STATIC_EXTRA in config/arm/t-bpabi, and > config/arm/fp16.c, and the calls to set_conv_libfunc in > arm_init_libfuncs in config/arm/arm.c. > > Ian > Thanks once again for the refs, Ian; that's extremely helpful. Cheers, -- PMatos
How to get the Tree ARRAY_TYPE declaration size
Hi everyone!

I need to get the array size from a declaration like "int v[100]" (here the size is "100").

For example:

if (TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE) {
  int array_size = // ...here I want to get the size
}

How can I do this?

Thank you,
Max
Re: How to get the Tree ARRAY_TYPE declaration size
Massimo Nazaria writes:
> I need to get the array size from a declaration like "int v[100]" (here the
> size is "100").
>
> For example:
> if (TREE_CODE (TREE_TYPE (var))) == ARRAY_TYPE) {
>   int array_size = // ...here I want to get the size
> }

Quoting gcc/tree.def:

/* Types of arrays. Special fields:
   TREE_TYPE      Type of an array element.
   TYPE_DOMAIN    Type to index by.
                  Its range of values specifies the array length.
   The field TYPE_POINTER_TO (TREE_TYPE (array_type)) is always nonzero
   and holds the type to coerce a value of that array type to in C.
   TYPE_STRING_FLAG indicates a string (in contrast to an array of chars)
   in languages (such as Chill) that make a distinction. */

In other words, look at TYPE_DOMAIN. It will often be an INTEGER_TYPE whose TYPE_MIN_VALUE and TYPE_MAX_VALUE give you the minimum and maximum valid array indices.

Ian
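A small worked example of the arithmetic, assuming a C array with zero-based indices and constant bounds (for an array declared without a size, the domain or its maximum value may be missing, so it is worth checking before using it):

$$
\text{length} = \texttt{TYPE\_MAX\_VALUE} - \texttt{TYPE\_MIN\_VALUE} + 1,
\qquad \texttt{int v[100]}:\; 99 - 0 + 1 = 100
$$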
GCC vs ICC
Hi all,

I'm interested in knowing how GCC differs from Intel's ICC in terms of optimization levels and tuning for specific processor architectures. I'm using GCC 4.1.2 20070626 and ICC v11.1 for Linux.

How do ICC's optimization levels (O1 to O3) differ from GCC's, if they differ at all?

ICC is able to target different architectures specifically (IA-32, Intel 64 and IA-64). I've read that GCC has the -march compiler option, which I think is similar, but I can't find a list of the values to use. I'm using an Intel Xeon X5570, which is 64-bit. Are there any other GCC compiler options I could use to tune my applications for 64-bit Intel CPUs?

Thank you.

Regards,
Rayne
Re: GCC vs ICC
On 3/22/2010 7:46 PM, Rayne wrote:
> Hi all, I'm interested in knowing how GCC differs from Intel's ICC in terms of the optimization levels and catering to specific processor architecture. I'm using GCC 4.1.2 20070626 and ICC v11.1 for Linux. How does ICC's optimization levels (O1 to O3) differ from GCC, if they differ at all? The ICC is able to cater specifically to different architectures (IA-32, intel64 and IA-64). I've read that GCC has the -march compiler option which I think is similar, but I can't find a list of the options to use. I'm using Intel Xeon X5570, which is 64-bit. Are there any other GCC compiler options I could use that would cater my applications for 64-bit Intel CPUs?

Some of that seems more topical on the Intel software forum for icc, and the following more topical on either that forum or gcc-help, where you should go for follow-up.

If you are using gcc on Xeon 5570,

gcc -mtune=barcelona -ffast-math -O3 -msse4.2

might be a comparable level of optimization to

icc -xSSE4.2

For gcc 4.1, you would also have to set -ftree-vectorize, but you would be better off with a current version.

But if you are optimizing for early Intel 64-bit Xeon, -mtune=barcelona would not be consistently good, and you could not use -msse4 or -xSSE4.2.

For optimization which observes standards and also disables vectorized sum reduction, you would omit -ffast-math for gcc, and set icc -fp-model source.

-- 
Tim Prince