Re: libgcc-arch.ver details

2010-03-22 Thread Paulo J. Matos
On Thu, Mar 18, 2010 at 5:10 PM, Ian Lance Taylor  wrote:
>
> Unlikely.  The question here is whether your target uses HFmode.  If
> it does, you have to arrange to provide the HFmode libgcc functions.
> That does not happen automatically.  HFmode is a 16-bit floating point
> mode; currently the only target which uses that mode is ARM.
>
> Ian
>

Thanks for the reply and the reference to ARM. I will look into how it
is importing the HF functions.

Cheers,
-- 
PMatos


Re: libgcc-arch.ver details

2010-03-22 Thread Paulo J. Matos
On Thu, Mar 18, 2010 at 5:10 PM, Ian Lance Taylor  wrote:
>
> Unlikely.  The question here is whether your target uses HFmode.  If
> it does, you have to arrange to provide the HFmode libgcc functions.
> That does not happen automatically.  HFmode is a 16-bit floating point
> mode; currently the only target which uses that mode is ARM.
>
> Ian
>

After looking into the arm code I am quite confused since even though
it uses HF (at least I found references to it in gcc4.5, but not in
gcc4.3 or gcc4.4), I can't see how it's importing floatunsihf. In
fact, I can't find any reference to a function called floatunsihf
anywhere in GCC's source code, and the documentation doesn't contain
any references to it either.

Am I missing something?

-- 
PMatos


Re: Hash Function for "switch statement"

2010-03-22 Thread Unruh, Erwin
Hi,

the discussion so far did omit one specific aspect. When comparing two 
implementations for a switch, you have to compare the full code. For the hash 
you have to include the code to calculate the hash function. This might be more 
code than a simple tree lookup.
The example function:

>public int hash32shift(int key)
>{
>  key = ~key + (key << 15); // key = (key << 15) - key - 1;
>  key = key ^ (key >>> 12);
>  key = key + (key << 2);
>  key = key ^ (key >>> 4);
>  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>  key = key ^ (key >>> 16);
>  return key;
>}

has 12 operations. Add a table lookup and verification and you get to about 18. 
That is worse than a tree search with 9 levels. So for all switches with fewer 
than 512 elements, the hash is not faster.
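
The operation count above can be checked against a C port of the quoted function. This is only a rough sketch and was not posted in the thread: the per-line tallies are one way of counting that arrives at 12, and check_comment_identities and tree_levels are hypothetical helper names. A full binary tree with d levels holds 2^d - 1 keys, which is where the boundary between 9 levels and 512 elements comes from.

```c
#include <assert.h>
#include <stdint.h>

/* C port of the quoted Java function.  Java's >>> is a logical shift,
   so the key is held in an unsigned 32-bit integer here. */
static uint32_t hash32shift(uint32_t key)
{
    key = ~key + (key << 15);   /* 3 ops: not, shift, add */
    key = key ^ (key >> 12);    /* 2 ops: shift, xor      */
    key = key + (key << 2);     /* 2 ops: shift, add      */
    key = key ^ (key >> 4);     /* 2 ops: shift, xor      */
    key = key * 2057u;          /* 1 op:  multiply        */
    key = key ^ (key >> 16);    /* 2 ops: shift, xor      */
    return key;                 /* 12 ops in total        */
}

/* Hypothetical helper: checks the two rewrites given in the comments
   of the original code (all arithmetic is modulo 2^32). */
static void check_comment_identities(uint32_t key)
{
    assert((uint32_t)(~key + (key << 15)) ==
           (uint32_t)((key << 15) - key - 1u));
    assert((uint32_t)(key * 2057u) ==
           (uint32_t)((key + (key << 3)) + (key << 11)));
}

/* Hypothetical helper: number of levels a full binary search tree
   needs to hold n keys (a tree with d levels holds 2^d - 1 keys). */
static unsigned tree_levels(unsigned n)
{
    unsigned levels = 0;
    unsigned capacity = 0;      /* keys held by `levels` full levels */
    while (capacity < n) {
        capacity = 2u * capacity + 1u;
        levels++;
    }
    return levels;
}
```

With these definitions, tree_levels(511) is 9 and tree_levels(512) is 10, which matches the "fewer than 512 elements" threshold in the message.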

Erwin



Re: Hash Function for "switch statement"

2010-03-22 Thread Robert Dewar

Unruh, Erwin wrote:

> Hi,
>
> the discussion so far did omit one specific aspect. When comparing two 
> implementations for a switch, you have to compare the full code. For the hash 
> you have to include the code to calculate the hash function. This might be more 
> code than a simple tree lookup.
> The example function:
>
> public int hash32shift(int key)
> {
>  key = ~key + (key << 15); // key = (key << 15) - key - 1;
>  key = key ^ (key >>> 12);
>  key = key + (key << 2);
>  key = key ^ (key >>> 4);
>  key = key * 2057; // key = (key + (key << 3)) + (key << 11);
>  key = key ^ (key >>> 16);
>  return key;
> }
>
> has 12 operations. Add a table lookup and verification and you get to about 18. 
> That is worse than a tree search with 9 levels. So for all switches with fewer 
> than 512 elements, the hash is not faster.

You can't just count operations on modern machines, so this conclusion 
is not valid. Yes, of course the code for computing the hash has to be
taken into account, but nothing other than actual benchmarks will give
an accurate comparison.

> Erwin




Help with gimple_build_call

2010-03-22 Thread ashish jain
Hi,

I am writing a pass in which I need to insert a call to a library
function. Specifically I need to call the function
"pthread_mutex_lock" from the library "pthread". I am working on
GIMPLE, specifically after the "build_cfg" pass.

Looking around the documentation, what I found is I will have to use
the function gimple_build_call. My question is how do I build the tree
fn, and the arguments which have to be provided to gimple_build_call.

Do I have to use build_function_type_list and build_fn_decl for
building tree fn? The original function call "pthread_mutex_lock"
requires a "pthread_mutex_t" type variable as an argument. How can I
declare that in the pass and use it as an argument in
gimple_build_call?

Thanks,
Ashish


Re: Hash Function for "switch statement"

2010-03-22 Thread Andrew Haley
On 03/22/2010 12:43 PM, Robert Dewar wrote:

> the code for computing the hash has to be taken into account, but
> nothing else than actual benchmarks will give an accurate
> comparison.

I agree.  I'd also like to point out that as it is not very difficult
to do these benchmarks, the proponent(s) should produce some numbers
before much more discussion takes place.
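
A benchmark along these lines might start from something like the sketch below. It is only a rough model under loud assumptions: it times a hand-written binary search against a hand-written hash lookup over the same labels, which is not the code a compiler would generate for a switch, and the table size, probing scheme, and all function names other than hash32shift are hypothetical choices made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define TABLE_BITS 10                    /* arbitrary: 1024 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

/* C port of the hash quoted earlier in the thread. */
static uint32_t hash32shift(uint32_t key)
{
    key = ~key + (key << 15);
    key = key ^ (key >> 12);
    key = key + (key << 2);
    key = key ^ (key >> 4);
    key = key * 2057u;
    key = key ^ (key >> 16);
    return key;
}

/* Binary search over sorted labels; returns the index or -1. */
static int find_tree(const uint32_t *labels, int n, uint32_t key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (labels[mid] == key)
            return mid;
        if (labels[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Fills slots[] (TABLE_SIZE entries) using linear probing.
   Each slot holds index + 1, with 0 meaning "empty". */
static void build_hash(const uint32_t *labels, int n, int *slots)
{
    assert(n > 0 && (unsigned)n < TABLE_SIZE);  /* keep one slot empty */
    for (unsigned i = 0; i < TABLE_SIZE; i++)
        slots[i] = 0;
    for (int i = 0; i < n; i++) {
        uint32_t h = hash32shift(labels[i]) & (TABLE_SIZE - 1u);
        while (slots[h] != 0)
            h = (h + 1u) & (TABLE_SIZE - 1u);
        slots[h] = i + 1;
    }
}

/* Hash lookup with the verification step; returns the index or -1. */
static int find_hash(const uint32_t *labels, const int *slots, uint32_t key)
{
    uint32_t h = hash32shift(key) & (TABLE_SIZE - 1u);
    while (slots[h] != 0) {
        if (labels[slots[h] - 1] == key)
            return slots[h] - 1;
        h = (h + 1u) & (TABLE_SIZE - 1u);
    }
    return -1;
}

/* Looks up every label `reps` times and returns the CPU seconds used.
   The checksum keeps the compiler from discarding the lookups. */
static double time_lookups(int use_hash, const uint32_t *labels, int n,
                           const int *slots, int reps, long *checksum)
{
    clock_t start = clock();
    long sum = 0;
    for (int r = 0; r < reps; r++)
        for (int i = 0; i < n; i++)
            sum += use_hash ? find_hash(labels, slots, labels[i])
                            : find_tree(labels, n, labels[i]);
    *checksum = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_lookups once with use_hash set to 0 and once with it set to 1, over the same labels and a large reps value, gives two timings to compare. The results will depend on the machine, the label distribution, and the optimization level, which is the point made above about needing measurements rather than operation counts.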

Andrew.


Re: Coloring problem - Pass 0 for finding allocno costs

2010-03-22 Thread Jeff Law

On 03/18/10 08:30, Frank Isamov wrote:


> From the h file:
>
> #define REG_CLASS_CONTENTS  \
>   {                         \
>     {0x, 0x, 0x}, /* NO_REGS */  \
>     {0x, 0x, 0x}, /* D_REGS */   \
>     {0x, 0x, 0x}, /* R_REGS */   \
>
> ABI requires use of R registers for arguments and return value. Other
> than that all of these instructions are more or less symmetrical in
> sense of using D or R. So, an optimal choice would be use of R for
> this example. And if D register is chosen, it involves additional copy
> from R to D and back to R.
   
Define a union class which includes all the registers in D_REGS and 
R_REGS, then define IRA_COVER_CLASSES to that union class.  When 
register classes are mostly symmetrical, except for stuff like argument 
passing, return values and the like, you usually get better code by 
defining IRA_COVER_CLASSES with a single union class rather than the 
component subclasses.


In fact, if the only reason D & R are separate is calling conventions, 
then I'd just drop those classes completely and define 
GENERAL_REGISTERS.  You would typically only define separate register 
classes if there are instructions which have to operate on specific 
subsets of the register file.
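
As a rough illustration of the union-class suggestion, the sketch below models it as a standalone C file rather than a real target header. Everything numeric in it is an assumption: the original message elides the register masks, so the layout here (8 D registers in bits 0-7, 8 R registers in bits 8-15, one mask word) is invented. GENERAL_REGS is the name GCC ports conventionally use for the class the message calls GENERAL_REGISTERS, and class_masks and class_is_subset are hypothetical names.

```c
#include <assert.h>

/* Hypothetical register classes, ordered so that subclasses come
   before the classes that contain them. */
enum reg_class {
    NO_REGS,
    D_REGS,
    R_REGS,
    GENERAL_REGS,        /* union of D_REGS and R_REGS */
    ALL_REGS,
    LIM_REG_CLASSES
};

#define N_REG_CLASSES ((int) LIM_REG_CLASSES)

/* Invented masks: bits 0-7 are D registers, bits 8-15 are R registers. */
#define REG_CLASS_CONTENTS                  \
  {                                         \
    { 0x00000000 },  /* NO_REGS */          \
    { 0x000000ff },  /* D_REGS */           \
    { 0x0000ff00 },  /* R_REGS */           \
    { 0x0000ffff },  /* GENERAL_REGS */     \
    { 0x0000ffff },  /* ALL_REGS */         \
  }

/* A single union class as the only cover class; the list is
   terminated by LIM_REG_CLASSES. */
#define IRA_COVER_CLASSES { GENERAL_REGS, LIM_REG_CLASSES }

static const unsigned class_masks[N_REG_CLASSES][1] = REG_CLASS_CONTENTS;
static const enum reg_class ira_cover_classes[] = IRA_COVER_CLASSES;

/* Hypothetical helper: nonzero if every register in `sub` is also
   in `sup`. */
static int class_is_subset(enum reg_class sub, enum reg_class sup)
{
    assert(sub < LIM_REG_CLASSES && sup < LIM_REG_CLASSES);
    return (class_masks[sub][0] & ~class_masks[sup][0]) == 0;
}
```

In this model both D_REGS and R_REGS are subsets of the single cover class, so the allocator is free to pick an R register where the ABI wants one without paying for a copy between classes.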



Jeff


RE: Understanding Scheduling

2010-03-22 Thread Ian Bolton
> Enabling BB-reorder only if profile info is available, is not the
> right way to go. The compiler really doesn't place blocks in sane
> places without it -- and it shouldn't have to, either. For example if
> you split an edge at some point, the last thing you want to worry
> about, is where the new basic block is going to end up.
> 
> There are actually a few bugs in Bugzilla about BB-reorder, FWIW.

I've done a few searches in Bugzilla and am not sure if I have found
the BB reorder bugs you are referring to.

The ones I have found are:

16797: Opportunity to remove unnecessary load instructions
41396: missed space optimization related to basic block reorder
21002: RTL prologue and basic-block reordering pessimizes delay-slot
   filling.

(If you can recall any others, I'd appreciate hearing of them.)

Based on 41396, it looks like BB reorder is disabled for -Os.  But
you said in your post above that "the compiler really doesn't place
blocks in sane places without it", so does that mean that we could
probably increase performance for -Os if BB reorder was (improved)
and enabled for -Os?

Cheers,
Ian


Re: libgcc-arch.ver details

2010-03-22 Thread Ian Lance Taylor
"Paulo J. Matos"  writes:

> After looking into the arm code I am quite confused since even though
> it uses HF (at least I found references to it in gcc4.5, but not in
> gcc4.3 or gcc4.4), I can't see how it's importing floatunsihf. In
> fact, I can't find any reference to a function called floatunsihf
> anywhere in GCC's source code, and the documentation doesn't contain
> any references to it either.

See, e.g., LIB2FUNCS_STATIC_EXTRA in config/arm/t-bpabi, and
config/arm/fp16.c, and the calls to set_conv_libfunc in
arm_init_libfuncs in config/arm/arm.c.

Ian


Re: libgcc-arch.ver details

2010-03-22 Thread Paulo J. Matos
On Mon, Mar 22, 2010 at 5:48 PM, Ian Lance Taylor  wrote:
>
> See, e.g., LIB2FUNCS_STATIC_EXTRA in config/arm/t-bpabi, and
> config/arm/fp16.c, and the calls to set_conv_libfunc in
> arm_init_libfuncs in config/arm/arm.c.
>
> Ian
>

Thanks for the refs once again Ian, that's extremely helpful.

Cheers,

-- 
PMatos


How to get the Tree ARRAY_TYPE declaration size

2010-03-22 Thread Massimo Nazaria
Hi everyone!

I need to get the array size from a declaration like "int v[100]" (here the 
size is "100").

For example:
  if (TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE) {
    int array_size = // ...here I want to get the size
  }

How can I do this?

Thank you
Max







Re: How to get the Tree ARRAY_TYPE declaration size

2010-03-22 Thread Ian Lance Taylor
Massimo Nazaria  writes:

> I need to get the array size from a declaration like "int v[100]" (here the 
> size is "100").
>
> For example:
>   if (TREE_CODE (TREE_TYPE (var)) == ARRAY_TYPE) {
>     int array_size = // ...here I want to get the size
>   }

Quoting gcc/tree.def:

/* Types of arrays.  Special fields:
   TREE_TYPE     Type of an array element.
   TYPE_DOMAIN   Type to index by.
                 Its range of values specifies the array length.
 The field TYPE_POINTER_TO (TREE_TYPE (array_type)) is always nonzero
 and holds the type to coerce a value of that array type to in C.
 TYPE_STRING_FLAG indicates a string (in contrast to an array of chars)
 in languages (such as Chill) that make a distinction.  */


In other words, look at TYPE_DOMAIN.  It will often be an INTEGER_TYPE
whose TYPE_MIN_VALUE and TYPE_MAX_VALUE give you the minimum and
maximum valid array indices.
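
The arithmetic this implies is simply maximum index minus minimum index plus one. The sketch below shows it with a toy struct standing in for the domain type. It is not GCC's tree API: struct index_domain and array_length are hypothetical names, and real code inside GCC would read the bounds through TYPE_MIN_VALUE and TYPE_MAX_VALUE and would first need to check that both are present and constant, since the message only says the domain will "often" be such an INTEGER_TYPE.

```c
#include <assert.h>

/* Toy stand-in for the domain of an array type: just its two bounds.
   This is NOT how GCC represents types internally. */
struct index_domain {
    long min_value;   /* plays the role of TYPE_MIN_VALUE */
    long max_value;   /* plays the role of TYPE_MAX_VALUE */
};

/* Number of elements implied by an inclusive index range. */
static long array_length(const struct index_domain *domain)
{
    /* An empty range is allowed (max == min - 1); anything smaller
       is treated as a caller error in this sketch. */
    assert(domain->max_value >= domain->min_value - 1);
    return domain->max_value - domain->min_value + 1;
}
```

For "int v[100]" the valid indices run from 0 to 99, so a domain of { 0, 99 } yields 99 - 0 + 1 = 100.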

Ian



GCC vs ICC

2010-03-22 Thread Rayne
Hi all,

I'm interested in knowing how GCC differs from Intel's ICC in terms of the 
optimization levels and catering to specific processor architecture. I'm using 
GCC 4.1.2 20070626 and ICC v11.1 for Linux.

How do ICC's optimization levels (O1 to O3) differ from GCC's, if they differ 
at all?

The ICC is able to cater specifically to different architectures (IA-32, 
intel64 and IA-64). I've read that GCC has the -march compiler option which I 
think is similar, but I can't find a list of the options to use. I'm using 
Intel Xeon X5570, which is 64-bit. Are there any other GCC compiler options I 
could use that would cater my applications for 64-bit Intel CPUs?

Thank you.

Regards,
Rayne


  


Re: GCC vs ICC

2010-03-22 Thread Tim Prince

On 3/22/2010 7:46 PM, Rayne wrote:

> Hi all,
>
> I'm interested in knowing how GCC differs from Intel's ICC in terms of the 
> optimization levels and catering to specific processor architecture. I'm using 
> GCC 4.1.2 20070626 and ICC v11.1 for Linux.
>
> How do ICC's optimization levels (O1 to O3) differ from GCC's, if they differ 
> at all?
>
> The ICC is able to cater specifically to different architectures (IA-32, 
> intel64 and IA-64). I've read that GCC has the -march compiler option which I 
> think is similar, but I can't find a list of the options to use. I'm using 
> Intel Xeon X5570, which is 64-bit. Are there any other GCC compiler options I 
> could use that would cater my applications for 64-bit Intel CPUs?

   
Some of that seems more on-topic for the Intel software forum for icc, and 
the following for either that forum or gcc-help, where you should go for 
follow-up.

If you are using gcc on Xeon 5570,
gcc -mtune=barcelona -ffast-math -O3 -msse4.2
might be a comparable level of optimization to
icc -xSSE4.2
For gcc 4.1, you would have to set also -ftree-vectorize, but you would 
be better off with a current version.
But, if you are optimizing for early Intel 64-bit Xeon, -mtune=barcelona 
would not be consistently good, and you could not use -msse4 or -xSSE4.2.
For optimization which observes standards and also disables vectorized 
sum reduction, you would omit -ffast-math for gcc, and set icc -fp-model 
source.
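
The reason a vectorized sum reduction needs something like -ffast-math is that it adds the elements in a different order, and floating-point addition is not associative. The sketch below imitates that reordering by hand, assuming IEEE double arithmetic and a build without -ffast-math. The function names are hypothetical and the two-accumulator loop is only a model of what a two-lane vectorized reduction does, not actual compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Sums strictly left to right, as the source loop is written. */
static double sum_in_order(const double *v, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Sums with two independent accumulators (even and odd elements),
   then combines them: the shape of a two-lane vector reduction. */
static double sum_two_lanes(const double *v, size_t n)
{
    assert(n % 2 == 0);          /* keep the sketch free of tail handling */
    double lane0 = 0.0, lane1 = 0.0;
    for (size_t i = 0; i < n; i += 2) {
        lane0 += v[i];
        lane1 += v[i + 1];
    }
    return lane0 + lane1;
}
```

For the input { 1e20, 1.0, -1e20, 1.0 }, summing in order gives 1.0, because the first 1.0 is absorbed by 1e20 before the large terms cancel, while the two-lane version gives 2.0. Both answers are "correct" for their evaluation order, which is why the reordering is only permitted when the stricter floating-point rules are relaxed.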


--
Tim Prince