Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Jorge PEREZ
Geert Bosch wrote:
> On Nov 19, 2010, at 11:53, Eric Botcazou wrote:
>   
>>> Yes, if all the people who want only one set of libraries agree on what
>>> that set shall be (or this can be selected with existing configure flags),
>>> this is the simplest way.
>>>   
>> Yes, this can be selected at configure time with --with-cpu and --with-float.
>>
>> The default configuration is also straightforward: LEON is an implementation 
>> of the SPARC-V8 architecture so --with-cpu=v8 and --with-float=hard.
>> 
>
> There is LEON2, which is V7, and LEON3/LEON4, which are V8.
> While LEON3 can support all of V8 in hardware, LEON3 is a 
> configurable system-on-a-chip, targeting both FPGAs and ASICs,
> where users can configure and synthesize different aspects of
> the CPU:
>
> * CONFIG_PROC_NUM: The number of processor cores.
>
> * CONFIG_IU_V8MULDIV: Implements V8 multiply and divide instructions
>   UMUL, UMULCC, SMUL, SMULCC, UDIV, UDIVCC, SDIV, SDIVCC.
>   Costs about 8k gates.
>
> * CONFIG_IU_MUL_MAC: Implements the SPARC V8e UMAC/SMAC
>   (multiply-accumulate) instructions with a 40-bit accumulator
>
> * CONFIG_FPU_ENABLE: Enable or disable floating point unit
>
> Apart from these settings that determine whether instructions are
> present at all, other settings allow selection of FPU implementation
> (trading off between cycle count, area and timing), such as:
>
> * CONFIG_IU_MUL_LATENCY_2: Implementation options for the integer multiplier.
>   Type       Implementation               issue-rate/latency
>   2-clocks   32x32 pipelined multiplier   1/2
>   4-clocks   16x16 standard multiplier    4/4
>   5-clocks   16x16 pipelined multiplier   4/5
>
> * CONFIG_IU_LDELAY: One cycle load delay for best performance, or 2-cycles
>   to improve timing at the cost of about 5% reduced performance.
>
> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
> is that correct? I think it would make sense to build these as multilibs,
> so the user can experiment to find out performance impacts of
> the various hardware configurations on generated code.
>
> I wonder if it also would be worthwhile to have compiler options
> for fpu=fast/slow and multiply=fast/slow, so we can schedule
> appropriately. For the FPU, issue-rate/latency are as follows:
>   GR FPU: 1/4, with FDIV 16 and FSQRT 24 cycles,
> non-pipelined on separate unit
>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
> non-pipelined on same unit
>
> While the FPU Lite is not pipelined, integer instructions can be
> executed in parallel with an FPU instruction as long as no new FPU
> instructions are pending.
>
>   -Geert
>
>   
Just a humble opinion: Geert points out a very important fact. LEON's
RTL is very configurable, and it would be a bit of a pity if the compiler
took away such flexibility. Maybe the user should always have the choice
to implement any given configuration in software or in hardware.


Jorge




Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
Eric Botcazou wrote:
>> How do you see this impacting the sparc-rtems target?
>>
>> We have v7/v8 with HW and SW FP multilibs now and
>> leon is important to us. :-D
> 
> Note that LEON will also be available as mere default cpu, i.e. you'll be able
> to configure sparc-rtems --with-tune=leon.  The new multilib stuff is for the
> default target sparc-leon-elf (and maybe sparc-leon-linux if we want one).
> 

I agree. The patch I submitted only adds some extras. It shouldn't have an impact on
the sparc-rtems target (or others).




Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
Eric Botcazou wrote:
>> Yes, if all the people who want only one set of libraries agree on what
>> that set shall be (or this can be selected with existing configure flags),
>> this is the simplest way.
> 
> Yes, this can be selected at configure time with --with-cpu and --with-float.
> 
> The default configuration is also straightforward: LEON is an implementation 
> of the SPARC-V8 architecture so --with-cpu=v8 and --with-float=hard.
> 
>> Also, it might happen that someone doesn't want one multilib dimension, but
>> they want to keep another one.
> 
> Indeed, being able to partially disable multilibs would be nice.
> 

I would suggest a simple solution:
I can have 5 --with-cpu configure possibilities:

1. single-lib explicit selection:
 - --with-cpu=sfsparcleon   : v7/soft
 - --with-cpu=sfsparcleonv8 : v8/soft
 - --with-cpu=hfsparcleon   : v7/hard
 - --with-cpu=hfsparcleonv8 : v8/hard

2. generic multilib:
 - --with-cpu=leon   : defaults to v7/hard
   use [-mcpu=v8 / -msoft-float ]
   at compile-time to select the hardware setting.

Is this a practical approach? It would only
require one extra file, say "gcc/config/sparc/t-leon-multilib", that
enables multilib and is included by configure when --with-cpu=leon is given.

I'll prepare a patch that provides such a setup.

-- Greetings Konrad



Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Eric Botcazou
> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
> is that correct? I think it would make sense to build these as multilibs,
> so the user can experiment to find out performance impacts of
> the various hardware configurations on generated code.
>
> I wonder if it also would be worthwhile to have compiler options
> for fpu=fast/slow and multiply=fast/slow, so we can schedule
> appropriately. For the FPU, issue-rate/latency are as follows:
>   GR FPU: 1/4, with FDIV 16 and FSQRT 24 cycles,
> non-pipelined on separate unit
>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
> non-pipelined on same unit

Let's not make this too complex for a first try, the settings used at AdaCore 
seem a good starting point to me.

-- 
Eric Botcazou


Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
>> * CONFIG_IU_MUL_LATENCY_2: Implementation options for the integer multiplier.
>>   Type       Implementation               issue-rate/latency
>>   2-clocks   32x32 pipelined multiplier   1/2
>>   4-clocks   16x16 standard multiplier    4/4
>>   5-clocks   16x16 pipelined multiplier   4/5

I'm not sure how I should model this in gcc. I'm not that familiar with
the gcc internals. Maybe someone could assist me?
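As a back-of-the-envelope illustration of what the issue-rate/latency column means for scheduling: in a simple in-order model, n independent multiplies finish after (n - 1) * issue_rate + latency cycles. The sketch below only encodes that arithmetic for the three multiplier options; it assumes independent multiplies with nothing else to overlap, and the struct and function names are hypothetical, not GCC code.

```c
/* One row of the LEON3 multiplier table quoted above. */
struct mul_option {
    const char *name;
    int issue_rate; /* cycles between back-to-back multiplies */
    int latency;    /* cycles until a result can be used */
};

static const struct mul_option leon3_mul_options[] = {
    { "2-clocks, 32x32 pipelined", 1, 2 },
    { "4-clocks, 16x16 standard",  4, 4 },
    { "5-clocks, 16x16 pipelined", 4, 5 },
};

/* Cycles until the last of n independent multiplies completes, assuming
   an in-order pipeline that can start a new multiply every issue_rate
   cycles and has no other work to overlap with them. */
static int cycles_for_independent_muls(const struct mul_option *opt, int n)
{
    if (n <= 0)
        return 0;
    return (n - 1) * opt->issue_rate + opt->latency;
}
```

For ten independent multiplies this gives 11, 40 and 41 cycles respectively, which is why a schedule tuned for one configuration can be a poor fit for another.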

>>   GR FPU: 1/4, with FDIV 16 and FSQRT 24 cycles,
>> non-pipelined on separate unit
>>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>> non-pipelined on same unit

I could add a tune option that would switch the processor cost struct between
FPU and FPU-lite.
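One way such a tune option could work is to select one of two cost tables once during option processing, much as sparc.c picks a per-processor cost table from sparc_cpu. The standalone sketch below assumes a cut-down, hypothetical stand-in for struct processor_costs (struct fpu_costs, enum leon_fpu_kind and select_leon_fpu_costs are illustrative names, not GCC's); the latencies are the GRFPU and GRFPU-Lite figures quoted above, treating FDIVS/FDIVD and FSQRTS/FSQRTD alike for the GRFPU as those figures do.

```c
/* GCC's rtl.h defines COSTS_N_INSNS this way; repeated here so the
   sketch compiles on its own. */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical, cut-down stand-in for sparc.c's struct processor_costs. */
struct fpu_costs {
    int fadd;   /* fadd, fsub */
    int fdivs;
    int fdivd;
    int fsqrts;
    int fsqrtd;
};

/* GRFPU: pipelined, 4-cycle latency; div/sqrt on a separate unit. */
static const struct fpu_costs grfpu_costs = {
    COSTS_N_INSNS (4), COSTS_N_INSNS (16), COSTS_N_INSNS (16),
    COSTS_N_INSNS (24), COSTS_N_INSNS (24),
};

/* GRFPU-Lite: not pipelined, 8-cycle latency. */
static const struct fpu_costs grfpu_lite_costs = {
    COSTS_N_INSNS (8), COSTS_N_INSNS (31), COSTS_N_INSNS (57),
    COSTS_N_INSNS (46), COSTS_N_INSNS (57),
};

enum leon_fpu_kind { LEON_FPU_GRFPU, LEON_FPU_GRFPU_LITE };

/* What a hypothetical tune option could do: choose the table once and
   let the rest of the back end read costs through the returned pointer. */
static const struct fpu_costs *
select_leon_fpu_costs (enum leon_fpu_kind kind)
{
  return kind == LEON_FPU_GRFPU_LITE ? &grfpu_lite_costs : &grfpu_costs;
}
```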

-- Greetings Konrad


Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Eric Botcazou
> I would suggest a simple solution:
> I can have 5 --with-cpu configure possibilities:
>
> 1. single-lib explicit selection:
>  - --with-cpu=sfsparcleon   : v7/soft
>  - --with-cpu=sfsparcleonv8 : v8/soft
>  - --with-cpu=hfsparcleon   : v7/hard
>  - --with-cpu=hfsparcleonv8 : v8/hard

--with-cpu isn't really appropriate for this, we already have --with-cpu=v7/v8 
and --with-float=soft/hard and --disable-multilib.

> 2. generic multilib:
>  - --with-cpu=leon   : defaults to v7/hard
>    use [-mcpu=v8 / -msoft-float ]
>    at compile-time to select the hardware setting.

--with-cpu shouldn't change multilibs.  Multilibs are a property of a target, 
e.g. sparc-leon-elf or sparc-rtems, not that of a cpu setting.

-- 
Eric Botcazou


Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
Eric Botcazou wrote:
>> I would suggest a simple solution:
>> I can have 5 --with-cpu configure possibilities:
>>
>> 1. single-lib explicit selection:
>>  - --with-cpu=sfsparcleon   : v7/soft
>>  - --with-cpu=sfsparcleonv8 : v8/soft
>>  - --with-cpu=hfsparcleon   : v7/hard
>>  - --with-cpu=hfsparcleonv8 : v8/hard
> 
> --with-cpu isn't really appropriate for this, we already have --with-cpu=v7/v8
> and --with-float=soft/hard and --disable-multilib.

Still, I need to select sparc_cpu and leon.md too. I could add -mtune=leon at
compile time to switch sparc_cpu, but then I have to give -mtune=leon every time.
I would like to be able to make it the default. With just
 [ --with-cpu=v7/v8 | --with-float=soft/hard | --disable-multilib ] to configure
you can't.
So my suggestion would be to use the triple
 [ --with-cpu=sparcleonv7/sparcleonv8 | --with-float=soft/hard | --disable-multilib ]
for configure, and to add the 2 cpu types sparcleonv7 and sparcleonv8 that would
replace v7/v8.
Does this sound good?

-- Greetings Konrad






Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
Eric Botcazou wrote:
>> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
>> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
>> is that correct? I think it would make sense to build these as multilibs,
>> so the user can experiment to find out performance impacts of
>> the various hardware configurations on generated code.
>>
>> I wonder if it also would be worthwhile to have compiler options
>> for fpu=fast/slow and multiply=fast/slow, so we can schedule
>> appropriately. For the FPU, issue-rate/latency are as follows:
>>   GR FPU: 1/4, with FDIV 16 and FSQRT 24 cycles,
>> non-pipelined on separate unit
>>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>> non-pipelined on same unit
> 
> Let's not make this too complex for a first try, the settings used at AdaCore 
> seem a good starting point to me.
> 

I agree.



Re: Adding Leon processor to the SPARC list of processors

2010-11-22 Thread Konrad Eisele
Hi,
Appended is a new patch, this time against svn://gcc.gnu.org/svn/gcc/trunk.

Following the recent comments by Eric, the patch now sketches the
following setup:

If a multilib is wanted:
 configure --with-cpu=leon ... : creates multilib dirs for the soft|v8
   combinations using [-msoft-float|-mcpu=sparcleonv8]
   (MULTILIB_OPTIONS = msoft-float mcpu=sparcleonv8)

If a single lib is wanted:
 configure --with-cpu=sparcleonv7 --with-float=soft --disable-multilib ... : (v7 | soft | no-multilib)
 configure --with-cpu=sparcleonv8 --with-float=soft --disable-multilib ... : (v8 | soft | no-multilib)
 configure --with-cpu=sparcleonv7 --with-float=hard --disable-multilib ... : (v7 | hard | no-multilib)
 configure --with-cpu=sparcleonv8 --with-float=hard --disable-multilib ... : (v8 | hard | no-multilib)

With --with-cpu=leon|sparcleonv7|sparcleonv8, sparc_cpu is switched to
PROCESSOR_LEON.
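Because each space-separated entry in MULTILIB_OPTIONS is an independent on/off choice, the two entries above yield 2 x 2 = 4 library variants. A small sketch of that enumeration, assuming directory names "soft" and "v8" (taken from the "soft|v8" wording above; a real t-leon fragment would set them with MULTILIB_DIRNAMES) and assuming the names are joined in MULTILIB_OPTIONS order; leon_multilib_dir is a hypothetical helper, not part of the patch.

```c
#include <string.h>

/* Assumed directory names for the MULTILIB_OPTIONS entries
   msoft-float and mcpu=sparcleonv8, in that order. */
static const char *const leon_multilib_dirnames[] = { "soft", "v8" };
enum { LEON_MULTILIB_OPTIONS = 2 };

/* Bit i of mask says whether option i is in effect.  Writes the
   multilib directory into buf; "." stands for the default library
   built without any of the options. */
static void
leon_multilib_dir (unsigned mask, char *buf, size_t len)
{
  if (len == 0)
    return;
  buf[0] = '\0';
  for (int i = 0; i < LEON_MULTILIB_OPTIONS; i++)
    if (mask & (1u << i))
      {
        if (buf[0] != '\0')
          strncat (buf, "/", len - strlen (buf) - 1);
        strncat (buf, leon_multilib_dirnames[i], len - strlen (buf) - 1);
      }
  if (buf[0] == '\0' && len > 1)
    strcpy (buf, ".");
}
```

Under these assumptions, compiling with -msoft-float -mcpu=sparcleonv8 (mask 3) would select the "soft/v8" libraries, and plain compiles would use the default directory.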

If this scheme is ok, I'll test it more thoroughly to check that the various
versions create the right output...
Please comment.

-- Greetings Konrad



Konrad Eisele wrote:
> Hello,
> Jiri Gaisler has now signed the FSF copyright assignment (it took quite long to get
> through the procedure) and I was told that I could post the patches
> now.
> 
> The patches are straightforward I think.
> 1. Adds machine description gcc-4.4.2/gcc/config/sparc/leon.md
> 2. gcc-4.4.2.ori/gcc/config/sparc/sparc.c:
>    + adds leon_costs struct.
>    + 4 target CPUs are added:
>      sparchfleon  : hard float v7
>      sparchfleonv8: hard float v8
>      sparcsfleon  : soft float v7
>      sparcsfleonv8: soft float v8
>    + 1 cpu type: PROCESSOR_LEON
>      that is called "leon" in sparc.md
> 3. gcc-4.4.2.ori/gcc/config/sparc/sparc.h:
>    add the 4 target cpu defines
> 4. gcc-4.4.2.ori/gcc/config/sparc/sparc.md:
>    define cpu "leon" and include "leon.md"
> 5. gcc-4.4.2/gcc/config/sparc/t-leon:
>    makefile template for leon
> 6. gcc-4.4.2/gcc/config.gcc:
>    include t-leon for sparc[sf|hf]leon[v8].
> 
> They don't interfere with current code. If I should change something,
> please let me know, or maybe there is something I didn't think of...
> 
>> Leon is a conforming implementation of the SPARC V7/V8 architecture so it
>> should be possible to support it alongside the other SPARC implementations in
>> the SPARC back-end of the mainline compiler.  I'd be happy to review patches
>> to this effect (and I presume the other SPARC maintainers are OK with this).
>>
>> So I'd suggest that Luís Vitório and/or Konrad do the required paperwork, and
>> then start to post their patches on the gcc-patches@ list.  I'll sponsor them
>> for write access at that point.
>>
>> -- Eric Botcazou
> 
> I come back to the offer of Eric: if the patches are approved I'd be
> grateful if you could check them in.
> 
> -- Thanks Konrad
> 
> 
> 
> To verify (if someone is interested):
> I have created a crosstool-ng based install script that will build the 4
> sparc-leon cross-compilers:
> 
> $ wget ftp://gaisler.com/gaisler.com/linux/linuxbuild/linuxbuild-0.0.3.tar.bz2
> $ tar xvf linuxbuild-0.0.3.tar.bz2
> $ cd linuxbuild-0.0.3
> $ make help
> $ make cts
> 
> This will create /opt/sparc-linux-toolchains/{hfleon,hfleonv8,sfleon,sfleonv8}
> (Write permissions are needed for /opt/sparc-linux-toolchains/).
> 
> The crosstool-ng script uses --with-cpu=sparc[sf|hf]leon[v8] to select
> the desired proc type.
> 
> 
> 
> 
> 


Index: gcc/gcc/config.gcc
===
--- gcc/gcc/config.gcc	(revision 167027)
+++ gcc/gcc/config.gcc	(working copy)
@@ -3437,6 +3437,9 @@
 			| v9 | ultrasparc | ultrasparc3 | niagara | niagara2)
 # OK
 ;;
+			sparcleonv7 | sparcleonv8 | leon)
+tmake_file="${tmake_file} sparc/t-leon"
+;;
 			*)
 echo "Unknown cpu used in --with-$which=$val" 1>&2
 exit 1
Index: gcc/gcc/config/sparc/sparc.md
===
--- gcc/gcc/config/sparc/sparc.md	(revision 167027)
+++ gcc/gcc/config/sparc/sparc.md	(working copy)
@@ -103,6 +103,7 @@
   "v7,
cypress,
v8,
+   leon,
supersparc,
sparclite,f930,f934,
hypersparc,sparclite86x,
@@ -344,6 +345,7 @@
 (include "ultra3.md")
 (include "niagara.md")
 (include "niagara2.md")
+(include "leon.md")
 
 
 ;; Operand and operator predicates and constraints
Index: gcc/gcc/config/sparc/sparc.c
===
--- gcc/gcc/config/sparc/sparc.c	(revision 167027)
+++ gcc/gcc/config/sparc/sparc.c	(working copy)
@@ -249,6 +249,30 @@
   0, /* shift penalty */
 };
 
+static const
+struct processor_costs leon_costs = {
+  COSTS_N_INSNS (1), /* int load */
+  COSTS_N_INSNS (1), /* int signed load */
+  COSTS_N_INSNS (1), /* int zeroed load */
+  COSTS_N_INSNS (1), /* float load */
+  COSTS_N_INSNS (1), /* fmov, fneg, fabs */
+  COSTS_N_INSNS (1), /* fadd, fsub */
+  COSTS_N_INSNS (1), /* fcmp */
+  COSTS_N_INSNS (1), 

more robust debug_bb?

2010-11-22 Thread Basile Starynkevitch
Hello All,

While debugging a MELT pass, I get a SIGSEGV in debug_bb.

The culprit is check_bb_profile, which starts with
  if (profile_status == PROFILE_ABSENT)
    return;
and in basic-block.h we have
#define profile_status  (cfun->cfg->x_profile_status)

and unfortunately my pass is a SIMPLE IPA pass, so there is no cfun.

Would a patch defining

#define profile_status  (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)

be acceptable?

Cheers.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: more robust debug_bb?

2010-11-22 Thread Richard Guenther
On Mon, Nov 22, 2010 at 4:43 PM, Basile Starynkevitch
 wrote:
> Hello All,
>
> While debugging a MELT pass, I get a SIGSEGV in debug_bb.
>
> The culprit is check_bb_profile which starts with
>  if (profile_status == PROFILE_ABSENT)
>    return;
> and we have in basic-block.h
> #define profile_status          (cfun->cfg->x_profile_status)
>
> and unfortunately my pass is a SIMPLE IPA pass, so there is no cfun.
>
> Would a patch defining
>
> #define profile_status  (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)
>
> be acceptable?

Huh, no.  Just watch where you are calling debug_bb from (or
even better, rewrite it with python).

Richard.



Re: more robust debug_bb?

2010-11-22 Thread Basile Starynkevitch
On Mon, 22 Nov 2010 17:28:21 +0100
Richard Guenther  wrote:

> On Mon, Nov 22, 2010 at 4:43 PM, Basile Starynkevitch
>  wrote:
> > Hello All,
> >
> > While debugging a MELT pass, I get a SIGSEGV in debug_bb.
> >
> > The culprit is check_bb_profile which starts with
> >  if (profile_status == PROFILE_ABSENT)
> >    return;
> > and we have in basic-block.h
> > #define profile_status          (cfun->cfg->x_profile_status)
> >
> > and unfortunately my pass is a SIMPLE IPA pass, so there is no cfun.
> >
> > Would a patch defining
> >
> > #define profile_status  (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)
> >
> > be acceptable?
> 
> Huh, no.  Just watch where you are calling debug_bb from (or
> even better, rewrite it with python).


I'm calling debug_bb from MELT code, so python is not really possible
(and I guess you mean python inside gdb). [the details are probably in
comments inside the gcc/testsuite/melt/topengpu-1.c file of the MELT
branch and the debug_bb is called from opengpudetect_exec function
inside gcc/melt/xtramelt-opengpu.melt rev 167035 of the melt branch]

My feeling is that debug printing routines are not mainly for gurus
like Richard Guenther or Diego Novillo or any other global reviewer or
top GCC expert (I would imagine neither Richard nor Diego needs them or
uses them), but for newbies.  And newbies make bugs. We can't change
that sad fact (unless you consider unethical solutions like killing
newbies, but I am against such solutions :-) because I might be the
victim!).

And debug printing is a common way to help find bugs.

Also, by definition, debug printing (& also dump printing) routines are
never called when GCC is used normally. They are only useful for people
hunting bugs (e.g. inside their plugins) or for newbies trying to
understand the internals of GCC.

A huge "production" compilation (e.g. by people compiling the kernel,
LibreOffice, Mozilla or Google proprietary code) never calls any debug_*
debug-printing or dump_* dump-printing routine.

But people (e.g. newbies) making bugs are, perhaps naively, expecting
the debug printing routines to be robust, and not crash when given data
coming from inside GCC.

Or can we consider adding something like
  if (!cfun)
    return;
at the start of check_bb_profile () in cfg.c, or at least replacing the last
  check_bb_profile (bb, buffer->buffer->stream);
in function dump_bb_header of gimple-pretty-print.c with
  if (cfun)
    check_bb_profile (bb, buffer->buffer->stream);

If we require a valid cfun from debug_bb, we should at least add a
gcc_assert in it.
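A self-contained mock of the crash and of the call-site guard proposed above, for illustration only: the types are hypothetical stand-ins reduced to the fields involved, not GCC's real declarations. Only cfun, PROFILE_ABSENT, check_bb_profile's early test and the body of the profile_status macro come from the messages; the counter exists just so the behaviour can be checked.

```c
/* Hypothetical stand-ins for GCC's types, reduced to what matters here. */
enum mock_profile_status { PROFILE_ABSENT, PROFILE_READ };
struct control_flow_graph { enum mock_profile_status x_profile_status; };
struct function { struct control_flow_graph *cfg; };

/* The current function; NULL while a SIMPLE IPA pass is running. */
static struct function *cfun;

/* Same shape as the basic-block.h macro: dereferences cfun unconditionally. */
#define profile_status (cfun->cfg->x_profile_status)

/* Sketch-only bookkeeping: how many times the profile check ran. */
static int profile_checks_run;

static void check_bb_profile_sketch (void)
{
  /* Crashes here if cfun == NULL, which is the reported SIGSEGV.  The
     assert alternative would put gcc_assert (cfun) before this test. */
  if (profile_status == PROFILE_ABSENT)
    return;
  profile_checks_run++;
}

/* Guarded call site, as proposed for dump_bb_header: the header is still
   printed, but the profile check only runs when there is a cfun. */
static void dump_bb_header_sketch (void)
{
  /* ... printing of the block header would go here ... */
  if (cfun)
    check_bb_profile_sketch ();
}
```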

BTW, I am just trying to understand how to code a SIMPLE IPA pass (and
where to place it).


Please GCC gurus, accept the sad fact that some people (me included)
understand GCC less than you, and precisely these people need debug
printing. You probably don't need these routines, they are to help
newbies! So they need to be reasonably robust. Don't forget that
newbies make bugs, more than you do.

Otherwise, please add at least a comment in header files explaining
when a debug_* or dump_* routine can be used (and even more when it
cannot be used)! I am not able to explain that!

Cheers.
-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: more robust debug_bb?

2010-11-22 Thread Ian Lance Taylor
Basile Starynkevitch  writes:

> or at least replacing the last
>  check_bb_profile (bb, buffer->buffer->stream);
> in function dump_bb_header of gimple-pretty-print.c with
>  if (cfun)
>  check_bb_profile (bb, buffer->buffer->stream);

I think something like this is the way to go.  The debugging code should
be as robust as possible.

Ian


Re: more robust debug_bb?

2010-11-22 Thread Diego Novillo
On Mon, Nov 22, 2010 at 12:31, Ian Lance Taylor  wrote:
> Basile Starynkevitch  writes:
>
>> or at least replacing the last
>>      check_bb_profile (bb, buffer->buffer->stream);
>> in function dump_bb_header of gimple-pretty-print.c with
>>      if (cfun)
>>              check_bb_profile (bb, buffer->buffer->stream);
>
> I think something like this is the way to go.  The debugging code should
> be as robust as possible.

Agreed.


Diego.


Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread David Mathog
My software implementation of SSE2 now passes all the testsuite
programs. In case anybody else ever needs this, it is here: 

http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h

I compiled that with a target program and gprof showed that all the time
in the resulting binary was spent in the inlined functions.  It ran about
4X slower than the SSE2 hardware version, which is about what I
expected.  So far so good.  What I am worried about now is that
since it was invoked with "-msse2" the compiler may still be generating
SSE2 instructions within the inlined functions.  Is there a way to definitively
disable this but still retain -msse2 on the command line?  

For instance, here is one of the software version inline functions:

/*  vector subtract the two doubles in an __m128d  */
static __inline __m128d __attribute__((__always_inline__))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}

In the original gcc emmintrin.h that called a builtin _explicitly_.  I
also want to avoid having the compiler use the same builtin
_implicitly_.  If it uses SSE, 3DNOW or MMX implicitly, in this example,
that would be fine, it just cannot use any SSE2 hardware.

Actually, one thing I was never very clear on, do -msse2 -m3dnow
etc. only provide access to the corresponding machine operations through
the _mm* (or whatever) definitions in the header file, or does the
compiler also figure out vector operations by itself during the
optimization phase of compilation?

Thank you,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


Re: Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread Ian Lance Taylor
"David Mathog"  writes:

> I compiled that with a target program and gprof showed that all the time
> in the resulting binary was spent in the inlined functions.  It ran about
> 4X slower than the SSE2 hardware version, which is about what I
> expected.  So far so good.  What I am worried about now is that
> since it was invoked with "-msse2" the compiler may still be generating
> SSE2 instructions within the inlined functions.  Is there a way to definitively
> disable this but still retain -msse2 on the command line?  

No.  If I understand what you are doing, I don't think you want to use
-msse2 at all.  In fact I think you want -mno-sse2.

> Actually, one thing I was never very clear on, do -msse2 -m3dnow
> etc. only provide access to the corresponding machine operations through
> the _mm* (or whatever) definitions in the header file, or does the
> compiler also figure out vector operations by itself during the
> optimization phase of compilation?

The latter: the compiler also figures out vector operations by itself,
particularly if you use the -ftree-vectorize option.
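To illustrate the point, here is plain C with no intrinsics at all. Whether GCC vectorizes it depends on the version and flags: with -ftree-vectorize (which newer releases also enable at -O3) and -msse2 in effect, it may compile the loop to packed SSE2 subtractions, whereas with -mno-sse2 it cannot use SSE2 for it. The function name is illustrative.

```c
#include <stddef.h>

/* Element-wise difference of two arrays of doubles.  No intrinsics are
   used; any SSE2 code here would come from the vectorizer. */
void sub_doubles (double *restrict out, const double *a,
                  const double *b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = a[i] - b[i];
}
```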

Ian


Re: Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread Xinliang David Li
As Ian said, you want to make your emulation inline functions
available only when the __SSE2__ macro is not defined, so that you get the
definitions when -msse2 is not specified, but not when
-msse2 is specified. In the future, gcc may be enhanced to expose
those mm intrinsics unconditionally (regardless of whether -msseX is
specified or not); then you may have a problem here due to name conflicts...
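A minimal sketch of this suggestion using GCC's vector extensions: the real intrinsic is used only when __SSE2__ is defined, and generic vector arithmetic otherwise. To sidestep the name-conflict problem, the wrapper uses its own hypothetical names (vec2d, vec2d_make, vec2d_sub, vec2d_store) instead of redefining _mm_sub_pd. Note that GCC may itself lower the generic "a - b" to SSE2 when SSE2 is enabled, so the fallback only stays free of SSE2 when compiled with -mno-sse2.

```c
#include <string.h>

#ifdef __SSE2__
#include <emmintrin.h>
typedef __m128d vec2d;
#else
/* GCC generic vector type with the same size and layout as __m128d. */
typedef double vec2d __attribute__ ((vector_size (16)));
#endif

static inline vec2d vec2d_make (double lo, double hi)
{
  vec2d v = { lo, hi };
  return v;
}

static inline vec2d vec2d_sub (vec2d a, vec2d b)
{
#ifdef __SSE2__
  return _mm_sub_pd (a, b);   /* hardware path */
#else
  return a - b;               /* software path via generic vectors */
#endif
}

/* Copy the two lanes out to a plain array, lane 0 first. */
static inline void vec2d_store (double out[2], vec2d v)
{
  memcpy (out, &v, 2 * sizeof (double));
}
```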

David



Re: Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread David Mathog
Ian Lance Taylor wrote:

> No.  If I understand what you are doing, I don't think you want to use
> -msse2 at all.  In fact I think you want -mno-sse2.

Following your suggestion, -mno-sse2 was tried, which generated an error
message well beyond my comprehension:

gcc -std=gnu99 -g -pg -pthread -O4 -DSOFT_SSE2 -msse -mno-sse2 
-DHAVE_CONFIG_H  -I../../easel -I../../easel -I. -I.. -I. -I../../src -o
msvfilter.o -c msvfilter.c
msvfilter.c: In function 'p7_MSVFilter':
msvfilter.c:208: error: unable to find a register to spill in class
'GENERAL_REGS'
msvfilter.c:208: error: this is the insn:
(insn:HI 3569 3568 3570 302 ../../easel/emmintrin.h:2334 (set
(strict_low_part (subreg:HI (reg:TI 1514) 0))
(mem:HI (plus:SI (reg/f:SI 20 frame)
(const_int -30 [0xffe2])) [14 S2 A16])) 40
{*movstricthi_1} (insn_list:REG_DEP_TRUE 3568 (nil))
(nil))
msvfilter.c:208: confused by earlier errors, bailing out
make: *** [msvfilter.o] Error 1

line 208 in msvfilter.c is the closing "}" on the p7_MSVFilter function.

line 2334 in emmintrin.h is the return statement in the snippet below

static __inline __m128i __attribute__((__always_inline__))
_mm_shufflelo_epi16(__m128i __A, int __B){
  __v8hi __tmp = { EMM_UINT2(__A)[__B & 3],
                   EMM_UINT2(__A)[__B>>2 & 3],
                   EMM_UINT2(__A)[__B>>4 & 3],
                   EMM_UINT2(__A)[__B>>6 & 3],
                   EMM_UINT2(__A)[4],
                   EMM_UINT2(__A)[5],
                   EMM_UINT2(__A)[6],
                   EMM_UINT2(__A)[7] };
  return (__m128i) __tmp;
}

where EMM_UINT2 is this:

#define EMM_UINT2(a)   ((unsigned short *)&(a))
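For comparison, the same shuffle can be written so that the vector is copied to and from a plain array with memcpy instead of being accessed through an unsigned short pointer, which sidesteps aliasing questions. This is a sketch using a GCC generic vector as a hypothetical stand-in for the software __m128i (soft_m128i and soft_shufflelo_epi16 are illustrative names); whether it avoids the reload failure shown above with GCC 4.2.3 is untested.

```c
#include <stdint.h>
#include <string.h>

/* 128-bit GCC generic vector standing in for the software __m128i. */
typedef long long soft_m128i __attribute__ ((vector_size (16)));

/* Same semantics as the emulation above: words 0-3 of the result are
   picked from words 0-3 of a by the 2-bit fields of imm, and words 4-7
   are copied unchanged. */
static inline soft_m128i
soft_shufflelo_epi16 (soft_m128i a, int imm)
{
  uint16_t in[8], out[8];
  soft_m128i result;

  memcpy (in, &a, sizeof in);
  for (int i = 0; i < 4; i++)
    out[i] = in[(imm >> (2 * i)) & 3];
  for (int i = 4; i < 8; i++)
    out[i] = in[i];
  memcpy (&result, out, sizeof result);
  return result;
}
```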

If -mno-sse2 is changed to -msse2 that compile completes without errors
or warnings.

gcc --version is: gcc (GCC) 4.2.3 (4.2.3-6mnb1)

What does that compiler error mean?

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


Re: Method to disable code SSE2 generation but still use -msse2

2010-11-22 Thread Ian Lance Taylor
"David Mathog"  writes:

> Following your suggestion, -mno-sse2 was tried, which generated an error
> message well beyond my comprehension:
>
> gcc -std=gnu99 -g -pg -pthread -O4 -DSOFT_SSE2 -msse -mno-sse2 
> -DHAVE_CONFIG_H  -I../../easel -I../../easel -I. -I.. -I. -I../../src -o
> msvfilter.o -c msvfilter.c
> msvfilter.c: In function 'p7_MSVFilter':
> msvfilter.c:208: error: unable to find a register to spill in class
> 'GENERAL_REGS'
> msvfilter.c:208: error: this is the insn:
> (insn:HI 3569 3568 3570 302 ../../easel/emmintrin.h:2334 (set
> (strict_low_part (subreg:HI (reg:TI 1514) 0))
> (mem:HI (plus:SI (reg/f:SI 20 frame)
> (const_int -30 [0xffe2])) [14 S2 A16])) 40
> {*movstricthi_1} (insn_list:REG_DEP_TRUE 3568 (nil))
> (nil))
> msvfilter.c:208: confused by earlier errors, bailing out
> make: *** [msvfilter.o] Error 1

This means that gcc was unable to store a __m128i value in the general
purpose registers.  It did not want to use the SSE2 registers because
you ruled out -msse2, which I assume is correct behaviour for what you
are trying to do.  It does seem likely that SSE2 code will stress out
the register allocator if it can't use the SSE2 registers.  That said, I
don't know offhand whether this is a bug or whether the scenario is
simply impossible to implement.

Ian