Re: Adding Leon processor to the SPARC list of processors
Geert Bosch wrote:
> On Nov 19, 2010, at 11:53, Eric Botcazou wrote:
>
>>> Yes, if all the people who want only one set of libraries agree on what
>>> that set shall be (or this can be selected with existing configure flags),
>>> this is the simplest way.
>>
>> Yes, this can be selected at configure time with --with-cpu and --with-float.
>>
>> The default configuration is also straightforward: LEON is an implementation
>> of the SPARC-V8 architecture so --with-cpu=v8 and --with-float=hard.
>
> There is LEON2, which is V7, and LEON3/LEON4, which are V8.
> While LEON3 can support all of V8 in hardware, LEON3 is a
> configurable system-on-a-chip, targeting both FPGAs and ASICs,
> where users can configure and synthesize different aspects of
> the CPU:
>
> * CONFIG_PROC_NUM: The number of processor cores.
>
> * CONFIG_IU_V8MULDIV: Implements V8 multiply and divide instructions
>   UMUL, UMULCC, SMUL, SMULCC, UDIV, UDIVCC, SDIV, SDIVCC.
>   Costs about 8k gates.
>
> * CONFIG_IU_MUL_MAC: Implements the SPARC V8e UMAC/SMAC
>   (multiply-accumulate) instructions with a 40-bit accumulator.
>
> * CONFIG_FPU_ENABLE: Enable or disable floating point unit.
>
> Apart from these settings that determine whether instructions are
> present at all, other settings allow selection of FPU implementation
> (trading off between cycle count, area and timing), such as:
>
> * CONFIG_IU_MUL_LATENCY_2: Implementation options for the integer multiplier.
>     Type      Implementation              issue-rate/latency
>     2-clocks  32x32 pipelined multiplier  1/2
>     4-clocks  16x16 standard multiplier   4/4
>     5-clocks  16x16 pipelined multiplier  4/5
>
> * CONFIG_IU_LDELAY: One cycle load delay for best performance, or 2-cycles
>   to improve timing at the cost of about 5% reduced performance.
>
> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
> is that correct? I think it would make sense to build these as multilibs,
> so the user can experiment to find out performance impacts of
> the various hardware configurations on generated code.
>
> I wonder if it also would be worthwhile to have compiler options
> for fpu=fast/slow and multiply=fast/slow, so we can schedule
> appropriately. For the FPU, issue-rate/latency are as follows:
>   GR FPU: 1/4, with FDIV? 16 and FSQRT? 24 cycles,
>           non-pipelined on separate unit
>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>           non-pipelined on same unit
>
> While the FPU Lite is not pipelined, integer instructions can be
> executed in parallel with a FPU instruction as long as no new FPU
> instructions are pending.
>
> -Geert

Just a humble opinion: Geert points out a very important fact. LEON's RTL
is very configurable, and it would be a bit of a pity if the compiler took
that flexibility away. Maybe the user should always have the choice of
implementing any given feature in software or in hardware.

Jorge
Re: Adding Leon processor to the SPARC list of processors
Eric Botcazou wrote:
>> How do you see this impacting the sparc-rtems target?
>>
>> We have v7/v8 with HW and SW FP multilibs now and
>> leon is important to us. :-D
>
> Note that LEON will also be available as mere default cpu, i.e. you'll be able
> to configure sparc-rtems --with-tune=leon. The new multilib stuff is for the
> default target sparc-leon-elf (and maybe sparc-leon-linux if we want one).

I agree. The patch I submitted only adds some extras. It shouldn't have an
impact on the sparc-rtems target (or others).
Re: Adding Leon processor to the SPARC list of processors
Eric Botcazou wrote:
>> Yes, if all the people who want only one set of libraries agree on what
>> that set shall be (or this can be selected with existing configure flags),
>> this is the simplest way.
>
> Yes, this can be selected at configure time with --with-cpu and --with-float.
>
> The default configuration is also straightforward: LEON is an implementation
> of the SPARC-V8 architecture so --with-cpu=v8 and --with-float=hard.
>
>> Also, it might happen that someone doesn't want one multilib dimension, but
>> they want to keep another one.
>
> Indeed, being able to partially disable multilibs would be nice.

I would suggest a simple solution:
I can have 5 --with-cpu configure possibilities:

1. single-lib explicit selection:
   - --with-cpu=sfsparcleon   : v7/soft
   - --with-cpu=sfsparcleonv8 : v8/soft
   - --with-cpu=hfsparcleon   : v7/hard
   - --with-cpu=hfsparcleonv8 : v8/hard

2. generic multilib:
   - --with-cpu=leon : defaults to v7/hard
     use [-mcpu=v8 / -msoft-float] at compile-time to select the
     hardware setting.

Is this a practical approach? It would only require one extra file, say
"gcc/sparc/config/t-leon-multilib", that enables multilib and is included
by configure when --with-cpu=leon is given. I'll prepare a patch that
provides such a setup.

-- Greetings Konrad
Re: Adding Leon processor to the SPARC list of processors
> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
> is that correct? I think it would make sense to build these as multilibs,
> so the user can experiment to find out performance impacts of
> the various hardware configurations on generated code.
>
> I wonder if it also would be worthwhile to have compiler options
> for fpu=fast/slow and multiply=fast/slow, so we can schedule
> appropriately. For the FPU, issue-rate/latency are as follows:
>   GR FPU: 1/4, with FDIV? 16 and FSQRT? 24 cycles,
>           non-pipelined on separate unit
>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>           non-pipelined on same unit

Let's not make this too complex for a first try, the settings used at AdaCore
seem a good starting point to me.

-- Eric Botcazou
Re: Adding Leon processor to the SPARC list of processors
>> * CONFIG_IU_MUL_LATENCY_2: Implementation options for the integer multiplier.
>>     Type      Implementation              issue-rate/latency
>>     2-clocks  32x32 pipelined multiplier  1/2
>>     4-clocks  16x16 standard multiplier   4/4
>>     5-clocks  16x16 pipelined multiplier  4/5

I'm not sure how I should model this in gcc. I'm not that familiar with
the gcc internals. Maybe someone could assist me?

>>   GR FPU: 1/4, with FDIV? 16 and FSQRT? 24 cycles,
>>           non-pipelined on separate unit
>>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>>           non-pipelined on same unit

I could add a tune option that would switch the processor cost struct
for FPU/FPU-lite.

-- Greetings Konrad
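
Konrad's idea of a tune option that swaps the cost table could look roughly like the sketch below. It is only an illustration: `struct leon_fpu_costs`, `leon_select_fpu_costs` and the option strings "grfpu" and "grfpu-lite" are hypothetical names, not GCC's actual `struct processor_costs` or any existing -mtune value. The cycle counts are the ones Geert quoted; reading "FDIV? 16" as 16 cycles for both FDIVS and FDIVD (and likewise for FSQRT) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified cost record. GCC's real struct processor_costs
   in sparc.c has different fields; this only mirrors the numbers quoted
   in the thread. */
struct leon_fpu_costs {
  const char *name;
  int issue_rate;  /* cycles between two issued FPU operations */
  int latency;     /* cycles until a basic FPU result is available */
  int fdivs;       /* single-precision divide, cycles */
  int fdivd;       /* double-precision divide, cycles */
  int fsqrts;      /* single-precision square root, cycles */
  int fsqrtd;      /* double-precision square root, cycles */
};

/* GR FPU: 1/4, FDIV? 16, FSQRT? 24 (assumed identical for S and D). */
const struct leon_fpu_costs grfpu_costs = { "grfpu", 1, 4, 16, 16, 24, 24 };

/* GR FPU Lite: 8/8, FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57. */
const struct leon_fpu_costs grfpu_lite_costs =
  { "grfpu-lite", 8, 8, 31, 57, 46, 57 };

/* Map a (hypothetical) tune string to its cost table.
   Returns NULL for an unknown string. */
const struct leon_fpu_costs *
leon_select_fpu_costs (const char *tune)
{
  if (tune == NULL)
    return NULL;
  if (strcmp (tune, grfpu_costs.name) == 0)
    return &grfpu_costs;
  if (strcmp (tune, grfpu_lite_costs.name) == 0)
    return &grfpu_lite_costs;
  return NULL;
}
```

A scheduler description would still be needed to model the pipelined versus non-pipelined behaviour; a cost table like this only influences instruction selection and expansion choices.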
Re: Adding Leon processor to the SPARC list of processors
> I would suggest a simple solution:
> I can have 5 --with-cpu configure possibilities:
>
> 1. single-lib explicit selection:
>    - --with-cpu=sfsparcleon   : v7/soft
>    - --with-cpu=sfsparcleonv8 : v8/soft
>    - --with-cpu=hfsparcleon   : v7/hard
>    - --with-cpu=hfsparcleonv8 : v8/hard

--with-cpu isn't really appropriate for this, we already have --with-cpu=v7/v8
and --with-float=soft/hard and --disable-multilib.

> 2. generic multilib:
>    - --with-cpu=leon : defaults to v7/hard
>      use [-mcpu=v8 / -msoft-float]
>      at compile-time to select the hardware
>      setting.

--with-cpu shouldn't change multilibs. Multilibs are a property of a target,
e.g. sparc-leon-elf or sparc-rtems, not that of a cpu setting.

-- Eric Botcazou
Re: Adding Leon processor to the SPARC list of processors
Eric Botcazou wrote:
>> I would suggest a simple solution:
>> I can have 5 --with-cpu configure possibilities:
>>
>> 1. single-lib explicit selection:
>>    - --with-cpu=sfsparcleon   : v7/soft
>>    - --with-cpu=sfsparcleonv8 : v8/soft
>>    - --with-cpu=hfsparcleon   : v7/hard
>>    - --with-cpu=hfsparcleonv8 : v8/hard
>
> --with-cpu isn't really appropriate for this, we already have
> --with-cpu=v7/v8 and --with-float=soft/hard and --disable-multilib.

Still, I need to select sparc_cpu and leon.md too. I could then add
-mtune=leon at compile time to switch sparc_cpu, but then I have to give
-mtune=leon every time. I would like to be able to make it the default.
With just
[ --with-cpu=v7/v8 | --with-float=soft/hard | --disable-multilib ]
to configure, you can't.

So my suggestion would be to use the triple
[ --with-cpu=sparcleonv7/sparcleonv8 | --with-float=soft/hard | --disable-multilib ]
to configure, and to add the 2 cpu types sparcleonv7 and sparcleonv8 that
would replace v7/v8. Does this sound good?

-- Greetings Konrad
Re: Adding Leon processor to the SPARC list of processors
Eric Botcazou wrote:
>> CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and
>> I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=V8/V7,
>> is that correct? I think it would make sense to build these as multilibs,
>> so the user can experiment to find out performance impacts of
>> the various hardware configurations on generated code.
>>
>> I wonder if it also would be worthwhile to have compiler options
>> for fpu=fast/slow and multiply=fast/slow, so we can schedule
>> appropriately. For the FPU, issue-rate/latency are as follows:
>>   GR FPU: 1/4, with FDIV? 16 and FSQRT? 24 cycles,
>>           non-pipelined on separate unit
>>   GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD 31/57/46/57 cycles,
>>           non-pipelined on same unit
>
> Let's not make this too complex for a first try, the settings used at AdaCore
> seem a good starting point to me.

I agree.
Re: Adding Leon processor to the SPARC list of processors
Hi,

Appended is a new patch, this time against svn://gcc.gnu.org/svn/gcc/trunk.
Following the recent comments by Eric, the patch now sketches the
following setup:

If multi-lib is wanted:
  configure --with-cpu=leon ...
    : creates multilib-dir soft|v8 combinations using
      [-msoft-float|-mcpu=sparcleonv8]
      (MULTILIB_OPTIONS = msoft-float mcpu=sparcleonv8)

If single-lib is wanted:
  configure --with-cpu=sparcleonv7 --with-float=soft --disable-multilib ... : (v7 | soft | no-multilib)
  configure --with-cpu=sparcleonv8 --with-float=soft --disable-multilib ... : (v8 | soft | no-multilib)
  configure --with-cpu=sparcleonv7 --with-float=hard --disable-multilib ... : (v7 | hard | no-multilib)
  configure --with-cpu=sparcleonv8 --with-float=hard --disable-multilib ... : (v8 | hard | no-multilib)

Using --with-cpu=leon|sparcleonv7|sparcleonv8, the sparc_cpu is switched
to PROCESSOR_LEON.

If this scheme is ok, I'll test it more thoroughly to check that the
various versions create the right output... Please comment.

-- Greetings Konrad

Konrad Eisele wrote:
> Hello,
> Jiri Gaisler has now signed the FSF copyleft (it took quite long to get
> through the procedure) and I was told that I could post the patches
> now.
>
> The patches are straightforward I think.
> 1. Adds machine description gcc-4.4.2/gcc/config/sparc/leon.md
> 2. gcc-4.4.2.ori/gcc/config/sparc/sparc.c:
>    + adds leon_costs struct.
>    + 4 target CPUs are added:
>        sparchfleon  : hard float v7
>        sparchfleonv8: hard float v8
>        sparcsfleon  : soft float v7
>        sparcsfleonv8: soft float v8
>    + 1 cpu type: PROCESSOR_LEON
>      that is called "leon" in sparc.md
> 3. gcc-4.4.2.ori/gcc/config/sparc/sparc.h:
>    add the 4 target cpu defines
> 4. gcc-4.4.2.ori/gcc/config/sparc/sparc.md:
>    define cpu "leon" and include "leon.md"
> 5. gcc-4.4.2/gcc/config/sparc/t-leon:
>    makefile template for leon
> 6. gcc-4.4.2/gcc/config.gcc:
>    include t-leon for sparc[sf|hf]leon[v8].
>
> They don't interfere with current code. If I should change something,
> please let me know, or maybe there is something I didn't think of...
>
>> Leon is a conforming implementation of the SPARC V7/V8 architecture so it
>> should be possible to support it alongside the other SPARC implementations in
>> the SPARC back-end of the mainline compiler. I'd be happy to review patches
>> to this effect (and I presume the other SPARC maintainers are OK with this).
>>
>> So I'd suggest that Luís Vitório and/or Konrad do the required paperwork, and
>> then start to post their patches on the gcc-patches@ list. I'll sponsor them
>> for write access at that point.
>>
>> -- Eric Botcazou
>
> I come back to the offer of Eric: if the patches are approved I'd be
> grateful if you could check them in.
>
> -- Thanks Konrad
>
> To verify (if someone is interested):
> I have created a crosstool-ng based install script that will build the 4
> sparc-leon cross-compilers:
>
> $ wget ftp://gaisler.com/gaisler.com/linux/linuxbuild/linuxbuild-0.0.3.tar.bz2
> $ tar xvf linuxbuild-0.0.3.tar.bz2
> $ cd linuxbuild-0.0.3
> $ make help
> $ make cts
>
> This will create /opt/sparc-linux-toolchains/{hfleon,hfleonv8,sfleon,sfleonv8}
> (Write permissions needed for /opt/sparc-linux-toolchains/).
>
> The crosstool-ng script uses --with-cpu=sparc[sf|hf]leon[v8] to select
> the desired proc type.

Index: gcc/gcc/config.gcc
===================================================================
--- gcc/gcc/config.gcc (revision 167027)
+++ gcc/gcc/config.gcc (working copy)
@@ -3437,6 +3437,9 @@
         | v9 | ultrasparc | ultrasparc3 | niagara | niagara2)
                 # OK
                 ;;
+        sparcleonv7 | sparcleonv8 | leon)
+                tmake_file="${tmake_file} sparc/t-leon"
+                ;;
         *)
                 echo "Unknown cpu used in --with-$which=$val" 1>&2
                 exit 1
Index: gcc/gcc/config/sparc/sparc.md
===================================================================
--- gcc/gcc/config/sparc/sparc.md (revision 167027)
+++ gcc/gcc/config/sparc/sparc.md (working copy)
@@ -103,6 +103,7 @@
   "v7,
    cypress,
    v8,
+   leon,
    supersparc,
    sparclite,f930,f934,
    hypersparc,sparclite86x,
@@ -344,6 +345,7 @@
 (include "ultra3.md")
 (include "niagara.md")
 (include "niagara2.md")
+(include "leon.md")
 
 ;; Operand and operator predicates and constraints
 
Index: gcc/gcc/config/sparc/sparc.c
===================================================================
--- gcc/gcc/config/sparc/sparc.c (revision 167027)
+++ gcc/gcc/config/sparc/sparc.c (working copy)
@@ -249,6 +249,30 @@
   0, /* shift penalty */
 };
 
+static const
+struct processor_costs leon_costs = {
+  COSTS_N_INSNS (1), /* int load */
+  COSTS_N_INSNS (1), /* int signed load */
+  COSTS_N_INSNS (1), /* int zeroed load */
+  COSTS_N_INSNS (1), /* float load */
+  COSTS_N_INSNS (1), /* fmov, fneg, fabs */
+  COSTS_N_INSNS (1), /* fadd, fsub */
+  COSTS_N_INSNS (1), /* fcmp */
+  COSTS_N_INSNS (1),
more robust debug_bb?
Hello All,

While debugging a MELT pass, I am getting a SIGSEGV in debug_bb.

The culprit is check_bb_profile, which starts with
   if (profile_status == PROFILE_ABSENT)
     return;
and we have in basic-block.h
   #define profile_status (cfun->cfg->x_profile_status)

and unfortunately, my pass is a SIMPLE IPA pass, so it doesn't have any cfun.

Would a patch defining

   #define profile_status (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)

be acceptable?

Cheers.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: more robust debug_bb?
On Mon, Nov 22, 2010 at 4:43 PM, Basile Starynkevitch wrote:
> Hello All,
>
> While debugging a MELT pass, I am sigsegv in debug_bb.
>
> The culprit is check_bb_profile which starts with
>    if (profile_status == PROFILE_ABSENT)
>      return;
> and we have in basic-block.h
>    #define profile_status (cfun->cfg->x_profile_status)
>
> and unfortunately, my pass is a SIMPLE IPA pass so don't have any cfun.
>
> Would a patch defining
>
>    #define profile_status (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)
>
> be acceptable?

Huh, no. Just watch where you are calling debug_bb from (or
even better, rewrite it with python).

Richard.

> Cheers.
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
Re: more robust debug_bb?
On Mon, 22 Nov 2010 17:28:21 +0100 Richard Guenther wrote:
> On Mon, Nov 22, 2010 at 4:43 PM, Basile Starynkevitch wrote:
> > Hello All,
> >
> > While debugging a MELT pass, I am sigsegv in debug_bb.
> >
> > The culprit is check_bb_profile which starts with
> >    if (profile_status == PROFILE_ABSENT)
> >      return;
> > and we have in basic-block.h
> >    #define profile_status (cfun->cfg->x_profile_status)
> >
> > and unfortunately, my pass is a SIMPLE IPA pass so don't have any cfun.
> >
> > Would a patch defining
> >
> >    #define profile_status (cfun?(cfun->cfg->x_profile_status):PROFILE_ABSENT)
> >
> > be acceptable?
>
> Huh, no. Just watch were you are calling debug_bb from (or
> even better, rewrite it with python).

I'm calling debug_bb from MELT code, so python is not really possible
(and I guess you mean python inside gdb).

[the details are probably in comments inside the
gcc/testsuite/melt/topengpu-1.c file of the MELT branch, and debug_bb
is called from the opengpudetect_exec function inside
gcc/melt/xtramelt-opengpu.melt, rev 167035 of the melt branch]

My feeling is that debug printing routines are not mainly for gurus like
Richard Guenther or Diego Novillo or any other global reviewer or top
GCC expert (I would imagine neither Richard nor Diego needs them or uses
them), but for newbies. And newbies make bugs. We can't change that sad
fact (unless you consider unethical solutions like killing newbies, but
I am against such solutions :-) because I might be the victim!). And
debug printing is a common way to help find bugs.

Also, by definition, debug printing (& also dump printing) routines are
never called when GCC is used normally. They are only useful for people
hunting bugs (e.g. inside their plugins) or for newbies trying to
understand the internals of GCC. A huge "production" compilation (e.g.
by people compiling the kernel, LibreOffice, Mozilla or Google
proprietary code) never calls any debug_* debug-printing or dump_*
dump-printing routine. But people (e.g. newbies) making bugs are,
perhaps naively, expecting the debug printing routines to be robust,
and not to crash when given data coming from inside GCC.

Or can we consider adding something like
   if (!cfun) return;
at the start of check_bb_profile () in cfg.c, or at least replacing the last
   check_bb_profile (bb, buffer->buffer->stream);
in function dump_bb_header of gimple-pretty-print.c with
   if (cfun)
     check_bb_profile (bb, buffer->buffer->stream);

If we require a valid cfun from debug_bb, we should at least add a
gcc_assert in it.

BTW, I am just trying to understand how to code a SIMPLE IPA pass (and
where to place it).

Please, GCC gurus, accept the sad fact that some people (me included)
understand GCC less than you, and precisely these people need debug
printing. You probably don't need these routines; they are there to help
newbies! So they need to be reasonably robust. Don't forget that newbies
make bugs, more than you do.

Otherwise, please add at least a comment in header files explaining when
a debug_* or dump_* routine can be used (and even more when it cannot be
used)! I am not able to explain that!

Cheers.
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***
Re: more robust debug_bb?
Basile Starynkevitch writes:
> or at least replacing the last
>    check_bb_profile (bb, buffer->buffer->stream);
> in function dump_bb_header of gimple-pretty-print.c with
>    if (cfun)
>      check_bb_profile (bb, buffer->buffer->stream);

I think something like this is the way to go. The debugging code should
be as robust as possible.

Ian
Re: more robust debug_bb?
On Mon, Nov 22, 2010 at 12:31, Ian Lance Taylor wrote:
> Basile Starynkevitch writes:
>
>> or at least replacing the last
>>    check_bb_profile (bb, buffer->buffer->stream);
>> in function dump_bb_header of gimple-pretty-print.c with
>>    if (cfun)
>>      check_bb_profile (bb, buffer->buffer->stream);
>
> I think something like this is the way to go. The debugging code should
> be as robust as possible.

Agreed.

Diego.
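
The guard that Basile proposed and Ian and Diego endorsed can be tried outside GCC with a small mock. Everything below is a stand-in written for illustration: the struct layouts are reduced to the one field the thread mentions, `cfun` is an ordinary global rather than GCC's own definition, and `check_bb_profile_sketch` is a hypothetical name that only reports whether the profile check would have run.

```c
#include <stddef.h>
#include <stdio.h>

/* Mock types: only the pieces named in the thread are modelled. */
enum profile_status_d { PROFILE_ABSENT, PROFILE_GUESSED, PROFILE_READ };

struct control_flow_graph {
  enum profile_status_d x_profile_status;
};

struct function {
  struct control_flow_graph *cfg;
};

/* Stand-in for GCC's notion of the current function. It is NULL when
   a pass runs without a function context, as in a simple IPA pass. */
struct function *cfun = NULL;

/* Returns 1 if profile information was checked, 0 if the routine
   bailed out early. The first test is the robustness guard: without
   it, the dereference below would crash when cfun is NULL. */
int
check_bb_profile_sketch (FILE *out)
{
  if (cfun == NULL || cfun->cfg == NULL)
    return 0;
  if (cfun->cfg->x_profile_status == PROFILE_ABSENT)
    return 0;
  fprintf (out, ";; profile information present\n");
  return 1;
}
```

Putting the test inside the routine, rather than in the `profile_status` macro, keeps the macro cheap for normal compilation paths and confines the extra check to debugging code.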
Method to disable code SSE2 generation but still use -msse2
My software implementation of SSE2 now passes all the testsuite
programs. In case anybody else ever needs this, it is here:

http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h

I compiled that with a target program, and gprof showed all the time in
the resulting binary being spent in the inlined functions. It ran about
4X slower than the SSE2 hardware version, which is about what I
expected. So, so far so good. What I am worried about now is that
since it was invoked with "-msse2" the compiler may still be generating
SSE2 calls within the inlined functions. Is there a way to definitively
disable this but still retain -msse2 on the command line?

For instance, here is one of the software version inline functions:

/* vector subtract the two doubles in an __m128d */
static __inline __m128d __attribute__((__always_inline__))
_mm_sub_pd (__m128d __A, __m128d __B)
{
  return (__m128d)((__v2df)__A - (__v2df)__B);
}

In the original gcc emmintrin.h that called a builtin _explicitly_. I
also want to avoid having the compiler use the same builtin
_implicitly_. If it uses SSE, 3DNOW or MMX implicitly, in this example,
that would be fine, it just cannot use any SSE2 hardware.

Actually, one thing I was never very clear on: do -msse2, -m3dnow
etc. only provide access to the corresponding machine operations through
the _mm* (or whatever) definitions in the header file, or does the
compiler also figure out vector operations by itself during the
optimization phase of compilation?

Thank you,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
Re: Method to disable code SSE2 generation but still use -msse2
"David Mathog" writes: > I compiled that with a target program and gprof showed > all the time in resulting binary in the inlined functions. It ran about > 4X slower than the SSE2 hardware version, which is about what I > expected. So, so far so good. What I am worried about now is that > since it was invoked with "-msse2" the compiler may still be generating > SSE2 calls within the inlined functions. Is there a way to definitively > disable this but still retain -msse2 on the command line? No. If I understand what you are doing, I don't think you want to use -msse2 at all. In fact I think you want -mno-sse2. > Actually, one thing I was never very clear on, do -msse2 -m3dnow > etc. only provide access to the corresponding machine operations through > the _mm* (or whatever) definitions in the header file, or does the > compiler also figure out vector operations by itself during the > optimization phase of compilation? The latter: the compiler also figures out vector operations by itself, particularly if you use the -ftree-vectorize option. Ian
Re: Method to disable code SSE2 generation but still use -msse2
As Ian said, you want to make your emulation inline functions available
only when the __SSE2__ macro is not defined, so that you get the
definitions when -msse2 is not specified but do not get them when -msse2
is specified.

In the future, gcc may be enhanced to expose those mm intrinsics
unconditionally (regardless of whether -mssex is specified or not); you
may then have a problem here due to name conflicts...

David

On Mon, Nov 22, 2010 at 2:33 PM, David Mathog wrote:
> My software implementation of SSE2 now passes all the testsuite
> programs. In case anybody else ever needs this, it is here:
>
> http://saf.bio.caltech.edu/pub/software/linux_or_unix_tools/soft_emmintrin.h
>
> I compiled that with a target program and gprof showed
> all the time in resulting binary in the inlined functions. It ran about
> 4X slower than the SSE2 hardware version, which is about what I
> expected. So, so far so good. What I am worried about now is that
> since it was invoked with "-msse2" the compiler may still be generating
> SSE2 calls within the inlined functions. Is there a way to definitively
> disable this but still retain -msse2 on the command line?
>
> For instance, here is one of the software version inline functions:
>
> /* vector subtract the two doubles in an __m128d */
> static __inline __m128d __attribute__((__always_inline__))
> _mm_sub_pd (__m128d __A, __m128d __B)
> {
>   return (__m128d)((__v2df)__A - (__v2df)__B);
> }
>
> In the original gcc emmintrin.h that called a builtin _explicitly_. I
> also want to avoid having the compiler use the same builtin
> _implicitly_. If it uses SSE, 3DNOW or MMX implicitly, in this example,
> that would be fine, it just cannot use any SSE2 hardware.
>
> Actually, one thing I was never very clear on, do -msse2 -m3dnow
> etc. only provide access to the corresponding machine operations through
> the _mm* (or whatever) definitions in the header file, or does the
> compiler also figure out vector operations by itself during the
> optimization phase of compilation?
>
> Thank you,
>
> David Mathog
> mat...@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
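
The `__SSE2__` guard described in this reply might be arranged as in the sketch below. To sidestep the name-conflict concern, the wrappers use their own names: `vec2d`, `vec2d_set`, `vec2d_sub` and `vec2d_store` are hypothetical and do not appear in soft_emmintrin.h. The `_mm_set_pd`, `_mm_sub_pd` and `_mm_storeu_pd` calls are the standard intrinsics from `<emmintrin.h>`; the fallback branch is a plain scalar emulation written for this example.

```c
#ifdef __SSE2__
/* SSE2 enabled (e.g. -msse2, or the default on x86-64):
   forward to the hardware intrinsics. */
#include <emmintrin.h>

typedef __m128d vec2d;

static inline vec2d vec2d_set (double lo, double hi)
{
  /* _mm_set_pd takes the high element first. */
  return _mm_set_pd (hi, lo);
}

static inline vec2d vec2d_sub (vec2d a, vec2d b)
{
  return _mm_sub_pd (a, b);
}

static inline void vec2d_store (double out[2], vec2d v)
{
  _mm_storeu_pd (out, v);
}

#else
/* SSE2 not enabled (e.g. -mno-sse2): scalar emulation. */

typedef struct { double d[2]; } vec2d;

static inline vec2d vec2d_set (double lo, double hi)
{
  vec2d v;
  v.d[0] = lo;
  v.d[1] = hi;
  return v;
}

static inline vec2d vec2d_sub (vec2d a, vec2d b)
{
  vec2d r;
  r.d[0] = a.d[0] - b.d[0];
  r.d[1] = a.d[1] - b.d[1];
  return r;
}

static inline void vec2d_store (double out[2], vec2d v)
{
  out[0] = v.d[0];
  out[1] = v.d[1];
}
#endif
```

Both branches give the same results for the same inputs, so a program written against the wrappers can be compiled either way and compared.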
Re: Method to disable code SSE2 generation but still use -msse2
Ian Lance Taylor wrote:
> No. If I understand what you are doing, I don't think you want to use
> -msse2 at all. In fact I think you want -mno-sse2.

Following your suggestion, -mno-sse2 was tried, which generated an error
message well beyond my comprehension:

gcc -std=gnu99 -g -pg -pthread -O4 -DSOFT_SSE2 -msse -mno-sse2 -DHAVE_CONFIG_H -I../../easel -I../../easel -I. -I.. -I. -I../../src -o msvfilter.o -c msvfilter.c
msvfilter.c: In function 'p7_MSVFilter':
msvfilter.c:208: error: unable to find a register to spill in class 'GENERAL_REGS'
msvfilter.c:208: error: this is the insn:
(insn:HI 3569 3568 3570 302 ../../easel/emmintrin.h:2334 (set (strict_low_part (subreg:HI (reg:TI 1514) 0))
        (mem:HI (plus:SI (reg/f:SI 20 frame)
                (const_int -30 [0xffe2])) [14 S2 A16])) 40 {*movstricthi_1} (insn_list:REG_DEP_TRUE 3568 (nil))
    (nil))
msvfilter.c:208: confused by earlier errors, bailing out
make: *** [msvfilter.o] Error 1

Line 208 in msvfilter.c is the closing "}" of the p7_MSVFilter function.
Line 2334 in emmintrin.h is the return statement in the snippet below:

static __inline __m128i __attribute__((__always_inline__))
_mm_shufflelo_epi16(__m128i __A, int __B){
  __v8hi __tmp = { EMM_UINT2(__A)[__B & 3],
                   EMM_UINT2(__A)[__B>>2 & 3],
                   EMM_UINT2(__A)[__B>>4 & 3],
                   EMM_UINT2(__A)[__B>>6 & 3],
                   EMM_UINT2(__A)[4],
                   EMM_UINT2(__A)[5],
                   EMM_UINT2(__A)[6],
                   EMM_UINT2(__A)[7]};
  return (__m128i) __tmp;
}

where EMM_UINT2 is this:

#define EMM_UINT2(a) ((unsigned short *)&(a))

If -mno-sse2 is changed to -msse2, that compile completes without errors
or warnings.

gcc --version is:
gcc (GCC) 4.2.3 (4.2.3-6mnb1)

What does that compiler error mean?

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
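
For reference, the element selection performed by the software `_mm_shufflelo_epi16` in this message can be written with plain arrays, with no vector types and no pointer casts. This is only a sketch of the same index arithmetic for checking results; `shufflelo_u16` is a hypothetical helper name, and it is not offered as a fix for the register-spill error.

```c
#include <stdint.h>

/* Scalar model of the shuffle: each of the four low 16-bit elements of
   dst is picked from the four low elements of src by a 2-bit field of
   imm; the four high elements are copied unchanged.
   dst and src must not overlap. */
void
shufflelo_u16 (uint16_t dst[8], const uint16_t src[8], int imm)
{
  int i;

  for (i = 0; i < 4; i++)
    dst[i] = src[(imm >> (2 * i)) & 3];
  for (i = 4; i < 8; i++)
    dst[i] = src[i];
}
```

With imm = 0x1B (binary 00 01 10 11) the four low elements come out in reverse order, which makes a convenient test case when comparing against the hardware intrinsic.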
Re: Method to disable code SSE2 generation but still use -msse2
"David Mathog" writes: > Following your suggestion mo-sse2 was tried, which generated an error > message well beyond my comprehension: > > gcc -std=gnu99 -g -pg -pthread -O4 -DSOFT_SSE2 -msse -mno-sse2 > -DHAVE_CONFIG_H -I../../easel -I../../easel -I. -I.. -I. -I../../src -o > msvfilter.o -c msvfilter.c > msvfilter.c: In function 'p7_MSVFilter': > msvfilter.c:208: error: unable to find a register to spill in class > 'GENERAL_REGS' > msvfilter.c:208: error: this is the insn: > (insn:HI 3569 3568 3570 302 ../../easel/emmintrin.h:2334 (set > (strict_low_part (subreg:HI (reg:TI 1514) 0)) > (mem:HI (plus:SI (reg/f:SI 20 frame) > (const_int -30 [0xffe2])) [14 S2 A16])) 40 > {*movstricthi_1} (insn_list:REG_DEP_TRUE 3568 (nil)) > (nil)) > msvfilter.c:208: confused by earlier errors, bailing out > make: *** [msvfilter.o] Error 1 This means that gcc was unable to store a __m128i value in the general purpose registers. It did not want to use the SSE2 registers because you ruled out -msse2, which I assume is correct behaviour for what you are trying to do. It does seem likely that SSE2 code will stress out the register allocator if it can't use the SSE2 registers. That said, I don't know offhand whether this is a bug or whether the scenario is simply impossible to implement. Ian