vcond implementation in altivec

2007-02-27 Thread Ira Rosen

Hi,

We were looking at the implementation of vcond for altivec and we have a
couple of questions.

vcond has 6 operands. rs6000_emit_vector_cond_expr is called from the
define_expand for "vcond". It gets those operands in their original
order, as in vcond, and emits op0 = (op4 cond op5 ? op1 : op2), where cond
is op3.
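
To make the operand roles concrete, here is a scalar C model of the semantics described above for eight 16-bit lanes. It is only an illustrative sketch: vcond_v8hi_model and the cmp_code enum are hypothetical names, not part of GCC, and only three comparison codes are modeled.

```c
#include <stddef.h>

/* Hypothetical comparison codes standing in for the RTL comparison
   operator that vcond receives as op3.  */
enum cmp_code { CMP_LT, CMP_GT, CMP_EQ };

/* Scalar model of vcond on eight shorts: for each lane i,
   op0[i] = (op4[i] cond op5[i]) ? op1[i] : op2[i].  */
static void vcond_v8hi_model(short op0[8], const short op1[8],
                             const short op2[8], enum cmp_code cond,
                             const short op4[8], const short op5[8])
{
    for (size_t i = 0; i < 8; i++) {
        int take_first;
        switch (cond) {
        case CMP_LT: take_first = op4[i] < op5[i];  break;
        case CMP_GT: take_first = op4[i] > op5[i];  break;
        default:     take_first = op4[i] == op5[i]; break;
        }
        op0[i] = take_first ? op1[i] : op2[i];
    }
}
```

In this model every data operand (op0, op1, op2, op4, op5) has the same element type, which is the expectation behind the questions below.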

Here is vcond for vector short (vconduv8hi, vcondv16qi, and vconduv16qi are
similar):
(define_expand "vcondv8hi"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
        (unspec:V8HI [(match_operand:V4SI 1 "register_operand" "v")
                      (match_operand:V8HI 2 "register_operand" "v")
                      (match_operand:V8HI 3 "comparison_operator" "")
                      (match_operand:V8HI 4 "register_operand" "v")
                      (match_operand:V8HI 5 "register_operand" "v")]
                     UNSPEC_VCOND_V8HI))]
  "TARGET_ALTIVEC"
  "
{
  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
                                    operands[3], operands[4], operands[5]))
    DONE;
  else
    FAIL;
}
")
Is there a reason why op0 is V4SF and op1 is V4SI (and not V8HI)?


In vcondv4sf, op1 is also V4SI:
(define_expand "vcondv4sf"
  [(set (match_operand:V4SF 0 "register_operand" "=v")
        (unspec:V4SF [(match_operand:V4SI 1 "register_operand" "v")
                      (match_operand:V4SF 2 "register_operand" "v")
                      (match_operand:V4SF 3 "comparison_operator" "")
                      (match_operand:V4SF 4 "register_operand" "v")
                      (match_operand:V4SF 5 "register_operand" "v")]
                     UNSPEC_VCOND_V4SF))]
  "TARGET_ALTIVEC"
  "
{
  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],
                                    operands[3], operands[4], operands[5]))
    DONE;
  else
    FAIL;
}
")
Same question: is there a reason for op1 to be V4SI?

And also, why not use if_then_else instead of unspec (in all vcond's)?

Thanks,
Sa and Ira



Re: I need some advice for x86_64-pc-mingw32 va_list calling convention (in i386.c)

2007-02-27 Thread Kai Tietz
Thank you,

I have already adjusted my code. But because OUTGOING_REG_PARM_STACK_SPACE
has to be defined for this target, I had to change its general definition
to be target-specific. Previously it was either defined or not. Now it is
defined to the default value of 0 for targets that do not define it, to 1
for targets that already use this define, and, on the i386 target, to a
value specific to the MS x86_64-mingw32 target. Otherwise, it would no
longer be possible to have the x86_64 and the i386 compiler in one
executable.

Regards,
 i.A. Kai Tietz


  Kai Tietz - Software engineering
  OneVision Software Entwicklungs GmbH & Co KG
  Dr.-Leo-Ritter-Str. 9, 93049 Regensburg, Germany
  Phone: +49-941-78004-0
  FAX:   +49-941-78004-489
  WWW:   http://www.OneVision.com



From: Richard Henderson <[EMAIL PROTECTED]>
Date: 26.02.2007 19:12
To: Kai Tietz <[EMAIL PROTECTED]>
Cc: "Menezes, Evandro" <[EMAIL PROTECTED]>, gcc
Subject: Re: I need some advice for x86_64-pc-mingw32 va_list calling convention (in i386.c)

On Mon, Feb 26, 2007 at 09:40:59AM +0100, Kai Tietz wrote:
> So is there already a mechanism in gcc by which I can simply reserve
> space on the stack, in all functions, for the 4 register parameters,
> even if they are not used for argument passing?

See sparc.h.


r~





Re: I need some advice for x86_64-pc-mingw32 va_list calling convention (in i386.c)

2007-02-27 Thread Andrew Pinski

On 2/27/07, Kai Tietz <[EMAIL PROTECTED]> wrote:

> Thank you,
>
> I already adjusted my code. But because of the need to define
> OUTGOING_REG_PARM_STACK_SPACE for this target, I had to change the general
> definition of it to be target specific.


This is why the uses of OUTGOING_REG_PARM_STACK_SPACE should really be
changed to be if(OUTGOING_REG_PARM_STACK_SPACE) instead of #ifdef
OUTGOING_REG_PARM_STACK_SPACE.
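
A small sketch of the difference, under the assumption that every target ends up with the macro defined to a constant (0 when the target does not set it). outgoing_area_bytes is a hypothetical helper written for this illustration, not a function from GCC, and the size arithmetic is deliberately simplified:

```c
/* Assumption for this sketch: targets that do not define the macro
   get a default of 0, so it can be tested in ordinary C code.  */
#ifndef OUTGOING_REG_PARM_STACK_SPACE
#define OUTGOING_REG_PARM_STACK_SPACE 0
#endif

/* Hypothetical helper (not from GCC): size of the outgoing argument
   area, optionally including space reserved for register parameters.  */
static int outgoing_area_bytes(int stack_args_bytes, int reg_parm_bytes)
{
    /* With `if` instead of `#ifdef`, both branches are compiled and
       type-checked on every target; the compiler can drop the dead one
       when the macro is a constant.  */
    if (OUTGOING_REG_PARM_STACK_SPACE)
        return stack_args_bytes + reg_parm_bytes;
    return stack_args_bytes;
}
```

With the #ifdef form, the branch for the other configuration is never seen by the compiler on a given target, so breakage in it can go unnoticed until someone builds that target.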

There was an email a while back from Zack Weinberg and Hans-Peter
Nilsson that talks explicitly about how changing this has saved effort
in maintaining the compiler and in finding bugs in it.


And I thought this was really part of our coding style too, but I
cannot find it on http://gcc.gnu.org/codingconventions.html .

Thanks,
Andrew Pinski


vsftpd 2.0.5 vs. gcc 4.1.2

2007-02-27 Thread BuraphaLinux Server

When attempting to build vsftpd-2.0.5 from http://vsftpd.beasts.org/
on my linux system I get this:

...
gcc -c sysutil.c -O2 -Wall -W -Wshadow -march=i586 -mtune=i686 -idirafter
dummyinc
sysutil.c: In function 'vsf_sysutil_wait_exited_normally':
sysutil.c:604: error: assignment of read-only member '__in'
sysutil.c: In function 'vsf_sysutil_wait_get_exitcode':
sysutil.c:614: error: assignment of read-only member '__in'
make: *** [sysutil.o] Error 1

This is with gcc 4.1.2 compiled from source on a linux system.

Does anybody have a patch or know the trick to fix this?

BLS


Re: vsftpd 2.0.5 vs. gcc 4.1.2

2007-02-27 Thread Florian Weimer
* BuraphaLinux Server:

> Does anybody have a patch or know the trick to fix this?

Debian has got a patch.  I think the error message is wrong, it's a
const mismatch in pointer conversion, not an actual assignment.


Re: vcond implementation in altivec

2007-02-27 Thread Devang Patel

> Is there a reason why op0 is V4SF

It is the destination, so yes, this is wrong.

> and op1 is V4SI (and not V8HI)?

The condition should be V4SI, but it is not op1. So this is also not correct.

> And also, why not use if_then_else instead of unspec (in all vcond's)?

I did not try that path. Maybe I did not know about it at that time.

-
Devang


RE: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread Menezes, Evandro
Honza, 

> Well, rather than unstable, they seem to be more memory layout
> sensitive I would say. (the differences are more or less reproducible,
> not completely random, but independent of the binary itself. I can't
> think of much else than memory layout to cause it).  I always wondered
> if things like page coloring have a chance to reduce this noise, but I
> never actually got around to trying it.

You didn't mention the processors in your systems, but I wonder if they are
dual-core.  If so, perhaps it's got to do with the fact that each K8 core has
its own L2, whereas C2 chips have a shared L2.  In that case, try preceding
"runspec" with "taskset 0x02" to keep the process from hopping between cores
and finding cold caches (though the kernel strives to stick a process to a
single core, it's not perfect).
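
For reference, the argument to taskset is a bitmask in which bit n selects CPU n, so 0x02 allows only the second CPU (number 1). A minimal C sketch of that mapping; the helper names are made up for this illustration and nothing here changes the affinity of a running process:

```c
/* Hypothetical helpers illustrating how an affinity bitmask maps to
   CPU numbers: bit n set means CPU n is allowed.  Both assume that
   cpu is smaller than the number of bits in unsigned long.  */
static int cpu_allowed(unsigned long mask, unsigned int cpu)
{
    return (int)((mask >> cpu) & 1UL);
}

/* Build the mask that allows exactly one CPU.  */
static unsigned long mask_for_cpu(unsigned int cpu)
{
    return 1UL << cpu;
}
```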

HTH

-- 
___
Evandro Menezes   AMD   Austin, TX





Re: vsftpd 2.0.5 vs. gcc 4.1.2

2007-02-27 Thread Andrew Pinski
> 
> * BuraphaLinux Server:
> 
> > Does anybody have a patch or know the trick to fix this?
> 
> Debian has got a patch.  I think the error message is wrong, it's a
> const mismatch in pointer conversion, not an actual assignment.

Actually it is a bug in glibc's header with WIFEXITED, WEXITSTATUS, etc.

See http://sourceware.org/bugzilla/show_bug.cgi?id=1392 .
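
Until the headers are fixed, one way to sidestep the error is to hand the macros a non-const copy of the status. A minimal sketch, assuming the failing code applies WIFEXITED/WEXITSTATUS to a const-qualified int; the function names here are hypothetical and are not taken from vsftpd or from the Debian patch:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Decode a wait status that arrives const-qualified.  Returns the exit
   code, or -1 if the child did not exit normally.  */
static int exit_code_from_status(const int status)
{
    /* Non-const copy: the macros then never see a read-only operand.  */
    int s = status;

    if (!WIFEXITED(s))
        return -1;
    return WEXITSTATUS(s);
}

/* Hypothetical demo helper: fork a child that exits with `code`, wait
   for it, and return the decoded exit code (-1 on any failure).  */
static int run_child_and_get_code(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return exit_code_from_status(status);
}
```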

Thanks,
Andrew Pinski



Re: vsftpd 2.0.5 vs. gcc 4.1.2

2007-02-27 Thread Florian Weimer
* Andrew Pinski:

>> 
>> * BuraphaLinux Server:
>> 
>> > Does anybody have a patch or know the trick to fix this?
>> 
>> Debian has got a patch.  I think the error message is wrong, it's a
>> const mismatch in pointer conversion, not an actual assignment.
>
> Actually it is a bug in glibc's header with WIFEXITED, WEXITSTATUS, etc.
>
> See http://sourceware.org/bugzilla/show_bug.cgi?id=1392 .

Ah, I was looking at the half-fixed headers which still suffer from
the -Wcast-qual issue, but have the assignment fixed.  Thanks for
setting me straight.


Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread nick

NUMA support did strike me as a possible cause.

I thought that the L2 caches on the Opteron communicated, but I assume from
your response that the Opteron memory controller doesn't allow cache
propagation and instead invalidates the cache entries read (assuming again
that the write entries are handled differently).


Menezes, Evandro wrote:
> Honza,
>
>> Well, rather than unstable, they seem to be more memory layout
>> sensitive I would say. (the differences are more or less reproducible,
>> not completely random, but independent of the binary itself. I can't
>> think of much else than memory layout to cause it).  I always wondered
>> if things like page coloring have a chance to reduce this noise, but I
>> never actually got around to trying it.
>
> You didn't mention the processors in your systems, but I wonder if they are
> dual-core.  If so, perhaps it's got to do with the fact that each K8 core has
> its own L2, whereas C2 chips have a shared L2.  Then, try preceding "runspec"
> with "taskset 0x02" to keep the process from hopping between cores and finding
> cold caches (though the kernel strives to stick a process to a single core,
> it's not perfect).
>
> HTH




RE: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread Menezes, Evandro
Nick, 

> I thought that the L2 caches on the Opteron communicated, but I assume
> from your response that the Opteron memory controller doesn't allow
> cache propagation and instead invalidates the cache entries read
> (assuming again that the write entries are handled differently).

You're half right.  The caches on the same processor do have a fast path 
between them, but the fact still remains that an L2 cache miss plus the cache 
coherency protocol overhead is far slower than an L2 cache hit.  Bottom line: 
process migration is bad.

HTH

-- 
___
Evandro Menezes   AMD   Austin, TX





Re: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread Richard Guenther

On 2/27/07, Menezes, Evandro <[EMAIL PROTECTED]> wrote:
> Honza,
>
>> Well, rather than unstable, they seem to be more memory layout
>> sensitive I would say. (the differences are more or less reproducible,
>> not completely random, but independent of the binary itself. I can't
>> think of much else than memory layout to cause it).  I always wondered
>> if things like page coloring have a chance to reduce this noise, but I
>> never actually got around to trying it.
>
> You didn't mention the processors in your systems, but I wonder if they are
> dual-core.  If so, perhaps it's got to do with the fact that each K8 core has
> its own L2, whereas C2 chips have a shared L2.  Then, try preceding "runspec"
> with "taskset 0x02" to keep the process from hopping between cores and finding
> cold caches (though the kernel strives to stick a process to a single core,
> it's not perfect).


Well, both britten and haydn are single-core, two-processor systems.  For
SPEC2k6 runs the problem is that the machine's 2 GB of RAM is
distributed over both NUMA nodes, so with the memory requirements of
SPEC2k6 we always get inter-node memory traffic.  Vangelis is a
single-processor, single-core system (and the most stable one).  Any idea
how to force a process to use only local memory?

Richard.


RE: spec2k comparison of gcc 4.1 and 4.2 on AMD K8

2007-02-27 Thread Menezes, Evandro
Richard, 

> Well, both britten and haydn are single core, two processor systems.  For
> SPEC2k6 runs the problem is that the 2gb ram of the machine are
> distributed over both numa nodes, so with the memory requirements of
> SPEC2k6 we always get inter-node memory traffic.  Vangelis is a single
> processor, single core system (and the most stable one).  Any idea on
> how to force to use local memory only for a process?

numactl is your friend.  In order to run on the second single-core processor
(the kernel seems to like the first processor better), precede "runspec" with:

numactl --physcpubind=1 --membind=1

If SPEC2006 then fails to get all the memory it wants, numactl cannot help;
only more RAM can, I'm afraid.

HTH

-- 
___
Evandro Menezes   AMD   Austin, TX





Why does linker fail to resolve dependencies within the same .a file?

2007-02-27 Thread Christian Convey

I'm using CMake to build a library (as a .a file) and a demo program
for that library.


My problem is that when I go to link that demo program, I get a linker
error that says one function in the .a file can't find another
function in the .a file.  Here's what the linker command line and
error output look like:


Linking CXX executable simpleIO
cd /home/cjc/csc583-svn/uriVisionLib/trunk/SDK/Demos/C++/Basic/Simple_IO
&& /home/cjc/packages/CMake/bin/cmake -P
CMakeFiles/simpleIO.dir/cmake_clean_target.cmake
cd /home/cjc/csc583-svn/uriVisionLib/trunk/SDK/Demos/C++/Basic/Simple_IO
&& /usr/bin/c++  -fPIC "CMakeFiles/simpleIO.dir/main_IO.o"   -o
simpleIO -rdynamic
-L/home/cjc/csc583-svn/uriVisionLib/trunk/Development/Source/C++ -lGL
-lglut -Wl,-Bstatic -luriVision -luriVision -Wl,-Bdynamic
-Wl,-rpath,/home/cjc/csc583-svn/uriVisionLib/trunk/Development/Source/C++
/home/cjc/csc583-svn/uriVisionLib/trunk/Development/Source/C++/liburiVision.a(ImageReader.o):
In function `uriVideoSources::ImageReader::getFrame(bool,
uriBase::RasterImage*)':
ImageReader.cpp:(.text+0x90): undefined reference to
`uriVideoSources::ImageReader_gen::getFrame_(bool,
uriBase::RasterImage*)'


(there are more errors as well, but I figured this was enough to make my point).

I thought that perhaps the supposedly missing function wasn't in the
.a file, so I checked with nm as follows:


[EMAIL PROTECTED]:~$ nm --demangle
/home/cjc/csc583-svn/uriVisionLib/trunk/Development/Source/C++/liburiVision.a
| grep outputFrame
002c T uriMovieEditing::ImageWriter::outputFrame(uriBase::RasterImage*)
 T uriMovieEditing::ImageWriter::outputFrame(uriBase::RasterImage*,
bool)
   U uriMovieEditing::ImageWriter_gen::outputFrame_(uriBase::RasterImage*,
bool)



So it appears to be in there.  I also thought that maybe this was one
of the cases where I needed to list the .a file twice on the linker
command line, but the linker invocation / output shown above already
reflects my doing that (I think).


Does anyone know where I might be going wrong here?

Thanks,
Christian


Re: Why does linker fail to resolve dependencies within the same .a file?

2007-02-27 Thread Jonathan Adamczewski

Christian Convey wrote:
> In function `uriVideoSources::ImageReader::getFrame(bool,
> uriBase::RasterImage*)':
> ImageReader.cpp:(.text+0x90): undefined reference to

If the missing reference is to

> `uriVideoSources::ImageReader_gen::getFrame_(bool,
> uriBase::RasterImage*)'

why do you grep for outputFrame?


j.