Re: SSE in libthr

2015-04-14 Eric van Gyzen
Below is an updated patch to incorporate everyone's feedback so far. I recognize all of the counter-arguments, and I agree with them in general. Indeed, as applications use more SIMD, this kind of patch goes in the wrong direction. However, there are applications that do not use enough SSE to off

Re: SSE in libthr

2015-04-06 John Baldwin
On Saturday, March 28, 2015 10:41:48 AM Adrian Chadd wrote:
> Ok, so how do we reduce the amount of FPU save and restores, or make
> them cheaper?

Or make them more useful. If you are using SSE/AVX more often between context switches in ways that are beneficial then that might offset the cost of

Re: SSE in libthr

2015-03-28 Adrian Chadd
Ok, so how do we reduce the amount of FPU save and restores, or make them cheaper? -a ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@f

Re: SSE in libthr

2015-03-28 John-Mark Gurney
Eric van Gyzen wrote this message on Fri, Mar 27, 2015 at 17:43 -0400:
> On 03/27/2015 16:49, Rui Paulo wrote:
> >
> > Regarding your patch, I think we should disable even more, if possible.
> > How about:
> >
> > CFLAGS+=-mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3
>
> Yes, I was co

Re: SSE in libthr

2015-03-28 David Chisnall
On 28 Mar 2015, at 13:54, Julian Elischer wrote:
>
> the point is that clang will do this anywhere it can, because it isn't taking
> into account the
> side effects, just the speed of the commands themselves.

This is also something that is not going to decrease. Clang now enables the SLP vect
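The kind of transformation being described can be sketched as follows. The struct and function are hypothetical stand-ins, not libthr's real mutex code, and whether a particular Clang version actually emits SSE stores for them depends on the version and flags, so the comment describes a possibility to check in the disassembly rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structure, loosely shaped like a lock word plus
 * bookkeeping. It is NOT the real libthr struct pthread_mutex. */
struct toy_lock {
    uint64_t owner;
    uint64_t count;
    uint64_t flags;
    uint64_t spare;
};

/*
 * Four adjacent 64-bit stores of zero. With optimisation enabled, the
 * compiler (for example via Clang's SLP vectoriser) may merge them into
 * 128-bit SSE stores from a zeroed XMM register, so an otherwise
 * integer-only path ends up touching SSE state. Building with -mno-sse
 * rules that out and leaves plain 64-bit integer moves.
 */
void toy_lock_release(struct toy_lock *l)
{
    l->owner = 0;
    l->count = 0;
    l->flags = 0;
    l->spare = 0;
}
```

Comparing the output of `cc -O2 -S` with and without `-mno-sse` for a function like this is a quick way to see whether XMM registers appear.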

Re: SSE in libthr

2015-03-28 Julian Elischer
On 3/28/15 5:44 AM, Konstantin Belousov wrote:
> On Fri, Mar 27, 2015 at 01:49:03PM -0700, Rui Paulo wrote:
> > On Mar 27, 2015, at 12:26, Eric van Gyzen wrote:
> > > In a nutshell:
> > > Clang emits SSE instructions on amd64 in the common path of pthread_mutex_unlock. This reduces performance by a non-trivia

Re: SSE in libthr

2015-03-28 Konstantin Belousov
On Fri, Mar 27, 2015 at 10:40:57PM +0100, Jilles Tjoelker wrote:
> On Fri, Mar 27, 2015 at 03:26:17PM -0400, Eric van Gyzen wrote:
> > In a nutshell:
>
> > Clang emits SSE instructions on amd64 in the common path of
> > pthread_mutex_unlock. This reduces performance by a non-trivial
> > amount.

Re: SSE in libthr

2015-03-27 Tomoaki AOKI
If SIMD instructions are used for string processing, and FPU (AVX) contexts are NOT saved/restored properly on process (thread) switching, the string being processed could be corrupted by another process (thread). Couldn't that be a security risk? (Broken string parameters for syscalls, etc.) If so, FPU (AVX) cont

Re: SSE in libthr

2015-03-27 Tomoaki AOKI
Possibly related information. Recently, I tried to build world/kernel (head, r280410, amd64) with a CPUTYPE setting in make.conf. The real CPU is a Sandy Bridge (corei7-avx). Running in a VirtualBox VM, installworld fails with CPUTYPE?=corei7-avx, while with CPUTYPE?=corei7 everything goes OK. *Rebooting
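One plausible explanation worth checking (an assumption, not something the report confirms) is that the guest does not get usable AVX even though the host CPU has it: AVX code only works if CPUID advertises AVX and OSXSAVE and the OS has enabled the XMM and YMM state bits in XCR0. A minimal sketch of that check, with a hypothetical function name, could look like this.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits (spelled out here rather than relying
 * on the bit_* names from <cpuid.h>). */
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX     (1u << 28)

/* Returns 1 if AVX instructions can be executed in this environment
 * (possibly a VM), 0 otherwise, including on non-x86 targets. */
int avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & CPUID1_ECX_AVX) || !(ecx & CPUID1_ECX_OSXSAVE))
        return 0;
    /* XGETBV with ECX=0 reads XCR0; it is only legal once OSXSAVE is set. */
    __asm__ volatile ("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    /* XCR0 bit 1 = XMM state, bit 2 = YMM state; AVX needs both. */
    return ((((uint64_t)hi << 32) | lo) & 0x6) == 0x6;
#else
    return 0;
#endif
}
```

If this returns 0 inside the VM, binaries built for corei7-avx would be expected to fail with illegal-instruction errors there, while a corei7 build would not.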

Re: SSE in libthr

2015-03-27 Adrian Chadd
On 27 March 2015 at 16:03, Alan Somers wrote:
> On Fri, Mar 27, 2015 at 4:36 PM, Adrian Chadd wrote:
>> hi,
>>
>> please don't try to microoptimise crap like strlen().
>>
>> The TL;DR for performant high-throughput code is: if strlen() or
>> memcpy() is the thing that's costing you the most, you'

Re: SSE in libthr

2015-03-27 Alan Somers
On Fri, Mar 27, 2015 at 4:36 PM, Adrian Chadd wrote:
> hi,
>
> please don't try to microoptimise crap like strlen().
>
> The TL;DR for performant high-throughput code is: if strlen() or
> memcpy() is the thing that's costing you the most, you're doing it
> wrong.
>
> -adrian

I respectfully di
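For context on the strlen() debate, a word-at-a-time (SWAR) scan is one way to speed up string length computation while staying in 64-bit general-purpose registers, so it never touches the SSE/FPU state discussed in this thread. This is a sketch, not FreeBSD's strlen: `swar_strnlen` and its explicit `maxlen` bound are hypothetical, chosen so that every load stays inside the caller's buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff at least one byte of w is 0x00 (classic SWAR bit trick). */
static int word_has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Length of s, scanning at most maxlen bytes, 8 bytes per step. */
size_t swar_strnlen(const char *s, size_t maxlen)
{
    size_t i = 0;

    assert(s != NULL || maxlen == 0);
    /* Only load whole words that lie entirely inside [s, s + maxlen). */
    while (maxlen - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* alignment-safe load */
        if (word_has_zero_byte(w))
            break;                     /* the NUL is somewhere in this word */
        i += sizeof w;
    }
    /* Finish byte by byte: locates the NUL or covers the short tail. */
    while (i < maxlen && s[i] != '\0')
        i++;
    return i;
}
```

The zero-byte test only answers "is there a NUL in this word", which is why the final byte loop pins down its exact position.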

Re: SSE in libthr

2015-03-27 Adrian Chadd
hi,

please don't try to microoptimise crap like strlen().

The TL;DR for performant high-throughput code is: if strlen() or memcpy() is the thing that's costing you the most, you're doing it wrong.

-adrian

Re: SSE in libthr

2015-03-27 Eric van Gyzen
On 03/27/2015 16:49, Rui Paulo wrote:
>
> Regarding your patch, I think we should disable even more, if possible. How
> about:
>
> CFLAGS+=-mno-mmx -mno-3dnow -mno-sse -mno-sse2 -mno-sse3

Yes, I was considering copying all of the similar flags that we use in the kernel. That seems wise.
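Once flags like these are added to CFLAGS, one way to confirm that they reached a given translation unit is to test the compiler's predefined feature macros. This is a sketch: `__SSE__` and `__SSE2__` are the macros GCC and Clang define when SSE code generation is enabled (the default on amd64) and drop under `-mno-sse`, while `compiled_with_sse` and `NO_SSE_REQUIRED` are hypothetical names invented for the example, not anything libthr defines.

```c
#include <assert.h>

/* Reports whether the compiler may emit SSE code for this file. */
int compiled_with_sse(void)
{
#if defined(__SSE__) || defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}

/* Optional hard guard: a build that defines the hypothetical
 * NO_SSE_REQUIRED macro fails if the -mno-sse flags were lost. */
#if defined(NO_SSE_REQUIRED) && (defined(__SSE__) || defined(__SSE2__))
#error "this file must be compiled with -mno-sse -mno-sse2"
#endif
```

A compile-time guard like this catches a dropped flag at build time instead of leaving it to show up later as FPU save/restore overhead.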

Re: SSE in libthr

2015-03-27 Konstantin Belousov
On Fri, Mar 27, 2015 at 01:49:03PM -0700, Rui Paulo wrote:
> On Mar 27, 2015, at 12:26, Eric van Gyzen wrote:
> >
> > In a nutshell:
> >
> > Clang emits SSE instructions on amd64 in the common path of
> > pthread_mutex_unlock. This reduces performance by a non-trivial amount.
> > I'd
> > like

Re: SSE in libthr

2015-03-27 Jilles Tjoelker
On Fri, Mar 27, 2015 at 03:26:17PM -0400, Eric van Gyzen wrote:
> In a nutshell:

> Clang emits SSE instructions on amd64 in the common path of
> pthread_mutex_unlock. This reduces performance by a non-trivial
> amount. I'd like to disable SSE in libthr.

How about saving and restoring the FPU/SS

Re: SSE in libthr

2015-03-27 Rui Paulo
On Mar 27, 2015, at 12:26, Eric van Gyzen wrote:
>
> In a nutshell:
>
> Clang emits SSE instructions on amd64 in the common path of
> pthread_mutex_unlock. This reduces performance by a non-trivial amount. I'd
> like to disable SSE in libthr.
>
> In more detail:
>
> In libthr/thread/thr_mute

Re: SSE in libthr

2015-03-27 Daniel Eischen
On Fri, 27 Mar 2015, Eric van Gyzen wrote:
> In a nutshell:
> Clang emits SSE instructions on amd64 in the common path of pthread_mutex_unlock. This reduces performance by a non-trivial amount. I'd like to disable SSE in libthr.

This makes sense to me.

-- 
DE

Re: SSE in libthr

2015-03-27 Adrian Chadd
Wow. I remember seeing this in the work application: all packet pushing in userland, but there are locks being acquired. I was wondering what exactly was triggering the FPU save/restore code. Now I know.

Yes, if there are no other objections, I'd love to see this in -HEAD and stable/10.

-adrian
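Why a few SSE instructions in a lock path can trigger FPU save/restore work is easiest to see in a toy model. This is not kernel code: it assumes a lazy switching scheme, in which FPU state is only moved when a thread that does not currently own the FPU first touches it, and compares it with an eager scheme that swaps state on every switch to a different thread. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One time slice of a hypothetical schedule: which thread ran, and
 * whether it executed any FPU/SSE instruction during the slice. */
struct slice {
    int  tid;
    bool uses_fpu;
};

/* Eager policy: FPU state is swapped whenever a different thread runs. */
size_t eager_fpu_swaps(const struct slice *s, size_t n)
{
    int current = -1;            /* no thread has run yet */
    size_t swaps = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].tid != current) {
            swaps++;
            current = s[i].tid;
        }
    }
    return swaps;
}

/* Lazy policy: state is swapped only when a thread that does not own
 * the FPU actually uses it (the "trap on first use" path). */
size_t lazy_fpu_swaps(const struct slice *s, size_t n)
{
    int owner = -1;              /* thread whose state is in the FPU */
    size_t swaps = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].uses_fpu && s[i].tid != owner) {
            swaps++;
            owner = s[i].tid;
        }
    }
    return swaps;
}
```

In this model, two threads alternating four times cost 4 swaps under the eager policy regardless of FPU use. Under the lazy policy they cost 0 swaps if neither thread touches the FPU and 4 if both do, which is the situation SSE instructions in pthread_mutex_unlock would create for otherwise integer-only threads.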