Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 17:53 +0200, Roland Scheidegger wrote: > Am 30.06.2011 16:14, schrieb Adam Jackson: > > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: > >> Ok in fact there's a gcc bug about memcmp: > >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > >> In short gcc's memcm

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 16:14, schrieb Adam Jackson: > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: >> Ok in fact there's a gcc bug about memcmp: >> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 >> In short gcc's memcmp builtin is totally lame and loses to glibc's >> memcmp (including cal

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Roland Scheidegger
Am 30.06.2011 12:14, schrieb Jose Fonseca: > > > - Original Message - >> Hmm. >> Forgive my ignorance, but isn't memcmp() on structs pretty prone to >> give >> incorrect != results, given that there may be padding between members >> in >> structs and that IIRC gcc struct assignment is mem

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Adam Jackson
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: > Ok in fact there's a gcc bug about memcmp: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > In short gcc's memcmp builtin is totally lame and loses to glibc's > memcmp (including call overhead, no knowledge about alignment etc.) ev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:27 -0700, Jose Fonseca wrote: > > - Original Message - > > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: > > > Ok in fact there's a gcc bug about memcmp: > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > > In short gcc's memcmp builtin is t

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
- Original Message - > On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: > > Ok in fact there's a gcc bug about memcmp: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > > In short gcc's memcmp builtin is totally lame and loses to glibc's > > memcmp (including call overhe

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
- Original Message - > Hmm. > Forgive my ignorance, but isn't memcmp() on structs pretty prone to > give > incorrect != results, given that there may be padding between members > in > structs and that IIRC gcc struct assignment is member-wise. There's no alternative to bitwise comparison

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Jose Fonseca
Great work, Roland! And thanks, Ajax, for finding this hot spot. We use memcmp a lot -- all CSO caching, so we should use this everywhere. We should also code an SSE2 version with intrinsics for x86-64, which is guaranteed to always have SSE2. Jose - Original Message - > Actually I ran some
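
For readers following the SSE2 suggestion above, here is a rough sketch of what an equality-only compare with intrinsics could look like. It is not code from this thread: the name blocks_differ_16 is made up for illustration, it assumes the size is a multiple of 16 bytes, and it falls back to plain memcmp when SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: returns 0 if the two blocks are identical,
 * 1 otherwise. Assumes size is a multiple of 16 bytes. */
static int blocks_differ_16(const void *a, const void *b, size_t size)
{
    assert(size % 16 == 0);
#if defined(__SSE2__)
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    for (i = 0; i < size; i += 16) {
        /* Unaligned loads, so the caller need not align the blocks. */
        __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
        /* Each equal byte sets one bit in the mask; 0xFFFF means all 16 match. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF)
            return 1;
    }
    return 0;
#else
    /* Portable fallback when SSE2 is not enabled for this build. */
    return memcmp(a, b, size) != 0;
#endif
}
```

Because only equality matters here, the loop can stop at the first mismatching 16-byte chunk without working out which block sorts first.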

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Thu, 2011-06-30 at 03:36 +0200, Roland Scheidegger wrote: > Ok in fact there's a gcc bug about memcmp: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 > In short gcc's memcmp builtin is totally lame and loses to glibc's > memcmp (including call overhead, no knowledge about alignment etc.) ev

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Keith Whitwell
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote: > Okay, so maybe I'm failing to recognize the exact situation here, but > wouldn't it be possible to mark the FS state with a serial number and > just compare those? Or are these FS states not CSO-cached? No, the struct being compared is poo

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-30 Thread Thomas Hellstrom
Hmm. Forgive my ignorance, but isn't memcmp() on structs pretty prone to give incorrect != results, given that there may be padding between members in structs and that IIRC gcc struct assignment is member-wise. What happens if there's padding between the jit_context and variant members of str
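
To make the padding concern concrete, here is a minimal sketch of the usual workaround: zero the whole struct before filling it in, so the padding bytes start out identical. The struct and function names are hypothetical and not taken from llvmpipe. Strictly speaking, the C standard leaves padding bytes unspecified after a member store, so this relies on how common compilers behave rather than on a language guarantee.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct: most ABIs insert padding between flag and value. */
struct example_state {
    char flag;
    int value;
};

/* Zero the whole object first so the padding bytes are known,
 * then fill in the members one by one. */
static void example_state_init(struct example_state *s, char flag, int value)
{
    assert(s != NULL);
    memset(s, 0, sizeof *s);
    s->flag = flag;
    s->value = value;
}

/* Bitwise comparison; only meaningful if both objects were
 * initialised through example_state_init(). */
static int example_state_equal(const struct example_state *a,
                               const struct example_state *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

Copying such a struct with memcpy rather than plain assignment keeps the padding bytes in sync as well.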

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Ok in fact there's a gcc bug about memcmp: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052 In short gcc's memcmp builtin is totally lame and loses to glibc's memcmp (including call overhead, no knowledge about alignment etc.) even when comparing only very few bytes (and loses BIG time for lots of

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
I didn't even look at that; I was just curious why the memcmp (which is used a lot in other places) is slow. However, none of the other memcmp calls seem to show up prominently (cso functions are quite low in profiles; _mesa_search_program_cache uses memcmp too, but it's not that high either). So I guess th

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Corbin Simpson
Okay, so maybe I'm failing to recognize the exact situation here, but wouldn't it be possible to mark the FS state with a serial number and just compare those? Or are these FS states not CSO-cached? ~ C. On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger wrote: > Actually I ran some numbers her
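
The serial-number idea can be sketched roughly as follows. This only illustrates the suggestion in general terms; the struct, the counter and the function names are all hypothetical, and nothing here reflects how llvmpipe actually stores its fs state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state object carrying a change counter. */
struct tracked_state {
    uint32_t serial;
    int payload[16];
};

/* Hypothetical global counter; each modification gets a fresh serial. */
static uint32_t next_serial = 1;

static void tracked_state_set(struct tracked_state *s, int index, int value)
{
    assert(index >= 0 && index < 16);
    s->payload[index] = value;
    s->serial = next_serial++;
}

/* One integer compare instead of a memcmp over the whole payload. */
static int tracked_state_changed(const struct tracked_state *s,
                                 uint32_t last_seen)
{
    return s->serial != last_seen;
}
```

The trade-off is that every write path has to go through the setter. A write that stores the same value again still bumps the serial, so this detects "possibly changed" rather than "actually different".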

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Actually I ran some numbers here and tried out an optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps
This is the function I used for that (just enabled in that lp_setup function): static INLINE int util_cmp_struct(const void *src1, const voi
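
The function body is cut off in this preview, so the following is not the code from the message. It is only a sketch of the general technique, an equality-only compare done one 32-bit word at a time, under the assumption that the struct size is a multiple of 4 bytes. The name cmp_struct_words is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 0 if the blocks are identical, 1 otherwise.
 * Unlike memcmp it does not report which block sorts first. */
static int cmp_struct_words(const void *a, const void *b, size_t size)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    size_t i;

    assert(size % sizeof(uint32_t) == 0);

    for (i = 0; i < size; i += sizeof(uint32_t)) {
        uint32_t wa, wb;
        /* memcpy sidesteps alignment and aliasing problems; compilers
         * normally turn these into plain 32-bit loads. */
        memcpy(&wa, pa + i, sizeof wa);
        memcpy(&wb, pb + i, sizeof wb);
        if (wa != wb)
            return 1;
    }
    return 0;
}
```

A caller would pass sizeof of the struct type, so the compiler sees a constant size and can unroll the loop.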

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Roland Scheidegger
Ohh that's interesting, you'd think the comparison shouldn't be that expensive (though I guess in ipers case the comparison is never true). memcmp is quite extensively used everywhere. Maybe we could replace that with something faster (since we only ever care if the blocks are the same but not care

Re: [Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Keith Whitwell
On Wed, 2011-06-29 at 13:19 -0400, Adam Jackson wrote: > Perversely, do this by eliminating the comparison between stored and > current fs state. On ipers, a perf trace showed try_update_scene_state > using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. > Taking that out takes

[Mesa-dev] [PATCH] llvmpipe: Optimize new fs state setup

2011-06-29 Thread Adam Jackson
Perversely, do this by eliminating the comparison between stored and current fs state. On ipers, a perf trace showed try_update_scene_state using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp. Taking that out takes try_update_scene_state down to 6.5% of the profile; more import