- Original Message -
> Hmm.
> Forgive my ignorance, but isn't memcmp() on structs pretty prone to give
> incorrect != results, given that there may be padding between members in
> structs and that IIRC gcc struct assignment is member-wise.

There's no alternative to bitwise comparison.

Great work Roland! And thanks Ajax for finding this hot spot.

We use memcmp a lot -- all CSO caching -- so we should use this everywhere.

We should also code an SSE2 version with intrinsics for x86-64, which is
guaranteed to always have SSE2.

Jose
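A version along the lines Jose suggests might look like the sketch below. The name util_memcmp_sse2 is invented here, and the code assumes size is a multiple of 16 (it uses unaligned loads, so no alignment requirement); it is an illustration of the idea, not code from this thread.

```c
#include <emmintrin.h>   /* SSE2 intrinsics, always available on x86-64 */
#include <stddef.h>

/* Equality-only compare, 16 bytes per iteration.  Returns 0 if the two
 * blocks are identical, nonzero otherwise (no ordering information).
 * Assumes size is a multiple of 16; trailing bytes are not handled. */
static inline int
util_memcmp_sse2(const void *a, const void *b, size_t size)
{
   const unsigned char *pa = (const unsigned char *)a;
   const unsigned char *pb = (const unsigned char *)b;
   size_t i;

   for (i = 0; i + 16 <= size; i += 16) {
      __m128i va = _mm_loadu_si128((const __m128i *)(pa + i));
      __m128i vb = _mm_loadu_si128((const __m128i *)(pb + i));
      __m128i eq = _mm_cmpeq_epi8(va, vb);
      /* movemask is 0xffff only when all 16 byte lanes compared equal */
      if (_mm_movemask_epi8(eq) != 0xffff)
         return 1;
   }
   return 0;
}
```

Because the callers only care about equal/not-equal, the function can bail out on the first differing vector instead of computing a sign like memcmp().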
On Wed, 2011-06-29 at 16:16 -0700, Corbin Simpson wrote:
> Okay, so maybe I'm failing to recognize the exact situation here, but
> wouldn't it be possible to mark the FS state with a serial number and
> just compare those? Or are these FS states not CSO-cached?
No, the struct being compared is poo
Hmm.
Forgive my ignorance, but isn't memcmp() on structs pretty prone to give
incorrect != results, given that there may be padding between members in
structs and that IIRC gcc struct assignment is member-wise.
What happens if there's padding between the jit_context and variant
members of str
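Keith's padding concern can be demonstrated with a small example (this struct and the helper names are invented for illustration; they are not from the thread). On common ABIs a char followed by an int leaves 3 padding bytes that member-wise assignment never writes, but memcmp() still reads:

```c
#include <string.h>

struct padded {
   char c;   /* on typical 32/64-bit ABIs, 3 padding bytes follow */
   int  i;
};

/* Fill the struct's whole storage with 'garbage', then set the members.
 * The padding bytes keep whatever 'garbage' was. */
static void fill(struct padded *p, int garbage)
{
   memset(p, garbage, sizeof *p);
   p->c = 'x';
   p->i = 42;
}

/* Nonzero when memcmp() reports a difference even though every member
 * is identical -- only the padding differs. */
static int padding_fools_memcmp(void)
{
   struct padded a, b;
   fill(&a, 0xaa);
   fill(&b, 0x55);
   return memcmp(&a, &b, sizeof a) != 0;
}

/* The usual fix: memset the whole struct to zero before filling in the
 * members, so the padding always compares equal. */
static int zeroed_structs_compare_equal(void)
{
   struct padded a, b;
   memset(&a, 0, sizeof a);
   memset(&b, 0, sizeof b);
   a.c = b.c = 'x';
   a.i = b.i = 42;
   return memcmp(&a, &b, sizeof a) == 0;
}
```

This is why state structs that are compared bitwise have to be zero-initialized before their members are set.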
Ok in fact there's a gcc bug about memcmp:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43052
In short gcc's memcmp builtin is totally lame and loses to glibc's
memcmp (including call overhead, no knowledge about alignment etc.) even
when comparing only very few bytes (and loses BIG time for lots of
I didn't even look at that; I was just curious why the memcmp (which is
used a lot in other places) is slow. However, none of the other memcmp
calls seem to show up prominently (cso functions are quite low in
profiles, and _mesa_search_program_cache uses memcmp too but it's not
that high either). So I guess th
Okay, so maybe I'm failing to recognize the exact situation here, but
wouldn't it be possible to mark the FS state with a serial number and
just compare those? Or are these FS states not CSO-cached?
~ C.
On Wed, Jun 29, 2011 at 3:44 PM, Roland Scheidegger wrote:
> Actually I ran some numbers her
Actually I ran some numbers here and tried out an optimized struct compare:
original ipers: 12.1 fps
ajax patch: 15.5 fps
optimized struct compare: 16.8 fps
This is the function I used for that (just enabled in that lp_setup
function):
static INLINE int util_cmp_struct(const void *src1, const voi
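The archive cuts the function off after its first parameters. A word-wise equality compare along these lines could look like the sketch below; the body is a guess at the technique being described, not Roland's actual code:

```c
#include <stdint.h>
#include <string.h>

/* Equality-only compare: returns 0 if the blocks match, nonzero
 * otherwise.  Unlike memcmp() it never computes an ordering, so it can
 * compare a 32-bit word at a time and skip byte-wise tail logic.
 * Assumes count is a multiple of 4, which holds for the state structs
 * discussed in this thread. */
static inline int
util_cmp_struct(const void *src1, const void *src2, unsigned count)
{
   const char *p1 = (const char *)src1;
   const char *p2 = (const char *)src2;
   unsigned i;

   for (i = 0; i < count; i += 4) {
      uint32_t w1, w2;
      memcpy(&w1, p1 + i, 4);   /* memcpy keeps unaligned access legal; */
      memcpy(&w2, p2 + i, 4);   /* compilers lower it to a plain load   */
      if (w1 != w2)
         return 1;
   }
   return 0;
}
```

The early return on the first differing word is what beats the byte-wise 'repz cmpsb' seen in the profile.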
Ohh that's interesting, you'd think the comparison shouldn't be that
expensive (though I guess in ipers case the comparison is never true).
memcmp is quite extensively used everywhere. Maybe we could replace that
with something faster (since we only ever care if the blocks are the
same but not care
Perversely, do this by eliminating the comparison between stored and
current fs state. On ipers, a perf trace showed try_update_scene_state
using 31% of a CPU, and 98% of that was in 'repz cmpsb', ie, the memcmp.
Taking that out takes try_update_scene_state down to 6.5% of the
profile; more import