I've changed the SCons build to always use -fno-builtin-memcmp.

Jose
----- Original Message -----
> On Tue, 2011-09-20 at 16:35 +0200, Roland Scheidegger wrote:
> > On 20.09.2011 16:15, Keith Whitwell wrote:
> > > On Tue, 2011-09-20 at 16:02 +0200, Roland Scheidegger wrote:
> > >> On 20.09.2011 12:35, Keith Whitwell wrote:
> > >>> On Tue, 2011-09-20 at 10:59 +0200, Fabio wrote:
> > >>>> There was a discussion some months ago about using
> > >>>> -fno-builtin-memcmp for improving memcmp performance:
> > >>>> http://lists.freedesktop.org/archives/mesa-dev/2011-June/009078.html
> > >>>>
> > >>>> Since then, was it properly addressed in mesa or the flag is still
> > >>>> recommended? If so, what about adding it in configure.ac?
> > >>>
> > >>> I've been meaning to follow up on this too. I don't know the answer,
> > >>> but pinging Roland in case he does.
> > >>
> > >> I guess it is still recommended.
> > >> Ideally this is really something which should be fixed in gcc - the
> > >> compiler has all the knowledge about fixed alignment and size (if any)
> > >> (and more importantly knows if only a binary answer is needed which
> > >> makes this much easier) and doesn't need to do any function call.
> > >> If you enable that flag and some platform just has the same primitive
> > >> repz cmpsb sequence in the system library it will just get even slower,
> > >> though I guess chances of that happening are slim (with the possible
> > >> exception of windows).
> > >> I think in most cases it won't make much difference, so nobody cared to
> > >> implement that change. It is most likely still a good idea unless gcc
> > >> addressed that in the meantime...
> > >
> > > Hmm, it seemed like it made a big difference in the earlier
> > > discussion...
> > Yes for llvmpipe and one app at least.
> > But that struct being compared there is most likely the biggest (by far)
> > anywhere (at least which is compared in a regular fashion).
> >
> > > I should take a look at reducing the size of the struct (as mentioned
> > > before), but surely there's some way to pull in a better memcmp??
> >
> > Well, apart from using -fno-builtin-memcmp we could build our own
> > memcmpxx, though the version I did there (returning binary only result
> > and assuming 32bit alignment/size allowing gcc to optimize it) was still
> > slower for large sizes than -fno-builtin-memcmp. Of course we could
> > optimize it more (e.g. for 64bit aligned/sized things, or using
> > hand-coded sse2 versions using 128bit at-a-time comparisons) but then it
> > gets more complicated, so I wasn't sure it was worth it.
> >
> > For reference here are the earlier numbers (ipers with llvmpipe):
> > original ipers: 12.1 fps
> > optimized struct compare: 16.8 fps
> > -fno-builtin-memcmp: 18.1 fps
> >
> > And this was the function I used for getting the numbers:
> >
> > static INLINE int util_cmp_struct(const void *src1, const void *src2,
> >                                   unsigned count)
> > {
> >    /* hmm pointer casting is evil */
> >    const uint32_t *src1_ptr = (uint32_t *)src1;
> >    const uint32_t *src2_ptr = (uint32_t *)src2;
> >    unsigned i;
> >    assert(count % 4 == 0);
> >    for (i = 0; i < count/4; i++) {
> >       if (*src1_ptr != *src2_ptr) {
> >          return 1;
> >       }
> >       src1_ptr++;
> >       src2_ptr++;
> >    }
> >    return 0;
> > }
>
> OK, maybe the first thing to do is fix the compared struct, then let's
> see if there's anything significant left for a better memcmp to extract.
>
> I can find some time to do that in the next few days.
>
> Keith
>
> _______________________________________________
> mesa-dev mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
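
For illustration only, here is a rough sketch of the "64bit aligned/sized" variant mentioned in the quoted thread. It is not code from Mesa: the name cmp_struct64 and the multiple-of-8 size requirement are assumptions made for the example. It uses memcpy for the loads instead of a pointer cast, which sidesteps the "pointer casting is evil" concern; compilers typically turn a fixed-size memcpy into a plain load, but that is worth checking on the target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of Mesa.
 * Returns 0 if the two blocks are equal and 1 otherwise (binary result
 * only, no ordering). Assumes count is a multiple of 8 bytes. */
static inline int cmp_struct64(const void *src1, const void *src2,
                               size_t count)
{
   const unsigned char *p1 = src1;
   const unsigned char *p2 = src2;
   size_t i;

   assert(count % 8 == 0);
   for (i = 0; i < count; i += 8) {
      uint64_t a, b;
      /* Fixed-size memcpy instead of casting to uint64_t *, so no
       * alignment or aliasing assumptions are needed. */
      memcpy(&a, p1 + i, sizeof a);
      memcpy(&b, p2 + i, sizeof b);
      if (a != b) {
         return 1;
      }
   }
   return 0;
}
```

Whether this beats -fno-builtin-memcmp would have to be measured; the thread reports that the 32bit variant was still slower for large sizes.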
