FYI
-------- Original Message --------
Subject: {DirectFB} Core: Experimental gfxcard.c replacement code with huge
optimization
Date: 13 Nov 2011 14:07:05 +0100
From: [email protected]
To: [email protected]
New branch 'accel1' available with the following commits:
http://git.directfb.org/?p=core/DirectFB.git;a=commit;h=92c15f0612c55df9b95a83fb0d19804e4718fa48
commit 92c15f0612c55df9b95a83fb0d19804e4718fa48
Author: Denis Oliver Kropp <[email protected]>
Date: Mon Oct 31 23:44:07 2011 +0100
Core: Experimental gfxcard.c replacement code with huge optimization
This implementation does not lock/unlock buffers for each operation;
instead it manages state and locks lazily. If one calls SetColor and
FillRectangles in a row, it only calls the driver's SetState(), with
no unlock/lock of buffers in between.
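To illustrate the lazy approach described above, here is a minimal sketch in C. It is not the code from the commit: LazyState, lazy_set_color(), lazy_fill_rectangle() and lazy_release() are hypothetical names, and the "driver" is reduced to counters so the number of SetState and lock/unlock steps can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver and buffer layer (not DirectFB API).
 * It only counts how often each expensive step would run. */
typedef struct {
    int set_state_calls;
    int lock_calls;
    int unlock_calls;
    int fills;
} Counters;

typedef struct {
    Counters     c;
    bool         locked;       /* destination buffer currently locked? */
    bool         state_dirty;  /* state modified since last SetState? */
    unsigned int color;
} LazyState;

static void lazy_init(LazyState *s)
{
    s->c           = (Counters){ 0, 0, 0, 0 };
    s->locked      = false;
    s->state_dirty = true;     /* nothing has been validated yet */
    s->color       = 0;
}

/* Setters only record the change; no driver or lock activity here. */
static void lazy_set_color(LazyState *s, unsigned int color)
{
    s->color       = color;
    s->state_dirty = true;
}

/* Rendering locks only if needed and sets state only if it changed. */
static void lazy_fill_rectangle(LazyState *s)
{
    if (!s->locked) {
        s->c.lock_calls++;
        s->locked = true;
    }
    if (s->state_dirty) {
        s->c.set_state_calls++;   /* stands in for the driver's SetState() */
        s->state_dirty = false;
    }
    s->c.fills++;
}

/* Called once when the batch of operations is over. */
static void lazy_release(LazyState *s)
{
    if (s->locked) {
        s->c.unlock_calls++;
        s->locked = false;
    }
    s->state_dirty = true;        /* revalidate after the buffers were released */
}
```

With this sketch, a sequence of SetColor, fill, SetColor, fill, fill results in one lock, two SetState steps and no unlock until lazy_release() is called.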
To achieve this, a new dispatch cleanup handler is added, which is
called before the next read() from the Fusion device to unlock the
currently locked buffers and exit the currently active state.
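The cleanup mechanism can be sketched roughly as follows. This is an assumption-laden illustration, not the Fusion code from the commit: Dispatcher, dispatcher_add_cleanup(), dispatcher_run_cleanups() and GfxContext are hypothetical names, and a "batch" simply stands for the messages handled between two blocking reads.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CLEANUPS 8

typedef void (*CleanupFunc)(void *ctx);

typedef struct {
    CleanupFunc func;
    void       *ctx;
} Cleanup;

/* Hypothetical dispatcher holding one-shot handlers (not the Fusion API). */
typedef struct {
    Cleanup pending[MAX_CLEANUPS];
    size_t  count;
} Dispatcher;

/* Queue a handler to run before the next blocking read; -1 if full. */
static int dispatcher_add_cleanup(Dispatcher *d, CleanupFunc func, void *ctx)
{
    if (d->count == MAX_CLEANUPS)
        return -1;
    d->pending[d->count].func = func;
    d->pending[d->count].ctx  = ctx;
    d->count++;
    return 0;
}

/* Run all pending handlers once, then clear the list.
 * Handlers in this sketch must not register new cleanups. */
static void dispatcher_run_cleanups(Dispatcher *d)
{
    for (size_t i = 0; i < d->count; i++)
        d->pending[i].func(d->pending[i].ctx);
    d->count = 0;
}

typedef struct {
    int renders;
    int releases;
    int cleanup_armed;   /* handler already queued for this batch? */
} GfxContext;

/* Stands in for "unlock buffers and exit the active state". */
static void release_buffers(void *ctx)
{
    GfxContext *g = ctx;
    g->releases++;
    g->cleanup_armed = 0;
}

/* A render message keeps buffers locked and arms the cleanup once. */
static void handle_render(Dispatcher *d, GfxContext *g)
{
    g->renders++;
    if (!g->cleanup_armed && dispatcher_add_cleanup(d, release_buffers, g) == 0)
        g->cleanup_armed = 1;
}

/* Handle everything from one read, then clean up before reading again. */
static void dispatch_batch(Dispatcher *d, GfxContext *g, int num_messages)
{
    for (int i = 0; i < num_messages; i++)
        handle_render(d, g);
    dispatcher_run_cleanups(d);   /* the next blocking read would follow here */
}
```

The point of the design is that the release happens once per batch of dispatched messages rather than once per rendering call.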
There's still a lot of code to move from gfxcard.c (the actual rendering),
but it may be worth thinking about a rework to support all kinds of
cases with/without hardware matrix/clipping, different primitives etc.
The performance boost is awesome, up to 20x for some tests I ran.
Here are a few results:
Benchmarking 100x100 on 852x464 RGB32 (32bit)...
Anti-aliased Text                         3.001 secs ( 3637.187 KChars/sec) [ 19.6%]
Anti-aliased Text (blend)                 3.015 secs (  489.552 KChars/sec) [  4.9%]
Fill Rectangle                            3.003 secs ( 3303.030 MPixel/sec) [ 24.0%]
Fill Rectangle (blend)                    3.066 secs (  181.343 MPixel/sec) [  1.6%]
Fill Rectangles [10]                      3.018 secs ( 3479.125 MPixel/sec) [  4.6%]
Fill Rectangles [10] (blend)              3.351 secs (  182.035 MPixel/sec) [  0.2%]
Blit                                      3.005 secs ( 3346.422 MPixel/sec) [ 16.3%]
Blit 180                                  3.014 secs ( 1379.230 MPixel/sec) [  7.9%]
Blit colorkeyed                           3.015 secs ( 1271.973 MPixel/sec) [  7.3%]
Blit destination colorkeyed               3.012 secs ( 1403.054 MPixel/sec) [  7.9%]
Blit with format conversion               3.059 secs (  322.000 MPixel/sec) [  1.9%]
Blit with colorizing                      3.061 secs (  189.807 MPixel/sec) [  1.9%]
Blit from 32bit (blend)                   3.059 secs (  323.635 MPixel/sec) [  1.9%]
Blit from 32bit (blend) with colorizing   3.126 secs (   86.372 MPixel/sec) [  0.9%]
Blit SrcOver (premultiplied source)       3.037 secs (  526.506 MPixel/sec) [  3.3%]
Blit SrcOver (premultiply source)         3.035 secs (  548.599 MPixel/sec) [  3.3%]
Compared to the old code:
Benchmarking 100x100 on 852x464 RGB32 (32bit)...
Anti-aliased Text                         3.009 secs (  926.021 KChars/sec) [ 18.0%]
Anti-aliased Text (blend)                 3.015 secs (  427.462 KChars/sec) [  4.9%]
Fill Rectangle                            3.010 secs (  655.813 MPixel/sec) [ 40.5%]
Fill Rectangle (blend)                    3.069 secs (  171.391 MPixel/sec) [  2.2%]
Fill Rectangles [10]                      3.019 secs ( 3093.739 MPixel/sec) [  5.6%]
Fill Rectangles [10] (blend)              3.326 secs (  180.396 MPixel/sec) [  0.3%]
Blit                                      3.037 secs (  466.249 MPixel/sec) [  6.6%]
Blit 180                                  3.051 secs (  406.751 MPixel/sec) [  5.5%]
Blit colorkeyed                           3.046 secs (  397.570 MPixel/sec) [  5.2%]
Blit destination colorkeyed               3.030 secs (  571.287 MPixel/sec) [  8.2%]
Blit with format conversion               3.079 secs (  220.850 MPixel/sec) [  2.2%]
Blit with colorizing                      3.072 secs (  131.510 MPixel/sec) [  2.2%]
Blit from 32bit (blend)                   3.097 secs (  188.246 MPixel/sec) [  2.2%]
Blit from 32bit (blend) with colorizing   3.136 secs (   77.487 MPixel/sec) [  0.9%]
Blit SrcOver (premultiplied source)       3.078 secs (  253.411 MPixel/sec) [  2.9%]
Blit SrcOver (premultiply source)         3.068 secs (  265.319 MPixel/sec) [  2.9%]
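For readers who want the ratios rather than the raw rates, the following small C sketch divides the new-code throughput by the old-code throughput for a few rows copied from the two result sets above. The Result struct and the function names are illustrative only; the numbers are taken verbatim from the benchmark output.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    const char *test;
    double      new_rate;   /* new code, KChars/sec or MPixel/sec */
    double      old_rate;   /* old code, same unit as new_rate */
} Result;

/* A subset of rows from the two benchmark runs quoted above. */
static const Result results[] = {
    { "Anti-aliased Text",    3637.187,  926.021 },
    { "Fill Rectangle",       3303.030,  655.813 },
    { "Fill Rectangles [10]", 3479.125, 3093.739 },
    { "Blit",                 3346.422,  466.249 },
    { "Blit 180",             1379.230,  406.751 },
};

#define NUM_RESULTS (sizeof(results) / sizeof(results[0]))

/* Speedup is the unitless ratio of new to old throughput. */
static double speedup(const Result *r)
{
    return r->new_rate / r->old_rate;
}

/* Print one line per test with its speedup factor. */
static void print_speedups(void)
{
    for (size_t i = 0; i < NUM_RESULTS; i++)
        printf("%-22s %6.2fx\n", results[i].test, speedup(&results[i]));
}
```

For the rows listed, plain Blit comes out at roughly 7x and Fill Rectangle at roughly 5x, while the already batched Fill Rectangles [10] case changes only slightly.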
Compared to new code, but running as master (new mechanism leverages async
FusionCalls):
Benchmarking 100x100 on 852x464 RGB32 (32bit)...
Anti-aliased Text                         3.000 secs ( 1582.800 KChars/sec) [ 99.3%]
Anti-aliased Text (blend)                 3.003 secs (  402.797 KChars/sec) [ 99.6%]
Fill Rectangle                            3.000 secs ( 1978.000 MPixel/sec) [ 99.6%]
Fill Rectangle (blend)                    3.001 secs (  172.609 MPixel/sec) [ 99.6%]
Fill Rectangles [10]                      3.002 secs ( 3214.523 MPixel/sec) [ 99.6%]
Fill Rectangles [10] (blend)              3.049 secs (  180.387 MPixel/sec) [ 99.6%]
Blit                                      3.001 secs (  522.159 MPixel/sec) [ 99.3%]
Blit 180                                  3.000 secs (  424.333 MPixel/sec) [ 99.6%]
Blit colorkeyed                           3.002 secs (  413.724 MPixel/sec) [ 99.3%]
Blit destination colorkeyed               3.001 secs (  615.794 MPixel/sec) [ 99.3%]
Blit with format conversion               3.000 secs (  225.333 MPixel/sec) [ 99.6%]
Blit with colorizing                      3.003 secs (  143.856 MPixel/sec) [ 99.6%]
Blit from 32bit (blend)                   3.002 secs (  207.861 MPixel/sec) [ 99.3%]
Blit from 32bit (blend) with colorizing   3.006 secs (   74.184 MPixel/sec) [ 99.6%]
Blit SrcOver (premultiplied source)       3.001 secs (  274.908 MPixel/sec) [ 99.0%]
Blit SrcOver (premultiply source)         3.000 secs (  286.333 MPixel/sec) [ 99.6%]
YES, it is slower as master than as a slave, because the master does not go via FusionCall!
lib/fusion/fusion.c | 68 +++++
lib/fusion/fusion.h | 19 ++
lib/fusion/fusion_internal.h | 2 +
src/core/CoreGraphicsState_real.cpp | 465 +++++++++++++++++++++++++++++++++-
src/core/gfxcard.c | 2 +-
src/core/graphics_state.h | 18 +-
src/core/state.h | 6 +-
src/gfx/clip.h | 2 +-
src/gfx/generic/generic.c | 197 +++++++++++-----
src/gfx/generic/generic.h | 3 +-
10 files changed, 704 insertions(+), 78 deletions(-)
http://git.directfb.org/?p=core/DirectFB.git;a=commit;h=364bbdb150032eed9d69e5a0acfdd6f976a38770
commit 364bbdb150032eed9d69e5a0acfdd6f976a38770
Author: Denis Oliver Kropp <[email protected]>
Date: Sun Nov 13 13:49:45 2011 +0100
Core: Use new "queue" property for rendering and state setting methods.
Surface setters are excluded because of out-of-order execution, with
references being dropped right after blitting from a surface but before flushing.
src/core/CoreGraphicsState.flux | 24 ++++++++++++++++++++++++
1 files changed, 24 insertions(+), 0 deletions(-)
_______________________________________________
directfb-cvs mailing list
[email protected]
http://mail.directfb.org/cgi-bin/mailman/listinfo/directfb-cvs