On 20/10/15 18:11, Rowley, Timothy O wrote:
Hi.  I'd like to introduce the Mesa3D community to a software project
that we hope to upstream.  We're a small team at Intel working on
software defined visualization (http://sdvis.org/), and have
opensource projects in both the raytracing (Embree, OSPRay) and
rasterization (OpenSWR) realms.

We're a different Intel team from that of i965 fame, with a different
type of customer and workloads.  Our customers have large clusters of
compute nodes that for various reasons do not have GPUs, and are
working with extremely large geometry models.

We've been working on a high performance, highly scalable rasterizer
and driver to interface with Mesa3D.  Our rasterizer functions as a
"software gpu", relying on the mature well-supported Mesa3D to provide
API and state tracking layers.

We would like to contribute this code to Mesa3D and continue doing
active development in your source repository.  We welcome discussion
about how this will happen and questions about the project itself.
Below are some answers to what we think might be frequently asked
questions.

Bruce and I will be the public contacts for this project, but this
project isn't solely our work - there's a dedicated group of people
working on the core SWR code.

   Tim Rowley
   Bruce Cherniak

   Intel Corporation

Why another software rasterizer?
--------------------------------

Good question, given there are already three (swrast, softpipe,
llvmpipe) in the Mesa3D tree. Two important reasons for this:

  * Architecture - given our focus on scientific visualization, our
    workloads are much different from the typical game: we have heavy
    vertex load and relatively simple shaders.  In addition, the core
    counts of the machines we run on are much higher.  These parameters
    led to design decisions quite different from llvmpipe's.

  * Historical - Intel had developed a high performance software
    graphics stack for internal purposes.  Later we adapted this
    graphics stack for use in visualization and decided to move forward
    with Mesa3D to provide a high quality API layer while at the same
    time benefiting from the excellent performance the software
    rasterizer gives us.

It wouldn't be too difficult to distribute llvmpipe's vertex shading across threads.

What's the architecture?
------------------------

SWR is a tile-based immediate mode renderer with a sort-free threading
model arranged as a ring of queues.  Each entry in the ring
represents a draw context that contains all of the draw state and work
queues.  An API thread sets up each draw context and worker threads
will execute both the frontend (vertex/geometry processing) and
backend (fragment) work as required.  The ring allows for backend
threads to pull work in order.  Large draws are split into chunks to
allow vertex processing to happen in parallel, with the backend work
pickup preserving draw ordering.
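
To make the ring concrete, here is a rough sketch of the arrangement in
C++; the type and field names are purely illustrative and not the actual
SWR code:

// Illustrative ring of draw contexts (hypothetical names, not real SWR
// types).  The API thread claims slots in submission order; workers pull
// frontend chunks freely but only retire backend work for the oldest
// live draw, which preserves draw ordering.
#include <array>
#include <atomic>
#include <cstdint>
#include <vector>

struct DrawState   { /* bound shaders, render targets, ... */ };
struct FeWorkChunk { uint32_t firstVertex, numVertices; };  // split of a large draw
struct BeTile      { uint32_t tileX, tileY; };              // per-tile fragment work

struct DrawContext {
    DrawState                state;      // snapshot of API state for this draw
    std::vector<FeWorkChunk> feWork;     // vertex/geometry chunks, run in parallel
    std::vector<BeTile>      beWork;     // backend (fragment) work per tile
    std::atomic<bool>        feDone{false};
};

class DrawRing {
    static constexpr uint32_t kSize = 64;
    std::array<DrawContext, kSize> ring;
    std::atomic<uint64_t> head{0};       // next slot the API thread fills
    std::atomic<uint64_t> tail{0};       // oldest draw workers may retire

public:
    // API thread: claim the next draw context in submission order.
    DrawContext& beginDraw() {
        uint64_t slot = head.fetch_add(1);
        // (a real implementation would block here when the ring is full)
        return ring[slot % kSize];
    }

    // Worker threads: backend work is only retired from the oldest live
    // draw, which is what keeps fragment output in draw order.
    DrawContext* oldestLiveDraw() {
        uint64_t t = tail.load();
        return t < head.load() ? &ring[t % kSize] : nullptr;
    }

    void retireOldest() { tail.fetch_add(1); }
};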

Our pipeline uses just-in-time compiled code for the fetch shader that
does vertex attribute gathering and AOS to SOA conversions, the vertex
shader and fragment shaders, streamout, and fragment blending. SWR
core also supports geometry and compute shaders but we haven't exposed
them through our driver yet.  The fetch shader, streamout, and blend are
built inside the swr core using LLVM directly, while for the vertex and
pixel shaders we reuse bits of llvmpipe from gallium/auxiliary/gallivm
to build the kernels, wrapping them differently than llvmpipe's
auxiliary/draw code does.
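
For anyone unfamiliar with the AOS to SOA step above, here is a scalar
sketch of the data movement the fetch shader performs; the real code is
JIT-generated LLVM IR working on SIMD registers, so this only shows the
layout change:

// Scalar illustration of AOS->SOA conversion: vertices arrive
// interleaved (x,y,z,w per vertex) and are regrouped so each component
// is contiguous, one SIMD lane per vertex.  SWR emits LLVM IR that does
// this with gathers and shuffles; the loop below is just the idea.
constexpr int kSimdWidth = 8;            // e.g. AVX2: 8 vertices per register

struct VertexAOS { float x, y, z, w; };  // "array of structures" in memory
struct PositionsSOA {                    // "structure of arrays" for SIMD
    float x[kSimdWidth];
    float y[kSimdWidth];
    float z[kSimdWidth];
    float w[kSimdWidth];
};

void fetchPositions(const VertexAOS* in, PositionsSOA* out)
{
    for (int lane = 0; lane < kSimdWidth; ++lane) {
        out->x[lane] = in[lane].x;       // all x components end up adjacent,
        out->y[lane] = in[lane].y;       // ready for one vector op across
        out->z[lane] = in[lane].z;       // 8 vertices at a time
        out->w[lane] = in[lane].w;
    }
}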

What's the performance?
-----------------------

For the types of high-geometry workloads we're interested in, we are
significantly faster than llvmpipe.  This is to be expected, as
llvmpipe only threads the fragment processing and not the geometry
frontend.

The slide deck linked below shows some performance numbers from a
benchmark dataset and application.  On a dual E5-2699v3 (36 cores
total) we see performance 29x to 51x that of llvmpipe.

        http://openswr.org/slides/SWR_Sept15.pdf

While our current performance is quite good, we know there is more
potential in this architecture.  When we switched from a prototype
OpenGL driver to Mesa we regressed performance severely: some of the
loss is due to interface issues that need tuning, some to differences
in shader code generation, and some to conformance and feature
additions to the core swr.  We are looking to recover most of this
performance.

I tried it on my i7-5500U, but I ran into two issues:

- OpenSWR seems to use only 2 threads, even though my system supports 4 hardware threads (see the core/thread count sketch after the numbers below)

- and even when I restrict llvmpipe to only 2 rasterizer threads to compensate, OpenSWR still gets only half of llvmpipe's framerate with the "gloss" Mesa demo (a very simple texturing demo):

$ ./gloss
SWR create screen!
This processor supports AVX2.
720 frames in 5.004 seconds = 143.885 FPS
737 frames in 5.005 seconds = 147.253 FPS
729 frames in 5.004 seconds = 145.683 FPS
732 frames in 5.002 seconds = 146.341 FPS
735 frames in 5.001 seconds = 146.971 FPS
[...]
$ GALLIUM_DRIVER=llvmpipe LP_NUM_THREADS=2 ./gloss
1539 frames in 5.002 seconds = 307.677 FPS
1719 frames in 5 seconds = 343.8 FPS
1780 frames in 5.002 seconds = 355.858 FPS
1497 frames in 5.002 seconds = 299.28 FPS
1548 frames in 5.001 seconds = 309.538 FPS
[..]
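
One possible explanation for the thread count (unverified, just a guess)
is that the worker pool is sized to physical cores rather than
hyperthreads; the i7-5500U reports 4 logical CPUs but has only 2 cores.
A quick way to compare the two counts on Linux:

// Compare logical CPUs with physical cores on Linux by counting unique
// (physical id, core id) pairs in /proc/cpuinfo.  A renderer that sizes
// its worker pool to physical cores would start 2 workers on this CPU.
#include <fstream>
#include <iostream>
#include <set>
#include <string>
#include <thread>
#include <utility>

int main()
{
    unsigned logical = std::thread::hardware_concurrency();

    std::ifstream cpuinfo("/proc/cpuinfo");
    std::set<std::pair<int, int>> cores;
    std::string line;
    int pkg = 0, core = 0;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("physical id", 0) == 0)
            pkg = std::stoi(line.substr(line.find(':') + 1));
        else if (line.rfind("core id", 0) == 0)
            core = std::stoi(line.substr(line.find(':') + 1));
        else if (line.empty())               // blank line ends one CPU entry
            cores.insert({pkg, core});
    }

    std::cout << "logical CPUs:   " << logical << "\n"
              << "physical cores: " << cores.size() << "\n";
    return 0;
}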

I see a similar ratio with a more complex workload, using the trace from:

  http://people.freedesktop.org/~jrfonseca/traces/furmark-1.8.2-svga.trace

(you'll need to download and build https://github.com/apitrace/apitrace to replay it)

My questions are:

- Is this the expected performance when texturing is used? Or is there something wrong with my setup?

I understand that OpenSWR actually leverages llvmpipe's (well, gallivm's) code for texture sampling, so I was expecting a smaller gap.

- What exactly was the benchmark used for SWR_Sept15.pdf's figures? Was texture sampling used in it, or just simple lighting?

Jose
