On 26.03.2013 20:28, Marek Olšák wrote:
On Tue, Mar 26, 2013 at 6:44 PM, Christian König
<deathsim...@vodafone.de> wrote:
On 26.03.2013 18:02, Jerome Glisse wrote:
On Tue, Mar 26, 2013 at 12:40 PM, Marek Olšák <mar...@gmail.com> wrote:
On Tue, Mar 26, 2013 at 3:59 PM, Christian König
<deathsim...@vodafone.de> wrote:
On 26.03.2013 15:34, Marek Olšák wrote:
Speaking of si_pm4_state, I think it's a horrible mechanism for
anything other than constant state objects (create/bind/delete
functions). For everything else (set/draw functions), you want to emit
directly into the command stream. It's not so different from the bad
state management which r600g used to have (which is now gone). If you
have to call malloc or calloc in a set_* or draw_* function, you're
doing it wrong. Are there plans to change it to something more
efficient (e.g. how r300g and r600g emit non-CSO states right now), or
will it be like this forever?
Actually I hoped that r600g would sooner or later move further in the
same direction. The fact that we currently need to malloc every buffer
indeed sucks badly, but that is still better than mixing packet
generation with driver logic.
I don't understand the last sentence. What mixing? The set_* and
draw_* commands are supposed to be executed immediately, therefore
it's reasonable and preferable to write to the CS directly. Having any
intermediate storage for commands is a waste of time and space.
I agree here. I don't think an uncached bo for the command stream on
new hw would bring a huge perf increase; it will probably just be noise.
Also I don't think that emitting directly into the command stream is
such a good idea; we sooner or later want that buffer to be a buffer
allocated in GART memory. And under this condition it is better to
build up the commands in (heavily cached) system memory and then
memcpy them to the destination buffer.
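The staging approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `staging_cs`, `staging_emit`, `staging_flush` and the buffer size are hypothetical names that do not come from radeonsi or r600g, and the destination is a plain array standing in for a mapped GART buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STAGING_DWORDS 256 /* hypothetical staging size */

/* Commands are first built in ordinary cached system memory. */
struct staging_cs {
    uint32_t buf[STAGING_DWORDS];
    unsigned cdw; /* number of dwords written so far */
};

/* Append one dword to the staging buffer. */
static inline void staging_emit(struct staging_cs *cs, uint32_t value)
{
    assert(cs->cdw < STAGING_DWORDS);
    cs->buf[cs->cdw++] = value;
}

/* Copy everything staged so far into the destination in one sequential,
 * write-only pass, then reset the staging buffer.
 * Returns the number of dwords copied. */
static unsigned staging_flush(struct staging_cs *cs, uint32_t *dst,
                              unsigned dst_dwords)
{
    unsigned n = cs->cdw;

    assert(n <= dst_dwords);
    memcpy(dst, cs->buf, n * sizeof(uint32_t));
    cs->cdw = 0;
    return n;
}
```

The cost of this scheme is that every dword is written twice, which is the overhead the replies further down object to.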
AFAIK, GART memory is cached on non-AGP systems, but even uncached
access shouldn't be a big issue, because the access pattern is
sequential and write-only. BTW, I have talked about emitting commands
into a buffer object with Dave and he thinks it's a bad idea due to
the map and unmap overhead. Also, we have to disallow writing to
certain unsafe registers anyway.
Marek
I think Christian is thinking about new hw > cayman, where we can skip
register checking because of vm and hardware register checking (the hw
CP checks that a register in the user IB is not one of the privileged
registers, and blocks the write and throws an irq if it is). On this
kind of hw you can have the cmd stream in a bo and skip the map/unmap.
Yes indeed, and my plan is to avoid the copying by referencing the state
directly with indirect buffer commands. That should also make things like
queries and predicated rendering a bit simpler (think of PM4 subroutine
calls).
The problem on SI is that for embedded data and const IBs you need to patch
up the buffer quite a bit after it is written (at least if I understand them
If you need to patch up the buffer after it is written, i.e. it can't
be immutable, you probably shouldn't use any buffer at all and just
write the commands into the CS directly. How are you gonna update the
buffer if it's busy?
Unless you mean that some state depends on some other state and must
be recomputed, which is another aspect of radeonsi which is pretty
badly designed. Don't recompute states, just create multiple command
buffers covering all possible combinations of states you can ever get
with a single gallium CSO and switch between them at draw time. E.g.
when some external state changes from I to J, switch from
blend_state->variant[I] to blend_state->variant[J] (by just emitting
variant[J]), where variant is an array of immutable command buffers.
Same for pipe_surface (except that pipe_surface should initialize the
variants on demand).
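A minimal sketch of the variant idea, assuming a fixed number of external-state combinations known at CSO creation time. The names (`cmd_stream`, `blend_state_create`, `emit_blend_variant`), the sizes and the payload values are hypothetical placeholders, not the actual radeonsi structures or real register data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_VARIANTS   2  /* hypothetical: external-state combinations */
#define VARIANT_DWORDS 4  /* hypothetical: size of one prebuilt buffer */
#define CS_DWORDS      64

struct cmd_stream {
    uint32_t buf[CS_DWORDS];
    unsigned cdw; /* number of dwords written so far */
};

struct blend_state {
    /* Built once at create time and never modified afterwards. */
    uint32_t variant[NUM_VARIANTS][VARIANT_DWORDS];
};

/* Create function: precompute one command buffer per combination. */
static void blend_state_create(struct blend_state *bs, uint32_t base)
{
    for (unsigned i = 0; i < NUM_VARIANTS; i++)
        for (unsigned j = 0; j < VARIANT_DWORDS; j++)
            bs->variant[i][j] = base + i * 0x100u + j; /* placeholder data */
}

/* Draw time: nothing is recomputed, the matching variant is just copied
 * into the command stream. */
static void emit_blend_variant(struct cmd_stream *cs,
                               const struct blend_state *bs,
                               unsigned external_state)
{
    assert(external_state < NUM_VARIANTS);
    assert(cs->cdw + VARIANT_DWORDS <= CS_DWORDS);
    memcpy(&cs->buf[cs->cdw], bs->variant[external_state],
           sizeof(bs->variant[0]));
    cs->cdw += VARIANT_DWORDS;
}
```

Switching from state I to state J then amounts to calling `emit_blend_variant(cs, bs, J)`. For an on-demand scheme like the one suggested for pipe_surface, the create step would fill a variant the first time it is requested instead of up front.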
Yeah, I know. I only have two hands, where do you suggest I
should start?
correctly). But Marek is quite right that this only counts for state objects
and makes no sense for set_* and draw_* calls (and I'm currently thinking
how to avoid that and can't come up with a proper solution). Anyway it's
definitely not an urgent problem for radeonsi.
It will be a problem once we actually start caring about performance
and, most importantly, the CPU overhead of the driver.
I still think that writing into the command buffers directly (e.g. without
wrapper functions) is a bad idea, because that leads to mixing driver logic and
I'm convinced the exact opposite is a bad idea, because it adds
another layer all commands must go through. A layer which brings no
advantage. Think about apps which issue 1k-10k draw calls per frame.
It's obvious that every byte moved around counts and the key to high
framerate is to do (almost) nothing in the driver. It looks like the
idea here is to make the driver as slow as possible.
packet building in r600g. For example just try to figure out how the
relocations in NOPs work by reading the source (please keep in mind that one
of the primary reasons AMD is supporting this driver is to provide good
example code for customers who want to implement that stuff on their own
systems).
I'm shocked. Sacrificing performance in the name of making the code
nicer for some customers? Seriously? I thought the plan was to make
the best graphics driver ever.
Well, maybe I'm repeating myself: Performance is not a priority, it's
only nice to have!
Sorry to say so, but if we sacrifice a bit of performance for more code
readability then that is perfectly ok with me (don't get me wrong,
I would really prefer to replace the closed source driver today rather
than tomorrow, it's unfortunately just not what I'm paid for).
On the other hand, we are talking about perfectly optimizable inline
functions and/or macros. All I'm saying is that we should structure
the code a bit more.
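As a rough illustration of what such inline wrappers could look like: the helpers below write straight into the command stream, so a compiler can reduce each call to a few stores, while packet encoding stays out of the driver logic. Everything here is an assumption made for the sketch. `cs_emit`, `cs_set_reg`, `pkt_header`, `OP_SET_REG` and the header layout are hypothetical and deliberately not the real PM4 encoding.

```c
#include <assert.h>
#include <stdint.h>

#define CS_DWORDS 64

struct cmd_stream {
    uint32_t buf[CS_DWORDS];
    unsigned cdw; /* number of dwords written so far */
};

/* Hypothetical header layout, for illustration only: opcode in the high
 * 16 bits, payload dword count in the low 16 bits. */
static inline uint32_t pkt_header(uint16_t opcode, uint16_t count)
{
    return ((uint32_t)opcode << 16) | count;
}

/* Write one dword directly into the command stream, no intermediate
 * allocation or copy. */
static inline void cs_emit(struct cmd_stream *cs, uint32_t value)
{
    assert(cs->cdw < CS_DWORDS);
    cs->buf[cs->cdw++] = value;
}

#define OP_SET_REG 0x10u /* hypothetical opcode */

/* Packet building lives here; callers only say "set this register". */
static inline void cs_set_reg(struct cmd_stream *cs, uint32_t reg,
                              uint32_t value)
{
    cs_emit(cs, pkt_header(OP_SET_REG, 2));
    cs_emit(cs, reg);
    cs_emit(cs, value);
}
```

A set_* or draw_* function would then call `cs_set_reg(cs, reg, value)` instead of building a separate state object first.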
Christian.
Marek
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev