On Fri, Apr 04, 2003 at 08:48:35AM -0700, Brian Paul wrote:
In general, this sounds reasonable but you also have to consider performance.
The glVertex, Color, TexCoord, etc. commands have to be simple and fast. As it is now, glColor4f (for example, when implemented in X86 assembly) is just a jump into _tnl_Color4f(), which stuffs the color into the immediate struct and returns. Something similar is done in the R200 driver.
If the implementation of _tnl_Color4f() involves a call to producer->Color4f() we'd lose some performance.
I know, but my objective is to design a good object interface on which all drivers may fit and reuse code. When a driver gets to the point where the producer->Color4f() calls are the main performance bottleneck (!?) the developer is free to write a tailored version of TnLProducer that eliminates that extra call:
Right now people use things like Viewperf to make system purchasing decisions. Unless the graphics hardware and the rest of the system are very mismatched, the immediate API already has an impact on performance in those benchmarks.
The performance of the immediate API *is* important to real applications. Why do you think Sun came up with the SUN_vertex extension? To reduce the overhead of the immediate API, of course. :)
[sample code cut]
But this is all of _very_ _little_ importance when compared with the ability to _write_ a full driver fast, which is what a well designed OOP interface gives you. As I have said here several times, this kind of low-level optimization consumes so much development time that higher-level optimizations (usually with much more impact on performance) are never attempted.
In principle, I think the producer/consumer idea is good. Why not implement known optimizations in it from the start? We already have *working code* to build formatted vertex data (see the radeon & r200 drivers), so why not build the object model from there? Each concrete producer class would have an associated vertex format. On creation, it would fill in a table of functions to put data in its vertex buffer. This could mean pointers to generic C functions, or it could mean dynamically generating code from assembly stubs.
The idea is that the functions from this table could be put directly in the dispatch table. This is, IMHO, critically important.
The various vertex functions then just need to call the object's produce method. This all boils down to putting a C++ face on a technique that has been demonstrated to work.
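To make the table idea concrete, here is a rough sketch, not code from Mesa or the radeon / r200 drivers. Every name in it (DispatchTable, VertexPosColor, PosColorProducer, install) is made up for illustration. It assumes a single current context, so the producer state is static and the entry points need no object pointer, which is what lets them sit directly in the dispatch table with no extra producer->Color4f() hop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the GL dispatch table: one slot per entry point.
struct DispatchTable {
    void (*Color4f)(float, float, float, float);
    void (*Vertex3f)(float, float, float);
};

// Hypothetical vertex format: position followed by RGBA color.
struct VertexPosColor {
    float pos[3];
    float color[4];
};

// Hypothetical concrete producer tied to the VertexPosColor format.
// State is static (one current context assumed) so the functions below
// have plain C signatures and can be stored in the table as-is.
class PosColorProducer {
public:
    static constexpr std::size_t kMaxVerts = 64;

    // Fill the table with the functions that match this vertex format.
    static void install(DispatchTable &table) {
        table.Color4f = &color4f;
        table.Vertex3f = &vertex3f;
    }

    static void reset() { count_ = 0; }
    static std::size_t count() { return count_; }
    static const VertexPosColor &vertex(std::size_t i) {
        assert(i < count_);
        return buffer_[i];
    }

private:
    // Only records the current color; nothing is emitted yet.
    static void color4f(float r, float g, float b, float a) {
        current_.color[0] = r;
        current_.color[1] = g;
        current_.color[2] = b;
        current_.color[3] = a;
    }

    // Completes a vertex and appends it to the producer's buffer.
    // A real driver would flush when full; this sketch just asserts.
    static void vertex3f(float x, float y, float z) {
        assert(count_ < kMaxVerts);
        current_.pos[0] = x;
        current_.pos[1] = y;
        current_.pos[2] = z;
        buffer_[count_++] = current_;
    }

    static VertexPosColor current_;
    static VertexPosColor buffer_[kMaxVerts];
    static std::size_t count_;
};

VertexPosColor PosColorProducer::current_ = {};
VertexPosColor PosColorProducer::buffer_[PosColorProducer::kMaxVerts] = {};
std::size_t PosColorProducer::count_ = 0;
```

A caller would create a DispatchTable, pass it to PosColorProducer::install(), and from then on table.Color4f(...) and table.Vertex3f(...) land straight in the format-specific code. Swapping vertex formats means installing a different producer's functions, not adding a branch to the hot path.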
I do have one question. Do we really want to invoke the producer on every vertex immediately? In the radeon / r200 drivers this is just to copy the whole vertex to a DMA buffer. Why not generate the data directly where it needs to go? I know that if the vertex format changes before the vertex is complete we need to copy out of the temporary buffer into the GL state vector, but that doesn't seem like the common case. At the very least, some guys at Intel think generating data directly in DMA buffers is the way to go:
http://www.intel.com/technology/itj/Q21999/ARTICLES/art_4.htm
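As a rough illustration of "generate the data directly where it needs to go", here is a sketch that builds each vertex in place in the destination buffer and skips the temporary struct. It is not taken from the Intel article or from any driver. Vertex and DirectWriter are hypothetical names, an ordinary array stands in for a DMA buffer, and the vertex-format-change case mentioned above is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vertex format: position followed by RGBA color.
struct Vertex {
    float pos[3];
    float color[4];
};

// Hypothetical writer that fills the next free slot of a caller-supplied
// buffer (a stand-in for a DMA buffer) as attributes arrive.
class DirectWriter {
public:
    DirectWriter(Vertex *dst, std::size_t capacity)
        : dst_(dst), capacity_(capacity), used_(0) {
        assert(capacity_ > 0);
        dst_[0] = Vertex{};  // start with a zeroed in-progress vertex
    }

    // The color goes straight into the in-progress slot.
    void color4f(float r, float g, float b, float a) {
        assert(used_ < capacity_);  // a real driver would flush here
        Vertex &v = dst_[used_];
        v.color[0] = r; v.color[1] = g; v.color[2] = b; v.color[3] = a;
    }

    // The position completes the vertex; the slot is then final.
    void vertex3f(float x, float y, float z) {
        assert(used_ < capacity_);  // a real driver would flush here
        Vertex &v = dst_[used_];
        v.pos[0] = x; v.pos[1] = y; v.pos[2] = z;
        ++used_;
        // Carry the current color into the next slot so it persists
        // across vertices, as GL current state does.
        if (used_ < capacity_) {
            for (int i = 0; i < 4; ++i) {
                dst_[used_].color[i] = v.color[i];
            }
        }
    }

    std::size_t used() const { return used_; }

private:
    Vertex *dst_;
    std::size_t capacity_;
    std::size_t used_;
};
```

The trade-off in this sketch is that the per-vertex copy of the whole vertex goes away, but the non-position attributes still have to be carried forward to the next slot. Whether that wins on real hardware would need measuring.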
I guess my point is that we *can* have our cake and eat it too. We can have a nice object model and have "classic" low-level optimizations. The benefit of doing those optimizations at the level of the object model is that they only need to be done once for a given vertex format. Reusing optimizations sounds like a big win to me! :)
_______________________________________________
Dri-devel mailing list [EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel
