On Mon, Nov 08, 2004 at 11:32:24AM -0800, Ian Romanick wrote:
| This is something I've been thinking about ever since I saw the
| profiling tools in Nvidia's drivers at SIGGRAPH. There's a LOT of
| information that would be useful to get out of the driver about performance
Have you taken a look at the SGIX_instruments extension? It provides a
framework that's intended for gathering profiling information
asynchronously. The idea was that you'd add separate extensions that
defined the actual instrumentation (SGIX_ir_instrument1 was an early
example).
I searched my archives for things I'd written on this subject in the
past. The following is probably the most comprehensive summary. Some
of it is out of date now, or has implications for hardware design that's
out of our control, but some of it still looks useful.
Allen
Purposes of Instrumentation
Tuning
Analyzing the app or database to improve overall
performance and/or rendering quality. Typically done
during the development phase. Examples: determining
what percentage of triangles are clipped, or how well
texture memory is utilized.
Load Monitoring
Gathering information to modify the behavior of the app
or the structure of the database dynamically, to
maintain a constant frame-rate. Typically done in
real-time by production apps. Examples: determining
how much time is spent in geometric processing and how
much time in pixel-fill, in order to choose object
level-of-detail.
Debugging/Testing
Graphics systems are extremely complex, and their behavior
isn't always predictable. We can anticipate a need for
machine-specific instrumentation in order to understand
surprisingly high or low performance of an application,
or for use during driver development.
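To make the load-monitoring case concrete, here is a minimal C sketch that chooses a level of detail from per-frame geometry and fill times. Several pieces are hypothetical: the `frame_times` struct, `choose_lod()`, and the 80% headroom threshold. It also assumes that frame time is set by the busier of the two pipelined stages. A real app would get the timings from whatever instruments end up being defined.

```c
/* Hypothetical per-frame timings (microseconds), as an instrument
 * bracketing one frame might report them. Names are illustrative. */
typedef struct {
    double geometry_us; /* time spent in geometric processing */
    double raster_us;   /* time spent in pixel fill */
} frame_times;

/* Pick the next level of detail (0 = finest, max_lod = coarsest).
 * Assumption: frame time ~ the busier stage, and LOD mainly reduces
 * geometry cost, so it is only coarsened when geometry-bound. */
int choose_lod(frame_times t, double budget_us, int lod, int max_lod)
{
    double busiest = t.geometry_us > t.raster_us ? t.geometry_us : t.raster_us;
    int geometry_bound = t.geometry_us >= t.raster_us;

    if (busiest > budget_us && geometry_bound && lod < max_lod)
        return lod + 1;             /* over budget: use coarser models */
    if (busiest < 0.8 * budget_us && lod > 0)
        return lod - 1;             /* ample headroom: use finer models */
    return lod;                     /* otherwise hold steady */
}
```

For example, with a 16667 us (60 Hz) budget, a frame reporting 20000 us of geometry and 10000 us of fill steps from LOD 2 to LOD 3, while a fill-bound frame keeps its LOD, since simplifying models would not shorten pixel fill.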
Infrastructure
The SGIX_instruments extension provides scaffolding for
pipeline instrumentation. The framework allows the app to:
Specify a buffer into which measurements will be
delivered (asynchronously) by the pipe.
Enable/disable an arbitrary collection of instruments.
Start/stop/snapshot measurements by the
currently enabled set of instruments.
Label a measurement with a user-selectable marker.
Poll or wait for completion of a particular measurement.
We must write one or more new extensions to define instruments
that fit into the SGIX_instruments framework. This outline
sketches some of the instruments that might be appropriate.
Since some measurements are performed by real-time apps, it's
important to keep the overhead low. The asynchronous delivery
scheme helps with this, but it's also desirable to keep other
issues in mind (for example, avoid flushing the pipe if at all
possible).
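The app-side flow can be modeled in a few lines of C. This is a toy, single-threaded sketch with hypothetical names (`instr_buffer`, `instr_snapshot`, `pipe_advance`, `instr_poll`). It is not the SGIX_instruments interface, and `pipe_advance()` merely stands in for the pipe retiring requests in order. It shows the property the text cares about: requesting a labeled measurement returns immediately, and the app later polls its own buffer instead of flushing the pipe.

```c
#include <stddef.h>

/* Toy model of asynchronous measurement delivery.
 * All names are hypothetical; this is not the extension's API. */
#define MAX_PENDING 8

typedef struct { int marker; long value; } measurement;

typedef struct {
    measurement *buf;          /* app-supplied result buffer */
    size_t cap, delivered;     /* capacity, results written so far */
    int pending[MAX_PENDING];  /* snapshot markers still in the pipe (FIFO) */
    size_t npending;
    long counter;              /* running instrument count in the "pipe" */
} instruments;

/* App: specify the buffer that results are delivered into. */
void instr_buffer(instruments *in, measurement *buf, size_t cap)
{
    in->buf = buf; in->cap = cap; in->delivered = 0;
    in->npending = 0; in->counter = 0;
}

/* App: queue a labeled snapshot request. Returns at once with no flush
 * or wait; -1 if too many requests are outstanding. */
int instr_snapshot(instruments *in, int marker)
{
    if (in->npending == MAX_PENDING) return -1;
    in->pending[in->npending++] = marker;
    return 0;
}

/* "Pipe": perform 'work' counted units, then retire the oldest request
 * by writing the current count into the app's buffer. */
void pipe_advance(instruments *in, long work)
{
    in->counter += work;
    if (in->npending == 0 || in->delivered == in->cap) return;
    in->buf[in->delivered].marker = in->pending[0];
    in->buf[in->delivered].value = in->counter;
    in->delivered++;
    for (size_t i = 1; i < in->npending; i++)
        in->pending[i - 1] = in->pending[i];
    in->npending--;
}

/* App: non-blocking poll for a labeled result; 1 if it has arrived. */
int instr_poll(const instruments *in, int marker, long *value)
{
    for (size_t i = 0; i < in->delivered; i++)
        if (in->buf[i].marker == marker) { *value = in->buf[i].value; return 1; }
    return 0;
}
```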
Suggested Instruments
Rendering Statistics
Number of bytes of data sent to pipe
Number of bytes of data sent from pipe
These are used to identify data transfer
bottlenecks arising from geometry-path
commands, pixel-path commands, and texture
management.
Number of geometric primitives sent to pipe
Number of geometric primitives trivially accepted or rejected
Number of geometric primitives subjected to 3D clipping
Number of geometric primitives resulting from 3D clipping
Number of geometric primitives face-culled
Number of matrix ops sent to pipe
These measure culling effectiveness and
determine the cause of geometry-processing
bottlenecks (e.g., too many vertices, too much
clipping, or too many attribute changes).
Number of DrawPixels commands sent to pipe
Number of Bitmap commands sent to pipe
Number of ReadPixels commands sent to pipe
Number of CopyPixels commands sent to the pipe
Together with the data transfer statistics,
these help determine whether pixel-oriented
apps are running into data transfer or pixel
operation setup bottlenecks.
Number of MakeCurrent/MakeCurrentRead commands executed
This should help determine when apps are using
more than the optimal number of contexts, and
thus causing an inordinate number of context
switches.
Number of fragments generated, for each rasterizer
Number of fragments passing depth test, for each rasterizer
Together with other statistics, these help
estimate average triangle size, depth
complexity, and effectiveness of depth
sorting.
Open Issues:
Is there a way to track the number of bytes
processed by CopyPixels-style operations?
These aren't accounted for by the transfers
to and from the pipe.
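As a rough illustration of how these counters combine, the following C sketch derives cull fraction, fragments per primitive, depth complexity, and depth-test pass rate from one interval's counts. The struct layout and names are hypothetical, and the per-rasterizer fragment counts are assumed to be summed already. Fragments per primitive also ignores primitives created by clipping, so it only estimates average triangle size.

```c
/* Hypothetical counter snapshot for one measurement interval. */
typedef struct {
    unsigned long prims_sent;      /* primitives sent to pipe */
    unsigned long prims_culled;    /* trivially rejected + face-culled */
    unsigned long frags_generated; /* summed over all rasterizers */
    unsigned long frags_passed;    /* passed depth test, summed */
} render_stats;

typedef struct {
    double cull_fraction;       /* share of primitives discarded early */
    double frags_per_prim;      /* rough average rasterized primitive size */
    double depth_complexity;    /* fragments generated per screen pixel */
    double depth_pass_fraction; /* share of fragments passing depth test */
} derived_stats;

derived_stats derive(const render_stats *s, unsigned long screen_pixels)
{
    derived_stats d = {0.0, 0.0, 0.0, 0.0};
    unsigned long drawn = s->prims_sent - s->prims_culled;

    if (s->prims_sent)
        d.cull_fraction = (double)s->prims_culled / s->prims_sent;
    if (drawn)
        d.frags_per_prim = (double)s->frags_generated / drawn;
    if (screen_pixels)
        d.depth_complexity = (double)s->frags_generated / screen_pixels;
    if (s->frags_generated)
        d.depth_pass_fraction = (double)s->frags_passed / s->frags_generated;
    return d;
}
```

Roughly speaking, a depth-test pass fraction near 1/depth_complexity suggests good front-to-back sorting, while a value near 1 suggests the scene is drawn mostly back to front.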
Texture Statistics
Number of texture binds performed
Pinpoints an important attribute-change
bottleneck.
Number of TexImage/TexSubImage commands
Number of CopyTexImage/CopyTexSubImage commands
Number of texture downloads initiated by texture manager
Number of GetTexImage commands
Number of texture uploads initiated by texture manager
Together with other stats, these determine the cost of
texture management operations.
Texture memory utilization
Initial/Max/Min/Final fraction of texture memory
in use over the measurement interval.
Open Issues:
Number of texture fetches, per rasterizer?
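One way an implementation might maintain the Initial/Max/Min/Final utilization figures is to update a running minimum and maximum on every load or eviction. This C sketch assumes the texture manager calls a hook on each change. `texmem_util` and its functions are hypothetical names, and capacity is assumed to be nonzero.

```c
/* Hypothetical tracker for texture memory utilization over one
 * measurement interval, expressed as fractions of total capacity. */
typedef struct {
    unsigned long capacity;  /* total texture memory, bytes (nonzero) */
    unsigned long in_use;    /* bytes currently resident */
    double initial, max, min, final;
} texmem_util;

static double frac(const texmem_util *u)
{
    return (double)u->in_use / u->capacity;
}

/* Start of interval: all four figures begin at the current fraction. */
void texmem_begin(texmem_util *u)
{
    u->initial = u->max = u->min = u->final = frac(u);
}

/* Hook for each texture load (delta > 0) or eviction (delta < 0). */
void texmem_change(texmem_util *u, long delta)
{
    u->in_use = (unsigned long)((long)u->in_use + delta);
    double f = frac(u);
    if (f > u->max) u->max = f;
    if (f < u->min) u->min = f;
}

/* End of interval: record the final fraction. */
void texmem_end(texmem_util *u)
{
    u->final = frac(u);
}
```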
Timing Measurements
Return these times for all commands appearing between
two ``bracketing'' commands issued by the app:
Host CPU time (usecs)
Geometry (total for vector and scalar units)
processing time (usecs)
Rasterization (for each rasterizer) processing
time (usecs)
Wall clock time (usecs)
Note that the above measurements should reflect the
``useful work'' performed by the associated pipe
stages; they should be repeatable no matter what is in
the pipe before the first bracketing command is issued
and no matter what is placed in the pipe after the
second bracketing command is issued. (Thus, counting
FIFO full/empty states isn't sufficient.)
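The repeatability requirement can be illustrated with a toy model of a single pipe stage. The sketch assumes the stage can distinguish useful work from time spent starved or blocked, and keeps a busy counter that advances only during useful work. Snapshotting that counter as each bracket token passes gives the same answer regardless of idle time, whereas the wall-clock span does not. `stage_item` and `bracketed_busy()` are hypothetical names.

```c
#include <stddef.h>

/* One item flowing through a hypothetical pipe stage: either a command
 * with some useful work, or a bracket token issued by the app. */
typedef struct {
    int is_bracket;      /* nonzero for a bracketing command */
    long cost_us;        /* useful work for a command; 0 for a bracket */
    long idle_before_us; /* time the stage sat starved/blocked first */
} stage_item;

/* Returns useful-work time between the first two brackets and stores
 * the wall-clock span between them in *wall_us; -1 if < 2 brackets. */
long bracketed_busy(const stage_item *items, size_t n, long *wall_us)
{
    long busy = 0, clock = 0;
    long busy_at[2] = {0, 0}, clock_at[2] = {0, 0};
    int seen = 0;

    for (size_t i = 0; i < n && seen < 2; i++) {
        clock += items[i].idle_before_us;  /* idle time: wall clock only */
        if (items[i].is_bracket) {
            busy_at[seen] = busy;          /* snapshot as the token passes */
            clock_at[seen] = clock;
            seen++;
        } else {
            busy += items[i].cost_us;      /* only useful work is counted */
            clock += items[i].cost_us;
        }
    }
    if (seen < 2) { *wall_us = 0; return -1; }
    *wall_us = clock_at[1] - clock_at[0];
    return busy_at[1] - busy_at[0];
}
```

For instance, two bracketed commands costing 100 us and 200 us report 300 us of busy time whether or not a 300 us idle gap falls between them, and regardless of how much work preceded the first bracket; the wall-clock span, by contrast, grows from 300 us to 600 us when the gap is present.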
Instruments NOT Recommended
Number of FIFO high-water interrupts
Not sure this is needed. Provided we do a good job of
accounting for time spent in each stage of the pipe,
that accounting should be of more use than the raw
number of interrupts, and interpreting it should
involve less system-dependent code.
Number of graphics context switches
Superseded by recording the number of MakeCurrent
commands (which should be more useful on a per-context
basis than the global number of context switches per
pipe).
Number of geometric primitives scissored
See note under Issues/Resolutions below.
Number of bytes transferred due to DrawPixels/Bitmap commands
Number of bytes transferred due to ReadPixels commands
Number of bytes transferred due to CopyPixels commands
Number of bytes of texture data transferred as a result
of TexImage, CopyTexImage, GetTexImage, etc.
These seem reasonable, but I suspect we'll get adequate
bang for the buck just by counting the number of bytes
transferred to/from the pipe. (Tracking bytes
transferred for Copy* operations is an open issue.)
Coarse Z-culling stats of some kind?
My current guess is that if we can provide statistics
on number of fragments generated and the number of
fragments passing the depth test, it's unlikely we'll
need more stats on coarse Z-culling.
Issues/Resolutions
In principle, the application can handle some of the
measurements described above (counting the number of times a
given command is executed, for example). Should we bother
implementing instruments to capture such measurements?
I believe we should. Although it makes good design
sense to avoid duplicating what's easily accomplished
in the apps, there are two problems with requiring
users to make measurements on their own:
(1) Doing so could require wholesale changes to source
code. (Consider what would be needed to handle display
lists correctly.) It's unlikely many users would do
this.
(2) Users typically don't have access to the source code
for high-level libraries that issue OpenGL commands, so
requiring source code changes makes it impractical for
them to measure the commands executed by those libraries.
Why not use a library like GLS or a utility like ogldebug to
trace OpenGL commands and make such measurements?
Good arguments have been made for this, but I'm not
completely convinced.
In some cases, using GLS or ogldebug mitigates the
problems mentioned above. For example, it would be
easier to maintain counts of the number of times a
command is executed, since no access to source code is
needed. (Handling display lists correctly seems
possible, though it would require a good bit of work,
especially for shared dlists.)
There are problems merging the results of counts from
the tracing utilities with timing measurements made by
other instruments. The tracing utilities would need to
interpret the instrumentation commands to know when to
start and stop counting. The counts wouldn't be
available to the application under test, so it couldn't
make on-the-fly decisions based on them.
Also, in many cases I suspect it's more work to put
this functionality into the tracing utilities than it
is to fold the functionality into the instruments.
Counting pixel and texture commands might be
accomplished with just a few lines of microcode, for
example.
It's difficult to measure the number of scissored geometric
primitives, because a primitive may be scissored in one
rasterizer but not in others. Determining which primitives
have been scissored essentially requires tagging each primitive
so that the status from all rasterizers can be combined
meaningfully.
Good point. That statistic has been dropped from the
current proposal.
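A small C sketch shows why per-rasterizer scissor counts can't simply be summed, and what tagging buys. It assumes each rasterizer reports a flag for every tagged primitive. It also adopts one possible definition of a scissored primitive (discarded by every rasterizer). The array layout, `NRAST`, and the function names are hypothetical.

```c
#include <stddef.h>

#define NRAST 4  /* hypothetical number of rasterizers */

/* flags[p][r] is nonzero if rasterizer r scissored away its share of
 * primitive p. Summing per-rasterizer counts can count one primitive
 * up to NRAST times, and counts partially scissored primitives too. */
size_t summed_counts(unsigned char flags[][NRAST], size_t nprims)
{
    size_t total = 0;
    for (size_t p = 0; p < nprims; p++)
        for (size_t r = 0; r < NRAST; r++)
            total += flags[p][r] != 0;
    return total;
}

/* With each primitive tagged, status from all rasterizers can be
 * combined: count a primitive once, only if every rasterizer
 * scissored it. */
size_t combined_count(unsigned char flags[][NRAST], size_t nprims)
{
    size_t total = 0;
    for (size_t p = 0; p < nprims; p++) {
        int all = 1;
        for (size_t r = 0; r < NRAST; r++)
            if (!flags[p][r]) { all = 0; break; }
        total += (size_t)all;
    }
    return total;
}
```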
It would be worthwhile to consider instruments that would help
debug performance problems, but would not necessarily be
exposed for general use. (A count of the number of cycles for
which each type of memory request [texture, video, command
FIFO, etc.] stalls, for example.)
Yes. The proposal now mentions a ``Debug/Test''
category of instruments.
Beware of adding readable hardware counters, particularly when
they affect multiple blocks of logic and software (consider
testability, new special command packets that would be
required, context switching, etc.).
True. Not all of these instruments will be practical.
For multiple geometry engines, some measurements will need to be
maintained on a per-GE basis. The extension spec must reflect
this (as it must reflect the existence of multiple rasterizers).
--
_______________________________________________
Dri-devel mailing list
https://lists.sourceforge.net/lists/listinfo/dri-devel