[Dri-devel] dri-devel FAQ

Jos� Fonseca Sun, 20 Jan 2002 10:21:06 -0800

Hi all,

I've finished compiling the the information gathered from the dri-devel 
archives into the FAQ. Since my university network was again down I was 
not able to put in my workstation's web server so I took the liberty of 
attach it in this mail. I'll publish in the same site 
(http://mefriss1.swan.ac.uk/~jfonseca/dri/ ) in the meanwhile anyway.



I hope that you don't get disappointed - it's not yet complete but has 
several pieces of wisdom. I'm sure that some of the original authors will 
get nostalgic feelings when reading it.. :-)

I would like to get feedback on it. Either personally or to the dri-devel 
mailing list (to receive peer review).


I especially want that you make corrections on:

- Incorrect information: e.g., I've taken some assumptions is the 
questions as right since they were'nt refuted in the answers but that is 
not necessarily true.

- Out of date information: e.g., There are reference to branches which I 
don't know if they were merged in the trunk.

Please don't bother yourself to make comments/suggestions on:

- FAQ Structure: There are obviously stuff misplaced but the structure 
will evolve as I include stuff from the DRI original documents.

- Style or typos: Only once the whole information is gathered I will start 
looking on this.


If you have memory of interesting emails in the dri-devel that weren't 
included please tell me so. You may feel free to give more FAQs (with 
answers of course!) to include.

In summary, now it only matters that the information is _here_ and is 
_correct_, as much as possible.

Regards,

Jose Fonseca

Title: DRI Developer Frequently Asked Questions

DRI Developer Frequently Asked Questions

Jos� Fonseca

[EMAIL PROTECTED]

Revision History
Revision 0.10	2002-01-20	Revised by: jf

This is the list of Frequently Asked Questions for DRI developers, a framework for allowing direct access to graphics hardware in a safe and efficient manner. This FAQ is meant to be read in conjunction with the DRI Documentation. You can also get Postscript, PDF, HTML, and SGML versions of this document. The DRI Developer Frequently Asked Questions is distributed under the terms of the GNU Free Documentation License.

Table of Contents

1. Getting Started

1.1. How do I get started with development?
1.2. How is constituted a OpenGL driver?
1.3. Do I need to understand the X11 in order to help?
1.4. I want to start the development of a new driver. Which one should I took as template?

2. Debugging and benchmarking

2.1. How do you put a breakpoint in the dynamically loaded modules?
2.2. How do I do benchmarking with Unreal Tournament?
2.3. Is there any way for us to detect which features are implemented with hardware support for a given driver?
2.4. Which OpenGL benchmarking program can I use to test and compare various facets of the performance of graphics cards?
2.5. How should I report bugs?

3. DRI Components

3.1. Direct Rendering Module (DRM)

3.1.1. What is DRM?
3.1.2. The "The Direct Rendering Manager : Kernel Support for the Direct Rendering Infrastructure" document states that the drm driver has support for loading sub-drivers by calling drmCreateSub. Is that really implemented?
3.1.3. Is it possible to make an OpenGL driver without a DRM driver in a piece of hardware whereby we do all accelerations in PIO mode?
3.1.4. What is templated DRM code?
3.1.5. How do X modules and X applications communicate?

4. Hardware Specific

4.1. AGP

4.1.1. What is AGP?
4.1.2. Why not use the existing XFree86 AGP manipulation calls?
4.1.3. How do I use AGP?
4.1.4. How to allocate AGP memory?
4.1.5. If one has to insert pages he needs to check for -EBUSY errors and loop through the entire GTT. Wouldn't it be better if the driver fills up pg_start of agp_bind structure instead of user filling up?

4.2. ATI Cards

4.2.1. How to get ATI cards specification?

4.2.2. Mach64 based cards

4.2.2.1. I would like to help developing the Mach64 driver...

4.3. S3

4.3.1. Are there plans to enable the s3tc extension on any of the cards that currently support it?

5. Miscellaneous Questions

5.1. Remote rendering

5.1.1. If I run a GLX enabled OpenGL program on a remote system with the display set back to my machine, will the X server itself render the GLX requests through DRI?
5.1.2. What's the re a difference between local clients and remote clients?

5.2. How does the DMA transfer mechanism works?

5.3. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

5.4. Is anyone working on adding SSE support to the transform/lighting code in Mesa?

5.5. How often are checks done to see if things need clipped/redrawn/redisplayed?

5.6. What's the deal with fullscreen and DGA?

5.7. DRI without X

5.7.1. Can DRI run without X?
5.7.2. What would be the advantages of a non-X version of DRI?
5.7.3. I would like to make a X11 free acces to 3d...

5.8. Is there a difference between using indirect DRI rendering (e.g., with LIBGL_ALWAYS_INDIRECT) and just linking against the Mesa library?

5.9. What's the relationship between Glide and DRI?

5.10. Of what use is the Mesa code in the xc tree?

5.11. Is there any documentation about the XMesa* calls?

6. Authorship and Acknowledgments.

1. Getting Started

1.1. How do I get started with development?

To get started with development you have to first understand the DRI and XFree86 architecture. You can start by reading the developer documents in the documentation section of this website.

Then you can continue by checking out the DRI CVS tree. Poke around the driver source code to find out some more about the inner workings of the DRI. There also are some text documents within the XFree86 source tree that contain useful information.

Once you feel that you have sufficient understanding of the DRI to begin coding you can start by submitting a patch. You have to submit at least one patch to get write access to our CVS. Have a look at the Sourceforge bug tracker for open issues. That is a good place to find an issue that you can fix by submitting a patch.

After you have submitted your patch you can start working on a more concrete project. Have a look at the status page or read the newsgroups for projects that need to be worked on.

Of course you don't have to submit a patch before you can work on a project. But since you wont be able to check in your work until you submit a patch it is very desirable to submit a patch first. You do want people to test your work, right?

Also, don't be shy about asking questions on the dri-devel newsgroup. The main purpose of the newsgroup is the discussion of development issues. So, feel free to ask questions.

1.2. How is constituted a OpenGL driver?

All OpenGL drivers are made up of 3 parts:

A DRI aware 2D driver which lives in xc/programs/Xserver/hw/xfree86/drivers usually in a file *_dri.[c.h].
A DRI aware 3D driver based on Mesa which lives in xc/lib/GL/mesa/src/drv
A kernel module which lives in xc/programs/Xserver/hw/xfree86/os-support/drm/linux

1.3. Do I need to understand the X11 in order to help?

	I do not understand the X11 codebase. Not one bit. Never even looked at it, probably never intend to. Do I need to understand this in order to work on the DRI/Mesa drivers? Nope. At a bare minimum, I'd need to know how the Mesa Device Driver (DD) interface works, and have a basic understanding of OpenGL. This would allow me to work on the client-side driver code (`xc/lib/GL/mesa/src/drv/*`). Having more knowledge about the internals of Mesa certainly helps, and if you're keen you can check out the chipset specific stuff in the DRM (which is fairly closely tied to the client-side driver code mentioned above, and mainly deals with DMA transfers etc). No one ever said it would be easy to pick up, but it's not like you have to have a working knowledge of the entire X server to work on the 3D drivers...
--Gareth Hughes

1.4. I want to start the development of a new driver. Which one should I took as template?

The tdfx driver is rather old, and was the place a lot of experimentation was done. It isn`t an example of a good driver.

To start out I`d concentrate on the i810. The design of that driver is very clean, and makes a good base to build upon. (Given more time and resources I`d rewrite the tdfx driver to act more like the i810)

Bear in mind that the i810 driver is (still) on a branch. There is some code on the trunk, and it may give a broad overview of how to proceed with the 3d part, but it won`t be much use for the interaction with the DRI.

Unfortunately, it looks like I am going to have to rework the i810 driver to put a lot more stuff into the kernel for security reasons.

There are a few possibilities you have to consider:

If your card has a dma model which is secure (there is no way that the client can emit instructions which cause writes to system memory, for instance), the current i810 driver is what you should examine (check out mga-0-0-2-branch, now).

If your card has a dma model which is insecure (dma buffers can cause writes to pci space), look at the current mga driver on the same branch.

If your card has a fifo/mmio model which is secure, and there is no vertex buffer or dma buffer mechanism, the i810 driver is probably the closest thing to look at for state management, but you will need to take a different approach to emitting cliprects - the tdfx driver has some examples of this.

If your card has a fifo/mmio model which is *insecure* and there is no vertex-buffer or dma-buffer mechanism, you are really in a world of hurt... There are ways around most problems, but hopefully we don`t need to get into details.

The security issue that I see most often is being able to write to PCI memory. Cards might do this via a 2d blit mode which allows blitting to/from main memory, or perhaps by a special dma instruction which writes a counter dword into PCI space. These are very useful operations, but can be exploited to write to, eg, the bit of memory which holds a process` UID number.

Most hardware seems to have been designed for consumer versions of Windows, which don`t really have a security model (to my knowledge).

So, you need to verify there is no mechanism to write to PCI space (in any way or form).

If there is no DMA interface: As long as the card is secure, that is probably going to simplify the task of writing the driver, as there will be only a tiny kernel component. The tdfx kernel driver (programs/Xserver/hw/xfree86/os-support/linux/drm/kernel/tdfx*) should be the basis for your kernel part - there should be very few changes. You can use the basic structure of the i810 3d driver for state management, etc, but you will want to emit state directly as OUTREG`s instead of via dma. You will need to look at how the tdfx driver emits cliprects in the triangle routines - you`ll need to do something similar.

2. Debugging and benchmarking

2.1. How do you put a breakpoint in the dynamically loaded modules?

You need xfree86-gdb, which is a version of gdb modified to understand the module binary format that XFree86 uses in addition to the standard elf/coff binary formats.

Example 1. Using xfree86-gdb

Use

modules mach64_drv.o mach64_dri.o

after starting the debugger to load symbols, etc.

xfree86-gdb is freely available from here.

2.2. How do I do benchmarking with Unreal Tournament?

Start a practive level. Type timedemo 1 in the console or in alternative select the "TimeDemo Statistic" menu entry under the "Tools" section.

You should see two numbers in white text on the right side of the screen showing the average framerate (Avg) and the number of frame rates in the last second (Last Sec). If this doesn't work check whether the stats get reported in your ~/.loki/ut/System/UnrealTournament.log file.

2.3. Is there any way for us to detect which features are implemented with hardware support for a given driver?

OpenGL doesn't have such a query. This is a potential problem with any OpenGL implementation. The real question one wants answered is "is this feature or GL state combination fast enough for my needs?". Whether a feature is implemented in hardware or software isn't always consistant with that question.

You might consider implementing a benchmark function to test the speed during start-up and making a decision depending on the result. The info could be cached in a file keyed by GL_RENDERER.

Check isfast

2.4. Which OpenGL benchmarking program can I use to test and compare various facets of the performance of graphics cards?

Games: You can use OpenGL games such as Quake 3, Unreal Tournament, etc.
SPECviewperf: SPECviewperf is a portable OpenGL performance benchmark program written in C providing a vast amount of flexibility in benchmarking OpenGL performance.
SPECglperf: SPECglperf is an executable toolkit that measures the performance of OpenGL 2D and 3D graphics operations. These operations are low-level primitives (points, lines, triangles, pixels, etc.) rather than entire models.
gleam: glean is a suite of tools for evaluating the quality of an OpenGL implementation and diagnosing any problems that are discovered. It also has the ability to compare two OpenGL implementations and highlight the differences between them.
machtest: machtest is a thorough benchmark for graphics cards. It has literary thousands of command line options, is easily extensible and it can produce machine readable output.
Mesa demos

2.5. How should I report bugs?

Please submit bugs through the bug tracking system in SourceForge. It`s the only way we can keep track of all of them. Write up one problem in each bug report. It`s best if you can create a small example that shows what you think is the problem.

For those who really want to be Open Source heros -- you can create a test for the bug under glean. The intention would be to run glean quite often, so any functionality you can verify there, is much less likely to reappear in a broken form at some random time in the future.

3. DRI Components

3.1. Direct Rendering Module (DRM)

3.1.1. What is DRM?

This is a kernel module that gives direct hardware access to DRI clients.

This module deals with DMA, AGP memory management, resource locking, and secure hardware access. In order to support multiple, simultaneous 3D applications the 3D graphics hardware must be treated as a shared resource. Locking is required to provide mutual exclusion. DMA transfers and the AGP interface are used to send buffers of graphics commands to the hardware. Finally, there must be security to prevent out-of-control clients from crashing the hardware.

For each 3D hardware driver there is a kernel module.

Since internal Linux kernel interfaces and data structures may be changed at any time, DRI kernel modules must be specially compiled for a particular kernel version. The DRI kernel modules reside in the /lib/modules/kernel-version/misc/ directory. DRI kernel modules are named device.o where device is a name such as tdfx, mga, r128, etc.

Normally, the X server automatically loads whatever DRI kernel modules are needed.

3.1.2. The "The Direct Rendering Manager : Kernel Support for the Direct Rendering Infrastructure" document states that the drm driver has support for loading sub-drivers by calling drmCreateSub. Is that really implemented?

Linus didn't like that approach. He wanted all drivers to be independent, so it went away.

3.1.3. Is it possible to make an OpenGL driver without a DRM driver in a piece of hardware whereby we do all accelerations in PIO mode?

The kernel provides three main things:

the ability to wait on a contended lock (the waiting process is put to sleep), and to free the lock of a dead process
the ability to mmap areas of memory that non-root processes can`t usually map
the ability to handle hardwre interruptions and a DMA queue

All of these are hard to do outside the kernel, but they aren`t required components of a DRM driver. For example, the tdfx driver doesn`t use hardware interrupts at all -- it is one of the simplest DRM drivers, and would be a good model for the hardware you are thinking about (in it`s current form, it is quite generic).

DRI was designed with a very wide range of hardware in mind, ranging from very low-end PC graphics cards through very high-end SGI-like hardware (which may not even need the lock). The DRI is an infrastructure or framework that is very flexible -- most of the example drivers we have use hardware interrupts, but that isn`t a requirement.

3.1.4. What is templated DRM code?

The purpose of this email is to discuss what I've done to bring up the mach64 kernel module. I have spoken about this with some VA people, but now that it's working I would like to broaden the audience a little.

Not wanting to simply copy-and-paste *another* version of _drv.[ch], _context.c, _bufs.s and so on, I did some refactoring along the lines of what Rik Faith and I discussed a long time ago. This is very much along the lines of a lot of Mesa code, where there exists a template header file that can be customized with a few #defines. So far, I've done _drv.c and _context.c, creating driver_tmp.h and context_tmp.h that can be used to build up the core module.

An inspection of mach64_drv.c on the mach64-0-0-1-branch reveals the following code:

#define DRIVER_AUTHOR           "Gareth Hughes"

#define DRIVER_NAME             "mach64"
#define DRIVER_DESC             "DRM module for the ATI Rage Pro"
#define DRIVER_DATE             "20001203"

#define DRIVER_MAJOR            1
#define DRIVER_MINOR            0
#define DRIVER_PATCHLEVEL       0


static drm_ioctl_desc_t         mach64_ioctls[] = {
	[DRM_IOCTL_NR(DRM_IOCTL_VERSION)]       = { mach64_version,    0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_UNIQUE)]    = { drm_getunique,     0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_MAGIC)]     = { drm_getmagic,      0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_IRQ_BUSID)]     = { drm_irq_busid,     0, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_SET_UNIQUE)]    = { drm_setunique,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_BLOCK)]         = { drm_block,         1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNBLOCK)]       = { drm_unblock,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AUTH_MAGIC)]    = { drm_authmagic,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_MAP)]       = { drm_addmap,        1, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_BUFS)]      = { drm_addbufs,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MARK_BUFS)]     = { drm_markbufs,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_INFO_BUFS)]     = { drm_infobufs,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MAP_BUFS)]      = { drm_mapbufs,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FREE_BUFS)]     = { drm_freebufs,      1, 0 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_CTX)]       = { mach64_addctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_CTX)]        = { mach64_rmctx,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MOD_CTX)]       = { mach64_modctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_CTX)]       = { mach64_getctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_SWITCH_CTX)]    = { mach64_switchctx,  1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_NEW_CTX)]       = { mach64_newctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RES_CTX)]       = { mach64_resctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_DRAW)]      = { drm_adddraw,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_DRAW)]       = { drm_rmdraw,        1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_LOCK)]          = { mach64_lock,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNLOCK)]        = { mach64_unlock,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FINISH)]        = { drm_finish,        1, 0 },

#if defined(CONFIG_AGP) || defined(CONFIG_AGP_MODULE)
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ACQUIRE)]   = { drm_agp_acquire,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_RELEASE)]   = { drm_agp_release,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ENABLE)]    = { drm_agp_enable,    1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_INFO)]      = { drm_agp_info,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ALLOC)]     = { drm_agp_alloc,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_FREE)]      = { drm_agp_free,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_BIND)]      = { drm_agp_bind,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_UNBIND)]    = { drm_agp_unbind,    1, 1 },
#endif

	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_INIT)]   = { mach64_dma_init,   1, 1
},
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_CLEAR)]  = { mach64_dma_clear,  1, 0
},
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_SWAP)]   = { mach64_dma_swap,   1, 0
},
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_IDLE)]   = { mach64_dma_idle,   1, 0
},
};

#define DRIVER_IOCTL_COUNT      DRM_ARRAY_SIZE( mach64_ioctls )

#define HAVE_CTX_BITMAP         1

#define TAG(x) mach64_##x
#include "driver_tmp.h"

And that's all you need. A trivial amount of code is needed for the context handling:

#define __NO_VERSION__
#include "drmP.h"
#include "mach64_drv.h"

#define TAG(x) mach64_##x
#include "context_tmp.h"

And as far as I can tell, the only thing that's keeping this out of mach64_drv.c is the __NO_VERSION__, which is a 2.2 thing and is not used in 2.4 (right?).

To enable all the context bitmap code, we see the "#define HAVE_CTX_BITMAP 1". To enable things like AGP, MTRRs and DMA management, the author simply needs to #define the correct symbols. With less than five minutes of mach64-specific coding, I had a full kernel module that would do everything a basic driver requires -- enough to bring up a software-fallback driver. The above code is all that is needed for the tdfx driver, with appropriate name changes. Indeed, any card that doesn't do kernel-based DMA can have a fully functional DRM module with the above code. DMA-based drivers will need more, of course.

My plan is to extend this to basic DMA setup and buffer management, so that the creation of PCI or AGP DMA buffers, installation of IRQs and so on is as trivial as this. What will then be left is the hardware-specific parts of the DRM module that deal with actually programming the card to do things, such as setting state for rendering or kicking off DMA buffers. That is, the interesting stuff.

A couple of points:

Why am I doing it like this, and not with C++ features like virtual functions (ie. why don't I do it in C++)? Because it's the Linux kernel, dammit! No offense, Mark, or any other C++ fan who may be reading this :-) Besides, a lot of the initialization is order-dependent, so inserting or removing blocks of code with #defines is a nice way to achieve the desired result, at least in this situation.
Much of the core DRM code (like bufs.c, context.c and dma.c) will essentially move into these template headers. I feel that this is a better way to handle the common code. Take context.c as a trivial example -- the i810, mga, tdfx, r128 and mach64 drivers have *exactly* the same code, with name changes. Take bufs.c as a slightly more interesting example -- some drivers map only AGP buffers, some do both AGP and PCI, some map differently depending on their DMA queue management and so on. Again, rather than cutting and pasting the code from drm_addbufs into my driver, removing the sections I don't need and leaving it at that, I think keeping the core functionality in bufs_tmp.h and allowing this to be customized at compile time is a cleaner and more maintainable solution.

I plan on merging my dynamic heap management for DMA space into this branch (when the ati-4-1-1 work calms down), which will make it trivial to use across all drivers.

This looks way sweet. Have you thought about what it would take to generalize this to other OSs? I think that it has the possibility to make keeping the FreeBSD code up to date a lot easier.

Check out for examples the r128 driver from the trunk, for a good example. Notice there are files in there such as r128_tritmp.h. This is a template that gets included in r128_tris.c. What it does basically is consolidate code that is largly reproduced over several functions, so that you set a few macros. For example:

#define IND (R128_TWOSIDE_BIT)
#define TAG(x) x##_twoside

followed by

#include "r128_tritmp.h"

Notice the inline function's name defined in r128_tritmp.h is the result of the TAG macro, as well the function's content is dependant on what IND value is defined. So essentially the inline function is a template for various functions that have a bit in common. That way you consolidate common code and keep things consistent.

The current mach64 branch is only using one template in the driver. But yes, it does have some of the driver template code there.

Look at e.g. programs/Xserver/hw/xfree86/os-support/linux/drm/kernel/r128.h though. That's the template architecture at its beauty. Most of the code is shared between the drivers, customized with a few defines. Compare that to the duplication and inconsistency before.

3.1.5. How do X modules and X applications communicate?

X modules are loaded like kernel modules, with symbol resolution at load time, and can thus call eachother functions. For kernel modules, the communication between applications and modules is done via the /dev/* files.

X applications call X libraries function which creates a packet and sends it to the server via sockets which processes it. That's all well documented in the standard X documentation.

There's 3 ways 3D clients can communicate with the server or each other:

Via the X protocol requests. There are DRI extensions.
Via the SAREA (the shared memory segment)
Via the kernel driver.

4. Hardware Specific

4.1. AGP

4.1.1. What is AGP?

AGP (Accelerated Graphics Port) is a dedicated high-speed bus that allows the graphics controller to move large amoumts of data directly from system memory. Uses a Graphics Address Re-Mapping Table (GART) to provide a physically-contiguous view of scattered pages in system memory for Direct Memory Access (DMA) transfers.

Also check the Intel 440BX AGPset System Address Map

4.1.2. Why not use the existing XFree86 AGP manipulation calls?

You have to understand that the DRI functions have a different purpose then the ones in XFree. The DRM has to know about AGP, so it talks to the AGP kernel module itself. It has to be able to protect certain regions of AGP memory from the client side 3D drivers, yet it has to export some regions of it as well. While most of this functionality (most, not all) can be accomplished with the /dev/agpgart interface, it makes sense to use the DRM's current authentication mechanism. This means that there is less complexity on the client side. If we used /dev/agpgart then the client would have to open two devices, authenticate to both of them, and make half a dozen calls to agpgart, then only care about the DRM device.

As a side note, the XFree86 calls were written after the DRM functions.

Also to answer a previous question about not using XFree86 calls for memory mapping, you have to understand that under most OS`es (probably solaris as well), XFree86`s functions will only work for root privileged processes. The whole point of the DRI is to allow processes that can connect to the X server to do some form of direct to hardware rendering. If we limited ourselves to using XFree86's functionality, we would not be able to do this. We don`t want everyone to be root.

4.1.3. How do I use AGP?

You can also use this test program as a bit more documentation as to how agpgart is used.

4.1.4. How to allocate AGP memory?

Generally programs do the following:

open /dev/agpgart
ioctl(ACQUIRE)
ioctl(INFO) to determine amountof memory for AGP
mmap the device
ioctl(SETUP) to set the AGP mode
ioctl(ALLOCATE) a chunk o memory, specifying offset in aperture
ioctl(BIND) that same chunk o memory

Every time you update the GATT, you have to flush the cache and/or TLBs. This is expensive. Therefore, you allocate and bind the pages you'll use, and mmap() just returns the right pages when needed.

Then you need to have a remap of the agp aperture in the kernel which you can access. Use ioremap to do that.

After that you have access to the agp memory. You probably want to make sure that there is a write combining mtrr over the aperture. There is code in mga_drv.c in our kernel directory that shows you how to do that.

4.1.5. If one has to insert pages he needs to check for -EBUSY errors and loop through the entire GTT. Wouldn't it be better if the driver fills up pg_start of agp_bind structure instead of user filling up?

All this allocation should be done by only one process. If you need memory in the GTT you should be asking the Xserver for it (or whatever your controlling process is). Things are implemented this way so that the controlling process can know intimate details of how memory is laid out. This is very important for the I810, since you want to set tiled memory on certain regions of the aperture. If you made the kernel do the layout, then you would have to create device specific code in the kernel to make sure that the backbuffer/dcache are aligned for tiled memory. This adds complexity to the kernel that doesn`t need to be there, and imposes restrictions on what you can do with agp memory. Also, the current Xserver implementation (4.0) actually locks out other applications from adding to the GTT. While the Xserver is active, the Xserver is the only one who can add memory. Only the controlling process may add things to the GTT, and while a controlling process is active, no other application can be the controlling process.

Microsoft`s VGART does things like you are describing I believe. I think its bad design. It enforces a policy on whoever uses it, and is not flexible. When you are designing low level system routines I think it is very important to make sure your design has the minimum of policy. Otherwise when you want to do something different you have to change the interface, or create custom drivers for each application that needs to do things differently.

4.2. ATI Cards

4.2.1. How to get ATI cards specification?

http://www.ati.com/na/pages/resource_centre/dev_rel/devrel.html

4.2.2. Mach64 based cards

4.2.2.1. I would like to help developing the Mach64 driver...

The first step would be to check out the current mach64 branch from dri CVS, the tag is 'mach64-0-0-2-branch.' Follow the instructions on dri.sf.net to compile and install the tree. A couple of things you need to know are:

1. Make sure to check out the branch, not the head (use '... co -r mach64-0-0-2-branch xc')

2. You need libraries and headers from a full X install. I used lndir to add symlinks from /usr/X11R6/include and /usr/X11R6/lib into /usr/X11R6-DRI.

You'll need to have AGP support for your chipset configured in your kernel and have the module loaded before starting X (assuming you build it as a module). At this point, you need agpgart for the driver to load, but AGP isn't currently used by the driver yet.

Take a look at the code, the list archives and the DRI documentation on dri.sf.net (it's a little stale, but a good starting point). We are also using the driver from the Utah-GLX project as a guide, so you might want to check that out (utah-glx.sf.net). Many of us have documentation from ATI as well, you can apply to their developer program for docs at http://apps.ati.com/developers/devform1.asp

Our first priority right now is to get the 3D portion of the driver using DMA transfers (GUI mastering) rather than direct register programming. Frank Earl is currently working on this. Then we need to get the 2D driver to honor the drm locking scheme so we can enable 2D acceleration, which is currently disabled. Right now switching back to X from a text console or switching modes can cause a lockup because 2D and 3D operations are not synchronized. Also on the todo list is using AGP for texture uploads and finishing up the Mesa stuff (e.g. getting points and lines working, alpha blending...).

4.3. S3

4.3.1. Are there plans to enable the s3tc extension on any of the cards that currently support it?

There's not a lot we can do with s3tc because of S3's patent/license restrictions.

Normally, OpenGL implementations would do software compression of textures and then send them to the board. The patent seems to prevent that, so we're staying away from it.

If an application has compressed texture (they compressed them themselves or compressed them offline) we can download the compressed texture to the board. Unfortunetly, that's of little use since most applications don't work that way.

5. Miscellaneous Questions

5.1. Remote rendering

5.1.1. If I run a GLX enabled OpenGL program on a remote system with the display set back to my machine, will the X server itself render the GLX requests through DRI?

The X server will render the requests but not through the DRI. The rendering will be software only. Having the X server spawn a DRI client to process the requests is on the TODO list.

5.1.2. What's the re a difference between local clients and remote clients?

There is no difference as far as the client is concerned. The only difference is speed.

The difference between direct and indirect rendering is that the former can`t take place over the network. The DRI currently concentrates on the direct rendering case.

The application still gets a fully functional OpenGL implementation which is all that`s required by the spec. The fact is that the implementation is entirely in software, but that`s completely legal. In fact, all implementations fall back to software when they can`t handle the request in hardware. It`s just that in this case, the implementation can`t handle anything in hardware.

Most people don`t run GLX applications remotely, and/because most applications run very poorly when run remotely. It`s not really the applications fault, OpenGL pushes around a lot of data.

Therefore there hasn`t been a lot of interest in hardware accelerated remote rendering and there`s plenty to do local rendering. It is on the TODO list but at a low priority.

The solution is actually fairly straight forward. When the X server gets a remote client, it forks off a small application that just reads GLX packets and then remakes the same OpenGL calls. This new application is then just a standard direct rendering client and the problem reduces to one already solved.

5.2. How does the DMA transfer mechanism works?

Here's a proposal for an zero-ioctl (best case) dma transfer mechanism. Let's call it 'kernel ringbuffers'. The premise is to replace the calls to the 'fire-vertex-buffer' ioctl with code to write to a client-private mapping shared by the kernel (like the current sarea, but for each client).

Starting from the beginning:

Each client has a private piece of AGP memory, into which it will put secure commands (typically vertices and texture data). The client may expand or shrink this region according to load.
Each client has a shared user/kernel region of cached memory. (Per-context sarea). This is managed like a ring, with head and tail pointers.
The client emits vertices to AGP memory (as it currently does with DMA buffers).
When a statechange, clear, swap, flush, or other event occurs, the client:
- Grabs the hardware lock.
- Re-emits any invalidated state to the head of the ring.
- Emits a command to fire the portion of AGP space as vertices.
- Updates the head pointer in the ring.
- Releases the lock.
The kernel is responsible for processing all of the rings. Several events might cause the kernel to examine active rings for commands to be dispatched:
- A flush ioctl. (Called by impatient clients)
- A periodic timer. (If this is low overhead?)
- An interrupt previously emitted by the kernel. (If timers don't work)

Additionally, for those who've been paying attention, you'll notice that some of the assumptions that we use currently to manage hardware state between multiple active contexts are broken if client commands to hardware aren't executed serially in an order which is knowable to the clients. Otherwise, a client that grabs the heavy lock doesn't know what state has been invalidated or textures swapped out by other clients.

This could be solved by keeping per-context state in the kernel and implementing a proper texture manager. That's something we need to do anyway, but it's not a requirement for this mechanism to work.

Instead, force the kernel to fire all outstanding commands on client ringbuffers whenever the heavyweight lock changes hands. This provides the same serialized semantics as the current mechanism, and also simplifies the kernel's task as it knows that only a single context has an active ring buffer (the one last to hold the lock).

An additional mechanism is required to allow clients to know which pieces of their AGP buffer is pending execution by the hardware, and which pieces of the buffer are available to be reused. This is also exactly what NV_vertex_array_range requires.

5.3. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

Right now, we don`t do page flipping at all. Everything is a blit from back to front. The biggest problem with page flipping is detecting when you`re in full screen mode, since OpenGL doesn`t really have a concept of full screen mode. We want a solution that works for existing games. So we`ve been designing a solution for it. It should get implemented fairly soon since we need it for antialiasing on the V5.

In the current implementation the X front buffer is the 3D front buffer. When we do page flipping we`ll continue to do the same thing. Since you have an X window that covers the screen it is safe for us to use the X surface`s memory. Then we`ll do page flipping. The only issue will be falling back to blitting if the window is ever moved from covering the whole screen.

5.4. Is anyone working on adding SSE support to the transform/lighting code in Mesa?

SSE stuff was somewhat broken in the kernels until recently. In fact, we (Gareth Hughes to be precise) just submitted a big kernel patch that should fully support SSE. I don`t know if anyone is working on them for Mesa, I haven`t seen much in that area lately.

I`d start with profiling your app against the current Mesa base, to decide where the optimization effort should go. I`m not convinced SSE is the next right step. There may be more fundamental optimizations to do first. We haven`t spent a much time on optimizing it.

5.5. How often are checks done to see if things need clipped/redrawn/redisplayed?

The locking system is designed to be highly efficient. It is based on a two tiered lock. Basically it works like this:

The client wants the lock. The use the CAS (I was corrected that the instruction is compare and swap, I knew that was the functionality, but I got the name wrong) If the client was the last application to hold the lock, you`re done you move on.

If it wasn`t the last one, then we use an IOCTL to the kernel to arbitrate the lock.

In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get new window location and clip rects. This only happens on a window move. Assuming your clip rects/window position hasn`t changed, the redisplay happens entirely in the client.

The client may have other state to restore as well. In the case of the tdfx driver we have three more flags for command fifo invalid, 3D state invalid, textures invalid. If those are set the corresponding state is restored.

So, if the X server wakes up to process input, it current grabs the lock but doesn`t invalidate any state. I`m actually fixing this now so that it doesn`t grab the lock for input processing.

If the X server draws, it grabs the lock and invalidates the command fifo.

If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command fifo.

If another 3D app runs, it grabs the lock, invalidates the command fifo, invalidates the 3D state and possibly invalidates the texture state.

5.6. What's the deal with fullscreen and DGA?

The difference between in a window and fullscreen is actually quite minor. When you`re in fullscreen mode what you`ve done is zoomed in your desktop and moved the zoomed portion to cover just the window. (Just the same hitting ctrl-alt-plus and ctrl-alt-minus) The game still runs in a window in either case. So, the behavior shouldn`t be any different.

DGA is turned off in the current configuration. I`ve just started adding those features back in. The latest code in the trunk has some support for DGA but is broken, the code in my tdfx-1-1 branch should be working.

5.7. DRI without X

5.7.1. Can DRI run without X?

The first question you have to ask is whether you really should throw out X11. X11 does event handling, multiple windows, etc. It can also be made quite light weight. It's running in 600k on an IPAQ.

If you decide you do need to throw out X, then you have to ask yourself if the DRI is the right solution for your problem. The DRI handles multiple 3D clients accessing hardware at the same time, sharing the same screen, in a robust and secure manner. If you don't need those properties the DRI isn't necessarily the right solution for you.

If you get to this point, then it would be theoretically possible to remove the DRI from X11 and have it run without X11. There's no code to support that at this point, so it would require some significant work to do that.

5.7.2. What would be the advantages of a non-X version of DRI?

The main reasons one would be interested in a non-X version of DRI

Pros: Eliminate any performance bottlenecks the XServer may be causing. Since we are 3D only, any extraneous locking/unlocking, periodic refreshes of the (hidden) 2D portion of the display, etc., will cause unexpected slowdowns.

Cons: If the X server never does any drawing then the overhead is minimal. Locking is done around Glide operations. A lock check is a single Check And Set (CAS) instruction. Assuming your 3D window covers the X root, then there are no 2D portions to redisplay.

Pros: Eliminate wasted system memory requirements.

Cons: Yes, there will be some resources from the X server, but I think not much.

Pros: Eliminate on-card font/pixmap/surface/etc caches that just waste memory.

Cons: If you don`t use them they aren`t taking any resources. Right now, there is a small pixmap cache that`s staticall added to 2D. Removing that is a trivial code change. Making it dynamic (3D steals it away from 2D) is not too tough and a better solution than any static allocation.

Pros: Eliminate the need for extra peripherals, such as mice.

Cons: Allowing operations without a mouse should be trivial if it isn`t a configuration option already.

Pros: Reduction in the amount of software necessary to install/maintain on a customer`s system. Certainly none of my customers would have been able to install XFree 4.0 on their own.

Cons: XFree 4.0 installs with appropriate packaging are trivial. What you`re saying is that no one has done the packaging work for you, and that`s correct. If you create your own stripped DRI version you`ll be in for a lot more packaging work on your own.

The impact of the Xserver is on the 3D graphics pipeline is very little. Essentially none in the 3D pipeline. Some resources, but I think not much. Realize the CAS is in the driver, so you`re looking at creating a custom version of that as well. I think effort spent avoiding the CAS, creating your own window system binding for GL, and moving the DRI functionality out of the X server would be much better spent optimizing Mesa and the driver instead. You have to focus resources where they provide the biggest payoff.

5.7.3. I would like to make a X11 free acces to 3d...

Take a look at the fbdri project. They're trying to get the DRI running directly on a console with the Radeon. If all you want is one window and only running OpenGL then this makes sense.

I'll throw out another option. Make the DRI work with TinyX. TinyX runs in 600k and gives you all of X. It should be a fairly straight forward project. As soon as you want more than one window, it makes a lot of sense to use the X framework that already exists.

5.8. Is there a difference between using indirect DRI rendering (e.g., with LIBGL_ALWAYS_INDIRECT) and just linking against the Mesa library?

Yes. DRI libGL used in in indirect mode sends GLX protocol messages to the X server which are executed by the GLcore renderer. Stand-alone Mesa's non-DRI libGL doesn't know anything about GLX. It effectively translates OpenGL calls into Xlib calls.

The GLcore renderer is based on Mesa. At this time the GLcore renderer can not take advantage of hardware acceleration.

5.9. What's the relationship between Glide and DRI?

Right now the picture looks like this:

Client -> OpenGL/GLX -> Glide as HAL (DRI) -> hw

In this layout the Glide(DRI) is really a hardware abstraction layer. The only API exposed it OpenGL and Glide(DRI) only works with OpenGL. It isn`t useful by itself.

There are a few Glide only games. 3dfx would like to see those work. So the current solution, shown above, doesn`t work since the Glide API isn`t available. Instead we need:

Client -> Glide as API (DRI) -> hw

Right now Mesa does a bunch of the DRI work, and then hands that data down to Glide. Also Mesa does all the locking of the hardware. If we`re going to remove Mesa, then Glide now has to do the DRI work, and we have to do something about the locking.

The solution is actually a bit more complicated. Glide wants to use all the memory as well. We don`t want the X server to draw at all. Glide will turn off drawing in the X server and grab the lock and never let it go. That way no other 3D client can start up and the X server can still process keyboard events and such for you. When the Glide app goes away we just force a big refresh event for the whole screen.

I hope that explains it. We`re really not trying to encourage people to use the Glide API, it is just to allow those existing games to run. We really want people to use OpenGL directly.

Another interesting project that a few people have discussed is removing Glide from the picture at all. Just let Mesa send the actual commands to the hardware. That`s the way most of our drivers were written. It would simplify the install process (you don`t need Glide separately) and it might improve performance a bit, and since we`re only doing this for one type of hardware (Voodoo3+) Glide isn`t doing that much as a hardware abstraction layer. It`s some work. There`s about 50 calls from Glide we use and those aren`t simple, but it might be a good project for a few people to tackle.

5.10. Of what use is the Mesa code in the xc tree?

Mesa is used to build some server side modules/libraries specifically for the benefit of the DRI. The libGL.so is the client side aspect of Mesa which works closely with the server side components of Mesa.

The libGLU and libglut libraries are entirely client side things, and so they are distributed seperately.

5.11. Is there any documentation about the XMesa* calls?

I don`t know of any documentation for those functions. The XMesa stuff on the client side was written before I [Brian] joined the project, and honestly, I haven`t looked at it too closely myself. Kevin Martin may be able to answer questions about it better than I.

However, I can point out a few things.

First, despite the prolific use of the word "Mesa" in the client (and server) side DRI code, the DRI is not dependant on Mesa. It`s a common misconception that the DRI was designed just for Mesa. It`s just that the drivers that we at PI have done so far have Mesa at their core. Other groups are working on non-Mesa-based DRI drivers.

In the client-side code, you could mentally replace the string "XMesa" with "Driver" or some other generic term. All the code below /xc/lib/GL/mesa could be replaced by alternate code. libGL would still work. libGL.so has no knowledge whatsoever of Mesa. It`s the drivers which it loads that have the Mesa code.

On the server side, which I believe is what you`re talking about, there`s more of the same. The XMesa code used for indirect/software rendering was originally borrowed from stand-alone Mesa and its pseudo GLX implementation. There are some crufty side-effects from that.

6. Authorship and Acknowledgments.

This FAQ is compiled and maintained by Jos� Fonseca, [EMAIL PROTECTED], with assistance and comments from the DRI developers mailing list subscribers.

In the impossibility of getting every person permission to quote them, if you are the author of any material here and don't want its reproduction please contact the author and it will be promptly removed.

Reply via email to