José, I've been experimenting with this too, and was able to get things going with state being emitted either from the client or from the DRM, though I'm still getting lockups and things are generally a bit buggy and unstable.

To try client-side context emits, I basically went back to having each primitive emit state into the vertex buffer before adding the vertex data, like the original hack with MMIO. This works, but may emit state when it isn't necessary.

Now I'm trying state emits in the DRM. To do that, I'm just grabbing a buffer from the freelist and adding it to the queue ahead of the vertex buffer, so things end up in the correct order in the queue. The downside is that buffer space is wasted, since a state emit only uses a small portion of a buffer, but putting state in a separate buffer from the vertex data allows the proper ordering in the queue. Perhaps we could use a private set of smaller buffers for this.

At any rate, I've done the same for clears and swaps, so I have asynchronous DMA (minus blits) working, with gears at least. I'm still getting lockups with anything more complicated, and there are still some state problems. The good news is that I'm finally seeing an increase in frame rate, so there's light at the end of the tunnel.
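Just to make the ordering concrete, here's a rough pseudo-C sketch of what the DRM side is doing. None of these names come from the actual mach64 code (the structs, emit_reg(), and queue_vertex_with_state() are all made up for illustration); it's only meant to show the state-buffer-first, vertex-buffer-second ordering:

#include <linux/list.h>
#include <linux/errno.h>
#include <linux/types.h>

struct dma_buffer {
	u32 *cpu_addr;          /* kernel mapping of the buffer */
	u32 used;               /* dwords written so far */
	struct list_head node;  /* links the buffer into a freelist or queue */
};

/* Write one register/value pair into a buffer (the format is made up). */
static void emit_reg(struct dma_buffer *buf, u32 reg, u32 val)
{
	buf->cpu_addr[buf->used++] = reg;
	buf->cpu_addr[buf->used++] = val;
}

/*
 * Queue a vertex buffer preceded by a small state buffer pulled from
 * the freelist, so the two stay ordered in the DMA queue.  Most of the
 * state buffer's space goes unused, hence the idea of a private pool
 * of smaller buffers.  state_regs holds n_state dwords of
 * register/value pairs.
 */
static int queue_vertex_with_state(struct list_head *freelist,
				   struct list_head *queue,
				   const u32 *state_regs, int n_state,
				   struct dma_buffer *verts)
{
	struct dma_buffer *state;
	int i;

	if (list_empty(freelist))
		return -EBUSY;  /* caller waits for buffers to age off */

	state = list_first_entry(freelist, struct dma_buffer, node);
	list_del(&state->node);
	state->used = 0;

	for (i = 0; i < n_state; i += 2)
		emit_reg(state, state_regs[i], state_regs[i + 1]);

	list_add_tail(&state->node, queue);  /* state buffer first... */
	list_add_tail(&verts->node, queue);  /* ...then the vertex data */
	return 0;
}

A private pool of smaller state buffers would just mean pulling from a second freelist at the same spot.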
Right now I'm using 1MB (half the buffers) as the high water mark, so there should always be plenty of available buffers for the DRM. To get this working, I've used buffer aging rather than interrupts. What I realized about interrupts is that there doesn't appear to be an interrupt source that fires often enough to keep up: VBLANK is tied to the vertical refresh, which is relatively infrequent. I'm thinking it might be best to start out without interrupts, use GUI masters for blits, and then investigate using interrupts later, at least for blits.

Anyway, I have an implementation of the freelist and the other queues that's functional, though it might require some locks here and there. I'll try to stabilize things more and send a patch for you to look at.

I've also played around some more with AGP textures. I've hacked up the performance boxes client-side with clear ioctls, and this helps to see what's going on; I'll try to clean that up so I can commit it. I've also found some problems with the global LRU and texture aging that I'm trying to fix, and I'll post a more detailed summary of that soon.

BTW, as to your question about multiple clients and state: I think this is handled when acquiring the lock. If the context stamp in the SAREA doesn't match the current context after getting the lock, everything is marked dirty to force the current context to emit all its state. Emitting state to the SAREA is always done while holding the lock.
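In rough pseudo-C, that handshake looks something like this (again, the SAREA field and all the names here are invented for illustration, not the actual mach64 structures):

/* Shared SAREA fragment: stamp of the last context to emit state. */
typedef struct {
	unsigned int ctx_owner;
} my_sarea_t;

/* Per-client driver context. */
typedef struct {
	my_sarea_t *sarea;      /* mapped by every client */
	unsigned int ctx_id;    /* this client's unique stamp */
	unsigned int dirty;     /* bitmask of state needing re-emit */
} my_context_t;

#define ALL_STATE_DIRTY (~0u)

static void grab_lock(my_context_t *ctx)
{
	/* drmGetLock()/DRM_LOCK() would go here in the real driver. */

	if (ctx->sarea->ctx_owner != ctx->ctx_id) {
		/* Another client emitted state since we last held the
		 * lock, so the hardware registers can't be trusted:
		 * mark everything dirty and take ownership. */
		ctx->dirty = ALL_STATE_DIRTY;
		ctx->sarea->ctx_owner = ctx->ctx_id;
	}
}

Since state emits only happen with the lock held, the stamp check right after locking is enough to serialize the clients.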
Regards,

Leif

On Sun, 12 May 2002, José Fonseca wrote:

> As it becomes clearer that in the mach64 the best solution is to fill
> the DMA buffers with the context state along with the vertex data,
> I've been trying to understand how this can be done and how the Gamma
> driver (which has this same model) does it.
>
> The context state is available right at the beginning of running a
> pipeline, and usually DDUpdateHWState is called at the beginning of
> RunPipeline. The problem is that although all the state information is
> available, we don't know which part should be uploaded, since other
> clients could dirty the hardware registers in the meantime.
>
> I don't fully understand how the Gamma driver overcomes this. Its
> behavior here is controlled by a macro definition named DO_VALIDATE,
> which enables a series of VALIDATE_* macros whose workings I couldn't
> figure out. Another thing that caught my attention was the "HACK"
> comment in gammaDDUpdateHWState before gammaEmitHwState; it reminds me
> of a similar comment in mach64, which makes one think the author had a
> better way of doing it in mind. Alan, could you shed some light on
> these two issues, please?
>
> Before I started this little research I had already given some thought
> to how I would do it. One idea that crossed my mind was to reserve
> some space in the DMA buffer to put the context state in before
> submitting the buffer. Of course there would be some DMA buffer waste,
> but not that much, since there is a fairly small number of context
> registers. One thing that holds me back is that I still don't
> understand how multiple clients avoid each other: what is done in
> parallel, and what is done serially...
>
> I would also appreciate any ideas on this. It is surely an issue I
> would like to discuss further at the next meeting.
>
> Regards,
>
> José Fonseca

--
Leif Delgass
http://www.retinalburn.net
