On Tue, 11 Jun 2002, Jos� Fonseca wrote:

> On 2002.06.11 18:24 Leif Delgass wrote:
> > On Tue, 11 Jun 2002, Jos� Fonseca wrote:
> > 
> > > Leif,
> > >
> > > I've got it. After adding checks and checks to everything I finally
> > > discovered it - in some circunstances (such as a regular wait_for_idle
> > > with no previous buffer submiting as an example) we were firing GUI
> > master
> > > without enabling the BM. It's a miracle the card even worked at all!
> > 
> > Now I understand why I didn't experience the lockup.  I forgot to mention
> > that in my tree, I renamed your do_dma_flush to do_wait_for_dma (which
> > was
> > still called from do_dma_idle), and I made do_dma_flush call dma_start.
> > do_dma_flush is called by the flush ioctl, which is called before waiting
> > for idle by the DDX and Mesa driver, so this ensured that busmastering
> > was
> > on before waiting for idle.  We still need to sort this out, because the
> > DDFlush in the Mesa driver should flush the ring, ensuring all buffers
> > will be completed by the hardware in finite time, but shouldn't wait for
> > idle (only DDFinish needs to wait for idle).
> 
> In that case we just need to ensure that the head >= the offset of the 
> last ring submited.

But that means essentially waiting for idle.  The ring tail is moved every 
time we add a buffer to the ring, so the "last ring submitted" is always 
close to the new tail and could be separated from the head by many 
descriptors.  We just need to make sure that the card isn't going to stop, 
so the flush needs to restart DMA if the card is idle and ensure that it 
_won't_ stop if the card is active. 
 
> > A couple of other comments on your change:  I think we could just
> 
> Although there are some things you can try simplify this is _very_ trick 
> as you gonna see below...
> 
> > unconditionally turn on busmastering in SRC_CNTL if the engine is idle in
> 
> No. The engine could be active but not doing BM, e.g., if X submited 
> something to FIFO.

Notice that I said if the engine is _idle_.  The engine can't be active if
it's idle, right?  And it doesn't make sense to turn on busmastering while
the card is active.  This is assuming if GUI_ACTIVE is 0, then the FIFO is
also empty.  The docs say that a state of idleness (GUI_ACTIVE=0) implies
an empty fifo, so the FIFO check in wait_for_idle is really redundant. I'm
saying we move the call to dma_start into the block in UPDATE_RING_HEAD
where we already know the engine is idle from GUI_STAT, and move the two
writes to kick off the transfer into dma_start as well (just to group all
the writes that start a transfer in one place).
 
> > UPDATE_RING_HEAD.  One write is likely to use fewer cycles than one read,
> > and certainly less than one read followed by one write.  Plus, if the

Actually, I realized there are two writes (one to SRC_CNTL and one to 
BUS_CNTL), so reading SRC_CNTL could be better.  As you say, we could keep 
track of this with a variable, as long as we account for the possibility 
of the card disabling bus mastering in the case of a lockup.
 
> As I said in the code, what we can do is have a variable indicating 
> whether the busmaster is ON or OFF:
> 
> > engine is active, there shouldn't be a need to check if bus mastering is
> > on and enable it.  That also makes the do_wait_for_idle in dma_start
> 
> The Mach64 SDK sample code calls wait_for_idle before enabling the BM, so 
> we need to do that too.

See above.  We'd already know the engine is idle when enabling
busmastering.

> > unnecessary.  The flush ioctl could just call UPDATE_RING_HEAD, and
> > possibly wait to see the head advance a couple of descriptors to make
> > sure the engine won't stop.
> > 
> 
> What's the point in waiting for anything? I though we both had agree that 
> we shouldn't wait for anything when submiting a buffer... (after all we 
> both drawn the same conclusions in that matter in two distinct times).

As I mentioned above, we just need to ensure that the card won't stop
because it has already read an old end of list flag.  We could have added
several descriptors to the ring while the card is still active on an
earlier descriptor which the card sees as the end of the ring.  So
restarting the card if it's idle doesn't ensure that it won't stop if it's
active.  I was just thinking that waiting to see the card advance a couple
of descriptors (but not waiting for idle) might be enough to ensure that 
it isn't going to stop.

> Note that I'm quite reluctant in making optimizations where you have to 
> make more assumptions to gain a few cycles. We already assume alot, and 
> it's very dificult to determine when our assumptions are wrong. If another 
> bug like this one (which took me a whole week to solve) happens on the 
> hands of a user we'll be forever trying to debug it.

I think the assumption that the card is not active when it's idle is a 
safe one. ;)
 
> I also plan to leave all assumptions that we already do to be expressed in 
> the code (with MACH64_EXTRA_CHECKING or a better name perhaps) so that we 
> can debug these kind of problems more easily when they happen.
> 
> > > I'm really relieved for the problem was in the code and in no way
> > > associated with the card misbehavior on slow computers!
> > >
> > > I think that now we can make some code cleaning. If you have no
> > > objections, I would to remove some of the code associated with the
> > BUFFER
> > > AGING=0, and/or when MACH64_NO_BATCH_DISPATCH=0.
> > 
> 
> > I think that's probably fine, but I think we should see if we can salvage
> > the pseudo-dma path.  We should also consider what to do about frame
> > aging.  Also, the interrupt path is probably pretty broken now, we should
> > see if there's any way to make that a workable alternative or remove it.
> > 
> 
> I'll leave both those paths as they are for now - they are fairly 
> contained anyway. Later on we'll decide what to do with that code.
> 
> Regarding the frame aging we can do that by pointing down the ring offset 
> of the last buffer of each frame.
> 
> > > The UPDATE_RING_HEAD macro is getting monolithical but after giving
> > some
> > > thoughs about it I've come to the conclusion that that code must really
> > be
> > > there. The only liberty we have to either leave in a macro, inline it,
> > or
> > > put in a seperate directories. What I'll do it leave the most used code
> > > path in the macro, and define subroutines for doing the less used
> > paths.
> > 
> > Maybe, if you make the changes I suggested above, all the work done in
> > the
> > block where the engine is idle in UPDATE_RING_HEAD could be moved into
> > dma_start which could be made an inline function.  At the moment, that's
> > the only place dma_start is called, right?
> > 
> 
> Yep. But after thinking better and it's probably better do all in 
> UPDATE_RING... there isn't much point in putting 2 output instructions in 
> a seperate function where the call overhead takes more instructions than 
> that, especially if they are called only from a single function.

But doesn't inlining the function remove the overhead?  I was suggesting 
this mainly for readability, it's not really necessary if dma_start isn't 
going to be used anywhere else.
 
I'm attaching a diff so you can see what I'm talking about.  Hopefully 
this will remove any confusion. :)

-- 
Leif Delgass 
http://www.retinalburn.net

Attachment: mach64-drm-20020611.diff.gz
Description: GNU Zip compressed data

Reply via email to