On Mon, 10 Jun 2002, José Fonseca wrote:

> On 2002.06.10 00:53 Leif Delgass wrote:
> > On Sun, 9 Jun 2002, José Fonseca wrote:
> > >
> > > Leif,
> > >
> > > I have had some problems with the DMA code (not just now, but since the beginning of the week). Sometimes I get lockups and crashes quite easily; other times I get bored of jumping around in UT waiting for something bad to happen.
> >
> > Anything reproducible, or is it at different places every time?
>
> There isn't a true pattern. Sometimes it happens when I click the mouse or keyboard, sometimes it happens by itself, and sometimes it simply doesn't happen at all. I still have no clue.
>
> > > But finally I got something - please look at the attached dump - it shows the ring contents. There are two strange things in that picture:
> > >
> > > 1- The BM_COMMAND value specified in the table head is 0x400001a0, while the register says 0x40000038
> >
> > I think this is because the card is decrementing the byte count as it processes the buffer. You'll notice that BM_SYSTEM_MEM_ADDR also looks like it's being incremented as the buffer is processed. I don't think this indicates a problem. The register state looks like the engine locked while processing the buffer at 0x00534000.
>
> But if so, shouldn't they be incremented/decremented by the same amount?

Maybe, but without knowing the hardware implementation it's hard to say. Maybe the system address is only incremented in 256-byte chunks? I remember seeing BM_COMMAND seemingly being decremented by the hardware in one of your earlier tests (where there was no lockup), so that looked like normal operation to me. A quick test would be to repeatedly sample these registers while a full DMA buffer is being processed. Of course, there will always be a delay between reads, so it's hard to get a true snapshot of the state of multiple registers at a given moment in time.
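Something along these lines might do it (untested sketch: read_reg(), the offsets, and the BM_STATUS/busy bit are placeholders, not the real Rage Pro register map, and printf() stands in for whatever logging the driver uses):

    #include <stdio.h>

    /* Placeholder MMIO accessor -- mmio must point at the mapped
     * register aperture before any of this runs. */
    static volatile unsigned int *mmio;

    static unsigned int read_reg(unsigned int off)
    {
            return mmio[off / 4];
    }

    /* Placeholder offsets and busy bit, not the real values. */
    #define BM_COMMAND         0x00
    #define BM_SYSTEM_MEM_ADDR 0x04
    #define BM_STATUS          0x08
    #define BM_ACTIVE          (1U << 31)

    static void sample_bm_registers(void)
    {
            /* Log how the byte count and the system address move while
             * the bus master is busy.  The two reads aren't atomic as
             * a pair, so the values can be slightly skewed relative to
             * each other. */
            while (read_reg(BM_STATUS) & BM_ACTIVE) {
                    printf("BM_COMMAND=0x%08x BM_SYSTEM_MEM_ADDR=0x%08x\n",
                           read_reg(BM_COMMAND),
                           read_reg(BM_SYSTEM_MEM_ADDR));
            }
    }

If the system address really does advance in coarser steps than the byte count decrements, it would show up as the address holding still across several samples while the count keeps dropping.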
> > ...
> > >
> > > The feeling I get from this is that we are writing to pending buffers, causing the engine to lock. But I don't see how... This has only happened on my testbox (which is rather slow). Have you experienced anything like this?
> >
> > I haven't had any lockups since your changes, apart from once when I was working on the BM_HOSTDATA blit changes, and I haven't had any since I committed my changes. I've done quite a bit of "testing" with various games lately, too. ;) But, as you say, there could be bugs that only
>
> This is good to know. btw, when did you last update from cvs? I was wondering if you had tried the blit changes I made yesterday (June 8).
>
> > crop up on a slower system (I test on a Dell Inspiron 7k laptop w/ PII 400 and a Rage LT Pro 8M AGP 2x). I should mention that I have _EXTRA_CHECKING disabled. I suppose it's possible that the checking code causes a problem, but I'm assuming that you had lockups before adding it.
>
> Yes, the _EXTRA_CHECKING was already part of my attempt to hunt this down...

That's what I figured.

> > One thing to be aware of is that do_release_used_buffers unconditionally frees all pending buffers, so it should only be called when the card is known to be idle and finished with all pending buffers on the ring. But I think the current code is safe in that respect. Maybe it would help in debugging if you also dump the contents of the buffer pointed to by the head (and maybe the tail, if you think there might be a problem there).
>
> Ok. Tomorrow I'll add an extra check there. I'll also add a wait_for_idle in ADVANCE_RING to force a sync with the engine, to see if this still happens.
>
> I just wanted to be sure that there isn't a race condition we aren't aware of... because if there is one, it could cause far more headaches later than it does now.

Sure. We should definitely try to find the source of your lockups before going much further.

> Anyway, I'm getting a little bored with this, so I'm already moving forward too: I'm going to have the DMA* macros hold a DMA buffer across successive calls instead of wasting a buffer for each call.
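Sounds good. I imagine something roughly like this (untested sketch, all names made up -- the point is just that the buffer lives in a persistent struct instead of being fetched and dispatched per macro block):

    struct buffer {
            unsigned int *data;
            int           size;        /* in bytes */
    };

    /* Persistent DMA state, e.g. hanging off dev_priv. */
    struct dma_state {
            struct buffer *buf;        /* held across BEGIN/OUT blocks */
            unsigned int  *ptr;        /* next free dword */
            int            space;      /* dwords left in buf */
    };

    extern struct buffer *dma_get_buffer(void);   /* placeholder */
    extern void dma_flush(struct dma_state *st);  /* queues buf, if any */

    /* Only fetch a fresh buffer when we aren't holding one, or when
     * the one we have can't fit the next n dwords. */
    #define DMA_BEGIN(st, n)                                      \
            do {                                                  \
                    if ((st)->buf == NULL || (st)->space < (n)) { \
                            dma_flush(st);                        \
                            (st)->buf   = dma_get_buffer();       \
                            (st)->ptr   = (st)->buf->data;        \
                            (st)->space = (st)->buf->size / 4;    \
                    }                                             \
            } while (0)

    #define DMA_OUT(st, val)                                      \
            do {                                                  \
                    *(st)->ptr++ = (val);                         \
                    (st)->space--;                                \
            } while (0)

The key difference from what we have now is that nothing is dispatched unconditionally at the end of a block: the buffer is only queued when it fills up, or when something (a blit, a sync) forces a dma_flush().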
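And for the wait_for_idle-in-ADVANCE_RING experiment you mentioned, I'd guess something this simple is enough to rule the race in or out (again just a sketch -- advance_ring() and wait_for_idle() stand in for the real code):

    /* Debug variant: kick the ring as usual, then block until the
     * engine goes idle, so no buffer is ever pending while the CPU
     * keeps writing.  If the lockups disappear with this in place,
     * that points at a race on pending buffers. */
    #define ADVANCE_RING_SYNC(dev_priv)             \
            do {                                    \
                    advance_ring(dev_priv);         \
                    wait_for_idle(dev_priv);        \
            } while (0)

It will be slow, of course, but that's fine for a debugging run.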
> > BTW, did anything come before the ring dump in the log? Do you know where it was triggered from?
>
> No, this time there wasn't anything more. The trigger is hard to determine: the problem is only noticed when waiting for buffers, so sometimes there are some messages related to timeouts.

Yeah, a timeout there is usually just a symptom of a lockup; lots of things could potentially be the root cause.

-- 
Leif Delgass
http://www.retinalburn.net
