Having recently installed and got running an original Radeon QD
card, I've discovered a simple and reliable way to lock the card up:
I run the xscreensaver-demo version of gears with -delay 0 -fps
-wireframe. I've seen the same lockup with flightgear once, with one
of the fastest aircraft, and nowhere else - it doesn't happen with 
gears without the -wireframe option. With -wireframe, it takes at 
most a minute or so to lock up.

This happens with one OpenGL program running - I haven't tried with
more than one.

I can run Q3A timedemos without locking up - setting com_maxfps to 
130 makes no difference to this (aside from upping the average 
framerate).

The lockup starts out incomplete: the mouse pointer still moves
normally, but the window manager doesn't respond, windows won't
redraw, and none of the keyboard shortcuts work. After a bit of time,
or if any new program is started, or if Alt-SysRq-b is hit, it locks
up hard, and needs a hardware reset to reboot. I tried sshing in, and
succeeded in logging on and getting a shell, but it locked up solid
immediately after that.

Hitting Alt-SysRq-t shows gears in R state, in sys_ioctl(), where
the call into filp->f_op->ioctl is made. 

Running strace on gears shows it making lots of ioctls, and then
hitting a point where the ioctl() calls fail with -EBUSY. The
transition point of the strace output:

ioctl(4, 0x6444, 0)                     = 0
ioctl(4, 0x6444, 0)                     = 0
ioctl(4, 0x6444, 0)                     = 0
ioctl(4, 0x6444, 0)                     = 0
ioctl(4, 0x6444, 0)                     = 0
ioctl(4, 0x6447, 0)                     = 0
write(3, "+\1\1\0", 4)                  = 4
read(3, "\1\2A\6\0\0\0\0\1\0@\5\0\0\0\0\1\0\0\0\0\0\0\0\1\0\0\0\330\10\363\10", 32) = 32
ioctl(3, 0x541b, [0])                   = 0
ioctl(4, 0x4008642a, 0xbffff304)        = 0
ioctl(4, 0x40186448, 0xbffff324)        = 0
ioctl(4, 0x4010644f, 0xbfffebc4)        = 0
ioctl(4, 0xc0286429, 0xbfffeba4)        = 0
ioctl(4, 0x4008642a, 0xbfffec34)        = 0
ioctl(4, 0x4010644f, 0xbfffec04)        = 0
ioctl(4, 0x4008642b, 0xbfffec74)        = 0
ioctl(4, 0x4008642a, 0xbfffec04)        = 0
ioctl(4, 0x4010644f, 0xbfffebd4)        = 0
ioctl(4, 0x6444, 0)                     = -1 EBUSY (Device or resource busy)

There was a long list of these - the lockup didn't happen at this
point, but after another 900 or so failed ioctls.
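
The request numbers in the trace can be split into their fields using
the generic Linux ioctl encoding, which may help when matching them
against the DRM headers. What follows is only a minimal sketch,
assuming the usual x86 layout (8 bits of number, 8 bits of type, 14
bits of argument size, 2 bits of direction); the name decode_ioctl is
made up for illustration and is not part of any library.

```python
# Decode Linux ioctl request numbers, assuming the generic x86 layout:
#   bits 0-7   command number
#   bits 8-15  type ("magic") character
#   bits 16-29 argument size in bytes
#   bits 30-31 direction, as seen from userspace
DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Return (direction, size, type_char, nr) for an ioctl request number."""
    nr = request & 0xFF
    type_char = chr((request >> 8) & 0xFF)
    size = (request >> 16) & 0x3FFF
    direction = DIRECTIONS[(request >> 30) & 0x3]
    return direction, size, type_char, nr


if __name__ == "__main__":
    # Request numbers copied from the strace output above.
    requests = (0x6444, 0x6447, 0x4008642A, 0x40186448,
                0x4010644F, 0xC0286429, 0x4008642B, 0x541B)
    for request in requests:
        direction, size, type_char, nr = decode_ioctl(request)
        print(f"{request:#010x}: type={type_char!r} nr={nr:#04x} "
              f"size={size} dir={direction}")
```

Every request on fd 4 decodes to type 'd', the DRM ioctl base, and the
one that starts failing (0x6444) is number 0x44 with no argument. The
sketch does not try to map the numbers to ioctl names.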

Loading the radeon.o module with the debug option enabled dumps a
load of gunk all over my /var/log/kern.log file . . . It overwrites
the file way before the point where it would have started writing,
so something strange is happening there . . .

I've tried looking through the code, but I can't make head nor tail
of it, so I don't have the foggiest notion how to work out what's
going wrong here. I'm willing to try any
suggestions/patches/whatever in order to track it down.

The details of my system: I'm running kernel 2.4.18-pre9. I have
XFree86 4.2 installed from CVS, and the latest DRI CVS code. I have an
ASUS K7M motherboard - this has the AMD 751 northbridge. The card is
a Radeon QD: lspci reports

01:05.0 VGA compatible controller: ATI Technologies Inc Radeon QD (prog-if 00 [VGA])
        Subsystem: ATI Technologies Inc: Unknown device 008a
        Flags: bus master, stepping, 66Mhz, medium devsel, latency 64, IRQ 11
        Memory at d8000000 (32-bit, prefetchable) [size=128M]
        I/O ports at 9800 [size=256]
        Memory at efe80000 (32-bit, non-prefetchable) [size=512K]
        Expansion ROM at efe60000 [disabled] [size=128K]
        Capabilities: [58] AGP version 2.0
        Capabilities: [50] Power Management version 2

for the card, and

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-751 [Irongate] AGP Bridge (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 64
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
        I/O behind bridge: 00008000-00009fff
        Memory behind bridge: efd00000-efefffff
        Prefetchable memory behind bridge: d7b00000-e7bfffff

for the AGP chipset. The motherboard's bios is out of date - an
update is out there, which I was unable to apply because of a dead
floppy drive. The update was a features update, rather than a bugfix
one (unless the docs for it were incomplete).

I have a K7 550 in there, running at 550MHz. I have the
mem=nopentium thing on the kernel command line. 

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 1
model name      : AMD-K7(tm) Processor
stepping        : 2
cpu MHz         : 553.899
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat mmx syscall mmxext 3dnowext 3dnow
bogomips        : 1104.28

Finally, I've had a good year/18 months of successful DRI use with
the G400 I had before - no lockups, basically no problems at all.

As I said, I'd like to track this down, but I don't know enough to
do so myself - I'll do whatever people ask . . .

Simon Fowler

-- 
PGP public key Id 0x144A991C, or ftp://bg77.anu.edu.au/pub/himi/himi.asc
(crappy) Homepage: http://bg77.anu.edu.au
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://bg77.anu.edu.au/pub/mirrors/css/ 
