Ralph Siemsen writes:
> Or the winbond chip is flakey, which is a fact that we already knew. 
> I've got several boards that have the tulip problem and they all work
> under 2.2's tulip driver (old_tulip).  Repeatably, going back and forth
> between the kernels.  Seeing as how the new driver doesn't have the
> necessary cache flushes in it, i'm really not too surprized that it
> doesn't work.

The new 2.4.0 driver does have the cache flush stuff in it.  It's just not
called "dma_cache_xxx" any more.  Instead we use pci_alloc_consistent
and friends to get an area of uncached memory to store all the tulip
rings.  The documentation for this interface can be found in
linux/Documentation/DMA-mapping.txt.

Note that the 2.4.0 tulip driver also works flawlessly on 21041 cards in
the Integrator platform.

> I can't belive the PCI bus is to blame seeing as how the system manages
> to run this far - i'm not sure what to look for on the PCI bus, we have
> an analyser, but it's like a needle in the haystack if you don't have
> something to go on.

"needle in a haystack" is exactly where we are at the present time.  We
have virtually zero information on what's going on, and we need more
information if we're going to figure it out.

> I'm open to suggestions as to what signals to look at though.

SERR and PERR would probably be a good pair to examine first.  Maybe also
try to grab the bus transactions around the point when the tulip raises
its interrupt signal.

If the messages are going off as frequently as people say they are, it
shouldn't be too difficult to grab something.

If you don't have the time to do an in-depth examination of the grabbed
signals, then I'm willing to analyse them myself if you can provide images
of the traces.

Alternatively, if someone could send me one of these problematic Netwinders
(all mine work fine) then I can attempt to scope the relevant PCI lines
myself to try to find out what is going on.

OK, here's the information from the data sheet:

 bit 13 FBE - Fatal Bus Error
  Indicates that a bus error occurred (see table 3-67).  When this bit is
  set, the 21143 disables all of its bus access operations.

   CSR5<25:23> = 000 - Parity error, 001 - Master abort, 010 - Target abort

Another good thing to do would be to print the CSR5 value each time you get
a System Error.  Can someone do that?  Then we'll know what type of fatal
bus error occurred.
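
As a rough illustration of decoding that value, here is a minimal sketch in
plain C.  It assumes only the bit layout quoted from the data sheet above
(bit 13 for the fatal bus error flag, bits 25:23 for the error type).  The
macro and function names are made up for this example and are not taken from
the tulip driver, and any encoding not listed above is simply reported as
"other/reserved".

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; bit positions as quoted from the data sheet. */
#define CSR5_FATAL_BUS_ERROR  (1u << 13)  /* bit 13: fatal bus error   */
#define CSR5_ERROR_SHIFT      23          /* CSR5<25:23>: error type   */
#define CSR5_ERROR_MASK       0x7u

/* Return a short description of the bus error encoded in a CSR5 value. */
static const char *csr5_bus_error_name(uint32_t csr5)
{
    if (!(csr5 & CSR5_FATAL_BUS_ERROR))
        return "no fatal bus error";

    switch ((csr5 >> CSR5_ERROR_SHIFT) & CSR5_ERROR_MASK) {
    case 0:  return "parity error";
    case 1:  return "master abort";
    case 2:  return "target abort";
    default: return "other/reserved";   /* not listed in the excerpt above */
    }
}

/* Print the raw CSR5 value together with the decoded error type. */
static void csr5_report(uint32_t csr5)
{
    printf("CSR5 = 0x%08lx: %s\n",
           (unsigned long)csr5, csr5_bus_error_name(csr5));
}
```

In the driver itself the same decode would sit next to the existing System
Error message and use printk on the CSR5 value the interrupt handler has
already read; that wiring is not shown here.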
   _____
  |_____| ------------------------------------------------- ---+---+-
  |   |        Russell King       [EMAIL PROTECTED]      --- ---
  | | | |            http://www.arm.linux.org.uk/            /  /  |
  | +-+-+                                                     --- -+-
  /   |               THE developer of ARM Linux              |+| /|\
 /  | | |                                                     ---  |
    +-+-+ -------------------------------------------------  /\\\  |

_______________________________________________
http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm
Please visit the above address for information on this list.
