Hi Eric,
>> It's an HP system with two dual-core CPUs at 3 GHz, the
> Then you might try to bind the network IRQ to one CPU (echo 1 > /proc/irq/XX/smp_affinity),
> XX being your NIC interrupt (cat /proc/interrupts to find it),
> and bind your user program to another CPU (or CPUs).
The NIC was already fixed on CPU0, and the irq_balancer switched the timer interrupt between all CPUs and the storage HBA between CPU1 and CPU4. Stopping the balancer, leaving the NIC alone on CPU0, and putting the other interrupts and my program on CPU2-4 did not improve the situation; at least I could not see an improvement over just adding thash_entries=2048.
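For reference, the pinning was done roughly as below; the interface name, IRQ numbers and CPU masks are only illustrative and differ from box to box:

  # stop the IRQ balancer daemon so it cannot rewrite the affinity masks
  killall irqbalance                      # service/binary name varies by distribution

  # find the NIC's IRQ number (eth0 is just an example interface name)
  grep eth0 /proc/interrupts

  # keep the NIC interrupt alone on CPU0 (hex mask 1 = CPU0)
  echo 1 > /proc/irq/XX/smp_affinity      # XX = IRQ number found above

  # start the user program pinned to the other CPUs (hex mask 0x1c = CPU2-CPU4)
  taskset 0x1c ./my_program

mpstat -P ALL (from the sysstat package) is handy afterwards to check that the softirq load really stays on the CPU you expect.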
> You might hit a cond_resched_softirq() bug that Ingo and others are sorting out right now. Using a separate CPU for softirq handling and for your programs should help a lot here.
Shouldn't I get some syslog messages if this bug is triggered? Nevertheless, I also opened a call with Novell about this issue, as the current cond_resched_softirq() looks completely different from the one in 2.6.18.
>> This did help a lot, I tried thash_entries=10 and now only a
>> while loop around the "cat ...tcp" triggers packet loss. Tests
> I don't understand here: using a small thash_entries makes the bug always appear?
No. thash_entries=10 improves the situation. Without the parameter nearly every look at /proc/net/tcp leads to packet loss; with thash_entries=10 (or 2048, it does not matter) I have to run a "while true; do cat /proc/net/tcp; done" loop to get packet loss every minute. But even with thash_entries=10, and with my program left alone on the system, I still get packet loss every few hours.
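For completeness, the parameter and the test load look roughly like this; the boot loader line is only meant to show where thash_entries goes, and which counters are worth watching may differ between kernels:

  # thash_entries is a kernel boot parameter appended to the kernel command line, e.g.:
  #   kernel /boot/vmlinuz root=/dev/sda2 thash_entries=2048
  # the resulting TCP hash table size is printed during boot:
  dmesg | grep 'TCP established hash table'

  # the load that triggers the drops:
  while true; do cat /proc/net/tcp > /dev/null; done &

  # watch the TCP counters while it runs:
  netstat -s | grep -iE 'drop|prune|collapse'

Regards,
John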