Hi Eric,

>> It's an HP system with two dual core CPUs at 3GHz, the

> Then you might try to bind the network IRQ to one CPU
> (echo 1 > /proc/irq/XX/smp_affinity),
> XX being your NIC interrupt (cat /proc/interrupts to find it),
> and bind your user program to another CPU (or CPUs).

The NIC interrupt was already fixed to CPU0, and the irq_balancer
switched the timer interrupt across all CPUs and the storage HBA
between CPU1 and CPU4. Stopping the balancer and leaving the NIC
alone on CPU0, with the other interrupts and my program on CPU2-4,
did not improve the situation. At least I could not see an
improvement over just adding thash_entries=2048.
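
For reference, what I did looks roughly like this; the IRQ number and
the CPU masks below are only examples (the real IRQ comes from
/proc/interrupts), and "receiver" stands in for my program:

  # stop the balancer so it does not move the IRQs around again
  /etc/init.d/irq_balancer stop    # "irqbalance" on non-SUSE systems

  # keep the NIC interrupt (here IRQ 16) on CPU0 (mask 0x1)
  echo 1 > /proc/irq/16/smp_affinity

  # move the other busy interrupts onto CPU2-3 (mask 0xc)
  for irq in 17 18 19; do echo c > /proc/irq/$irq/smp_affinity; done

  # and pin the program to those CPUs as well, away from the NIC IRQ
  taskset -c 2,3 ./receiver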

> You might hit a cond_resched_softirq() bug that Ingo and others
> are sorting out right now. Using a separate CPU for softirq
> handling and your programs should help a lot here.

Shouldn't I get some syslog messages if this bug is triggered?

Nevertheless I also opened a call with Novell about this issue,
as the current cond_resched_softirq() looks completely different
from the one in 2.6.18.

>> This did help a lot, I tried thash_entries=10 and now only a
>> while loop around the "cat ...tcp" triggers packet loss. Tests

> I don't understand here: using a small thash_entries makes
> the bug always appear?

No. thash_entries=10 improves the situation. Without the parameter,
nearly every look at /proc/net/tcp leads to packet loss; with
thash_entries=10 (or 2048, it does not matter) I have to run a
"while true; do cat /proc/net/tcp; done" loop to get packet loss
every minute.
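
In case it matters how I set it: thash_entries is a kernel boot
parameter, so it goes on the kernel command line and needs a reboot.
To double check that it took effect I look at the running kernel
(the grep pattern matches the boot message on my 2.6 kernels, it may
differ elsewhere):

  # the parameter must show up on the command line of the running kernel
  cat /proc/cmdline

  # the kernel reports the resulting TCP hash table size at boot
  dmesg | grep -i 'established hash'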

But even with thash_entries=10, and with my program left alone
on the system, I get packet loss every few hours.

Regards,
John



