On Tue, 2007-01-05 at 16:04 -0700, Waskiewicz Jr, Peter P wrote:

I am just gonna delete the stuff you had above here because I think you
repeat those thoughts below. Just add back anything missed. I will try
to make this email shorter, but I am not sure I will succeed ;->
> > 1) You want to change the core code; i dont see a need for that.
> > The packet is received by the driver and netif stop works as
> > before, with zero changes; the driver shuts down on the first
> > ring full. The only work is mostly driver specific.
>
> To me, this doesn't buy you anything to do multiqueue only in the
> driver.

Let's come up with some terminology: let's call what the qdiscs do
"multiqueue", and what the NICs do "multi-ring". Note, I have thus far
said you need to have both and they must be in sync.

> I agree the driver needs work to manage the queues in the
> hardware, but if a single feeder from the kernel is handing it
> packets, you gain nothing in my opinion without granularity of
> stopping/starting each queue in the kernel.

This may be _the_ main difference we have in opinion. Like I said
earlier, I used to hold the same thoughts you do. And I think you
should challenge my assertion that it doesn't matter if you have a
single entry point [my assumptions are back in what I called #b and #c].

> The changes to PRIO are an initial example of getting my multiqueue
> approach working. This is the only qdisc I see being a logical change
> for multiqueue; other qdiscs can certainly be added in the future,
> which I plan on once multiqueue device support is in the kernel in
> some form.

Fair enough. I looked at:
http://internap.dl.sourceforge.net/sourceforge/e1000/OpenSDM_8257x-10.pdf
and it seems to be implementing WRR (the M and N parameters in the
count field). WRR doesn't exist in Linux - for no good reason really;
there's a gent who promised to submit some clean code for it but hasn't
been heard of since. You can find some really old code I wrote here:
http://www.cyberus.ca/~hadi/patches/prio-drr.kernel.patch.gz
If you clean that up as a Linux qdisc, then other NICs can use it. In
your approach, that WRR would only be usable by NICs with multi-rings
that implement it.

> > 3) For me: It is a driver change mostly. A new qdisc may be
> > needed - but thats about it.
>
> I guess this is a fundamental difference in our thinking. I think of
> multiqueue as the multiple paths out of the kernel, being managed by
> per-queue states.

Yes, you are right; I think this is where we differ the most. You feel
the need to keep all the rings busy even when one is shut down; I claim
that by having a synced-up qdisc of the same scheduler type you don't
need to worry about that. Both approaches are correct; what I am
proposing is many factors simpler.

> If that is the case, the core code has to be changed at some level,
> specifically in dev_queue_xmit(), so it can check the state of the
> subqueue your skb has been associated with (skb->queue_mapping in my
> patchset). The qdisc needs to comprehend how to classify the skb
> (using TOS or TC) and assign the queue on the NIC to transmit on.

Indeed, something like that would be needed; but it could also be a
simple scheme like netdev->pick_ring(skb->prio), or something along
those lines, invoked once the driver's hardware transmit runs - and
therefore you move it away from the main core. This way only the
multi-ring drivers do the checks. [Note: there is hardware I have seen
which uses IEEE semantics of what priority means (essentially 802.1p),
which happens to be the reverse of what the IETF thinks of it (the
DSCP/TOS view); so a proper mapping is necessary.] A rough sketch of
what I mean follows right after your question below.

> My question to you is this: can you explain the benefit of
> not allowing the kernel to know of and be able to manage the
> queues on the NIC?
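To make the pick_ring idea concrete, here is the sort of thing I have
in mind - strictly a sketch, every name in it (my_pick_ring, prio2ring,
num_tx_rings, etc.) is made up and not anything that exists in the tree
today. The prio2ring table is where a driver would reconcile 802.1p-ish
hardware with the DSCP/TOS view:

/* Sketch only: keeps the ring choice inside a multi-ring driver's
 * transmit path instead of in dev_queue_xmit(); all names hypothetical.
 */
#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct my_multiring_priv {
        unsigned int num_tx_rings;
        /* Driver-chosen map from skb->priority (0..7, 802.1p range) to
         * a hardware tx ring, filled in at config time.  This table is
         * where you handle hardware that speaks 802.1p vs a stack that
         * thinks in DSCP/TOS terms - the orderings do not agree.
         */
        u8 prio2ring[8];
};

static unsigned int my_pick_ring(struct my_multiring_priv *priv,
                                 struct sk_buff *skb)
{
        return priv->prio2ring[skb->priority & 0x7] % priv->num_tx_rings;
}

static int my_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct my_multiring_priv *priv = netdev_priv(dev);
        unsigned int ring = my_pick_ring(priv, skb);

        /* ... post skb to tx ring 'ring'; netif_stop_queue() as before
         * on the first ring that fills up, so the core stays untouched ...
         */
        return 0;
}

The point being: a single-ring driver never sees any of this and
dev_queue_xmit() needs no change.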
This seems to be the heart of our disagreement. For correctness it is
not necessary to use all the rings at the same time; that is not to say
that using them all (as you do) is wrong - both schemes are correct.

To answer your question: for me it is for the sake of simplicity, i.e.
being less intrusive and making it transparent to both writers of
qdiscs and users of those qdiscs. Simplicity is always better when
there is no trumping difference; but simplicity can never trump
correctness, i.e. it would be totally wrong to put a packet in coach
class when it paid for business class (and vice-versa) for the sake of
a simple scheme. That is not gonna happen with either of these two
approaches. The added side benefit is that if you follow what I
described, you can then get that WRR working for the e1000 and other
NICs as well. That in itself is a big advantage.

> I view the ability to manage
> these queues from the kernel as being true multiqueue, and view doing
> the queue management solely in the driver as something that doesn't
> give any benefit.

Let me be a bit long winded ... "benefit" should map to correctness,
and both yours and mine are correct. To my approach, managing the
hardware rings is cosmetics (it doesn't add any value). If by
"management" you mean a config/control point of view, then you can tie
the qdisc to the multiq driver at config time, i.e. when configuring
the qdisc onto a specific driver. I am not against that, I just
referred to it as a peripheral issue. E.g. the qdisc asks the driver
"can I set up a WRR of two queues with weight 3 for queue0 and weight 4
for queue1?"; the driver looks at its capability and, if nothing
matches, it rejects. (There is a sketch of that handshake at the bottom
of this mail.)

My problem with your approach is from a datapath view. It should be
easy to demonstrate experimentally that the s/ware qdisc framework in
Linux is so good that there is no difference in the achieved qos on the
e1000 between:

1) 2 queues at the qdisc level with scheduler WRR + a single h/ware
   ring (which is the standard setup in Linux)
2) no queueing at the qdisc level + 2 h/ware rings implementing
   scheduler WRR
3) 2 queues at the qdisc level with scheduler WRR + X h/ware rings
   implementing scheduler WRR

We haven't talked about wireless - but let's get over the wired world
first.

> Hopefully we can
> come to an agreement of some sort soon, since this work has been going
> on for some time to be halted after quite a bit of engineering and
> community feedback.

Again, I am sorry for coming in late - I have good reasons. Note: I am
not pulling these ideas out of a hat. I have implemented what I
described and didn't need to make any changes to the core (I never did
the capability check, but talking to you it seems to be a good idea).

cheers,
jamal
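P.S. Purely to illustrate the config-time capability check mentioned
above - again only a sketch, every name here is made up and nothing
like it exists in the tree today:

/* Hypothetical handshake: at qdisc config time the (so far
 * nonexistent) WRR qdisc asks the driver whether its rings can back
 * the requested queues/weights; the driver accepts or rejects.
 */
#include <linux/errno.h>
#include <linux/netdevice.h>

struct wrr_hw_req {
        unsigned int nqueues;      /* e.g. 2 */
        unsigned int weight[8];    /* e.g. 3 for queue0, 4 for queue1 */
};

/* something a multi-ring driver (e1000-like) might export */
static int my_driver_wrr_capable(struct net_device *dev,
                                 const struct wrr_hw_req *req)
{
        if (req->nqueues > 2)      /* say, only 2 WRR-capable rings */
                return -EINVAL;
        return 0;
}

/* called by the WRR qdisc when it is attached to the device */
static int wrr_qdisc_check_hw(struct net_device *dev)
{
        struct wrr_hw_req req = {
                .nqueues = 2,
                .weight  = { 3, 4 },
        };

        return my_driver_wrr_capable(dev, &req);
}

A single-ring driver simply would not provide the hook, and the qdisc
does the WRR purely in s/ware - which is all such a NIC can do anyway.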