The discussion about tc actions reminded me of something else I wanted to take care of in the near future. Some of the non-work-conserving qdiscs (HFSC, TBF, netem) need to peek at the next packet when throttling in order to calculate the timeout for when to wake up. This is currently done by dequeueing a packet, looking at it (usually at its size), and requeueing it. This only works properly when the inner qdisc doesn't reorder packets; otherwise it might hand out a different packet when dequeued again after waking up, which results in either waking up too early (when the packet is larger) or underutilization (when the packet is smaller). To deal with this correctly, we need a peek operation that guarantees that the next packet dequeued will be the one peeked at, even if a higher-priority packet arrives in the meantime. This will increase the worst-case latency by the transmission time of one full-sized packet for reordering qdiscs, but the same can happen today; this way there is at least no underutilization.
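To make the failure mode concrete, here is a small user-space model in Python. It is not kernel code. PrioQdisc, Packet, wakeup_delay and RATE are hypothetical names made up for illustration, and the delay is simplified to size / rate, whereas a real shaper such as TBF would also account for the tokens it already has.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Packet:
    prio: int                          # lower value = higher priority
    seq: int                           # arrival order, breaks ties
    size: int = field(compare=False)   # bytes, not used for ordering


class PrioQdisc:
    """Hypothetical reordering inner qdisc: hands out the best-priority packet."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def enqueue(self, prio, size):
        heapq.heappush(self._heap, Packet(prio, next(self._seq), size))

    def requeue(self, pkt):
        # Put a previously dequeued packet back; it competes with new arrivals.
        heapq.heappush(self._heap, pkt)

    def dequeue(self):
        return heapq.heappop(self._heap) if self._heap else None


def wakeup_delay(size_bytes, rate_bytes_per_sec):
    # Simplifying assumption: sleep for the transmission time of the packet.
    return size_bytes / rate_bytes_per_sec


RATE = 125_000  # assumed rate: 1 Mbit/s expressed in bytes per second

inner = PrioQdisc()
inner.enqueue(prio=1, size=1500)

# Throttled: dequeue, look at the size, requeue (the current approach).
pkt = inner.dequeue()
delay = wakeup_delay(pkt.size, RATE)
inner.requeue(pkt)

# While the shaper sleeps, a small higher-priority packet arrives.
inner.enqueue(prio=0, size=64)

# After waking up, the inner qdisc hands out a different packet.
sent = inner.dequeue()
print(f"planned for {pkt.size} bytes, sent {sent.size} bytes")
```

In this run the shaper sleeps long enough for a 1500-byte packet and then sends only 64 bytes, which is the underutilization case. Reversing the two sizes gives the case where it wakes up too early.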
There are basically two ways to implement this. The less intrusive, but IMO more hackish, one is to handle this entirely inside the qdiscs that require the operation: instead of requeueing the packet to the inner qdisc, they keep a private reference to it somewhere. The disadvantage is that this distorts statistics and estimators; the classful qdisc would, for example, have more packets queued than the sum of all its inner qdiscs. The other possibility is to introduce a ->peek operation, or a flag to ->dequeue, and handle it within the reordering qdiscs. I think we only need to implement it for non-classful (or single-class) qdiscs; I can't imagine why anyone would add a scheduler as inner qdisc to a different scheduler, at least with the current ones. Any preferences or suggestions? Otherwise I'll go with the second possibility.
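As a rough illustration of the second possibility, the following Python model shows what a peek operation inside a reordering qdisc could look like. It is a user-space sketch, not a proposed kernel interface: PeekablePrioQdisc and its methods are hypothetical, and packets are plain (prio, seq, size) tuples. The peeked packet is pinned inside the qdisc and still counted in qlen, so the queue length stays consistent with what the outer qdisc sees.

```python
import heapq
from itertools import count


class PeekablePrioQdisc:
    """Hypothetical reordering qdisc with a peek operation.

    Packets are (prio, seq, size) tuples; lower prio is served first.
    """

    def __init__(self):
        self._heap = []
        self._seq = count()
        self._peeked = None   # packet promised to the next dequeue()
        self.qlen = 0         # includes the peeked packet

    def enqueue(self, prio, size):
        heapq.heappush(self._heap, (prio, next(self._seq), size))
        self.qlen += 1

    def peek(self):
        # Pin the current head so that later arrivals cannot overtake it.
        if self._peeked is None and self._heap:
            self._peeked = heapq.heappop(self._heap)
        return self._peeked

    def dequeue(self):
        # Always hand out the pinned packet first, if there is one.
        pkt = self.peek()
        if pkt is not None:
            self._peeked = None
            self.qlen -= 1
        return pkt


q = PeekablePrioQdisc()
q.enqueue(prio=1, size=1500)

head = q.peek()                 # the shaper computes its timeout from this
qlen_after_peek = q.qlen        # the packet is still accounted as queued

q.enqueue(prio=0, size=64)      # higher-priority arrival during the sleep

first = q.dequeue()             # the peeked packet, as guaranteed
second = q.dequeue()            # the late arrival goes out afterwards
print(first, second, q.qlen)
```

The higher-priority packet waits behind the pinned one, which is the extra latency of one packet transmission time mentioned above. With the private-reference variant the outer qdisc would hold the packet itself, and the inner queue length would drop by one while the packet is still logically queued.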