On 03/21/2018 12:37 PM, Randy Dunlap wrote:
> On 03/21/2018 11:33 AM, Tal Gilboa wrote:
>> Net DIM is a generic algorithm for dynamically
>> optimizing network device interrupt moderation. This
>> document describes how it works and how to use it.
>>
>> Signed-off-by: Tal Gilboa <ta...@mellanox.com>
>> ---
>>  Documentation/networking/net_dim.txt | 174 +++++++++++++++++++++++++++++++++++
>>  1 file changed, 174 insertions(+)
>>  create mode 100644 Documentation/networking/net_dim.txt
>>
>> diff --git a/Documentation/networking/net_dim.txt b/Documentation/networking/net_dim.txt
>> new file mode 100644
>> index 0000000..9cb31c5
>> --- /dev/null
>> +++ b/Documentation/networking/net_dim.txt
>> @@ -0,0 +1,174 @@
>> +Net DIM - Generic Network Dynamic Interrupt Moderation
>> +======================================================
>> +
>> +Author:
>> +    Tal Gilboa <ta...@mellanox.com>
>> +
>> +
>> +Contents
>> +=========
>> +
>> +- Assumptions
>> +- Introduction
>> +- The Net DIM Algorithm
>> +- Registering a Network Device to DIM
>> +- Example
>> +
>> +Part 0: Assumptions
>> +======================
>> +
>> +This document assumes the reader has basic knowledge of network drivers
>> +and of interrupt moderation in general.
>> +
>> +
>> +Part I: Introduction
>> +======================
>> +
>> +Dynamic Interrupt Moderation (DIM) (in networking) refers to changing the
>> +interrupt moderation configuration of a channel in order to optimize packet
>> +processing. The mechanism includes an algorithm which decides if and how to
>> +change moderation parameters for a channel, usually by performing an
>> +analysis on runtime data sampled from the system. Net DIM is such a
>> +mechanism. In each iteration of the algorithm, it analyses a given sample of
>> +the data, compares it to the previous sample and, if required, it can decide
>> +to change some of the interrupt moderation configuration fields. The data
>> +sample is composed of data bandwidth, the number of packets and the number
>> +of events. The time between samples is also measured. Net DIM compares the
>> +current and the previous data and returns an adjusted interrupt moderation
>> +configuration object. In some cases, the algorithm might decide not to
>> +change anything. The configuration fields are the minimum duration
>> +(microseconds) allowed between events and the maximum number of wanted
>> +packets per event. The Net DIM algorithm prioritizes increasing bandwidth
>> +over reducing the interrupt rate.
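
To make the sample description above concrete, here is a minimal sketch of
how two raw samples and the time between them could be turned into rates.
The struct and function names (raw_sample, sample_rates, calc_rates) are
hypothetical and are not taken from the patch; the patch may well compute
this differently.

```c
#include <stdint.h>

/* Hypothetical raw sample: cumulative counters plus a timestamp. */
struct raw_sample {
    uint64_t time_us;   /* time the sample was taken, in microseconds */
    uint64_t bytes;     /* cumulative byte count */
    uint64_t packets;   /* cumulative packet count */
    uint64_t events;    /* cumulative event (interrupt) count */
};

/* Hypothetical per-millisecond rates derived from two raw samples. */
struct sample_rates {
    uint64_t bytes_per_ms;
    uint64_t packets_per_ms;
    uint64_t events_per_ms;
};

/*
 * Derive rates from the counters at the start and end of a measurement.
 * Returns 0 on success, -1 if no time elapsed between the two samples.
 */
int calc_rates(const struct raw_sample *start, const struct raw_sample *end,
               struct sample_rates *out)
{
    uint64_t delta_us = end->time_us - start->time_us;

    if (delta_us == 0)
        return -1;

    /* 1000 microseconds per millisecond; integer division truncates. */
    out->bytes_per_ms = (end->bytes - start->bytes) * 1000 / delta_us;
    out->packets_per_ms = (end->packets - start->packets) * 1000 / delta_us;
    out->events_per_ms = (end->events - start->events) * 1000 / delta_us;
    return 0;
}
```

The point of the sketch is only that the driver hands over raw counters and
the rates fall out of the difference between two samples.
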
>> +
>> +
>> +Part II: The Net DIM Algorithm
>> +===============================
>> +
>> +Each iteration of the Net DIM algorithm follows these steps:
>> +1. Calculates a new data sample.
>> +2. Compares it to the previous sample.
>> +3. Makes a decision - suggests interrupt moderation configuration fields.
>> +4. Schedules a work function, which applies the suggested configuration.
>> +
>> +The first two steps are straightforward: both the new and the previous data
>> +are supplied by the driver registered to Net DIM. The previous data is the
>> +new data supplied to the previous iteration. The comparison step checks the
>> +difference between the new and previous data and decides on the result of
>> +the last step. A step is rated "better" if bandwidth increases and "worse"
>> +if bandwidth decreases. If there is no change in bandwidth, the packet rate
>> +is compared in a similar fashion - increase == "better" and decrease ==
>> +"worse". If there is no change in the packet rate either, the interrupt
>> +rate is compared. Here the algorithm tries to optimize for a lower interrupt
>> +rate, so an increase in the interrupt rate is considered "worse" and a
>> +decrease is considered "better". Step #2 has an optimization for avoiding
>> +false results: it only considers a difference between samples as valid if it
>> +is greater than a certain percentage. Also, since Net DIM does not measure
>> +anything by itself, it assumes the data provided by the driver is valid.
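
The comparison order described above could be sketched roughly as follows.
This is only an illustration: the names (dim_rates, cmp_result,
compare_rates) are hypothetical, and the 10% threshold is a placeholder,
since the text only says "a certain percentage".

```c
#include <stdint.h>

/* Hypothetical per-sample rates; names are illustrative only. */
struct dim_rates {
    uint64_t bytes_per_ms;
    uint64_t packets_per_ms;
    uint64_t events_per_ms;
};

enum cmp_result { CMP_SAME, CMP_BETTER, CMP_WORSE };

/* Placeholder threshold: differences at or below this are treated as noise. */
#define SIGNIFICANT_PCT 10

/* Non-zero if cur differs from prev by more than SIGNIFICANT_PCT percent. */
int is_significant(uint64_t cur, uint64_t prev)
{
    uint64_t diff = cur > prev ? cur - prev : prev - cur;

    if (prev == 0)
        return cur != 0;    /* any traffic after none counts as a change */
    return diff * 100 / prev > SIGNIFICANT_PCT;
}

/* Bandwidth first, then packet rate, then interrupt rate (lower is better). */
enum cmp_result compare_rates(const struct dim_rates *cur,
                              const struct dim_rates *prev)
{
    if (is_significant(cur->bytes_per_ms, prev->bytes_per_ms))
        return cur->bytes_per_ms > prev->bytes_per_ms ? CMP_BETTER : CMP_WORSE;

    if (is_significant(cur->packets_per_ms, prev->packets_per_ms))
        return cur->packets_per_ms > prev->packets_per_ms ? CMP_BETTER
                                                          : CMP_WORSE;

    if (is_significant(cur->events_per_ms, prev->events_per_ms))
        return cur->events_per_ms < prev->events_per_ms ? CMP_BETTER
                                                        : CMP_WORSE;

    return CMP_SAME;
}
```

Checking bandwidth before anything else is what gives it priority over the
interrupt rate: the interrupt rate only matters when the first two
comparisons show no significant change.
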
>> +
>> +Step #3 decides on the suggested configuration based on the result from
>> +step #2 and the internal state of the algorithm. The states reflect the
>> +"direction" of the algorithm: is it going left (reducing moderation), right
>> +(increasing moderation) or standing still. Another optimization is that if a
>> +decision to stay still is made multiple times, the interval between
>> +iterations of the algorithm increases in order to reduce calculation
>> +overhead. Also, after "parking" on one of the leftmost or rightmost
>> +decisions, the algorithm may decide to verify this decision by taking a step
>> +in the other direction. This is done in order to avoid getting stuck in a
>> +"deep sleep" scenario. Once a decision is made, an interrupt moderation
>> +configuration is selected from the predefined profiles.
> 
> I think a short description of the predefined profiles could help.

Agreed, it would help if the different modes
(NET_DIM_CQ_PERIOD_MODE_START_FROM_EQE,
NET_DIM_CQ_PERIOD_MODE_START_FROM_CQE) were explained a bit further. The
whole "QE" terminology sounds very specific to converged Ethernet adapters
to me...

> 
>> +
>> +The last step is to notify the registered driver that it should apply the
>> +suggested configuration. This is done by scheduling a work function, defined
>> +by the Net DIM API and provided by the registered driver.
>> +
>> +As you can see, Net DIM itself does not actively interact with the system.
>> +It would have trouble making the correct decisions if the wrong data were
>> +supplied to it, and it would be useless if the work function did not apply
>> +the suggested configuration. This does, however, allow the registered driver
>> +some room for manoeuvre as it may provide partial data or ignore the
>> +algorithm's suggestion under some conditions.
>> +
>> +
>> +Part III: Registering a Network Device to DIM
>> +==============================================
>> +
>> +The Net DIM API exposes the main function net_dim(struct net_dim *dim,
>> +struct net_dim_sample end_sample). This function is the entry point to the
>> +Net DIM algorithm and has to be called every time the driver would like to
>> +check if it should change interrupt moderation parameters. The driver should
>> +provide two
> 
> Is it completely up to the driver to decide when to call net_dim()?
> So it could be based on TX traffic, RX traffic, time, queue depths, etc.?
> 
>> +data structures: struct net_dim and struct net_dim_sample. Struct net_dim
>> +describes the state of DIM for a specific object (RX queue, TX queue,
>> +other queues, etc.). This includes the currently selected profile, previous
>> +data samples, the callback function provided by the driver and more.
>> +Struct net_dim_sample describes a data sample, which will be compared to the
>> +data sample stored in struct net_dim in order to decide on the algorithm's
>> +next step. The sample should include bytes, packets and interrupts, measured
>> +by the driver.
>> +
>> +In order to use Net DIM from a networking driver, the driver needs to call
>> +the main net_dim() function. The recommended method is to call net_dim() on
>> +each interrupt. Since Net DIM has built-in moderation and might decide to
>> +skip
> 
> (continuing my question from above:)
> or on each interrupt.  But the hardware could also be doing interrupt
> mitigation, so each interrupt doesn't always correlate to anything specific.
> 
>> +iterations under certain conditions, there is no need to moderate the
>> +net_dim() calls as well. As mentioned above, the driver needs to provide an
>> +object of type struct net_dim to the net_dim() function call. Each entity
>> +using Net DIM is advised to hold a struct net_dim as part of its data
>> +structure and use it as the main Net DIM API object. The struct
>> +net_dim_sample should hold the latest bytes, packets and interrupts count.
>> +There is no need to perform any calculations; just include the raw data.
>> +
>> +The net_dim() call itself does not return anything. Instead, Net DIM relies
>> +on the driver to provide a callback function, which is called when the
>> +algorithm decides to make a change in the interrupt moderation parameters.
>> +This callback will be scheduled and run in a separate thread in order not to
>> +add overhead to the data flow. After the work is done, the Net DIM algorithm
>> +needs to be set to the proper state in order to move to the next iteration.
>> +
>> +
>> +Part IV: Example
>> +=================
>> +
>> +The following code demonstrates how to register a driver to Net DIM. The
>> +actual usage is not complete but it should make the outline of the usage
>> +clear.
>> +
>> +my_driver.c:
>> +
>> +#include <linux/net_dim.h>
>> +
>> +/* Callback for net DIM to schedule on a decision to change moderation */
>> +void my_driver_do_dim_work(struct work_struct *work)
>> +{
>> +    /* Get struct net_dim from struct work_struct */
>> +    struct net_dim *dim = container_of(work, struct net_dim,
>> +                                       work);
>> +    /* Do interrupt moderation related stuff */
>> +    ...
>> +
>> +    /* Signal net DIM work is done and it should move to next iteration */
>> +    dim->state = NET_DIM_START_MEASURE;
>> +}
>> +
>> +/* My driver's interrupt handler */
>> +int my_driver_handle_interrupt(struct my_driver_entity *my_entity, ...)
>> +{
>> +    ...
>> +    /* A struct to hold current measured data */
>> +    struct net_dim_sample dim_sample;
>> +    ...
>> +    /* Initiate data sample struct with current data */
>> +    net_dim_sample(my_entity->events,
>> +                   my_entity->packets,
>> +                   my_entity->bytes,
>> +                   &dim_sample);
>> +    /* Call net DIM */
>> +    net_dim(&my_entity->dim, dim_sample);
>> +    ...
>> +}
>> +
>> +/* My entity's initialization function (my_entity was already allocated) */
>> +int my_driver_init_my_entity(struct my_driver_entity *my_entity, ...)
>> +{
>> +    ...
>> +    /* Initiate struct work_struct with my driver's callback function */
>> +    INIT_WORK(&my_entity->dim.work, my_driver_do_dim_work);
>> +    ...
>> +}
>>
> 
> Reviewed-by: Randy Dunlap <rdun...@infradead.org>
> 
> thanks,
> 


-- 
Florian
