On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > 3) Affinity override in managed mode
> >
> > Doable, but there are a couple of things to think about:
>
> I think that it will be good to shoot for (3). Given that there are
> driver requirements I'd say that the driver will expose up front if it
> can handle it, and if not we fall back to (1).
>
> > * How is this enabled?
> >
> > - Opt-in by driver
> >
> > - Extra sysfs/procfs knob
> >
> > We definitely should not enable it per default because that would
> > surprise users/drivers which work with the current managed devices and
> > rely on the affinity files to be non writeable in managed mode.
>
> Do you know if any exist? Would it make sense to have a survey to
> understand if anyone relies on it?
>
> From what I've seen so far, drivers that were converted simply worked
> with the non-managed facility and didn't have any special code for it.
> Perhaps Christoph can comment as he converted most of them.
>
> But if there aren't any drivers that absolutely rely on it, maybe it's
> not a bad idea to allow it by default?

Sure, I was just cautious and I have to admit that I have no insight into
the driver side details.

> > * When and how is the driver informed about the change?
> >
> > When:
> >
> > #1 Before the core tries to move the interrupt so it can veto the
> > move if it cannot allocate new resources or whatever is required
> > to operate after the move.
>
> What would the core do if a driver vetoes a move?

Return the error code from write_affinity() as it does with any other error
which fails to set the affinity.
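
A rough, untested sketch of that flow in plain userspace C. All names here
(struct irq_sketch, pre_move(), sketch_write_affinity(), sketch_veto_cpu3())
are made up for illustration and are not existing kernel interfaces. The only
point is that the driver is asked first and its error code is handed back
unchanged:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt descriptor, not struct irq_desc. */
struct irq_sketch {
	int cpu;	/* CPU the interrupt currently targets */
	/*
	 * Optional driver callback, invoked before the move.
	 * Returns 0 to accept or a negative errno to veto.
	 */
	int (*pre_move)(struct irq_sketch *irq, int new_cpu);
};

/* Models the affinity write path: ask the driver, move only on success. */
static int sketch_write_affinity(struct irq_sketch *irq, int new_cpu)
{
	int ret;

	assert(irq != NULL);

	if (new_cpu < 0)
		return -EINVAL;

	if (irq->pre_move) {
		ret = irq->pre_move(irq, new_cpu);
		if (ret)
			return ret;	/* veto: the affinity stays untouched */
	}

	irq->cpu = new_cpu;
	return 0;
}

/* Example driver callback: pretend resources for CPU 3 are unavailable. */
static int sketch_veto_cpu3(struct irq_sketch *irq, int new_cpu)
{
	(void)irq;
	return new_cpu == 3 ? -ENOMEM : 0;
}
```

With sketch_veto_cpu3() installed, a write requesting CPU 3 fails with
-ENOMEM and leaves the interrupt where it was, while a write requesting any
other valid CPU succeeds.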

> I'm wondering under what conditions a driver will be unable to allocate
> resources for a move to CPU X but able to allocate for a move to CPU Y.

Node affine memory allocation is the only thing which comes to my mind, or
some decision not to have a gazillion queues on a single CPU.
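
For the second case, a driver side check could look roughly like the
following. Again an untested userspace sketch: the CPU count, the per CPU
limit and every identifier are assumptions picked for illustration, not
anything an existing driver does.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NR_CPUS			4	/* assumed, illustration only */
#define SKETCH_MAX_QUEUES_PER_CPU	2	/* assumed driver policy */

/* Number of queues whose interrupt currently targets each CPU. */
static int sketch_queues_on_cpu[SKETCH_NR_CPUS];

/*
 * Hypothetical pre-move check: veto the move when the target CPU already
 * serves the maximum number of queues. On accept the accounting is updated
 * and 0 is returned. The caller guarantees that old_cpu is accounted for.
 */
static int sketch_queue_pre_move(int old_cpu, int new_cpu)
{
	assert(old_cpu >= 0 && old_cpu < SKETCH_NR_CPUS);

	if (new_cpu < 0 || new_cpu >= SKETCH_NR_CPUS)
		return -EINVAL;
	if (new_cpu == old_cpu)
		return 0;
	if (sketch_queues_on_cpu[new_cpu] >= SKETCH_MAX_QUEUES_PER_CPU)
		return -ENOSPC;

	sketch_queues_on_cpu[old_cpu]--;
	sketch_queues_on_cpu[new_cpu]++;
	return 0;
}
```

So a move to a CPU which is already full gets vetoed with -ENOSPC, while the
same queue can still be moved to a less loaded CPU.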

> This looks like it can work to me, but I'm probably not familiar enough
> to see the full picture here.

On the interrupt core side this is workable. I just need input from the
driver^Wsubsystem side on whether this can be implemented sanely.

Thanks,

	tglx