On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > > Can you explain what you mean by "subsystem"? I thought that the
> > > subsystem would be the irq subsystem (which means you are the one to
> > > provide
> > > the needed input :) ) and the driver would pass in something
> > > like msi_irq_ops to p
Can you explain what you mean by "subsystem"? I thought that the
subsystem would be the irq subsystem (which means you are the one to provide
the needed input :) ) and the driver would pass in something
like msi_irq_ops to pci_alloc_irq_vectors() if it supports the driver
requirements that yo
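A rough sketch of the kind of driver-supplied ops being discussed here; struct msi_irq_ops, the callback names and the veto semantics below are purely illustrative assumptions, nothing of the sort exists in mainline:

#include <linux/cpumask.h>
#include <linux/pci.h>

/* Hypothetical driver-supplied ops; all names made up for illustration. */
struct msi_irq_ops {
        /*
         * Called before the core moves a managed vector, so the driver
         * can veto the move if it cannot allocate the resources needed
         * to operate on the new CPU(s).
         */
        int (*pre_move)(struct pci_dev *pdev, unsigned int vector,
                        const struct cpumask *new_mask);
        /* Called after the move has completed. */
        void (*post_move)(struct pci_dev *pdev, unsigned int vector);
};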
On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > > > #1 Before the core tries to move the interrupt so it can veto
> > > > the
> > > > move if it cannot allocate new resources or whatever is
> > > > required
> > > > to operate after the move.
> > >
> > > What would the c
Do you know if any exist? Would it make sense to have a survey to
understand if anyone relies on it?
From what I've seen so far, drivers that were converted simply worked
with the non-managed facility and didn't have any special code for it.
Perhaps Christoph can comment as he converted most of
On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > 3) Affinity override in managed mode
> >
> > Doable, but there are a couple of things to think about:
>
> I think that it will be good to shoot for (3). Given that there are
> driver requirements I'd say that the driver will expose up front if it can
>
Hi Thomas,
What can be done with some time to work on it?
The managed mechanism consists of 3 pieces:
1) Vector spreading
2) Managed vector allocation, which becomes a guaranteed reservation in
4.15 due to the big rework of the vector management code (see the sketch
below for how a driver opts in).
Non-managed interrupts get a
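For reference, a minimal sketch of how a driver opts into vector spreading and managed allocation today via pci_alloc_irq_vectors_affinity(); the single non-managed pre_vector and the function name example_setup_irqs are assumptions for illustration:

#include <linux/interrupt.h>
#include <linux/pci.h>

static int example_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
        /* One non-managed vector (e.g. an admin/async queue), not spread. */
        struct irq_affinity affd = { .pre_vectors = 1 };
        int nvec;

        /*
         * PCI_IRQ_AFFINITY spreads the remaining vectors across the CPUs
         * and marks them managed, so writes to /proc/irq/<N>/smp_affinity
         * are rejected for them.
         */
        nvec = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
                                              PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                              &affd);
        if (nvec < 0)
                return nvec;

        return nvec - 1;        /* number of managed queue vectors */
}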
On Fri, 10 Nov 2017, Saeed Mahameed wrote:
> Well, I can speak for the mlx5 case or most of the network drivers, where
> all of the queues associated with an interrupt move with it, so I
> don't think our current driver has this issue. I don't believe there
> are network drivers with fixed per-CPU re
On Thu, 2017-11-09 at 22:42 +0100, Thomas Gleixner wrote:
> Find below a summary of the technical details, implications and
> options
>
> What can be done for 4.14?
>
> We basically have two options: Revert at the driver level or ship as is.
>
I think we all came to the consensus that t
On 11/08/2017 12:33 PM, Thomas Gleixner wrote:
> On Wed, 8 Nov 2017, Jes Sorensen wrote:
>> On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
>>> Depending on the machine and the number of queues this might even result in
>>> completely losing the ability to suspend/hibernate because the number of
>>>
Find below a summary of the technical details, implications and options
What can be done for 4.14?
We basically have two options: Revert at the driver level or ship as is.
Even if we come up with a quick and dirty hack, it will be too late
for proper testing before Sunday.
What can
On 11/09/2017 02:23 PM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>> On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
>>> On Thu, 9 Nov 2017, Jens Axboe wrote:
On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
If that's the attitude at your end, then I do suggest we just r
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 10:07 AM, Thomas Gleixner wrote:
> > I say it one last time: It can be done and I'm willing to help.
>
> It didn't sound like it earlier, but that's good news.
Well, I'm equally frustrated by this whole thing, but I certainly never
said that I
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
> > On Thu, 9 Nov 2017, Jens Axboe wrote:
> >> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> >> If that's the attitude at your end, then I do suggest we just revert the
> >> driver changes. Clearly this isn't
On 11/09/2017 10:07 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>
>> On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
Now you try to blame the people who implemented the managed affinity stuff
for the wreckage, which was created by people who changed drivers to use
>>>
On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
>> If that's the attitude at your end, then I do suggest we just revert the
>> driver changes. Clearly this isn't going to be productive going forward.
>>
>> The
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
> >> Now you try to blame the people who implemented the managed affinity stuff
> >> for the wreckage, which was created by people who changed drivers to use
> >> it. Nice try.
> >
> > I'm not trying to blame any
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> If that's the attitude at your end, then I do suggest we just revert the
> driver changes. Clearly this isn't going to be productive going forward.
>
> The better solution was to make the managed setup more fl
On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
>> Now you try to blame the people who implemented the managed affinity stuff
>> for the wreckage, which was created by people who changed drivers to use
>> it. Nice try.
>
> I'm not trying to blame anyone, really. I was just trying to understand
> how
The early discussion of the managed facility came to the conclusion that it
will manage this stuff completely to allow fixed association of 'queue /
interrupt / corresponding memory' to a single CPU or a set of CPUs. That
removes a lot of 'affinity' handling magic from the driver and utilizes th
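As an illustration of that fixed association (a minimal sketch, not lifted from any particular driver; example_alloc_queue and queue_size are made-up names), per-queue memory can simply be placed on the node of the CPUs the managed vector was spread to:

#include <linux/cpumask.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/topology.h>

static void *example_alloc_queue(struct pci_dev *pdev, int vector,
                                 size_t queue_size)
{
        /* Mask the managed vector was spread to by the core. */
        const struct cpumask *mask = pci_irq_get_affinity(pdev, vector);
        int node = mask ? cpu_to_node(cpumask_first(mask)) : NUMA_NO_NODE;

        /* Queue memory lives on the same node as the CPUs servicing it. */
        return kzalloc_node(queue_size, GFP_KERNEL, node);
}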
Again, I think Jes or others can provide more information.
Sagi, I believe Jes is not trying to argue about what initial affinity
values you give to the driver. We have a very critical regression that
is afflicting live systems today and common tools that already exist
in various distros, suc
On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Sagi Grimberg wrote:
>> Thomas,
>>
Because the user sometimes knows better based on statically assigned
loads, or the user wants consistency across kernels. It's great that the
system is better at allocating this no
On 11/09/2017 03:50 AM, Sagi Grimberg wrote:
> Thomas,
>
>>> Because the user sometimes knows better based on statically assigned
>>> loads, or the user wants consistency across kernels. It's great that the
>>> system is better at allocating this now, but we also need to allow for a
>>> user to ch
On 11/09/2017 03:09 AM, Christoph Hellwig wrote:
> On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote:
>> There are numerous valid reasons to be able to set the affinity, for
>> both nics and block drivers. It's great that the kernel has a predefined
>> layout that works well, but users do
On Wed, 2017-11-08 at 09:27 +0200, Sagi Grimberg wrote:
> > Depending on the machine and the number of queues this might even result in
> > completely losing the ability to suspend/hibernate because the number of
> > available vectors on CPU0 is not sufficient to accommodate all queue
> > in
On Thu, 9 Nov 2017, Sagi Grimberg wrote:
> Thomas,
>
> > > Because the user sometimes knows better based on statically assigned
> > > loads, or the user wants consistency across kernels. It's great that the
> > > system is better at allocating this now, but we also need to allow for a
> > > user t
Thomas,
Because the user sometimes knows better based on statically assigned
loads, or the user wants consistency across kernels. It's great that the
system is better at allocating this now, but we also need to allow for a
user to change it. Like anything on Linux, a user wanting to blow off
his
On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote:
> There are numerous valid reasons to be able to set the affinity, for
> both nics and block drivers. It's great that the kernel has a predefined
> layout that works well, but users do need the flexibility to be able to
> reconfigure affin
On Wed, 8 Nov 2017, Jes Sorensen wrote:
> On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
> > On Sun, 5 Nov 2017, Sagi Grimberg wrote:
> >> I do agree that the user would lose better cpu online/offline behavior,
> >> but it seems that users want to still have some control over the IRQ
> >> affinity
On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
> On Sun, 5 Nov 2017, Sagi Grimberg wrote:
>> I do agree that the user would lose better cpu online/offline behavior,
>> but it seems that users want to still have some control over the IRQ
>> affinity assignments even if they lose this functionality.
On 11/08/2017 05:21 AM, David Laight wrote:
> From: Sagi Grimberg
>> Sent: 08 November 2017 07:28
> ...
>>> Why would you give the user a knob to destroy what you carefully optimized?
>>
>> Well, looks like someone relies on this knob, the question is if he is
>> doing something better for his work
From: Sagi Grimberg
> Sent: 08 November 2017 07:28
...
> > Why would you give the user a knob to destroy what you carefully optimized?
>
> Well, looks like someone relies on this knob, the question is if he is
> doing something better for his workload. I don't know, it's really up to
> the user to
Depending on the machine and the number of queues this might even result in
completely losing the ability to suspend/hibernate because the number of
available vectors on CPU0 is not sufficient to accommodate all queue
interrupts.
Would it be possible to keep the managed facility until a user ov
On Sun, 5 Nov 2017, Sagi Grimberg wrote:
> > > > This wasn't to start a debate about which allocation method is the
> > > > perfect solution. I am perfectly happy with the new default, the part
> > > > that is broken is to take away the user's option to reassign the
> > > > affinity. That is a bug
This wasn't to start a debate about which allocation method is the
perfect solution. I am perfectly happy with the new default, the part
that is broken is to take away the user's option to reassign the
affinity. That is a bug and it needs to be fixed!
Well,
I would really want to wait for Tho
On Thu, 2 Nov 2017, Sagi Grimberg wrote:
>
> > This wasn't to start a debate about which allocation method is the
> > perfect solution. I am perfectly happy with the new default, the part
> > that is broken is to take away the user's option to reassign the
> > affinity. That is a bug and it needs
On 11/02/2017 12:14 PM, Sagi Grimberg wrote:
>
>> This wasn't to start a debate about which allocation method is the
>> perfect solution. I am perfectly happy with the new default, the part
>> that is broken is to take away the user's option to reassign the
>> affinity. That is a bug and it needs
On 11/02/2017 06:08 AM, Sagi Grimberg wrote:
>
>>> I vaguely remember Nacking Sagi's patch as we knew it would break
>>> mlx5e netdev affinity assumptions.
>> I remember that argument. Still the series found its way in.
>
> Of course it made its way in, it was acked by three different
> maintai
> > This means that if your NIC is on NUMA #1, and you reduce the number of
> > channels, you might end up working only with the cores on the far NUMA.
> > Not good!
> We deliberated on this before, and concluded that application affinity
> and device affinity are equally important. If you have a rea
I vaguely remember Nacking Sagi's patch as we knew it would break
mlx5e netdev affinity assumptions.
I remember that argument. Still the series found its way in.
Of course it made its way in, it was acked by three different
maintainers, and I addressed all of Saeed's comments.
That series m
On 02/11/2017 1:02 AM, Jes Sorensen wrote:
On 11/01/2017 06:41 PM, Saeed Mahameed wrote:
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
I am all in favor of making the automatic setup better, but assuming an
automatic setup is always right s
Jes,
I am all in favor of making the automatic setup better, but assuming an
automatic setup is always right seems problematic. There could be
workloads where you may want to assign affinity explicitly.
Adding Thomas to the thread.
My understanding is that the thought is to prevent user-space fr
On 11/01/2017 06:41 PM, Saeed Mahameed wrote:
> On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
>> On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>> I am all in favor of making the automatic setup better, but assuming an
>> automatic setup is always right seems problematic. There could be
>> wo
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
> On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>>> Hi,
>>
>> Hi Jes,
>>
>>> The below patch seems to have broken PCI IRQ affinity assignments for
>>> mlx5.
>>
>> I wouldn't call it breaking IRQ affinity assignments. It just makes
>> them automat
On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>> Hi,
>
> Hi Jes,
>
>> The below patch seems to have broken PCI IRQ affinity assignments for
>> mlx5.
>
> I wouldn't call it breaking IRQ affinity assignments. It just makes
> them automatic.
Well, it explicitly breaks the option for an admin to assi
Hi,
Hi Jes,
The below patch seems to have broken PCI IRQ affinity assignments for mlx5.
I wouldn't call it breaking IRQ affinity assignments. It just makes
them automatic.
Prior to this patch I could echo a value to /proc/irq/<N>/smp_affinity
and it would get assigned. With this patch applied
Hi,
The below patch seems to have broken PCI IRQ affinity assignments for mlx5.
Prior to this patch I could echo a value to /proc/irq/<N>/smp_affinity
and it would get assigned. With this patch applied I get -EIO
The actual affinity assignments seem to have changed too, but I assume
this is a resul
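For completeness, a small userspace sketch of the user-visible behaviour being reported (the IRQ number 123 and the mask "f" are placeholders): a write to a managed vector's smp_affinity file is rejected by the kernel, which the reporter sees as EIO.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/proc/irq/123/smp_affinity", O_WRONLY);

        if (fd < 0)
                return 1;
        /* On a managed vector the kernel refuses the new mask. */
        if (write(fd, "f", 1) < 0)
                printf("write failed: %s\n", strerror(errno)); /* EIO here */
        close(fd);
        return 0;
}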