On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > > Can you explain what you mean by "subsystem"? I thought that the
> > > subsystem would be the irq subsystem (which means you are the one to
> > > provide
> > > the needed input :) ) and the driver would pass in something
> > > like msi_irq_ops to p
Can you explain what you mean by "subsystem"? I thought that the
subsystem would be the irq subsystem (which means you are the one to provide
the needed input :) ) and the driver would pass in something
like msi_irq_ops to pci_alloc_irq_vectors() if it supports the driver
requirements that yo
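A rough sketch of the kind of driver-supplied ops being discussed here; struct msi_irq_ops, the callback names and the veto semantics below are purely illustrative assumptions, nothing of the sort exists in mainline:

#include <linux/cpumask.h>
#include <linux/pci.h>

/* Hypothetical driver-supplied ops; all names made up for illustration. */
struct msi_irq_ops {
        /*
         * Called before the core moves a managed vector, so the driver
         * can veto the move if it cannot allocate the resources needed
         * to operate on the new CPU(s).
         */
        int (*pre_move)(struct pci_dev *pdev, unsigned int vector,
                        const struct cpumask *new_mask);
        /* Called after the move has completed. */
        void (*post_move)(struct pci_dev *pdev, unsigned int vector);
};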
On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > > > #1 Before the core tries to move the interrupt so it can veto
> > > > the
> > > > move if it cannot allocate new resources or whatever is
> > > > required
> > > > to operate after the move.
> > >
> > > What would the c
Do you know if any exist? Would it make sense to have a survey to
understand if anyone relies on it?
From what I've seen so far, drivers that were converted simply worked
with the non-managed facility and didn't have any special code for it.
Perhaps Christoph can comment as he converted most of
On Mon, 13 Nov 2017, Sagi Grimberg wrote:
> > 3) Affinity override in managed mode
> >
> > Doable, but there are a couple of things to think about:
>
> I think that it will be good to shoot for (3). Given that there are
> driver requirements I'd say that the driver will expose up front if it can
>
Hi Thomas,
What can be done with some time to work on it?
The managed mechanism consists of 3 pieces:
1) Vector spreading
2) Managed vector allocation, which becomes a guaranteed reservation in
4.15 due to the big rework of the vector management code (see the sketch
below for how a driver opts in).
Non-managed interrupts get a
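For reference, a minimal sketch of how a driver opts into vector spreading and managed allocation today via pci_alloc_irq_vectors_affinity(); the single non-managed pre_vector and the function name example_setup_irqs are assumptions for illustration:

#include <linux/interrupt.h>
#include <linux/pci.h>

static int example_setup_irqs(struct pci_dev *pdev, unsigned int nr_queues)
{
        /* One non-managed vector (e.g. an admin/async queue), not spread. */
        struct irq_affinity affd = { .pre_vectors = 1 };
        int nvec;

        /*
         * PCI_IRQ_AFFINITY spreads the remaining vectors across the CPUs
         * and marks them managed, so writes to /proc/irq/<N>/smp_affinity
         * are rejected for them.
         */
        nvec = pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
                                              PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                              &affd);
        if (nvec < 0)
                return nvec;

        return nvec - 1;        /* number of managed queue vectors */
}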
On Fri, 10 Nov 2017, Saeed Mahameed wrote:
> Well, I can speak for the mlx5 case or most of the network drivers, where
> all of the queues associated with an interrupt move with it, so I
> don't think our current driver has this issue. I don't believe there
> are network drivers with fixed per-CPU re
On Thu, 2017-11-09 at 22:42 +0100, Thomas Gleixner wrote:
> Find below a summary of the technical details, implications and
> options
>
> What can be done for 4.14?
>
> We basically have two options: Revert at the driver level or ship as is.
>
I think we all came to the consensus that t
On 11/08/2017 12:33 PM, Thomas Gleixner wrote:
> On Wed, 8 Nov 2017, Jes Sorensen wrote:
>> On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
>>> Depending on the machine and the number of queues this might even result in
>>> completely losing the ability to suspend/hibernate because the number of
>>>
Find below a summary of the technical details, implications and options
What can be done for 4.14?
We basically have two options: Revert at the driver level or ship as is.
Even if we come up with a quick and dirty hack, it will be too late
for proper testing before Sunday.
What can
On 11/09/2017 02:23 PM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>> On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
>>> On Thu, 9 Nov 2017, Jens Axboe wrote:
On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
If that's the attitude at your end, then I do suggest we just r
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 10:07 AM, Thomas Gleixner wrote:
> > I say it one last time: It can be done and I'm willing to help.
>
> It didn't sound like it earlier, but that's good news.
Well, I'm equally frustrated by this whole thing, but I certainly never
said that I
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
> > On Thu, 9 Nov 2017, Jens Axboe wrote:
> >> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> >> If that's the attitude at your end, then I do suggest we just revert the
> >> driver changes. Clearly this isn't
On 11/09/2017 10:07 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>
>> On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
Now you try to blame the people who implemented the managed affinity stuff
for the wreckage, which was created by people who changed drivers to use
>>>
On 11/09/2017 10:03 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Jens Axboe wrote:
>> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
>> If that's the attitude at your end, then I do suggest we just revert the
>> driver changes. Clearly this isn't going to be productive going forward.
>>
>> The
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
> >> Now you try to blame the people who implemented the managed affinity stuff
> >> for the wreckage, which was created by people who changed drivers to use
> >> it. Nice try.
> >
> > I'm not trying to blame any
On Thu, 9 Nov 2017, Jens Axboe wrote:
> On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> If that's the attitude at your end, then I do suggest we just revert the
> driver changes. Clearly this isn't going to be productive going forward.
>
> The better solution was to make the managed setup more fl
On 11/09/2017 09:01 AM, Sagi Grimberg wrote:
>> Now you try to blame the people who implemented the managed affinity stuff
>> for the wreckage, which was created by people who changed drivers to use
>> it. Nice try.
>
> I'm not trying to blame anyone, really. I was just trying to understand
> how
The early discussion of the managed facility came to the conclusion that it
will manage this stuff completely to allow fixed association of 'queue /
interrupt / corresponding memory' to a single CPU or a set of CPUs. That
removes a lot of 'affinity' handling magic from the driver and utilizes th
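As an illustration of that fixed association (a minimal sketch, not lifted from any particular driver; example_alloc_queue and queue_size are made-up names), per-queue memory can simply be placed on the node of the CPUs the managed vector was spread to:

#include <linux/cpumask.h>
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/topology.h>

static void *example_alloc_queue(struct pci_dev *pdev, int vector,
                                 size_t queue_size)
{
        /* Mask the managed vector was spread to by the core. */
        const struct cpumask *mask = pci_irq_get_affinity(pdev, vector);
        int node = mask ? cpu_to_node(cpumask_first(mask)) : NUMA_NO_NODE;

        /* Queue memory lives on the same node as the CPUs servicing it. */
        return kzalloc_node(queue_size, GFP_KERNEL, node);
}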
Again, I think Jes or others can provide more information.
Sagi, I believe Jes is not trying to argue about what initial affinity
values you give to the driver. We have a very critical regression that
is afflicting live systems today and common tools that already exist
in various distros, suc
On 11/09/2017 07:19 AM, Thomas Gleixner wrote:
> On Thu, 9 Nov 2017, Sagi Grimberg wrote:
>> Thomas,
>>
Because the user sometimes knows better based on statically assigned
loads, or the user wants consistency across kernels. It's great that the
system is better at allocating this no
On 11/09/2017 03:50 AM, Sagi Grimberg wrote:
> Thomas,
>
>>> Because the user sometimes knows better based on statically assigned
>>> loads, or the user wants consistency across kernels. It's great that the
>>> system is better at allocating this now, but we also need to allow for a
>>> user to ch
On 11/09/2017 03:09 AM, Christoph Hellwig wrote:
> On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote:
>> There are numerous valid reasons to be able to set the affinity, for
>> both nics and block drivers. It's great that the kernel has a predefined
>> layout that works well, but users do
On Wed, 2017-11-08 at 09:27 +0200, Sagi Grimberg wrote:
> > Depending on the machine and the number of queues this might even result in
> > completely losing the ability to suspend/hibernate because the number of
> > available vectors on CPU0 is not sufficient to accommodate all queue
> > in
On Thu, 9 Nov 2017, Sagi Grimberg wrote:
> Thomas,
>
> > > Because the user sometimes knows better based on statically assigned
> > > loads, or the user wants consistency across kernels. It's great that the
> > > system is better at allocating this now, but we also need to allow for a
> > > user t
Thomas,
Because the user sometimes knows better based on statically assigned
loads, or the user wants consistency across kernels. It's great that the
system is better at allocating this now, but we also need to allow for a
user to change it. Like anything on Linux, a user wanting to blow off
his
On Wed, Nov 08, 2017 at 09:13:59AM -0700, Jens Axboe wrote:
> There are numerous valid reasons to be able to set the affinity, for
> both nics and block drivers. It's great that the kernel has a predefined
> layout that works well, but users do need the flexibility to be able to
> reconfigure affin
On Wed, 8 Nov 2017, Jes Sorensen wrote:
> On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
> > On Sun, 5 Nov 2017, Sagi Grimberg wrote:
> >> I do agree that the user would lose better cpu online/offline behavior,
> >> but it seems that users want to still have some control over the IRQ
> >> affinity
On 11/07/2017 10:07 AM, Thomas Gleixner wrote:
> On Sun, 5 Nov 2017, Sagi Grimberg wrote:
>> I do agree that the user would lose better cpu online/offline behavior,
>> but it seems that users want to still have some control over the IRQ
>> affinity assignments even if they lose this functionality.
On 11/08/2017 05:21 AM, David Laight wrote:
> From: Sagi Grimberg
>> Sent: 08 November 2017 07:28
> ...
>>> Why would you give the user a knob to destroy what you carefully optimized?
>>
>> Well, looks like someone relies on this knob, the question is if he is
>> doing something better for his work
From: Sagi Grimberg
> Sent: 08 November 2017 07:28
...
> > Why would you give the user a knob to destroy what you carefully optimized?
>
> Well, looks like someone relies on this knob, the question is if he is
> doing something better for his workload. I don't know, it's really up to
> the user to
Depending on the machine and the number of queues this might even result in
completely losing the ability to suspend/hibernate because the number of
available vectors on CPU0 is not sufficient to accommodate all queue
interrupts.
Would it be possible to keep the managed facility until a user ov
On Sun, 5 Nov 2017, Sagi Grimberg wrote:
> > > > This wasn't to start a debate about which allocation method is the
> > > > perfect solution. I am perfectly happy with the new default, the part
> > > > that is broken is to take away the user's option to reassign the
> > > > affinity. That is a bug
This wasn't to start a debate about which allocation method is the
perfect solution. I am perfectly happy with the new default, the part
that is broken is to take away the user's option to reassign the
affinity. That is a bug and it needs to be fixed!
Well,
I would really want to wait for Tho
On Thu, 2 Nov 2017, Sagi Grimberg wrote:
>
> > This wasn't to start a debate about which allocation method is the
> > perfect solution. I am perfectly happy with the new default, the part
> > that is broken is to take away the user's option to reassign the
> > affinity. That is a bug and it needs
On 11/02/2017 12:14 PM, Sagi Grimberg wrote:
>
>> This wasn't to start a debate about which allocation method is the
>> perfect solution. I am perfectly happy with the new default, the part
>> that is broken is to take away the user's option to reassign the
>> affinity. That is a bug and it needs
On 11/02/2017 06:08 AM, Sagi Grimberg wrote:
>
>>> I vaguely remember Nacking Sagi's patch as we knew it would break
>>> mlx5e netdev affinity assumptions.
>> I remember that argument. Still the series found its way in.
>
> Of course it made its way in, it was acked by three different
> maintai
> > This means that if your NIC is on NUMA #1, and you reduce the number of
> > channels, you might end up working only with the cores on the far NUMA.
> > Not good!
> We deliberated on this before, and concluded that application affinity
> and device affinity are equally important. If you have a rea
I vaguely remember Nacking Sagi's patch as we knew it would break
mlx5e netdev affinity assumptions.
I remember that argument. Still the series found its way in.
Of course it made its way in, it was acked by three different
maintainers, and I addressed all of Saeed's comments.
That series m
On 02/11/2017 1:02 AM, Jes Sorensen wrote:
On 11/01/2017 06:41 PM, Saeed Mahameed wrote:
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
I am all in favor of making the automatic setup better, but assuming an
automatic setup is always right s
Jes,
I am all in favor of making the automatic setup better, but assuming an
automatic setup is always right seems problematic. There could be
workloads where you may want to assign affinity explicitly.
Adding Thomas to the thread.
My understanding is that the thought is to prevent user-space fr
On 11/01/2017 06:41 PM, Saeed Mahameed wrote:
> On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
>> On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>> I am all in favor of making the automatic setup better, but assuming an
>> automatic setup is always right seems problematic. There could be
>> wo
On Wed, Nov 1, 2017 at 11:20 AM, Jes Sorensen wrote:
> On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>>> Hi,
>>
>> Hi Jes,
>>
>>> The below patch seems to have broken PCI IRQ affinity assignments for
>>> mlx5.
>>
>> I wouldn't call it breaking IRQ affinity assignments. It just makes
>> them automat
On 11/01/2017 01:21 PM, Sagi Grimberg wrote:
>> Hi,
>
> Hi Jes,
>
>> The below patch seems to have broken PCI IRQ affinity assignments for
>> mlx5.
>
> I wouldn't call it breaking IRQ affinity assignments. It just makes
> them automatic.
Well, it explicitly breaks the option for an admin to assi
Hi,
Hi Jes,
The below patch seems to have broken PCI IRQ affinity assignments for mlx5.
I wouldn't call it breaking IRQ affinity assignments. It just makes
them automatic.
Prior to this patch I could echo a value to /proc/irq/<N>/smp_affinity
and it would get assigned. With this patch applied
Hi,
The below patch seems to have broken PCI IRQ affinity assignments for mlx5.
Prior to this patch I could echo a value to /proc/irq/<N>/smp_affinity
and it would get assigned. With this patch applied I get -EIO
The actual affinity assignments seem to have changed too, but I assume
this is a resul
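For completeness, a small userspace sketch of the user-visible behaviour being reported (the IRQ number 123 and the mask "f" are placeholders): a write to a managed vector's smp_affinity file is rejected by the kernel, which the reporter sees as EIO.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int fd = open("/proc/irq/123/smp_affinity", O_WRONLY);

        if (fd < 0)
                return 1;
        /* On a managed vector the kernel refuses the new mask. */
        if (write(fd, "f", 1) < 0)
                printf("write failed: %s\n", strerror(errno)); /* EIO here */
        close(fd);
        return 0;
}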