On Thu, 25 Feb 2016 15:14:32 -0800 Brian Norris <[email protected]> wrote:
> On Thu, Feb 25, 2016 at 11:54:25PM +0100, Richard Weinberger wrote: > > On Tue, Oct 6, 2015 at 3:52 PM, Harvey Hunt <[email protected]> wrote: > > > From: Alex Smith <[email protected]> > > [...] > > > > --- a/drivers/mtd/nand/nand_base.c > > > +++ b/drivers/mtd/nand/nand_base.c > > > @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info > > > *mtd, unsigned long timeo) > > > } > > > } > > > > > > -/* Wait for the ready pin, after a command. The timeout is caught later. > > > */ > > > +/** > > > + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands. > > > + * @mtd: MTD device structure > > > + * > > > + * Wait for the ready pin after a command, and warn if a timeout occurs. > > > + */ > > > void nand_wait_ready(struct mtd_info *mtd) > > > { > > > struct nand_chip *chip = mtd->priv; > > > - unsigned long timeo = jiffies + msecs_to_jiffies(20); > > > + unsigned long timeo = 400; > > > > > > - /* 400ms timeout */ > > > if (in_interrupt() || oops_in_progress) > > > - return panic_nand_wait_ready(mtd, 400); > > > + return panic_nand_wait_ready(mtd, timeo); > > > > > > led_trigger_event(nand_led_trigger, LED_FULL); > > > /* Wait until command is processed or timeout occurs */ > > > + timeo = jiffies + msecs_to_jiffies(timeo); > > > do { > > > if (chip->dev_ready(mtd)) > > > - break; > > > - touch_softlockup_watchdog(); > > > + goto out; > > > + cond_resched(); > > > } while (time_before(jiffies, timeo)); > > > + > > > + pr_warn_ratelimited( > > > + "timeout while waiting for chip to become ready\n"); > > > +out: > > > > Sorry for exhuming an already merged patch but Boris and I ran into > > spurious chip timeouts > > and hunted the issue down to this change. > > If the system is under heavy load the cond_resched() will swap in > > other threads and the > > time_before() calculation will trigger and a wrong chip timeout is reported. > > > > It is also not clear to us why the cond_resched() is needed at all. > > Can you please elaborate? > > I can't speak for the "why" precisely. It seemed reasonable to avoid a > (potentially) 400 ms busy loop though, in the presence of other > potential work. > > Regardless, this timeout loop is wrong. Shouldn't it have something like > the following? > > diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c > index f2c8ff398d6c..596a9b0503da 100644 > --- a/drivers/mtd/nand/nand_base.c > +++ b/drivers/mtd/nand/nand_base.c > @@ -566,8 +566,8 @@ void nand_wait_ready(struct mtd_info *mtd) > cond_resched(); > } while (time_before(jiffies, timeo)); > > - pr_warn_ratelimited( > - "timeout while waiting for chip to become ready\n"); > + if (!chip->dev_ready(mtd)) > + pr_warn_ratelimited("timeout while waiting for chip to become > ready\n"); > out: > led_trigger_event(nand_led_trigger, LED_OFF); > } Looks good to me. If you post the patch, you can add Reviewed-by: Boris Brezillon <[email protected]> -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com

