On Fri 07-07-17 11:39:18, Sergey Senozhatsky wrote:
[...]
> > void drm_modeset_lock_all(struct drm_device *dev)
> > {
> >         struct drm_mode_config *config = &dev->mode_config;
> >         struct drm_modeset_acquire_ctx *ctx;
> >         int ret;
> > 
> >         ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> >         if (WARN_ON(!ctx))
> >                 return;
> 
> hm, this allocation, per se, looks ok to me. can't really blame it.
> what you had is a combination of factors
> 
>       CPU0                    CPU1                            CPU2
>
>                                                               console_callback()
>                                                                console_lock()
>                                                                ^^^^^^^^^^^^^
>       vprintk_emit()          mutex_lock(&par->bo_mutex)
>                                kzalloc(GFP_KERNEL)
>        console_trylock()        kmem_cache_alloc()             mutex_lock(&par->bo_mutex)
>        ^^^^^^^^^^^^^^^^          io_schedule_timeout
> 
> // but I haven't seen the logs that you have provided, yet.
> 
> [..]
> > As a result, console was not able to print SysRq-t output.
> > 
> > So, how should we avoid this problem?
> 
> off the top of my head -- console_sem must be replaced with something
> better.

Yeah, absolutely. The current mess basically allows arbitrary lock
dependencies which are not deadlocks, because the printk part is careful,
but essentially we are deadlocked wrt. functionality.
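A minimal userspace sketch of the chain quoted above, assuming plain pthread mutexes as stand-ins for console_sem and par->bo_mutex. The stage helpers, thread functions, try_flush_console() and run_scenario() are hypothetical illustrations, not kernel APIs, and the wait on CPU1 only stands in for io_schedule_timeout() in direct reclaim. The printk-like path only trylocks, so nothing deadlocks, yet its output cannot be flushed until the allocation on CPU1 makes progress:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for console_sem and par->bo_mutex (not kernel code). */
static pthread_mutex_t console_sem = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bo_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Tiny stage machine so the interleaving is deterministic. */
static pthread_mutex_t stage_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t stage_cond = PTHREAD_COND_INITIALIZER;
static int stage;

static void set_stage(int s)
{
	pthread_mutex_lock(&stage_lock);
	stage = s;
	pthread_cond_broadcast(&stage_cond);
	pthread_mutex_unlock(&stage_lock);
}

static void wait_stage(int s)
{
	pthread_mutex_lock(&stage_lock);
	while (stage < s)
		pthread_cond_wait(&stage_cond, &stage_lock);
	pthread_mutex_unlock(&stage_lock);
}

/* CPU1: holds bo_mutex while "stuck" in direct reclaim. */
static void *reclaim_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&bo_mutex);
	set_stage(1);
	wait_stage(3);			/* stands in for io_schedule_timeout() */
	pthread_mutex_unlock(&bo_mutex);
	return NULL;
}

/* CPU2: console_callback() takes console_sem, then waits for bo_mutex. */
static void *console_thread(void *arg)
{
	(void)arg;
	wait_stage(1);
	pthread_mutex_lock(&console_sem);
	set_stage(2);
	pthread_mutex_lock(&bo_mutex);	/* blocks behind CPU1 */
	pthread_mutex_unlock(&bo_mutex);
	pthread_mutex_unlock(&console_sem);
	return NULL;
}

/*
 * CPU0: printk-like path. It only trylocks console_sem, so it never
 * deadlocks, but it cannot flush while the chain above is stalled.
 * Returns true if the message would have been printed.
 */
static bool try_flush_console(void)
{
	int rc = pthread_mutex_trylock(&console_sem);

	if (rc != 0) {
		assert(rc == EBUSY);
		return false;		/* message stays in the buffer */
	}
	/* ... the log buffer would be printed here ... */
	pthread_mutex_unlock(&console_sem);
	return true;
}

/* Reports whether printk could flush during and after the reclaim stall. */
static void run_scenario(bool *during, bool *after)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, reclaim_thread, NULL);
	pthread_create(&t2, NULL, console_thread, NULL);

	wait_stage(2);			/* console_sem held, bo_mutex contended */
	*during = try_flush_console();

	set_stage(3);			/* let the "reclaim" finish */
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	*after = try_flush_console();
}
```

With this sequencing run_scenario() reports that the flush fails while CPU1 is stalled and succeeds once it is done: no lock cycle, but no console output either for as long as the allocation takes.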

> but that's a task for years.
> 
> hm...
> 
> > But should fbcon, drm, tty and so on stop using __GFP_DIRECT_RECLAIM
> > memory allocations because consoles should be as responsive as printk()?
> 
> maybe, maybe not. like I said, the allocation in question does not
> participate in console output. it's rather hard to imagine how we would
> enforce a !__GFP_DIRECT_RECLAIM requirement here. it's the console semaphore
> that is to blame, I think.

Agreed! Looking at the problem just from the page allocator perspective
is simply wrong. That is where you see your immediate problem because
that is what you are testing. I would bet my hat you can find other
interesting scenarios if you try hard enough...

-- 
Michal Hocko
SUSE Labs
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
