On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp> wrote:
> Andrew Morton wrote:
> > Poor ttm guys - this is a bit of a trap we set for them.
>
> Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid
> deadlock.") changed it to use sc->gfp_mask rather than GFP_KERNEL.
>
> - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *),
> - GFP_KERNEL);
> + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
>
> But this bug is caused by sc->gfp_mask containing some flags which are not
> in GFP_KERNEL, right? Then, I think
>
> - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp);
> + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL);
>
> would hide this bug.
>
> But I think we should use GFP_ATOMIC (or drop the __GFP_WAIT flag)

Well no - ttm_page_pool_free() should stop calling kmalloc altogether.
Just do

	struct page *pages_to_free[16];

and rework the code to free 16 pages at a time. Easy.

Apart from all the other things we're discussing here, it should do
this because kmalloc() isn't very reliable within a shrinker.

> for two reasons when __alloc_pages_nodemask() is called from shrinker functions.
>
> (1) Stack usage by __alloc_pages_nodemask() is large. If we allow unlimited
> recursive __alloc_pages_nodemask() calls, the kernel stack could overflow
> under extreme memory pressure.
>
> (2) Some shrinker functions use sleepable locks, which could make kswapd
> sleep for an unpredictable duration. If kswapd is unexpectedly blocked inside
> shrinker functions while somebody is expecting kswapd to be reclaiming
> memory, that is a memory allocation deadlock.
>
> Speaking of the ttm module, commit 22e71691fd54c637 ("drm/ttm: Use
> mutex_trylock() to avoid deadlock inside shrinker functions.") prevents
> unlimited recursive __alloc_pages_nodemask() calls.

Yes, there are such problems.

Shrinkers do all sorts of surprising things - some of the filesystem
ones do disk writes! And these involve all sorts of locking and memory
allocations. But they won't be directly using scan_control.gfp_mask.
They may be using open-coded __GFP_NOFS for the allocations. The
complicated ones pass the IO over to kernel threads and wait for them
to complete, which addresses the stack consumption concerns (at least).