Hello.
On úterý 19. září 2023 17:43:40 CEST Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 10:26:42AM +0200, Oleksandr Natalenko wrote:
> > Andrzej asked me to try to revert commits 0b62af28f249, e0b72c14d8dc and
> > 1e0877d58b1e, and reverting those fixed the i915 crash for me. The
> > e0b72c14d8dc and 1e0877d58b1e commits look like just prerequisites, so I
> > assume 0b62af28f249 ("i915: convert shmem_sg_free_table() to use a
> > folio_batch") is the culprit here.
> >
> > Could you please check this?
> >
> > Our conversation with Andrzej is available at drm-intel GitLab [1].
> >
> > Thanks.
> >
> > [1] https://gitlab.freedesktop.org/drm/intel/-/issues/9256
>
> Wow, that is some great debugging. Thanks for all the time & effort
> you and others have invested. Sorry for breaking your system.
>
> You're almost right about the "prerequisites", but it's in the other
> direction; 0b62af28f249 is a prerequisite for the later two cleanups,
> so reverting all three is necessary to test 0b62af28f249.
>
> It seems to me that you've isolated the problem to constructing overly
> long sg lists. I didn't realise that was going to be a problem, so
> that's my fault.
>
> Could I ask you to try this patch? I'll follow up with another patch
> later because I think I made another assumption that may not be valid.
I can confirm this one fixes the issue for me on T460s laptop. Thank you!
Should you submit it, please add:
Fixes: 0b62af28f2 ("i915: convert shmem_sg_free_table() to use a folio_batch")
Cc: [email protected] # 6.5.x
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/9256
Link: https://lore.kernel.org/lkml/[email protected]/
Reported-by: Oleksandr Natalenko <[email protected]>
Tested-by: Oleksandr Natalenko <[email protected]>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> index 8f1633c3fb93..73a4a4eb29e0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> @@ -100,6 +100,7 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915,
> struct sg_table *st,
> st->nents = 0;
> for (i = 0; i < page_count; i++) {
> struct folio *folio;
> + unsigned long nr_pages;
> const unsigned int shrink[] = {
> I915_SHRINK_BOUND | I915_SHRINK_UNBOUND,
> 0,
> @@ -150,6 +151,8 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915,
> struct sg_table *st,
> }
> } while (1);
>
> + nr_pages = min_t(unsigned long,
> + folio_nr_pages(folio), page_count - i);
> if (!i ||
> sg->length >= max_segment ||
> folio_pfn(folio) != next_pfn) {
> @@ -157,13 +160,13 @@ int shmem_sg_alloc_table(struct drm_i915_private *i915,
> struct sg_table *st,
> sg = sg_next(sg);
>
> st->nents++;
> - sg_set_folio(sg, folio, folio_size(folio), 0);
> + sg_set_folio(sg, folio, nr_pages * PAGE_SIZE, 0);
> } else {
> /* XXX: could overflow? */
> - sg->length += folio_size(folio);
> + sg->length += nr_pages * PAGE_SIZE;
> }
> - next_pfn = folio_pfn(folio) + folio_nr_pages(folio);
> - i += folio_nr_pages(folio) - 1;
> + next_pfn = folio_pfn(folio) + nr_pages;
> + i += nr_pages - 1;
>
> /* Check that the i965g/gm workaround works. */
> GEM_BUG_ON(gfp & __GFP_DMA32 && next_pfn >= 0x00100000UL);
--
Oleksandr Natalenko (post-factum)
signature.asc
Description: This is a digitally signed message part.
