Hi Kevin,

The basic problem you are facing is that the ttm_tt_create/ttm_tt_destroy callbacks are mandatory (they always were). You need an implementation of them; otherwise you won't be able to use the system domain (in addition to the optional GTT domain).

My best guess is that the difference is that we now force initialization of the system domain for all drivers.

If that is correct, you just never ran into this because you never correctly initialized TTM to support buffer moves.
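
To make the point about the system domain concrete, here is a simplified user-space model in C. It assumes, per the explanation above, that a buffer in the system domain always needs a ttm_tt, while use_tt only governs the other domains. All names here (struct driver_ops, domain_needs_tt(), validate(), demo_tt_create()) are hypothetical and are not the real TTM API; the model only mimics why passing use_tt = false to ttm_range_man_init() cannot avoid the need for a ttm_tt_create callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical, simplified model of the situation in this thread.
 * None of these names are the real TTM API.
 */
enum domain { DOMAIN_SYSTEM, DOMAIN_VRAM };

struct tt {
	size_t num_pages;	/* stand-in for the pages backing a buffer */
};

struct driver_ops {
	/* Plays the role of the ttm_tt_create member of ttm_bo_driver. */
	struct tt *(*tt_create)(void);
};

struct manager {
	/* Plays the role of the use_tt argument of ttm_range_man_init(). */
	bool use_tt;
};

struct bo {
	enum domain domain;
	struct tt *tt;
};

/* Does a buffer residing in domain d need a tt? */
static bool domain_needs_tt(enum domain d, const struct manager *vram_man)
{
	if (d == DOMAIN_SYSTEM)
		return true;		/* system memory is always page backed */
	return vram_man->use_tt;	/* only the other domains honor use_tt */
}

/*
 * Validate a buffer object, creating its tt on demand.
 * Returns 0 on success, -EINVAL if the callback is missing and
 * -ENOMEM if the callback fails to allocate.
 */
static int validate(struct bo *bo, const struct driver_ops *ops,
		    const struct manager *vram_man)
{
	if (!domain_needs_tt(bo->domain, vram_man) || bo->tt)
		return 0;
	/*
	 * This model checks for a missing callback; the call trace in this
	 * thread suggests the kernel instead calls through the NULL pointer.
	 */
	if (!ops->tt_create)
		return -EINVAL;
	bo->tt = ops->tt_create();
	return bo->tt ? 0 : -ENOMEM;
}

/* Hypothetical driver callback, standing in for a real ttm_tt_create. */
static struct tt *demo_tt_create(void)
{
	return calloc(1, sizeof(struct tt));
}
```

In this model, even with use_tt set to false, a buffer that starts out in DOMAIN_SYSTEM fails validation with -EINVAL when tt_create is NULL, while a buffer in DOMAIN_VRAM never needs a tt.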

I'm not sure what exactly the OpenChrome DRM driver is doing, but I strongly suggest just dropping TTM support completely and using the GEM VRAM helper layer instead.

Regards,
Christian.

On 19.10.20 at 09:23, Kevin Brace wrote:
Hi Dave,

Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM no
longer crashes because of the "ttm_tt_create" member being null.
However, it is still unable to start the X Server due to another TTM-related
memory allocation issue.
I think the large changes made to TTM during this development cycle broke
OpenChrome DRM.
Following up on the question I raised in my previous e-mail:
shouldn't passing "false" as the "use_tt" parameter of ttm_range_man_init()
disable TTM TT functionality?
I feel like that should be the expected behavior.
Again, there are only 5 to 6 days left until Linux 5.10-rc2, so I decided
to contact you on Sunday (I consider this bug urgent).
Assuming what I am asserting is correct, I think this was not discovered
earlier for the following reasons:

1) nouveau, radeon, and amdgpu already use TTM TT functionality.
2) ast uses the GEM VRAM helper, which internally uses TTM. The helper populates
the "ttm_tt_create" and "ttm_tt_destroy" members, hence the developers did not
notice the breakage.
3) OpenChrome DRM is still not in the mainline tree, so no one but me noticed
the problem until now.


Regarding TTM TT functionality, OpenChrome DRM currently does not support
acceleration, so I did not believe it was necessary to populate the
"ttm_tt_create" and "ttm_tt_destroy" members.
That implementation worked fine up to the previous development cycle's code.
Of course, I will eventually add support for acceleration, so TTM TT
functionality will be used at some point.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


Sent: Sunday, October 18, 2020 at 12:50 PM
From: "Dave Airlie" <[email protected]>
To: "Kevin Brace" <[email protected]>, "Christian König" <[email protected]>
Cc: "dri-devel" <[email protected]>, "Dave Airlie" <[email protected]>
Subject: Re: It appears drm-next TTM cleanup broke something . . .

On Mon, 19 Oct 2020 at 05:15, Kevin Brace <[email protected]> wrote:
Hi Dave,

It is a little urgent, so I am writing this right now.
As usual, a few days ago I pulled the latest DRM repository code into the
out-of-tree OpenChrome DRM repository.
While going through the changes OpenChrome DRM needs in order to compile
with the latest Linux kernel, I noticed that ttm_bo_init_mm() was removed
and replaced with ttm_range_man_init().
ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do
not think it is functioning correctly.
If I leave the "ttm_tt_create" member of struct ttm_bo_driver null by not
specifying it, TTM still tries to call it and crashes with a null pointer
dereference.
The workaround I found so far is to specify the "ttm_tt_create" member by
copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
This is what the call trace looks like without specifying the "ttm_tt_create" 
member (i.e., this member is null).
cc'ing Christian,

I can't remember if we did this deliberately or if it just worked by
accident previously.

Either way, you will probably need a ttm_tt_create going forward.

Dave.

_______________________________________________
. . .
kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
. . .
kernel: [   34.310807] Call Trace:
kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
. . .
_______________________________________________

The erroneous call to the "ttm_tt_create" member happens right after TTM
placement is performed (openchrome_ttm_domain_to_placement()).
Currently, OpenChrome DRM's TTM implementation does not use the "ttm_tt_create"
member, and this arrangement worked fine up to Linux 5.9's drm-next code.
It appears that Linux 5.10's drm-next code broke it.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com

_______________________________________________
dri-devel mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/dri-devel
