On Thu, Jan 10, 2019, 6:51 AM Christian König < [email protected] wrote:
> Am 10.01.19 um 12:41 schrieb Marek Olšák: > > > > On Thu, Jan 10, 2019, 4:15 AM Koenig, Christian <[email protected] > wrote: > >> Am 10.01.19 um 00:39 schrieb Marek Olšák: >> >> On Wed, Jan 9, 2019 at 1:41 PM Christian König < >> [email protected]> wrote: >> >>> Am 09.01.19 um 17:14 schrieb Marek Olšák: >>> >>> On Wed, Jan 9, 2019 at 8:09 AM Christian König < >>> [email protected]> wrote: >>> >>>> Am 09.01.19 um 13:36 schrieb Marek Olšák: >>>> >>>> >>>> >>>> On Wed, Jan 9, 2019, 5:28 AM Christian König < >>>> [email protected] wrote: >>>> >>>>> Looks good, but I'm wondering what's the actual improvement? >>>>> >>>> >>>> No malloc calls and 1 less for loop copying the bo list. >>>> >>>> >>>> Yeah, but didn't we want to get completely rid of the bo list? >>>> >>> >>> If we have multiple IBs (e.g. gfx + compute) that share a BO list, I >>> think it's faster to send the BO list to the kernel only once. >>> >>> >>> That's not really faster. >>> >>> The only thing we safe us is a single loop over all BOs to lockup the >>> handle into a pointer and that is only a tiny fraction of the overhead. >>> >>> The majority of the overhead is locking the BOs and reserving space for >>> the submission. >>> >>> What could really help here is to submit gfx+comput together in just one >>> CS IOCTL. This way we would need the locking and space reservation only >>> once. >>> >>> It's a bit of work in the kernel side, but certainly doable. >>> >> >> OK. Any objections to this patch? >> >> >> In general I'm wondering if we couldn't avoid adding so much new >> interface. >> > > There are Vulkan drivers that still use the bo_list interface. > > >> For example we can avoid the malloc() when we just cache the last freed >> bo_list structure in the device. We would just need an atomic pointer >> exchange operation for that. >> > >> This way we even don't need to change mesa at all. >> > > There is still the for loop that we need to get rid of. > > > Yeah, but that I'm fine to handle with a amdgpu_bo_list_create_raw which > only takes the handles and still returns the amdgpu_bo_list structure we > are used to. > > See what I'm mostly concerned about is having another CS function to > maintain. > There is no maintenance cost. It's just a wrapper. Eventually all drivers will switch to it. Marek > > >> Regarding optimization, this chunk can be replaced by a cast on 64bit: >> >> + chunk_array = alloca(sizeof(uint64_t) * num_chunks); >> + for (i = 0; i < num_chunks; i++) >> + chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i]; >> >> It can't. The input is an array of structures. The ioctl takes an array > of pointers. > > > Ah! Haven't seen this, sorry for the noise. > > Christian. > > > Marek > > >> Regards, >> Christian. >> >> >> Thanks, >> Marek >> >> >> > _______________________________________________ > amd-gfx mailing > [email protected]https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > >
_______________________________________________ amd-gfx mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/amd-gfx
