I don't have access to any gfx10 hardware.

Can you give me a dmesg and/or backtrace, etc..?

I can't push this unless it's working properly.

Christian.

Am 16.05.22 um 14:56 schrieb Marek Olšák:
Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König <[email protected]> wrote:

    Crap, do you have a link to the failure?

    Am 16.05.22 um 13:10 schrieb Marek Olšák:
    I forgot to say: The NOALLOC flag causes an allocation failure,
    so there is a kernel bug somewhere.

    Marek

    On Mon, May 16, 2022 at 7:06 AM Marek Olšák <[email protected]> wrote:

        FYI, I think it's time to merge this because the Mesa commits
        are going to be merged in ~30 minutes if Gitlab CI is green,
        and that includes updated amdgpu_drm.h.

        Marek

        On Wed, May 11, 2022 at 2:55 PM Marek Olšák
        <[email protected]> wrote:

            Ok sounds good.

            Marek

            On Wed., May 11, 2022, 03:43 Christian König,
            <[email protected]> wrote:

                It really *is* a NOALLOC feature. In other words
                there is no latency improvement on reads because the
                cache is always checked, even with the noalloc flag set.

                The only thing it affects is that misses not enter
                the cache and so don't cause any additional pressure
                on evicting cache lines.

                You might want to double check with the hardware
                guys, but I'm something like 95% sure that it works
                this way.

                Christian.

                Am 11.05.22 um 09:22 schrieb Marek Olšák:
                Bypass means that the contents of the cache are
                ignored, which decreases latency at the cost of no
                coherency between bypassed and normal memory
                requests. NOA (noalloc) means that the cache is
                checked and can give you cache hits, but misses are
                not cached and the overall latency is higher. I
                don't know what the hw does, but I hope it was
                misnamed and it really means bypass because there is
                no point in doing cache lookups on every memory
                request if the driver wants to disable caching to
                *decrease* latency in the situations when the cache
                isn't helping.

                Marek

                On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
                <[email protected]> wrote:



                    On 5/11/2022 11:36 AM, Christian König wrote:
                    > Mhm, it doesn't really bypass MALL. It just
                    doesn't allocate any MALL
                    > entries on write.
                    >
                    > How about AMDGPU_VM_PAGE_NO_MALL ?

                    One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last
                    level cache, * = some sort
                    of attribute which decides LLC behaviour]

                    Thanks,
                    Lijo

                    >
                    > Christian.
                    >
                    > Am 10.05.22 um 23:21 schrieb Marek Olšák:
                    >> A better name would be:
                    >> AMDGPU_VM_PAGE_BYPASS_MALL
                    >>
                    >> Marek
                    >>
                    >> On Fri, May 6, 2022 at 7:23 AM Christian König
                    >> <[email protected]> wrote:
                    >>
                    >>     Add the AMDGPU_VM_NOALLOC flag to let
                    userspace control MALL
                    >>     allocation.
                    >>
                    >>     Only compile tested!
                    >>
                    >>     Signed-off-by: Christian König
                    <[email protected]>
                    >>     ---
                    >>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
                    >>   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 3 +++
                    >>   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 3 +++
                    >>   include/uapi/drm/amdgpu_drm.h        | 2 ++
                    >>      4 files changed, 10 insertions(+)
                    >>
                    >>     diff --git
                    a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
                    >>  b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
                    >>     index bf97d8f07f57..d8129626581f 100644
                    >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
                    >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
                    >>     @@ -650,6 +650,8 @@ uint64_t
                    amdgpu_gem_va_map_flags(struct
                    >>     amdgpu_device *adev, uint32_t flags)
                    >> pte_flag |= AMDGPU_PTE_WRITEABLE;
                    >>             if (flags & AMDGPU_VM_PAGE_PRT)
                    >> pte_flag |= AMDGPU_PTE_PRT;
                    >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
                    >>     +  pte_flag |= AMDGPU_PTE_NOALLOC;
                    >>
                    >>             if (adev->gmc.gmc_funcs->map_mtype)
                    >> pte_flag |= amdgpu_gmc_map_mtype(adev,
                    >>     diff --git
                    a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                    >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                    >>     index b8c79789e1e4..9077dfccaf3c 100644
                    >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                    >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
                    >>     @@ -613,6 +613,9 @@ static void
                    gmc_v10_0_get_vm_pte(struct
                    >>     amdgpu_device *adev,
                    >>             *flags &=
                    ~AMDGPU_PTE_MTYPE_NV10_MASK;
                    >>             *flags |= (mapping->flags &
                    AMDGPU_PTE_MTYPE_NV10_MASK);
                    >>
                    >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
                    >>     +       *flags |= (mapping->flags &
                    AMDGPU_PTE_NOALLOC);
                    >>     +
                    >>             if (mapping->flags &
                    AMDGPU_PTE_PRT) {
                    >> *flags |= AMDGPU_PTE_PRT;
                    >> *flags |= AMDGPU_PTE_SNOOPED;
                    >>     diff --git
                    a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                    >>  b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                    >>     index 8d733eeac556..32ee56adb602 100644
                    >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                    >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
                    >>     @@ -508,6 +508,9 @@ static void
                    gmc_v11_0_get_vm_pte(struct
                    >>     amdgpu_device *adev,
                    >>             *flags &=
                    ~AMDGPU_PTE_MTYPE_NV10_MASK;
                    >>             *flags |= (mapping->flags &
                    AMDGPU_PTE_MTYPE_NV10_MASK);
                    >>
                    >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
                    >>     +       *flags |= (mapping->flags &
                    AMDGPU_PTE_NOALLOC);
                    >>     +
                    >>             if (mapping->flags &
                    AMDGPU_PTE_PRT) {
                    >> *flags |= AMDGPU_PTE_PRT;
                    >> *flags |= AMDGPU_PTE_SNOOPED;
                    >>     diff --git a/include/uapi/drm/amdgpu_drm.h
                    >>  b/include/uapi/drm/amdgpu_drm.h
                    >>     index 57b9d8f0133a..9d71d6330687 100644
                    >>     --- a/include/uapi/drm/amdgpu_drm.h
                    >>     +++ b/include/uapi/drm/amdgpu_drm.h
                    >>     @@ -533,6 +533,8 @@ struct
                    drm_amdgpu_gem_op {
                    >>      #define AMDGPU_VM_MTYPE_UC           
                     (4 << 5)
                    >>      /* Use Read Write MTYPE instead of
                    default MTYPE */
                    >>      #define AMDGPU_VM_MTYPE_RW           
                     (5 << 5)
                    >>     +/* don't allocate MALL */
                    >>     +#define AMDGPU_VM_PAGE_NOALLOC       
                     (1 << 9)
                    >>
                    >>      struct drm_amdgpu_gem_va {
                    >>             /** GEM object handle */
                    >>     --
                    >>     2.25.1
                    >>
                    >



Reply via email to