On Wed, Jan 11, 2023 at 3:48 PM Alex Deucher <[email protected]> wrote:
>
> On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <[email protected]> wrote:
> >
> > Yes, it's meant to be like a spec sheet. We are not interested in the 
> > current bandwidth utilization.
>
> After chatting with Marek on IRC and thinking about this more, I think
> this patch is fine.  It's not really meant for bandwidth per se, but
> rather as a limit to determine what the driver should do in certain
> cases (i.e., when does it make sense to copy to vram vs not).  It's
> not straightforward for userspace to parse the full topology to
> determine what links may be slow.  I guess one potential pitfall would
> be that if you pass the device into a VM, the driver may report the
> wrong values.  Generally in a VM the VM doesn't get the full view up
> to the root port.  I don't know if the hypervisors report properly for
> pcie_bandwidth_available() in a VM or if it just shows the info about
> the endpoint in the VM.
>
> Reviewed-by: Alex Deucher <[email protected]>

Actually:

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index fe7f871e3080..f7fc7325f17f 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1053,7 +1053,7 @@ struct drm_amdgpu_info_device {
     __u32 enabled_rb_pipes_mask;
     __u32 num_rb_pipes;
     __u32 num_hw_gfx_contexts;
-    __u32 _pad;
+    __u32 pcie_gen;
     __u64 ids_flags;
     /** Starting virtual address for UMDs. */
     __u64 virtual_address_offset;
@@ -1109,6 +1109,7 @@ struct drm_amdgpu_info_device {
     __u64 high_va_max;
     /* gfx10 pa_sc_tile_steering_override */
     __u32 pa_sc_tile_steering_override;
+    __u32 pcie_num_lanes;
     /* disabled TCCs */
     __u64 tcc_disabled_mask;
     __u64 min_engine_clock;

Doesn't that last one need to be added to the end of the structure?

Alex

>
> Alex
>
> >
> > Marek
> >
> > On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <[email protected]> wrote:
> >>
> >> [AMD Official Use Only - General]
> >>
> >>
> >> To clarify, with DPM in place, the current bandwidth will be changing 
> >> based on the load.
> >>
> >> If apps/umd already has a way to know the current bandwidth utilisation, 
> >> then possible maximum also could be part of the same API. Otherwise, this 
> >> only looks like duplicate information. We have the same information in 
> >> sysfs DPM nodes.
> >>
> >> BTW, I don't know to what extent app/umd really makes use of this. Take 
> >> that memory frequency as an example (I'm reading it as 16GHz). It only 
> >> looks like a spec sheet.
> >>
> >> Thanks,
> >> Lijo
> >> ________________________________
> >> From: Marek Olšák <[email protected]>
> >> Sent: Wednesday, January 4, 2023 8:40:00 PM
> >> To: Lazar, Lijo <[email protected]>
> >> Cc: [email protected] <[email protected]>
> >> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from 
> >> the INFO
> >>
> >> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <[email protected]> wrote:
> >>
> >>
> >>
> >> On 1/4/2023 7:43 PM, Marek Olšák wrote:
> >> > On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <[email protected]
> >> > <mailto:[email protected]>> wrote:
> >> >
> >> >
> >> >
> >> >     On 1/4/2023 4:11 AM, Marek Olšák wrote:
> >> >      > I see. Well, those sysfs files are not usable, and I don't think 
> >> > it
> >> >      > would be important even if they were usable, but for completeness:
> >> >      >
> >> >      > The ioctl returns:
> >> >      >      pcie_gen = 1
> >> >      >      pcie_num_lanes = 16
> >> >      >
> >> >      > Theoretical bandwidth from those values: 4.0 GB/s
> >> >      > My DMA test shows this write bandwidth: 3.5 GB/s
> >> >      > It matches the expectation.
> >> >      >
> >> >      > Let's see the devices (there is only 1 GPU Navi21 in the system):
> >> >      > $ lspci |egrep '(PCI|VGA).*Navi'
> >> >      > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi
> >> >     10 XL
> >> >      > Upstream Port of PCI Express Switch (rev c3)
> >> >      > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi
> >> >     10 XL
> >> >      > Downstream Port of PCI Express Switch
> >> >      > 0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc.
> >> >      > [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)
> >> >      >
> >> >      > Let's read sysfs:
> >> >      >
> >> >      > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width
> >> >      > 16
> >> >      > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width
> >> >      > 16
> >> >      > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width
> >> >      > 16
> >> >      > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
> >> >      > 2.5 GT/s PCIe
> >> >      > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
> >> >      > 16.0 GT/s PCIe
> >> >      > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
> >> >      > 16.0 GT/s PCIe
> >> >      >
> >> >      > Problem 1: None of the speed numbers match 4 GB/s.
> >> >
> >> >     US bridge = 2.5GT/s means operating at PCIe Gen 1 speed. Total
> >> >     theoretical bandwidth is then derived based on encoding and total
> >> >     number
> >> >     of lanes.
> >> >
> >> >      > Problem 2: Userspace doesn't know the bus index of the bridges,
> >> >     and it's
> >> >      > not clear which bridge should be used.
> >> >
> >> >     In general, modern ones have this arch= US->DS->EP. US is the one
> >> >     connected to physical link.
> >> >
> >> >      > Problem 3: The PCIe gen number is missing.
> >> >
> >> >     Current link speed is based on whether it's Gen1/2/3/4/5.
> >> >
> >> >     BTW, your patch makes use of capabilities flags which gives the 
> >> > maximum
> >> >     supported speed/width by the device. It may not necessarily reflect 
> >> > the
> >> >     current speed/width negotiated. I guess in NV, this info is already
> >> >     obtained from PMFW and made available through metrics table.
> >> >
> >> >
> >> > It computes the minimum of the device PCIe gen and the motherboard/slot
> >> > PCIe gen to get the final value. These 2 lines do that. The low 16 bits
> >> > of the mask contain the device PCIe gen mask. The high 16 bits of the
> >> > mask contain the slot PCIe gen mask.
> >> > + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 
> >> > 16);
> >> > + dev_info->pcie_gen = fls(pcie_gen_mask);
> >> >
> >>
> >> With DPM in place on some ASICs, how much does this static info help for
> >> upper level apps?
> >>
> >>
> >> It helps UMDs make better decisions if they know the maximum achievable 
> >> bandwidth. UMDs also compute the maximum memory bandwidth and compute 
> >> performance (FLOPS). Right now it's printed by Mesa to give users detailed 
> >> information about their GPU. For example:
> >>
> >> $ AMD_DEBUG=info glxgears
> >> Device info:
> >>     name = NAVI21
> >>     marketing_name = AMD Radeon RX 6800
> >>     num_se = 3
> >>     num_rb = 12
> >>     num_cu = 60
> >>     max_gpu_freq = 2475 MHz
> >>     max_gflops = 19008 GFLOPS
> >>     l0_cache_size = 16 KB
> >>     l1_cache_size = 128 KB
> >>     l2_cache_size = 4096 KB
> >>     l3_cache_size = 128 MB
> >>     memory_channels = 16 (TCC blocks)
> >>     memory_size = 16 GB (16384 MB)
> >>     memory_freq = 16 GHz
> >>     memory_bus_width = 256 bits
> >>     memory_bandwidth = 512 GB/s
> >>     pcie_gen = 1
> >>     pcie_num_lanes = 16
> >>     pcie_bandwidth = 4.0 GB/s
> >>
> >> Marek

Reply via email to