Am 21.11.24 um 16:22 schrieb Felix Kuehling:
On 2024-11-20 22:58, Chen, Xiaogang wrote:
   @@ -1822,15 +1804,20 @@ struct kfd_process *kfd_lookup_process_by_pasid(u32 
pasid)
   {
       struct kfd_process *p, *ret_p = NULL;
       unsigned int temp;
+    int i;
         int idx = srcu_read_lock(&kfd_processes_srcu);
         hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
-        if (p->pasid == pasid) {
-            kref_get(&p->ref);
-            ret_p = p;
-            break;
+        for (i = 0; i < p->n_pdds; i++) {
+            if (p->pdds[i]->pasid == pasid) {
+                kref_get(&p->ref);
+                ret_p = p;
+                break;
+            }
I think this won't work correctly. The same PASID can be used for different 
processes on different GPUs because each adev manages its own PASID->amdgpu_vm 
lookup table. So kfd_lookup_process_by_pasid needs a new parameter that identifies 
the GPU adev, and you should only compare pasids, if the adev matches.
I think it is the main concern here: the pasid used here is global in driver by 
amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms.  Each time a render 
node(partition) got opened, a new pasid value is generated. Its lifetime is 
until render node got closed. A pdd just uses this pasid.  And each adev has 
its own xarray which saves pasids for this adev.
I think I had a misunderstanding here. I saw the xarray in adev and assumed 
that each adev allocated PASIDs independently. But there is also a global 
amdgpu_pasid_ida that I missed. If the PASID allocation is global in the 
amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be 
fine.

@Christian, we discussed the number of PASIDs consumed on systems with many 
GPUs and partitions. If the PASIDs are managed globally, then running out of 
PASIDs is a concern. Do you think we should change this?

I think we could change it, but I'm not sure if we should.

IIRC there used to be some KFD requirement that PASIDs were global unique, but could be that this was dropped together with ATC support.

Regards,
Christian.


Regards,
   Felix

Regards

Xiaogang

Reply via email to