Am 21.11.24 um 16:22 schrieb Felix Kuehling:
On 2024-11-20 22:58, Chen, Xiaogang wrote:
@@ -1822,15 +1804,20 @@ struct kfd_process *kfd_lookup_process_by_pasid(u32
pasid)
{
struct kfd_process *p, *ret_p = NULL;
unsigned int temp;
+ int i;
int idx = srcu_read_lock(&kfd_processes_srcu);
hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
- if (p->pasid == pasid) {
- kref_get(&p->ref);
- ret_p = p;
- break;
+ for (i = 0; i < p->n_pdds; i++) {
+ if (p->pdds[i]->pasid == pasid) {
+ kref_get(&p->ref);
+ ret_p = p;
+ break;
+ }
I think this won't work correctly. The same PASID can be used for different
processes on different GPUs because each adev manages its own PASID->amdgpu_vm
lookup table. So kfd_lookup_process_by_pasid needs a new parameter that identifies
the GPU adev, and you should only compare pasids, if the adev matches.
I think it is the main concern here: the pasid used here is global in driver by
amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms. Each time a render
node(partition) got opened, a new pasid value is generated. Its lifetime is
until render node got closed. A pdd just uses this pasid. And each adev has
its own xarray which saves pasids for this adev.
I think I had a misunderstanding here. I saw the xarray in adev and assumed
that each adev allocated PASIDs independently. But there is also a global
amdgpu_pasid_ida that I missed. If the PASID allocation is global in the
amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be
fine.
@Christian, we discussed the number of PASIDs consumed on systems with many
GPUs and partitions. If the PASIDs are managed globally, then running out of
PASIDs is a concern. Do you think we should change this?
I think we could change it, but I'm not sure if we should.
IIRC there used to be some KFD requirement that PASIDs were global
unique, but could be that this was dropped together with ATC support.
Regards,
Christian.
Regards,
Felix
Regards
Xiaogang