Hello Stefano,

> On 29 Oct 2020, at 8:17 pm, Stefano Stabellini <[email protected]> wrote:
> 
> On Thu, 29 Oct 2020, Bertrand Marquis wrote:
>>> On 28 Oct 2020, at 19:12, Julien Grall <[email protected]> wrote:
>>> On 26/10/2020 11:03, Rahul Singh wrote:
>>>> Hello Julien,
>>>>> On 23 Oct 2020, at 4:19 pm, Julien Grall <[email protected]> wrote:
>>>>> On 23/10/2020 15:27, Rahul Singh wrote:
>>>>>> Hello Julien,
>>>>>>> On 23 Oct 2020, at 2:00 pm, Julien Grall <[email protected]> wrote:
>>>>>>> On 23/10/2020 12:35, Rahul Singh wrote:
>>>>>>>> Hello,
>>>>>>>>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini 
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>>>>>>>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>>>>>>>>>> Add support for ARM architected SMMUv3 implementations. It is 
>>>>>>>>>>>>> based on
>>>>>>>>>>>>> the Linux SMMUv3 driver.
>>>>>>>>>>>>> Major differences between the Linux driver are as follows:
>>>>>>>>>>>>> 1. Only Stage-2 translation is supported as compared to the Linux 
>>>>>>>>>>>>> driver
>>>>>>>>>>>>>   that supports both Stage-1 and Stage-2 translations.
>>>>>>>>>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>>>>>>>>>   capability to share the page tables with the CPU.
>>>>>>>>>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event 
>>>>>>>>>>>>> queue
>>>>>>>>>>>>>   and priority queue IRQ handling.
>>>>>>>>>>>> 
>>>>>>>>>>>> Tasklets are not a replacement for threaded IRQ. In particular, 
>>>>>>>>>>>> they will
>>>>>>>>>>>> have priority over anything else (IOW nothing will run on the pCPU 
>>>>>>>>>>>> until
>>>>>>>>>>>> they are done).
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you know why Linux is using thread. Is it because of long 
>>>>>>>>>>>> running
>>>>>>>>>>>> operations?
>>>>>>>>>>> 
>>>>>>>>>>> Yes you are right because of long running operations Linux is using 
>>>>>>>>>>> the
>>>>>>>>>>> threaded IRQs.
>>>>>>>>>>> 
>>>>>>>>>>> SMMUv3 reports fault/events bases on memory-based circular buffer 
>>>>>>>>>>> queues not
>>>>>>>>>>> based on the register. As per my understanding, it is 
>>>>>>>>>>> time-consuming to
>>>>>>>>>>> process the memory based queues in interrupt context because of 
>>>>>>>>>>> that Linux
>>>>>>>>>>> is using threaded IRQ to process the faults/events from SMMU.
>>>>>>>>>>> 
>>>>>>>>>>> I didn’t find any other solution in XEN in place of tasklet to 
>>>>>>>>>>> defer the
>>>>>>>>>>> work, that’s why I used tasklet in XEN in replacement of threaded 
>>>>>>>>>>> IRQs. If
>>>>>>>>>>> we do all work in interrupt context we will make XEN less 
>>>>>>>>>>> responsive.
>>>>>>>>>> 
>>>>>>>>>> So we need to make sure that Xen continue to receives interrupts, 
>>>>>>>>>> but we also
>>>>>>>>>> need to make sure that a vCPU bound to the pCPU is also responsive.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> If you know another solution in XEN that will be used to defer the 
>>>>>>>>>>> work in
>>>>>>>>>>> the interrupt please let me know I will try to use that.
>>>>>>>>>> 
>>>>>>>>>> One of my work colleague encountered a similar problem recently. He 
>>>>>>>>>> had a long
>>>>>>>>>> running tasklet and wanted to be broken down in smaller chunk.
>>>>>>>>>> 
>>>>>>>>>> We decided to use a timer to reschedule the taslket in the future. 
>>>>>>>>>> This allows
>>>>>>>>>> the scheduler to run other loads (e.g. vCPU) for some time.
>>>>>>>>>> 
>>>>>>>>>> This is pretty hackish but I couldn't find a better solution as 
>>>>>>>>>> tasklet have
>>>>>>>>>> high priority.
>>>>>>>>>> 
>>>>>>>>>> Maybe the other will have a better idea.
>>>>>>>>> 
>>>>>>>>> Julien's suggestion is a good one.
>>>>>>>>> 
>>>>>>>>> But I think tasklets can be configured to be called from the 
>>>>>>>>> idle_loop,
>>>>>>>>> in which case they are not run in interrupt context?
>>>>>>>>> 
>>>>>>>> Yes you are right tasklet will be scheduled from the idle_loop that is 
>>>>>>>> not interrupt conext.
>>>>>>> 
>>>>>>> This depends on your tasklet. Some will run from the softirq context 
>>>>>>> which is usually (for Arm) on the return of an exception.
>>>>>>> 
>>>>>> Thanks for the info. I will check and will get better understanding of 
>>>>>> the tasklet how it will run in XEN.
>>>>>>>>> 
>>>>>>>>>>>>> 4. Latest version of the Linux SMMUv3 code implements the 
>>>>>>>>>>>>> commands queue
>>>>>>>>>>>>>   access functions based on atomic operations implemented in 
>>>>>>>>>>>>> Linux.
>>>>>>>>>>>> 
>>>>>>>>>>>> Can you provide more details?
>>>>>>>>>>> 
>>>>>>>>>>> I tried to port the latest version of the SMMUv3 code than I 
>>>>>>>>>>> observed that
>>>>>>>>>>> in order to port that code I have to also port atomic operation 
>>>>>>>>>>> implemented
>>>>>>>>>>> in Linux to XEN. As latest Linux code uses atomic operation to 
>>>>>>>>>>> process the
>>>>>>>>>>> command queues 
>>>>>>>>>>> (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
>>>>>>>>>>> atomic_fetch_andnot_relaxed()) .
>>>>>>>>>> 
>>>>>>>>>> Thank you for the explanation. I think it would be best to import 
>>>>>>>>>> the atomic
>>>>>>>>>> helpers and use the latest code.
>>>>>>>>>> 
>>>>>>>>>> This will ensure that we don't re-introduce bugs and also buy us 
>>>>>>>>>> some time
>>>>>>>>>> before the Linux and Xen driver diverge again too much.
>>>>>>>>>> 
>>>>>>>>>> Stefano, what do you think?
>>>>>>>>> 
>>>>>>>>> I think you are right.
>>>>>>>> Yes, I agree with you to have XEN code in sync with Linux code that's 
>>>>>>>> why I started with to port the Linux atomic operations to XEN  then I 
>>>>>>>> realised that it is not straightforward to port atomic operations and 
>>>>>>>> it requires lots of effort and testing. Therefore I decided to port 
>>>>>>>> the code before the atomic operation is introduced in Linux.
>>>>>>> 
>>>>>>> Hmmm... I would not have expected a lot of effort required to add the 3 
>>>>>>> atomics operations above. Are you trying to also port the LSE support 
>>>>>>> at the same time?
>>>>>> There are other atomic operations used in the SMMUv3 code apart from the 
>>>>>> 3 atomic operation I mention. I just mention 3 operation as an example.
>>>>> 
>>>>> Ok. Do you have a list you could share?
>>>>> 
>>>> Yes. Please find the list that we have to port to the XEN in order to 
>>>> merge the latest SMMUv3 code.
>>> 
>>> Thanks!
>>> 
>>>> If we start to port the below list we might have to port another atomic 
>>>> operation based on which below atomic operations are implemented. I have 
>>>> not spent time on how these atomic operations are implemented in detail 
>>>> but as per my understanding, it required an effort to port them to XEN and 
>>>> required a lot of testing.
>>> 
>>> For the beginning, I think it is fine to implement them with a stronger 
>>> memory barrier than necessary or in a less efficient. This can be refined 
>>> afterwards.
>>> 
>>> Those helpers can directly be defined in the SMMUv3 drivers so we know they 
>>> are not for general purpose :).
>>> 
>>>> 1. atomic_set_release
>>> 
>>> This could be implemented as:
>>> 
>>> smp_mb();
>>> atomic_set();
>>> 
>>>> 2. atomic_fetch_andnot_relaxed
>>> 
>>> This would need to be imported.
>>> 
>>>> 3. atomic_cond_read_relaxed
>>> 
>>> This would need to be imported. The simplest version seems to be the 
>>> generic version provided by include/asm-generic/barrier.h (see 
>>> smp_cond_load_relaxed).
>>> 
>>>> 4. atomic_long_cond_read_relaxed
>>>> 5. atomic_long_xor
>>> 
>>> The two would require the implementation of atomic64. Volodymyr also 
>>> required a version. I offered my help, however I didn't find enough time to 
>>> do it yet :(.
>>> 
>>> For Arm64, it would be possible to do a copy/paste of the existing helpers 
>>> and replace anything related to a 32-bit register with a 64-bit one.
>>> 
>>> For Arm32, they are a bit more complex because you now need to work with 2 
>>> registers.
>>> 
>>> However, for your purpose, you would be using atomic_long_t. So the the 
>>> Arm64 implementation should be sufficient.
>>> 
>>>> 6. atomic_set_release
>>> 
>>> Same as 1.
>>> 
>>>> 7. atomic_cmpxchg_relaxed might be we can use atomic_cmpxchg that is 
>>>> implemented in XEN need to check.
>>> atomic_cmpxchg() is strongly ordered. So it would be fine to use it for 
>>> implementing the helper. Although, more inefficient :).
>>> 
>>>> 8. atomic_dec_return_release
>>> 
>>> Xen implements a stronger version atomic_dec_return(). You can re-use it 
>>> here.
>>> 
>>>> 9. atomic_fetch_inc_relaxed
>>> 
>>> This would need to be imported.
>> 
>> We do agree that this would be the work required but some of the things to 
>> be imported have dependencies and this is not
>> a simple change on the patch done by Rahul and it would require to almost 
>> restart the implementation and testing from the
>> very beginning.
>> 
>> In the meantime could we process with the review of this SMMUv3 driver and 
>> include it in Xen as is ?
>> 
>> We can set me and Rahul as maintainers and flag the driver as experimental 
>> in Support.md (it is already
>> protected by the EXPERT configuration in Kconfig).
> 
> I think that is OK as long as you make sure to go through the changelog
> of the Linux driver to make sure we are not missing any bug fixes and
> errata fixes.

Ok Yes when I ported the driver I port the command queue operation from the 
previous commit where atomic operations is not used and rest all the code is 
from the latest code. I will again make sure that any bug that is fixed in 
Linux should be fixed in XEN also.


Regards,
Rahul

Reply via email to