Hi Kyrill, On 2019/8/28 16:57, Kyrill Tkachov wrote: > Hi Shaokun, > > On 8/28/19 9:47 AM, Shaokun Zhang wrote: >> Hi Kyrill, >> >> On 2019/8/27 18:16, Kyrill Tkachov wrote: >>> Hi Shaokun, >>> >>> On 8/22/19 3:10 PM, Shaokun Zhang wrote: >>>> The DCache clean & ICache invalidation requirements for instructions >>>> to be data coherence are discoverable through new fields in CTR_EL0. >>>> Let's support the two bits if they are enabled, then we can get some >>>> performance benefit from this feature. >>>> >>>> 2019-08-22 Shaokun Zhang <zhangshao...@hisilicon.com> >>>> >>>> * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC >>>> >>> This needs to mention __aarch64_sync_cache_range as the function being >>> changed. >>> >> Ok, I will update in next version. >> >>>> --- >>>> libgcc/config/aarch64/sync-cache.c | 56 >>>> ++++++++++++++++++++++++-------------- >>>> 1 file changed, 35 insertions(+), 21 deletions(-) >>>> >>>> diff --git a/libgcc/config/aarch64/sync-cache.c >>>> b/libgcc/config/aarch64/sync-cache.c >>>> index 791f5e42ff44..0b057efbdcab 100644 >>>> --- a/libgcc/config/aarch64/sync-cache.c >>>> +++ b/libgcc/config/aarch64/sync-cache.c >>>> @@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with >>>> this program; >>>> see the files COPYING3 and COPYING.RUNTIME respectively. If not, see >>>> <http://www.gnu.org/licenses/>. */ >>>> >>>> +#define CTR_IDC_SHIFT 28 >>>> +#define CTR_DIC_SHIFT 29 >>>> + >>>> void __aarch64_sync_cache_range (const void *, const void *); >>>> >>>> void >>>> @@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const >>>> void *end) >>>> icache_lsize = 4 << (cache_info & 0xF); >>>> dcache_lsize = 4 << ((cache_info >> 16) & 0xF); >>>> >>>> - /* Loop over the address range, clearing one cache line at once. >>>> - Data cache must be flushed to unification first to make sure the >>>> - instruction cache fetches the updated data. 'end' is exclusive, >>>> - as per the GNU definition of __clear_cache. */ >>>> + /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of >>>> Unification is >>>> + not required for instruction to data coherence. */ >>>> + >>>> + if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) { >>>> + /* Loop over the address range, clearing one cache line at once. >>>> + Data cache must be flushed to unification first to make sure the >>>> + instruction cache fetches the updated data. 'end' is exclusive, >>>> + as per the GNU definition of __clear_cache. */ >>>> >>>> - /* Make the start address of the loop cache aligned. */ >>>> - address = (const char*) ((__UINTPTR_TYPE__) base >>>> - & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1)); >>>> + /* Make the start address of the loop cache aligned. */ >>>> + address = (const char*) ((__UINTPTR_TYPE__) base >>>> + & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1)); >>>> >>>> - for (; address < (const char *) end; address += dcache_lsize) >>>> - asm volatile ("dc\tcvau, %0" >>>> - : >>>> - : "r" (address) >>>> - : "memory"); >>>> + for (; address < (const char *) end; address += dcache_lsize) >>>> + asm volatile ("dc\tcvau, %0" >>>> + : >>>> + : "r" (address) >>>> + : "memory"); >>>> + } >>>> >>>> asm volatile ("dsb\tish" : : : "memory"); >>>> >>>> - /* Make the start address of the loop cache aligned. */ >>>> - address = (const char*) ((__UINTPTR_TYPE__) base >>>> - & ~ (__UINTPTR_TYPE__) (icache_lsize - 1)); >>>> + /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of >>>> + Unification is not required for instruction to data coherence. */ >>>> + >>>> + if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) { >>>> + /* Make the start address of the loop cache aligned. */ >>>> + address = (const char*) ((__UINTPTR_TYPE__) base >>>> + & ~ (__UINTPTR_TYPE__) (icache_lsize - 1)); >>>> >>>> - for (; address < (const char *) end; address += icache_lsize) >>>> - asm volatile ("ic\tivau, %0" >>>> - : >>>> - : "r" (address) >>>> - : "memory"); >>>> + for (; address < (const char *) end; address += icache_lsize) >>>> + asm volatile ("ic\tivau, %0" >>>> + : >>>> + : "r" (address) >>>> + : "memory"); >>>> >>>> - asm volatile ("dsb\tish; isb" : : : "memory"); >>>> + asm volatile ("dsb\tish" : : : "memory"); >>>> + } >>>> + asm volatile("isb" : : : "memory") >>>> } >>> This looks ok to me (but you'll need approval from the maintainers). >>> >>> There is a question of whether we need the barriers if both DIC and IDC are >>> 1 (in which case no cache-maintentance instructions are emitted). >>> >>> I think we still want them to ensure the writes have been completed and the >>> fetches from the updated cache are up-to-date. >>> >> At the beginning, I'm not sure how to deal with the barrier instructions if >> DIC and IDC >> are 1, So I sent one mail to discuss it, it is a pity that no response. >> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg217561.html > > Sorry for that. August is a tricky month due to summer vacation in many > places and mail can slip through the cracks... >
Oh, thanks for more information about it, Got it. > I recommend you ping your question if you haven't had a reply within a week > or so. > Good idea, I will follow it in the future work. >> >> From the arm ARM document: >> STR W11, [X1] ; W11 contains a new instruction to be stored in >> program memory >> DC CVAU, X1 ; clean to PoU makes the new instruction visible >> to instruction cache >> DSB ISH >> IC IVAU, X1 ; ensures instruction cache/branch predictor >> discards stale data >> DSB ISH ; ensures completion of the invalidation >> ISB ; ensures instruction fetch path sees new >> instruction cache state >> >> AFAIK, the first "DSB ISH" is used ensure STR and DC CVAU instruction; Even >> if IDC is 1, >> "DSB ISH" shall be kept for STR instruction. The second "DSB ISH" and "ISB" >> are explained >> in comments, so I think "ISB" shall be kept also. > > > Thanks, that's my understanding as well. > > >>> For arch versions before CTR_EL0.{DIC, IDC} these bits are reserved as zero >>> so the code will do the right thing on those targets. >>> >>> How has this patch been tested? Do you also have any performance results >>> you can share? >>> >> I sent the patch as RFC because our cpu core which supports DIC and IDC is >> designing, >> so there is no performance result at the moment. > > > That's okay. In the meantime a bootstrap and test cycle on a current aarch64 > machine would be good to make sure existing functionality doesn't break. > I will update the ChangeLog and do the formal patch if the maintainers are happy on it. Cheers, Shaokun > Thanks, > > Kyrill > >> >> Thanks, >> Shaokun >> >>> Thanks, >>> >>> Kyrill >>> >>> >>>> -- >>>> 2.7.4 >>>> >>> . >>> > > . >