Hi Kyrill, On 2019/8/27 18:16, Kyrill Tkachov wrote: > Hi Shaokun, > > On 8/22/19 3:10 PM, Shaokun Zhang wrote: >> The DCache clean & ICache invalidation requirements for instructions >> to be data coherence are discoverable through new fields in CTR_EL0. >> Let's support the two bits if they are enabled, then we can get some >> performance benefit from this feature. >> >> 2019-08-22 Shaokun Zhang <zhangshao...@hisilicon.com> >> >> * config/aarch64/sync-cache.c: Support CTR_EL0.IDC and CTR_EL0.DIC >> > This needs to mention __aarch64_sync_cache_range as the function being > changed. >
Ok, I will update in next version. > >> --- >> libgcc/config/aarch64/sync-cache.c | 56 >> ++++++++++++++++++++++++-------------- >> 1 file changed, 35 insertions(+), 21 deletions(-) >> >> diff --git a/libgcc/config/aarch64/sync-cache.c >> b/libgcc/config/aarch64/sync-cache.c >> index 791f5e42ff44..0b057efbdcab 100644 >> --- a/libgcc/config/aarch64/sync-cache.c >> +++ b/libgcc/config/aarch64/sync-cache.c >> @@ -23,6 +23,9 @@ a copy of the GCC Runtime Library Exception along with >> this program; >> see the files COPYING3 and COPYING.RUNTIME respectively. If not, see >> <http://www.gnu.org/licenses/>. */ >> >> +#define CTR_IDC_SHIFT 28 >> +#define CTR_DIC_SHIFT 29 >> + >> void __aarch64_sync_cache_range (const void *, const void *); >> >> void >> @@ -41,32 +44,43 @@ __aarch64_sync_cache_range (const void *base, const void >> *end) >> icache_lsize = 4 << (cache_info & 0xF); >> dcache_lsize = 4 << ((cache_info >> 16) & 0xF); >> >> - /* Loop over the address range, clearing one cache line at once. >> - Data cache must be flushed to unification first to make sure the >> - instruction cache fetches the updated data. 'end' is exclusive, >> - as per the GNU definition of __clear_cache. */ >> + /* If CTR_EL0.IDC is enabled, Data cache clean to the Point of >> Unification is >> + not required for instruction to data coherence. */ >> + >> + if ((cache_info >> CTR_IDC_SHIFT) & 0x1 == 0x0) { >> + /* Loop over the address range, clearing one cache line at once. >> + Data cache must be flushed to unification first to make sure the >> + instruction cache fetches the updated data. 'end' is exclusive, >> + as per the GNU definition of __clear_cache. */ >> >> - /* Make the start address of the loop cache aligned. */ >> - address = (const char*) ((__UINTPTR_TYPE__) base >> - & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1)); >> + /* Make the start address of the loop cache aligned. */ >> + address = (const char*) ((__UINTPTR_TYPE__) base >> + & ~ (__UINTPTR_TYPE__) (dcache_lsize - 1)); >> >> - for (; address < (const char *) end; address += dcache_lsize) >> - asm volatile ("dc\tcvau, %0" >> - : >> - : "r" (address) >> - : "memory"); >> + for (; address < (const char *) end; address += dcache_lsize) >> + asm volatile ("dc\tcvau, %0" >> + : >> + : "r" (address) >> + : "memory"); >> + } >> >> asm volatile ("dsb\tish" : : : "memory"); >> >> - /* Make the start address of the loop cache aligned. */ >> - address = (const char*) ((__UINTPTR_TYPE__) base >> - & ~ (__UINTPTR_TYPE__) (icache_lsize - 1)); >> + /* If CTR_EL0.DIC is enabled, Instruction cache cleaning to the Point of >> + Unification is not required for instruction to data coherence. */ >> + >> + if ((cache_info >> CTR_DIC_SHIFT) & 0x1 == 0x0) { >> + /* Make the start address of the loop cache aligned. */ >> + address = (const char*) ((__UINTPTR_TYPE__) base >> + & ~ (__UINTPTR_TYPE__) (icache_lsize - 1)); >> >> - for (; address < (const char *) end; address += icache_lsize) >> - asm volatile ("ic\tivau, %0" >> - : >> - : "r" (address) >> - : "memory"); >> + for (; address < (const char *) end; address += icache_lsize) >> + asm volatile ("ic\tivau, %0" >> + : >> + : "r" (address) >> + : "memory"); >> >> - asm volatile ("dsb\tish; isb" : : : "memory"); >> + asm volatile ("dsb\tish" : : : "memory"); >> + } >> + asm volatile("isb" : : : "memory") >> } > > This looks ok to me (but you'll need approval from the maintainers). > > There is a question of whether we need the barriers if both DIC and IDC are 1 > (in which case no cache-maintentance instructions are emitted). > > I think we still want them to ensure the writes have been completed and the > fetches from the updated cache are up-to-date. > At the beginning, I'm not sure how to deal with the barrier instructions if DIC and IDC are 1, So I sent one mail to discuss it, it is a pity that no response. https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg217561.html >From the arm ARM document: STR W11, [X1] ; W11 contains a new instruction to be stored in program memory DC CVAU, X1 ; clean to PoU makes the new instruction visible to instruction cache DSB ISH IC IVAU, X1 ; ensures instruction cache/branch predictor discards stale data DSB ISH ; ensures completion of the invalidation ISB ; ensures instruction fetch path sees new instruction cache state AFAIK, the first "DSB ISH" is used ensure STR and DC CVAU instruction; Even if IDC is 1, "DSB ISH" shall be kept for STR instruction. The second "DSB ISH" and "ISB" are explained in comments, so I think "ISB" shall be kept also. > For arch versions before CTR_EL0.{DIC, IDC} these bits are reserved as zero > so the code will do the right thing on those targets. > > How has this patch been tested? Do you also have any performance results you > can share? > I sent the patch as RFC because our cpu core which supports DIC and IDC is designing, so there is no performance result at the moment. Thanks, Shaokun > Thanks, > > Kyrill > > >> -- >> 2.7.4 >> > > . >