Hello Chris,

On Thursday 21 of July 2016 00:22:30 Chris Johns wrote:
> On 20/07/2016 18:55, Pavel Pisa wrote:
> > From be993a950be6c382b70609bcbc38be9bd161e1d4 Mon Sep 17 00:00:00 2001
> > Message-Id: <be993a950be6c382b70609bcbc38be9bd161e1d4.1469004576.git.p...@cmp.felk.cvut.cz>
> > From: Pavel Pisa <p...@cmp.felk.cvut.cz>
> > Date: Wed, 20 Jul 2016 10:49:19 +0200
> > Subject: [PATCH] libdl/rtl-obj.c: synchronize cache after code relocation.
> > To: rtems-de...@rtems.org
> >
> > Memory content changes caused by relocation has to be
> > propagated to memory/cache level which is used/snooped
> > during instruction cache fill.
>
> This looks fine to me.
>
> Does this close https://devel.rtems.org/ticket/2438?
>
> If it does could "Closes #2438" please be added to the commit comment.

I have added the Trac tag. Testing on Zynq should be done as well; if there is still a problem, the ticket should be reopened or a new one added. The change has been tested on RPi1 and RPi2.

There are many possible variants of ARM core integration, even for a single CPU designation. It seems that the RPi2 Cortex-A7 includes the multiprocessing extensions, which should work in such a way that it is enough to invalidate only the instruction cache and prefetch buffer by virtual address range. The invalidation should be propagated to the other cores. A new fill of the instruction caches then causes snoops over all CPUs, so it is not required to flush the data cache(s). The observation on RPi2 with a UP build confirms this kind of integration, because even flushing the data cache before relocation worked on RPi2.

The current default implementation of rtems_cache_instruction_sync_after_code_change() flushes not only the instruction cache but also all levels of data cache by virtual address ranges. The behavior can be optimized in cache_.h by specifying

  #define CPU_CACHE_SUPPORT_PROVIDES_INSTRUCTION_SYNC_FUNCTION 1

and providing an optimized _CPU_cache_instruction_sync_after_code_change() which does not need to flush the data cache at all for this Cortex-A scenario.

If the multiprocessing extensions are not implemented, then a flush of the instruction cache and of the first level of data cache is required on Cortex-A even on a uniprocessor system. A flush to the inner level or to the level of unification is required on such Cortex-A variants.

The RPi1 ARMv6 does not seem to snoop the data cache, so a flush of the first data cache level is required. If it is not done after relocation, the instruction opcodes are read OK but the relocated target addresses are not seen. So both an I and a D L1 cache flush are required, and the need has been observed. On the other hand, an L1 flush is enough, so some limited flush version could be more efficient. But the current default code is OK, even though it flushes deeper than necessary.

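To make the optimization idea above more concrete, here is a minimal sketch of what such a hook could look like. It is a host-side model only: the signature is assumed to mirror rtems_cache_instruction_sync_after_code_change(), the 32-byte line size and all demo_* names are hypothetical, and the real CP15 operations are only named in comments. Whether skipping the data cache clean is really safe on a given core is exactly the open question discussed in this mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tells the generic cache manager that the CPU support code provides
 * its own instruction sync function (macro name taken from this mail). */
#define CPU_CACHE_SUPPORT_PROVIDES_INSTRUCTION_SYNC_FUNCTION 1

/* Assumption: 32-byte cache lines; a real port must use the value
 * reported by the hardware. */
#define DEMO_CACHE_LINE_BYTES 32u

/* Counters so the model can be checked on a host build. */
static unsigned demo_icache_lines_invalidated;
static unsigned demo_dcache_lines_cleaned;

static void demo_icache_invalidate_line(uintptr_t line)
{
  /* On an ARMv7-A target this would be ICIMVAU:
   *   mcr p15, 0, <line>, c7, c5, 1
   * Here the call is only counted. */
  (void) line;
  ++demo_icache_lines_invalidated;
}

static void demo_sync_barriers(void)
{
  /* On target: branch predictor invalidate (BPIALL), then DSB and ISB. */
}

/* Sketch of the optimized variant: instruction cache lines are
 * invalidated by address, and the data cache is deliberately left
 * untouched (demo_dcache_lines_cleaned never changes). */
static void _CPU_cache_instruction_sync_after_code_change(
  const void *code_addr,
  size_t      n_bytes
)
{
  uintptr_t line;
  uintptr_t last;

  assert(code_addr != NULL || n_bytes == 0);

  if (n_bytes == 0) {
    return;
  }

  /* Round the start down to a line boundary and walk line by line. */
  line = (uintptr_t) code_addr & ~(uintptr_t) (DEMO_CACHE_LINE_BYTES - 1u);
  last = (uintptr_t) code_addr + n_bytes - 1u;

  for (; line <= last; line += DEMO_CACHE_LINE_BYTES) {
    demo_icache_invalidate_line(line);
  }

  demo_sync_barriers();
}
```

For a range that starts at 0x1010 and is 0x40 bytes long, the loop touches the lines at 0x1000, 0x1020 and 0x1040. A variant for cores without the multiprocessing extensions, or for the RPi1, would additionally have to clean the L1 data cache lines over the same range before the instruction cache invalidate.
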
This behavior is my interpretation of the ARM documents, but I have not found a clean table which compares all ARM architecture levels and variants and specifies which CP15 instructions are supported for each variant and which caches are snooped under which conditions. The architecture manuals contain many IMPLEMENTATION DEFINED or IF MULTIPROCESSOR EXTENSION etc. statements, so it is far from being 100% clear to me.

As for the code, it should be enough if there is no other executable section, code or trampoline generated in some of the omitted sections or elsewhere. I consider that the following mask covers all potential code sections:

  RTEMS_RTL_OBJ_SECT_TEXT | RTEMS_RTL_OBJ_SECT_CONST |
  RTEMS_RTL_OBJ_SECT_DATA | RTEMS_RTL_OBJ_SECT_BSS |
  RTEMS_RTL_OBJ_SECT_EXEC;

A problem arises if a relocation changes code outside the list of sections of the given object. So if you load an object, and the symbols added by that object result in relocation of code in another object which is already loaded and for which rtems_rtl_obj_synchronize_cache() has already been called, then there is a problem. So if there can be relocations which go outside of the current object's section list, then these code updates have to be flushed as well.

So when I think more about the patch, it is possible to revert it, use the original flush after section loads, but add a flush in each relocation operation that modifies code. Even if such an approach is chosen as the next step, I would suggest leaving rtems_rtl_obj_synchronize_cache() there as an alternative option. So in the end, I am not so sure now that I think about it more after pushing the code. On the other hand, even if a flush is done after each relocation record, a final flush of the newly loaded object by sections (not only text, data and bss) may still be better, so that long alignment gaps or nonstandard layouts are flushed correctly.

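The per-relocation alternative mentioned above could be sketched roughly as follows. This is a host-side model, not libdl code: demo_obj, demo_reloc_write(), demo_sync_dirty() and demo_sync_range() are hypothetical names, and demo_sync_range() only stands in for a call to rtems_cache_instruction_sync_after_code_change(). The idea is that every relocation write records the touched range in the object it actually modifies, so an object patched after its own load is synchronized again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-object bookkeeping of code modified by relocations. */
typedef struct {
  uintptr_t dirty_lo;   /* first modified address */
  uintptr_t dirty_hi;   /* one past the last modified address;
                           dirty_lo == dirty_hi means nothing is pending */
  unsigned  sync_calls; /* how often this object was synchronized */
} demo_obj;

/* Stand-in for rtems_cache_instruction_sync_after_code_change(). */
static void demo_sync_range(const void *addr, size_t n_bytes)
{
  (void) addr;
  (void) n_bytes;
}

/* Apply one relocation and remember the range in the object that
 * owns the patched word, which may not be the object being loaded. */
static void demo_reloc_write(demo_obj *target, uint32_t *where, uint32_t value)
{
  uintptr_t lo = (uintptr_t) where;
  uintptr_t hi = lo + sizeof(*where);

  assert(target != NULL && where != NULL);
  *where = value;

  if (target->dirty_lo == target->dirty_hi) {
    target->dirty_lo = lo;
    target->dirty_hi = hi;
  } else {
    if (lo < target->dirty_lo) {
      target->dirty_lo = lo;
    }
    if (hi > target->dirty_hi) {
      target->dirty_hi = hi;
    }
  }
}

/* Synchronize every object with pending modifications, then clear
 * the pending range. */
static void demo_sync_dirty(demo_obj *objs, size_t count)
{
  size_t i;

  for (i = 0; i < count; ++i) {
    if (objs[i].dirty_lo != objs[i].dirty_hi) {
      demo_sync_range(
        (const void *) objs[i].dirty_lo,
        (size_t) (objs[i].dirty_hi - objs[i].dirty_lo)
      );
      ++objs[i].sync_calls;
      objs[i].dirty_lo = 0;
      objs[i].dirty_hi = 0;
    }
  }
}
```

In this model, loading a second object that patches a word in the first one marks the first object dirty again, so a following demo_sync_dirty() call covers both. Tracking one merged range per object is only one possible trade-off; flushing directly inside each relocation write would be simpler but would issue many more cache operations.
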
But it is possible that some more flushes need to be added if loading a new object can lead to an update in a previous one, or if mutual object dependencies are resolved in such a way that one object is loaded, rtems_rtl_obj_synchronize_cache() is called, unresolved symbols are left open, then the next object is loaded, the previously missing symbols are resolved, the relocation of the previous object's code is updated, the relocation of the second object is done, and rtems_rtl_obj_synchronize_cache() is then called only for the second object. This would work OK on Cortex-A7 with the SMP extension, but it is not generally correct with the current implementation, if the described scenario is possible in RTL at all.

Best wishes,

Pavel