On 30/05/2019 07:27, Alex Bennée wrote:
>
> Hi,
>
> Food for thought for today's sync up. I've been writing QEMU plugins to
> exercise the plugin system and see what sort of useful information you
> can extract when you can control the instruction stream.
>
> For example I now have a plugin that can break down instruction counts
> for any given run, for example a kernel boot:
>
> Instruction Classes:
> Class: UDEF not counted
> Class: SVE (68 hits)
> Class: Reserved (0 hits)
> Class: PCrel addr (4589078 hits)
> Class: Add/Sub (imm,tags) (0 hits)
> Class: Add/Sub (imm) (26832113 hits)
> Class: Logical (imm) (74304974 hits)
> Class: Move Wide (imm) (10933759 hits)
> Class: Bitfield (71470957 hits)
> Class: Extract (85655 hits)
> Class: Data Proc Imm (0 hits)
> Class: Cond Branch (imm) (37227632 hits)
> Class: Exception Gen (6 hits)
> Class: NOP not counted
> Class: Hints (244825554 hits)
> Class: Barriers (1668558 hits)
> Class: PSTATE (202144 hits)
> Class: System Insn (7132992 hits)
> Class: System Reg (2268308 hits)
> Class: Branch (reg) (6280976 hits)
> Class: Branch (imm) (18347905 hits)
> Class: Cmp & Branch (180167025 hits)
> Class: Tst & Branch (4092972 hits)
> Class: Branches (0 hits)
> Class: AdvSimd ldstmult (0 hits)
> Class: AdvSimd ldstmult++ (0 hits)
> Class: AdvSimd ldst (0 hits)
> Class: AdvSimd ldst++ (0 hits)
> Class: ldst excl (160861365 hits)
> Class: Prefetch (0 hits)
> Class: Load Reg (lit) (12828544 hits)
> Class: ldst noalloc pair (0 hits)
> Class: ldst pair (60381349 hits)
> Class: ldst reg (0 hits)
> Class: Atomic ldst (0 hits)
> Class: ldst reg (reg off) (0 hits)
> Class: ldst reg (pac) (0 hits)
> Class: ldst reg (imm) (119597941 hits)
> Class: Loads & Stores (0 hits)
> Class: Data Proc Reg (113586343 hits)
> Class: Scalar FP (0 hits)
> Class: Unclassified (0 hits)
>
> You can break down each class to individual instructions. For example
> the Hints are mostly:
>
> Individual Instructions:
> Instr: wfe (132400072 hits) (op=0xd503205f/ Hints)
> Instr: sevl (66433640 hits) (op=0xd50320bf/ Hints)
> Instr: yield (29619246 hits) (op=0xd503203f/ Hints)
> Instr: wfi (2865 hits) (op=0xd503207f/ Hints)
>
> So I'm looking for a similar experiment that would be useful for the
> memory sub-system. When I chatted to Maxim we thought maybe a simplified
> cache line simulator might be useful. The aim wouldn't be to simulate
> what a real cache might do but to be useful say for identifying regions
> of code which might be susceptible to cache line bouncing. So as
> compiler writers what sort of run time memory behaviour would you like
> to track? What sort of information would be useful to extract with such
> a tool?
>
> I'm open to ideas ;-)

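To make the cache line bouncing idea concrete, here is a minimal sketch of the bookkeeping such a tool might do. It is not a QEMU plugin and uses no QEMU API: the BounceTracker class, the on_store() hook, and the 64-byte line size are all assumptions made for illustration, and it only looks at stores (a line counts as "bouncing" when a CPU writes to a line last written by a different CPU). A real plugin would feed it from whatever memory callback the plugin interface ends up providing.

```python
from collections import Counter


class BounceTracker:
    """Hypothetical helper: counts cache-line ownership changes on stores."""

    def __init__(self, line_bytes=64):
        # Assumed line size; real hardware varies.
        self.line_bytes = line_bytes
        self.last_writer = {}             # line index -> cpu that last stored to it
        self.bounces_by_pc = Counter()    # pc of the store that took the line over
        self.bounces_by_line = Counter()  # line index -> number of ownership changes

    def on_store(self, cpu, pc, addr):
        line = addr // self.line_bytes
        prev = self.last_writer.get(line)
        # A bounce is a store to a line whose previous writer was another CPU.
        if prev is not None and prev != cpu:
            self.bounces_by_pc[pc] += 1
            self.bounces_by_line[line] += 1
        self.last_writer[line] = cpu
```

Sorting bounces_by_pc by count at exit would then point at the code regions doing the contended stores, which is the kind of report the original mail asks about.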
Back at IBM, one internal project we used regularly was an instruction
tracer based on an out-of-tree patch to valgrind. The idea was to get a
precise instruction sequence for a specific text segment boundary so we
could load it later on a PowerPC simulator to post-analyse the code
behaviour regarding instruction latency, op-port utilization, CPU
stalls, etc.

Not sure if it would be that useful without a post-analysis tool, but I
think it might be useful for some arch-specific optimization. What do
you think?
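As a rough illustration of the "trace only a given text range" part, the filtering step could look like the sketch below. The record_window() name, the (pc, opcode) event tuples, and the one-line-per-instruction text format are all hypothetical; the actual valgrind patch and the simulator's input format are not described here.

```python
def record_window(events, start, end):
    """Keep only instructions whose pc falls in [start, end).

    events is an iterable of (pc, opcode) pairs in execution order.
    Returns one "pc opcode" text line per retained instruction.
    """
    return [
        f"{pc:#x} {opcode:#010x}"
        for pc, opcode in events
        if start <= pc < end
    ]
```

The resulting lines preserve execution order, so a post-analysis tool could replay them as a straight instruction sequence.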