On 1/26/2021 10:37 AM, Peter Zijlstra wrote:
On Tue, Jan 19, 2021 at 12:38:22PM -0800, [email protected] wrote:@@ -1577,9 +1668,20 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event, }if (format_size & PEBS_DATACFG_MEMINFO) {+ if (sample_type & PERF_SAMPLE_WEIGHT) { + u64 weight = meminfo->latency; + + if (x86_pmu.flags & PMU_FL_INSTR_LATENCY) + weight >>= PEBS_CACHE_LATENCY_OFFSET; + data->weight = weight & PEBS_LATENCY_MASK ?: intel_get_tsx_weight(meminfo->tsx_tuning); + } + + if (sample_type & PERF_SAMPLE_WEIGHT_EXT) { + data->weight_ext.val = 0; + if (x86_pmu.flags & PMU_FL_INSTR_LATENCY) + data->weight_ext.instr_latency = meminfo->latency & PEBS_LATENCY_MASK; + }if (sample_type & PERF_SAMPLE_DATA_SRC)data->data_src.val = get_data_src(event, meminfo->aux);Talk to me about that SAMPLE_WEIGHT stuff.... I'm not liking it. Sure you want multiple dimensions, but urgh. Also, afaict, as proposed you're wasting 80/128 bits. That is, all data you want to export fits in a single u64 and yet you're using two, which is mighty daft. Sure, pebs::lat / pebs_meminfo::latency is defined as a u64, but you can't tell me that that is ever actually more than 4G cycles. Even the TSX block latency is u32. So how about defining SAMPLE_WEIGHT_STRUCT which uses the exact same data as SAMPLE_WEIGHT but unions it with a struct. I'm not sure if we want: union sample_weight { u64 weight; struct { u32 low_dword; u32 high_dword; }; /* or */ struct { u32 low_dword; u16 high_word; u16 higher_word; }; }; Then have the core code enforce SAMPLE_WEIGHT ^ SAMPLE_WEIGHT_STRUCT and make the existing code never set the high dword.
So the kernel will only accept either SAMPLE_WEIGHT type or SAMPLE_WEIGHT_STRUCT type. It should error out if both types are set, right?
I will check if u32 is enough for meminfo::latency on the previous platforms.
Thanks, Kan

