https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96461
Bug ID: 96461 Summary: [SVE] Use the HISTCNT instruction for simple histogram loops Product: gcc Version: 11.0 Status: UNCONFIRMED Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: rsandifo at gcc dot gnu.org Blocks: 53947 Target Milestone: --- Target: aarch64*-*-* SVE2 has a HISTCNT instruction that, for each element, counts the number of matching elements in that lane and previous lanes. It allows things like: void update (uint32_t *histogram, uint32_t *records, uint32_t num_records) { for (uint32_t i = 0; i < num_records; i++) histogram[records[i]] += 1; } to be vectorised using a gather load, HISTCNT and scatter store. The instruction is intended for loops that do more than the bare minimum above, but as far as autovec goes, we need to start somewhere. This PR is therefore about recognising the: histogram[…] += 1; gather/HISTCNT/scatter pattern and generating something like the asm below for the function above: mov x3, #0 whilelo p0.s, xzr, x2 .loop ld1w z1.s, p0/z, [x1, x3, lsl #2] ld1w z2.s, p0/z, [x0, z1.s, uxtw #2] histcnt z0.s, p0/z, z1.s, z1.s add z2.s, p0/m, z2.s, z0.s st1w z2.s, p0, [x0, z1.s, uxtw #2] incw x3 whilelo p0.s, x3, x2 b.first .loop ret Once that's done we could move on to fancier cases. Referenced Bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947 [Bug 53947] [meta-bug] vectorizer missed-optimizations