>
> Btw, the bswap pass enhancements that are currently in review may
> also be an opportunity to catch these.  They can merge adjacent
> loads that are used "composed" (but not yet composed by storing
> into adjacent memory).  The basic-block vectorizer should also
> handle this (if the composition happens to be by storing into
> adjacent memory) - of course it needs vector modes available and
> it has to be enabled.
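
To make the two cases in the quoted text concrete, here is a minimal C sketch. The names (load_le32, copy_pair, struct pair) are hypothetical and do not come from the patch. Whether a given compiler actually merges these accesses depends on the target, the enabled passes, and alignment, so treat this as an illustration of the source patterns only.

```c
#include <stdint.h>

/* Hypothetical example: four adjacent byte loads that are "composed"
   into one 32-bit little-endian value with shifts and ORs.  This is
   the sort of pattern a load-merging pass could turn into a single
   word load.  */
static uint32_t
load_le32 (const unsigned char *p)
{
  return (uint32_t) p[0]
         | ((uint32_t) p[1] << 8)
         | ((uint32_t) p[2] << 16)
         | ((uint32_t) p[3] << 24);
}

/* Hypothetical example: two adjacent word loads followed by two
   adjacent word stores.  Here the composition happens by storing
   into adjacent memory, which makes the accesses candidates for a
   paired load/store or for a vector load/store.  */
struct pair
{
  uint32_t lo;
  uint32_t hi;
};

static void
copy_pair (struct pair *dst, const struct pair *src)
{
  uint32_t a = src->lo;
  uint32_t b = src->hi;
  dst->lo = a;
  dst->hi = b;
}
```

In the second function the values only pass through registers on their way back to memory, so the register file chosen for the merged access matters less than it would if the loaded values were also used in scalar arithmetic.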

Should we really do it there?  If we start merging multiple loads and
stores into the vector register set, then on some architectures or
microarchitectures the cost of moving values between the vector
register set and the general purpose register set might be too high
for some operations, and then we get into all sorts of issues.

I think there is merit in an RTL pass.

Ramana

>
> Richard.
>
>> Thanks,
>> bin
>>>
>>>>
>>>> So, any comments about this?
>>>>
>>>> Thanks,
>>>> bin
>>>>
>>>>
>>>> 2014-05-15  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>     * common.opt (flag_merge_paired_loadstore): New option.
>>>>     * merge-paired-loadstore.c: New file.
>>>>     * Makefile.in: Support new file.
>>>>     * config/arm/arm.c (TARGET_MERGE_PAIRED_LOADSTORE): New macro.
>>>>     (load_latency_expanded_p, arm_merge_paired_loadstore): New function.
>>>>     * params.def (PARAM_MAX_MERGE_PAIRED_LOADSTORE_DISTANCE): New param.
>>>>     * doc/invoke.texi (-fmerge-paired-loadstore): New.
>>>>     (max-merge-paired-loadstore-distance): New.
>>>>     * doc/tm.texi.in (TARGET_MERGE_PAIRED_LOADSTORE): New.
>>>>     * doc/tm.texi: Regenerated.
>>>>     * target.def (merge_paired_loadstore): New.
>>>>     * tree-pass.h (make_pass_merge_paired_loadstore): New decl.
>>>>     * passes.def (pass_merge_paired_loadstore): New pass.
>>>>     * timevar.def (TV_MERGE_PAIRED_LOADSTORE): New time var.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>> 2014-05-15  Bin Cheng  <bin.ch...@arm.com>
>>>>
>>>>     * gcc.target/arm/merge-paired-loadstore.c: New test.
>>>>
>>>> <merge-paired-loadstore-20140515.txt>
>>
>> --
>> Best Regards.