I have a fix for the testcases: the buffer size just needs to increase from 32 to 1024. With 32, the memcpy is done with vectors, which still allows it to be optimized, just differently and not in the way that is being tested here. I am hoping 1024 is big enough that no other target does a 1024-byte memcpy with a single register load/store.
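For reference, here is a rough sketch of the shape of the test after the change. It is reconstructed from the forwprop1 dump quoted below, not copied from the testsuite, so the real file (including its dg directives) may differ. The aaa() stub, the dest buffer and the BUF_SIZE macro are placeholders added only so the snippet builds on its own.

```c
#include <string.h>

/* Assumption: 1024 is the new size proposed above; the original test used 32. */
#define BUF_SIZE 1024

/* Hypothetical destination storage; the real test only declares aaa(). */
static char dest[BUF_SIZE];

/* Hypothetical stub standing in for the external aaa() seen in the dump. */
void *aaa(void)
{
    return dest;
}

void *bbb(void)
{
    char buf[BUF_SIZE] = "";      /* zero-filled source, as in `buf = ""` */
    void *ret = aaa();

    /* With a 32-byte buffer and -march=x86-64-v3 this copy becomes a single
       vector load/store; the larger size is meant to avoid that. */
    memcpy(ret, buf, BUF_SIZE);
    return ret;
}
```

Whether 1024 bytes is large enough on every target is still only a hope, as said above.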
I will push a fix tomorrow.

Thanks,
Andrew

> -----Original Message-----
> From: Sam James <s...@gentoo.org>
> Sent: Friday, April 18, 2025 5:28 PM
> To: Andrew Pinski (QUIC) <quic_apin...@quicinc.com>
> Cc: gcc-regression@gcc.gnu.org; haochen.ji...@intel.com
> Subject: Re: Regressions on native/master at commit r16-29 vs commit r16-21 on Linux/x86_64
>
> WARNING: This email originated from outside of Qualcomm. Please be wary of any links or attachments, and do not enable macros.
>
> This builder uses --with-arch=native. The (a) difference starts at x86-64-v3:
>
> $ diff -u <(gcc -O2 -fdump-tree-forwprop1-details=- -O2 gcc.dg/pr78408-3.c -c -march=x86-64-v2) <(gcc -O2 -fdump-tree-forwprop1-details=- -O2 gcc.dg/pr78408-3.c -c -march=x86-64-v3)
> --- /dev/fd/63 2025-04-19 01:27:31.676852279 +0100
> +++ /dev/fd/62 2025-04-19 01:27:31.651851999 +0100
> @@ -1,15 +1,17 @@
>
> -;; Function bbb (bbb, funcdef_no=0, decl_uid=2939, cgraph_uid=1, symbol_order=0)
> +;; Function bbb (bbb, funcdef_no=0, decl_uid=3312, cgraph_uid=1, symbol_order=0)
>
>  void * bbb ()
>  {
>    char buf[32];
>    void * ret;
> +  vector(32) unsigned char _5;
>
>    <bb 2> :
>    ret_3 = aaa ();
>    buf = "";
> -  MEM <unsigned char[32]> [(char * {ref-all})ret_3] = MEM <unsigned char[32]> [(char * {ref-all})&buf];
> +  _5 = MEM <vector(32) unsigned char> [(char * {ref-all})&buf];
> +  MEM <vector(32) unsigned char> [(char * {ref-all})ret_3] = _5;
>    buf ={v} {CLOBBER(eos)};
>    return ret_3;