On Wed, Jan 02, 2019 at 04:31:49PM +0100, Thomas Koenig wrote:
> Hello world,
>
> somebody fixed PR 48543 for us, so I have committed the
> attached test case as obvious and closed the PR.  Thanks
> to whoever did this!

Nobody actually fixed this and the testcase fails e.g. on i686-linux.
The reason why you don't see the 'Supercalifragilis' string on x86_64
is that the expansion decides to emit the memcpy of the first string
using move_by_pieces and it emits:
        movaps  .LC3(%rip), %xmm0
        movb    $115, 29(%rsp)
...
        movups  %xmm0, 13(%rsp)
...
.LC3:
        .quad   7809632571816441171
        .quad   7596562564204619369
so the Supercalifragilis is there, but not in a greppable form.
The above two numbers are in hex
0x6c61637265707553
0x696c696761726669
where you can already see the ASCII characters (little-endian, so the
bytes of the first one read "Supercal" and of the second "ifragili").

The reason why the middle-end doesn't and can't do anything about these
is that the Fortran FE is representing those to the middle-end as
zero-terminated strings.  For those it can only do tail merging, but even
that is deferred to the linker, which does this for SHF_MERGE | SHF_STRINGS
sections.  So, e.g. C "foobar" can be merged with "blahblahfoobar", by
replacing relocations against "foobar" with "blahblahfoobar" + 8.

Fortran doesn't have zero-terminated strings, so the FE could do arbitrary
substring merging if it processes all functions, e.g. through FE
optimizations, before actually handing them over to the middle-end (at which
point it would turn them into zero-terminated strings).  The linker doesn't
offer anything that could help here though, so it would still be limited
to a single TU only.

Another question is how to implement it efficiently.  For tail merging
ld.bfd and ld.gold have an algorithm, but for the general substring merging,
would you e.g. use a hash table hashed by each 4? consecutive characters
to filter out strings that certainly do not contain that substring
(plus a length check), and then use say the Boyer-Moore algorithm to search
for the substring in the strings that could contain it?  Perhaps also a
hash table based on each present character for shorter strings.
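
To make the filtering idea concrete, here is a minimal sketch in Python
(not GCC code, and not how ld does tail merging) of a constant pool that
filters candidates through a hash table keyed by 4 consecutive characters
and then searches the survivors.  StringPool, NGRAM, lookup and add are
made-up names for illustration only, bytes.find merely stands in for
Boyer-Moore, and the window size of 4 is just the guess from above.  It only
finds a new string inside strings that are already stored.

```python
from collections import defaultdict

NGRAM = 4  # hypothetical window size; the "4?" above is only a guess


class StringPool:
    """Toy constant pool that reuses substrings of already-stored strings."""

    def __init__(self):
        self.strings = []              # stored byte strings, no terminators
        self.index = defaultdict(set)  # n-gram -> indices of strings containing it

    def _candidates(self, needle):
        if len(needle) < NGRAM:
            # Too short for the n-gram filter: scan every stored string.
            return range(len(self.strings))
        # A string containing needle must also contain needle's first n-gram.
        return sorted(self.index.get(needle[:NGRAM], ()))

    def lookup(self, needle):
        """Return (string_index, offset) if needle occurs in the pool, else None."""
        for i in self._candidates(needle):
            hay = self.strings[i]
            if len(hay) < len(needle):  # length check
                continue
            off = hay.find(needle)      # stand-in for a Boyer-Moore search
            if off >= 0:
                return i, off
        return None

    def add(self, s):
        """Store s unless it is already a substring; return (string_index, offset)."""
        hit = self.lookup(s)
        if hit is not None:
            return hit
        self.strings.append(s)
        i = len(self.strings) - 1
        for k in range(len(s) - NGRAM + 1):
            self.index[s[k:k + NGRAM]].add(i)
        return i, 0


pool = StringPool()
long_ref = pool.add(b"Supercalifragilisticexpialidocious")
short_ref = pool.add(b"Supercalifragilis")
tail_ref = pool.add(b"docious")
```

With the two strings from the testcase entered longest first, short_ref
refers to the already-stored long string at offset 0, so no second copy is
kept in the pool.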

Another thing is that it should be independent of the order in which it is
looked up, so if you first enter "foo" and then "foobar", it should still
point "foo" at "foobar" + 0 ("foo" being a prefix of "foobar").  And, you
might consider even partial overlaps, like if the source refers to "foobar"
and "barbaz", you could emit "foobarbaz" and point "barbaz" at
"foobarbaz" + 3.

Another thing to consider is string alignment.  Some targets make sure
string constants have higher alignment so that e.g. memcpy etc. from those
can be vectorized or implemented efficiently using aligned multi-byte
reads/stores; by doing arbitrary string merging, more strings might end up
not aligned.

In any case, IMHO the test should be removed or XFAILed for now.

> 2019-01-02  Thomas Koenig  <tkoe...@gcc.gnu.org>
>
>         PR fortran/48543
>         * gfortran.dg/const_chararacter_merge.f90: New test.
>

> ! { dg-do compile }
> ! { dg-options "-Os" }
> ! PR 48543
> program main
>   character(len=17) :: a
>   character(len=34) :: b
>   a = 'Supercalifragilis'
>   b = 'Supercalifragilisticexpialidocious'
>   print *,a," ",b
> end program main
> ! { dg-final { scan-assembler-times "Supercalifragilis" 1 } }

        Jakub