On Wed, Jan 02, 2019 at 04:31:49PM +0100, Thomas Koenig wrote:
> Hello world,
> 
> somebody fixed PR 48543 for us, so I have committed the
> attached test case as obvious and closed the PR.  Thanks
> to whoever did this!

Nobody actually fixed this, and the testcase fails e.g. on i686-linux.
The reason you don't see the 'Supercalifragilis' string on x86_64
is that expansion decides to emit the memcpy of the first string
using move_by_pieces, which emits:
        movaps  .LC3(%rip), %xmm0
        movb    $115, 29(%rsp)
...
        movups  %xmm0, 13(%rsp)
...
.LC3:
        .quad   7809632571816441171
        .quad   7596562564204619369
so the Supercalifragilis is there, but not in a greppable form.
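The above two numbers are, in hex,
0x6c61637265707553
0x696c696761726669
where you can already see the ASCII characters.  x86_64 is little-endian,
so each group reads backwards ("Supercal" and "ifragili"), and the final
's' comes from the movb $115.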
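To double-check that decoding, a small C sketch can rebuild the stored bytes from the .LC3 constants.  It assumes only what the assembly above shows: two little-endian 8-byte constants stored at 13(%rsp), plus the movb of byte 115 at 29(%rsp).  decode_lc3 is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The two .quad values from .LC3, copied from the assembly above.  */
static const uint64_t lc3[2] = { 7809632571816441171ULL,
                                 7596562564204619369ULL };

/* Rebuild the 17 bytes stored at 13(%rsp): 16 bytes from .LC3 via
   movaps/movups, then one byte from "movb $115, 29(%rsp)".  On a
   little-endian target the least significant byte of each quad lands
   at the lowest address.  OUT needs room for 17 chars plus a NUL.  */
static void decode_lc3(char out[18])
{
    for (int q = 0; q < 2; q++)
        for (int b = 0; b < 8; b++)
            out[q * 8 + b] = (char)((lc3[q] >> (8 * b)) & 0xff);
    out[16] = (char)115;            /* $115 is ASCII 's' */
    out[17] = '\0';
    assert(strlen(out) == 17);
}
```

Calling decode_lc3 on an 18-byte buffer yields "Supercalifragilis".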

The reason the middle-end doesn't and can't do anything about these is
that the Fortran FE represents them to the middle-end as
zero-terminated strings.  For those it can only do tail merging, and even
that is deferred to the linker, which does it for SHF_MERGE | SHF_STRINGS
sections.  So, e.g., C "foobar" can be merged with "blahblahfoobar" by
replacing relocations against "foobar" with "blahblahfoobar" + 8.
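As an illustration of what tail merging involves, here is a minimal C sketch of one classic approach: sort the strings by their reversed contents, after which a string that is a suffix of any other is a suffix of its immediate successor in the sorted order.  This is not ld.bfd's or ld.gold's actual code; tail_merge, cmp_rev and is_suffix are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare by reversed contents, so that every string ending in S
   sorts right after S itself.  Elements are pointers into STRS.  */
static int cmp_rev(const void *pa, const void *pb)
{
    const char *a = **(const char *const *const *)pa;
    const char *b = **(const char *const *const *)pb;
    size_t la = strlen(a), lb = strlen(b);

    while (la > 0 && lb > 0) {
        unsigned char ca = (unsigned char)a[--la];
        unsigned char cb = (unsigned char)b[--lb];
        if (ca != cb)
            return ca < cb ? -1 : 1;
    }
    /* One is a suffix of the other: the shorter one sorts first.  */
    return (la > 0) - (lb > 0);
}

static int is_suffix(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    return la <= lb && memcmp(b + lb - la, a, la) == 0;
}

/* On return strs[i] equals strs[target[i]] + offset[i], and
   target[target[i]] == target[i].  Returns -1 on allocation failure.  */
static int tail_merge(const char *const *strs, size_t n,
                      size_t *target, size_t *offset)
{
    const char *const **order = malloc(n * sizeof *order);

    assert(n == 0 || (strs != NULL && target != NULL && offset != NULL));
    if (n > 0 && order == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        order[i] = &strs[i];
    qsort(order, n, sizeof *order, cmp_rev);

    /* Walk backwards so the successor is always resolved first.  */
    for (size_t k = n; k-- > 0;) {
        size_t i = (size_t)(order[k] - strs);
        if (k + 1 < n && is_suffix(strs[i], *order[k + 1])) {
            size_t j = (size_t)(order[k + 1] - strs);
            target[i] = target[j];
            offset[i] = offset[j] + strlen(strs[j]) - strlen(strs[i]);
        } else {
            target[i] = i;
            offset[i] = 0;
        }
    }
    free(order);
    return 0;
}
```

With "foobar", "blahblahfoobar", "bar" and "baz", this redirects "foobar" to "blahblahfoobar" + 8 and "bar" to "blahblahfoobar" + 11, and leaves "baz" alone.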

Fortran doesn't have zero-terminated strings, so the FE could do arbitrary
substring merging if it processed all functions, e.g. as part of FE
optimizations, before actually handing them over to the middle-end, at which
point it would turn them into zero-terminated strings.
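A tiny C sketch of why the Fortran case differs, using the two strings from the testcase: a NUL-terminated string can only reuse another string's storage as a suffix, while a length-counted string can point at any occurrence.  The helper names are hypothetical and the search is deliberately naive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* A NUL-terminated string A can live inside B only as a suffix,
   because A's terminating NUL has to be B's terminating NUL.  */
static bool nul_terminated_can_share(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    return la <= lb && memcmp(b + lb - la, a, la) == 0;
}

/* A length-counted string (as in Fortran) can point at any occurrence
   inside B.  Returns the first offset of A in B, or -1.  */
static long counted_find(const char *a, size_t la, const char *b, size_t lb)
{
    assert(a != NULL && b != NULL);
    if (la > lb)
        return -1;
    for (size_t i = 0; i + la <= lb; i++)
        if (memcmp(b + i, a, la) == 0)
            return (long)i;
    return -1;
}
```

'Supercalifragilis' is a prefix, not a suffix, of 'Supercalifragilisticexpialidocious', so only the length-counted representation lets it share storage (at offset 0).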
The linker doesn't offer anything that could help here, though, so it would
still be limited to a single TU.  Another question is how to implement it
efficiently.  For tail merging, ld.bfd and ld.gold each have an algorithm, but
for general substring merging, would you e.g. use a hash table keyed by
each run of (say) 4 consecutive characters to filter out strings that
certainly do not contain a given substring (plus a length check), and then
use, say, the Boyer-Moore algorithm to search for the substring in the
strings that could contain it?  Perhaps also a hash table keyed by each
character present, for shorter strings.
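One way the filter-then-search idea could look, as a C sketch under several assumptions: a 4-byte window (the "4?" above), a fixed-size bit set per candidate string instead of a real hash table, FNV-1a as an arbitrary hash, and the Horspool simplification of Boyer-Moore, chosen for brevity.  All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define GRAM 4            /* window length; an assumption */
#define FILTER_BITS 4096  /* hypothetical filter size */

/* FNV-1a over GRAM bytes, reduced to a bit index.  */
static unsigned gram_hash(const unsigned char *p)
{
    unsigned h = 2166136261u;
    for (int i = 0; i < GRAM; i++)
        h = (h ^ p[i]) * 16777619u;
    return h % FILTER_BITS;
}

/* Bit set recording the hash of every GRAM-byte window of a string.  */
struct gram_filter {
    unsigned char bits[FILTER_BITS / 8];
};

static void filter_build(struct gram_filter *f, const char *s, size_t n)
{
    memset(f->bits, 0, sizeof f->bits);
    for (size_t i = 0; i + GRAM <= n; i++) {
        unsigned h = gram_hash((const unsigned char *)s + i);
        f->bits[h / 8] |= (unsigned char)(1u << (h % 8));
    }
}

/* false: NEEDLE is certainly not in the string F was built from.
   true: it might be (collisions cause false positives only).  */
static bool filter_may_contain(const struct gram_filter *f, size_t hay_len,
                               const char *needle, size_t n)
{
    if (n > hay_len)
        return false;                 /* the length check */
    for (size_t i = 0; i + GRAM <= n; i++) {
        unsigned h = gram_hash((const unsigned char *)needle + i);
        if (!(f->bits[h / 8] & (1u << (h % 8))))
            return false;
    }
    return true;  /* needles shorter than GRAM are never rejected */
}

/* Boyer-Moore-Horspool: first offset of NEEDLE in HAY, or -1.  */
static long horspool(const char *hay, size_t hn, const char *needle, size_t nn)
{
    size_t shift[256];

    if (nn == 0)
        return 0;
    if (nn > hn)
        return -1;
    for (size_t c = 0; c < 256; c++)
        shift[c] = nn;
    for (size_t i = 0; i + 1 < nn; i++)
        shift[(unsigned char)needle[i]] = nn - 1 - i;
    for (size_t pos = 0; pos + nn <= hn;
         pos += shift[(unsigned char)hay[pos + nn - 1]])
        if (memcmp(hay + pos, needle, nn) == 0)
            return (long)pos;
    return -1;
}

/* Filter first; search only where the filter allows a match.  */
static long find_in_candidate(const struct gram_filter *f, const char *hay,
                              size_t hn, const char *needle, size_t nn)
{
    assert(f != NULL && hay != NULL && needle != NULL);
    if (!filter_may_contain(f, hn, needle, nn))
        return -1;
    return horspool(hay, hn, needle, nn);
}
```

A negative filter answer is definitive; a positive one only means the search is worth running.  Needles shorter than the window always pass the filter, which is where a per-character table for shorter strings would come in.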
Another thing is that it should be independent of the order in which strings
are looked up, so if you first enter "bar" and then "foobar", it should still
point "bar" to "foobar" + 3.  And you might even consider partial overlaps:
if the source refers to "foobar" and "barbaz", you could emit
"foobarbaz" and point "barbaz" at "foobarbaz" + 3.
Another thing to consider is string alignment: some targets give string
constants higher alignment so that e.g. memcpy from them can be
vectorized or implemented efficiently using aligned multi-byte loads/stores,
and with arbitrary string merging more strings might end up misaligned.
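The partial-overlap idea reduces to finding the longest suffix of one string that is also a prefix of the other.  A minimal C sketch for a single pair follows; max_overlap and merge_overlap are hypothetical names.  A pass over all strings of a TU would need a heuristic such as greedy pairwise merging, since finding the shortest common superstring is NP-hard in general, and order independence would come from collecting every string before assigning any offsets.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest suffix of A that is also a prefix of B.  */
static size_t max_overlap(const char *a, size_t la, const char *b, size_t lb)
{
    size_t k = la < lb ? la : lb;
    while (k > 0 && memcmp(a + la - k, b, k) != 0)
        k--;
    return k;
}

/* Build A followed by the part of B that does not overlap A.  The
   result is NUL-terminated only so it can be printed; *B_OFFSET gets
   the position of B inside it.  The caller frees the result.  */
static char *merge_overlap(const char *a, const char *b, size_t *b_offset)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t k = max_overlap(a, la, b, lb);
    char *out = malloc(la + lb - k + 1);

    assert(b_offset != NULL);
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b + k, lb - k);
    out[la + lb - k] = '\0';
    *b_offset = la - k;
    return out;
}
```

For "foobar" and "barbaz" this yields "foobarbaz" with "barbaz" at offset 3, which is exactly the kind of odd offset that would defeat a target wanting, say, 16-byte aligned string constants.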

In any case, IMHO the test should be removed or XFAILed for now.

> 2019-01-02  Thomas Koenig  <tkoe...@gcc.gnu.org>
> 
>       PR fortran/48543
>       * gfortran.dg/const_chararacter_merge.f90: New test.

> ! { dg-do compile }
> ! { dg-options "-Os" }
> ! PR 48543
> program main
>   character(len=17) :: a
>   character(len=34) :: b
>   a = 'Supercalifragilis'
>   b = 'Supercalifragilisticexpialidocious'
>   print *,a," ",b
> end program main
> ! { dg-final { scan-assembler-times "Supercalifragilis" 1 } }


        Jakub
