On Mon, Oct 2, 2023 at 7:15 PM Hanke Zhang via Gcc <[email protected]> wrote:
>
> Martin Jambor <[email protected]> wrote on Tue, Oct 3, 2023 at 00:34:
> >
> > Hello,
> >
> > On Mon, Oct 02 2023, Hanke Zhang via Gcc wrote:
> > > Hi, I have some questions about the strategy and behavior of function
> > > splitting in gcc. Consider the following code:
> > >
> > > int glob;
> > > void f() {
> > >   if (glob) {
> > >     printf("short path\n");
> > >     return;
> > >   }
> > >   // do lots of expensive things
> > >   // ...
> > > }
> > >
> > > I hope it can be split as shown below, so that f itself can perhaps
> > > be inlined, which is more efficient.
> > >
> > > int glob;
> > > void f() {
> > >   if (glob) {
> > >     printf("short path\n");
> > >     return;
> > >   }
> > >   f_part();
> > > }
> > >
> > > void f_part() {
> > >   // do lots of expensive things
> > >   // ...
> > > }
> > >
> > >
> > > On the contrary, gcc splits it like this, which not only brings no
> > > benefit but may also increase run time, because the function call
> > > itself has a cost.
> > >
> > > int glob;
> > > void f() {
> > >   if (glob) {
> > >     f_part();
> > >     return;
> > >   }
> > >   // do lots of expensive things
> > >   // ...
> > > }
> > >
> > > void f_part() {
> > >   printf("short path\n"); // just do this????
> > > }
> > >
> > > Are there any options I can pass to gcc to change this behavior? Or
> > > do I need to make some changes in ipa-split.cc?
> >
> > I'd suggest you file a bug in Bugzilla with a specific example that is
> > mis-handled, then we can have a look and discuss what happens and why,
> > and what can be done about it.
> >
> > Thanks,
> >
> > Martin
>
> Hi, thanks for your reply.
>
> I'm creating an account right now. In the meantime, here is a copy of
> the example code in case someone is interested.
>
> I'm using gcc 12.3.0. If you compile the code below with 'gcc
> test.c -O3 -flto -fdump-tree-fnsplit', the GIMPLE dumped by fnsplit
> shows the behavior I described above.

I think fnsplit currently splits out _cold_ code; I suppose !opstatus
is predicted to be false most of the time.
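
For illustration only: the expected branch direction can also be stated
explicitly with __builtin_expect instead of being left to the heuristics.
A minimal sketch; check_active is a made-up helper that is not part of the
test case, and whether such a hint changes what fnsplit decides here is an
untested assumption.

```c
#include <stdio.h>

int opstatus; /* same global as in the test case */

/* Hypothetical helper, not taken from the test case. */
int check_active(void) {
  /* State that !opstatus is expected to be false (0) most of the time. */
  if (__builtin_expect(!opstatus, 0)) {
    puts("short path");
    return 1;
  }
  return 0;
}
```

The hint only describes which way the branch usually goes; the function
returns the same values with or without it.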

It looks like your intent is to inline this very early check as

  if (!opstatus) { test_split_write_1 (..); } else { test_split_write_2 (..); }

to possibly elide that test?  I would guess that IPA-CP is supposed to
do this but ultimately refuses to create a clone for this case since
the clone would be large.
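
Written out by hand, that split might look roughly like the sketch below.
The names test_split_write_1 and test_split_write_2 are only placeholders,
the body of the expensive part is cut down to a stub, and the noinline
attribute is an assumption made here to keep that part out of line; none
of this is what GCC currently generates.

```c
#include <stdio.h>

int opstatus; /* same global as in the test case */

/* Placeholder name: the short path, which only prints the diagnostic. */
static int test_split_write_1(void) {
  printf("Object code generation not active!\n");
  return 1;
}

/* Placeholder name: the expensive remainder, reduced to a stub here. */
static int __attribute__((noinline)) test_split_write_2(const char *file) {
  FILE *fhd = fopen(file, "w");
  if (fhd == 0)
    return -1;
  fclose(fhd);
  return 0;
}

/* Small wrapper: only the entry check remains, so it is cheap to inline. */
static inline int test_split_write(const char *file) {
  if (!opstatus)
    return test_split_write_1();
  return test_split_write_2(file);
}
```

With this shape a caller that inlines test_split_write pays for one test
of opstatus and one call, and the test can go away entirely wherever the
value of opstatus is known.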

Unfortunately function splitting doesn't run during IPA transforms,
but maybe IPA-CP can be taught to avoid the expensive clone by
performing what IPA split does when a check in the entry block that
splits control flow can be optimized?

Richard.

> #include <stdio.h>
> #include <stdlib.h>
>
> int opstatus;
> unsigned char *objcode = 0;
> unsigned long position = 0;
> char *globalfile;
>
> int test_split_write(char *file) {
>   FILE *fhd;
>
>   if (!opstatus) {
>     // short path here
>     printf("Object code generation not active! Forgot to call "
>            "quantum_objcode_start?\n");
>     return 1;
>   }
>
>   if (!file)
>     file = globalfile;
>
>   fhd = fopen(file, "w");
>
>   if (fhd == 0)
>     return -1;
>
>   fwrite(objcode, position, 1, fhd);
>
>   fclose(fhd);
>
>   // room for 1000 ints, not 1000 bytes
>   int *arr = malloc(1000 * sizeof(int));
>   for (int i = 0; i < 1000; i++) {
>     arr[i] = rand();
>   }
>
>   return 0;
> }
>
> // to avoid `test_split_write` inlining into main
> void __attribute__((noinline)) call() { test_split_write("./txt"); }
>
> int main() {
>   opstatus = rand();
>   objcode = malloc(100);
>   position = 0;
>   call();
>   return 0;
> }