> While testing it I noticed that the final executable
> is larger with your patch than with mine.  Here are the sizes of the
> bare-metal executables I created using the same flags I sent you
> earlier, the first has no switch optimization, the second one uses my
> plugin optimization, and the third uses your latest patch.  I haven't
> looked into why the size difference for your patch and mine exists, do
> you see a size difference on your platforms? 

Yes, I do, but after playing around with it, this seems very dependent
on pass ordering.

I've built various arm-none-eabi compilers to test with:

  clean: a compiler without path threading.
  steve.pass: your original pass patch.
  james: my patch (which is called from within the vrp and dom passes).

  steve.after-vrp1: moves your pass to immediately after the first call
    to vrp

  steve.before, steve.after, steve.after-vrp-before-dom,
  steve.before-vrp-after-dom: run your pass immediately before or after
    both vrp and both dom passes.

  james.ch: my patch, plus rerunning pass_ch after dom1.

Then, building with flags:

  -finline-limit=1000 -funroll-all-loops
  -finline-functions [[-ftree-switch-shortcut]] -O3 -mthumb

And passing the resulting binary through:

$ arm-none-eabi-strip blob.*

I see:

$ size blob.arm.* | sort -n

   text    data     bss     dec     hex filename
  53984    2548     296   56828    ddfc ../blobs/blob.arm.clean
  54464    2548     296   57308    dfdc ../blobs/blob.arm.steve.pass
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after-vrp-before-dom
  54504    2548     296   57348    e004 ../blobs/blob.arm.james.ch
  54504    2548     296   57348    e004 ../blobs/blob.arm.steve.only-after-vrp1
  54656    2548     296   57500    e09c ../blobs/blob.arm.james
  54704    2548     296   57548    e0cc ../blobs/blob.arm.steve.before-vrp-after-dom
  54736    2548     296   57580    e0ec ../blobs/blob.arm.steve.before
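To put the numbers in perspective, here is a small Python sketch (not part of either patch) that takes the text column from the size output and computes the growth of each variant over the clean compiler. The sizes are copied by hand from the table; the helper name growth_vs_baseline is hypothetical.

```python
# .text sizes in bytes, copied by hand from the `size` output above.
TEXT_SIZES = {
    "clean": 53984,
    "steve.pass": 54464,
    "steve.after": 54496,
    "steve.after-vrp-before-dom": 54496,
    "james.ch": 54504,
    "steve.only-after-vrp1": 54504,
    "james": 54656,
    "steve.before-vrp-after-dom": 54704,
    "steve.before": 54736,
}


def growth_vs_baseline(sizes, baseline="clean"):
    """Return (name, delta_bytes, delta_percent) for every non-baseline
    entry, sorted by delta_bytes (ties keep their input order)."""
    base = sizes[baseline]
    rows = [
        (name, size - base, 100.0 * (size - base) / base)
        for name, size in sizes.items()
        if name != baseline
    ]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, delta, pct in growth_vs_baseline(TEXT_SIZES):
        print(f"{name:28s} +{delta:4d} bytes  (+{pct:.2f}%)")
```

Every variant grows .text by between 480 and 752 bytes (roughly 0.9% to 1.4%) over clean, and rerunning pass_ch (james.ch) saves 152 bytes compared with james alone.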

So, to my mind, this is all far too tied up in pass-ordering details to
resolve cleanly. Given that all the threading opportunities for my patch
are found in dom1, and given how fragile the position of dom1 is, there
is not a great deal I can do to change the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that
would be.

I wonder if you or others have any thoughts on what the right thing to
do would be?

> I am not sure if path threading in general is turned off for -Os but it
> probably should be.

I agree: jump threading is enabled at -Os, but path threading should not be.

Thanks,
James
