> While testing it I noticed that the final executable
> is larger with your patch then with mine. Here are the sizes of the
> bare-metal executables I created using the same flags I sent you
> earlier, the first has no switch optimization, the second one uses my
> plugin optimization, and the third uses your latest patch. I haven't
> looked into why the size difference for your patch and mine exists, do
> you see a size difference on your platforms?

Yes I do, but after playing around with it, this seems very dependent on
pass ordering. I've built various arm-none-eabi compilers to test with:

  clean: a compiler without path threading.
  steve.pass: your original pass patch.
  james: my patch (which will be called within the vrp and dom passes).
  steve.only-after-vrp1: moves your pass to immediately after the first
    call to vrp.
  steve.before, steve.after, steve.after-vrp-before-dom,
  steve.before-vrp-after-dom: run your pass immediately before or after
    both vrp and both dom passes.
  james.ch: my patch, rerunning pass_ch after dom1.

Then, building with flags:

  -finline-limit=1000 -funroll-all-loops -finline-functions
  [[-ftree-switch-shortcut]] -O3 -mthumb

And passing the resulting binary through:

  $ arm-none-eabi-strip blob.*

I see:

  $ size blob.arm.* | sort -n
     text    data     bss     dec     hex filename
    53984    2548     296   56828    ddfc ../blobs/blob.arm.clean
    54464    2548     296   57308    dfdc ../blobs/blob.arm.steve.pass
    54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after
    54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after-vrp-before-dom
    54504    2548     296   57348    e004 ../blobs/blob.arm.james.ch
    54504    2548     296   57348    e004 ../blobs/blob.arm.steve.only-after-vrp1
    54656    2548     296   57500    e09c ../blobs/blob.arm.james
    54704    2548     296   57548    e0cc ../blobs/blob.arm.steve.before-vrp-after-dom
    54736    2548     296   57580    e0ec ../blobs/blob.arm.steve.before

So to my mind, this is all far too tied up in pass ordering details to
resolve. Given that all the threading opportunities for my patch are
found in dom1, and given how fragile the positioning of dom1 is, there is
not a great deal I can do to modify the ordering.

The biggest improvement I could find comes from rerunning pass_ch
immediately after dom1, though I'm not sure what the cost of that would
be.

Do you or others have any thoughts on what the right thing to do would
be?

> I am not sure if path threading in general is turned off for -Os but it
> probably should be.

I agree: jump threading is on at -Os, but path threading should not be.

Thanks,
James
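
P.S. For reference, the text-size overhead of each build relative to
blob.arm.clean can be worked out from the size listing in this message.
The following is only a minimal sketch, assuming the `size` output has
been pasted into a string; the helper name text_overhead and the
SIZE_OUTPUT constant are hypothetical and not part of either patch.

```python
# Berkeley-format `size` output, copied from the listing in this message.
SIZE_OUTPUT = """\
   text    data     bss     dec     hex filename
  53984    2548     296   56828    ddfc ../blobs/blob.arm.clean
  54464    2548     296   57308    dfdc ../blobs/blob.arm.steve.pass
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after
  54496    2548     296   57340    dffc ../blobs/blob.arm.steve.after-vrp-before-dom
  54504    2548     296   57348    e004 ../blobs/blob.arm.james.ch
  54504    2548     296   57348    e004 ../blobs/blob.arm.steve.only-after-vrp1
  54656    2548     296   57500    e09c ../blobs/blob.arm.james
  54704    2548     296   57548    e0cc ../blobs/blob.arm.steve.before-vrp-after-dom
  54736    2548     296   57580    e0ec ../blobs/blob.arm.steve.before
"""


def text_overhead(size_output, baseline="clean"):
    """Return {variant: (delta_bytes, delta_percent)} for the text column.

    Hypothetical helper: the variant name is taken to be whatever follows
    "blob.arm." in the file name, matching the naming used above.
    """
    text = {}
    for line in size_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue  # ignore blank or malformed rows
        variant = fields[5].split("blob.arm.", 1)[1]
        text[variant] = int(fields[0])
    base = text[baseline]
    return {v: (t - base, 100.0 * (t - base) / base) for v, t in text.items()}


if __name__ == "__main__":
    # Print variants ordered by how many bytes of text they add.
    rows = sorted(text_overhead(SIZE_OUTPUT).items(), key=lambda kv: kv[1][0])
    for variant, (delta, pct) in rows:
        print(f"{variant:28s} {delta:+5d} bytes  {pct:+.2f}%")
```

On these numbers every threading variant adds between 480 and 752 bytes
of text over clean, which is roughly 0.9% to 1.4%.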