First: my apologies for the delay in this reply.
[Richard wrote:]
Well, but we do have a pretty strong if-converter on RTL
> which has access to target specific information.
Yes, I have had a quick look at it. It looks quite thorough.
I think I see that you [Richard] are implying that the if converter
at the GIMPLE level should not be trying to do all the if-conversion
work that could possibly be done. I agree with that. However,
AFAIK the RTL work is done strictly after the autovectorization,
so any if conversion that is strictly for the benefit of
autovectorization must be done before autovectorization
and therefor at the GIMPLE level. Corrections are welcome.
[Abe wrote:]
The preceding makes me wonder: has anybody considered adding an optimization
profile for GCC, […] that optimizes for the amount of energy consumed?
>> I don`t remember reading about anything like that […]
[Richard wrote:]
I think there were GCC summit papers/talks about this.
Thanks, but can you please be more specific?
After writing the message quotes above as "[Abe wrote:]"
I found 2 or 3 papers about compiling code with an eye
towards energy efficiency, but not a whole hell of a lot,
and I didn`t yet find anything GCC-specific on this topic.
[Abe wrote:]
The old one can, in some cases, produce code that
e.g. dereferences a null pointer when the same
program given the same inputs would have not
>> done so without the if-conversion "optimization".
[Richard wrote:]
Testcase? I don't think it can and if it can this bug needs to be fixed.
With the program below my sign-off, using stock GCC 4.9.2, even under "-O3"
the code compiles and runs OK, even when using "-ftree-loop-if-convert",
which is I guess what you [Richard] meant by a recent comment to me [Abe].
Sebastian confirmed in person that even the old if converter did things
differently
even to loads when GCC is invoked with the full "-ftree-loop-if-convert-stores"
flag but without "-ftree-loop-if-convert-stores". With the old converter,
compiling with "-ftree-loop-if-convert-stores" yields a program that segfaults
due
to dereferencing a null pointer [it would deref. a _lot_ of them if it _could_
;-)].
[tested using GCC 4.9.2 on both Cygwin (64-bit) for Windows 7, AMD64
compilation,
and Mac OS X 10.6.8 -- also using GCC 4.9.2 -- compiling for both ia32 and
AMD64]
I intend to adapt the test case to DejaGNU format and add
it to the codebase from which the patch is being generated.
[Abe wrote:]
The new converter reduces/eliminates this problem.
[Richard wrote:]
You mean the -ftree-loop-if-convert-stores path.
The old converter apparently only produced code with the aforementioned
crashing problem only when "-ftree-loop-if-convert-stores" is/was in use, yes.
The new one should not be producing code with that problem
regardless of "-ftree-loop-if-convert-stores" or lack thereof.
I think the reason for the confusing ambiguity is that since the old converter
did conversion of stores in a way that was thread-unsafe for half hammocks
[e.g. C source code like "if (condition) A[a] = something;" with no attached
"else", assuming "condition" and "something" are both conversion-friendly],
somebody used "-ftree-loop-if-convert-stores" to mean "if-convert as much
as possible even if doing so requires converting unsafely". In the new
converter we have no such unsafety TTBOMK, so I propose to remove that flag.
Regards,
Abe
Makefile
all: foo___if-converted foo___if-converted_with_stores_flag
foo___NOT_if-converted
foo___if-converted_with_stores_flag: foo.c
gcc -std=c99 -O3 -ftree-loop-if-convert-stores foo.c -o
foo___if-converted_with_stores_flag
foo___if-converted: foo.c
gcc -std=c99 -O3 -ftree-loop-if-convertfoo.c -o
foo___if-converted
foo___NOT_if-converted: foo.c
gcc -std=c99 -O3 foo.c -o
foo___NOT_if-converted
foo.c
=
/* intentionally not defining "SIZE" here:
pretending that "choose" is compiled separately from "main"
*/
/* a "controlled copy-paste" that takes 4 arrays, 2 of which are full of
pointers,
and for each index deref.s one of the 2 pointers and shoves the result in one
of the arrays, based on the value in the respective index of the remaining
array
*/
void __attribute__((noinline)) /* forcing no inlining b/c inlining
theoretically allow
sufficient analysis to allow the optimizer to
"see"
that "cond_array[...]" is full of nothing but
0s
*/
choose(char* cond_array,
short ** pointer_array_1,
short ** pointer_array_2,
short * output_array,
unsigned long long len) {
for (unsigned long long index = 0; index < len; ++index)
if (cond_array[index])
output_array[index] = *pointer_array_1[index];
else
output_array[index] = *pointer_array_2[index];
}