On Tue, Apr 30, 2019 at 9:53 PM Jeff Law <[email protected]> wrote:
>
> On 4/30/19 12:34 PM, cmdLP #CODE wrote:
> > Hello GCC-team,
> >
> > I use GCC for all my C and C++ programs. I know how to use GCC, but I am
> > not a contributor to GCC (yet). I often discover some problems C and C++
> > code have in general. There is often the choice between fast or readable
> > code. Some code I have seen performs well but looks ugly (goto, etc.);
> > other code is simple, but has some performance problems. What if we could
> > make the simple code perform well?
> >
> > There is a common problem with conditional branches inside loops. This can
> > decrease the performance of a program. To fix this issue, the conditional
> > branch should be moved outside of the loop. Sometimes this optimization is
> > done by the compiler, but guessing on optimizations done by the compiler is
> > really bad. Often it is not easy to transform the source code to have the
> > conditional branching outside the loop. Instead I propose a new attribute,
> > which forces the compiler to do a conditional branch (based on the
> > annotated parameter) at the beginning of a function. It branches to the
> > corresponding code of the function compiled with the value being constant.
> >
> > Here is a code example, which contains this issue.
> >
> > enum reduce_op
> > {
> > REDUCE_ADD,
> > REDUCE_MULT,
> > REDUCE_MIN,
> > REDUCE_MAX
> > };
> >
> > /* performance critical function */
> > void reduce_data(enum reduce_op reduce,
> > unsigned const* data,
> > unsigned data_size)
> > {
> > unsigned i, result, item;
> >
> > result = reduce == REDUCE_MULT ? 1u
> > : reduce == REDUCE_MIN ? ~0u // ~0u is UINT_MAX
> > : 0u;
> >
> > for(i = 0; i < data_size; ++i)
> > {
> > item = data[i];
> >
> > switch(reduce)
> > {
> > case REDUCE_ADD:
> > result += item;
> > break;
> >
> > case REDUCE_MULT:
> > result *= item;
> > break;
> >
> > case REDUCE_MIN:
> > if(item < result) result = item;
> > // RIP: result <?= item;
> > break;
> >
> > case REDUCE_MAX:
> > if(item > result) result = item;
> > // RIP: result >?= item;
> > break;
> > }
> > }
> >
> > return result;
> > }
> >
> > The value of reduce does not change inside the function. For this
> > example, the optimization is trivial. But consider more complex examples.
> > The function should be optimized to:
> [ .... ]
> This is loop unswitching. It's a standard GCC optimization. If it's
> not working as well as it should, we're far better off determining why
> and fixing the automatic transformation rather than relying on
> attributes to drive the transformation.
It's currently not implemented for switch () stmts, just for conditionals.
This also hurts SPEC cactusADM. There might be a missed-optimization
bug about this. A simple recursive implementation might be possible;
unswitch one case at a time - maybe order by profile probability. We
already recurse on the unswitched bodies (in case multiple conditions
can be unswitched)
>
> > What if the variable changes?
> >
> > When the variable changes there should be a jump to the corresponding
> > parallel code of the compiled code with new value.
> >
> > Unoptimized
> >
> > /* removes comments starting with # and ending in a newline */
> > void remove_comment(char* dst,
> > char const* src)
> > {
> > // initialization nessecary
> > int state __attribute__((early_branch(0, 1))) = 0;
> >
> > char c;
> >
> > while(*src)
> > {
> > c = *src++;
> >
> > switch(state)
> > {
> > case 0:
> > if(c == '#')
> > state = 1;
> > else
> > *dst++ = c;
> >
> > break;
> >
> > case 1:
> > if(c == '\n')
> > {
> > *dst++ = '\n';
> > state = 0;
> > }
> >
> > break;
> > }
> > }
> > *dst = '\0';
> > }
> >
> > changed to
> >
> > void remove_comment(char* dst,
> > char const* src)
> > {
> > char c;
> >
> > switch(0)
> > {
> > case 0:
> > while(*src)
> > {
> > c = *src++;
> > if(c == '#')
> > goto branch1;
> > else
> > *dst++ = c;
> > branch0:
> > }
> > break;
> >
> > case 1:
> > while(*src)
> > {
> > if(*src++ == '\n')
> > {
> > *dst++ = '\n';
> > goto branch0;
> > }
> > branch1:
> > }
> > break;
> > }
> > *dst = '\0';
> > }
> >
> > You see that the state is not tested in each iteration of the loop(s). For
> > complex cases (like parsers), this can improve performance. This attribute
> > is needed to avoid such a goto usage. I have seen such code, which is
> > unmaintainable but performs better than the first solution.
> This is backwards jump threading which was primarily designed to help
> state machines by avoiding evaluation of the switch at each iteration of
> the loop and instead jumping directly to the desired target.
>
> Again, if it's not working for a particular case we're far better off
> figuring out why than relying on attributes to drive the transformation.
>
> jeff
>