https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115863

--- Comment #9 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Li Pan from comment #8)

> Thanks Richard.
> Yes, the .SAT_TRUNC doesn't pay any attention the other possible use of
> MIN_EXPR.
> 
> As your suggestion, we may need one additional check here (like
> gimple_unsigned_sat_trunc() && no_other_MIN_EXPR_use_after_sat_trunc_p ())
> before we build the SAT_TRUNC call.
> Sorry I didn't get the point here why we need to do this, could you please
> help to explain a bit more about it? Like wrong code or something else in
> above sample code.

The wrong-code bug is now fixed (it was an x86 target-specific oversight in
the expander), but while fixing the original bug, I noticed that the addition
of the ustrunc{m}{n}2 optab regressed the compress2 loop performance-wise.

Without ustrunc{m}{n}2, the loop in compress2 looks like:

  <bb 5> [local count: 536870912]:
  _18 = MIN_EXPR <left_8, 4294967295>;
  iftmp.0_11 = (unsigned int) _18;
  stream.avail_out = iftmp.0_11;
  left_37 = left_8 - _18;

and when ustrunc{m}{n}2 is present in i386.md:

  <bb 5> [local count: 536870912]:
  _45 = MIN_EXPR <left_8, 4294967295>;
  iftmp.0_11 = .SAT_TRUNC (left_8);
  stream.avail_out = iftmp.0_11;
  left_37 = left_8 - _45;

In the first case, iftmp.0_11 is calculated with a simple truncation from
unsigned long to unsigned int (i.e. "mov %eax, %edx" on x86). In the second
case, it uses the .SAT_TRUNC optab, which on x86 expands to a sequence of
complex instructions. Performance-wise, it is universally better to have a
"normal" truncation after MIN_EXPR than a saturating .SAT_TRUNC truncation.
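
To illustrate why the plain truncation is sufficient here, below is a minimal
C sketch of the two GIMPLE forms above. It is not the zlib source or GCC
code: the function names (sat_trunc_u64_u32, min_then_trunc,
min_and_sat_trunc) are hypothetical, and uint64_t/uint32_t stand in for
unsigned long/unsigned int as on x86-64. Both forms return the same values;
the only difference is that the second one computes a saturating truncation
in addition to the MIN it still needs for the subtraction.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned saturating truncation from 64 to 32 bits,
   modelling what .SAT_TRUNC computes for an unsigned operand. */
static uint32_t sat_trunc_u64_u32(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* First form: MIN_EXPR followed by a plain truncation. */
static uint32_t min_then_trunc(uint64_t left, uint64_t *left_out)
{
    uint64_t m = left < UINT32_MAX ? left : UINT32_MAX; /* _18 */
    uint32_t avail = (uint32_t)m;                       /* iftmp.0_11 */
    *left_out = left - m;                               /* left_37 */
    return avail;
}

/* Second form: the MIN_EXPR is kept for the subtraction, and the
   truncated value is computed separately with a saturating truncation. */
static uint32_t min_and_sat_trunc(uint64_t left, uint64_t *left_out)
{
    uint64_t m = left < UINT32_MAX ? left : UINT32_MAX; /* _45 */
    uint32_t avail = sat_trunc_u64_u32(left);           /* iftmp.0_11 */
    *left_out = left - m;                               /* left_37 */
    return avail;
}
```

Since the MIN result is already clamped to the 32-bit range, truncating it
cannot lose bits, so the first form gets the saturated value for free.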
