The ThunderX2 chip has an odd property: nested scalar FP min and max
operations are slower than the logically equivalent sequence of compares
and branches.

Here is a patch in which I try to implement that transformation.
Please advise whether the "combine" pass (actually, right after the pass
itself) is the appropriate place to do this.

I considered implementing this in aarch64.md (which would be much
cleaner), but couldn't figure out how to make fmin/fmax survive until
later passes and replace them only then.

-- 
  Thanks,
  Anton

Attachment: 0001-WIP-MIN-to-conditionals-1.patch