aemerson added a comment.
Herald added a subscriber: jplehr.

@dmgreen I've been looking at this test again trying to see what's missing. The 
problem now is that only a VF of 4 is chosen. In the good case, 
instcombine/simplifyCFG runs so that it simplifies down to an `smin` intrinsic. 
After this change `__SSAT()` is inlined first. We then have:

  target datalayout = "e-m:e-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128"
  target triple = "aarch64-linux-gnu"
  
  define void @arm_mult_q15(ptr %pSrcA, ptr %pSrcB, ptr noalias %pDst, i32 
%blockSize) {
  entry:
    br label %while.cond
  
  while.cond:                                       ; preds = %while.body, 
%entry
    %pSrcB.addr.0 = phi ptr [ %pSrcB, %entry ], [ %incdec.ptr1, %while.body ]
    %pDst.addr.0 = phi ptr [ %pDst, %entry ], [ %incdec.ptr4, %while.body ]
    %pSrcA.addr.0 = phi ptr [ %pSrcA, %entry ], [ %incdec.ptr, %while.body ]
    %blkCnt.0 = phi i32 [ %blockSize, %entry ], [ %dec, %while.body ]
    %cmp.not = icmp eq i32 %blkCnt.0, 0
    br i1 %cmp.not, label %while.end, label %while.body
  
  while.body:                                       ; preds = %while.cond
    %incdec.ptr = getelementptr inbounds i16, ptr %pSrcA.addr.0, i32 1
    %0 = load i16, ptr %pSrcA.addr.0, align 2
    %conv = sext i16 %0 to i32
    %incdec.ptr1 = getelementptr inbounds i16, ptr %pSrcB.addr.0, i32 1
    %1 = load i16, ptr %pSrcB.addr.0, align 2
    %conv2 = sext i16 %1 to i32
    %mul = mul nsw i32 %conv, %conv2
    %shr = ashr i32 %mul, 15
    %cmp4.i = icmp sgt i32 %shr, 32767
    %switch.i = icmp ult i1 %cmp4.i, true
    %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767
    %conv3 = trunc i32 %spec.select.i to i16
    %incdec.ptr4 = getelementptr inbounds i16, ptr %pDst.addr.0, i32 1
    store i16 %conv3, ptr %pDst.addr.0, align 2
    %dec = add i32 %blkCnt.0, -1
    br label %while.cond
  
  while.end:                                        ; preds = %while.cond
    ret void
  }

These instructions are from the callee that should now be combined into `smin`:

  %cmp4.i = icmp sgt i32 %shr, 32767
  %switch.i = icmp ult i1 %cmp4.i, true
  %spec.select.i = select i1 %switch.i, i32 %shr, i32 32767

... except due to the surrounding instructions, the first icmp is optimized 
into 
`icmp sgt i32 %mul, 1073741823` by 
`InstCombinerImpl::foldICmpInstWithConstant()`

This breaks the smin recognition. I'm not sure what the best approach is to fix 
this. InstCombine already has this chunk of code to try to avoid messing with 
compares that might form min/max patterns but it expects further simplification 
to fire:

  // Test if the ICmpInst instruction is used exclusively by a select as
  // part of a minimum or maximum operation. If so, refrain from doing
  // any other folding. This helps out other analyses which understand
  // non-obfuscated minimum and maximum idioms, such as ScalarEvolution
  // and CodeGen. And in this case, at least one of the comparison
  // operands has at least one user besides the compare (the select),
  // which would often largely negate the benefit of folding anyway.
  //
  // Do the same for the other patterns recognized by matchSelectPattern.
  if (I.hasOneUse())
    if (SelectInst *SI = dyn_cast<SelectInst>(I.user_back())) {
      Value *A, *B;
      SelectPatternResult SPR = matchSelectPattern(SI, A, B);
      if (SPR.Flavor != SPF_UNKNOWN)
        return nullptr;
    }

Any ideas? I'd really like to get this inliner change in because it's 
fundamentally a good change to have.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143624/new/

https://reviews.llvm.org/D143624

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to