[Lldb-commits] [llvm] [lldb] [mlir] [openmp] [flang] [mlir][Vector] Add patterns for efficient i4 -> i8 conversion emulation (PR #79494)

Benjamin Maxwell via lldb-commits Fri, 26 Jan 2024 07:32:02 -0800

MacDue wrote:

> It gets difficult to get this working for scalable at this level as we would 
> have to introduce SVE or LLVM intrinsics to model the interleave in a 
> scalable way.


LLVM already has intrinsics for that, so I don't think it'd be hard to 
extend this to support SVE.

I wrote this little test, which seemed to build fine and generate 
reasonable-looking code:
```mlir
func.func @test_sve_i4_extend(%inMem: memref<?xi4>) -> vector<[8]xi32> {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : i8
  %in = vector.load %inMem[%c0] : memref<?xi4>, vector<[8]xi4>
  %shift = vector.splat %c4 : vector<[4]xi8>
  %0 = vector.bitcast %in : vector<[8]xi4> to vector<[4]xi8>
  %1 = arith.shli %0, %shift : vector<[4]xi8>
  %2 = arith.shrsi %1, %shift : vector<[4]xi8>
  %3 = arith.shrsi %0, %shift : vector<[4]xi8>
  %4 = "llvm.intr.experimental.vector.interleave2"(%2, %3) : (vector<[4]xi8>, vector<[4]xi8>) -> vector<[8]xi8>
  %5 = arith.extsi %4 : vector<[8]xi8> to vector<[8]xi32>
  return %5 : vector<[8]xi32>
}
```
->
```
test_sve_i4_extend: 
        ptrue   p0.s
        ld1sb   { z0.s }, p0/z, [x1]
        lsl     z1.s, z0.s, #28
        asr     z0.s, z0.s, #4
        asr     z1.s, z1.s, #28
        zip2    z2.s, z1.s, z0.s
        zip1    z0.s, z1.s, z0.s
        movprfx z1, z2
        sxtb    z1.s, p0/m, z2.s
        sxtb    z0.s, p0/m, z0.s
        ret
```
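
For anyone who wants to check the arithmetic, here is a minimal scalar sketch in Python of what the MLIR test computes per byte: a shift left then an arithmetic shift right to sign-extend the low nibble, an arithmetic shift right for the high nibble, and an interleave of the two results. The helper names `as_i8` and `extend_packed_i4` are made up for illustration, and the sketch assumes the low nibble of each byte holds the earlier element, which is how the `interleave2(%2, %3)` operand order treats it.

```python
def as_i8(value):
    """Wrap an integer to a signed 8-bit value (two's complement)."""
    value &= 0xFF
    return value - 256 if value >= 128 else value


def extend_packed_i4(packed):
    """Sign-extend packed i4 pairs to one signed integer per nibble."""
    # Reinterpret each byte as a signed i8 (stands in for vector.bitcast).
    words = [as_i8(b) for b in packed]
    # Low nibble: shift left by 4, then arithmetic shift right by 4.
    low = [as_i8(w << 4) >> 4 for w in words]
    # High nibble: arithmetic shift right by 4 (Python's >> floors on ints).
    high = [w >> 4 for w in words]
    # Interleave low and high lanes, as interleave2 does.
    return [x for pair in zip(low, high) for x in pair]


if __name__ == "__main__":
    print(extend_packed_i4(bytes([0x21, 0x7F, 0x80])))
```

The final `arith.extsi` to `i32` has no visible effect here because Python integers are unbounded; only the value of each lane is modelled.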

I think that, in the vector dialect, `llvm.intr.experimental.vector.interleave2` 
could nicely become `vector.scalable.interleave` 🙂




https://github.com/llvm/llvm-project/pull/79494
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
