yanyanyanggg opened a new issue, #18569:
URL: https://github.com/apache/tvm/issues/18569

   ### Issue: [RISC-V RVV] softmax operator shows suboptimal vectorization
   
   #### Description
   The softmax operator performs worse when compiled with the RISC-V Vector
(RVV) extension enabled, achieving only 0.745× the performance of the scalar
implementation. This points to inefficient vectorization of the softmax
lowering.
   
   #### Steps to Reproduce
   1. Generate the softmax operator with the following configuration:
   ```python
   params = {
       "dtype": "float32",
       "batch": 14,
       "features": 185
   }
   ```
   
   2. Export the operator to two targets:
      - **RV target** (scalar, without vector extension):
        ```
         llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c
        ```
      - **RVV target** (with vector extension):
        ```
         llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mabi=lp64d -mattr=+64bit,+m,+a,+f,+d,+c,+v
        ```
   
   3. Run performance measurement on both targets (a sketch of a possible
      measurement harness follows the operator definition below).
   
   Operator definition code:
    ```python
    from tvm import relay  # import needed by this snippet (omitted above)

    def export_softmax(params, set_dir=None, platform="rv"):
        # Build a Relay softmax over a (batch, features) float32 input.
        data = relay.var("data", shape=(params["batch"], params["features"]),
                         dtype=params["dtype"])
        softmax = relay.nn.softmax(data)
        # export_op is a local helper (not part of TVM) that compiles and
        # saves the operator. Note: params must also carry an "op_name" key,
        # which is absent from the configuration shown in step 1.
        export_op(softmax, params["op_name"], [data], params, set_dir=set_dir)
    ```
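
    For reference, here is a minimal, self-contained sketch of how steps 1-3
could be wired together. The original `export_op`/measurement harness is not
shown in this report, so this is an assumed reproduction rather than the code
actually used; it relies only on standard TVM APIs (`relay.build`,
`graph_executor`, `GraphModule.benchmark`):
    ```python
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.contrib import graph_executor

    params = {"dtype": "float32", "batch": 14, "features": 185}
    data = relay.var("data", shape=(params["batch"], params["features"]),
                     dtype=params["dtype"])
    mod = tvm.IRModule.from_expr(relay.nn.softmax(data))

    for name, attrs in [("rv", "+64bit,+m,+a,+f,+d,+c"),
                        ("rvv", "+64bit,+m,+a,+f,+d,+c,+v")]:
        target = tvm.target.Target(
            "llvm -mtriple=riscv64-linux-gnu -mcpu=generic-rv64 "
            f"-mabi=lp64d -mattr={attrs}"
        )
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=target)
        # Cross-compile the shared library; assumes a riscv64 cross
        # toolchain is available on PATH.
        lib.export_library(f"softmax_{name}.so", cc="riscv64-linux-gnu-g++")

    # On the K1-X board: load each library, feed random input, and time it.
    # dev = tvm.cpu()
    # m = graph_executor.GraphModule(
    #     tvm.runtime.load_module("softmax_rvv.so")["default"](dev))
    # m.set_input("data", np.random.rand(14, 185).astype("float32"))
    # print(m.benchmark(dev, repeat=10))
    ```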
   
   #### Performance Data
   - **RV execution time**: 1.831500 ms
   - **RVV execution time**: 2.457040 ms
   - **Speedup (RV time / RVV time)**: 0.745 (i.e., RVV is ~1.34× slower)
   
   #### Environment Information
   - **TVM version**: 0.19.0
   - **LLVM version**: [Please provide: `llvm-config --version`]
   - **Hardware**: Spacemit K1‑X bit‑brick board
   - **CPU**: Spacemit X60 (8 cores, 1.6 GHz)
   - **ISA**: rv64imafdcv (with vector extensions)
   - **Memory**: 7.6 GB
   - **OS**: Bianbu 2.2, Linux kernel 6.6.63
   - **Operation**: Softmax on a 2D tensor of shape (14, 185)
   
   #### Expected Behavior
   RVV vectorization should provide a performance improvement over the scalar 
RV baseline for softmax operations, which involve reduction and elementwise 
operations that can be vectorized.
   
   #### Additional Context
   - The softmax operation is applied to a 2D tensor of shape (14, 185), which 
is a relatively small tensor compared to other operators tested.
   - The performance regression, though less severe than for some other
operators tested, still indicates that the vectorized softmax implementation
may be inefficient in its reduction and exponentiation steps.
   - This is part of a broader pattern where multiple operators show 
performance degradation with RVV, suggesting potential issues with 
vectorization strategies for reduction and elementwise operations.
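
   One way to narrow down where the regression comes from would be to
inspect the assembly that TVM emits for each target. A sketch, assuming the
`lib` object from the reproduction sketch above:
    ```python
    # Dump the assembly and look for RVV opcodes such as vsetvli/vle32/vfadd.
    # Their absence would point at codegen; their presence combined with poor
    # timings points at scheduling (e.g., 185-element rows causing frequent
    # vsetvli/strip-mining overhead).
    asm = lib.get_lib().get_source("asm")
    print([line for line in asm.splitlines() if "vsetvli" in line][:5])
    ```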

