Not a direct answer, but one simple improvement is to allocate all the 
rows in a single backing slice, then carve it into rows by slicing. So 
instead of your:

    T := make([][]byte, n)
    for i := 0; i < n; i++ {
        T[i] = make([]byte, m)
    }

Do something like:

    T := make([][]byte, n)
    Q := make([]byte, m*n)
    for i := 0; i < n; i++ {
        T[i] = Q[i*m : (i+1)*m] 
    }

In my benchmarks, with your matrix size, this gives a 20-30 percent 
speedup, partly because it makes two allocations instead of n+1, but 
also (I suspect) because the data is contiguous.
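
For completeness, here is a minimal sketch of how the whole transpose might look with that allocation scheme. The helper name newMatrix is my own invention, not something from your code, and the function assumes M is non-empty and rectangular. The three-index slice expression is an optional extra: it caps each row's capacity so that an append to one row cannot spill into the next.

```go
package main

// newMatrix returns a rows x cols matrix whose rows are all views into
// one contiguous backing slice, so only two allocations are made.
// (Hypothetical helper name, not from the original post.)
func newMatrix(rows, cols int) [][]byte {
	backing := make([]byte, rows*cols)
	mat := make([][]byte, rows)
	for i := range mat {
		// The third index caps each row's capacity at cols, so an
		// append to one row cannot overwrite the start of the next.
		mat[i] = backing[i*cols : (i+1)*cols : (i+1)*cols]
	}
	return mat
}

// transpose returns the transpose of M. It assumes M is non-empty and
// rectangular (every row has len(M[0]) elements).
func transpose(M [][]byte) [][]byte {
	m, n := len(M), len(M[0])
	T := newMatrix(n, m) // n rows of m bytes each
	for i, row := range T {
		for j := range row {
			row[j] = M[j][i]
		}
	}
	return T
}
```

Calling transpose(M) on your 64x512 matrix would return a 512x64 result backed by a single 32768-byte slice. The inner loop is unchanged from your version, so any speedup comes from the allocation pattern alone.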

On Wednesday, August 5, 2020 at 2:30:28 PM UTC-4 [email protected] wrote:

> Hi all,
>
> Matrix transpose in pure Go is slow in HPC cases, and using the gonum 
> package requires converting the data structure, which costs extra time. 
> So an assembly version may be a better solution.
>
> The matrix size may vary ([][]byte) or be fixed (for example, 
> [64][512]byte), and the element type may be int32 or int64 in general 
> scenarios.
>
> Below is a golang version:
>
> m := 64
> n := 512
>
> // original matrix
> M := make([][]byte, m)
> for i := 0; i < m; i++ {
>     M[i] = make([]byte, n)
> }
>
> func transpose(M [][]byte) [][]byte {
>     m := len(M)
>     n := len(M[0])
>
>     // transposed matrix
>     T := make([][]byte, n)
>     for i := 0; i < n; i++ {
>         T[i] = make([]byte, m)
>     }
>
>     var row []byte // a row in T
>     for i := 0; i < n; i++ {
>         row = T[i]
>         for j := 0; j < m; j++ {
>             row[j] = M[j][i]
>         }
>     }
>
>     return T
> }
>
>
