I am working on optimizing code to calculate discrete cosine transforms. I
am doing this by implementing static code for a given array size vs. using
code that can dynamically handle a slice of any size. Overall, I am seeing
amazing performance - kudos Go compiler. I have one last optimization
opportunity that I would like to pursue. Unfortunately, my Go-fu is not
sufficient to figure it out on my own. Is there a way to convert a
fixed-size array to a slice without incurring 2 allocations?
Go Environment information:
go version go1.22.4 linux/amd64
Example:
func DCT2DFast8(in []float64) (out [64]float64) {...} - Optimized static
code, 0 allocations when called directly. See the following benchmark.
func BenchmarkDCT2DFast8(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = DCT2DFast8(ary2d_flat[8])
	}
}
BenchmarkDCT2DFast8-12   1372852   757.9 ns/op   0 B/op   0 allocs/op
The same functionality, wrapped in the generalized function that can be
called for a slice of any size:
func DCT_2D(input []float64, sz int) []float64 {
	...
	if sz == 8 {
		r := DCT2DFast8(result)
		return []float64(r[:])
	}
	...
}
BenchmarkDCT_2D_8-12   340670   3227 ns/op   1024 B/op   2 allocs/op
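For anyone who wants to reproduce the pattern without my DCT code, here is a minimal sketch. The names fast8, wrap, and sink are hypothetical stand-ins, not my real functions, and the body of fast8 is a placeholder rather than a DCT. It uses testing.AllocsPerRun to report allocations for the direct call and for the wrapped call. I would expect the counts to depend on inlining and escape analysis, so I am not quoting any numbers for it.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the returned slice reachable so the compiler cannot
// optimize the wrapped call away.
var sink []float64

// fast8 is a hypothetical stand-in for DCT2DFast8: it returns a
// fixed-size array by value. The body is a placeholder, not a DCT.
// It assumes len(in) >= 64.
func fast8(in []float64) (out [64]float64) {
	for i := range out {
		out[i] = in[i] * 2
	}
	return out
}

// wrap mirrors the sz == 8 branch of DCT_2D: the array is copied into
// the local r, and the returned slice keeps referring to r after the
// function returns.
func wrap(in []float64) []float64 {
	r := fast8(in)
	return r[:]
}

func main() {
	in := make([]float64, 64)
	for i := range in {
		in[i] = float64(i)
	}

	// AllocsPerRun reports the average number of allocations per call.
	direct := testing.AllocsPerRun(1000, func() { _ = fast8(in) })
	wrapped := testing.AllocsPerRun(1000, func() { sink = wrap(in) })

	fmt.Println("allocs/op direct: ", direct)
	fmt.Println("allocs/op wrapped:", wrapped)
}
```

The package-level sink is only there so the measurement is not skewed by the result being discarded.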
Is there a more performant way, meaning 1 or 0 allocations, to convert a
fixed-size array to a slice?
Ideally, I would like the following to work:
return DCT2DFast8(result)[:]
Unfortunately, it does not compile: a function's return value is not
addressable, and a slice expression on an array requires an addressable
operand.
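To make that concrete, here is a small sketch under stated assumptions. fast8 and viaLocal are hypothetical stand-ins for DCT2DFast8 and the sz == 8 branch, with a placeholder body instead of a DCT. fast8Into is a hypothetical variant I have not benchmarked against the real code: it writes into storage the caller already owns, using the slice-to-array-pointer conversion available since Go 1.17.

```go
package main

import "fmt"

// fast8 is a hypothetical stand-in for DCT2DFast8 (placeholder body).
// It assumes len(in) >= 64.
func fast8(in []float64) (out [64]float64) {
	for i := range out {
		out[i] = in[i] + 1
	}
	return out
}

// viaLocal compiles because r is a variable and therefore addressable.
func viaLocal(in []float64) []float64 {
	// return fast8(in)[:] // rejected: the call result is not addressable
	r := fast8(in)
	return r[:]
}

// fast8Into is a hypothetical variant that fills caller-owned storage
// instead of returning an array by value.
func fast8Into(dst *[64]float64, in []float64) {
	for i := range dst {
		dst[i] = in[i] + 1
	}
}

func main() {
	in := make([]float64, 64)
	dst := make([]float64, 64)

	a := viaLocal(in)

	// The conversion reuses dst's backing array and panics at run time
	// if len(dst) < 64.
	fast8Into((*[64]float64)(dst), in)

	// in is all zeros, so both results hold 1 in every position.
	fmt.Println(len(a), a[0], dst[0]) // prints "64 1 1"
}
```

Whether a caller-supplied buffer fits the existing DCT_2D signature is a separate design question. The sketch only shows that the conversion itself does not need a second copy of the data.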
Even with the wrapper, the static implementation is more than 2 times
faster than the fully generalized form, which benchmarks as follows:
BenchmarkDCT_2D_8-12   144020   7922 ns/op   2112 B/op   19 allocs/op
But that falls far short of the roughly 10x speedup I get when calling the
array-returning function directly.
Thank you in advance!
lbe