Thanks! I'd seen the "dead code elimination" comment somewhere and questioned it, but not enough.
If I worry about what some future Go compiler might optimize, I end up
worrying quite a lot! For example, could this code:
func BenchmarkPopCountAlive(b *testing.B) {
sum = 0
for i := 0; i < b.N; i++ {
sum += PopCount(0x1234567890abcdef)
}
}
hypothetically be optimized to:
func BenchmarkPopCountAlive(b *testing.B) {
sum = PopCount(0x1234567890abcdef) * b.N
}
since PopCount() always returns the same value for the same argument? It
probably wouldn't, since that would break many existing benchmarks.
Maybe more reasonably worrisome: If sum is initialized and incremented but
never read, could all references to it be optimized away? That's easy
enough to avoid, as is the optimization I worried about above:
var sum = 0 // Avoid dead code elimination
const firstInput uint64 = 0x1234567890abcdef
func BenchmarkPopCountSimple(b *testing.B) {
for i, input := 0, firstInput; i < b.N; i++ {
sum += PopCount(input)
input++
}
if sum == 0 {
b.Error("BenchmarkPopCountSimple: sum == 0")
}
}
Here are my new benchmark results:
$ go version
go version go1.16.5 windows/amd64
$ go test -cpu=1 -bench=.
goos: windows
goarch: amd64
pkg: gopl.io/popcount
cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz
BenchmarkPopCountSimple 271004790 4.419 ns/op
BenchmarkPopCountSlowSimple 278 4235890 ns/op
BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple
272759134 4.434 ns/op
BenchmarkPopCountNearlySimple/PopCountNearlySimple
145613077 8.172 ns/op
PASS
ok gopl.io/popcount 7.061s
The anomalistically-low "simple" and "almost simple" results are now more
reasonable, but still nearly a factor of two lower than the "nearly simple"
results. They're still inlining the calls, as is the "slow simple"
benchmark, where the "nearly simple" one isn't:
$ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to
PopCount'
.\popcount_test.go:10:18: inlining call to PopCount
.\popcount_test.go:22:19: inlining call to PopCount
.\popcount_test.go:34:19: inlining call to PopCount
Overkill, right? --PSRC
P.S.: For better or worse, I discovered I can -- but shouldn't, based on
the feedback I got! -- get "fancy" formatting by copying from vscode and
pasting into Gmail.
On Tuesday, June 1, 2021 at 12:01:51 PM UTC-4 peterGo wrote:
> Here is a more definitive reply than my original reply.
>
> I got as far as this
>
> func BenchmarkPopCountSimple(b *testing.B) {
> sum := 0 // Avoid dead code elimination.
> for i := 0; i < b.N; i++ {
> sum += PopCount(0x1234567890abcdef)
> }
> }
>
> As you can see from the objdump, BenchmarkPopCountSimple dead code is
> eliminated.
>
> func BenchmarkPopCountSimple(b *testing.B) {
> 0x4e38c0 31c9 XORL CX, CX
>
> for i := 0; i < b.N; i++ {
> 0x4e38c2 eb03 JMP 0x4e38c7
> 0x4e38c4 48ffc1 INCQ CX
> 0x4e38c7 48398890010000 CMPQ CX, 0x190(AX)
> 0x4e38ce 7ff4 JG 0x4e38c4
> }
> 0x4e38d0 c3 RET
>
> I added an additional BenchmarkPopCountAlive benchmark.
>
> var sum = 0 // Avoid dead code elimination.
>
> func BenchmarkPopCountAlive(b *testing.B) {
> sum = 0
>
> for i := 0; i < b.N; i++ {
> sum += PopCount(0x1234567890abcdef)
> }
> }
>
> As you can see from the objdump, BenchmarkPopCountAlive code is not
> eliminated.
>
> For details, see the popcount.txt attachment.
>
> Peter
>
> On Monday, May 31, 2021 at 6:29:27 PM UTC-4 Paul S. R. Chisholm wrote:
>
>> This is not a serious problem, but it surprised me. (By the way, how can
>> I post a message with code formatting?)
>>
>> I'd like to create table-driven benchmarks:
>> https://blog.golang.org/subtests.
>>
>> To that end, I started with code from *The Go Programming Language*:
>>
>> // PopCount is based on an example from chapter 2 of THE GO PROGRAMMING
>> LANGUAGE.
>> // Copyright © 2016 Alan A. A. Donovan & Brian W. Kernighan.
>> // License: https://creativecommons.org/licenses/by-nc-sa/4.0/
>>
>> package popcount
>>
>> // pc[i] is the population count of i.
>> var pc [256]byte
>>
>> func init() {
>> for i := range pc {
>> pc[i] = pc[i/2] + byte(i&1)
>> }
>> }
>>
>> // PopCount returns the population count (number of set bits) of x.
>> func PopCount(x uint64) int {
>> return int(pc[byte(x>>(0*8))] +
>> pc[byte(x>>(1*8))] +
>> pc[byte(x>>(2*8))] +
>> pc[byte(x>>(3*8))] +
>> pc[byte(x>>(4*8))] +
>> pc[byte(x>>(5*8))] +
>> pc[byte(x>>(6*8))] +
>> pc[byte(x>>(7*8))])
>> }
>>
>> I then wrote a simple test:
>>
>> // popcount_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountSimple(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> }
>>
>> Because I was paranoid about dead code elimination, I wrote an identical
>> test that calls the function many times (inside the loop to call it b.N
>> times, which should make no difference if the hypothetically-still-dead
>> code is being eliminated):
>>
>> // popcount_slow_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountSlowSimple(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> // Exactly as before, but call the function many times.
>> for j:= 0; j < 1_000_000; j++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> }
>> }
>>
>> Then I wrote an "almost simple" test that uses testing.B.Run() but is
>> hardwired to call the function being benchmarked:
>>
>> // popcount_almost_simple_test.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountAlmostSimple(b *testing.B) {
>> b.Run("BenchmarkPopCountAlmostSimple", func(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += PopCount(0x1234567890abcdef)
>> }
>> })
>> }
>>
>> Finally, I wrote a "nearly-simple" test that passes arguments to Run()
>> that could have come from a table:
>>
>> // popcount_nearly_simple.go
>> package popcount
>>
>> import "testing"
>>
>> func BenchmarkPopCountNearlySimple(b *testing.B) {
>> f := PopCount
>> name := "PopCountNearlySimple"
>> b.Run(name, func(b *testing.B) {
>> sum := 0 // Avoid dead code elimination.
>> for i := 0; i < b.N; i++ {
>> sum += f(0x1234567890abcdef)
>> }
>> })
>> }
>>
>> The simple and almost-simple results are nearly identical (happily, the
>> slow results are not), but the nearly-simple results are an order of
>> magnitude slower:
>>
>> $ go version
>> go version go1.16.4 windows/amd64
>> $ go test -cpu=1 -bench=.
>> goos: windows
>> goarch: amd64
>> pkg: gopl.io/popcount
>> cpu: Intel(R) Core(TM) i5 CPU 760 @ 2.80GHz
>> BenchmarkPopCountAlmostSimple/BenchmarkPopCountAlmostSimple
>> 1000000000 0.6102 ns/op
>> BenchmarkPopCountNearlySimple/PopCountNearlySimple
>> 194925662 6.197 ns/op
>> BenchmarkPopCountSimple
>> 1000000000 0.5994 ns/op
>> BenchmarkPopCountSlowSimple
>> 1953 606194 ns/op
>> PASS
>> ok gopl.io/popcount 4.534s
>>
>> After reading this article:
>>
>>
>> https://medium.com/a-journey-with-go/go-inlining-strategy-limitation-6b6d7fc3b1be
>>
>> I ran:
>>
>> $ go test -cpu=1 -bench=. -gcflags=-m 2>&1 | egrep 'inlining call to
>> PopCount'
>> .\popcount_almost_simple_test.go:9:19: inlining call to PopCount
>> .\popcount_simple_test.go:8:18: inlining call to PopCount
>> .\popcount_slow_simple_test.go:10:19: inlining call to PopCount
>>
>> That makes sense. 0.6 ns is less than a function call. The simple and
>> almost-simple benchmarks can inline the call to the hardwired function, but
>> the nearly-simple benchmark can't.
>>
>> In practice, this isn't much of a problem. Any function small enough to
>> be inlined is unlikely to be a performance bottleneck. If it ever is, a
>> non-table-driven benchmark can still measure it.
>>
>> Hope this helps. --PSRC
>>
>> P.S.: Out of curiosity, how can I post a message with fancy code examples
>> like this one?
>> https://groups.google.com/g/golang-nuts/c/5DgtH2Alt_I/m/hlsqdRSGAgAJ
>>
>
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/CAG%3D3cjnebauo-VDiX%2BQjPVnnSji1DA0DtxsTKkOcTW9eVfTN8g%40mail.gmail.com.
popcount_test.go
Description: Binary data
