https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104950
Bug ID: 104950
Summary: GCC does not emit branchless code
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: vincenzo.innocente at cern dot ch
Target Milestone: ---
In this example GCC fails to emit branchless code while Clang does.
In the actual application, measurements show a slowdown of up to a factor of 2.
I managed to force branchless code (-DBL), but the resulting source is rather unfriendly.
Godbolt link (GCC, Clang, GCC -DBL):
https://godbolt.org/z/KWY1rjhhY
The same code is inlined here:
#include <vector>

const float defaultBaseResponse = 0.5;

class DForest {
public:
  // based on FastForest::evaluate() and BDTree::parseTree()
  DForest() {
  }
  float evaluate(const float* features) const;
  std::vector<int> rootIndices_;
  // "node" layout: cut, index, left, right
  struct Node {
    float v; int i, l, r;
    constexpr int eval(float const * f) const {
#ifdef BL
      // forced branchless: relies on r being stored immediately after l
      auto m = f[i] > v;
      return *((&l) + int(m));
#else
      return f[i] > v ? r : l;
#endif
    }
  };
  std::vector<Node> nodes_;
  std::vector<float> responses_;
  std::vector<float> baseResponses_;
};

float DForest::evaluate(const float* features) const {
  float sum{defaultBaseResponse + baseResponses_[0]};
  for (int index : rootIndices_) {
    do {
      // a positive index is an inner node; zero or negative marks a leaf
      index = nodes_[index].eval(features);
    } while (index > 0);
    sum += responses_[-index];
  }
  return sum;
}
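
For comparison, a possibly friendlier way to write the -DBL variant is sketched below: the two children are stored in an array and the comparison result indexes it, so no pointer arithmetic on &l is needed. This is only a sketch and not part of the original test case. NodeBL, child, and evaluateTree are hypothetical names, and whether GCC generates branchless code for it has not been checked here and would need to be confirmed on godbolt.

```cpp
#include <vector>

// Hypothetical variant of DForest::Node: the children live in an array,
// child[0] = left and child[1] = right.
struct NodeBL {
  float v;       // cut value
  int i;         // feature index
  int child[2];  // {left, right}
  constexpr int eval(float const * f) const {
    // the bool result converts to 0 or 1 and selects the child directly
    return child[f[i] > v];
  }
};

// Hypothetical helper: same traversal as the inner loop of
// DForest::evaluate(), for a single tree starting at `root`.
float evaluateTree(const std::vector<NodeBL>& nodes,
                   const std::vector<float>& responses,
                   int root, const float* features) {
  int index = root;
  do {
    index = nodes[index].eval(features);
  } while (index > 0);
  return responses[-index];
}
```

The layout stays at 16 bytes per node, as in the original struct, so the memory footprint of nodes_ would not change.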