On 8/3/20 2:18 PM, Paul Smith wrote:
I'm testing upgrading from GCC 9.3 to 10.2 and I'm seeing this new
warning:
$ g++ --version
x86_64-unknown-linux-gnu-g++ (GCC) 10.2.0
...
$ g++ -Wall -Werror -O2 -c -o stringop.o stringop.cpp
In member function 'void LeafNode::markUpdate()',
inlined from 'void applyTo(LeafNode*)' at stringop.cpp:45:21:
stringop.cpp:37:33: error: writing 1 byte into a region of size 0
[-Werror=stringop-overflow=]
37 | void markUpdate() { flags() |= UPDATE; }
| ~~~~~~~~^~~~~~~~~
stringop.cpp: In function 'void applyTo(LeafNode*)':
stringop.cpp:34:7: note: at offset 0 to object 'LeafNode::<anonymous>' with
size 4 declared here
34 | class LeafNode : public NodeWithPayload<LeafNodePayload>
| ^~~~~~~~
cc1plus: all warnings being treated as errors
It could well be that this code is dodgy; I take no responsibility for
it :). It's part of a very tricky and performance- and memory-
sensitive area of the software.
I've stripped out tons of stuff and gotten it down to this, which is
still probably not the minimal test case but is pretty small. If, for
example, I take out the _sequence field in LeafNodePayload, it doesn't
give any warnings.
The repro case is provided below. If people think this is actually a
bug I can file an issue.
The warning is issued for the MEM_REF assignments in on the IL below.
It determines the offset (unsigned char &)_11 + 8 is in the range
[16, 65552] from the base subobject of the object node points to,
which is past the end. The warning is by design correct because
there's nothing to suggest that the memory beyond the end of
the subobject can be accessed.
applyTo (struct LeafNode * node)
{
struct NodeWithPayload * _4;
short unsigned int _5;
long unsigned int _6;
long unsigned int _7;
long unsigned int _8;
long unsigned int _9;
long unsigned int _10;
struct LeafNodePayload & _11;
unsigned char _12;
unsigned char _13;
<bb 2> [local count: 1073741824]:
_4 = &node_2(D)->D.2380;
_5 = MEM[(struct NodeWithPayload *)node_2(D)].D.2371._keyLength;
_6 = (long unsigned int) _5;
_7 = _6 + 1;
_8 = _7 >> 3;
_9 = _8 + 1;
# RANGE [8, 65544] NONZERO 131064
_10 = _9 * 8;
_11 = _4 + _10;
_12 = MEM[(unsigned char &)_11 + 8];
_13 = _12 | 4;
MEM[(unsigned char &)_11 + 8] = _13;
return;
}
If the code is designed to treat Node sort of like a struct with
a flexible array member I would suggest to make that explicit by
adding a zero-element array member to Node and using it to access
other memory. E.g., add:
unsigned char data[0];
as the last member of Node and change getPayload to:
PAYLOAD& getPayload() {
return *(reinterpret_cast<PAYLOAD*>(data) +
Node::actualSize(_keyLength, alignof(PAYLOAD)));
}
Martin
-----------------------------------------------------
nclude <cstddef>
#include <cstdint>
constexpr unsigned char UPDATE = 0x4;
class Node
{
protected:
static size_t actualSize(uint16_t keyLength, size_t alignment) {
return alignment * (((offsetof(Node, key) + keyLength - 1) /
alignment) + 1);
}
public:
const uint16_t _keyLength;
struct {} key;
};
template<typename PAYLOAD>
class NodeWithPayload : public Node
{
protected:
PAYLOAD& getPayload() {
return *(reinterpret_cast<PAYLOAD*>(reinterpret_cast<unsigned
char*>(this) + Node::actualSize(_keyLength, alignof(PAYLOAD))));
}
};
class LeafNodePayload
{
public:
uint64_t _sequence;
unsigned char _flags;
};
class LeafNode : public NodeWithPayload<LeafNodePayload>
{
public:
void markUpdate() { flags() |= UPDATE; }
private:
unsigned char& flags() { return getPayload()._flags; }
};
void applyTo(LeafNode* node)
{
node->markUpdate();
}