On Jul 2, 2009, at 06:02, Paul Chavent wrote:
> Hi.
>
> I have already posted about the endianness attribute
> (http://gcc.gnu.org/ml/gcc/2008-11/threads.html#00146).
>
> For some years now I have really needed this feature in C projects.
> Today I would like to dig into the internals of GCC, and I would
> like to implement this feature as an exercise. You already warned me
> that it would be a hard task (aliasing, etc.), but I would like to
> begin with basic specs.
As another gcc user (and, once upon a time, developer) who's had to
deal with occasional byte ordering issues (mainly in network
protocols), I can imagine some uses for something like this. But...
> The spec could be:
>
> - add an attribute (this description could change to be compatible
>   with existing ones (diabdata, for example))
>
>   __attribute__ ((endian("big")))
>   __attribute__ ((endian("lil")))
I would use "little" spelled out, rather than trying to use some cute
abbreviation. Whether it should be a string vs a C token like little
or __little__, I don't know, or particularly care.
> - this attribute only applies to ints
It should at least be any integral type -- short to long long or
whatever TImode is. (Technically maybe char/QImode could be allowed
but it wouldn't have any effect on code generation.) I wouldn't jump
to the conclusion that it would be useless for pointers or floating
point values, but I don't know what the use cases for those would be
like. However, I think that's a case where you could limit the
implementation initially, then expand the support later if needed,
unlike the pointer issue below.
> - this attribute only applies to variable declarations
> - a pointer to this variable doesn't inherit the attribute (this
>   behavior could change later, I don't know...)
This seems like a poor idea -- for one thing, my use cases would
probably involve something like pointers to unaligned big-endian
integers in allocated buffers, or maybe integer fields in packed
structures, again via pointers. (It looks like you may be trying to
handle the latter but not the former in the code you've got so far.)
For another, one operation that may be used in code refactoring
involves taking a bunch of code accessing some variable x (and
presumably similar blocks of code elsewhere that may use different
variables), and pulling it out into a separate function that takes the
address of the thing to be modified, passed in at the call sites to
the new function; if direct access to x and access via &x behave
differently under this attribute, suddenly this formerly reasonable
transformation is unsafe -- and perhaps worst of all, the behavior
change would be silent, since the compiler would have nothing to
complain about.
Also, changing the behavior later means changing the interpretation of
some code after deploying a compiler using one interpretation.
Consider this on a 32-bit little-endian machine:
  unsigned int x __attribute__((endian("big")));
  ...
  *&x = 0x12345678;
In normal C code without this attribute, reading and writing "*&x" is
the same as reading and writing x. In your proposed version, "*&x"
would use the little-endian interpretation, and "x" would use the big-
endian interpretation, with nothing at the site of the executable code
to indicate that the two should be different. But an expression like
this can come up naturally when dealing with macro expansions. Or,
someone using this attribute may write code depending on that
different handling of "*&x" to deal with a selected byte order in some
cases and native byte order in other cases. Then if you update the
compiler so that the attribute is passed along to the pointer type, in
the next release, suddenly the two cases behave the same -- breaking
the user's code when it worked under the previous compiler release.
If you support taking the address of specified-endianness variables at
all, you need to get the pointer handling right the first time around.
I would suggest that if you implement something like this, the
attribute should be associated with the data type, not the variable
decl; so in the declaration above, x wouldn't be treated specially,
but its type would be "big-endian unsigned int", a distinct type from
"int" (even on a big-endian machine, probably).
The one advantage I see to associating the attribute with the decl
rather than the type is that I could write:
uint32_t thing __attribute__((endian("big")));
rather than needing to figure out what uint32_t is in fundamental C
types and create a new typedef incorporating the underlying type plus
the attribute, kind of like how you can't write a declaration using
"signed size_t". But that's a long-standing issue in C, and I don't
think making the language inconsistent so you can fix the problem in
some cases but not others is a very good idea.
> - the test case is
>
>   uint32_t x __attribute__ ((endian("big")));
>   uint32_t *ptr_x = &x;
Related to my suggestions above, I think this assignment should get a
warning about incompatible pointer types.
It does bring up an interesting additional question, though -- should
pointers to big-endian int and "normal" int be compatible on big-
endian machines? Under C, "char", "unsigned char" and "signed char"
are three distinct types, even though "char" must functionally be the
same as one of the others. I'd suggest that probably the normal type
should be incompatible with both of the explicit-endian types, to help
make the code type-safe and not dependent on the target machine's byte
order.
Ken