On Jul 2, 2009, at 06:02, Paul Chavent wrote:
Hi.

I have already posted about the endianness attribute (http://gcc.gnu.org/ml/gcc/2008-11/threads.html#00146 ).

For some years, I have really needed this feature on C projects.

Today I would like to dig into the internals of gcc, and I would like to implement this feature as an exercise.

You already warned me that it would be a hard task (aliasing, etc.), but I would like to begin with basic specs.

As another gcc user (and, once upon a time, developer) who's had to deal with occasional byte ordering issues (mainly in network protocols), I can imagine some uses for something like this. But...

The spec could be :

- add an attribute (this description could change to be compatible with existing ones (diabdata for example))

 __attribute__ ((endian("big")))
 __attribute__ ((endian("lil")))

I would use "little" spelled out, rather than trying to use some cute abbreviation. Whether it should be a string vs a C token like little or __little__, I don't know, or particularly care.

- this attribute only applies to ints

It should at least be any integral type -- short to long long or whatever TImode is. (Technically maybe char/QImode could be allowed but it wouldn't have any effect on code generation.) I wouldn't jump to the conclusion that it would be useless for pointers or floating point values, but I don't know what the use cases for those would be like. However, I think that's a case where you could limit the implementation initially, then expand the support later if needed, unlike the pointer issue below.
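
To illustrate why the width matters, here is a minimal sketch of the conversion the compiler would have to insert for each integral size. The helper name reverse_bytes is made up for this example and is not anything in gcc; for a one-byte object the loop body never runs, which is the char/QImode case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reverse the n bytes of an object in place, which
   converts between the big-endian and little-endian layouts. */
void reverse_bytes(void *obj, size_t n)
{
    unsigned char *p = obj;
    for (size_t i = 0; i < n / 2; i++) {
        unsigned char tmp = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = tmp;
    }
}

/* Small self-check: a 1-byte object is unchanged, a 4-byte one is reversed. */
void check_reverse_bytes(void)
{
    unsigned char one[1]  = { 0xab };
    unsigned char four[4] = { 0x12, 0x34, 0x56, 0x78 };

    reverse_bytes(one, sizeof one);
    reverse_bytes(four, sizeof four);

    assert(one[0] == 0xab);
    assert(four[0] == 0x78 && four[1] == 0x56);
    assert(four[2] == 0x34 && four[3] == 0x12);
}
```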

- this attribute only applies to variable declarations

- a pointer to this variable doesn't inherit the attribute (this behavior could change later, I don't know...)

This seems like a poor idea -- for one thing, my use cases would probably involve something like pointers to unaligned big-endian integers in allocated buffers, or maybe integer fields in packed structures, again via pointers. (It looks like you may be trying to handle the latter but not the former in the code you've got so far.) For another, one operation that may be used in code refactoring involves taking a bunch of code accessing some variable x (and presumably similar blocks of code elsewhere that may use different variables), and pulling it out into a separate function that takes the address of the thing to be modified, passed in at the call sites to the new function; if direct access to x and access via &x behave differently under this attribute, suddenly this formerly reasonable transformation is unsafe -- and perhaps worst of all, the behavior change would be silent, since the compiler would have nothing to complain about.
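
To make the buffer use case concrete, here is roughly what such code looks like today without any attribute. This is only a sketch: load_be32, store_be32 and bump_be32 are names invented for the example. Note that bump_be32, the kind of function a refactoring would produce, sees nothing but an address, so the byte order has to travel with the pointer somehow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit big-endian value from a possibly
   unaligned position in a byte buffer. Independent of host byte order. */
uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: write a 32-bit value in big-endian order. */
void store_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* The refactored-out function: it modifies a field given only its address. */
void bump_be32(unsigned char *field)
{
    store_be32(field, load_be32(field) + 1);
}

/* Small self-check using a field at an odd offset inside a buffer. */
void check_buffer_access(void)
{
    unsigned char buf[7] = { 0xff, 0x00, 0x00, 0x00, 0x2a, 0xff, 0xff };

    assert(load_be32(buf + 1) == 42);   /* unaligned read at offset 1 */
    bump_be32(buf + 1);
    assert(load_be32(buf + 1) == 43);
}
```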

Also, changing the behavior later means changing the interpretation of some code after deploying a compiler using one interpretation. Consider this on a 32-bit little-endian machine:

  unsigned int x __attribute__((endian("big")));
  *&x = 0x12345678;

In normal C code without this attribute, reading and writing "*&x" is the same as reading and writing x. In your proposed version, "*&x" would use the little-endian interpretation, and "x" would use the big-endian interpretation, with nothing at the site of the executable code to indicate that the two should be different. But an expression like this can come up naturally when dealing with macro expansions. Or, someone using this attribute may write code depending on that different handling of "*&x" to deal with a selected byte order in some cases and native byte order in other cases. Then if you update the compiler so that the attribute is passed along to the pointer type, in the next release, suddenly the two cases behave the same -- breaking the user's code when it worked under the previous compiler release. If you support taking the address of specified-endianness variables at all, you need to get the pointer handling right the first time around.
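
Since the attribute doesn't exist, the difference can only be emulated. The sketch below writes the same value both ways into byte arrays; store_big and store_little are made-up names standing in for "a store to x" and "a store through *&x on a little-endian machine when the pointer drops the attribute".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Most-significant byte first: what a write to the attributed "x" would
   have to leave in memory under the proposal. */
void store_big(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Least-significant byte first: what a plain store through "*&x" would
   leave in memory on a 32-bit little-endian machine. */
void store_little(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)v;
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}

/* Returns nonzero when both stores leave identical bytes in memory. */
int same_bytes(uint32_t v)
{
    unsigned char a[4], b[4];

    store_big(a, v);
    store_little(b, v);
    return memcmp(a, b, sizeof a) == 0;
}

/* Small self-check: the value from the example above ends up laid out
   differently depending on which path the store takes. */
void check_store_paths(void)
{
    assert(!same_bytes(0x12345678u));
}
```

The two paths only agree for values whose bytes read the same in both directions, so code that mixes them silently corrupts almost everything else.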

I would suggest that if you implement something like this, the attribute should be associated with the data type, not the variable decl; so in the declaration above, x wouldn't be treated specially, but its type would be "big-endian unsigned int", a distinct type from "unsigned int" (even on a big-endian machine, probably).
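
As a rough analogy in today's C, a wrapper struct gives the same effect of tying the byte order to the type. This is just a sketch of the idea, not how the attribute would be implemented; be_uint32, be_from_host, be_to_host and be_set are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for "big-endian unsigned int". Being a struct, it
   is incompatible with uint32_t, so mixing the two without an explicit
   conversion is rejected at compile time. */
typedef struct { unsigned char bytes[4]; } be_uint32;

be_uint32 be_from_host(uint32_t v)
{
    be_uint32 r = {{ (unsigned char)(v >> 24), (unsigned char)(v >> 16),
                     (unsigned char)(v >> 8),  (unsigned char)v }};
    return r;
}

uint32_t be_to_host(be_uint32 b)
{
    return ((uint32_t)b.bytes[0] << 24) | ((uint32_t)b.bytes[1] << 16)
         | ((uint32_t)b.bytes[2] << 8)  |  (uint32_t)b.bytes[3];
}

/* Access through a pointer behaves like direct access, because the byte
   order is part of the pointed-to type. */
void be_set(be_uint32 *p, uint32_t v)
{
    *p = be_from_host(v);
}

/* Small self-check: direct and indirect stores give the same layout. */
void check_be_type(void)
{
    be_uint32 direct = be_from_host(0x12345678u);
    be_uint32 indirect;

    be_set(&indirect, 0x12345678u);
    assert(direct.bytes[0] == 0x12 && indirect.bytes[0] == 0x12);
    assert(be_to_host(direct) == be_to_host(indirect));
}
```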

The one advantage I see to associating the attribute with the decl rather than the type is that I could write:

  uint32_t thing __attribute__((endian("big")));

rather than needing to figure out what uint32_t is in fundamental C types and create a new typedef incorporating the underlying type plus the attribute, kind of like how you can't write a declaration using "signed size_t". But that's a long-standing issue in C, and I don't think making the language inconsistent so you can fix the problem in some cases but not others is a very good idea.

- the test case is

 uint32_t x __attribute__ ((endian("big")));
 uint32_t * ptr_x = &x;

Related to my suggestions above, I think this assignment should get a warning about incompatible pointer types.

Though, it brings up an interesting additional question -- should pointers to big-endian int and "normal" int be compatible on big-endian machines? Under C, "char", "unsigned char" and "signed char" are three distinct types, even though "char" must functionally be the same as one of the others. I'd suggest that probably the normal type should be incompatible with both of the explicit-endian types, to help make the code type-safe and not dependent on the target machine's byte order.
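
The char precedent can be checked with a C11 compiler. In the sketch below, CHAR_KIND and the three small functions are names made up for the example. A _Generic association list may not contain two compatible types, so the fact that the macro compiles at all shows the three character types are pairwise distinct on any target.

```c
#include <assert.h>

/* Selects a different constant for each of the three character types.
   C11 rejects duplicate compatible types in the list, so this only
   compiles because the three are distinct. */
#define CHAR_KIND(x) _Generic((x), \
    char:          0,              \
    signed char:   1,              \
    unsigned char: 2,              \
    default:       -1)

int kind_of_plain_char(void)    { char c = 0;          return CHAR_KIND(c); }
int kind_of_signed_char(void)   { signed char c = 0;   return CHAR_KIND(c); }
int kind_of_unsigned_char(void) { unsigned char c = 0; return CHAR_KIND(c); }

/* Small self-check: plain char is its own type whether or not it happens
   to be signed on the target. */
void check_char_kinds(void)
{
    assert(kind_of_plain_char() == 0);
    assert(kind_of_signed_char() == 1);
    assert(kind_of_unsigned_char() == 2);
}
```

An explicit-endian type could follow the same pattern: same representation as the native type on one kind of machine, but never the same type.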

Ken
