[Bug ld/10774] New: Bogus documentation

konrad dot schwarz at siemens dot com Wed, 14 Oct 2009 03:04:48 -0700

Chapter 3.5.4, "Source Code Reference", of the ld Manual is so inaccurate and
inconsistent in its use of vocabulary with the rest of the manual that it should
be replaced.  A detailed critique is below; I suggest the following replacement:


3.5.4 Accessing Symbols Defined in Linker Scripts in Source Code
----------------------------------------------------------------
The value of a symbol is its address.  Thus, to access a symbol's
value, declare it as an external variable and use its address.

Note that in most cases, symbols defined by linker scripts do *not*
have any associated storage assigned to them, so it is typically an
error to read from or write to such an external variable!

For example, the Unix System V documentation traditionally
uses the following C declarations for the end of the text segment, the
end of the data segment, and the end of the BSS segment, which System V
marks with the symbols ``etext'', ``edata'', and ``end'':

extern etext;
extern edata;
extern end;

Note that these declarations implicitly use a type of ``int'', a form
that C99 and later standards no longer permit.

Because type checking is not done during link editing, one can choose
whichever type is most appropriate to the application.  For example,
declaring such a symbol as an incomplete array of const char lets the C
compiler diagnose writes, reads that lack an explicit subscript or
dereference, and uses of the sizeof operator:

extern char const end [];
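
The following is a minimal sketch of how such declarations might be used,
not text proposed for the manual.  It assumes a toolchain whose default
link defines ``etext'', ``edata'', and ``end'' without a leading
underscore (typical of GNU ld on ELF systems).  The helper names
bytes_after_edata and print_segment_ends are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: the linker defines these three symbols.  They have no
   storage of their own; only their addresses are meaningful. */
extern char const etext[];
extern char const edata[];
extern char const end[];

/* Mistakes the declarations above let the compiler catch:
 *
 *     end[0] = 0;           write through a const-qualified type
 *     size_t n = sizeof end;   sizeof applied to an incomplete type
 *     char c = end;         pointer converted to char without a cast
 */

/* Hypothetical helper: number of bytes between edata and end.  The
   addresses are compared as integers because the symbols do not belong
   to a single C object. */
size_t bytes_after_edata(void)
{
    return (size_t)((uintptr_t)end - (uintptr_t)edata);
}

/* Hypothetical helper: print the three addresses.  The array names decay
   to pointers, so no address-of operator is needed. */
void print_segment_ends(FILE *out)
{
    fprintf(out, "etext = %p\n", (void const *)etext);
    fprintf(out, "edata = %p\n", (void const *)edata);
    fprintf(out, "end   = %p\n", (void const *)end);
}
```

Calling print_segment_ends (stdout) from main prints three addresses that
vary from system to system and, with address space randomization, from
run to run.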

Finally, note that some systems perform a
transformation between variable names as used in a high-level language and
symbol names as seen by the linker.  The transformation is part of the ABI.
E.g., a.out and COFF(?)-based systems prepend an underscore
to variable names to arrive at the symbol name---this is done to create
separate name spaces for high-level language modules and assembly language
modules.  Symbol names must take this transformation into account: e.g.,
the above symbols would be named ``_etext'', ``_edata'', and ``_end'' on
such systems.
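
As an illustration only: GCC and compatible compilers predefine the macro
__USER_LABEL_PREFIX__, which expands to the prefix the target ABI adds to
C names (empty on most ELF targets, an underscore on targets that use
one).  The sketch below uses it to build the linker-level name for a
given C name.  The helper linker_symbol_name and the fallback to an empty
prefix on other compilers are assumptions of this example.

```c
#include <stdio.h>

/* Two-level stringification so the macro is expanded before quoting. */
#define STRINGIFY_(x) #x
#define STRINGIFY(x)  STRINGIFY_(x)

#ifdef __USER_LABEL_PREFIX__
#define LABEL_PREFIX STRINGIFY(__USER_LABEL_PREFIX__)
#else
#define LABEL_PREFIX ""   /* assumption: unknown compiler, no prefix */
#endif

/* Hypothetical helper: write the linker-level symbol name for the C name
   c_name into buf.  Returns what snprintf returns, that is, the length
   the full name needs, excluding the terminating null byte. */
int linker_symbol_name(char *buf, size_t size, char const *c_name)
{
    return snprintf(buf, size, "%s%s", LABEL_PREFIX, c_name);
}
```

On a target with an underscore prefix, passing "etext" yields "_etext",
which is the name a linker script would have to define.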

In C++, an ``extern "C"'' linkage specification can be used to suppress
the additional "mangling" of variable names done by that language.

CRITIQUE OF CURRENT TEXT

File: ld.info,  Node: Source Code Reference,  Prev: PROVIDE_HIDDEN,  Up: Assignments

3.5.4 Source Code Reference
---------------------------

Accessing a linker script defined variable  from source code is not
>>                                symbol
intuitive.  In particular a linker script symbol is not equivalent to a
variable declaration in a high level language, it is instead a symbol
that does not have a value.
>>                     ??? It has a value, it just might not have storage
>> associated with it.  This node's parent is titled "Assigning values to 
>> Symbols"!

   Before going further, it is important to note that compilers often
transform names in the source code into different names when they are
stored in the symbol table.  For example, Fortran compilers commonly
>>      That mangling is defined by the ABI should be mentioned
prepend or append an underscore, and C++ performs extensive `name
mangling'.  Therefore there might be a discrepancy between the name of
a variable as it is used in source code and the name of the same
variable as it is defined in a linker script.  For example in C a
linker script variable might be referred to as:

       extern int foo;

   But in the linker script it might be defined as:

       _foo = 1000;

   In the remaining examples however it is assumed that no name
transformation has taken place.

   When a symbol is declared in a high level language such as C, two
things happen.  The first is that the compiler reserves enough space in
the program's memory to hold the _value_ of the symbol.  The second is
>>                               data of the variable
that the compiler creates an entry in the program's symbol table which
>>       technically, for gcc, the assembler
>>                                        object file's
holds the symbol's _address_.  ie the symbol table contains the address
of the block of memory holding the symbol's value.  So for example the
following C declaration, at file scope:

       int foo = 1000;

   creates an entry called `foo' in the symbol table.  This entry holds
the address of an `int' sized block of memory where the number 1000 is
initially stored.

   When a program references a symbol the compiler generates code that
first accesses the symbol table to find the address of the symbol's
>>     Utter nonsense!
memory block and then code to read the value from that memory block.
So:

       foo = 1;

   looks up the symbol `foo' in the symbol table, gets the address
associated with this symbol and then writes the value 1 into that
address.  Whereas:

       int * a = & foo;

   looks up the symbol `foo' in the symbol table, gets its address and
then copies this address into the block of memory associated with the
variable `a'.

   Linker scripts symbol declarations, by contrast, create an entry in
the symbol table but do not assign any memory to them.  Thus they are
an address without a value.  So for example the linker script
>>  Again, this is completely at variance to how the rest of the manual
>> defines the "value" of a symbol, namely as its address for normal symbols
>> or [sic] its value for absolute symbols.
definition:

       foo = 1000;

   creates an entry in the symbol table called `foo' which holds the
address of memory location 1000, but nothing special is stored at
address 1000.  This means that you cannot access the _value_ of a
linker script defined symbol - it has no value - all you can do is
access the _address_ of a linker script defined symbol.
>> See above

   Hence when you are using a linker script defined symbol in source
code you should always take the address of the symbol, and never
attempt to use its value.  For example suppose you want to copy the
contents of a section of memory called .ROM into a section called
.FLASH and the linker script contains these declarations:

       start_of_ROM   = .ROM;
       end_of_ROM     = .ROM + sizeof (.ROM) - 1;
       start_of_FLASH = .FLASH;

   Then the C source code to perform the copy would be:

       extern char start_of_ROM, end_of_ROM, start_of_FLASH;
>> A better practice is to define these variables as char start_of_ROM [], etc.
>> This causes the compiler to complain if these variables are read from or
>> written to, e.g., if the address-of operator & is forgotten, as the author
>> describes below.

>> Furthermore, non-writable sections should be const qualified.

       memcpy (& start_of_FLASH, & start_of_ROM, & end_of_ROM - & start_of_ROM);

   Note the use of the `&' operators.  These are correct.
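
>> A sketch of the array-style declarations suggested above.  Because the
>> quoted script sets end_of_ROM to the address of the last byte (note the
>> "- 1"), an inclusive copy needs the pointer difference plus one; the
>> memcpy call quoted above appears to copy one byte fewer.  The stand-in
>> arrays and the helper copy_inclusive are hypothetical and exist only so
>> that the sketch links without the linker script.

```c
#include <stddef.h>
#include <string.h>

/* With a linker script that defines the three symbols, the declarations
 * would read:
 *
 *     extern char const start_of_ROM[], end_of_ROM[];
 *     extern char       start_of_FLASH[];
 *
 * Hypothetical stand-ins so this sketch links without that script: */
char const rom_standin[] = { 'R', 'O', 'M', '!' };
char flash_standin[sizeof rom_standin];

/* Copy the bytes from first through last, inclusive, to dst.
   Returns the number of bytes copied. */
size_t copy_inclusive(char *dst, char const *first, char const *last)
{
    size_t n = (size_t)(last - first) + 1;

    memcpy(dst, first, n);
    return n;
}
```

>> With the real symbols the call would be
>> copy_inclusive (start_of_FLASH, start_of_ROM, end_of_ROM); no `&'
>> operators are needed because array names decay to pointers.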

-- 
           Summary: Bogus documentation
           Product: binutils
           Version: 2.19
            Status: NEW
          Severity: normal
          Priority: P2
         Component: ld
        AssignedTo: unassigned at sources dot redhat dot com
        ReportedBy: konrad dot schwarz at siemens dot com
                CC: bug-binutils at gnu dot org


http://sourceware.org/bugzilla/show_bug.cgi?id=10774
