BUG elf32-i386 R_386_PC32 done wrong

2006-06-17 Thread doctor electron
Hi!

As author of the HotBasic compiler for Windows, in porting same
to Linux, I find that ld does not properly link relative
relocations (R_386_PC32) in correct elf32-i386 .o files.

In particular, after opcode E8h (call), ld inserts a relative
value which is 4 bytes too much, as if it did not take into
account the position of the program counter which points to the
*end* of the 4-byte value to be relocated.  This happens both
for procedures within the same module or in other modules in the
link.
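
To make the reported symptom concrete, here is a minimal Python sketch of how an E8 near call resolves its destination, and what a displacement that is 4 too large does. The function names and the addresses are hypothetical, chosen only for illustration.

```python
def call_target(opcode_addr, rel32):
    """Destination of a 5-byte `E8 rel32` near call located at opcode_addr."""
    next_insn = opcode_addr + 5        # 1 opcode byte + 4 displacement bytes
    return (next_insn + rel32) & 0xFFFFFFFF


def rel32_for(opcode_addr, target):
    """Signed displacement that makes the call at opcode_addr reach target."""
    return target - (opcode_addr + 5)


call_at, proc = 0x1000, 0x1020                 # made-up addresses
good = rel32_for(call_at, proc)                # displacement the CPU needs
landed_ok = call_target(call_at, good)         # reaches proc
landed_bad = call_target(call_at, good + 4)    # 4 bytes past proc
```

With a displacement that is 4 too large, the call lands 4 bytes beyond the intended procedure, which matches the behavior described above.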

I tentatively conclude that if ld works for all the ELF files in
a Linux installation that contain such relocations, it could
only do so if all of those files had incorrect .text section
relocation info that compensates for (by reversing the 4 byte
error) the bug or fault in ld.  I haven't looked yet, but ld
would properly do the relative relocations if the symbol table
address of a location in .o files was consistently 4 bytes less
than the real location of the symbol.

In short, in this issue, ld is not compatible with its own
stated standards for relocation.

As it is now, HotBasic programs, which have correct relocation
information in the .o files, cannot be linked with ld on Linux
-- a major portability problem.  Is there any plan to fix this,
or any advice re where in ld source a 4 byte correction could be
added to compile a "good ld" capable of linking correct .o
files?

Ironically, a fixed ld would require fixing all the other
software which apparently is producing incorrect .o files to
match (reversing) the ld relocation error.

TIA,

Cordially, James Keene, PhD



___
bug-binutils mailing list
bug-binutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-binutils


Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-23 Thread doctor electron
Long, long ago, Ian Lance Taylor, a life form in far off space,
emitted:

>doctor electron <[EMAIL PROTECTED]> writes:
>
>> As author of the HotBasic compiler for Windows, in porting same
>> to Linux, I find that ld does not properly link relative
>> relocations (R_386_PC32) in correct elf32-i386 .o files.
>
>GNU ld is correct according to the ELF ABI Processor Supplement for
>i386 Processors.

Thank you for your reply, Ian.  The first smoking gun was
described in my first email:  ld overshoots the target for rel
relocs within module by 4 bytes.  This is undeniably a linker
failure.  The processor adds the value in the rel relocated
address to eip ... and, period; that's it.  ld does not know how
i386 and essentially all other microprocessors work.  There is
no other credible explanation.

>In typical use, the .o file will contain a 0xfffffffc in the four
>bytes affected by R_386_PC32.

Yes, this is what I predicted in my previous email and found in
files such as acquirew.o; and which you now admit -- that all .o
and .so files have to have a -4 fudge factor placed in such
locations by compilers since ld knows not how to do rel relocs.
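
For reference, the -4 in those four bytes and the way a linker following the quoted rule would consume it can be sketched in Python. This is an illustration under stated assumptions, not ld's actual code: the function name apply_r386_pc32, the load address, and the byte layout are all hypothetical.

```python
import struct


def apply_r386_pc32(text, offset, text_addr, sym_addr):
    """Patch one PC-relative field in place, using its contents as addend."""
    (addend,) = struct.unpack_from("<i", text, offset)    # existing contents
    place = text_addr + offset                            # address of the field
    value = (sym_addr + addend - place) & 0xFFFFFFFF
    struct.pack_into("<I", text, offset, value)


# E8 followed by -4 as a little-endian 32-bit value, then padding (nop).
text = bytearray(b"\xE8\xFC\xFF\xFF\xFF" + b"\x90" * 11)
apply_r386_pc32(text, offset=1, text_addr=0x08048000, sym_addr=0x08048010)
disp = struct.unpack_from("<i", text, 1)[0]
```

After patching, the stored displacement added to the address of the next instruction (text_addr + 5) gives the symbol address.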

If not fixed, the ld manual should, I think, "come clean" on
this and state plainly that ld fails on rel relocs since it
requires object files to contain a fudge factor to prevent this
failure.

The one and only formula for rel relocs is:

S - (A + 4)

where S is the symbol address and A is the location to be
relocated relatively (the 4 byte field after E8 for example).
Notice that the contents of that E8 field in the .obj or .o are
irrelevant; the field should be overwritten.
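
A short Python check may help relate this formula to the calculation the i386 ABI supplement gives for R_386_PC32, which is usually written S + A - P. Note the naming clash: in the ABI notation A is the addend (the value already in the field) and P is the place being relocated, whereas the post uses A for the place. The function names below are hypothetical.

```python
def post_formula(S, field_addr):
    """S - (A + 4), with A meaning the address of the 4-byte field."""
    return S - (field_addr + 4)


def abi_formula(S, addend, P):
    """S + A - P, with A meaning the addend and P the field address."""
    return S + addend - P


checked = 0
for S in (0x0, 0x10, 0x08048123):
    for P in (0x1, 0x20, 0x08049001):
        # The two agree exactly when the addend is -4.
        assert post_formula(S, P) == abi_formula(S, -4, P)
        checked += 1
```

The two expressions give the same result whenever the addend is -4; they differ only in where that -4 comes from.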

[This is why it looks ridiculous (!) to see -4 in these
locations in existing linux .o and .so files -- really looks
like people have no idea what they are doing -- KeyStone Cops.
I would like to be an advocate/promoter of Linux, but this, my
friend, is totally second-class.]  

All the compiler has to do is allocate a 4-byte field with *any*
value in the .obj or .o file and make an entry for it in the rel
reloc table.  The linker should *never* read the value in this
rel reloc address; rather it should put the correct offset in
it.

This table contains entries of three values, and the third is a
code indicating whether the entry is an absolute or relative
relocation.  So we are down to two values, which are precisely
the S and A values
above.  For each module in the link, both values are referenced
to the beginning of the .text (code) section -- 0.  Thus, if a
linker is concatenating .text sections from multiple modules
(aligned 16 as we have seen), the "0" address for both S and A
needs to be adjusted (but only when S and A are in different
modules originally).  Anyway, once you have the right S and A
values, the formula above is applied and the result is stored at
the A address.  And you know why the formula works -- it is the
way the processors work -- purely hardware related.
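
As a rough illustration of the bookkeeping described above, the Python sketch below unpacks one ELF32 relocation record and rebases S and A after two .text sections are concatenated with 16-byte alignment. In ELF the symbol index and the type code are packed together in the r_info word. The section sizes, offsets, and helper names are made up for the example.

```python
import struct

R_386_32, R_386_PC32 = 1, 2     # i386 type codes: absolute, PC-relative


def parse_elf32_rel(entry):
    """Split one 8-byte Elf32_Rel record into (offset, symbol index, type)."""
    r_offset, r_info = struct.unpack("<II", entry)
    return r_offset, r_info >> 8, r_info & 0xFF


def layout(sizes, base=0, align=16):
    """Give each module's .text a start address, padded up to `align`."""
    starts, addr = [], base
    for size in sizes:
        addr = (addr + align - 1) // align * align
        starts.append(addr)
        addr += size
    return starts


entry = struct.pack("<II", 0x1, (5 << 8) | R_386_PC32)
offset, sym_index, rtype = parse_elf32_rel(entry)

starts = layout([0x25, 0x13])    # .text sizes of two hypothetical modules
A = starts[0] + offset           # field address: module 0, section offset 1
S = starts[1] + 0x3              # symbol: module 1, section offset 3
disp = S - (A + 4)               # formula from the post
```

The displacement is computed only after both addresses have been rebased to the combined section.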

Knowledge of how microprocessors work (re adding the offset to
eip) goes back to the beginning of the very first microprocessors
of any kind.  This is why it is amazing that *both* compiler
writers and linker writers in linux seem to be completely
uninformed about how the processors they use work, even in their
best known and simplest aspects.  Anyone who sees those -4's in
existing .o and .so files cannot conclude anything other than
"this Linux is bound together with rubber bands."

SUMMARY:  The two S and A values in the rel reloc table entries
are the only thing needed to write the relocs into the
executable.

So I humbly ask again:  Where is this code, so we might best find
it in ld, rewrite it correctly, and then have ld link all i386
formats, for both good and existing (containing the -4
gibberish) input files?

This code might be buried in some include or bfd (?) module
called by ld, but whatever it is, it seems to be feasible to do
a completely general and complete fix to the benefit of all
linux users.  Since a rel-reloc-fixed ld would not read the -4
values, it would work both for good and fudge-factored (existing
Linux) input files.

As it is, this ld failure prevents porting valid object files
into a Linux environment.  Surely, Linux people do not intend to
say, "Do not enter here", to the public looking at Linux.

>R_386_PC32 is defined to add the PC relative offset from the start of
>the 4 byte field to the existing contents of the 4 byte field.

Above is wrong.  What could possibly be the "trial and error"
origin of this definition?  [Scenario:  In early days of ld and
linux-based compilers, someone found that a program ran if -4
was stuck in the object file.]  If this is true:  This is "design
by random trial and error"; why not use the real definition of
rel relocs?

Your statement is a crystal clear acknowledgment that ld is
wrong.  Thank you.

I rest my case.  The correct definition, I give above.  Let

Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-23 Thread doctor electron
Long, long ago, Ian Lance Taylor, a life form in far off space,
emitted:

>the four bytes affected by R_386_PC32

Dear Ian, I think a single statement edit would fix ld re rel
relocs:  The place where we read the "four bytes affected" now
is the equivalent of

x = [the four bytes]  ...or... mov eax,[esi]

We need to change that one statement to the equivalent of

x = -4  ...or... mov eax,-4
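
The effect of that one-statement change can be sketched in Python by modeling both variants side by side. This is only an illustration with hypothetical names and addresses; it is not taken from ld source.

```python
def field_value_current(S, P, field):
    """Current behavior: the existing field contents act as the addend."""
    return S + field - P


def field_value_proposed(S, P, field):
    """Proposed edit: ignore the field contents and always use -4."""
    return S + (-4) - P


def destination(P, value):
    """Where the CPU lands: end of the 4-byte field plus the stored value."""
    return P + 4 + value


S, P = 0x2000, 0x1001                    # symbol address, field address
results = {
    field: (destination(P, field_value_current(S, P, field)),
            destination(P, field_value_proposed(S, P, field)))
    for field in (-4, 0)                 # existing .o style, zero-filled style
}
```

With -4 in the field both variants reach S; with 0 in the field only the proposed variant does, and the current one lands 4 bytes past S.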

Would you be so kind as to inform us of the file/line number of
that statement?  If so, we'll recompile (I have a friend who can
do that) and test the fixed ld on both my and existing object
files.  TIA, Jim





Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-24 Thread doctor electron
Long, long ago, Eric Botcazou, a life form in far off space,
emitted:

>Interesting.  Then your next task is to convince the dumb guys at Sun too 
>because their toolchain behaves exactly like the Linux toolchain...

Thanks for this info, Eric.  

As you might see in Ian's thoughtful reply, I don't think he
gets the point (maybe my failure to communicate well):

ld can get the -4 on its own, rather than read it from "typical"
input files and thereby conform to the rel reloc formula *and*
remove the requirement that .o files contain -4 at all those
locations, which must be a continuing source of shame and
embarrassment to writers of existing Linux compilers (nasm, C,
etc).  Note that if ld coughs up its own -4 (per the formula I
posted), all the existing .o/.so files would still link -- the
-4 is a *constant* in the hardware-based formula and its
presence in those .o/.so files should properly be ignored [so
correct input files with any arbitrary value (e.g., 0) in those
locations would also link].

Think about it, it is not so much as being right or wrong, smart
or dumb (ld covers a huge range of architectures and formats;
clearly its writers are very "smart").  ld's problem is that it
must get the constant -4 from input files, when in fact, it does
not need to be in the input files (the compiler making those
only needs to allocate the "place-holder" 4 bytes addressed in
the rel reloc entry).

Clearly, ld is using the correct formula; the issue is *where*
it gets the constant -4.  If it did not try to read this from
the input file, then all object files would be correctly
relocated -- not just the ones that have -4 in them.  I argue
that this would be beneficial to Linux (Sun would be forced to
follow on the quality upgrade in OS flag-ship linker).

Does it not seem lame to require compiler writers to put a -4
gibberish in their object file outputs for the only reason that
ld can't cough up a -4 all by itself?

If, as Ian pointed out in his second reply (thank you), this is
a very pervasive problem -- requiring all inputs to have -4
constants inserted by third party (compiler) programs, and now
you add Sun, too, this is a big, news-worthy story -- involving
issues of the "image" or "appearance" of competence and
integrity in the Linux OS.

Notice Ian gave no rationale beyond what I might call "This is
the way we do it" and "Some ABI document covers us", as if any
document changes how existing processors work.  Nor did Ian even
challenge my formula (which is really the formula of
microprocessor makers).  Rather he seemed to me to rely on "We
all say this is it so it must be it", instead of addressing the
absurdity of reading the constant -4 from input files in the
first place, and thereby requiring third parties (compiler
writers) to, in essence, corrupt their output files by
mindlessly inserting this -4 constant at rel reloc addresses.

SUMMARY:  Your Sun data only confirms my analysis and augments
the dimension of this big story.  When and if this becomes
public, so to speak, what are proponents of requiring the -4
constants in object files going to say?

From what we see in this thread of posts, will they have nothing
credible that makes any sense -- other than some ABI doc says
it's OK?  The enquiring public will want more substance than
that.

Me, I think we on this end can fix ld.  So if Linux developers
don't want to fix this; that's fine, the loss is reduced Linux
credibility, I think.

Worse, your compiler developers are hung out to dry:  they are
"innocent", so to speak, but are forced by ld developers to take
the hit because they have no rational explanation of the -4's in
their output files other than ld can't do rel relocs all by
itself.  

The implications of an article -- "Scandal in Linux Cyberspace"
-- documenting this "big story" are many.  Why was this not
fixed years ago?  How could such an ill-conceived design (an
object file must have a constant for a linker) ever have become
a "canon of Linux"?  Did these developers not realize that the
linker outputs run only because the linker itself seems unaware
of the correct formula and just needs to make its own -4 and not
read it from object files?  It seems that on historic
development day 2, this constant would have been "moved" into
linker code -- but that did not happen -- why?

Why do Linux developers appear to want to keep correct .o files
(no -4 constants in there) out of the Linux environment?  Is the
intent to keep quality software out of the Linux environment?  
The larger computing world outside Linux/Sun, etc, also has
smart people producing useful object files -- in business, math,
science, you name it.  But ld does not know how to link them!

What message are you sending?  

Is the message that objcopy writers now must also go in and
insert the -4's in .text sections when converting from .obj to
.o?  All because ld can't find its own -4?  [Free gift, cut and
paste any -4 in this post and put it in ld source code!] 

In the absence o

Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-24 Thread doctor electron
Long, long ago, Eric Botcazou, a life form in far off space,
emitted:

>neither you nor us can change it now

Thanks for your further thoughts -- and even bigger story!

The answer to that is pending; I hope we can edit ld to not
require -4 in those rel reloc locations in input files.

If so, the market will decide; is a corrected ld favored or not.

>but you are 20 years late

That's the marvel.  Why was this not corrected 20 years ago?
IMHO, it is never too late to upgrade quality -- and my proposed
correction would have a practical effect: (1) you could announce
to compiler makers they don't need the -4 gibberish in the
object file outputs anymore (and they could optionally remove
that from their compilers) and (2) a huge resource of countless
object files in the outside world of computing would be linkable
on Linux.

And you wouldn't even have to edit your ABI document.  With a
fixed ld, compiler makers could put the -4's as before or 0
instead or even random numbers in those locations.

Are you saying, "Linux cannot be improved?"  Seems so.  IMHO,
not good public relations re promoting the OS.

Only problem may be: while you are free to nurture a "We are
defeated" attitude, there is, oops, nothing to prevent others
from doing the obvious improvements -- especially when it comes
to the OS's flag-ship linker.

Take care, Jim





Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-24 Thread doctor electron
Long, long ago, Eric Botcazou, a life form in far off space,
emitted:

>> If so, the market will decide; is a corrected ld favored or not.
>
>It's already decided: your proposed change would break the ABI, hence break 
>binary compatibility by definition.

The ABI states linkers cannot make -4 themselves, they have to
read it from a file?  Heck, let's break it!  What are we waiting
for?

I agree it's "already decided".  Last time I looked, Linux and
the others mentioned in this thread have well less than 10%
market share both for number of computer users and number of
computers.  That percent can improve if upgrading quality was
more important than some document written a generation ago.

Fact:  ld fails to rel reloc a .text section location if it does
not contain -4.  This fact == low quality.

>> That's the marvel.  Why was this not corrected 20 years ago?
>
>Because, if you really think about it, the current definition of R_386_PC32 is 
>the right one.

I gave the correct definition in my previous emails (the only
valid one is based on how microprocessors work) and yet ld fails
to link as stated above.  [hint: microprocessors don't know what
we are saying or what ABI says; ld has no need whatsoever to get
-4 from input files; ld writers should know that.]

>Again, it's not Linux, the i386 ABI predates Linux, Linux only conformed to 
>the existing ABI.

ABI again?  Are you saying ABI doesn't know how to do rel
relocs?  Again, the location must contain the offset to the
symbol from the current contents of the CPU eip register.  Are
you saying ABI contradicts that?  What exactly does ABI have to
do with ld's failure to do rel reloc?  Or are you saying, ld
should fail?  I say, why?  Why not do it right?  Success is not
as bad as it might seem!  ;) j
  




Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-24 Thread doctor electron
Long, long ago, Ian Lance Taylor, a life form in far off space,
emitted:

>If you ignore the contents of the .o file, then how do you propose to
>handle the assembler code
>call foo + 16
>?

Very good question.  Thank you.  Apparently the ABI assemblers
would put foo's address in the rel reloc Symbol entry and (16-4)
in the location to be relocated.  I get it.  [But would vote for
such compilers to evaluate foo + 16 and use that value ;)]

I was wrong to assume that all assemblers would put the actual
destination (foo + 16) in the rel reloc symbol entry and that
therefore, -4 was a constant.

Thanks for clarifying that really the -4 value is not (always) a
constant, at least given the behavior of assemblers you
describe.
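
A small Python sketch of the "call foo + 16" case, under the assumption stated above that the assembler stores 16 - 4 = 12 in the field and names foo in the relocation entry. The helper name and addresses are hypothetical.

```python
def resolve(S, P, addend):
    """Return the address the call reaches after relocation."""
    value = S + addend - P     # value the linker stores in the field
    return P + 4 + value       # CPU adds it to the address after the field


foo, P = 0x3000, 0x1001        # made-up symbol and field addresses
field = 16 - 4                 # what the assembler left in the field
reads_field = resolve(foo, P, field)    # linker that reads the field
assumes_minus4 = resolve(foo, P, -4)    # linker that always uses -4
```

A linker that reads the field reaches foo + 16; one that always assumes -4 reaches foo itself, so the +16 is lost.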

hmmm.  OK, for my object files, alternate entry points all have
labels, so for those the -4 is a constant and an ld assuming
that (which would not work for the call you cite above) would
still be of interest in my project.  Let's see.

Thanks again and best wishes, j





Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-25 Thread doctor electron
Long, long ago, Ian Lance Taylor, a life form in far off space,
emitted:

>If you ignore the contents of the .o file, then how do you propose to
>handle the assembler code
>call foo + 16
>?

ADDENDUM:
Thanks again for this implied explanation, where apparently rel
reloc info is split in two parts, one in the reloc table and the
other in the location to be relocated.  If this is correct, may
I add some friendly thoughts?

1. I've contemplated writing my own "LoadLibrary" and
"GetProcAddress" procedures, and your explanation (implied in
your question above) will be essential to do so correctly.

2. As one who finds much in Linux to be very praiseworthy, I
worry a bit about what seems to be such allegiance to a
20-year-old ABI doc which may be inconsistent with making
quality improvements in the future.  [No hardware maker would do
such a thing, I think, and survive; no one makes 16 bit bus
cards anymore.]  In short, the interoperability rationale may
lose its punch if the thing to be interoperable should best be
discarded.

3. Whether the "call foo + 16" case above is the right place to
break with this ABI, I don't know; but for fun and discussion,
let's consider it.

First, we might call this the "Laziness concession to compiler
writers" in that they don't have to evaluate "foo + 16" above
and just put it in the reloc table, with an autogenerated
textual symbol if necessary, as "good boys and girls" would.  If
the contents of the rel reloc location were simply ignored, the
message to compiler writers is that this splitting of reloc info
is no longer supported, in favor of the "regular" simpler,
cleaner rel reloc relocation table entry.

Second, the -4 constant is still embedded in the location
contents read by the linker -- another place where one could
break with ABI.  If linkers just used a -4 constant and the rel
reloc info in the relocation table, in effect, some advantages
of interoperability would be gained (all the .obj files would
link, whether in .obj or .o form from objcopy), and compiler
splitting of rel reloc info would be discarded.  Compiler makers
could choose to comply or not.  To comply, they would have many
options such as

lea eax,foo
add eax,16
call eax  ;no rel reloc table entry!

...or put the value of foo + 16 as the symbol address in the
reloc table as described above.

The above may not be the point to break with ABI, but my
friendly message is that major quality upgrades may be found by
dropping provisions which are not presently deemed to be
optimally efficient.  Of course, this is contrary to the "ABI is
written in stone" concept.

Again, I don't know, but it might seem almost inconceivable that
a technical specification written over 20 years ago, on almost
anything in computing software or hardware, could be completely
perfect or that the world will end if we drop support for items
such as above or even add support for new items.

If I understand, Microsoft has broken with ABI specs (e.g., they
put 0 in the location to undergo rel relocation) and lightning
did not strike them dead (actually, they have done just fine
after doing so).
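
If, as stated above, such .obj files hold 0 where ELF .o files hold -4, then carrying the code over amounts to subtracting 4 from each PC-relative field. The Python sketch below shows that adjustment on a raw byte buffer. It rests on that assumption, the helper name coff_to_elf_fields is hypothetical, and it is not a description of what objcopy actually does.

```python
import struct


def coff_to_elf_fields(text, rel32_offsets):
    """Subtract 4 from each 32-bit PC-relative field listed, in place."""
    for off in rel32_offsets:
        (addend,) = struct.unpack_from("<i", text, off)
        struct.pack_into("<i", text, off, addend - 4)
    return text


# E8 00 00 00 00 (call with a zero field) followed by C3 (ret).
text = bytearray(b"\xE8\x00\x00\x00\x00\xC3")
coff_to_elf_fields(text, [1])

# A field that already carries +16 becomes +12, as in "call foo + 16".
text2 = coff_to_elf_fields(bytearray(b"\xE8\x10\x00\x00\x00"), [1])
```

Only the fields named by PC-relative relocation entries are touched; every other byte is left as it was.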

SUMMARY:  I seriously wonder if continuing OS upgrades and ABI
are consistent, if ABI can prevent progress.

For my purposes, I now know the "mess" that ABI supports for rel
relocs in order to write code to do them at run-time.  Thanks
for that.

Greetings, Jim





Re: BUG elf32-i386 R_386_PC32 done wrong

2006-06-25 Thread doctor electron
Long, long ago, Ian Lance Taylor, a life form in far off space,
emitted:

>We would discard the ABI in a second if the benefit exceeds the cost.

We agree; I'm happy.

>What benefit would we gain by changing the definition of R_386_PC32?

As stated, I don't know; the case was discussed as an example of
a larger concept which (above) we agree on.

>You have not described any benefit beyond abstract appeals to what you
>think object files should look like.  That doesn't count.  Give us a
>measurable benefit and we'll consider it.

I did: the vast amount of .obj files containing useful
procedures would become "interoperable".  In terms of available
software, one might estimate 10x more .obj files than .o files
worldwide.

Concreteness?  When ld links ELF files, it produces a compact
executable (very nice); when ld links COFF files, it
apparently cannot output a "very nice" compact ELF file, but
rather the longer zero-padded PE format (for PE section
alignment), which requires objcopy to convert to ELF with the
result still having all the zero-padding.

I'm good, at peace, Ian.  Nothing to worry about.  If my shop
makes an ld that can bring all this .obj software into a Linux
environment with the very nice compact ELF format (which, as you
agree, ld cannot do now), I'm happy.  Thank you again. 

Cheers, Jim


