BUG elf32-i386 R_386_PC32 done wrong
Hi! As author of the HotBasic compiler for Windows, in porting same to Linux, I find that ld does not properly link relative relocations (R_386_PC32) in correct elf32-i386 .o files. In particular, after opcode E8h (call), ld inserts a relative value which is 4 bytes too large, as if it did not take into account that the program counter points to the *end* of the 4-byte field being relocated. This happens both for procedures within the same module and for procedures in other modules in the link.

I tentatively conclude that if ld works for all the ELF files in a Linux installation that contain such relocations, it can only be because all of those files carry incorrect .text section relocation info that compensates for (reverses) the 4-byte error in ld. I haven't looked yet, but ld would do the relative relocations properly if the symbol table address of a location in .o files were consistently 4 bytes less than the real location of the symbol. In short, on this issue, ld is not compatible with its own stated standards for relocation.

As it is now, HotBasic programs, which have correct relocation information in their .o files, cannot be linked with ld on Linux -- a major portability problem. Is there any plan to fix this, or any advice on where in the ld source a 4-byte correction could be added to compile a "good ld" capable of linking correct .o files? Ironically, a fixed ld would require fixing all the other software which is apparently producing incorrect .o files to match (reverse) the ld relocation error.

TIA, Cordially,
James Keene, PhD
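P.S. To make the arithmetic concrete, here is a small C sketch of what the processor expects in the 4-byte field after E8h (the addresses and values are made up purely for illustration):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        /* Illustrative addresses only. */
        uint32_t field_addr = 0x1001;         /* 4-byte rel32 field following the E8h at 0x1000 */
        uint32_t next_insn  = field_addr + 4; /* eip when the call executes                     */
        uint32_t target     = 0x1234;         /* address of the called procedure                */

        /* The CPU adds the field to eip, which already points past the field, */
        /* so the field must hold target - (field_addr + 4).                   */
        int32_t correct   = (int32_t)(target - next_insn);
        int32_t overshoot = (int32_t)(target - field_addr); /* what I observe: 4 too much */

        printf("correct rel32 = 0x%x, overshooting rel32 = 0x%x\n",
               (unsigned)correct, (unsigned)overshoot);
        return 0;
    }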
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Ian Lance Taylor, a life form in far off space, emitted:
>doctor electron <[EMAIL PROTECTED]> writes:
>
>> As author of the HotBasic compiler for Windows, in porting same
>> to Linux, I find that ld does not properly link relative
>> relocations (R_386_PC32) in correct elf32-i386 .o files.
>
>GNU ld is correct according to the ELF ABI Processor Supplement for
>i386 Processors.

Thank you for your reply, Ian. The first smoking gun was described in my first email: ld overshoots the target for rel relocs within a module by 4 bytes. This is undeniably a linker failure. The processor adds the value at the rel relocated address to eip -- period; that's it. ld does not know how the i386 and essentially all other microprocessors work. There is no other credible explanation.

>In typical use, the .o file will contain a 0xfffffffc in the four
>bytes affected by R_386_PC32.

Yes, this is what I predicted in my previous email and found in files such as acquirew.o; and it is what you now admit -- that all .o and .so files have to have a -4 fudge factor placed in such locations by compilers, since ld does not know how to do rel relocs. If this is not fixed, the ld manual should, I think, "come clean" and state plainly that ld fails on rel relocs, since it requires object files to contain a fudge factor to prevent this failure.

The one and only formula for rel relocs is:

    S - (A + 4)

where S is the symbol address and A is the location to be relocated relatively (the 4-byte field after E8, for example). Notice that the contents of that E8 field in the .obj or .o are irrelevant; they should be overwritten. [This is why it looks ridiculous (!) to see -4 in these locations in existing Linux .o and .so files -- it really looks like people have no idea what they are doing -- Keystone Cops. I would like to be an advocate/promoter of Linux, but this, my friend, is totally second-class.]

All the compiler has to do is allocate a 4-byte field with *any* value in the .obj or .o file and make an entry for it in the rel reloc table. The linker should *never* read the value at this rel reloc address; rather, it should put the correct offset into it. The table contains entries of three values, and the third is a code saying whether the entry is an absolute or a relative relocation. So we are down to two values, which are precisely the S and A values above. For each module in the link, both values are referenced to the beginning of the .text (code) section -- 0. Thus, if a linker is concatenating .text sections from multiple modules (aligned to 16, as we have seen), the "0" address for both S and A needs to be adjusted (but only when S and A are originally in different modules). Anyway, once you have the right S and A values, the formula above is applied and the result is stored at the A address.

And you know why the formula works -- it is the way the processors work; it is purely hardware related. Knowledge of how microprocessors work (adding the offset to eip) goes back to the beginning of the very first microprocessors of any kind. This is why it is amazing that *both* compiler writers and linker writers in Linux seem to be completely uninformed about how the processors they use work, even in their best-known and simplest aspects. Anyone who sees those -4's in existing .o and .so files can only conclude that "this Linux is bound together with rubber bands."

SUMMARY: The two S and A values in the rel reloc table entries are the only things needed to write the relocs into the executable.
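For concreteness, here is a minimal C sketch of the relocation step described above; the names (apply_rel32, image, field_addr, sym_addr) are mine for illustration, and this is not ld source:

    #include <stdint.h>
    #include <string.h>

    /* Apply one relative relocation as described above:                   */
    /*   S = sym_addr,   the address the symbol gets in the output,        */
    /*   A = field_addr, the address of the 4-byte field to patch          */
    /*       (the reloc offset, adjusted for where this module's .text     */
    /*        landed in the output).                                       */
    /* The previous contents of the field are never read.                  */
    void apply_rel32(uint8_t *image, uint32_t field_addr, uint32_t sym_addr)
    {
        int32_t value = (int32_t)(sym_addr - (field_addr + 4)); /* S - (A + 4)          */
        memcpy(image + field_addr, &value, sizeof value);       /* overwrite, don't add */
    }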
So I humbly ask again: where is the corresponding code in ld, so we might best find it, rewrite it correctly, and then have an ld that links all i386 formats, for both good and existing (containing the -4 gibberish) input files? This code might be buried in some include or bfd (?) module called by ld, but whatever it is, it seems feasible to make a completely general fix to the benefit of all Linux users. Since a rel-reloc-fixed ld would not read the -4 values, it would work both for good and for fudge-factored (existing Linux) input files. As it is, this ld failure prevents porting valid object files into a Linux environment. Surely Linux people do not intend to say "Do not enter here" to the public looking at Linux.

>R_386_PC32 is defined to add the PC relative offset from the start of the 4 byte field
>to the existing contents of the 4 byte field.

The above is wrong. What could possibly be the "trial and error" origin of this definition? [Scenario: in the early days of ld and Linux-based compilers, someone found that a program ran if -4 was stuck in the object file. If this is true, then this is "design by random trial and error"; why not use the real definition of rel relocs?] Your statement is a crystal-clear acknowledgment that ld is wrong. Thank you. I rest my case. The correct definition, I give above. Let
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Ian Lance Taylor, a life form in far off space, emitted:
>the four bytes affected by R_386_PC32

Dear Ian, I think a single-statement edit would fix ld re rel relocs. The place where the "four bytes affected" are read now is the equivalent of

    x = [the four bytes]    ...or...    mov eax,[esi]

We need to change that one statement to the equivalent of

    x = -4                  ...or...    mov eax,-4

Would you be so kind as to tell us the file/line number of that statement? If so, we'll recompile (I have a friend who can do that) and test the fixed ld on both my object files and existing ones.

TIA, Jim
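P.S. In hypothetical C (not actual ld or bfd source; the real code and names will differ), the one-statement change I am asking about amounts to this:

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical sketch only: patch one PC-relative field. */
    void patch_pc32(uint8_t *image, uint32_t field_addr, uint32_t sym_addr,
                    int ignore_field_contents)
    {
        int32_t x;
        if (ignore_field_contents)
            x = -4;                              /* proposed: x = -4               */
        else
            memcpy(&x, image + field_addr, 4);   /* current:  x = [the four bytes] */

        int32_t value = (int32_t)(sym_addr + x - field_addr);
        memcpy(image + field_addr, &value, sizeof value);
    }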
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Eric Botcazou, a life form in far off space, emitted:
>Interesting. Then your next task is to convince the dumb guys at Sun too
>because their toolchain behaves exactly like the Linux toolchain...

Thanks for this info, Eric. As you might see in Ian's thoughtful reply, I don't think he gets the point (maybe my failure to communicate well): ld could get the -4 on its own, rather than read it from "typical" input files, and thereby conform to the rel reloc formula *and* remove the requirement that .o files contain -4 at all those locations -- which must be a continuing source of shame and embarrassment to the writers of existing Linux compilers (nasm, C, etc.).

Note that if ld coughs up its own -4 (per the formula I posted), all the existing .o/.so files would still link -- the -4 is a *constant* in the hardware-based formula, and its presence in those .o/.so files would properly be ignored [so correct input files with any arbitrary value (e.g., 0) in those locations would also link].

Think about it: it is not so much a matter of being right or wrong, smart or dumb (ld covers a huge range of architectures and formats; clearly its writers are very "smart"). ld's problem is that it must get the constant -4 from input files, when in fact it does not need to be in the input files at all (the compiler producing them only needs to allocate the "place-holder" 4 bytes addressed in the rel reloc entry). Clearly, ld is using the correct formula; the issue is *where* it gets the constant -4. If it did not try to read it from the input file, then all object files would be correctly relocated -- not just the ones that have -4 in them. I argue that this would be beneficial to Linux (Sun would be forced to follow on the quality upgrade in the OS's flag-ship linker).

Does it not seem lame to require compiler writers to put -4 gibberish in their object file outputs for the only reason that ld can't cough up a -4 all by itself? If, as Ian pointed out in his second reply (thank you), this is a very pervasive problem -- requiring all inputs to have -4 constants inserted by third-party (compiler) programs -- and now you add Sun too, this is a big, newsworthy story, involving issues of the "image" or "appearance" of competence and integrity in the Linux OS.

Notice that Ian gave no rationale beyond what I might call "this is the way we do it" and "some ABI document covers us," as if any document changes how existing processors work. Nor did Ian even challenge my formula (which is really the formula of the microprocessor makers). Rather, he seemed to rely on "we all say this is it, so it must be it," instead of addressing the absurdity of reading the constant -4 from input files in the first place, and thereby requiring third parties (compiler writers) to, in essence, corrupt their output files by mindlessly inserting this -4 constant at rel reloc addresses.

SUMMARY: Your Sun data only confirms my analysis and enlarges the dimension of this big story. When and if this becomes public, so to speak, what are the proponents of requiring the -4 constants in object files going to say? From what we see in this thread, they will have nothing credible that makes any sense -- other than that some ABI doc says it's OK. The enquiring public will want more substance than that. Me, I think we on this end can fix ld. So if Linux developers don't want to fix this, that's fine; the loss is reduced Linux credibility, I think.
Worse, your compiler developers are hung out to dry: they are "innocent," so to speak, but are forced by the ld developers to take the hit, because they have no rational explanation for the -4's in their output files other than that ld can't do rel relocs all by itself. The implications of an article -- "Scandal in Linux Cyberspace" -- documenting this "big story" are many. Why was this not fixed years ago? How could such an ill-conceived design (an object file must contain a constant for the linker) ever have become a "canon of Linux"? Did these developers not realize that the linker outputs run only because of that fudge factor, when the linker just needs to supply its own -4 and not read it from object files? It seems that on historic development day 2 this constant would have been "moved" into linker code -- but that did not happen. Why? Why do Linux developers appear to want to keep correct .o files (with no -4 constants in them) out of the Linux environment? Is the intent to keep quality software out of the Linux environment? The larger computing world outside Linux/Sun, etc., also has smart people producing useful object files -- in business, math, science, you name it. But ld does not know how to link them! What message are you sending? Is the message that objcopy writers must now also go in and insert the -4's into .text sections when converting from .obj to .o? All because ld can't find its own -4? [Free gift: cut and paste any -4 in this post and put it in the ld source code!] In the absence o
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Eric Botcazou, a life form in far off space, emitted:
>neither you nor us can change it now

Thanks for your further thoughts -- and an even bigger story! The answer to that is pending; I hope we can edit ld to not require -4 in those rel reloc locations in input files. If so, the market will decide: is a corrected ld favored or not?

>but you are 20 years late

That's the marvel. Why was this not corrected 20 years ago? IMHO, it is never too late to upgrade quality -- and my proposed correction would have a practical effect: (1) you could announce to compiler makers that they don't need the -4 gibberish in their object file outputs anymore (and they could optionally remove it from their compilers), and (2) a huge resource of countless object files in the outside world of computing would become linkable on Linux. And you wouldn't even have to edit your ABI document. With a fixed ld, compiler makers could put the -4's in as before, or 0 instead, or even random numbers in those locations.

Are you saying, "Linux cannot be improved"? Seems so. IMHO, not good public relations for promoting the OS. The only problem may be: while you are free to nurture a "we are defeated" attitude, there is, oops, nothing to prevent others from making the obvious improvements -- especially when it comes to the OS's flag-ship linker.

Take care, Jim
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Eric Botcazou, a life form in far off space, emitted:
>> If so, the market will decide: is a corrected ld favored or not?
>
>It's already decided: your proposed change would break the ABI, hence break
>binary compatibility by definition.

The ABI states that linkers cannot make -4 themselves, that they have to read it from a file? Heck, let's break it! What are we waiting for? I agree it's "already decided." Last time I looked, Linux and the others mentioned in this thread have well less than 10% market share, both in number of computer users and number of computers. That percentage could improve if upgrading quality were more important than some document written a generation ago. Fact: ld fails to rel reloc a .text section location if it does not contain -4. This fact == low quality.

>> That's the marvel. Why was this not corrected 20 years ago?
>
>Because, if you really think about it, the current definition of R_386_PC32 is
>the right one.

I gave the correct definition in my previous emails (the only valid one is based on how microprocessors work), and yet ld fails to link as stated above. [Hint: microprocessors don't know what we are saying or what the ABI says; ld has no need whatsoever to get -4 from input files; ld's writers should know that.]

>Again, it's not Linux, the i386 ABI predates Linux, Linux only conformed to
>the existing ABI.

ABI again? Are you saying the ABI doesn't know how to do rel relocs? Again, the location must contain the offset to the symbol from the current contents of the CPU's eip register. Are you saying the ABI contradicts that? What exactly does the ABI have to do with ld's failure to do rel relocs? Or are you saying ld should fail? I say, why? Why not do it right? Success is not as bad as it might seem! ;)

j
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Ian Lance Taylor, a life form in far off space, emitted:
>If you ignore the contents of the .o file, then how do you propose to
>handle the assembler code
>call foo + 16
>?

Very good question. Thank you. Apparently the ABI assemblers put foo's address in the rel reloc symbol entry and (16 - 4) in the location to be relocated. I get it. [But I would vote for such compilers to evaluate foo + 16 and use that value ;)]

I was wrong to assume that all assemblers would put the actual destination (foo + 16) in the rel reloc symbol entry and that, therefore, -4 was a constant. Thanks for clarifying that the -4 value is not (always) a constant, at least given the behavior of the assemblers you describe. Hmmm. OK, for my object files, alternate entry points all have labels, so for those the -4 is a constant, and an ld that assumes this (which would not work for the call you cite above) would still be of interest in my project. Let's see.

Thanks again and best wishes, j
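P.S. Working through the "call foo + 16" case with made-up numbers, as I understand your explanation: the assembler emits a reloc against foo and stores 16 - 4 = 12 in the field; the linker then computes S + A - P, where A is the value it reads back out of the field. A tiny C check:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t S = 0x8048110;   /* where foo ends up (made-up address)         */
        uint32_t P = 0x8048020;   /* address of the 4-byte field being relocated */
        int32_t  A = 16 - 4;      /* what the assembler stored in that field     */

        int32_t rel32 = (int32_t)(S + A - P);
        /* The CPU computes (P + 4) + rel32, which is S + 16, i.e. foo + 16. */
        printf("rel32 = 0x%x, call lands at 0x%x\n",
               (unsigned)rel32, (unsigned)(P + 4 + rel32));
        return 0;
    }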
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Ian Lance Taylor, a life form in far off space, emitted:
>If you ignore the contents of the .o file, then how do you propose to
>handle the assembler code
>call foo + 16
>?

ADDENDUM: Thanks again for this implied explanation, where apparently the rel reloc info is split in two parts, one in the reloc table and the other in the location to be relocated. If this is correct, may I add some friendly thoughts?

1. I've contemplated writing my own "LoadLibrary" and "GetProcAddress" procedures, and your explanation (implied in your question above) will be essential to doing so correctly.

2. As one who finds much in Linux to be very praiseworthy, I worry a bit about what seems to be such allegiance to a 20-year-old ABI document, which may be inconsistent with making quality improvements in the future. [No hardware maker would do such a thing, I think, and survive; no one makes 16-bit bus cards anymore.] In short, the interoperability rationale may lose its punch if the thing to be interoperable with should best be discarded.

3. Whether the "call foo + 16" case above is the right place to break with this ABI, I don't know; but for fun and discussion, let's consider it. First, we might call this the "laziness concession to compiler writers," in that they don't have to evaluate "foo + 16" above and put it in the reloc table (with an auto-generated textual symbol if necessary), as "good boys and girls" would. If the contents of the rel reloc location were simply ignored, the message to compiler writers would be that this splitting of reloc info is no longer supported, in favor of the "regular," simpler, cleaner rel reloc relocation table entry. Second, the -4 constant is still embedded in the location contents read by the linker -- another place where one could break with the ABI. If linkers just used a -4 constant plus the rel reloc info in the relocation table, some advantages of interoperability would be gained (all the .obj files would link, whether in .obj form or in .o form from objcopy), and compiler splitting of rel reloc info would be discarded. Compiler makers could choose to comply or not. To comply, they would have many options, such as

    lea eax,foo
    add eax,16
    call eax        ;no rel reloc table entry!

...or putting the value of foo + 16 as the symbol address in the reloc table as described above.

The above may not be the point at which to break with the ABI, but my friendly message is that major quality upgrades may be found by dropping provisions which are no longer deemed optimally efficient. Of course, this is contrary to the "ABI is written in stone" concept. Again, I don't know, but it seems almost inconceivable that a technical specification written over 20 years ago, on almost anything in computing software or hardware, could be completely perfect, or that the world would end if we dropped support for items such as the above or even added support for new ones. If I understand correctly, Microsoft has broken with ABI specs (e.g., they put 0 in the location to undergo rel relocation) and lightning did not strike them dead (actually, they have done just fine after doing so).

SUMMARY: I seriously wonder whether continuing OS upgrades and the ABI are consistent, if the ABI can prevent progress. For my purposes, I now know the "mess" that the ABI supports for rel relocs, so that I can write code to do them at run-time. Thanks for that.

Greetings, Jim
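P.S. For anyone following along, the "split" corresponds, as I understand it, to the two relocation entry formats in ELF; elf32-i386 uses the REL form, so the addend lives in the location being patched rather than in the table. The structures below mirror the declarations in <elf.h> (typedefs included so the snippet stands alone):

    #include <stdint.h>

    typedef uint32_t Elf32_Addr;
    typedef uint32_t Elf32_Word;
    typedef int32_t  Elf32_Sword;

    /* REL entry (what elf32-i386 uses): no addend member here --     */
    /* the addend is whatever already sits in the bytes at r_offset,  */
    /* e.g. the 16 - 4 from the "call foo + 16" example.              */
    typedef struct {
        Elf32_Addr r_offset;   /* where in the section to patch       */
        Elf32_Word r_info;     /* symbol index plus relocation type   */
    } Elf32_Rel;

    /* RELA entry (used by other targets): the addend travels in the  */
    /* table entry itself instead of in the patched location.         */
    typedef struct {
        Elf32_Addr r_offset;
        Elf32_Word r_info;
        Elf32_Sword r_addend;
    } Elf32_Rela;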
Re: BUG elf32-i386 R_386_PC32 done wrong
Long, long ago, Ian Lance Taylor, a life form in far off space, emitted:
>We would discard the ABI in a second if the benefit exceeds the cost.

We agree; I'm happy.

>What benefit would we gain by changing the definition of R_386_PC32?

As stated, I don't know; the case was discussed as an example of a larger concept which (above) we agree on.

>You have not described any benefit beyond abstract appeals to what you
>think object files should look like. That doesn't count. Give us a
>measurable benefit and we'll consider it.

I did: the vast number of .obj files containing useful procedures would become "interoperable." In terms of available software, one might estimate 10x more .obj files than .o files worldwide. Concreteness? When ld links ELF files, it produces a compact executable (very nice); when ld links COFF files, it apparently cannot output a "very nice" compact ELF file, but rather the longer zero-padded PE format (for PE section alignment), which requires objcopy to convert to ELF, with the result still carrying all the zero-padding.

I'm good, at peace, Ian. Nothing to worry about. If my shop makes an ld that can bring all this .obj software into a Linux environment with the very nice compact ELF format (which, as you agree, ld cannot do now), I'm happy. Thank you again.

Cheers, Jim