"GOT" under aarch64
Hi

I am writing a code generator for ARM64. To access a global variable I was generating

    adrp x0, someglobal
    add  x0, x0, :lo12:someglobal

This worked without any problems with gcc version 4.9.2 (Debian/Linaro 4.9.2-10) and GNU ld (GNU Binutils for Debian) 2.25. I have updated my system, and now, with gcc version 6.3.0 20170516 (Debian 6.3.0-18) and GNU ld (GNU Binutils for Debian) 2.28, the linker complains about illegal relocations. Investigating this, I noticed that gcc now generates

    adrp x0, :got:stderr
    ldr  x0, [x0, #:got_lo12:stderr]

I changed my code generator accordingly and it works again. The problem for me is:

1) How can I know what I should generate? Should I figure out which gcc version is installed?
2) Is there any documentation for this change somewhere? What does it mean?
3) What would be a portable solution for this problem?

Thanks in advance for your time.

Jacob
-pie option in ARM64 environment
Hi

I am getting this error:

    GNU ld (GNU Binutils for Debian) 2.28
    /usr/bin/ld: error.o: relocation R_AARCH64_ADR_PREL_PG_HI21 against external symbol `stderr@@GLIBC_2.17' can not be used when making a shared object; recompile with -fPIC

The problem is, I do NOT want to make a shared object! Just a plain executable. The verbose linker options are as follows:

    collect2 version 6.3.0 20170516
    /usr/bin/ld -plugin /usr/lib/gcc/aarch64-linux-gnu/6/liblto_plugin.so -plugin-opt=/usr/lib/gcc/aarch64-linux-gnu/6/lto-wrapper -plugin-opt=-fresolution=/tmp/cc9I00ft.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr --hash-style=gnu -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419 -pie -o lcc /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/Scrt1.o /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crti.o /usr/lib/gcc/aarch64-linux-gnu/6/crtbeginS.o -L/usr/lib/gcc/aarch64-linux-gnu/6 -L/usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu -L/usr/lib/gcc/aarch64-linux-gnu/6/../../../../lib -L/lib/aarch64-linux-gnu -L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/aarch64-linux-gnu/6/../../.. alloc.o bind.o dag.o decl.o enode.o error.o backend-arm.o intrin.o event.o expr.o gen.o init.o input.o lex.o arm64.o list.o operators.o main.o ncpp.o output.o simp.o msg.o callwin64.o bitmasktable.o table.o stmt.o string.o stab.o sym.o Tree.o types.o analysis.o asm.o inline.o -lm ../lcclib.a ../bfd/libbfd.a ../asm/libopcodes.a -Map=lcc.map -v -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc/aarch64-linux-gnu/6/crtendS.o /usr/lib/gcc/aarch64-linux-gnu/6/../../../aarch64-linux-gnu/crtn.o

I think the problem lies in this mysterious "pie" option:

    ... --fix-cortex-a53-843419 -pie -o lcc ...
"PIE" could stand for Position Independent Executable. How can I get rid of it? Which text file, where, is responsible for adding this "pie" option to the ld command line? I am not well versed enough in gcc's internals to figure this out without your help.

Thanks in advance.

Jacob
Re: -pie option in ARM64 environment
On 29/09/2017 at 13:22, Marc Glisse wrote:
> -no-pie probably.

YES! It just did not occur to me; I should have figured it out alone. Thanks to all who answered.

jacob
Caching globals in registers
When doing some tests with my ARM64 code generator, I saw the performance of my software drop from 75% of gcc's speed to just 60%. What was happening?

One of the reasons (besides the weakness of my code generator, of course) was that gcc cached the value of the global "table" in registers, reading it just once. Since there are many accesses to it in the busiest function in the program, gcc speeds up considerably.

Clever, but there is a problem with that: the generated program becomes completely thread unfriendly. It will read the value once, and then, even if another thread modifies it, it will keep using the old value. I always read the value from memory, allowing fast access to globals without too many locks.

Some optimizations contradict the "least surprise" principle, and I think they are not worth the effort. They could be optional, for single-threaded programs, but that decision is better left to the user's discretion and not enabled by default with -O2. -O2 is the standard gcc optimization level, seen for years everywhere. Maybe it would be worth considering moving that to -O4 or even -O9?

Lock operations are expensive. Access to globals can be cached only when they are declared const, and that wasn't the case for the program being compiled. Suppose (one of many possible scenarios) that you store in a set of memory locations data like wind speed, temperature, etc. Only the thread that updates that table acquires a lock. All the others access the data without any locks, in read mode. A program generated with this optimization reads the data once. That's a bug...

Compilers are very complex, and the function in question was a leaf function: a highly important kind of function, where you tend to optimize aggressively. Leaf functions aren't supposed to run for a long time anyway, so the caching can't hurt, and in this case it was dead right, since speed increases notably.

What do you think?

jacob
Re: Caching globals in registers
On 05/11/2017 at 20:43, Jakub Jelinek wrote:
> A bug in the program that does that. You can use volatile, or atomics (including e.g. relaxed __atomic_load, which isn't really expensive).

Yeah, true. Maybe I will cache them in registers too; it is not very difficult. Still, I think I will do it only in leaf functions. In non-leaf functions it seems to me a departure from the default that can be dangerous. Yes, the user can declare the variables as such, and then nothing happens. But otherwise old software just stops working and you do not know why. Getting to what is not working takes a LOT of effort; once there, seeing in the assembly code that the variable is not read again is easy, of course. Anybody can do it. GCC is used in many contexts, as you all know.
Debugging optimizer problems
Hi

I am confronted with a classical problem: a program gives correct results when compiled with optimizations off, and gives wrong ones with optimization (-O2) on. I have isolated the problem to a single file, but now there is no way I can further track it down to one of the many functions in that file.

In my small C compiler I have introduced the following construct to deal with optimizer bugs:

    #pragma optimize(on/off,push/pop)

#pragma optimize(off) turns OFF all optimizations until a #pragma optimize(on) is seen, or until the end of the compilation unit. If #pragma optimize(off,push) is given, the optimization state can be restored with a #pragma optimize(pop) or a #pragma optimize(on).

This has three advantages:

1) It allows the user to narrow down the code area where the problem is hiding.
2) It provides the user with a workaround for any optimizer bug.
3) It allows gcc developers to find bugs in a more direct fashion.

These pragmas can only be given at global scope, not within a function.

I do not know gcc's internals, and this improvement could be difficult to implement; nor do I know your priorities in gcc development, but it would surely help users. Obviously I think that the problem is in the code I am compiling, not in gcc, but it *could* be in gcc. That construct would help enormously.

Thanks in advance for your time.

jacob
Re: Debugging optimizer problems
On 02/02/2018 at 22:11, Florian Weimer wrote:
> * jacob navia:
>> I have in my small C compiler introduced the following construct:
>> #pragma optimize(on/off,push/pop)
> Not sure what you are after. GCC has something quite similar:
> <https://gcc.gnu.org/onlinedocs/gcc/Function-Specific-Option-Pragmas.html>

Great! I had never seen it, and the docs on my machine weren't very explicit about it. I apologize for the noise and thank you for pointing me to that doc.

jacob
Any difference between gcc 4.3 and 4.1 in exception handling?
I would like to know if the exception handling software changed between 4.1 and 4.3. I have developed a module that generates gcc-compatible DWARF debug info, and it works with gcc 4.1. It is in the context of a JIT. Now I have some mysterious crashes with gcc 4.3 under Suse 11. Ubuntu seems to work...

Thanks in advance for any info.

jacob
Exception handling information in the macintosh
Hi

I have developed a JIT for 64-bit linux. It generates exception handling information according to DWARF, and under linux it works with gcc 4.2.1. I have recompiled the same code on the Macintosh and something has changed, apparently, because now any throw that passes through my code crashes. Are there any differences between the exception info format on the Macintosh and on linux?

The stack at the moment of the throw looks like this:

    C++ code compiled with gcc 4.2.1
    calls JIT code generated on the fly by my JIT compiler
    which calls C++ code compiled with gcc 4.2.1 that throws.

The catch is in the C++ code. The throw must go through the JIT code, so it needs the DWARF frame descriptions that I generate. Apparently there is a difference.

Thanks in advance for any information.

jacob
Re: Exception handling information in the macintosh
Jack Howarth wrote:
> On Thu, Feb 04, 2010 at 08:12:10PM +0100, jacob navia wrote:
>> [...]
> Jacob,
> Are you compiling on darwin10 and using the Apple or FSF gcc compilers? If you are using Apple's, this question should be on the darwin-devel mailing list instead.

I did that. I was compiling with Apple's gcc. Now, I have downloaded the source code of gcc 4.2.1 and compiled it on my Mac. The build crashed in the java section, by the way: there was a script that assumed the object files were in a .libs directory, but the objects were in the same directory as the source code. This happened several times, so in the end I stopped, since I am not interested in Java. I installed gcc, everything went OK, and I recompiled the source code with the new gcc. Then, in the new executable, the normal throws that had been working under Apple's gcc do not work anymore, and any throw (not only those that go through the JIT) fails. I do not understand what is going on.

> I would mention, though, that darwin10 is problematic in that libgcc and its unwinder calls are now subsumed into libSystem. This means that regardless of how you try to link in libgcc, the new code in libSystem will always be used.
> For darwin10, Apple decided to default their linker over to compact unwind, which causes problems with some of the java testcases on gcc 4.4.x. This is fixed for FSF gcc 4.5 by forcing the compiler to always link with the -no_compact_unwind option.

If I use that option I get

    ld: symbol dyld_stub_binding_helper not defined (usually in crt1.o/dylib1.o/bundle1.o)

and Apple's linker refuses to go on.

> Another complexity is that Apple decided to silently abort some of the libgcc calls (now in libSystem) that require access to FDEs, like _Unwind_FindEnclosingFunction(). The reasoning was that the default behavior (compact unwind info) doesn't use FDEs. This is fixed for gcc 4.5 by http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00998.html. If you are using any other unwinder call that is now silently aborting, let me know, as it may be another that we need to re-export under a different name from libgcc_ext. Alternatively, you may be able to work around this issue by using -mmacosx-version-min=10.5 under darwin10.
> Jack

OK. Now, what would be the procedure for avoiding Apple's modifications to the exception handling stuff? Pleeeze :-)

P.S. If this discussion does not belong on this list, please send me just an email. Thanks for your answers.

jacob
Re: Exception handling information in the macintosh
Jack Howarth wrote:
> Jacob,
> Apple's gcc is based on their own branch and is not the same as FSF gcc. The first FSF gcc that was validated on darwin10 was gcc 4.4. However, I would suggest you first start testing against current FSF gcc trunk. There are a number of fixes for darwin10 that aren't present in the FSF gcc 4.4.x releases yet. In particular, the compilers now link with -no_compact_unwind by default on darwin10 to avoid using the new compact unwinder. Also, when you build your JVM, I would suggest you stick to the FSF gcc trunk compilers you build. In particular, the Apple libstdc++ and FSF libstdc++ aren't interchangeable on intel. So you don't want to mix c++ code built with the two different compilers.
> Jack

OK. I downloaded gcc 4.4 and recompiled the whole server with it. Now, throws within C++ work, but not when they have to pass through the JITted code. The problem is that we need to link with quite a lot of libraries:

    /usr/local/gcc-4.5/bin/g++ -o Debug/server -m64 -fPIC -fno-omit-frame-pointer -g -O0 -w Debug/service_helper.o Debug/service.o -L../Debug/64 -ldatabaselibrary -lcryptolibrary -lCPThreadLibrary -lshared -lwin32 -L../CryptoLibrary/lib/Darwin/64 -lcrypto -lssl -lsrp -lint128 -ldl -lpthread ../icu/icu3.4/lib/mac/libicui18n.dylib.34 ../icu/icu3.4/lib/mac/libicuuc.dylib.34 ../icu/icu3.4/lib/mac/libicudata.dylib.34 ../CompilerLibrary/mac/libcclib64.a ../Debug/64/libwin32.a

What can I do about libssl.a, libdl.a, libcrypto.a? Those are system libraries and I do not have the source code. Should I compile those too?

I downloaded gcc 4.5 and the situation is the same...

jacob
Dynamically generated code and DWARF exception handling
Hi

We have an application written in C++ and compiled with gcc. This application generates code dynamically and executes it, using a JIT, a Just In Time compiler. Everything works OK until the C++ code generates a throw. To get to the corresponding catch, the runtime has to skip through the intermediate assembler frames generated by the JIT. We would like to know what the interface with gcc should be to do this.

We thought for a moment that using sjlj exceptions would work, but we fear that they are no longer being maintained.

1) Is this true?
2) Can we use sjlj exceptions under linux safely?

Otherwise, would it be possible to generate the DWARF tables and add them dynamically to the running program? Under windows, Microsoft provides an API for JITs that does exactly that. Is there an equivalent API for linux?

Thanks in advance for any information about this.

jacob
Re: Dynamically generated code and DWARF exception handling
Daniel Jacobowitz wrote:
> On Tue, May 02, 2006 at 07:21:24PM -0700, Mike Stump wrote:
>>> Otherwise, would it be possible to generate the DWARF Tables and add those tables dynamically to the running program?
>> Yes (could require OS changes).
>>> Under windows, Microsoft provides an API for JITs that does exactly that. Is there an equivalent API for linux?
>> Don't think so.
> There isn't really. But I know that other JITs have managed to do this - I just don't know how. They may use a nasty hack somewhere.

Maybe there are some references about this somewhere? Which JIT? Is there a source code example or something? Would sjlj exceptions work?

Please, there MUST be some knowledgeable people here who can answer questions like this.

Thanks in advance
Re: Dynamically generated code and DWARF exception handling
Tom Tromey wrote:
> "jacob" == jacob navia <[EMAIL PROTECTED]> writes:
> jacob> This application generates dynamically code and executes it, using a
> jacob> JIT, a Just In time Compiler. Everything is working OK until the C++
> jacob> code generates a throw.
> Fun! I looked at this a little bit with libgcj. In some ways for libgcj it is simpler than it is for C++, since in the gcj context we know that the only objects thrown will be pointers. So, if we were so inclined, we could give the JIT its own exception handling approach and have little trampoline functions to handle the boundary cases.

Well, there is no exception handling in the generated machine code, and we do not want any: we want to let the C++ exception handling PASS THROUGH our stack frames to find its eventual catch. This means that we have to give the runtime function that implements the throw some way of getting into the frames higher up without crashing. The problem is that any throw that encounters intermediate assembler frames will inevitably CRASH.

> Unfortunately things are also worse for libgcj, in that we need to be able to generate stack traces as well, and the trampoline function approach won't work there.

? Sorry, I do not follow here.

> Still, if you know something about the uses of 'throw' in your program, maybe this would work for you. Longer term, yeah, gcc's unwinder needs a JIT API, and then the various JITs need to be updated to use it. At least LLVM appears to be headed this direction.

Very interesting, but maybe you could be more specific? I browsed that "llvm" and it seems a huge project, as libgcj is. Going through all that code to find how they implement this will be extremely difficult. If you could give me some hints as to where the needle is in those haystacks, I would be really thankful.

jacob
Re: Dynamically generated code and DWARF exception handling
Andrew Haley wrote:
> Richard Henderson writes:
> > On Tue, May 02, 2006 at 01:23:56PM +0200, jacob navia wrote:
> > > Is there an equivalent API for linux?
> > __register_frame_info_bases / __deregister_frame_info_bases.
> Are these an exported API? I mentioned the existence of these entry points in a reply to Jacob on March 10. Jacob, did you investigate this?
> Andrew.

Well, I searched for those and found some usage examples in the startup source of Apple's Darwin gcc code. But then... is that current? I have googled around, but I find only small pieces of information that may or may not apply to AMD64.

ALL of this is extremely confusing and I am really getting scared. This stuff cost me 2 weeks of hard work under windows, but there, somehow, I had an API that I could call. Under linux the stuff is just as complex as under windows (DWARF info generation is not easy), but what scares me is that there is NO API, no standard way of doing this.

I have downloaded gcc 4.1 and will try to figure out where in the source I find those functions, or where in the binutils source they could be hidden. Then I will try to figure out from the source what they are doing and what they expect. As far as I know, there are no AMD64-specific docs, just ITANIUM docs that *could* be used for AMD64, but nobody knows for sure if those docs apply or are just obsolete.

What a mess, people. I am getting wet pants...

jacob
Assembler clarification
I can't explain to myself what is going on within these lines in the .debug_frame section.

Context: AMD64 linux64 system (Ubuntu).

Within the debug_frame section I find the following assembly instructions:

    .byte 0x4
    .long .LCFI0-.LFB2

The distance between labels LCFI0 and LFB2 is exactly one byte. I would expect, then, that the assembler generates

    0x04 (byte 1)
    0x01 (byte 2)

i.e. TWO bytes, one with 4 and the other with 1. But I find that the output is 0x41, i.e. the 4 in the highest NIBBLE of a byte and the 1 in the lower nibble. Why? Is this documented somewhere? Is there a compressing pass over the debug_frame section?

thanks
Bug in gnu assembler?
How to reproduce this problem:

1) Take some C file. I used for instance dwarf.c from the new binutils distribution.
2) Generate an assembler listing of this file.
3) Using objdump -s dwarf.o, dump all the sections of the executable in hexadecimal format. Put the result of this dump in some file; I used "vv" as the name.
4) Dump the contents of the eh_frame section in symbolic form. You should use readelf -W. Put the result in some file, say, "dwarf.framedump".

---

OK. Now let's start. I go to the assembly listing (dwarf.s) and search for "eh_frame" in the editor. I arrive at:

    .section .debug_frame,"",@progbits

This section consists of a CIE (Common Information Entry in GNU terminology) that is generated as follows in the assembly listing:

    .Lframe0:
        .long .LECIE0-.LSCIE0
    .LSCIE0:
        .long 0x
        .byte 0x1
        .string ""
        .uleb128 0x1
        .sleb128 -8
        .byte 0x10
        .byte 0xc
        .uleb128 0x7
        .uleb128 0x8
        .byte 0x90
        .uleb128 0x1
        .align 8
    .LECIE0:

---

This corresponds to a symbolic listing like this (file dwarf.framedump):

    The section .debug_frame contains:
    0014 CIE
      Version: 1
      Augmentation: ""
      Code alignment factor: 1
      Data alignment factor: -8
      Return address column: 16
      DW_CFA_def_cfa: r7 ofs 8
      DW_CFA_offset: r16 at cfa-8
      DW_CFA_nop
      DW_CFA_nop
      DW_CFA_nop
      DW_CFA_nop
      DW_CFA_nop
      DW_CFA_nop

This means that this entry starts at offset 0 and goes for 20+4 bytes (the length field is 4 bytes). Our binary dump of the contents of the first 96 bytes (0x60) looks like this:

    Contents of section .eh_frame:
     1400 01000178 100c0708 ...x
     0010 9001 1c00 1c00
     0020 5900 Y...
     0030 410e1083 0200 1c00 3c00 A...<...
     0040 6800 h...
     0050 410e1083 0200 1400 5c00 A...\...
     0060 4e00 N...

We eliminate the first 24 (0x18) bytes and we obtain:

    0018 1c00 1c00
    0020 5900 Y...
    0030 410e1083 0200 1c00 3c00 A...<...

This is an FDE, or Frame Description Entry in GNU terminology. First we have a 32-bit length field, represented by the difference .LEFDE0-.LASFDE0.
This is the 1c00 above. Then we have another .long instruction (32 bits) that corresponds to the second 1c00 above. Then we have two .quad instructions that correspond to the line

    0020 5900 Y...

above. AND NOW IT BECOMES VERY INTERESTING. We have the instructions

    .byte 0x4
    .long .LCFI0 - .LFB50
    .byte 0xe
    .uleb128 0x10
    .byte 0x83
    .uleb128 0x2
    .align 8

And we find in the hexadecimal dump the line

    0030 410e1083 0200 1c00 3c00 A...<...

The 4 and the 1 are in the same byte, followed by the correct 0xe byte, the correct 0x10 byte (the uleb128 is 0x10), followed by the correct 0x83, and followed by the correct 0x02 byte.

WHERE AM I WRONG? I am getting CRAZY with this.

Here is the full assembly listing of the FDE:

    .LSFDE0:
        .long .LEFDE0-.LASFDE0 /* first field 1c00 */
    .LASFDE0:
        .long .Lframe0
        .quad .LFB50
        .quad .LFE50-.LFB50
        .byte 0x4
        .long .LCFI0-.LFB50
        .byte 0xe
        .uleb128 0x10
        .byte 0x83
        .uleb128 0x2
        .align 8
In which library is __register_frame_info defined??
Hi

I want to use the function __register_frame_info to dynamically register DWARF2 unwind frames. Which library should I link with?

Environment: linux, 64 bits.

Thanks in advance

Jacob

P.S. I have posted some messages here before concerning this problem. I had to do a long rewrite of the code generator to adapt it better to the style of code used in lcc64. That done, I figured out the format of the DWARF2 eh_frame stuff, and now I generate it dynamically. The only thing left is to pass it to __register_frame_info.
Questions regarding __register_frame_info
Hi

I now have everything in place to dynamically register the debug frame information that my JIT (Just In Time compiler) generates. I generate a CIE (Common Information Entry), followed by a series of FDEs (Frame Description Entries) describing each stack frame. The binary code is the same as gcc uses; the contents of my stuff are identical to the contents of the .eh_frame information.

There are several of those functions defined in unwind-dw2-fde.c:

    __register_frame_info
    __register_frame_info_bases
    __register_frame_info_table_bases

If I use the __register_frame_info stuff, nothing happens and the program aborts. Using __register_frame_info_table_bases seems to work better, since it crashes a little bit further on, with a hard crash.

Questions: What is the procedure for registering the frame info? I use the following:

    memset(&ob,0,sizeof(ob));
    ob.pc_begin = (void *)-1;
    ob.tbase = Parms.codebuf; // Machine instructions
    ob.dbase = Parms.codebuf+myLccParms.text_size; // data of the program
    ob.s.i = 0;
    ob.s.b.encoding = 0xff; // DW_EH_PE_omit
    __register_frame_info_table_bases(Parms.pUnwindTables,&ob,ob.tbase,ob.dbase);

"ob" is an object defined as follows:

    struct object {
      void *pc_begin;
      void *tbase;
      void *dbase;
      union {
        const struct dwarf_fde *single;
        struct dwarf_fde **array;
        struct fde_vector *sort;
      } u;
      union {
        struct {
          unsigned long sorted : 1;
          unsigned long from_array : 1;
          unsigned long mixed_encoding : 1;
          unsigned long encoding : 8;
          unsigned long count : 21;
        } b;
        size_t i;
      } s;
    #ifdef DWARF2_OBJECT_END_PTR_EXTENSION
      char *fde_end;
    #endif
      struct object *next;
    };

From the code of __register_frame_info (in file unwind-dw2-fde.c), that function just inserts the new data into a linked list, but does not do anything more. That is probably why it will never work.

Could someone here explain to me, or tell me, what to do exactly to register the frame information? This will be useful for all people who write JITs, for instance the Java people, and many others.
Thanks in advance for your help, and thanks for the help this group has already provided me.

jacob
Bug in the specs or bug in the code?
Hi

Bug in the specs or bug in the code? I do not know, but one of them is wrong.

The Linux Standard Base specs at http://www.freestandards.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/ehframechpt.html say, in the specification of the FDE (Frame Description Entry):

    CIE Pointer
    A 4 byte unsigned value that when subtracted from the offset of the
    current FDE yields the offset of the start of the associated CIE.
    This value shall never be 0.

So, the offset is from the beginning of the current FDE, the specs say.

BUT

What does the code say? In the file unwind-dw2-fde.h we find:

    /* Locate the CIE for a given FDE.  */
    static inline const struct dwarf_cie *
    get_cie (const struct dwarf_fde *f)
    {
      return (void *)&f->CIE_delta - f->CIE_delta;
    }

Note that the first term is &f->CIE_delta and NOT &f, as specified by the standard. This fact took me two days of work to find out. Either there is a bug in the code or a bug in the specs. The difference is 4 bytes, since CIE_delta comes after the length field.

Please fix the specs, since if you fix the code everything will go crashing, as my program did...

jacob
How to insert dynamic code? (continued)
Hi

Context: I am writing a JIT and need to register the frame information for the generated program within the context of a larger C++ program compiled with g++. The stack layout is like this:

    catch established by C++
    JITTED code generated dynamically
    JITTED code
    JITTED code calls a C++ routine
    C++ routine calls other C++ routines
    C++ routine makes a THROW

The throw must go past the JITTED code to the established C++ catch.

Problem: the stack unwinder stops with END_OF_STACK at the JITted code. Why? Following the code with the debugger, I see that the unwinder looks for the next frame using the structures established by the dynamic loader, specifically in the function "__dl_iterate_phdr" in the file "dl-iteratephdr.c" in the glibc.

So, this means either:

1) I am cooked and what I want to do is impossible. This means I will probably get cooked at work for proposing something stupid like this :-)
2) There is an API, or a way of adding a routine at run time to the lists of loaded objects, in the same way as the dynamic loader does.

PLEEZE do not answer with "Just look at the code of the dynamic loader!" because I have several megabytes of code to understand already! I am so near the end that it would be a shame to stop now. My byte codes for the DWARF interpreter LOAD into the interpreter successfully and are executed OK, which has cost me several weeks of effort, wading through MBs of code and missing/wrong specs.

I just would like to know a way of registering (and deregistering, obviously) code that starts at address X and is Y bytes long. Just that.

Thanks in advance, guys

jacob
Re: How to insert dynamic code? (continued)
Andrew Haley wrote:
> The way you do not reply to mails replying to your questions doesn't encourage people to help you. Please try harder to answer.

I did answer last time, but directly to the poster who replied, and forgot to CC the list. Excuse me for that.

> I suspect that the gcc unwinder is relying on __dl_iterate_phdr to scan the loaded libraries and isn't using the region that you have registered. But this is odd, because when I look at _Unwind_Find_FDE in unwind-dw2-fde-glibc.c, I see:
>
>     ret = _Unwind_Find_registered_FDE (pc, bases);
>     ...
>     if (dl_iterate_phdr (_Unwind_IteratePhdrCallback, &data) < 0)
>       return NULL;
>
> So, it looks to me as though we do call _Unwind_Find_registered_FDE first. If you have registered your EH data, it should be found.

OK, so I have to look there then. Actually this is good news, because figuring out how to mess with the dynamic loader data is not something for the faint of heart :-)

> So, what happens when _Unwind_Find_registered_FDE is called? Does it find the EH data you have registered?

Yes, but then it stops there instead of going upwards and finding the catch! It is as if my insertion left the list of registered routines in a bad state. I will look again at this part (the registering part) and will try to find out what is going on.

Thanks for your answer. If you are right, this is very GOOD news!

jacob
Re: Bug in the specs or bug in the code?
Daniel Jacobowitz wrote:
> On Thu, Jul 13, 2006 at 04:46:19PM +0200, jacob navia wrote:
>> In the Linux Standard specs in http://www.freestandards.org/spec/booksets/LSB-Core-generic/LSB-Core-generic/ehframechpt.html it is written in the specification of the FDE (Frame Description Entry) the following:
> I suggest you report this problem to the LSB, since they wrote that documentation. The documentation is incorrect.

Mmmm, "report this problem to the LSB". Maybe you know someone there I could reach? An email address? There is no "feedback" or "bugs" button on their page.

Thanks
Re: How to insert dynamic code? (continued)
Daniel Jacobowitz wrote:
> On Thu, Jul 13, 2006 at 05:06:25PM +0200, jacob navia wrote:
>>> So, what happens when _Unwind_Find_registered_FDE is called? Does it find the EH data you have registered?
>> Yes, but then it stops there instead of going upwards and finding the catch! It is as if my insertion left the list of registered routines in a bad state. I will look again at this part (the registering part) and will try to find out what is going on.
> It sounds to me more like it used your data, and then was left pointing somewhere garbage, not to the next frame. That is, it sounds like there's something wrong with your generated unwind tables. That's the usual cause for unexpected end of stack.

Yeah... My fault, obviously; who else? The problem is, there is so much undocumented stuff that I do not see how I could avoid making a mistake here.

1) I generate exactly the same code as gcc now. Prolog:

    pushq %rbp
    movq  %rsp,%rbp
    subq  $xxx,%rsp

and I do not touch the stack any more. Nothing is pushed; the "xxx" already includes the stack space reserved for argument pushing, just as gcc does. This took me 3 weeks to do.

Now, I write my stuff as follows:

1) CIE
2) One FDE for each function
3) Empty FDE to zero-terminate the stuff.
4) Table of pointers to the CIE, then to the FDEs.

    p = result.FunctionTable; // Starting place, where CIE, then FDEs are written
    p = WriteCIE(p);          // Write first the CIE
    pFI = DefinedFunctions;
    nbOfFunctions = 0;
    pFdeTable[nbOfFunctions++] = result.FunctionTable;
    while (pFI) { // For each function, write the FDE
        fde_start = p;
        p = Write32(0,p); // reserve place for length field (4 bytes)
        p = Write32(p - result.FunctionTable,p); // Write offset to CIE
        symbolP = pFI->FunctionInfo.AssemblerSymbol;
        adr = (long long)symbolP->SymbolValue;
        adr += (unsigned long long)code_start; // code_start points to the JITted code
        p = Write64(adr,p);
        p = Write64(pFI->FunctionSize,p); // Write the length in bytes of the function
        *p++ = 0x41; // Write the opcodes
        *p++ = 0x0e; // These opcodes are the same as gcc writes
        *p++ = 0x10;
        *p++ = 0x86;
        *p++ = 0x02;
        *p++ = 0x43;
        *p++ = 0x0d;
        *p++ = 0x06;
        p = align8(p);
        Write32((p - fde_start)-4,fde_start); // Fix the length of the FDE
        pFdeTable[nbOfFunctions] = fde_start; // Save pointer to it in table
        nbOfFunctions++;
        pFI = pFI->Next; // loop
    }

The WriteCIE function is this:

    static unsigned char *WriteCIE(unsigned char *start)
    {
        start = Write32(0x14,start);
        start = Write32(0,start);
        *start++ = 1;    // version 1
        *start++ = 0;    // no augmentation
        *start++ = 1;
        *start++ = 0x78;
        *start++ = 0x10;
        *start++ = 0xc;
        *start++ = 7;
        *start++ = 8;
        *start++ = 0x90;
        *start++ = 1;
        *start++ = 0;
        *start++ = 0;
        start = Write32(0,start);
        return start;
    }

I hope this is OK...

jacob
Re: How to insert dynamic code? (continued)
Seongbae Park wrote: The above code looks incorrect, for various reasons, not the least of which is that you're assuming CIE/FDE are fixed-length. This is a trivial thing I will add later. There are various factors that affect FDE/CIE depending on PIC/non-PIC, C or C++, 32-bit/64-bit, etc. - some of them must be invariant for your JIT but some of them may not be. I always generate the same prologue for exactly this reason: I do not want to mess with this stuff. Also some of the data are encoded as uleb128 (see the DWARF spec for the details of LEB128 encoding), which is a variable-length encoding whose length depends on the value. For these values the uleb128 and sleb128 routines produce exactly the bytes shown. In short, you'd better start looking at how the CIE/FDE structures are *logically* laid out - otherwise you won't be able to generate correct entries. So far I have understood what those opcodes do, and they are the same ones gcc writes. Please try to understand my situation and help me find the bug (or where the bug could be). Is it not in here? I mean, writing *p++ = 1; or p = encodeuleb128(1,p); is *the same* in this context.
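For reference, the LEB128 encodings discussed above can be sketched in a few lines of C (a minimal sketch; the function names are mine, not from any particular codebase):

```c
#include <stdint.h>

/* Encode an unsigned value as ULEB128; returns the advanced output pointer. */
static unsigned char *encode_uleb128(uint64_t value, unsigned char *p)
{
    do {
        unsigned char byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80;          /* more bytes follow */
        *p++ = byte;
    } while (value != 0);
    return p;
}

/* Encode a signed value as SLEB128. */
static unsigned char *encode_sleb128(int64_t value, unsigned char *p)
{
    int more = 1;
    while (more) {
        unsigned char byte = value & 0x7f;
        value >>= 7;               /* arithmetic shift keeps the sign */
        /* done when the remaining bits are pure sign extension */
        if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
            more = 0;
        else
            byte |= 0x80;
        *p++ = byte;
    }
    return p;
}
```

Small values fit in one byte - 1 encodes as 0x01 and -8 encodes as 0x78 - which is why writing the bytes directly happens to match what the LEB128 routines would produce for this CIE.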
JIT exception handling
This is just to tell you that now it is working. I have succeeded in making my JIT generate the right tables for gcc. As it seems, both gcc 4.1 and gcc 3.3 work OK. Can anyone confirm this? There isn't any difference between gcc-3.x and gcc-4.x at this level, is there? jacob
Re: JIT exception handling
Andrew Haley wrote: jacob navia writes: > This is just to tell you that now it is working. > > I have succeeded in making my JIT generate the right tables for gcc Excellent. > As it seems, both gcc 4.1 and gcc 3.3 seem to work OK. > Can anyone confirm this? That they work OK? No, you are the only person who has done this. > There isn't any difference between gcc-3.x and gcc-4.x at this > level, is there? There have been changes in this area, but they shouldn't affect compatibility. It would be nice if you told us what you did to make it work. Andrew. Well, remember that I posted here that the LSB specs had a bug? I did post the bug, but *I did not correct the code*!!! Can you imagine anything more stupid than that? There was a point in my code where I did not correct for that mistake, that is all. I followed the code in the debugger (after finishing building all the debug libraries needed) and I noticed it. I corrected it, and it worked. jacob
New version of gnu assembler
Hi I have developed a new version of the GNU assembler for RISC-V machines. Abstract: ——— The GNU assembler (gas) is centered on flexibility and portability. These two objectives have quite a cost in program readability, code size and execution time. I have developed a « tiny » version of the GNU assembler focusing on simplicity and speed. I have picked, from the several hundred megabytes of binutils, just the routines that are needed for a functional assembler, for the use case of compiler-generated assembler text for a single machine. That meant: 1) There is no linker code in this assembler. An assembler doesn’t need any linker code. It is an assembler, period. 2) There are no macros, no preprocessing, nothing that makes an assembler easier to use for a human developer. This is NOT a replacement for gas, which is obviously still available everywhere. If you want to develop in assembler, use gas, not this tiny assembler. 3) Since there isn’t a human user, all the sophisticated error handling is not necessary. Messages are in English ONLY, and if you do not know that language just do not make any mistakes! 4) All the indirection (« vectorization ») separating the front end from the back end is eliminated. There is no dispatch through function tables: the functions in the back end are called directly. This has the advantage that when you see a function-call statement like foo(42); it means that you are calling the « foo » function, not a macro that is expanded into something else and then renamed to yet another name. 5) The BFD library has been disabled. Only some procedures of that library remain in the code. The same goes for libiberty, which has almost vanished. 6) The code has been cleaned of cruft like this: /* The magic number BSD_FILL_SIZE_CROCK_4 is from BSD 4.2 VAX * flavoured AS. The following bizarre behaviour is to be * compatible with above. I guess they tried to take up to 8 * bytes from a 4-byte expression and they forgot to sign * extend. 
*/ #define BSD_FILL_SIZE_CROCK_4 (4) So, are we still, in 2023, keeping bug compatibility with an assembler for a machine that ceased production in 2000? In a similar vein, all code that referenced the Motorola 68000 (an even older machine), the Z80, the SUN SPARC, etc. is gone. This assembler will only produce 64-bit ELF code and compiles for a 64-bit RISC-V CPU. Availability: $ git clone https://github.com/jacob-navia/tiny-asm Building the tiny assembler: $ gcc -o asm asm.c There is no Makefile. On some machines the obstack library is not part of the libc (not Linux; Apple, for instance). For those machines obstack.c is provided in the distribution and the compilation command should be: $ gcc -o asm asm.c obstack.c

    star64:~/riscv-asm$ objdump -h asm | grep text
    11 .text 0002e53e 00028060 00028060 00028060 2**2

Just 189 758 bytes. The GNU assembler is:

    star64:~/riscv-asm$ objdump -h ../binutils-gdb/gas/as-new | grep text
    11 .text 000d8d10 000465a0 000465a0 000465a0 2**2

888 080 bytes. Further work: The idea is to replace the system assembler invoked by gcc with a linked-in assembler that speeds up gcc: instead of writing an assembler file you just pass a pointer to the text buffer in memory. But that is still much further down the road. Enjoy! jacob
Tiny asm
Dear Friends: 1) I have (of course) kept your copyright notice at the start of the « asm.h » header file of my project. 2) I have published my source code under your GPL v3 license. I am not trying to steal anything from you. And I would insist that I have great respect for the people working on gcc. In no way am I trying to minimize their accomplishments. What happens is that layers of code produced by many developers have accumulated across the years, like the dust on the glass shelf of my grandmother back home. Sometimes in spring she would clean it. I am doing just that. That said, now I have some questions: 1) What kind of options does gcc pass to its assembler? Is there, in the huge source tree of gcc, a place where those options are emitted? This would allow me to keep only those options in tiny-asm and erase all the others (and the associated code). 2) I have to re-engineer the output of assembler instructions. Instead of writing to an assembler file (or to a memory assembler file) I will have to convince gcc to output into a buffer, and will pass the buffer address to the assembler. So, instead of outputting several MBs worth of assembler instructions, we would pass only the 8 bytes of a buffer address. If the buffer is small (4K, for instance), it would fit into the CPU cache. Since the CPU cache is 16KB, some of it may be kept there. 3) To do that, I need to know where in the back end source code you are writing to disk. Thanks for your help, and thanks to the people that posted encouraging words. jacob
Tiny asm (continued)
Hi The assembler checks at each instruction whether the instruction is within the selected subset of RISC-V extensions or not. I do not quite understand why this check is done here. I suppose that gcc, before emitting any instruction, does this check too, somewhere. Because if an instruction is emitted to the assembler and the assembler rejects it, there is no way to pass that information back to the compiler, and emitting an obscure error message about some instruction not being legal will not help the user, who probably doesn’t know any assembler language. I would like to drop this test in tiny-asm, but I am not 100% sure that it is really redundant. The checks are expensive to do, and they are done at EACH instruction... On the other hand, if the assembler doesn’t catch a faulty instruction, the user will learn about it at runtime (maybe) with an illegal instruction exception or similar… That would make bugs very difficult to find. Question then: can the assembler assume that gcc emits correct instructions? Thanks in advance for your attention. Jacob
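In principle a per-instruction subset check need not be expensive: if each opcode records which extensions it requires, the test reduces to one bitmask operation. This is only a sketch of the general idea - the flag names and structures are invented, and gas's real tables are considerably more involved:

```c
#include <stdint.h>

/* Hypothetical extension flags (invented for this sketch). */
enum {
    EXT_I = 1u << 0,   /* base integer */
    EXT_M = 1u << 1,   /* multiply/divide */
    EXT_A = 1u << 2,   /* atomics */
    EXT_F = 1u << 3,   /* single float */
    EXT_D = 1u << 4,   /* double float */
    EXT_C = 1u << 5,   /* compressed */
};

typedef struct {
    const char *name;
    uint32_t    required;   /* extensions this instruction needs */
} insn_desc;

/* Accept the instruction iff every required extension is enabled:
   one AND and one compare per instruction, not an expensive walk. */
int insn_allowed(const insn_desc *d, uint32_t enabled)
{
    return (d->required & ~enabled) == 0;
}
```

Whether the check can be dropped entirely is the question the post raises; the sketch only shows that, kept, it can be made cheap.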
Suspicious code
Consider this code:

    1202 static fragS * get_frag_for_reloc (fragS *last_frag,
    1203                                    const segment_info_type *seginfo,
    1204                                    const struct reloc_list *r)
    1205 {
    1206   fragS *f;
    1207
    1208   for (f = last_frag; f != NULL; f = f->fr_next)
    1209     if (f->fr_address <= r->u.b.r.address
    1210         && r->u.b.r.address < f->fr_address + f->fr_fix)
    1211       return f;
    1212
    1213   for (f = seginfo->frchainP->frch_root; f != NULL; f = f->fr_next)
    1214     if (f->fr_address <= r->u.b.r.address
    1215         && r->u.b.r.address < f->fr_address + f->fr_fix)
    1216       return f;
    1217
    1218   for (f = seginfo->frchainP->frch_root; f != NULL; f = f->fr_next)
    1219     if (f->fr_address <= r->u.b.r.address
    1220         && r->u.b.r.address <= f->fr_address + f->fr_fix)
    1221       return f;
    1222
    1223   as_bad_where (r->file, r->line,
    1224                 _("reloc not within (fixed part of) section"));
    1225   return NULL;
    1226 }

This function consists of 3 loops: lines 1208-1211, 1213-1216 and 1218-1221. Lines 1213-1216 are ALMOST identical to lines 1218-1221. The ONLY difference that I can see is that the less-than in line 1215 is replaced by a less-than-or-equal in line 1220. But… why? This code is searching for the fragment that contains a given address between the start and end addresses of the frags in question, either in the fragment list starting at last_frag or in the list given by seginfo. To know whether a fragment matches, you should start with the given address and stop one memory address BEFORE the limit given by f->fr_address + f->fr_fix. That is what the first two loops are doing. The third loop repeats the second one and changes the less-than to less-than-or-equal, so if the address is exactly fr_address + fr_fix it will still pass. Why is it doing that? If that code is correct, it is obvious that we could merge the second and third loops by putting a <= in the second one and erasing the third one… UNLESS priority should be given to matches that are strictly less and not less-or-equal, which seems incomprehensible… to me. 
This change was introduced on Aug 18th, 2011 by Alan Modra with the rather terse ChangeLog comment: "(get_frag_for_reloc): New function." There are no further comments in the code at all. This code runs after all relocations are fixed, just before the software writes them out. The code is in the file « write.c » in the gas directory. Note that this code can run through the whole fragment list for EACH relocation, so it is quite expensive. In general the list data structure is not really optimal here, but that is another story. Thanks in advance for your help. Jacob
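The merge proposed above would look roughly like this, against a simplified stand-in for gas's fragS (a sketch only). Working it through shows where the behavior actually differs: when an address sits exactly on the boundary between two frags, the original's strict-< loop picks the *next* frag, while a single <= loop returns the *earlier* one first - which may be exactly the priority the separate loops preserve:

```c
#include <stddef.h>

/* Simplified stand-in for gas's fragS (real one lives in as.h). */
typedef struct frag {
    unsigned long fr_address;   /* start address of the fragment */
    unsigned long fr_fix;       /* size of its fixed part */
    struct frag *fr_next;
} fragS;

/* Merged form of loops 2 and 3: one pass with <=.  Returns the FIRST
   frag whose [fr_address, fr_address + fr_fix] range, inclusive at
   BOTH ends, contains addr.  Unlike the original pair of loops, an
   address on a frag boundary matches the earlier frag, not the next. */
fragS *find_frag_merged(fragS *root, unsigned long addr)
{
    for (fragS *f = root; f != NULL; f = f->fr_next)
        if (f->fr_address <= addr && addr <= f->fr_address + f->fr_fix)
            return f;
    return NULL;
}
```

So the two-loop form is not redundant after all: it is equivalent to "prefer a frag that strictly contains the address; accept an end-of-frag match only if no strict match exists anywhere".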
Calculating cosine/sine
Hi When calculating the cosine/sine, gcc generates a call to a complicated routine that takes several thousand instructions to execute. Suppose the value is stored in some XMM register, say xmm0, and the result should be in another XMM register, say xmm1. Why doesn't it generate:

    movsd %xmm0,(%rsp)
    fldl  (%rsp)
    fsin
    fstpl (%rsp)
    movsd (%rsp),%xmm1

My compiler system (lcc-win) generates that when optimizations are ON. Maybe there are some flags in gcc that I am missing? Or is there some other reason? Thanks in advance for your attention. jacob
Re: Calculating cosine/sine
On 11/05/13 11:20, Oleg Endo wrote: Hi, This question is not appropriate for this mailing list. Please take any further discussions to the gcc-help mailing list. On Sat, 2013-05-11 at 11:15 +0200, jacob navia wrote: Hi When calculating the cosine/sine, gcc generates a call to a complicated routine that takes several thousand instructions to execute. Suppose the value is stored in some XMM register, say xmm0, and the result should be in another XMM register, say xmm1. Why doesn't it generate:

    movsd %xmm0,(%rsp)
    fldl  (%rsp)
    fsin
    fstpl (%rsp)
    movsd (%rsp),%xmm1

My compiler system (lcc-win) generates that when optimizations are ON. Maybe there are some flags in gcc that I am missing? These optimizations are usually turned on with -ffast-math. You also have to make sure to select the appropriate CPU or architecture type to enable the usage of certain instructions. For more information see: http://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html Cheers, Oleg Sorry, but I DID try: gcc -ffast-math -S -O3 -finline-functions tsin.c It still generates a call to sin() and NOT the assembly above. And I DID look at the options page.
Re: Calculating cosine/sine
On 11/05/13 11:30, Marc Glisse wrote: On Sat, 11 May 2013, jacob navia wrote: Hi When calculating the cosine/sine, gcc generates a call to a complicated routine that takes several thousand instructions to execute. Suppose the value is stored in some XMM register, say xmm0, and the result should be in another XMM register, say xmm1. Why doesn't it generate:

    movsd %xmm0,(%rsp)
    fldl  (%rsp)
    fsin
    fstpl (%rsp)
    movsd (%rsp),%xmm1

My compiler system (lcc-win) generates that when optimizations are ON. Maybe there are some flags in gcc that I am missing? Or is there some other reason? fsin is slower and less precise than the libc SSE2 implementation. Excuse me, but: 1) The fsin instruction is ONE instruction! The sin routine is (at least) a thousand instructions! Even if the fsin instruction itself is "slow", it should be a thousand times faster than the complicated routine gcc calls. 2) Under gcc the x87 FPU computes with a 64-bit mantissa, i.e. fsin will calculate with a 64-bit mantissa and NOT only the 53 bits of an SSE2 double. The fsin instruction is more precise! I think that gcc has a problem here. I am pointing you to this problem, but please keep in mind I am no newbie... jacob
Re: Calculating cosine/sine
On 11/05/13 16:01, Ondřej Bílka wrote: As for 1), the only way to settle it is to measure. Compile the following and we will see who is right.

    cat > sin.c << "EOF"
    #include <math.h>
    int main(){
        int i;
        double x=0;
        double ret=0;
        for(i=0;i<1000;i++){
            ret+=sin(x);
            x+=0.3;
        }
        return ret;
    }
    EOF

OK, I did a similar thing. I just compiled sin(argc) in main. The results prove that you were right. The single fsin instruction takes longer than the several HUNDRED instructions (calls, jumps, table lookups, what have you). Gone are the times when an fsin would take 30 cycles or so. Intel has destroyed the FPU. But is this the case in real code? The results are around 2 seconds for 100 million sin calculations and 4 seconds for the same calculations doing fsin. But the code used for the fsin solution is just a few bytes, compared to the several hundred bytes of the sin function, not counting the table lookups. In the benchmark all that code/data is in the L1 cache. In real-life code you use the sin routine only occasionally, and the probability of it not being in the L1 cache is much higher - I would say almost one if you do not do sin/cos VERY often. For the time being I will go on generating the fsin code. I will try to optimize Moshier's sin function later on. I suppose this group is for asking this kind of question. I thank everyone that answered. Yours sincerely Jacob, the developer of lcc-win (http://www.cs.virginia.edu/~lcc-win32)
Sources required...
Hi Looking at the code generated by the riscv backend. Consider this C source code:

    void shup1(QfloatAccump x)
    {
        QELT newbits, bits;
        int i;

        bits = x->mantissa[9] >> 63;
        x->mantissa[9] <<= 1;
        for (i = 8; i > 0; i--) {
            newbits = x->mantissa[i] >> 63;
            x->mantissa[i] <<= 1;
            x->mantissa[i] |= bits;
            bits = newbits;
        }
        x->mantissa[0] <<= 1;
        x->mantissa[0] |= bits;
    }

This code shifts a 640-bit number (10 words of 64 bits) left by 1 position. The algorithm is simple: save the highest bit of each word, do the shift, and introduce the bit saved from the word below at the least significant position. When compiling this with gcc the generated code looks extremely weird. Instead of loading a 64-bit number into some register, doing the operation, then storing the result into memory, gcc does the following: 1) Load the 64-bit number byte by byte into 8 different registers, so each 64-bit register contains only one byte. 2) OR the 8 registers together into a 64-bit number. 3) Do the 64-bit operation. 4) Split the result into 8 different registers. 5) Store the 8 bytes one by one. Obviously, I thought that this was a serious bug in gcc. I was going to write that bug report, but I had the reflex of rewriting that function in what seemed reasonable assembly: 1) Load 64 bits into 10 different registers 2) Do the operations 3) Store 64 bits at a time. The results were /catastrophic/. Instead of increasing performance, there was a slowdown of several times compared to the performance of gcc. Now, my question is: where did you get this information from? Because I can’t believe that by « trial and error » you arrived at that weird way of doing things. There must be some document that pointed you to the right solution. Can you share that information with the public? Thanks in advance. Jacob sipeed@lpi4a:~/lcc/qlibriscv$ gcc -v Using built-in specs. 
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/riscv64-linux-gnu/13/lto-wrapper Target: riscv64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Debian 13.2.0-4revyos1' --with-bugurl=file:///usr/share/doc/gcc-13/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-13 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/libexec --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=16 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.2.0 (Debian 13.2.0-4revyos1)
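For anyone who wants to reproduce the experiment, the routine under discussion can be exercised with a small harness. The Qfloat types are reconstructed from the fragments in the post (QELT and the struct layout are my best guesses where the original elides them):

```c
#include <stdint.h>

typedef uint64_t QELT;

/* Reconstructed from the post: a 10-word, 640-bit accumulator mantissa.
   mantissa[0] is the most significant word, mantissa[9] the least. */
typedef struct {
    int sign, exponent;
    QELT mantissa[10];
} QfloatAccum, *QfloatAccump;

/* Shift the whole 640-bit mantissa left ("up") by one bit. */
void shup1(QfloatAccump x)
{
    QELT newbits, bits;
    int i;

    bits = x->mantissa[9] >> 63;     /* save the top bit of the lowest word */
    x->mantissa[9] <<= 1;
    for (i = 8; i > 0; i--) {
        newbits = x->mantissa[i] >> 63;
        x->mantissa[i] <<= 1;
        x->mantissa[i] |= bits;      /* bring in the bit saved from below */
        bits = newbits;
    }
    x->mantissa[0] <<= 1;
    x->mantissa[0] |= bits;
}
```

Checking a couple of boundary cases confirms the carry propagates correctly from word 9 up toward word 0, which is what makes the generated byte-by-byte code so surprising.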
Riscv code generation
Hi In a previous post I pointed to some strange code generation by gcc for the riscv-64 targets. To recap: suppose a 64-bit operation: c = a OP b; Gcc does the following: instead of loading 64 bits from memory, gcc loads 8 bytes into 8 separate registers for both operands. Then it ORs the 8 bytes into a single 64-bit number. Then it executes the 64-bit operation. And lastly, it splits the 64-bit result into 8 bytes in 8 different registers, and stores these 8 bytes one after the other. When I saw this I was impressed that this utterly bloated code ran faster than a hastily written assembly program I did in 10 minutes. Obviously I didn’t take any pipeline turbulence into account and my program was slower. When I did take pipeline turbulence into account, I managed to write a program that runs several times faster than the bloated code. You realize that for the example above, instead of 1) Load each operand's 64 bits into a register (2 loads) 2) Do the operation 3) Store the result - 4 instructions in all - the « gcc way » takes 46 operations: 16 byte loads, 14 OR operations (7 per operand), 8 shifts to split the result, and 8 byte stores. I think this is a BUG, but I’m still not convinced that it is one, and I do not have a clue WHY you do this. Is anyone here working on the riscv backend? This happens only with -O3, by the way. Sample code:

    #define ACCUM_LENGTH 9
    #define WORDSIZE 64

    typedef struct {
        int sign, exponent;
        long long mantissa[ACCUM_LENGTH + 1];
    } QfloatAccum, *QfloatAccump;

    void shup1(QfloatAccump x)
    {
        QELT newbits, bits;
        int i;

        bits = x->mantissa[ACCUM_LENGTH] >> (WORDSIZE - 1);
        x->mantissa[ACCUM_LENGTH] <<= 1;
        for (i = ACCUM_LENGTH - 1; i > 0; i--) {
            newbits = x->mantissa[i] >> (WORDSIZE - 1);
            x->mantissa[i] <<= 1;
            x->mantissa[i] |= bits;
            bits = newbits;
        }
        x->mantissa[0] <<= 1;
        x->mantissa[0] |= bits;
    }

Please point me to the right person. Thanks
Problem solved
Hi I have found the reason for the weird behavior of gcc when reading 64-bit data, and I found out how to avoid it. The performance of the generated code doubled. I thank everyone in this forum for their silence in response to my repeated help requests. They remind me that: THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. Jacob
Mystery instructions
Hi The GNU assembler supports two instructions for the T-Head RISC-V machines called:

    th.ipop
    th.ipush

with no arguments. These instructions (they are not macros or aliases) are UNDOCUMENTED in the T-Head instruction manuals that I have, and a Google search yields absolutely nothing. Can anyone here point me to some documentation that describes what these instructions do? Thanks in advance.
Re: Mystery instructions
Well, the PDF I have dates from 2022, but it has no reference to those instructions. But that link works, THANKS A LOT! I will now use that documentation. Jacob P.S. I am going through EACH one of the instructions in the instruction table, documenting what it does, its syntax, abstract arguments and mode of operation. > On Jan 23, 2024, at 15:16, Alex Huang wrote: > > Hi, > > These instructions look to be part of the T-Head vendor extension > instruction set. See the spec here: > https://github.com/T-head-Semi/thead-extension-spec. > > Best regards > Alex > >> On Jan 23, 2024, at 8:42 AM, jacob navia via Gcc wrote: >> >> Hi >> The GNU assembler supports two instructions for the T-Head RISC-V machines >> called: >> >> th.ipop >> th.ipush >> >> With no arguments. These instructions (they are not macros or aliases) are >> UNDOCUMENTED in the T-Head instruction manuals that I have, and a Google >> search yields absolutely nothing. >> >> Can anyone here point me to some documentation that describes what these >> instructions do? >> >> Thanks in advance.