[Bug target/53975] New: [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53975

Bug #: 53975
Summary: [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
Classification: Unclassified
Product: gcc
Version: 4.7.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ja...@jermar.eu

When building HelenOS/ia64 with gcc 4.7.1, we hit a problem in this code sequence:

4404570: 09 70 00 42 38 10 [MMI] ld8.s r14=[r33]
4404576: a0 04 90 70 20 20 ld8.s r74=[r36]
440457c: 19 00 00 90 mov r73=1;;
4404580: 11 38 00 10 06 39 [MIB] cmp.eq p7,p6=0,r8
4404586: 60 70 04 80 83 03 mov b6=r14
440458c: a0 fd ff 4b (p07) br.cond.dpnt.few 4404320 ;;
4404590: c2 40 8a 19 00 21 [MII] (p06) adds r72=98,r12
4404596: 00 00 00 02 00 40 nop.i 0x0;;
440459c: a3 04 00 01 chk.s.i r74,4404730
44045a0: 08 b8 38 00 40 04 [MMI] chk.s.m r14,4404710
44045a6: 00 00 00 02 00 00 nop.m 0x0
44045ac: 00 00 04 00 nop.i 0x0
44045b0: 11 00 00 00 01 00 [MIB] nop.m 0x0
44045b6: 00 00 00 02 00 00 nop.i 0x0
44045bc: 68 00 80 10 br.call.sptk.many b0=b6;;

Note that r14 is loaded speculatively and thus can have its NaT bit set as a result of a deferred exception. This is checked and, if necessary, recovered by the chk.s instruction at 0x44045a0.

The problem seems to be that r14 is moved to b6 at 0x4404586 prior to the chk.s instruction. If r14 indeed has the NaT bit set, the instruction will generate the NaT Consumption vector exception.

Seems to me that the mov b6=r14 instruction should have been scheduled only after the chk.s instruction, when we are sure r14's NaT is cleared.
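The ordering problem described above can be illustrated with a toy C model. This is only a sketch under stated assumptions: all names (`model_reg`, `ld8_s`, `chk_s`, `mov_to_br`, `run_reported_order`, `run_expected_order`) are hypothetical and belong neither to GCC nor to HelenOS, and the recovery code is simplified to re-doing the load non-speculatively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of an IA-64 general register: 64-bit value plus NaT bit. */
typedef struct {
    uint64_t value;
    bool nat;
} model_reg;

typedef enum { STEP_OK, STEP_NAT_CONSUMPTION } model_step;

/* ld8.s: a deferred exception sets NaT instead of faulting. */
model_reg ld8_s(const uint64_t *addr, bool deferred_exception)
{
    model_reg r = { 0, false };
    assert(addr);
    if (deferred_exception)
        r.nat = true;
    else
        r.value = *addr;
    return r;
}

/* chk.s: if NaT is set, run recovery (modelled as a plain reload). */
void chk_s(model_reg *r, const uint64_t *addr)
{
    assert(r && addr);
    if (r->nat) {
        r->value = *addr;
        r->nat = false;
    }
}

/* mov bN=rM: branch registers have no NaT bit, so a NaT source faults. */
model_step mov_to_br(uint64_t *br, const model_reg *r)
{
    if (r->nat)
        return STEP_NAT_CONSUMPTION;
    *br = r->value;
    return STEP_OK;
}

/* Order seen in the disassembly: mov b6=r14 before chk.s r14. */
model_step run_reported_order(const uint64_t *addr, bool deferred_exception,
                              uint64_t *b6)
{
    model_reg r14 = ld8_s(addr, deferred_exception);
    model_step s = mov_to_br(b6, &r14);
    if (s != STEP_OK)
        return s;
    chk_s(&r14, addr);
    return STEP_OK;
}

/* Order the report argues for: chk.s r14 first, then mov b6=r14. */
model_step run_expected_order(const uint64_t *addr, bool deferred_exception,
                              uint64_t *b6)
{
    model_reg r14 = ld8_s(addr, deferred_exception);
    chk_s(&r14, addr);
    return mov_to_br(b6, &r14);
}
```

In this model the reported order only misbehaves when the speculative load actually defers an exception, which matches the report: the generated code works until r14 happens to carry a NaT.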
[Bug target/53975] [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53975

--- Comment #5 from Jakub Jermar 2012-07-16 06:18:55 UTC ---
(In reply to comment #1)
> Without a test case on a platform that is supported in GCC, there isn't much
> anyone can do to help. Can you reproduce this on linux or hpux?
>
> BTW, are there plans to contribute HelenOS support to FSF GCC?

Well, we are squatting on ia64-pc-linux-gnu-gcc to build HelenOS, so this is reproducible on Linux (no HelenOS-specific support exists). The testcase is our 'image.boot' loader program, but it's kind of huge. Even the function which contains this issue, i.e. printf_core(), is pretty huge and inlines lots of other functions.
[Bug target/53975] [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53975

--- Comment #6 from Jakub Jermar 2012-07-16 06:28:29 UTC ---
(In reply to comment #4)
> Ah, of course the "move branch register" instruction faults if the NaT bit of
> the source register is set. So the recovery code is irrelevant, and this could
> be a GCC bug. Need a test case to investigate, though...

Exactly. The problem is that the NaT bit cannot propagate any further when the new destination is a branch register and so the exception can no longer remain deferred.

As for the test case, once you have the toolchain in place, the easiest way to reproduce this is simply to build HelenOS and disassemble the image.boot binary around the addresses above. I'd be more than happy to provide assistance with this. If tinkering with the entire HelenOS is not feasible, I can try to separate at least printf_core() into a separately buildable testcase.
[Bug target/53975] [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53975

--- Comment #7 from Jakub Jermar 2012-07-16 21:25:47 UTC ---
Created attachment 27805
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=27805
Reproducible testcase

Here is a trimmed down reproducible testcase which exhibits the problem. The tarball contains a README file with instructions.
[Bug target/53975] [ia64] Target register of a speculative load moved to a branch register prior to the chk.s instruction
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53975

--- Comment #11 from Jakub Jermar 2012-07-19 18:06:16 UTC ---
For HelenOS, we started to hit this only after we switched from gcc 4.6.3 to 4.7.1.
[Bug target/66660] [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66660

--- Comment #7 from Jakub Jermar ---
Thank you, I will get back to you once I am done with the test.
[Bug target/66660] [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66660

--- Comment #8 from Jakub Jermar ---
Yep, verified the patch fixes the problem (and reverting the patch reintroduces it) with gcc 5.3.0 and mainline HelenOS. Thanks again!
[Bug target/66660] [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66660

--- Comment #2 from Jakub Jermar ---
Has there been any progress on this front?
[Bug target/66660] [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66660

--- Comment #4 from Jakub Jermar ---
Thanks for looking into this. Let me know if you need me to verify the fix when it's ready.
[Bug c/66660] New: [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66660

Bug ID: 66660
Summary: [ia64] Speculative load not checked before use, leading to a NaT Consumption Vector interruption
Product: gcc
Version: 5.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at jermar dot eu
Target Milestone: ---

Created attachment 35848
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35848&action=edit
Reproducible test case.

I hit what appears to be a bug in handling speculative loads on Itanium in GCC 5.1.0. I am going to attach a simplified testcase derived from the latest HelenOS sources, which reproduces the issue.

In the Ski IA-64 simulator running HelenOS, I can see the following sequence of events wrt. r23:

4407ff0: 09 b0 00 00 00 21 [MMI] mov r22=r0
4407ff6: 70 01 88 70 20 00 ld8.s r23=[r34] <= speculative load, sets the NaT bit for r23
4407ffc: e0 08 2a 00 mov.i ar.lc=14;;
4408000: 02 00 00 00 01 00 [MII] nop.m 0x0
4408006: 60 01 58 22 00 00 zxt2 r22=r22;;
440800c: 82 78 20 79 shl r16=r8,r15
4408010: 03 00 00 00 01 00 [MII] nop.m 0x0
4408016: f0 78 38 80 3c e0 shr.u r15=r14,r15;;
440801c: 01 78 44 00 zxt2 r15=r15;;
4408020: 18 70 fc 21 3f 23 [MMB] adds r14=-1,r16
4408026: 00 78 74 10 23 00 st2 [r29]=r15
440802c: 00 00 00 20 nop.b 0x0
4408030: 01 00 50 34 98 11 [MII] st8 [r26]=r20
4408036: 50 11 3c 3c 29 60 extr.u r21=r15,1,31
440803c: 12 78 b0 80 and r19=1,r15;;
4408040: 0b c8 64 1c 0c 20 [MMI] and r25=r25,r14;;
4408046: 00 00 00 02 00 c0 nop.m 0x0
440804c: 01 c8 44 00 zxt2 r14=r25;;
4408050: 08 00 00 00 01 00 [MMI] nop.m 0x0
4408056: 60 72 98 00 40 00 add r38=r14,r38
440805c: 00 00 04 00 nop.i 0x0
4408060: 03 70 00 22 38 10 [MII] ld8.s r14=[r17]
4408066: 00 00 00 02 00 20 nop.i 0x0;;
440806c: 05 30 59 00 sxt4 r41=r38;;
4408070: 08 00 00 00 01 00 [MMI] nop.m 0x0
4408076: 00 00 00 02 00 00 nop.m 0x0
440807c: 00 00 04 00 nop.i 0x0
4408080: 09 80 00 24 38 10 [MMI] ld8.s r16=[r18]
4408086: 00 00 00 02 00 c0 nop.m 0x0
440808c: 00 a0 1c e4 cmp.eq p6,p7=0,r20;;
4408090: f1 a0 fc 29 3f 23 [MIB] (p07) adds r20=-1,r20
4408096: 00 00 00 02 80 03 nop.i 0x0
440809c: a0 00 00 43 (p07) br.cond.dpnt.few 4408130 ;;
...
4408130: 01 00 00 00 01 00 [MII] nop.m 0x0
4408136: f0 00 54 22 00 00 zxt2 r15=r21
440813c: 32 b1 38 80 or r16=r19,r22;;
4408140: 00 00 3c 3a 88 11 [MII] st2 [r29]=r15
4408146: 60 81 f8 9c 29 00 dep.z r22=r16,1,15
440814c: 02 80 44 00 zxt2 r16=r16
4408150: 18 00 50 34 98 11 [MMB] st8 [r26]=r20
4408156: 00 00 00 02 00 00 nop.m 0x0
440815c: 00 00 00 20 nop.b 0x0
4408160: 11 00 00 00 01 00 [MIB] nop.m 0x0
4408166: 70 b9 60 00 40 00 add r23=r23,r24 <= NaT bit not yet consumed
440816c: 30 00 00 40 br.few 4408190 ;;
...
4408190: 10 00 00 00 01 00 [MIB] nop.m 0x0
4408196: 60 01 58 22 00 00 zxt2 r22=r22
440819c: 00 00 00 20 nop.b 0x0
44081a0: 09 b8 00 2e 08 10 [MMI] ld2 r23=[r23] <= NaT consumption vector

In short, after the speculative load to r23, a conditional branch is taken to a code path which uses the speculatively loaded register without first running chk.s on it. So if the register has its NaT bit set (such as after a deferred exception), there is going to be a NaT Consumption vector interruption at address 44081a0 above.

The issue did not show with GCC 4.8.1, but shows with GCC 5.1.0. Don't know about the versions in between.

The issue is also somewhat similar to an already fixed Bug ID #53975. The same workaround applies, i.e. -fno-selective-scheduling -fno-selective-scheduling2 will prevent the bug from occurring.
[Bug inline-asm/36558] New: "V" inline asm constraint not working on sparc64
The following program:

---8<---
static volatile long last;

main()
{
	long a, b;

	a = last;
	b = a + 1;
	asm volatile ("casx %0, %2, %1\n" : "+V" (last), "+r" (b) :"r" (a));
}
--->8---

will not compile. The error message printed is:

[EMAIL PROTECTED]:~/x$ /usr/local/sparc64/bin/sparc64-linux-gnu-gcc -c test.c
test.c: In function 'main':
test.c:9: error: inconsistent operand constraints in an 'asm'

Even though using the "m" constraint fixes this testcase, "m" cannot be used in general, because it allows the operand to be offsettable. The casx instruction will not tolerate an offset. According to the gcc info page, "V" should be just like "m", but not offsettable. I wonder why "V" does not work when "m" does in this case.

Moreover, there are more or less ugly workarounds for this, but this has been bugging me for some time and I think it should be fixed.

GCC has been configured using:

configure --target=sparc64-linux-gnu --prefix=/usr/local/sparc64 --program-prefix=sparc64-linux-gnu- --with-gnu-as --with-gnu-ld --disable-nls --disable-threads --enable-languages=c,objc,c++,obj-c++ --disable-multilib --disable-libgcj --without-headers --disable-shared

Thanks,
Jakub

--
Summary: "V" inline asm constraint not working on sparc64
Product: gcc
Version: 4.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: inline-asm
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: jakub at jermar dot eu
GCC build triplet: 4.3.1
GCC host triplet: i486-linux-gnu
GCC target triplet: sparc64-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36558
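For comparison, the compare-and-swap that the testcase performs can also be written without inline asm. The sketch below assumes that GCC's `__sync_val_compare_and_swap` builtin is acceptable as a workaround; what it expands to on sparc64 with a given GCC version is not verified here. The function names `bump_last` and `read_last` are hypothetical and exist only for this illustration.

```c
/* Same shared variable as in the testcase above. */
static volatile long last;

/*
 * Try to replace `last` with `last + 1`, but only if `last` still holds
 * the value that was read. Returns nonzero when the swap happened.
 * The builtin returns the value found in memory before the operation,
 * which corresponds to what casx leaves in its destination register.
 */
int bump_last(void)
{
    long a = last;
    long b = a + 1;
    long seen = __sync_val_compare_and_swap(&last, a, b);
    return seen == a;
}

/* Accessor so that callers outside this file can observe `last`. */
long read_last(void)
{
    return last;
}
```

This sidesteps the constraint question entirely, so it does not help with investigating why "V" is rejected; it only avoids the need for it.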
[Bug target/112604] New: [ia64] Output register not preserved after a branch is not taken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112604

Bug ID: 112604
Summary: [ia64] Output register not preserved after a branch is not taken
Product: gcc
Version: 13.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: jakub at jermar dot eu
Target Milestone: ---

After an upgrade to GCC 13.2 the HelenOS VFS server started to crash on IA-64 (see http://www.helenos.org/ticket/864).

I looked into the issue and the problem seems to be that in the following code snippet:

	fibril_mutex_lock(&vfs_data->lock);
	if (!vfs_data->files) {
		vfs_data->files = malloc(VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
		if (!vfs_data->files) {
			fibril_mutex_unlock(&vfs_data->lock);
			return false;
		}
		memset(vfs_data->files, 0, VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
	}
	fibril_mutex_unlock(&vfs_data->lock);

The output argument prepared for the possible call to malloc (1024 in this case) destroys the argument for fibril_mutex_unlock() if the branch is not taken.

In assembly it looks like this:

40001a00 <_vfs_fd_alloc>:
40001a00: 08 48 39 18 80 05 [MMI] alloc r41=ar.pfs,14,12,0
40001a06: c0 02 80 00 42 00 mov r44=r32
40001a0c: 05 00 c4 00 mov r40=b0
40001a10: 09 38 01 41 00 21 [MMI] adds r39=64,r32
40001a16: a0 02 04 00 42 40 mov r42=r1
40001a1c: 04 10 41 00 zxt1 r34=r34;;
40001a20: 11 28 fd 01 00 24 [MIB] mov r37=127
40001a26: b0 02 04 65 00 00 mov.i r43=ar.lc
40001a2c: e8 b4 01 50 br.call.sptk.many b0=4001cf00 ;;
40001a30: 08 60 01 40 00 21 [MMI] mov r44=r32
40001a36: e0 00 9c 30 20 20 ld8 r14=[r39]
40001a3c: 00 50 01 84 mov r1=r42
40001a40: 0a 68 05 00 00 24 [MMI] mov r45=1;;
40001a46: c0 02 00 10 48 e0 mov r44=1024
40001a4c: 00 70 18 e4 cmp.eq p7,p6=0,r14
40001a50: 16 00 00 00 00 c8 [BBB] nop.b 0x0
40001a56: 01 f0 01 80 21 00 (p07) br.cond.dpnt.few 40001e30 <_vfs_fd_alloc+0x430>
40001a5c: 10 00 00 40 br.few 40001a60 <_vfs_fd_alloc+0x60>
40001a60: 11 00 00 00 01 00 [MIB] nop.m 0x0
40001a66: 00 00 00 02 00 00 nop.i 0x0
40001a6c: 28 ba 01 50 br.call.sptk.many b0=4001d480 ;;
40001a70: 08 60 01 40 00 21 [MMI] mov r44=r32

The out0 is r44 in this context. Note how it is first correctly restored to the mutex address at address 1a30 after the fibril_mutex_lock call. But then this value is not used and gets rewritten to 1024 at address 1a46 in preparation for a possible branch and a consequent call to malloc. If the branch is taken, the register is restored properly (not shown here), but if the branch is not taken at address 1a56, the call to fibril_mutex_unlock at address 1a6c is made with a wrong value of r44.
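The control flow described above can be put into a small standalone harness that records which pointer reaches fibril_mutex_unlock(). This is a sketch only and is not confirmed to reproduce the miscompilation: the stub types, the `last_unlock_arg` recorder, the function name `vfs_files_init_stub` and the value 128 for VFS_MAX_OPEN_FILES (inferred from the 1024-byte malloc argument and 8-byte pointers) are all assumptions, not HelenOS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 1024 bytes / 8-byte pointers on ia64. */
#define VFS_MAX_OPEN_FILES 128

/* Hypothetical stand-ins for the HelenOS types. */
typedef struct { int unused; } fibril_mutex_t;
typedef struct vfs_file vfs_file_t;

typedef struct {
    fibril_mutex_t lock;
    vfs_file_t **files;
} vfs_data_stub_t;

/* Records the argument of the most recent unlock call. */
const void *last_unlock_arg;

/* Stubs: they only record their argument and never dereference it. */
void fibril_mutex_lock(fibril_mutex_t *m) { (void) m; }
void fibril_mutex_unlock(fibril_mutex_t *m) { last_unlock_arg = m; }

/* Same shape as the snippet from the report. */
bool vfs_files_init_stub(vfs_data_stub_t *vfs_data)
{
    assert(vfs_data != NULL);

    fibril_mutex_lock(&vfs_data->lock);
    if (!vfs_data->files) {
        vfs_data->files = malloc(VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
        if (!vfs_data->files) {
            fibril_mutex_unlock(&vfs_data->lock);
            return false;
        }
        memset(vfs_data->files, 0, VFS_MAX_OPEN_FILES * sizeof(vfs_file_t *));
    }
    fibril_mutex_unlock(&vfs_data->lock);
    return true;
}
```

Calling the function twice on the same object exercises both paths: the first call allocates, the second takes the path where the branch to the malloc call is not taken, which is where the report sees r44 holding 1024 instead of the mutex address.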
We used the following command line to compile the above snippet: usr/local/cross/bin/ia64-helenos-gcc -Iuspace/srv_vfs.p -Iuspace -I../../../uspace -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=gnu11 -imacros /home/jermar/software/HelenOS/helenos/build_all/ia64/ski/config.h -O3 -fexec-charset=UTF-8 -finput-charset=UTF-8 -D_HELENOS_SOURCE -Wa,--fatal-warnings -Wall -Wextra -Wwrite-strings -Wunknown-pragmas -Wno-unused-parameter -pipe -ffunction-sections -fdata-sections -fno-common -fdebug-prefix-map=/home/jermar/software/HelenOS/helenos/= -fdebug-prefix-map=../../= -Wsystem-headers -Werror -Wmissing-prototypes -Werror-implicit-function-declaration -Wno-missing-braces -Wno-missing-field-initializers -Wno-unused-parameter -Wno-clobbered -Wno-nonnull-compare -fno-builtin-strftime -isystem../../../common/include -isystem../../../abi/include -isystem../../../abi/arch/ia64/include -isystem../../../uspace/lib/c/arch/ia64/include -isystem../../../uspace/lib/c/include -D__LE__ -fno-unwind-tables -MD -MQ uspace/srv_vfs.p/srv_vfs_vfs_file.c.o -MF uspace/srv_vfs.p/srv_vfs_vfs_file.c.o.d -o uspace/srv_vfs.p/srv_vfs_vfs_file.c.o -c ../../../uspace/srv/vfs/vfs_file.c $ /usr/local/cross/bin/ia64-helenos-gcc -v 1209ms Sat 18 Nov 2023 12:42:07 PM UTC Using built-in specs
[Bug target/112604] [ia64] Output register not preserved after a branch is not taken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112604

--- Comment #2 from Jakub Jermar ---
Created attachment 56632
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56632&action=edit
Requested object and preprocessed source files

Here are the bad and good object and preprocessed source files for vfs_file.c. Note that the difference between the good and bad version is -O3 (bad) vs. -O2 (good), but it was the same compiler. I am not sure this is going to be very helpful, as with -O2 the code is quite different and the offending function is not inlined. If desired, I may try to go back to GCC 8.2 (the last good version known to me) and try to provide a good file generated with the same compiler flags. Let me know if this would be more useful.

I also attached the entire binary of the VFS server, both good and bad versions. Note these are HelenOS binaries and need to be run in the environment of the HelenOS operating system, which might not be practical for you.
[Bug target/112604] [ia64] Output register not preserved after a branch is not taken
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112604

--- Comment #3 from Jakub Jermar ---
Created attachment 56633
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56633&action=edit
Updated requested good object file for vfs_file.c

Seems like -O3 -fno-unswitch-loops makes the bug go away. See the new attachment for a better example of the good binary and the object file.

With this optimization disabled, the argument for the malloc/calloc is set only after the branch is taken, so it does not destroy the argument for fibril_mutex_unlock():

40001a00 <_vfs_fd_alloc>:
40001a00: 08 48 3d 1a 80 05 [MMI] alloc r41=ar.pfs,15,13,0
40001a06: d0 02 80 00 42 60 mov r45=r32
40001a0c: 05 00 cc 00 mov r43=pr
40001a10: 09 38 01 41 00 21 [MMI] adds r39=64,r32
40001a16: a0 02 04 00 42 40 mov r42=r1
40001a1c: 04 10 41 00 zxt1 r34=r34;;
40001a20: 11 80 00 44 91 39 [MIB] cmp4.eq p16,p17=0,r34
40001a26: 80 02 00 62 00 00 mov r40=b0
40001a2c: 68 b2 01 50 br.call.sptk.many b0=4001cc80 ;;
40001a30: 08 68 01 40 00 21 [MMI] mov r45=r32
40001a36: 44 02 00 00 42 20 (p16) mov r36=r0
40001a3c: 00 50 01 84 mov r1=r42
40001a40: 09 70 00 4e 18 10 [MMI] ld8 r14=[r39]
40001a46: 54 02 00 00 42 c0 (p16) mov r37=r0
40001a4c: 15 00 00 90 mov r46=1;;
40001a50: 38 22 e1 01 07 64 [MMB] (p17) mov r36=1016
40001a56: 54 fa 03 00 48 00 (p17) mov r37=127
40001a5c: 00 00 00 20 nop.b 0x0
40001a60: 11 38 00 1c 06 39 [MIB] cmp.eq p7,p6=0,r14
40001a66: c0 02 04 65 80 03 mov.i r44=ar.lc
40001a6c: f0 03 00 43 (p07) br.cond.dpnt.few 40001e50 <_vfs_fd_alloc+0x450>;;
40001a70: 10 00 00 00 01 00 [MIB] nop.m 0x0
40001a76: 00 00 00 02 00 00 nop.i 0x0
40001a7c: 10 00 00 40 br.few 40001a80 <_vfs_fd_alloc+0x80>
40001a80: 11 00 00 00 01 00 [MIB] nop.m 0x0
40001a86: 00 00 00 02 00 00 nop.i 0x0
40001a8c: 88 b7 01 50 br.call.sptk.many b0=4001d200 ;;
40001a90: 11 68 01 40 00 21 [MIB] mov r45=r32

40001e50: 11 68 01 00 08 24 [MIB] mov r45=1024
40001e56: 00 00 00 02 00 00 nop.i 0x0
40001e5c: f8 bd 00 50 br.call.sptk.many b0=4000dc40 ;;