Re: RTL representation of i386 shrdl instruction is incorrect?
On Thu, Jun 5, 2014 at 12:03 AM, Niranjan Hasabnis wrote:
> Hello,
>
> I was studying i386 machine description for my research purpose,
> and I stumbled upon following MD entry for 'shrdl' x86 instruction.
> It is obtained from the most recent i386.md file.
>
> (define_insn "x86_shrd"
>   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
>         (ior:SI (ashiftrt:SI (match_dup 0)
>                   (match_operand:QI 2 "nonmemory_operand" "Ic"))
>                 (ashift:SI (match_operand:SI 1 "register_operand" "r")
>                   (minus:QI (const_int 32) (match_dup 2)))))
>    (clobber (reg:CC FLAGS_REG))]
>   ""
>   "shrd{l}\t{%s2%1, %0|%0, %1, %2}"
>   [(set_attr "type" "ishift")
>    (set_attr "prefix_0f" "1")
>    (set_attr "mode" "SI")
>    (set_attr "pent_pair" "np")
>    (set_attr "athlon_decode" "vector")
>    (set_attr "amdfam10_decode" "vector")
>    (set_attr "bdver1_decode" "vector")])
>
> It seems to me that the RTL representation for 'shrdl' is incorrect.
>
> Semantics of shrdl instruction as per Intel manual is:
> "The instruction shifts the first operand (destination operand) to the right
> the number of bits specified by the third operand (count operand).
> The second operand (source operand) provides bits to shift in from the
> left (starting with the most significant bit of the destination operand)."
> And the way RTL does it is by inclusive-or of arithmetically
> right-shifted destination and left-shifted source operand.
>
> But the problem is that: in case of a destination (reg/mem) containing
> negative value, arithmetically right-shifted destination will have top bits
> set to 1. Inclusive-or with such a value is going to generate a
> result with top bits set to 1 instead of moving contents of source
> into top bits of destination.
>
> E.g., when ebx = b72f60d0, ebp = bfcbd2c8
> shrdl $16, %ebp, %ebx (ebx is dest, ebp is src)
> produces 0xd2c8b72f in ebx.
> But the corresponding RTL produces 0xffffb72f in ebx.
>
> So it seems to me that instead of 'ashiftrt', RTL should have 'lshiftrt'.
> Can anyone help me with this confusion?

The way I read your explanation you are correct. It should be possible
to write a testcase that is miscompiled - just try to produce the
matched RTL pattern in C and feed it with operands at runtime that
end up producing a bogus value when shrdl is used.

Oh, and you might want to file a bugreport then ;)

Richard.

> --
>
> Thanks,
> Niranjan Hasabnis,
> PhD student,
> Secure Systems Lab,
> Stony brook University,
> Stony brook, NY.
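The example values in the quoted message (ebx = b72f60d0, ebp = bfcbd2c8, count 16) can be checked by modelling both shifts in plain C. This is only a sketch: the helper names `ashiftrt32`, `shrd_insn` and `shrd_rtl` are made up for illustration, and the shift count is assumed to be in the range 1..31.

```c
#include <stdint.h>

/* Arithmetic right shift of a 32-bit value, written out by hand because
   right-shifting a negative signed int is implementation-defined in C.
   Assumes 0 < n < 32. */
uint32_t ashiftrt32(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;
    if (x & 0x80000000u)
        shifted |= ~(0xffffffffu >> n);   /* replicate the sign bit */
    return shifted;
}

/* Model of what the shrdl instruction does per the quoted Intel text:
   logical shift of the destination, source bits shifted in from the left. */
uint32_t shrd_insn(uint32_t dest, uint32_t src, unsigned n)
{
    return (dest >> n) | (src << (32 - n));
}

/* Model of what the quoted RTL pattern describes: the same, but with an
   arithmetic (ashiftrt) shift of the destination. */
uint32_t shrd_rtl(uint32_t dest, uint32_t src, unsigned n)
{
    return ashiftrt32(dest, n) | (src << (32 - n));
}
```

In this model `shrd_insn(0xb72f60d0, 0xbfcbd2c8, 16)` gives 0xd2c8b72f, while `shrd_rtl` with the same operands gives 0xffffb72f because the sign-extended upper half survives the inclusive-or. The two only differ when the destination has its top bit set.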
Gimplify ICE in gnat.dg/array18.adb
Hi,

On my private port, I'm unable to debug an ICE on GCC 4.7.3 (GNAT 7.1.2) during the internal test testsuite/gnat.dg/array18.adb.

Here is the test source code:
---
with Array18_Pkg; use Array18_Pkg;

procedure Array18 is
   A : String (1 .. 1);
begin
   A := F;
end;
---

---
package Array18_Pkg is
   function N return Positive;
   subtype S is String (1 .. N);
   function F return S;
end Array18_Pkg;
---

The size of the String returned by 'F' can't be known at compile time.
GNAT will compare the size of the returned string to the size of 'A' at runtime (to call last_chance_handler or not).

The ICE is an assert inside force_constant_size of gimplify.c (at line 717):
---
static void
force_constant_size (tree var)
{
  /* The only attempt we make is by querying the maximum size of objects
     of the variable's type.  */

  HOST_WIDE_INT max_size;

  gcc_assert (TREE_CODE (var) == VAR_DECL);

  max_size = max_int_size_in_bytes (TREE_TYPE (var));

  gcc_assert (max_size >= 0);    <-- max_size = -1 !!

  DECL_SIZE_UNIT (var)
    = build_int_cst (TREE_TYPE (DECL_SIZE_UNIT (var)), max_size);
  DECL_SIZE (var)
    = build_int_cst (TREE_TYPE (DECL_SIZE (var)), max_size * BITS_PER_UNIT);
}
---

The 'var' parameter contains:
---
unit size align 8 symtab 0 alias set -1 canonical type 0x2ab02348 precision 8 min max context pointer_to_this > sizes-gimplified visited nonaliased-component BLK size readonly visited arg 0 readonly visited arg 0 > arg 1 > unit size readonly visited arg 0 > align 8 symtab 0 alias set 0 canonical type 0x2abb2f18 domain sizes-gimplified visited SI size unit size align 32 symtab 0 alias set -1 canonical type 0x2abb2e70 precision 32 min max index type chain > context chain > used ignored BLK file /vues_statiques/FPGA/belbachir/prism/compiler/gcc_test/internal/../../gcc/gcc/testsuite/gnat.dg/array18.adb line 9 col 9 size unit size align 8>
---

I used the -fdump-tree-all option and ran my cross compiler and the native x86_64 compiler (same GCC/GNAT version) to compare the dumps. I don't have the same results, even in the first dump:

>> X86_64 array18.adb.003t.original :
---
Array18 ()
{
  typedef array18__TTaSP1___XDLU_1__1 array18__TTaSP1___XDLU_1__1;
  typedef struct ;
  typedef character array18__TaS[1:1];
  character a[1:1];
  character R.0[1:(sizetype) (integer) array18_pkg__R1s];

  typedef array18__TTaSP1___XDLU_1__1 array18__TTaSP1___XDLU_1__1;
  typedef struct ;
  typedef character array18__TaS[1:1];
  character a[1:1];
  if (array18_pkg__R1s != 1)
    {
      .gnat_last_chance_handler ("array18.adb", 9);
    }
  else
    {
    }
  character R.0[1:(sizetype) (integer) array18_pkg__R1s];
  R.0 = array18_pkg.f ();
  a = VIEW_CONVERT_EXPR(R.0);
  return;
}

_GLOBAL.SZ0.ada_array18 (integer p0, integer p1)
{
  return p1 <= p0 ? (bitsizetype) ((((sizetype) p0 - (sizetype) p1) + 1) * 8) : 0;
}

_GLOBAL.SZ1.ada_array18 (integer p0, integer p1)
{
  return p1 <= p0 ? ((sizetype) p0 - (sizetype) p1) + 1 : 0;
}
---

>> MyPort array18.adb.003t.original :
---
Array18 ()
{
  typedef array18__TTaSP1___XDLU_1__1 array18__TTaSP1___XDLU_1__1;
  typedef struct ;
  typedef character array18__TaS[1:1];
  typedef struct array18__a___PAD array18__a___PAD;
  struct array18__a___PAD a;

  typedef array18__TTaSP1___XDLU_1__1 array18__TTaSP1___XDLU_1__1;
  typedef struct ;
  typedef character array18__TaS[1:1];
  typedef struct array18__a___PAD array18__a___PAD;
  struct array18__a___PAD a;
  if (array18_pkg__R1s != 1)
    {
      .gnat_last_chance_handler ("array18.adb", 9);
    }
  else
    {
    }
  a = {.F=VIEW_CONVERT_EXPR(array18_pkg.f ())};
  return;
}

_GLOBAL.SZ0.ada_array18 (integer p0, integer p1)
{
  return p1 <= p0 ? ((bitsizetype) ((sizetype) p0 - (sizetype) p1) + 1) * 8 : 0;
}

_GLOBAL.SZ1.ada_array18 (integer p0, integer p1)
{
  return p1 <= p0 ? ((sizetype) p0 - (sizetype) p1) + 1 : 0;
}
---

The next dump, 004t.gimple, is incomplete since the ICE occurs during the gimplify pass.
Using the debugger I saw that the X86_64 port never goes inside force_constant_size (where the ICE is located on my port)...

Can someone give me a hint to solve my problem? I have no idea which part of my backend could be related to the GENERIC or GIMPLE generation and I'm very unfamiliar with this part of GCC.

Regards,

Selim Belbachir
Re: Cross-testing libsanitizer
On 3 June 2014 14:46, Christophe Lyon wrote:
> On 3 June 2014 12:16, Yury Gribov wrote:
>>>> Is this 8G of RAM? If yes - I'd be curious to know which part of
>>>> libsanitizer needs so much memory.
>>>
>>> Here is what I have in gcc.log:
>>> ==12356==ERROR: AddressSanitizer failed to allocate 0x200001000
>>> (8589938688) bytes at address ff000 (errno: 12)^M
>>> ==12356==ReserveShadowMemoryRange failed while trying to map
>>> 0x200001000 bytes. Perhaps you're using ulimit -v^M
>>
>> Interesting. AFAIK Asan maps shadow memory with NORESERVE flag so it should
>> not consume any RAM at all...
>
> Thanks for the reminder; in fact I posted a qemu patch in February
> http://lists.gnu.org/archive/html/qemu-devel/2014-02/msg00319.html
> I thought it was applied, but it's not yet in trunk
>
> I used to use a patched qemu, but when I upgraded to 2.0 I forgot
> about this patch.
> I am going to re-check with a patched qemu, and ping them.

So after applying my patch to qemu, I no longer see this error.
Now, all execution tests fail with a timeout after generating ASAN:SEGV.
Which means I have to investigate what is going on :-(
It worked better in February :-(

Christophe.
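As a rough illustration of the NORESERVE point made in the quoted exchange, the sketch below reserves address space with `mmap` and `MAP_NORESERVE` on Linux. It is not libsanitizer's code: the function name `reserve_noreserve` and the macro `REQUESTED_BYTES` are invented here, and the protection flags are an assumption. The decimal figure in the log, 8589938688, is 0x200001000, that is 8 GiB plus one 4 KiB page.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Size reported in the quoted log: 8 GiB plus one 4 KiB page. */
#define REQUESTED_BYTES 0x200001000ULL

/* Reserve a range of address space without asking the kernel to set
   aside swap or RAM for it up front. Pages are only backed by memory
   once they are touched. Returns MAP_FAILED on error, like mmap. */
void *reserve_noreserve(size_t len)
{
    return mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
}
```

A mapping like this can still fail with errno 12 (ENOMEM) when the address space itself is limited, for example by `ulimit -v` as the log message suggests, or when an emulator does not honour the flag.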
Re: RTL representation of i386 shrdl instruction is incorrect?
Hi Richard,

Thanks for your reply. I looked into some of the details of how that
particular RTL template is used. It seems to me that the particular
RTL template is used only when shifting a 64-bit data type on a 32-bit
machine. This is the underlying assumption encoded in the i386.c file,
which generates that particular RTL only when the instruction mode is DImode.
If that is the case, then it won't matter whether one uses an arithmetic shift
or a logical shift to right shift the lower 4 bytes of an 8-byte value.
In other words, the mapping between the RTL template and shrdl is incorrect,
but the underlying assumption in i386.c guards the bug.

On Thu, Jun 5, 2014 at 3:51 AM, Richard Biener wrote:
> On Thu, Jun 5, 2014 at 12:03 AM, Niranjan Hasabnis wrote:
>> Hello,
>>
>> I was studying i386 machine description for my research purpose,
>> and I stumbled upon following MD entry for 'shrdl' x86 instruction.
>> It is obtained from the most recent i386.md file.
>>
>> (define_insn "x86_shrd"
>>   [(set (match_operand:SI 0 "nonimmediate_operand" "+r*m")
>>         (ior:SI (ashiftrt:SI (match_dup 0)
>>                   (match_operand:QI 2 "nonmemory_operand" "Ic"))
>>                 (ashift:SI (match_operand:SI 1 "register_operand" "r")
>>                   (minus:QI (const_int 32) (match_dup 2)))))
>>    (clobber (reg:CC FLAGS_REG))]
>>   ""
>>   "shrd{l}\t{%s2%1, %0|%0, %1, %2}"
>>   [(set_attr "type" "ishift")
>>    (set_attr "prefix_0f" "1")
>>    (set_attr "mode" "SI")
>>    (set_attr "pent_pair" "np")
>>    (set_attr "athlon_decode" "vector")
>>    (set_attr "amdfam10_decode" "vector")
>>    (set_attr "bdver1_decode" "vector")])
>>
>> It seems to me that the RTL representation for 'shrdl' is incorrect.
>>
>> Semantics of shrdl instruction as per Intel manual is:
>> "The instruction shifts the first operand (destination operand) to the right
>> the number of bits specified by the third operand (count operand).
>> The second operand (source operand) provides bits to shift in from the
>> left (starting with the most significant bit of the destination operand)."
>> And the way RTL does it is by inclusive-or of arithmetically
>> right-shifted destination and left-shifted source operand.
>>
>> But the problem is that: in case of a destination (reg/mem) containing
>> negative value, arithmetically right-shifted destination will have top bits
>> set to 1. Inclusive-or with such a value is going to generate a
>> result with top bits set to 1 instead of moving contents of source
>> into top bits of destination.
>>
>> E.g., when ebx = b72f60d0, ebp = bfcbd2c8
>> shrdl $16, %ebp, %ebx (ebx is dest, ebp is src)
>> produces 0xd2c8b72f in ebx.
>> But the corresponding RTL produces 0xffffb72f in ebx.
>>
>> So it seems to me that instead of 'ashiftrt', RTL should have 'lshiftrt'.
>> Can anyone help me with this confusion?
>
> The way I read your explanation you are correct. It should be possible
> to write a testcase that is miscompiled - just try to produce the
> matched RTL pattern in C and feed it with operands at runtime that
> end up producing a bogus value when shrdl is used.
>
> Oh, and you might want to file a bugreport then ;)
>
> Richard.
>
>> --
>>
>> Thanks,
>> Niranjan Hasabnis,
>> PhD student,
>> Secure Systems Lab,
>> Stony brook University,
>> Stony brook, NY.

--
Thanks,
Niranjan Hasabnis,
PhD student,
Secure Systems Lab,
Stony brook University,
Stony brook, NY.
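The double-word use described in this message can be sketched in C as follows. This is a model under stated assumptions, not the code in i386.c: the function name `lshr64_split` is hypothetical, the shift is a logical one, and the count is restricted to 1..31. The low half is where an shrd-style step appears; the high half is shifted on its own.

```c
#include <stdint.h>

/* Logical right shift of a 64-bit value carried out on two 32-bit
   halves, as a 32-bit machine would have to do it. Assumes 0 < n < 32. */
uint64_t lshr64_split(uint64_t x, unsigned n)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);

    /* shrd-style step: low half shifted right, with the low bits of the
       high half moved into the vacated top bits. */
    uint32_t new_lo = (lo >> n) | (hi << (32 - n));

    /* Plain shift of the high half. */
    uint32_t new_hi = hi >> n;

    return ((uint64_t)new_hi << 32) | new_lo;
}
```

For example, `lshr64_split(0xbfcbd2c8b72f60d0, 16)` matches the native `0xbfcbd2c8b72f60d0 >> 16`. The low-half step is only correct when the destination is shifted logically, which is what the shrdl instruction itself does.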
Re: [RFC] PR61300 K&R incoming args
On 05/31/14 00:30, Alan Modra wrote:
> On Fri, May 30, 2014 at 11:27:52AM -0600, Jeff Law wrote:
>> On 05/26/14 01:38, Alan Modra wrote:
>>> PR61300 shows a need to differentiate between incoming and outgoing
>>> REG_PARM_STACK_SPACE for the PowerPC64 ELFv2 ABI, due to code like
>>> function.c:assign_parm_is_stack_parm determining that a stack home
>>> is available for incoming args if REG_PARM_STACK_SPACE is non-zero.
>>>
>>> Background: The ELFv2 ABI requires a parameter save area only when
>>> stack is actually used to pass parameters, and since varargs are
>>> passed on the stack, unprototyped calls must pass both on the stack
>>> and in registers. OK, easy you say, !prototype_p(fun) means a
>>> parameter save area is needed. However, a prototype might not be in
>>> scope when compiling an old K&R style C function body, but this does
>>> *not* mean a parameter save area has necessarily been allocated. A
>>> caller may well have a prototype in scope at the point of the call.
>>
>> Ugh. This reminds me a lot of the braindamage we had to deal with
>> in the original PA abi's handling of FP values.
>>
>> In the general case, how can any function ever be sure as to whether
>> or not its prototype was in scope at a call site? Yea, we can know
>> for things with restricted scope, but if it's externally visible, I
>> don't see how we're going to know the calling context with absolute
>> certainty.
>>
>> What am I missing here?
>
> When compiling the function body you don't need to know whether a
> prototype was in scope at the call site. You just need to know the
> rules. :) For functions with variable argument lists, you'll always
> have a parameter save area. For other functions, whether or not you
> have a parameter save area just depends on the number of arguments and
> their types (ie. whether you run out of registers for parameter
> passing), and you have that whether or not the function is
> prototyped.
>
> A simple example might help clear up any confusion.

I think the confusion was all mine. I think Ulrich's comments in the BZ
and the paragraph above really help clarify things.

This is entirely an issue on the callee side. On the callee side, we
want to be looking strictly at the number/types of the arguments. The
prototype/lack thereof isn't relevant.

> Given
> void fun1(int a, int b, double c);
> void fun2(int a, ...);
> ...
> fun1 (1, 2, 3.0);
> fun2 (1, 2, 3.0);
>
> A call to fun1 with a prototype in scope won't allocate a parameter
> save area, and will pass the first arg in r3, the second in r4, and
> the third in f1.
>
> A call to fun2 with a prototype in scope will allocate a parameter
> save area of 64 bytes (the minimum size of a parameter save area), and
> will pass the first arg in r3, the second in the second slot of the
> parameter save area, and the third in the third slot of the parameter
> save area. Now the first eight slots/double-words of the parameter
> save area are passed in r3 thru r10, so this means the second arg is
> actually passed in r4 and the third in r5, not the stack!

Right, so for a call to fun2 the caller allocates, but the callee
flushes the parameter registers into the allocated space. Right? And
the allocated space would be contiguous with the space used to pass
arguments once you've exhausted the argument registers?

FP values get passed in integer and FP regs to varargs or unprototyped
functions.

[ Why, oh why couldn't we have done all this ~10 years ago when all
the aspects of the PA ABIs were still fresh in my head. There's a lot of
similarities, but remembering how it all worked is proving difficult. ]

> A call to fun1 or fun2 without a prototype in scope will allocate a
> parameter save area, and pass the first arg in r3, the second in r4,
> and the third in both f1 and r5.
>
> When compiling fun1 body, the first arg is known to be in r3, the
> second in r4, and the third in f1, and we don't use the parameter save
> area for storing incoming args to a stack slot. (At least, after
> PR61300 is fixed..) It doesn't matter if the parameter save area was
> allocated or not, we just don't use it.

Right. It's reasonably similar to how certain aspects of the PA ABIs
worked, at least from the callee's viewpoint.

> When compiling fun2 body, the first arg is known to be in r3, the
> second in r4 and the third in r5. Since the function has a variable
> argument list, registers r4 thru r10 are saved to the parameter save
> area stack, and we set up our va_list pointer to the second
> double-word of the parameter save area stack. Of course, code
> optimisation might lead to removing the saves and using the args in
> their incoming regs, but this is conceptually what happens.

Right. That's also consistent with the old PA32 ABI.

And so the problem you're trying to solve is that when compiling the
callee. You incorrectly assumed that if there was not a prototype for
the callee's definition that the caller had set up the save area and
that you could flush arguments to it. That's not true in the case where
the caller had a prototype for the callee in-scope (and the cal
Re: RTL representation of i386 shrdl instruction is incorrect?
On Thu, 5 Jun 2014, Niranjan Hasabnis wrote:

> Thanks for your reply. I looked into some of the details of how that
> particular RTL template is used. It seems to me that the particular
> RTL template is used only when shifting a 64-bit data type on a 32-bit
> machine. This is the underlying assumption encoded in the i386.c file,
> which generates that particular RTL only when the instruction mode is DImode.
> If that is the case, then it won't matter whether one uses an arithmetic shift
> or a logical shift to right shift the lower 4 bytes of an 8-byte value.
> In other words, the mapping between the RTL template and shrdl is incorrect,
> but the underlying assumption in i386.c guards the bug.

This is still a bug, please file a PR. The use of (match_dup 0)
apparently prevents combine from matching the insn (that's just a guess
from my notes in PR 55583, I don't have access to my gcc machine right
now to check), but that doesn't mean we shouldn't fix things.

--
Marc Glisse
Re: Gimplify ICE in gnat.dg/array18.adb
> Can someone give me a hint to solve my problem ? I have no idea which part
> of my backend could be related to the GENERIC or GIMPLE generation and I'm
> very unfamiliar with this part of GCC.

Look at the patch installed in conjunction with gnat.dg/array18.adb.

--
Eric Botcazou
gcc-4.8-20140605 is now available
Snapshot gcc-4.8-20140605 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140605/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 211290

You'll find:

 gcc-4.8-20140605.tar.bz2             Complete GCC

  MD5=b1d39c37aac3269cc5ff2ed305139e19
  SHA1=14b69930b581493a0d836b5961b3f27d793e5e34

Diffs from 4.8-20140529 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: [RFC] PR61300 K&R incoming args
On Thu, Jun 05, 2014 at 01:19:19PM -0600, Jeff Law wrote:
> And so the problem you're trying to solve is that when compiling the
> callee. You incorrectly assumed that if there was not a prototype
> for the callee's definition that the caller had set up the save area
> and that you could flush arguments to it. That's not true in the
> case where the caller had a prototype for the callee in-scope (and
> the callee was not a varargs function).
>
> Right? Just want to make sure I understand the problem.

Exactly correct.

--
Alan Modra
Australia Development Lab, IBM
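For reference, the two callees from the fun1/fun2 example earlier in this thread can be written out as ordinary C. This is only a sketch: the return type is changed from void to double and the bodies are invented so that the functions do something observable. The register comments restate what the thread says about the ELFv2 ABI; nothing in the code checks them.

```c
#include <stdarg.h>

/* Fixed-argument callee. Per the thread, a and b arrive in r3 and r4
   and c in f1, and the body does not use the parameter save area,
   whether or not the caller allocated one. */
double fun1(int a, int b, double c)
{
    return a + b + c;
}

/* Variable-argument callee. Per the thread, the callee saves r4 thru
   r10 to the parameter save area and points its va_list at the second
   double-word, so the unnamed arguments are read from there. */
double fun2(int a, ...)
{
    va_list ap;
    va_start(ap, a);
    int b = va_arg(ap, int);        /* second argument */
    double c = va_arg(ap, double);  /* third argument */
    va_end(ap);
    return a + b + c;
}
```

Both `fun1 (1, 2, 3.0)` and `fun2 (1, 2, 3.0)` return 6.0 here. The difference discussed in the thread lies in where the callee expects to find its incoming arguments, which depends on the argument list of the definition and not on whether a prototype was in scope at the call site.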