Re: Compiling with profiling flags
On 10/17/06, Revital1 Eres <[EMAIL PROTECTED]> wrote: Hello, Is there an option to change the name of the .gcno file that is generated by using profiling flags like -fprofile-generate and later used by -fprofile-use? I read that "For each source file compiled with `-fprofile-arcs', an accompanying `.gcda' file will be placed in the object file directory." - Can I change it such that the .gcda file will be named as I wish? Thanks

As far as I know, there's no easy way to do that. However, by setting the environment variable GCOV_PREFIX during the execution of the instrumented program, you can redirect where the .gcda files are stored. Alas, I don't know any way to use such out-of-place .gcda files with -fprofile-use. I have changes that add an option to do exactly that, but lacking the copyright assignment, I can't send them as a patch. They also include some other profile-related enhancements mentioned in: http://gcc.gnu.org/wiki/ProfileFeedbackEnhancements

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: GCC optimizes integer overflow: bug or feature?
On 12/20/06, Dave Korn <[EMAIL PROTECTED]> wrote: ...

> We (in a major, commercial application) ran into exactly this issue.
> 'asm volatile("lock orl $0,(%%esp)"::)' is your friend when this happens
> (it is a barrier across which neither the compiler nor CPU will reorder
> things). Failing that, no-op cross-library calls (that can't be inlined)
> seem to do the trick.

This simply means you have failed to correctly declare a variable volatile that in fact /is/ likely to be spontaneously changed by a separate thread of execution.

The C or C++ standard doesn't define ANYTHING related to threads, and thus anything related to threads is beyond the standard. If you think volatile means something in an MT environment, think again. You can deduce certain aspects (e.g. the guaranteed appearance of a store or load), but nothing beyond that. Add a memory model to the mix, and you're way beyond what the language says, and you need to rely on non-standard, non-portable facilities, if provided at all. Even in a single-threaded environment, what exactly volatile means is not quite clear in the standard (except for setjmp/longjmp related aspects). I liked the following paper (for general users, not for compiler developers, mind you): http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote: ...

> the much more often reported problems are with
> -fstrict-aliasing, and this one also doesn't get any
> special treatment by autoconf.

That's a good point, and it somewhat counterbalances the opposing point that -O2 does not currently imply '-ffast-math'ish optimizations even though the C standard would allow it to.

Can you point me to the relevant section/paragraph in the C99 standard where it allows the implementation to do -ffast-math style optimization? C99 Annex F.8 quite clearly says the implementation can't, as long as it claims any conformity to IEC 60559.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote: "Seongbae Park" <[EMAIL PROTECTED]> writes:

> On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
>> -O2 does not currently imply '-ffast-math'ish optimizations even
>> though the C standard would allow it to.
>
> Can you point me to the relevant section/paragraph in the C99 standard
> where it allows the implementation to do -ffast-math style optimization?
> C99 Annex F.8 quite clearly says the implementation can't,
> as long as it claims any conformity to IEC 60559.

This is more of a pedantic standards question than a real-world programming question, but I'll answer anyway. C99 does not require implementations to conform to IEC 60559 as specified in C99 Annex F. It's optional. Similarly, C99 does not require implementations to conform to LIA-1 wrapping semantics as specified in C99 Annex H. That's optional, too. The cases are not entirely equivalent, as Annex F is normative but Annex H is informative. But as far as the standard is concerned, it's clear that gcc could enable many non-IEEE optimizations (including some of those enabled by -ffast-math); that would conform to the minimal standard.

Not when __STDC_IEC_559__ is defined, which is the case for most modern glibc+gcc combos.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 30 Dec 2006 03:20:11 +0100, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote: ... The C standard, in effect, has an appendix (Annex H) that was not there in the C89 edition, and that talks about the very specific issue at hand:

  H.2.2 Integer types
  [#1] The signed C integer types int, long int, long long int, and the
  corresponding unsigned types are compatible with LIA-1. If an
  implementation adds support for the LIA-1 exceptional values
  ``integer_overflow'' and ``undefined'', then those types are LIA-1
  conformant types. C's unsigned integer types are ``modulo'' in the
  LIA-1 sense in that overflows or out-of-bounds results silently wrap.
  An implementation that defines signed integer types as also being
  modulo need not detect integer overflow, in which case, only integer
  divide-by-zero need be detected.

which clearly says LIA-1 isn't a requirement - notice the "if" in the second sentence. H.1 makes it clear that the entire Annex H doesn't add any extra rule to the language but merely describes what C is in regard to LIA-1. H.2 doubly makes it clear that C as defined isn't LIA-1 - again, notice the "if" in H.2p1. The second sentence of H.3p1 confirms this again:

  C's operations are compatible with LIA-1 in that C allows an
  implementation to cause a notification to occur when any arithmetic
  operation returns an exceptional value as defined in LIA-1 clause 5.

i.e. "compatible" means C's definition doesn't prevent a LIA-1 conformant implementation. In other words, every LIA-1 conformant compiler is conformant to C99 in terms of arithmetic and types. However, not every C99 conformant compiler is LIA-1 conformant. C isn't conformant to LIA-1 but merely compatible, exactly because of the undefined aspect. That's enough playing language lawyer for me in a day.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On 12/31/06, Daniel Berlin <[EMAIL PROTECTED]> wrote: ...

> I added -fwrapv to the Dec30 run of SPEC at
> http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html
> and
> http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html

Note the distinct drop in performance across almost all the benchmarks on Dec 30, including popular programs like bzip2 and gzip.

Also, this is only on x86 - other targets that benefit more from software pipelining/modulo scheduling may suffer even more than x86, especially on the FP side.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: gcc, mplayer and profile (mcount)
On 03 Jan 2007 10:07:57 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: Adam Sulmicki <[EMAIL PROTECTED]> writes:

> In the spirit of making OSS better, I took the extra effort to report
> the findings back to both lists. In reward I got flamed on both lists.

You got flamed on the gcc list? I don't see any flames there. All I told you was to use the gcc-help mailing list, which was correct. We have different mailing lists for a reason. Sending e-mail to the wrong list simply wastes people's time. I also did my best to answer your question. In your original message you didn't mention that there was an mcount variable in your program, so it's not surprising that I wasn't able to identify the problem.

> * Rename mcount in mplayer/libmenu/menu.c to something else.
>
> * Document mcount in the gcc man page.
>
> * gcc prints a warning.
>
> * Do nothing.
>
> I can imagine that gcc developers might have intentionally left
> mcount() visible to user space so that users can replace gcc's mcount()
> with their own implementation.

gcc does not provide the function which in this case is called mcount(). That function comes from the C library. The C library used on GNU/Linux systems, glibc, actually provides the function _mcount, with a weak alias named mcount. So this seems to be a bug in gcc: it should be calling _mcount. In fact, by default, gcc for i386 targets will call _mcount. gcc for i386 GNU/Linux targets was changed to call mcount instead of _mcount with this patch:

Thu Mar 30 06:20:36 1995  H.J. Lu  ([EMAIL PROTECTED])

	* configure (i[345]86-*-linux*): Set xmake_file=x-linux,
	tm_file=i386/linux.h, and don't set extra_parts.
	(i[345]86-*-linux*aout*): New configuration.
	(i[345]86-*-linuxelf): Deleted.
	* config/linux{,-aout}.h, config/x-linux, config/xm-linux.h:
	New files.
	* config/i386/linux-aout.h: New file.
	* config/i386/linux.h: Extensive modifications to use ELF
	format as default.
	(LIB_SPEC): Don't use libc_p.a for -p. Don't use libg.a
	unless for -ggdb.
	(LINUX_DEFAULT_ELF): Defined.
	* config/i386/linuxelf.h, config/i386/x-linux: Files deleted.
	* config/i386/xm-linux.h: Just include xm-i386.h and xm-linux.h.

I believe that was during the time H.J. was maintaining a separate branch of glibc for GNU/Linux systems. Presumably his version provided mcount but not _mcount. I haven't tried to investigate further. In any case, clearly gcc for i386 GNU/Linux systems today should call _mcount rather than mcount. I will make that change.

I remember someone wanting to provide his own mcount(). Presumably, mcount() is weak in libc to cover such a use case?

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: RFC: Mark a section to be discarded for DSO and executable
On 09 Jan 2007 10:09:35 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote: "H. J. Lu" <[EMAIL PROTECTED]> writes:

> With LTO, an object file may contain sections with IL, which
> can be discarded when building DSO and executable. Currently we can't
> mark such sections with gABI. With GNU linker, we can use a
> linker script to discard such sections. But it will be more generic
> to make a section to be discarded for DSO and executable in gABI.
> In that case, we don't need a special linker script to discard
> such sections. Something like
>
> #define SHF_DISCARD 0x800
>
> would work.

That is not strictly required for LTO as I see it. With LTO, the lto program is going to read the .o files with the IL information. It will then generate a new .s file to pass to the assembler. The IL information will never go through the linker.

Not only is this not a requirement, there's a scenario where you *want* to keep the IL inside LTO-optimized DSOs and executables for re-optimization. With LTO and IL inside executables and DSOs, it becomes possible to do profile collection without any source code and re-optimize the binary. i.e. an executable can be released with IL inside, and the user of the binary can choose to do profile collection on their own input, and re-optimize the binary to squeeze out more performance. This scenario also improves the usability of profile-feedback directed optimization (by not requiring source code access and the whole build environment for profile feedback optimization).

Of course, it is also possible that LTO .o files with IL information will be passed directly to the linker, for whatever reason. In that case, we may want the linker to remove the IL information. This is not substantially different from the linker's current ability to strip debugging information on request. It should be optional for the linker to remove the sections containing IL.
We probably want a new option for strip to remove such sections, as well as a linker option to remove them. So if you want to propose a solution for that, I think you should consider how it can be used for debugging information as well. And I don't think SHF_DISCARD is a good name. We don't want to arbitrarily discard the data; we want to discard it in certain specific scenarios.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
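For reference, the GNU linker-script approach H.J. mentions might look like this fragment; the section-name pattern `.gnu.lto_*` is an assumption here about how IL sections would be named, not an established convention from the thread:

```ld
SECTIONS
{
  /* Drop candidate IL sections at final link; everything else is laid
     out as usual.  A sketch only - a real script would either extend
     the default script or be applied with INSERT. */
  /DISCARD/ : { *(.gnu.lto_*) }
}
```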
Re: Signed int overflow behaviour in the security context
On 1/26/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:

> > > Every leading C compiler has for years done things like this to boost
> > > performance on scientific codes.
> >
> > The Sun cc is a counter-example. And even then, authors of scientific
> > code usually do read the compiler manual, and will discover any
> > additional optimizer flags.
>
> Errr, actually, Seongbae, who worked for Sun on Sun CC until very
> recently, says otherwise, unless I'm mistaken. Seongbae, didn't you say
> that Sun's compiler uses the fact that signed overflow is undefined
> when performing optimizations?

Correct. Sun's SPARC backend even had a compiler flag that tells the optimizer to ignore the overflow case even for unsigned integers, which I believe was used in some of the SPEC CPU submissions. There's no equivalent flag for signed integers because that's the default, and we never had a problem with it - there were bugs filed against it, and when customers were told signed integer overflow is undefined in the standard, they went back and fixed their code.

I believe HP's compiler does something similar, although they also provide an option to override the default behavior of ignoring integer overflow at high optimization levels (see their documentation about the +Ointeger_overflow= flag). IBM xlc has a very similar approach, IIRC.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: 2007 GCC Developers Summit
On 1/29/07, Diego Novillo <[EMAIL PROTECTED]> wrote: Ben Elliston wrote on 01/28/07 17:45: > One idea I've always pondered is to have brief (perhaps 1-2 hr) > tutorials, given by people in their area of expertise, as a means for > other developers to come up to speed on a topic that interests them. Is > this something that appeals to others? > Sounds good to me. For instance, the new java front end, a description of the new build system, etc. Although it's not something new, I'd be interested in a tutorial on loop optimizations (IV opt and all related). Based on my understanding, the topic might be of interest to some other people as well. -- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Some thoughts and questions about the data flow infrastructure
On 2/13/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: ...

I am only afraid that a solution for a faster infrastructure (e.g. another, slimmer data representation) might change the interface considerably. I am not sure that I can convince you of this. But I am more worried about the 4.3 release and I really believe that inclusion of the data flow infrastructure should be the 1st step of stage 1, to give people more time to solve at least some problems.

Vlad, I'm really interested in hearing what aspect of the current interface is not right. Can you give us just a rough sketch of what a slimmer (hence better) data representation would look like, and why the current interface won't be ideal for it? It's not too late - if there's anything we can do to change the interface so that it can accommodate a potentially better implementation in the future, I won't object to it - it's just that I haven't talked to you to figure out what you have in mind.

Saying that, I hurt the feelings of some people who put a lot of effort into the infrastructure, like Danny, Ken, and Seongbae, and I am sorry for that.

No feelings hurt. Thanks for all your feedback. More eyes, especially experienced ones like yours, can only help. Also, thanks for trying out DF on Core2 - we need to look more closely at why those regressions are there and what we can do to fix them; before such an evaluation, I can't tell whether this is a serious fundamental problem or some easily fixable things that we haven't gotten to yet.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: bad edge created in CFG ?
On 3/14/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote: Hello, I have used -da and -dv to produce vcg files of the CFG of this simple program:

int main(int argc, char**argv)
{
    if(argc)
        printf("positive\n");
    else
        printf("zero\n");
    return 0;
}

I have expected to get a CFG as follows:

          ---------
          | BB 0  |
          ---------
          /       \
   ---------    ---------
   | BB 1  |    | BB 2  |
   ---------    ---------
          \       /
          ---------
          |  END  |
          ---------

But instead, I was surprised to get this CFG:

          ---------
          | BB 0  |
          ---------
          /       \
   ---------    ---------
   | BB 1  | >  | BB 2  |
   ---------    ---------
          \       /
          ---------
          |  END  |
          ---------

as if one case of the "if" can lead to the other! Can someone please explain to me why it is so? I am attaching the VCG representation, the VCG text file, and the original test program. Thank You sunzir

I don't know what kind of vcg viewer/converter you're using, but set it to ignore class 3 edges - you'll get what you expected.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: register reload problem in global register allocation
On 3/21/07, wonsubkim <[EMAIL PROTECTED]> wrote: I have some problems in the global register allocation phase. I have described a simple architecture using a machine description and target macro file. I use GNU GCC version 4.1.1. But a "can't combine" message is printed in the *.c.37.greg file during the global register allocation phase. After I traced the message, I found it comes from reload_as_needed. But I do not understand the reason for this problem, and I can't see the relationship between register reload and my description.

What is your REAL problem? I doubt "can't combine" is the real immediate symptom, as it's most likely just a dump of rld[].nocombine. Show us at least your error message and the rtl that you think is causing the problem.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Proposal: changing representation of memory references
On 4/4/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote: Hello, at the moment, any pass that needs to process memory references is complicated (or restricted to handling just a limited set of cases) by the need to interpret the quite complex representation of memory references that we have in gimple. For example, there are about 1000 lines of quite convoluted code in tree-data-ref.c and about 500 lines in tree-ssa-loop-ivopts.c dealing with parsing and analysing memory references. I would like to propose (and possibly also work on implementing) a more uniform representation, described in more detail below. The proposal is based on the previous discussions (http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned about the way memory references are represented in ICC. It also subsumes the patches of Daniel to make p[i] (where p is a pointer) use ARRAY_REFs/MEM_REFs. I am not sure whether someone is not already working on something similar, e.g. with respect to LTO (where the mentioned discussion started), or the gimple-tuples branch?

Proposal: For each memory reference, we remember the following information:
-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the structure of the access, for aliasing purposes)
-- flags

for each index, we remember
-- lower and upper bound
-- step
-- value of the index

The address of the reference can be computed as

  base + offset + sum_{idx} offsetof(idx)

where offsetof(idx) = (value - lower) * step.

For example, a.x[i][j].y would be represented as

  base = &a
  offset = offsetof (x) + offsetof (y)
  indices:
    lower = 0, upper = ?, step = sizeof (a.x[i]), value = i
    lower = 0, upper = ?, step = sizeof (a.x[j]), value = j

p->x would be represented as

  base = p; offset = offsetof (x); indices: empty

p[i] as

  base = p; offset = 0
  indices:
    lower = 0, upper = ?, step = sizeof (p[i]), value = i

Remarks:

-- it would be guaranteed that the indices of each memory reference are independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only if idx1 == idx1' and idx2 == idx2'; this is important for dependency analysis (and for this reason we also need to remember the list of indices, rather than trying to reconstruct them from the address).

I didn't completely think through this, but how would this property help in the presence of array flattening? e.g. suppose two adjacent loops, both two levels deep, and only one of them is flattened; then one loop will have one-dimensional accesses (hence only one index) while the other loop will have two-dimensional ones. In other words, &ref[idx1] == &ref[idx1'][idx2'] can happen. So most likely we'll need to be able to compare the linearized address form, rather than simply comparing vectors of indices. This is not to say I don't like your proposal (I actually do), but I just want to understand what properties you're expecting from the representation (and I think this has implications for the canonicalization of the references you mentioned below). Another question is: how would the representation look for more complicated address calculations, e.g. a closed hashtable access like table[hash_value % hash_size], and would it help in such cases?

-- it would be guaranteed that the computations of the address do not overflow.

-- possibly there could be a canonicalization pass that, given

  for (p = q; p < q + 100; p++)
    p->x = ...  {represented the way described above}

would transform it to

  for (p = q, i = 0; p < q + 100; p++, i++)
    {base = q, offset = offsetof(x), indices: lower = 0, upper = ?, step = sizeof (*p), value = i}

so that dependence analysis and friends do not have to distinguish between accesses through pointers and arrays.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: DF-branch benchmarking on SPEC2000
On 4/23/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote: ...

To improve the scores I'd recommend paying attention to the big degradations in SPEC score:
9% perlbmk degradation on Pentium4
3% fma3d degradation on Core2
3% eon and art degradation on Itanium
3% gap and wupwise degradation on PPC64.

Vlad, Thanks a LOT for the measurement and the summary. Thanks to your previous report, I've been looking at wupwise on PPC64, and have found at least one missed optimization opportunity. I'm testing the fix. Hopefully it will address some of the remaining regressions above. As for the perlbmk slowdown on P4, my initial guess is that it might be due to cross-jumping or block ordering - those are things where the dataflow branch generates slightly different code than mainline. I didn't try to narrow down where the difference comes from, as most of the differences seemed unimportant.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Backport fix for spurious anonymous ns warnings PR29365 to 4.2?
On 5/1/07, Andrew Pinski <[EMAIL PROTECTED]> wrote: On 01 May 2007 14:28:07 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

> I agree that it would be appropriate to backport the patch to gcc 4.2.

Let's first get the patch which fixes the ICE regression that this patch causes approved :). Which can be found at: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01746.html Thanks, Andrew Pinski

Thanks for the plug, Andrew. C++ maintainers, please consider this another ping for my patch :)

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Backport fix for spurious anonymous ns warnings PR29365 to 4.2?
On 5/2/07, Mark Mitchell <[EMAIL PROTECTED]> wrote: Seongbae Park wrote:

> On 5/1/07, Andrew Pinski <[EMAIL PROTECTED]> wrote:
>> On 01 May 2007 14:28:07 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>> > I agree that it would be appropriate to backport the patch to gcc 4.2.
>>
>> Let's first get the patch which fixes the ICE regression that this
>> patch causes approved :).
>>
>> Which can be found at:
>> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01746.html

This patch is OK for mainline.

Attached is the patch I committed just now.

As for backporting to 4.2, this isn't a regression, so the default answer would be "no". I'm unconvinced that this is a sufficiently serious problem to merit violating that policy; after all, we're only talking about a spurious warning. (However, the other bug here is that we don't have a warning option for this, so users can't use -Wno- to turn this off.) In any case, we're not going to do this for 4.2.0. As per the policy I posted recently on PRs, please find a C++ maintainer who wants to argue for backporting this and ask them to mark the PR as P3 with an argument as to why this is important to backport. Thanks,

My guess is that this will be a big enough nuisance to be worth backporting, but I agree that this isn't a regression - I won't bother the other C++ maintainers about this.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";

Index: gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h
===================================================================
--- gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h	(revision 0)
+++ gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h	(revision 0)
@@ -0,0 +1,3 @@
+namespace {
+  struct bad { };
+}
Index: gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C
===================================================================
--- gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C	(revision 0)
+++ gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C	(revision 0)
@@ -0,0 +1,29 @@
+// Test for the warning of exposing types from an anonymous namespace
+// { dg-do compile }
+//
+#include "anonymous-namespace-2.h"
+
+namespace {
+struct good { };
+}
+
+struct g1 {
+good * A;
+};
+struct b1 { // { dg-warning "uses the anonymous namespace" }
+bad * B;
+};
+
+struct g2 {
+good * A[1];
+};
+struct b2 { // { dg-warning "uses the anonymous namespace" }
+bad * B[1];
+};
+
+struct g3 {
+good (*A)[1];
+};
+struct b3 { // { dg-warning "uses the anonymous namespace" }
+bad (*B)[1];
+};
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 124362)
+++ gcc/cp/decl2.c	(working copy)
@@ -1856,7 +1856,7 @@ constrain_class_visibility (tree type)
   for (t = TYPE_FIELDS (type); t; t = TREE_CHAIN (t))
     if (TREE_CODE (t) == FIELD_DECL && TREE_TYPE (t) != error_mark_node)
       {
-	tree ftype = strip_array_types (TREE_TYPE (t));
+	tree ftype = strip_pointer_or_array_types (TREE_TYPE (t));
 	int subvis = type_visibility (ftype);
 
 	if (subvis == VISIBILITY_ANON)
Index: gcc/c-common.c
===================================================================
--- gcc/c-common.c	(revision 124362)
+++ gcc/c-common.c	(working copy)
@@ -3894,6 +3894,15 @@ strip_pointer_operator (tree t)
   return t;
 }
 
+/* Recursively remove pointer or array type from TYPE.  */
+tree
+strip_pointer_or_array_types (tree t)
+{
+  while (TREE_CODE (t) == ARRAY_TYPE || POINTER_TYPE_P (t))
+    t = TREE_TYPE (t);
+  return t;
+}
+
 /* Used to compare case labels.  K1 and K2 are actually tree nodes
    representing case labels, or NULL_TREE for a `default' label.
    Returns -1 if K1 is ordered before K2, -1 if K1 is ordered after
Index: gcc/c-common.h
===================================================================
--- gcc/c-common.h	(revision 124362)
+++ gcc/c-common.h	(working copy)
@@ -727,6 +727,7 @@ extern bool c_promoting_integer_type_p (
 extern int self_promoting_args_p (tree);
 extern tree strip_array_types (tree);
 extern tree strip_pointer_operator (tree);
+extern tree strip_pointer_or_array_types (tree);
 extern HOST_WIDE_INT c_common_to_target_charset (HOST_WIDE_INT);
 
 /* This is the basic parsing function.  */
Re: Optimize flag breaks code on many versions of gcc (not all)
On 6/18/06, Paolo Carlini <[EMAIL PROTECTED]> wrote: Zdenek Dvorak wrote:

>... I suspect there is something wrong with your
>code (possibly invoking some undefined behavior, using uninitialized
>variable, sensitivity to rounding errors, or something like that).

A data point apparently in favor of this suspicion is that the "problem" goes away if double is replaced everywhere with long double... Paolo.

As suspected by everybody, this is a bug in the code. From your original code:

110   coord[i]=start[i]+maxT[whichPlane]*dir[i];
111   // Uncomment one of these to make the program work.
112   //sleep(0);
113   //cerr << "";
114   if ((coord[i]ur[i])) {
115      // outside box
116      return false;
117   }

The compiler is allowed to calculate coord[i] at line 110 in 80-bit precision and use that in the comparison at line 114. One way to see this impact, without debugging at the assembly level, is to change the code to:

110   long double temp;
111   temp = start[i]+maxT[whichPlane]*dir[i];
112   // Uncomment one of these to make the program work.
113   //sleep(0);
114   //cerr << "";
115   if ((tempur[i])) {

If you try this code, you'll find that it always fails, and if you inspect temp and ur[i], you'll find that they are very close - within 1 ULP of double precision. One bandaid for this problem is to use -ffloat-store, but you'll suffer the performance penalty and it won't really fix the root cause - which is your code.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Optimize flag breaks code on many versions of gcc (not all)
On 6/19/06, Dave Korn <[EMAIL PROTECTED]> wrote: On 19 June 2006 00:04, Paolo Carlini wrote:

> Zdenek Dvorak wrote:
>
>> ... I suspect there is something wrong with your
>> code (possibly invoking some undefined behavior, using uninitialized
>> variable, sensitivity to rounding errors, or something like that).
>
> A data point apparently in favor of this suspicion is that the "problem"
> goes away if double is replaced everywhere with long double...
>
> Paolo.

Is this another case of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 then? cheers, DaveK

It is the same case. Fundamentally, this is not fixable by the compiler alone without a significant performance penalty. There are very few implementations [1] that are completely IEEE754 conformant, and making them so is often prohibitively expensive, hence it's not done, or at least not by default. So whenever you're writing floating-point code, you need to be aware of the caveats of the particular implementation. Besides x86's well-known extended precision issue, other processors have things like flush-denormal-inputs-to-zero, or a multiply-add instruction that is not equivalent to a separate multiply and add. I think it's not fair to expect gcc to somehow "fix" this whole mess alone. Of course, whenever there's a reasonable workaround for a particular issue, I'm sure gcc developers will try to accommodate it, but IMHO this one (bug 323) isn't such a case.

[1] by implementation, I mean the combination of: microprocessor, OS, compiler and runtime libraries.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Project RABLET
On 6/23/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote: ...

1 - One of the core themes in RABLE was very early selection of instructions from patterns. RTL patterns are initially chosen by the EXPAND pass. EXPAND tends to generate better rtl patterns when handed complex trees which it can process to get better combinations. When TREE-SSA was first implemented, we got very poor RTL because expand was seeing very small trees. TER (Temporary Expression Replacement) was invented, which mashed any single-def/single-use ssa_names together into more complex trees. This gave expand a better chance of selecting better instructions, and made a huge difference.

Have you considered using BURG/IBURG style tree pattern matching instruction selection? http://www.cs.princeton.edu/software/iburg/ That approach can certainly provide low-register-pressure, high-quality instruction selection.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: unable to detect exception model
On 6/25/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> So, something obviously wrong with
>
> struct max_alignment {
>   char c;
>   union {
>     HOST_WIDEST_INT i;
>     long double d;
>   } u;
> };
>
> /* The biggest alignment required. */
>
> #define MAX_ALIGNMENT (offsetof (struct max_alignment, u))
>
> for SPARC 32bit?

I don't think so, the ABI says 8 in both cases. Note that bootstrap doesn't fail on SPARC/Solaris 2.[56] and (presumably) SPARC/Linux, which have HOST_WIDE_INT == 32, whereas SPARC/Solaris 7+ have HOST_WIDE_INT == 64. All are 32-bit compilers. Bootstrap doesn't fail on SPARC64/Solaris 7+ either, for which the ABI says 16 for the alignment in both cases. They are 64-bit compilers.

The SPARC psABI 3.0 (the 32-bit version) defines long double as 8-byte aligned. SCD 4.2, the 64-bit version, defines long double as 16-byte aligned with a caveat (which essentially says long double can be 8-byte aligned in some cases - the Fortran common block case - but the compiler should assume 16-byte unless it can prove otherwise). On the 32-bit ABI, there's also a possibility of "double"s being only 4-byte aligned when a double is passed on the stack. I don't know enough about gcc's gc to know whether the above can trip it up, but memory allocation (malloc and the likes) shouldn't be a problem as long as it returns 8-byte aligned blocks on 32-bit and 16-byte aligned blocks on 64-bit.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: externs and thread local storage
On 7/1/06, Gary Funck <[EMAIL PROTECTED]> wrote:
... What are the technical reasons for the front-end enforcing this restriction, when apparently some linkers will handle the TLS linkage fine? If in fact it is required that __thread be added to the extern, is the compiler simply accommodating a limitation/bug in the linker?

Because the compiler has to generate different code for accesses to __thread vs. non-__thread variables:

# cat -n t.c
     1  extern int i1;
     2  extern __thread int i2;
     3
     4  int func1() { return i1; }
     5  int func2() { return i2; }
# gcc -O -S t.c
# cat t.s
        .file   "t.c"
        .text
.globl func1
        .type   func1,@function
func1:
        pushl   %ebp
        movl    %esp, %ebp
        movl    i1, %eax
        leave
        ret
.Lfe1:
        .size   func1,.Lfe1-func1
.globl func2
        .type   func2,@function
func2:
        pushl   %ebp
        movl    %esp, %ebp
        movl    [EMAIL PROTECTED], %eax
        movl    %gs:(%eax), %eax
        leave
        ret
.Lfe2:
        .size   func2,.Lfe2-func2
        .ident  "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"
#
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: externs and thread local storage
On 7/2/06, Gary Funck <[EMAIL PROTECTED]> wrote:
... Seongbae Park wrote:
> Because the compiler has to generate different code
> for accesses to __thread vs non __thread variable

In my view, this is implementation-defined, and generally can vary depending upon the underlying linker and OS technology. Further, there is at least one known platform (IA64) which seems to not impose this restriction. A few implementation techniques come to mind, where the programmer would not need to explicitly tag 'extern's with __thread:

That's the only platform I know of that doesn't require a different sequence. Should we make the language rules such that they're easy to implement on one platform but not on the others, or should we make them such that they're easy to implement on almost all platforms? Also, what is the benefit of allowing a mismatch between the declaration and the definition of a __thread vs. non-__thread variable? It only makes reading the code more difficult because it confuses everybody - you need to look at the definition as well as the declaration to know whether you're dealing with a thread-local variable or not, which is BAD.

...proposed scheme snipped...

The question to me is not whether it's doable, but whether it's worth doing - I see only downside and no upside to allowing the mismatch. If you're convinced that this is a really useful thing for a particular platform, why don't you create a new language extension flag that allows this, and make it the default on that platform?
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: externs and thread local storage
On 7/2/06, Gary Funck <[EMAIL PROTECTED]> wrote:
... More to the point, I think it is rather too bad that the extern has to have the __thread attribute at all, and would have hoped that the linker and OS could have collaborated to make this transparent, in the same way that data can be arranged in separately linked sections without impacting the way the external references are written. Thus, implementation is separated from interface.

TLS variables and non-TLS variables have totally different behaviors, and hence they are not interchangeable. Therefore, the TLS aspect should not be made "transparent" - i.e. the programmer has to treat a global variable differently depending on whether or not it is TLS, and getting a different one than you expected will cause subtle (well, actually not so subtle, since it's likely to blow up in your face) runtime problems.

I think the requirement to apply __thread to an extern should also be target specific.

As I said, you're welcome to implement a new option (either a runtime option or a compile-time configuration option) that will allow mixing TLS vs non-TLS. Whether or not it should be enabled for a particular platform should be a matter of discussion, and whether or not that patch will be accepted in the mainline will be yet another. Because of the reasons I gave above, I think it's a bad idea in general and I'll oppose it for any of the platforms I care about.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: externs and thread local storage
On 7/3/06, Gary Funck <[EMAIL PROTECTED]> wrote:
Seongbae Park wrote:
> As I said, you're welcome to implement a new option
> (either a runtime option or a compile time configuration option)
> that will allow mixing TLS vs non-TLS.

In a way, we've already done that -- in an experimental dialect of "C" called UPC. When compiled for pthreads, all file scope data not declared in a system header file is made thread local, and in fact all data referenced through externs is also made thread local. There is a new syntax (a "shared" qualifier) used by the programmer to identify objects shared across all threads. Sounds a little scary, but works amazingly well. Because the tagging of data as __thread local is done by the compiler transparently, I tend to think that we probably stress the TLS feature more than most.

In UPC, anything that's not TLS (or in UPC terms, "private") is marked explicitly as "shared". So it's NOT transparent in any sense of the word. See, you have two choices - either 1) make every global variable TLS by default and mark only the non-TLS ones (UPC), or 2) vice versa (C99). It is not sane to allow the TLS/non-TLS attribute to change underneath you - which is what you proposed.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: externs and thread local storage
On 7/3/06, Gary Funck <[EMAIL PROTECTED]> wrote:
... Operations on UPC's shared objects have different semantics than regular "C". Array indexing and pointer operations span threads. Thus A[i] is on one thread but A[i+1] will (by default) take you to the next (cyclic) thread. Since the semantics are different,

This is totally off-topic, but the layout in UPC doesn't really affect the semantics. The layout, or the distribution, is primarily there for performance, not for correctness. And it really doesn't behave any differently (except for the extra restrictions and checks put there to make programmers think about data distribution across threads). A[10] is A[10], or the element at index 10 of the array A, regardless of which thread has affinity to that element. That's actually the whole point of UPC - that the code is mostly semantically the same as regular C. But that's grossly off-topic, so please don't do any followup on this.

the programmer needs to know that -- it affects the API. TLS objects behave like regular "C" objects, at least from the perspective of the referencing thread.

They don't behave the same way from any perspective. A non-TLS variable can be changed underneath you by other threads, whereas a TLS variable cannot (except when a pointer to a TLS variable has escaped to other threads, which UPC prevents by extra casting restrictions on shared and private pointers). It has a serious impact on how you code around the variable - whether you need locking and/or some sort of atomic update or not, whether you can use it to communicate between threads or not, how the variable is initialized, etc. Or, since you seem familiar with UPC: in UPC,

int local_counter;
shared int global_counter;

is exactly semantically equivalent to:

int __thread local_counter;
int global_counter;

in C99 with the __thread extension. Your proposal is equivalent, in UPC, to allowing a global variable to be declared private and then later turned into a shared one in the definition, or vice versa.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: How to insert dynamic code? (continued)
On 7/13/06, jacob navia <[EMAIL PROTECTED]> wrote:
Daniel Jacobowitz wrote:
>On Thu, Jul 13, 2006 at 05:06:25PM +0200, jacob navia wrote:
>>>So, what happens when _Unwind_Find_registered_FDE is called? Does it
>>>find the EH data you have registered?
>>
>>Yes, but then it stops there instead of going upwards and finding the catch!
>>It is as if my insertion left the list of registered routines in a bad state.
>>
>>I will look again at this part (the registering part) and will try to
>>find out what is going on.
>
>It sounds to me more like it used your data, and then was left pointing
>somewhere garbage, not to the next frame. That is, it sounds like
>there's something wrong with your generated unwind tables. That's the
>usual cause for unexpected end of stack.

Yeah... My fault obviously, who else? The problem is, there is so much undocumented stuff that I do not see how I could avoid making a mistake here.

1) I generate exactly the same code now as gcc. Prolog:

pushq %rbp
movq %rsp,%rbp
subq xxx,%rsp

and I do not touch the stack any more. Nothing is pushed; the "xxx" already includes the stack space reserved for argument pushing, just as gcc does. This took me 3 weeks to do.

Now, I write my stuff as follows:
1) CIE
2) FDE for function 1 ... 1 FDE for each function
3) Empty FDE to zero-terminate the stuff.
4) Table of pointers to the CIE, then to the FDEs

p = result.FunctionTable; // Starting place, where CIE, then FDEs are written
p = WriteCIE(p); // Write first the CIE
pFI = DefinedFunctions;
nbOfFunctions = 0;
pFdeTable[nbOfFunctions++] = result.FunctionTable;
while (pFI) { // For each function, write the FDE
        fde_start = p;
        p = Write32(0,p); // reserve place for length field (4 bytes)
        p = Write32(p - result.FunctionTable,p); // Write offset to CIE
        symbolP = pFI->FunctionInfo.AssemblerSymbol;
        adr = (long long)symbolP->SymbolValue;
        adr += (unsigned long long)code_start; // code_start is the pointer to the JITted code
        p = Write64(adr,p);
        p = Write64(pFI->FunctionSize,p); // Write the length in bytes of the function
        *p++ = 0x41; // Write the opcodes
        *p++ = 0x0e; // These opcodes are the same as gcc writes
        *p++ = 0x10;
        *p++ = 0x86;
        *p++ = 0x02;
        *p++ = 0x43;
        *p++ = 0x0d;
        *p++ = 0x06;
        p = align8(p);
        Write32((p - fde_start)-4,fde_start); // Fix the length of the FDE
        pFdeTable[nbOfFunctions] = fde_start; // Save pointer to it in table
        nbOfFunctions++;
        pFI = pFI->Next; // loop
}

The WriteCIE function is this:

static unsigned char *WriteCIE(unsigned char *start)
{
        start = Write32(0x14,start);
        start = Write32(0,start);
        *start++ = 1; // version 1
        *start++ = 0; // no augmentation
        *start++ = 1;
        *start++ = 0x78;
        *start++ = 0x10;
        *start++ = 0xc;
        *start++ = 7;
        *start++ = 8;
        *start++ = 0x90;
        *start++ = 1;
        *start++ = 0;
        *start++ = 0;
        start = Write32(0,start);
        return start;
}

I hope this is OK... jacob

The above code looks incorrect, for various reasons, not the least of which is that you're assuming the CIE/FDE are fixed-length. There are various factors that affect the FDE/CIE depending on PIC/non-PIC, C or C++, 32-bit/64-bit, etc. - some of them must be invariant for your JIT, but some of them may not be. Also, some of the data are encoded as uleb128 (see the DWARF spec for the details of LEB128 encoding), which is a variable-length encoding whose length depends on the value.
In short, you'd better start looking at how the CIE/FDE structures are *logically* laid out - otherwise you won't be able to generate correct entries.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Followup: [g++] RFH: Is there a way to make gcc place global const class objects in .rodata ?
On 8/9/06, Haase Bjoern (PT-BEU/EMT) <[EMAIL PROTECTED]> wrote:
I realized just a bit too late that there is an open bug report for the issue: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=4131
> Bjoern Haase wrote
> Hello,
>
> Is there a way to place global const objects in .rodata, thus avoiding construction at program start?

So: Sorry for the noise on the list. According to PR4131, there is currently no solution for this issue. The ".rodata" placement would be highly desirable for my current projects. This means that I would be willing to try to implement the required enhancements myself. Since so far I only know about how the back-ends work, I'd appreciate a hint on where to start, which part of the code to look at, etc. Bjoern.

Of course, it's best if this can be implemented in the compiler, but if the size and the number of the read-only data are manageable, you can do this by hand - inspect what the layout of the class is by writing a test program and looking at how the fields are laid out, then replicate the same data in the assembly.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: gcc trunk vs python
On 8/27/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
... I have not received reports about bugs in the offending code when compiled with other compilers.

I know of at least one other compiler that does this, and at least one significant project (which is actually quite a bit larger than Python) that suffered a problem similar to the one Python is suffering - in short, this isn't new and this isn't gcc-specific (or gcc's fault). And Python isn't the first program to suffer from this either.

> code. But this particular optimization will let us do a much better
> job on very simple loops like
> for (int i = 0; i != n; i += 5)
> The issue with a loop like this is that if signed overflow is defined
> to be like unsigned overflow, then this loop could plausibly wrap
> around INT_MAX and then terminate. If signed overflow is undefined,
> then we do not have to worry about and correctly handle that case.

That seems to me rather obviously broken code unless the programmer has proven to himself that n is a multiple of 5. So why bother attempting to optimize it?

There is legitimate code of a similar form, like a pointer incremented by a stride and compared against an "end" pointer, which happens often in string-manipulation or buffer-management code. More importantly, some C++ STL iterator loops often end up in this form.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!
On 8/30/06, Mark Mitchell <[EMAIL PROTECTED]> wrote:
... I guess my overriding concern is that we're focusing heavily on the data format here (DWARF? Something else? Memory-mappable? What compression scheme?) and we may not have enough data. I guess we just have to pick something and run with it. I think we should try to keep that code as separate as possible so that we can recover easily if whatever we pick turns out to be (another) bad choice. :-)

At the risk of stating the obvious and also repeating myself, please allow me to give my thoughts on this issue. I think we should go even a step further than "try to keep the code as separate". We should try to come up with a set of procedural and data-structure interfaces for the input/output of the program structure, and try to *completely* separate the optimization/data-structure cleanup work from the encoding/decoding. Besides the basic requirement of being able to pass through enough information to produce a valid program, I think there is a critical requirement for implementing inter-module/inter-procedural optimization efficiently - that the I/O interface allow efficient iteration through module/procedure-level information without reading each and every module/procedure body (as Ken mentioned). There is a certain amount of per-object/per-procedure information that is accessed by different optimizations with sufficiently different patterns - e.g. the type tree is naturally object-level information that we may want to go through for each and every object file without reading all function bodies, and other function-level information such as caller/callee information would be useful without the function body.
We'll need to identify such information (in other words, the requirements of the interprocedural optimizations/analyses) so that the new interface provides ways to walk through it without loading entire function bodies - even with a large address space, if the data is scattered everywhere, it becomes extremely inefficient on modern machines to go through it. So it's actually more important to identify what logical information we want to access during the various interprocedural optimizations, and the I/O interface needs to handle that efficiently. This requirement should dictate how we encode and lay out the data on disk, before anything else, as well as how the information is presented to the actual inter-module optimizations/analyses. Also, part of defining the interface would involve restricting the existing structures (e.g. GIMPLE) to a possibly more limited form than what's currently allowed. By virtue of having an interface that separates the encoding/decoding from the rest of the compilation, we can throw away and recompute certain information (e.g. often the control flow graph can be recovered and hence does not need to be encoded), but those details can be worked out as the implementation of the I/O interface takes shape.
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Merging identical functions in GCC
On 15 Sep 2006 13:54:00 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
Laurent GUERBY <[EMAIL PROTECTED]> writes:
> On Fri, 2006-09-15 at 09:27 -0700, Ian Lance Taylor wrote:
> > I think Danny has a 75% implementation based on hashing the RTL for a
> > section and using that to select the COMDAT section signature. Or the
> > hashing can be done in the linker, but it still needs compiler help to
> > know when it is permissible.
>
> For code sections (I assume read-only), isn't the linker always able to
> merge identical ones? What can the compiler do better than the linker?

The linker can't merge just any identical code section, it can only merge code sections where that is permitted. For example, consider:

int foo() { return 0; }
int bar() { return 0; }
int quux(int (*pfn)()) { return pfn == &foo; }

Compile with -ffunction-sections. If the linker merges sections, that program will break. I think that in general it would be safe to merge any equivalent COMDAT code section, though.

Just like with most other compiler optimizations, the as-if rule would apply here as well. If the linker can fake the above, or prevent the optimization from happening only for cases like the above, the optimization can be safe (e.g. if there's no relocation against the function other than a direct call and the function symbol is "hidden" (as in linker scoping), the linker can know that the address of the function is not taken). However, I'd like to note that the debug information could become screwy under this optimization (as usual with any optimization of this sort) and could potentially make the debugger's life somewhat miserable (e.g. templates instantiated over different pointer types may end up having exactly the same code and thus be eligible for merging by the linker; then the debugger may not be able to tell what the real type of the pointer is, and you'll get "impossible" stack traces, etc.).
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: reducing stack size by breaking SPARC ABI for OS-less environments
On Mon, Dec 15, 2008 at 2:20 PM, David Meggy wrote:
> Hi, I'm working on a deeply embedded project where we have no operating
> system, and there is no window overflow trap handler. I'm really
> stretched for memory and I'd like to reduce the stack size. I haven't
> been able to find anyone else who is looking at reducing the stack
> usage by searching Google, but it seems like there is room to be saved,
> and I need the extra space.
>
> As I have no window overflow trap handler, the space reserved on the
> stack for saving in and local registers is just wasted memory. Is there
> any way I can reclaim this space by forcing the compiler to not honour
> the standard SPARC ABI?
>
> Would it be possible to take this one step further and drop the stack
> space for arguments? We don't have functions with large numbers of
> arguments, so all arguments should be passed in registers.
>
> Although it is only a single word, the one-word hidden parameter seems like a
> waste if none of our code ever uses it.
>
> David

With gcc 3.x, you can use -mflat, which does what you want. gcc 4.x completely dropped support for -mflat, so unless you (or someone) work on reintroducing it, IMHO you're out of luck with gcc 4.x.

Seongbae
Re: fbranch-probabilities bug
This is the intended behavior, though now I see that the documentation isn't very clear. You need to use -fprofile-use - the typical usage scenario is to compile with -fprofile-generate to build an executable for profile collection, and then compile with -fprofile-use to build optimized code using the profile data.

Seongbae

On Thu, Jan 8, 2009 at 6:30 AM, Hariharan Sandanagobalane wrote:
> Hi Seongbae,
> I was doing some work on profiling for picochip when I noticed what looks
> to me like a bug. It looks to me that using -fbranch-probabilities on the
> command line (after a round of -fprofile-generate or -fprofile-arcs) would just
> not work on any target. Reason...
>
> Coverage.c:1011
>
> if (flag_profile_use)
>   read_counts_file ();
>
> Should this not be
>
> if (flag_profile_use || flag_branch_probabilities) // Maybe more flags
>   read_counts_file ();
>
> ??
>
> Of course, I hit the problem later on: since the counts were not read, it
> just assumed that the .gcda file was not available, when it actually was.
>
> Thanks
> Hari
Re: fbranch-probabilities bug
On Thu, Jan 8, 2009 at 10:11 AM, Hariharan wrote:
> Hi Seongbae,
> Does that mean that someone can't use the profile just to annotate branches
> (and get better code by that), without having to take on the additional baggage
> of "unroll-loops", "peel-loops" etc?

You can do that by selectively turning optimizations off (e.g. -fprofile-use -fno-unroll-loops -fno-peel-loops).

> In my case, I am interested in not bloating the code size, but getting any
> performance that is to be had from profiling. Is that possible?
>
> Note: My profile generate phase was also just -fprofile-arcs since I am not
> interested in other kinds of profile.

Have you measured the impact on performance and code size of using the full -fprofile-generate/-fprofile-use? If yes, and you have seen any performance degradation or unnecessary code bloat from other optimizations, please file a bug. If not, then I'd say you probably want to try measuring it - in particular, value profiling has been becoming more and more useful. And in my experience, the majority of the code-size increase as well as the performance benefit with -fprofile-use comes from extra inlining (which -fprofile-arcs followed by -fbranch-probabilities also enables).

Seongbae
Re: Excess registers pushed - regs_ever_live not right way?
On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson <[EMAIL PROTECTED]> wrote:
> Register saves by the prolog (pushes) are typically made with reference to
> "df_regs_ever_live_p()" or "regs_ever_live".
>
> If my understanding is correct, these calls reflect register USEs and
> not register DEFs. So if a register is used in a function, but not
> otherwise changed, it will get pushed unnecessarily on the stack by the prolog.

This implies that the register is either a global register or a parameter register; in either case, it won't be saved/restored as a callee save. What kind of a register is it, and how come there's only a use of it in a function but it's not a global?

Seongbae
Re: Excess registers pushed - regs_ever_live not right way?
You can use DF_REG_DEF_COUNT() - if this is indeed a parameter register, there should be only one def (artificial def) or no def at all. Or if you want to see all defs for the reg, follow DF_REG_DEF_CHAIN(). Seongbae On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson <[EMAIL PROTECTED]> wrote: > Register contains parameter that is passed to function. This register > is not part of call used set. > > If this type of register were modified by function, then it would be > saved by function. > > If this register is not modified by function, it should not be saved. > This is true even if function is not a leaf function (as same register > would be preserved by deeper calls) > > > Andy > > > > > > Seongbae Park (박성배, 朴成培) wrote: > > On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson > > <[EMAIL PROTECTED]> wrote: > > > >> Register saves by prolog (pushes) are typically made with reference to > >> "df_regs_ever_live_p()" or "regs_ever_live. "|| > >> > >> If my understanding is correct, these calls reflect register USEs and > >> not register DEFs. So if register is used in a function, but not > >> otherwise changed, it will get pushed unnecessarily on stack by prolog. > >> > > > > This implies that the register is either a global register > > or a parameter register, in either case it won't be saved/restored > > as callee save. > > What kind of a register is it and how com there's only use of it in a > function > > but it's not a global ? > > > > Seongbae > > > > > -- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Excess registers pushed - regs_ever_live not right way?
2008/3/1 Andrew Hutchinson <[EMAIL PROTECTED]>:
> I am still struggling with a good solution that avoids unneeded saves
> of parameter registers.
>
> To solve the problem, all I need to know are the registers actually used for
> parameters. Since the caller assumes all of these are clobbered by the
> callee - they should never need to be saved.

I'm totally confused about what the problem is here. I thought you were seeing extra callee-save register save/restore in the prologue, but now it sounds like you're seeing extra caller-save register save/restore. Which one are you trying to solve, and what kind of target is this?

> DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter
> registers - not just the ones that are really used (since it uses the target's
> FUNCTION_ARG_REGNO_P to get parameter registers)

You said you wanted to know if there's a def of a register within a function. For an incoming parameter, there will be one artificial def, and if there's no other def, it means there's no real def of the register within the function.

> So the DF artificial defs are useless in trying to find real parameter
> registers.

I don't understand what you mean by this. What do you mean by "real parameter register"?

> That seems to require going over all DF chains to work out which
> registers are externally defined. DF does not solve the problem for me.

What do you mean by "externally defined"? DF may not solve the problem for you, but now I'm completely lost on what your problem is.

> There has got to be an easier way of finding the parameter registers used by a
> function.

If you want to find all the uses (use as in reading a register but not writing to it), you should look at the USE chain, not the DEF chain, naturally.

Seongbae
Re: Test Harness and SPARC VIS Instructions
On Thu, Mar 13, 2008 at 11:31 AM, Joel Sherrill <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Moving on to the SPARC, I see a lot of similar
> unsupported-instruction failures. I only
> see a single SPARC feature test. It is for
> "-multrasparc -mvis" and it is actually
> passing on the SPARC instruction simulator in
> gdb. It doesn't make me feel good that this
> part passes since I thought SIS was a vanilla
> V7 simulator. I think this test isn't tight enough:
>
> proc check_effective_target_ultrasparc_hw { } {
>     return [check_runtime ultrasparc_hw {
>       int main() { return 0; }
>     } "-mcpu=ultrasparc"]
> }
>
> For sure though, SIS does NOT support VIS and
> there is no test for that. This leads to this:
>
> Starting program:
> /home/joel/work-gnat/svn/b-gcc1-sparc/gcc/testsuite/gcc/pr18400.exe
> Unexpected trap (0x 2) at address 0x02001318
> illegal instruction
>
> Program exited normally.
> (gdb) disassemble 0x02001318
> Dump of assembler code for function check_vect:
> ...
> 0x02001318 : impdep1 62, %g0, %g0, %g0
> ...
>
> Can someone familiar with VIS provide an instruction
> that is OK to do a run-time test with to check if
> it is supported?

I don't think there's any user-level instruction/register to do that. You'll have to catch the illegal instruction trap :(

Seongbae
Re: Test Harness and SPARC VIS Instructions
2008/3/13 Joel Sherrill <[EMAIL PROTECTED]>: > > Seongbae Park (박성배, 朴成培) wrote: > > On Thu, Mar 13, 2008 at 11:31 AM, Joel Sherrill > > <[EMAIL PROTECTED]> wrote: > > > >> Hi, > >> > >> Moving on the SPARC, I see a lot of similar > >> unsupported instruction failures. I only > >> see a single sparc feature test. It is for > >> "-multrasparc -mvis" and it is actually > >> passing on the sparc instruction simulator in > >> gdb. It doesn't make me feel good that this > >> part passes since I thought SIS was a vanilla > >> V7 simulator. I think this test isn't tight enough: > >> > >> proc check_effective_target_ultrasparc_hw { } { > >> return [check_runtime ultrasparc_hw { > >> int main() { return 0; } > >> } "-mcpu=ultrasparc"] > >> } > >> > >> For sure though, SIS does NOT support VIS and > >> there is no test for that. This leads to this: > >> > >> Starting program: > >> /home/joel/work-gnat/svn/b-gcc1-sparc/gcc/testsuite/gcc/pr18400.exe > >> Unexpected trap (0x 2) at address 0x02001318 > >> illegal instruction > >> > >> Program exited normally. > >> (gdb) disassemble 0x02001318 > >> Dump of assembler code for function check_vect: > >> ... > >> 0x02001318 : impdep1 62, %g0, %g0, %g0 > >> ... > >> > >> Can someone familiar with VIS provide an instruction > >> that is OK to do a run-time test with to check if > >> it is supported? > >> > > > > I don't think there's any user level instruction/register to do that. > > You'll have to catch the illegal instruction trap :( > > > > > I have learned a lot the past few days so let me see if I can explain > what I have learned. :-D > > Depending upon the test and target architecture, there are various > mechanisms to prevent the execution of code which is clearly not > supported on a particular target board/cpu (as opposed to the compiler > target). > > + many architectures check that a multilib flag is supported > + Some do a run-time check. 
> x86 uses cpuid and a feature check
> to avoid things at run-time
> + Some run a test and let it die on the target board. This is used
> by Altivec and ARM Neon.
>
> The last alternative is what will have to be done here. I just
> need the single easiest VIS instruction to force the failure.

I see. What I meant was that the second bullet item above doesn't exist for SPARC. But you can use, e.g., the "fzero" instruction (which is VIS 1.0) for the third bullet, assuming the target board will correctly trigger an illtrap.

Seongbae
Re: Test Harness and SPARC VIS Instructions
On Thu, Mar 13, 2008 at 12:32 PM, Joel Sherrill <[EMAIL PROTECTED]> wrote:
> Uros Bizjak wrote:
> > Hello!
> >
> >> Can someone familiar with VIS provide an instruction
> >> that is OK to do a run-time test with to check if
> >> it is supported?
> >
> > Perhaps this fragment from testsuite/gcc.dg/vect/tree-vect.h may help:
> >
> > #elif defined(__sparc__)
> > asm volatile (".word\t0x81b007c0");
>
> Thanks. That helped a lot. Now I only see these cases on vect.exp
>
> ==
>
> This one looks like another test slipping another unsupported
> instruction by.
>
> 0x020012b8 : bne,pn %icc, 0x200138c
>
> Is this UltraSPARC and not V7? Do we need two bad instructions
> in the test case?

Branch with prediction is v9; even v8 doesn't have it. No v8 processor has VIS, so this test is actually sufficient. If you really want a v7 check (not a v8 check), then you should use something else, like umul, which is only available on v8 and up.

Seongbae
Re: Miscompilations for cris-elf on dataflow-branch
Thanks for the detailed instructions on how to reproduce it - I have successfully reproduced the problem, and narrowed it down to combine, which is deleting the insn in question. Hopefully I'll be able to figure out what's wrong soon.

Seongbae

On 6/10/07, Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:
I hear dataflow-branch is near merge to trunk, so I thought it'd be about time to verify that it works for the targets I maintain... Comparing dataflow-branch with trunk, both r125590, I see these regressions (alas, no improvements) on the branch for cris-elf, cross from x86_64-unknown-linux-gnu (Debian etch, I think):

gcc.sum gcc.c-torture/execute/20020201-1.c
gcc.sum gcc.c-torture/execute/20041011-1.c
gcc.sum gcc.c-torture/execute/920501-8.c
gcc.sum gcc.c-torture/execute/920726-1.c
gcc.sum gcc.c-torture/execute/ashldi-1.c
gcc.sum gcc.c-torture/execute/ashrdi-1.c
gcc.sum gcc.c-torture/execute/builtin-bitops-1.c
gcc.sum gcc.c-torture/execute/builtins/pr23484-chk.c
gcc.sum gcc.c-torture/execute/builtins/snprintf-chk.c
gcc.sum gcc.c-torture/execute/builtins/sprintf-chk.c

Though repeatable by anyone (see for example <http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01571.html>), all are unfortunately execution failures, so I thought it best to do some preliminary analysis. Looking at the source code for what the tests have in common, it seems all either use sprintf "%d" or a DImode shift operation (requiring a library call). My money is on all being the same one bug.
Here's a cut-down test-case, derived from gcc.c-torture/execute/builtin-bitops-1.c:

--
static int
my_popcountll (unsigned long long x)
{
  int i;
  int count = 0;
  for (i = 0; i < 8 * sizeof (unsigned long long); i++)
    if (x & ((unsigned long long) 1 << i))
      count++;
  return count;
}

extern void abort (void);
extern void exit (int);

int
main (void)
{
  int i;

  if (64 != my_popcountll (0xffffffffffffffffULL))
    abort ();

  exit (0);
}
--

Here's the assembly diff to trunk, revisions as per above, option -Os as mentioned below:

--- lshr1.s.trunk	2007-06-11 03:49:21.0 +0200
+++ lshr1.s.df	2007-06-11 03:49:59.0 +0200
@@ -15,7 +15,6 @@ _main:
 	move.d ___lshrdi3,$r2
 	moveq -1,$r10
 .L7:
-	move.d $r10,$r11
 	move.d $r0,$r12
 	jsr $r2
 	btstq (1-1),$r10

To repeat this without building a complete toolchain, have a gcc svn checkout with those darned mpfr and gmp available somewhere (added in that checkout or installed on the host system), then do, in an empty directory:

/path/to/gcctop/configure --target=cris-elf --enable-languages=c && make all-gcc

This will give you a cc1, which you know how to handle. :)

To repeat with the program above named lshr1.c, just use:

./cc1 -Os lshr1.c

The lost insn, numbered 61 in both trees, loads the high part of that all-bits operand to the register in which that part of the parameter is passed to the DImode right-shift library function __lshrdi3. From the dump-file, it seems the first pass it is lost in is combine.

Let me know if I can be of help.

brgds, H-P
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Miscompilations for cris-elf on dataflow-branch
This little patch:

diff -r 9e2b1e62931a gcc/combine.c
--- a/gcc/combine.c	Wed Jun 06 23:08:38 2007 +0000
+++ b/gcc/combine.c	Mon Jun 11 05:39:25 2007 +0000
@@ -4237,7 +4237,7 @@ subst (rtx x, rtx from, rtx to, int in_d
 	 So force this insn not to match in this (rare) case.  */
       if (! in_dest && code == REG && REG_P (from)
-	  && REGNO (x) == REGNO (from))
+	  && reg_overlap_mentioned_p (x, from))
 	return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);

       /* If this is an object, we are done unless it is a MEM or LO_SUM, both

should fix the problem (thanks to Ian Lance Taylor and Andrew Pinski for helping me debug the problem on IRC). I've started the bootstrap/regtest on x86-64. I'd appreciate it if you can test this on cris. Although the change is approved by Ian already, I think I'll hold the patch till the dataflow merge happens.

Thanks,

Seongbae

On 6/10/07, Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]> wrote:
> Thanks for the detailed instructions on how to reproduce it - I have
> successfully reproduced the problem, and narrowed it down to combine
> deleting the insn in question. Hopefully I'll be able to figure out
> what's wrong soon.
>
> Seongbae
>
> On 6/10/07, Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:
> > I hear dataflow-branch is near merge to trunk, so I thought it'd
> > be about time to verify that it works for the targets I
> > maintain...
> > Comparing dataflow-branch with trunk, both r125590, I see these > regressions (alas no improvements) on the branch for cris-elf > cross from x86_64-unknown-linux-gnu (Debian etch, I think): > > gcc.sum gcc.c-torture/execute/20020201-1.c > gcc.sum gcc.c-torture/execute/20041011-1.c > gcc.sum gcc.c-torture/execute/920501-8.c > gcc.sum gcc.c-torture/execute/920726-1.c > gcc.sum gcc.c-torture/execute/ashldi-1.c > gcc.sum gcc.c-torture/execute/ashrdi-1.c > gcc.sum gcc.c-torture/execute/builtin-bitops-1.c > gcc.sum gcc.c-torture/execute/builtins/pr23484-chk.c > gcc.sum gcc.c-torture/execute/builtins/snprintf-chk.c > gcc.sum gcc.c-torture/execute/builtins/sprintf-chk.c > > Though repeatable by anyone (see for example > <http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01571.html>), all > are unfortunately execution failures, so I thought best to do > some preliminary analysis. > > Looking at the source code for what the tests have in common, it > seems all either use sprintf "%d" or a DImode shift operation > (requiring a library call). My money is on all being the same > one bug. 
> Here's a cut-down test-case, derived from
> gcc.c-torture/execute/builtin-bitops-1.c:
>
> --
> static int
> my_popcountll (unsigned long long x)
> {
>   int i;
>   int count = 0;
>   for (i = 0; i < 8 * sizeof (unsigned long long); i++)
>     if (x & ((unsigned long long) 1 << i))
>       count++;
>   return count;
> }
>
> extern void abort (void);
> extern void exit (int);
>
> int
> main (void)
> {
>   int i;
>
>   if (64 != my_popcountll (0xffffffffffffffffULL))
>     abort ();
>
>   exit (0);
> }
> --
>
> Here's the assembly diff to trunk, revisions as per above,
> option -Os as mentioned below:
>
> --- lshr1.s.trunk	2007-06-11 03:49:21.0 +0200
> +++ lshr1.s.df	2007-06-11 03:49:59.0 +0200
> @@ -15,7 +15,6 @@ _main:
> 	move.d ___lshrdi3,$r2
> 	moveq -1,$r10
> .L7:
> -	move.d $r10,$r11
> 	move.d $r0,$r12
> 	jsr $r2
> 	btstq (1-1),$r10
>
> To repeat this without building a complete toolchain, have a gcc
> svn checkout with those darned mpfr and gmp available somewhere
> (added in that checkout or installed on the host system), then
> do, in an empty directory:
> /path/to/gcctop/configure --target=cris-elf --enable-languages=c && make all-gcc
> This will give you a cc1, which you know how to handle. :)
>
> To repeat with the program above named lshr1.c, just use:
>
> ./cc1 -Os lshr1.c
>
> The lost insn, numbered 61 in both trees, loads the high part of
> that all-bits operand to the register in which that part of the
> parameter is passed to the DImode right-shift library function
> __lshrdi3. From the dump-file, it seems the first pass it is
> lost in is combine.
>
> Let me know if I can be of help.
>
> brgds, H-P
> -- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Some regressions from the dataflow merge
On 6/12/07, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On Tue, 12 Jun 2007, Richard Guenther wrote:
> > On ia64 SPEC2000 I see fma3d and applu now miscompare.
>
> On x86_64 186.wupwise ICEs with -O2 -ffast-math and FDO:
>
> /gcc/spec/sb-haydn-fdo-64/x86_64/install-200706120559/bin/gfortran -c -o zscal.o -fprofile-use -O2 -ffast-math zscal.f
> Error from fdo_make_pass2 'specmake -j2 FDO=PASS2 build 2> fdo_make_pass2.err | tee fdo_make_pass2.out':
> zgemm.f: In function 'zgemm':
> zgemm.f:413: internal compiler error: in remove_insn, at emit-rtl.c:3597
> Please submit a full bug report, with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
>
> Likewise for 176.gcc:
>
> combine.c: In function 'simplify_comparison':
> combine.c:9697: internal compiler error: in remove_insn, at emit-rtl.c:3597
> Please submit a full bug report, with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> specmake: *** [combine.o] Error 1

Sounds like there's a pass that is emitting a barrier during cfglayout mode. I'll look at them.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Some regressions from the dataflow merge
On 6/14/07, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On Wed, 13 Jun 2007, Kenneth Zadeck wrote:
> > Richard Guenther wrote:
> > > On Tue, 12 Jun 2007, Richard Guenther wrote:
> > > > On ia64 SPEC2000 I see fma3d and applu now miscompare.
> > >
> > > On x86_64 186.wupwise ICEs with -O2 -ffast-math and FDO:
> > >
> > > /gcc/spec/sb-haydn-fdo-64/x86_64/install-200706120559/bin/gfortran -c -o zscal.o -fprofile-use -O2 -ffast-math zscal.f
> > > Error from fdo_make_pass2 'specmake -j2 FDO=PASS2 build 2> fdo_make_pass2.err | tee fdo_make_pass2.out':
> > > zgemm.f: In function 'zgemm':
> > > zgemm.f:413: internal compiler error: in remove_insn, at emit-rtl.c:3597
> > > Please submit a full bug report, with preprocessed source if appropriate.
> > > See <http://gcc.gnu.org/bugs.html> for instructions.
> > >
> > > Likewise for 176.gcc:
> > >
> > > combine.c: In function 'simplify_comparison':
> > > combine.c:9697: internal compiler error: in remove_insn, at emit-rtl.c:3597
> > > Please submit a full bug report, with preprocessed source if appropriate.
> > > See <http://gcc.gnu.org/bugs.html> for instructions.
> > > specmake: *** [combine.o] Error 1
> > >
> > > Richard.
> >
> > Richard,
> >
> > did these two regressions go away? We committed a change to gcse that
> > should have fixed this bug.
>
> These two are gone now. The ia64 miscompares are still there.

I'm looking at it. It is either a scheduler problem, or some other problem downstream triggered by the scheduler. However, I'm having a hard time adding fine-grained dbg_cnt to our scheduler - do you know who might be interested in adding insn-level dbg_cnt in the scheduler? The current dbg_cnt (sched_insn) causes an ICE :(

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Fixing m68hc11 reorg after dataflow merge
On 6/16/07, Rask Ingemann Lambertsen <[EMAIL PROTECTED]> wrote:

I need some help making m68hc11_reorg() work after the dataflow merge, in particular this bit:

  /* Re-create the REG_DEAD notes.  These notes are used in the machine
     description to use the best assembly directives.  */
  if (optimize)
    {
      /* Before recomputing the REG_DEAD notes, remove all of them.
         This is necessary because the reload_cse_regs() pass can
         have replaced some (MEM) with a register.  In that case,
         the REG_DEAD that could exist for that register may become
         wrong.  */
      for (insn = first; insn; insn = NEXT_INSN (insn))
        {
          if (INSN_P (insn))
            {
              rtx *pnote;

              pnote = &REG_NOTES (insn);
              while (*pnote != 0)
                {
                  if (REG_NOTE_KIND (*pnote) == REG_DEAD)
                    *pnote = XEXP (*pnote, 1);
                  else
                    pnote = &XEXP (*pnote, 1);
                }
            }
        }

      life_analysis (PROP_REG_INFO | PROP_DEATH_NOTES);
    }

-- Rask Ingemann Lambertsen

Try:

  df_note_add_problem ();
  df_analyze ();

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Fwd: INCOMING_RETURN_ADDR_RTX in ia64.h
Forwarding to gcc@, as this might be interesting to other people, and I'd like to ask whoever is working on ia64 to take this issue up.

Seongbae

-- Forwarded message --
From: Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]>
Date: Jun 18, 2007 12:44 AM
Subject: INCOMING_RETURN_ADDR_RTX in ia64.h
To: [EMAIL PROTECTED]
Cc: Kenneth Zadeck <[EMAIL PROTECTED]>, "Berlin, Daniel" <[EMAIL PROTECTED]>, Ian Lance Taylor <[EMAIL PROTECTED]>, Paolo Bonzini <[EMAIL PROTECTED]>, Richard Guenther <[EMAIL PROTECTED]>

Hi Jim,

This is an analysis of a correctness bug on ia64 whose root cause has to do with your code. After the dataflow merge, Richard Guenther reported two runtime failures in SPEC CPU2000 programs on ia64. It turned out to be related to the following code you wrote (or at least committed, according to svn). From ia64.h:

 928 /* A C expression whose value is RTL representing the location of the incoming
 929    return address at the beginning of any function, before the prologue.  This
 930    RTL is either a `REG', indicating that the return value is saved in `REG',
 931    or a `MEM' representing a location in the stack.  This enables DWARF2
 932    unwind info for C++ EH.  */
 933 #define INCOMING_RETURN_ADDR_RTX  gen_rtx_REG (VOIDmode, BR_REG (0))
 934
 935 /* ??? This is not defined because of three problems.
 936    1) dwarf2out.c assumes that DWARF_FRAME_RETURN_COLUMN fits in one byte.
 937    The default value is FIRST_PSEUDO_REGISTER which doesn't.  This can be
 938    worked around by setting PC_REGNUM to FR_REG (0) which is an otherwise
 939    unused register number.
 940    2) dwarf2out_frame_debug core dumps while processing prologue insns.  We
 941    need to refine which insns have RTX_FRAME_RELATED_P set and which don't.
 942    3) It isn't possible to turn off EH frame info by defining DWARF2_UNWIND_INFO
 943    to zero, despite what the documentation implies, because it is tested in
 944    a few places with #ifdef instead of #if.  */
 945 #undef INCOMING_RETURN_ADDR_RTX

Here, because INCOMING_RETURN_ADDR_RTX is ultimately undef'ed, dataflow doesn't see any definition of the return address register, and happily treats b0 as not live throughout the function body. Then the global allocator, guided by this information, allocates b0 for something else - leading to the return address corruption. Removing the undef leads to an ICE in dwarf2out.c (probably due to problem 2 in your comment?). Certainly, from the new dataflow point of view, we need this macro to be defined, because otherwise the use of b0 in the return is considered a use without a def. Previously, flow() didn't consider uninitialized registers, and just having a use of b0 in the return was sufficient, as it made b0 live across the entire function thanks to flow's backward-only liveness analysis.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: virtual stack regs.
On 6/19/07, Rask Ingemann Lambertsen <[EMAIL PROTECTED]> wrote:
.. Hmm, how do you handle arg_pointer_rtx, frame_pointer_rtx and the like? They are all uninitialized until the prologue is emitted, which is some time after reload.

ARG_POINTER_REGNUM is included in the artificial defs of all blocks (which I think is overly conservative - just having it in the entry block defs should be enough). Hence, from the dataflow point of view, they are always considered initialized. I think we should probably do something similar for VIRTUAL_STACK_*_REGNUM.

> 5) How can I tell if a reg is a virtual_stack_reg?

FIRST_VIRTUAL_REGISTER <= regno <= LAST_VIRTUAL_REGISTER

-- Rask Ingemann Lambertsen
-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: class 3 edges
On 6/19/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

hello, when I compile with -dv -fdump-rtl-* I sometimes see in the VCG files some edges that have no meaning in the flow of the program. These edges are always green and class 3. What are those edges? What is their purpose?

thank you
sunzir

See gcc/graph.c:print_rtl_graph_with_bb().

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Help on testsuite failures, only with optimization & bootstrap
On 6/26/07, Steve Ellcey <[EMAIL PROTECTED]> wrote:

After the dataflow merge (and after doing a couple of other patches that were needed just to bootstrap GCC on IA64 HP-UX), I am still getting some failures in the GCC testsuite and am hoping for some advice / help on figuring out what is going on. A bunch of tests like the following one:

void __attribute__ ((__noinline__)) foo (int xf) {}
int main() { foo (1); return 0; }

are failing because they generate the following warning:

$ obj_gcc/gcc/cc1 -quiet -O2 x.c
x.c:2: warning: inline function 'foo' given attribute noinline

Now, the problem here is, of course, that foo was not declared to be inline and the warning should not be happening. If I recompile the GCC file c-decl.c with -O2 -fno-tree-dominator-opts (instead of just -O2) then the resulting GCC will not generate the warning message. But so far I haven't been able to look at the tree dump files and see where things are going wrong. It doesn't help that the dominator pass gets run 3 times and I am not sure which one is causing the problem. Unfortunately, I am only seeing this on IA64 HP-UX. It does not happen on IA64 Linux. Does anyone have any advice / ideas / recommendations on how to debug this problem?

Steve Ellcey
[EMAIL PROTECTED]

If you want to find out exactly which invocation of the dominator pass is causing the problem, I recommend adding a new dbg_cnt, something like (untested):

diff -r d856dc0baad4 gcc/dbgcnt.def
--- a/gcc/dbgcnt.def	Wed Jun 27 01:21:13 2007 +0000
+++ b/gcc/dbgcnt.def	Tue Jun 26 21:17:55 2007 -0700
@@ -84,3 +84,5 @@ DEBUG_COUNTER (tail_call)
 DEBUG_COUNTER (tail_call)
 DEBUG_COUNTER (global_alloc_at_func)
 DEBUG_COUNTER (global_alloc_at_reg)
+DEBUG_COUNTER (uncprop_at_func)
+DEBUG_COUNTER (dominator_at_func)
diff -r d856dc0baad4 gcc/tree-ssa-dom.c
--- a/gcc/tree-ssa-dom.c	Wed Jun 27 01:21:13 2007 +0000
+++ b/gcc/tree-ssa-dom.c	Tue Jun 26 21:18:26 2007 -0700
@@ -44,6 +44,7 @@ Boston, MA 02110-1301, USA.  */
 #include "tree-ssa-propagate.h"
 #include "langhooks.h"
 #include "params.h"
+#include "dbgcnt.h"

 /* This file implements optimizations on the dominator tree.  */

@@ -365,7 +366,7 @@ static bool
 static bool
 gate_dominator (void)
 {
-  return flag_tree_dom != 0;
+  return flag_tree_dom != 0 && dbg_cnt (dominator_at_func);
 }

 struct tree_opt_pass pass_dominator =
diff -r d856dc0baad4 gcc/tree-ssa-uncprop.c
--- a/gcc/tree-ssa-uncprop.c	Wed Jun 27 01:21:13 2007 +0000
+++ b/gcc/tree-ssa-uncprop.c	Tue Jun 26 21:18:35 2007 -0700
@@ -40,6 +40,7 @@ Boston, MA 02110-1301, USA.  */
 #include "tree-pass.h"
 #include "tree-ssa-propagate.h"
 #include "langhooks.h"
+#include "dbgcnt.h"

 /* The basic structure describing an equivalency created by traversing
    an edge.  Traversing the edge effectively means that we can assume
@@ -604,7 +605,7 @@ static bool
 static bool
 gate_uncprop (void)
 {
-  return flag_tree_dom != 0;
+  return flag_tree_dom != 0 && dbg_cnt (uncprop_at_func);
 }

 struct tree_opt_pass pass_uncprop =

This will at least allow you to fairly quickly find which invocation of the pass is causing the problem, by doing a binary search on "n", adding the following extra flag to the compilation line:

-fdbg-cnt=uncprop_at_func:n or -fdbg-cnt=dominator_at_func:n

Of course, once you've narrowed it down to that level, you'll most likely still need to narrow it down further, but you'll have a better chance (you may want to add another, more fine-grained dbgcnt for that).

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: -dv -fdump-rtl-all question
On 7/18/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

Hi list,

Is it ok to assume that when I compile a C file (that is guaranteed to have some code in it) under the following flags, I always get the mentioned VCG file (and do not get a bigger one)?

Flags                       Maximum VCG file that is always created
===================================================================
"    -dv -fdump-rtl-all"    .49.stack.vcg
"-O1 -dv -fdump-rtl-all"    .49.stack.vcg
"-O2 -dv -fdump-rtl-all"    .50.compgotos.vcg
"-O3 -dv -fdump-rtl-all"    .50.compgotos.vcg

So basically I want to assume the maximum vcg file that is created is a function of the optimizations and not a function of the source file.

Why? There's no guarantee, although on any particular version of gcc on a given platform with given options, it usually is.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: RFC: Rename Non-Autopoiesis maintainers category
On 7/30/07, Diego Novillo <[EMAIL PROTECTED]> wrote:
> On 7/27/07 9:58 AM, Zdenek Dvorak wrote:
> > Hello,
> >
> >> I liked the idea of 'Reviewers' more than any of the other options.
> >> I would like to go with this patch, unless we find a much better
> >> option?
> >
> > to cancel this category of maintainers completely?
>
> An interesting idea, but let's discuss that issue separately. In this
> thread I'm only interested in changing the name of this category. Not
> discuss whether the category should exist at all.
>
> Since I have not heard any strong opposition to changing the category
> name to 'Reviewers', I will go ahead with this patch later this week.
>
> Index: MAINTAINERS
> ===
> --- MAINTAINERS (revision 126951)
> +++ MAINTAINERS (working copy)
> @@ -231,7 +231,7 @@
>  maintainers need approval to check in algorithmic changes or changes
>  outside of the parts of the compiler they maintain.
>
> -	Non-Autopoiesis Maintainers
> +	Reviewers
>
>  dataflow	Daniel Berlin	[EMAIL PROTECTED]
>  dataflow	Paolo Bonzini	[EMAIL PROTECTED]
> @@ -251,10 +251,9 @@
>  Fortran	Paul Thomas	[EMAIL PROTECTED]
>
>
> -Note that individuals who maintain parts of the compiler as
> -non-autopoiesis maintainers need approval changes outside of the parts
> -of the compiler they maintain and also need approval for their own
> -patches.
> +Note that individuals who maintain parts of the compiler as reviewers
> +need approval for changes outside of the parts of the compiler they
> +maintain and also need approval for their own patches.

Now that the name has been changed to reviewer, I think the following wording is slightly better:

While reviewers can approve the changes in the parts of the compiler they maintain, they still need approval of their own patches from other maintainers or reviewers.

> Write After Approval (last name alphabetical order)

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: i seem to have hit a problem with my new conflict finder.
On 8/17/07, Kenneth Zadeck <[EMAIL PROTECTED]> wrote: ...
> we should talk. I am available today. I am leaving on vacation tomorrow
> for a week.

Please send me the patch before you leave (and please leave valinor turned on) - I'll give it a look while you're gone.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: question about rtl loop-iv analysis
On 8/28/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote: ... > that obviously is not the case here, though. Do you (or someone else > responsible for df) have time to have a look at this problem? > Otherwise, we may discuss it forever, but we will not get anywhere. > > Zdenek Open a PR and assign it to me, if you're not in a hurry - I should be able to look at it next week. -- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Profile information - CFG
On 9/27/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote:
> Hello,
> I am implementing support for PBO on the picochip port of GCC (not yet
> submitted to mainline).
>
> I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the
> profile information, the former containing the flow graph
> information (compile-time) and the latter containing the edge profile
> information (run-time). The CFG information seems to be getting emitted
> quite early in the compilation process (pass_tree_profile). Is the
> instrumentation also done at this time? If it is, as later phases change

Yes.

> CFG, how is the instrumentation code sanity maintained? If it isn't, how

Instrumentation code sanity is naturally maintained, since the counter updates are global loads/stores. The compiler transformations preserve the original semantics of the input, and since profile counters are global variables, updates to them are preserved to match what unoptimized code would do.

> would you correlate the CFG in the gcno file to the actual CFG at
> execution (that produces the gcda file)?
> As for our port's case, we are already able to generate profile
> information using our simulator/hardware, and it is not-too-difficult
> for me to format that information into .gcno and .gcda files. But, I
> guess the CFG that I would have at runtime would be quite different from
> the CFG at initial phases of compilation (even at the same optimization
> level). Any suggestions on this? Would I be better off keeping the gcno
> file that GCC generates, trying to match the runtime CFG to the one in
> the gcno file, and then writing the gcda file accordingly?

Not only better off - you *need* to provide information that matches what's in the gcno, otherwise gcc can't read that gcda, nor use it. How you match the gcno is a different problem - there's no guarantee that you'll be able to recover enough information from the output assembly of gcc, because without instrumentation, gcc can optimize away the control flow.
pass_tree_profile is when both the instrumentation (with -fprofile-generate) and reading of the profile data (with -fprofile-use) are done. The CFG has to remain the same between generate and use - otherwise the compiler isn't able to use the profile data. Seongbae
Re: Profile information - CFG
On 10/5/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote: > > > Seongbae Park (???, ???) wrote: > > On 9/27/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote: > >> Hello, > >> I am implementing support for PBO on picochip port of GCC (not yet > >> submitted to mainline). > >> > >> I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the > >> profile information, the former containing the flow graph > >> information(compile-time) and later containing the edge profile > >> information(run-time). The CFG information seems to be getting emitted > >> quite early in the compilation process(pass_tree_profile). Is the > >> instrumentation also done at this time? If it is, as later phases change > > > > Yes. > > > >> CFG, how is the instrumentation code sanity maintained? If it isnt, How > > > > Instrumentation code sanity is naturally maintained > > since those are global load/stores. The compiler transformations naturally > > preserve the original semantic of the input > > and since profile counters are global variables, > > update to those are preserved to provide what unoptimized code would do. > > > >> would you correlate the CFG in gcno file to the actual CFG at > >> execution(that produces the gcda file)? > > > >> As for our port's case, we are already able to generate profile > >> information using our simulator/hardware, and it is not-too-difficult > >> for me to format that information into .gcno and .gcda files. But, i > >> guess the CFG that i would have at runtime would be quite different from > >> the CFG at initial phases of compilation (even at same optimization > >> level). Any suggestions on this? Would i be better off keeping the gcno > >> file that GCC generates, try to match the runtime-CFG to the one on the > >> gcno file and then write gcda file accordingly? > > > > Not only better off, you *need* to provide information that matches > > what's in gcno, otherwise gcc can't read that gcda nor use it. 
> > How you match gcno is a different problem
> > - there's no guarantee that you'll be able to recover
> > enough information from the output assembly of gcc,
> > because without instrumentation, gcc can optimize away the control flow.
> >
> > pass_tree_profile is when both the instrumentation (with -fprofile-generate)
> > and reading of the profile data (with -fprofile-use) are done.
> > The CFG has to remain the same between generate and use
> > - otherwise the compiler isn't able to use the profile data.
>
> Thanks for your help, Seongbae.
>
> I have managed to get the profile information formatted in the way .gcda
> would look. But, does GCC expect the profile to be accurate? Would it
> accept profile data that came out of sampling?
>
> -Hari

GCC expects the profile to be "flow consistent", i.e. at any basic block, the sum of the execution counts of all incoming edges has to be equal to that of the outgoing edges. I have a patch adding a new option to tolerate inconsistency, but it will have to wait for the stage1 opening in 4.4.

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: df_insn_refs_record's handling of global_regs[]
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: David Miller <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 03:12:23 -0700 (PDT)
>
> > I have a bug I'm trying to investigate where, starting in gcc-4.2.x,
> > the loop invariant pass considers a computation involving a global
> > register variable as invariant across a call. The basic structure
> > of the code is:
>
> Here is the most simplified test case I could come up with;
> compile it with "-m64 -Os" on sparc. expression(regval) is
> moved to before the loop by loop-invariant:
>
> register unsigned long regval asm("g5");
>
> extern void cond_resched(void);
>
> unsigned int var;
>
> void *expression(unsigned long regval)
> {
> 	void *ret;
>
> 	__asm__("" : "=r" (ret) : "0" (&var));
> 	return ret + regval;
> }
>
> void func(void **pp)
> {
> 	int i;
>
> 	for (i = 0; i < 56; i++) {
> 		cond_resched();
> 		*pp = expression(regval);
> 	}
> }

loop-invariant.c uses the ud-chains, so if there's something wrong with the chains, it could go nuts. Can you send me the rtl dump of the loop2_invariant pass?

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: df_insn_refs_record's handling of global_regs[]
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 21:53:37 -0700
>
> Annyoung haseyo, Park-sanseng-nim, :)
>
> > loop-invariant.c uses the ud-chains,
> > so if there's something wrong with the chains,
> > it could go nuts.
> > Can you send me the rtl dump of the loop2_invariant pass?
>
> I have found the problem, and yes it has to do with the ud chains.
>
> Because global registers are only marked via df_invalidated_by_call,
> they get the DF_REF_MAY_CLOBBER flag.
>
> This flag causes the dataflow problem solver to not add the global
> register definitions to the generator set. Specifically I am
> talking about the code in df_rd_bb_local_compute_process_def(), it
> says:
>
>   if (!(DF_REF_FLAGS (def)
>         & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
>     bitmap_set_bit (bb_info->gen, DF_REF_ID (def));
>
> Global registers don't get clobbered by calls, they are potentially
> set as a side effect of calling them. And they are set to valid
> values we might actually depend upon as inputs later.
>
> I tried a potential fix, which is to change df_insn_refs_record(),
> such that it handles global registers instead like this:
>
>   if (global_regs[i])
>     df_ref_record (dflow, regno_reg_rtx[i], &regno_reg_rtx[i],
>                    bb, insn, DF_REF_REG_DEF, 0, true);
>
> and this made the illegal loop-invariant transformation no longer
> occur in my test case.

Did you replace the DF_REF_REG_USE with DEF? If so, that's not correct. We need to add a DEF as well as the USE:

diff -r fd0f94fbe89d gcc/df-scan.c
--- a/gcc/df-scan.c	Wed Oct 10 03:32:43 2007 +0000
+++ b/gcc/df-scan.c	Tue Oct 16 22:52:44 2007 -0700
@@ -3109,8 +3109,13 @@ df_get_call_refs (struct df_collection_r
      so they are recorded as used.  */
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     if (global_regs[i])
-      df_ref_record (collection_rec, regno_reg_rtx[i],
-		     NULL, bb, insn, DF_REF_REG_USE, flags);
+      {
+	df_ref_record (collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn, DF_REF_REG_USE, flags);
+	df_ref_record (collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn, DF_REF_REG_DEF, flags);
+      }
+
   is_sibling_call = SIBLING_CALL_P (insn);
   EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)

Then, we'll need to change the df_invalidated_by_call loop not to add global_regs[] again (with the MAY_CLOBBER bits).

-- #pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: Plans for Linux ELF "i686+" ABI ? Like SPARC V8+ ?
On 10/14/07, Darryl L. Miles <[EMAIL PROTECTED]> wrote: > > Hello, > > On SPARC there is an ABI that is V8+ which allows the linking (and > mixing) of V8 ABI but makes uses of features of 64bit UltraSparc CPUs > (that were not available in the older 32bit only CPUs). Admittedly > looking at the way this works it could be said that Sun had a certain > about of forward thinking when they developed their 32bit ABI (this is > not true of the 32bit Intel IA32 ABIs that exist). Sun didn't have much forward thinking with their 32-bit ABI. It's just that their 32-bit ISA was relatively amenable to 64-bit extension with 32-bit ABI compatibility, which can not be said for IA32. > Are there any plans for a plan a new Intel IA32 ABI that is designed > solely to run on 64bit capable Intel IA32 CPUs (that support EMT64) ? > Userspace would have 32bit memory addressing, but access to more > registers, better function call conventions, etc... > > This would be aimed to replace the existing i386/i686 ABIs on the > desktop and would not be looking to maintain backward compatibility > directly. > > My own anecdotal evidence is that I've been using a x86_64 distribution > (with dual 64bit and 32bit runtime support) for a few years now and have > found performance to be lacking in my two largest footprint applications > (my browser and my development IDE totaling 5Gb of footprint between > them). I recently converted both these applications from their 64bit > versions to 32bit (they are the only 32bit applications running) and the > overall interactive performance has improved considerably possibly due > to the reduced memory footprint alone, a 4.5 Gb footprint 64bit > application is equivalent to a 2 Gb footprint 32bit application in these > application domains. > > Maybe someone knows of a white paper published to find out if the > implications and benefit a movement in this direction would mean. 
> Maybe just using the existing 64-bit ABI with 32-bit void pointers
> (and longs) is as good a specification as any.
>
> RFCs,
>
> Darryl

A more appropriate comparison is probably against MIPS, with its two 32-bit ABIs (O32 and N32). Essentially you're asking for an N32 equivalent. My bet is that most people simply don't care enough about the performance differential.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: df_insn_refs_record's handling of global_regs[]
On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Fri, 19 Oct 2007 17:25:14 -0700
>
> > If you're not in a hurry, can you wait
> > till I run the regtest against 4.2 on x86-64?
> > I've already discussed the patch with Kenny
> > and we agreed that this is the right approach,
> > but I'd like to see a clean regtest on x86 for both 4.2 and 4.3
> > before I approve.
> > Thanks,
>
> I am in no rush; please let me know if you want some help
> tracking down the failure you are seeing.
>
> Since you say it is a libgomp failure... I wonder if some of
> the atomic primitives need some side-effect markings which
> are missing, and thus are exposed by not clobbering global regs
> at call sites any more.

It looks like it's just a flaky test: it randomly fails on my test machine with or without the patch (for those interested, it's omp_parse3.f90 with -O0). I haven't started 4.2 testing yet; I'll let you know when I get that done.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Re: df_insn_refs_record's handling of global_regs[]
On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 22:56:49 -0700
>
> > Did you replace the DF_REF_REG_USE with DEF?
> > If so, that's not correct. We need to add DEF as well as USE:
> ...
> > Then, we'll need to change the df_invalidated_by_call loop
> > not to add global_regs[] again (with the MAY_CLOBBER bits).
>
> Seongbae-ssi, I've done full regstraps of mainline with the
> following patch on sparc-linux-gnu and sparc64-linux-gnu.

I've been testing this on x86-64 on top of 4.3, and I see one regression in libgomp; I'm trying to find out whether it's caused by this patch or by some other external cause.

> Do you mind if I check in this fix? I would also like to
> pursue getting this installed on the gcc-4.2 branch as well;
> as I stated, I've already done several regstraps of the 4.2
> backport of this fix.

If you're not in a hurry, can you wait till I run the regtest against 4.2 on x86-64? I've already discussed the patch with Kenny and we agreed that this is the right approach, but I'd like to see a clean regtest on x86 for both 4.2 and 4.3 before I approve.

Thanks,

Seongbae

> Thank you.
>
> 2007-10-18  Seongbae Park  <[EMAIL PROTECTED]>
>             David S. Miller  <[EMAIL PROTECTED]>
>
>         * df-scan.c (df_get_call_refs): Mark global registers as both a
>         DF_REF_REG_USE and a non-clobber DF_REF_REG_DEF.
>
> --- df-scan.c.ORIG      2007-10-18 16:56:19.0 -0700
> +++ df-scan.c   2007-10-18 16:56:50.0 -0700
> @@ -3109,18 +3109,22 @@
>      so they are recorded as used.
>    */
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>     if (global_regs[i])
> -      df_ref_record (collection_rec, regno_reg_rtx[i],
> -                     NULL, bb, insn, DF_REF_REG_USE, flags);
> +      {
> +        df_ref_record (collection_rec, regno_reg_rtx[i],
> +                       NULL, bb, insn, DF_REF_REG_USE, flags);
> +        df_ref_record (collection_rec, regno_reg_rtx[i],
> +                       NULL, bb, insn, DF_REF_REG_DEF, flags);
> +      }
>
>   is_sibling_call = SIBLING_CALL_P (insn);
>   EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)
>     {
> -      if ((!bitmap_bit_p (defs_generated, ui))
> +      if (!global_regs[ui]
> +          && (!bitmap_bit_p (defs_generated, ui))
>           && (!is_sibling_call
>               || !bitmap_bit_p (df->exit_block_uses, ui)
>               || refers_to_regno_p (ui, ui+1,
>                                     current_function_return_rtx, NULL)))
> -
>         df_ref_record (collection_rec, regno_reg_rtx[ui],
>                        NULL, bb, insn, DF_REF_REG_DEF,
>                        DF_REF_MAY_CLOBBER | flags);
>     }
Re: df_insn_refs_record's handling of global_regs[]
Hi Dave,

On x86-64, there are no regressions in 4.2 with the patch, so both the 4.2 and mainline patches are OK. I'd appreciate it if you could add the testcase; it's up to you whether to add it in a separate patch or with this patch. Thanks for fixing it.

Seongbae

On 10/19/07, Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]> wrote:
> On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> > From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> > Date: Fri, 19 Oct 2007 17:25:14 -0700
> >
> > > If you're not in a hurry, can you wait
> > > till I run the regtest against 4.2 on x86-64?
> > > I've already discussed the patch with Kenny
> > > and we agreed that this is the right approach,
> > > but I'd like to see a clean regtest on x86 for both 4.2 and 4.3
> > > before I approve.
> > > Thanks,
> >
> > I am in no rush; please let me know if you want some help
> > tracking down the failure you are seeing.
> >
> > Since you say it is a libgomp failure... I wonder if some of
> > the atomic primitives need some side-effect markings which
> > are missing, and thus are exposed by not clobbering global regs
> > at call sites any more.
>
> It looks like it's just a flaky test: it randomly fails on my test machine
> with or without the patch (for those interested, it's omp_parse3.f90
> with -O0). I haven't started 4.2 testing yet; I'll let you know when
> I get that done.
Re: A question about df
On 10/24/07, Revital1 Eres <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> While testing a patch for SMS I got an ICE which seems
> to be related to the fact that we build def-use chains only
> and not use-def chains (the latter were removed in the following patch:
> http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01682.html).
> The problem arises when we delete an insn from the df that contains a
> use but do not update the def-use chain of its def, as we do not have
> the use-def chain to reach its def. This later causes a problem when
> we try to dump the def-use chain of its def.

I'm sorry, but I don't understand the description of the problem. What do you mean by "dump", and what problem does this "dump" cause?

> So, it seems that when asking for only the def-use problem and later
> dumping the function, we should ask for the use-def problem as well
> to avoid cases like the above.

The df chain dump routines are supposed to handle DU-only or UD-only cases properly. If that's not the case, please send us a testcase (and preferably file a bugzilla report).

Thanks,

Seongbae
Re: A question about df
On 10/24/07, Revital1 Eres <[EMAIL PROTECTED]> wrote:
> > > The problem arises when we delete an insn from the df that contains a
> > > use but do not update the def-use chain of its def, as we do not have
> > > the use-def chain to reach its def. This later causes a problem when
> > > we try to dump the def-use chain of its def.
> >
> > I'm sorry, but I don't understand the description of the problem.
> > What do you mean by "dump", and what problem does this "dump" cause?
>
> By dump I mean printing the function including all the DU chain info
> (in TODO_dump_func at the end of the pass). This causes a problem in
> our case because an insn with a use was deleted (in df_insn_delete)
> without unlinking it from the def-use chain of its def (because we
> cannot access the def using a use-def chain - we do not build it).
> Once we want to print the def's def-use chain we get an ICE. Hope this
> explains the problem better.

I see. You're right; however, once an insn is deleted, the chain is not correct anyway (e.g. deleting an insn could expose a def to reach new uses that it didn't reach before), and it seems incorrect to keep using the chain without rebuilding it.

Seongbae
Re: Designs for better debug info in GCC
I think both sides are talking past each other, partly because they have two different goals in mind. IMHO, there are two extremes when it comes to so-called debugging of optimized code.

One camp wants full debuggability (let's call them the debuggability crowd): they want to know the value of any valid program state anywhere, to set a breakpoint anywhere, and even to change program state anywhere, as if there were an assignment at the point where the debugger stopped the program. This camp still wants better performance (like everyone else), but they won't sacrifice debuggability for performance, because they rely on it.

The other camp is the performance crowd: they want the absolute best performance, but still as much debug information as possible. Most people fall into this camp, and this is what gcc has implemented. This camp doesn't want the generated code changed just to get better debugging information. Of course, the real world is somewhere in between, but in practice most people fall into the latter group (aka the performance crowd).

Alexandre's proposal would make it possible to make the debuggability crowd happy, at some unknown compile-time, runtime, and maintenance cost. Richard's proposal (from what I can understand) would make the performance crowd happy, since it would be less costly to implement than Alexandre's and would provide incrementally better debugging information than we have now, but it doesn't seem that it would make the debuggability crowd happy (or at least not the extremists among them).

So the difference of opinion isn't so much whether Alexandre's proposal is good or bad, but rather whether we aim to make the debuggability crowd happy, the performance crowd happy, or both. Ideally we should serve both groups of users, but there's a non-trivial ongoing maintenance cost to having two different approaches.
So I'd like to ask both Alexandre and Richard whether each can satisfy the other camp: that is, whether Alexandre can tweak his proposal so as to keep the compile-time cost comparable to what it is now, with similar or better debug information and a reasonable maintenance cost; and whether Richard's proposal can satisfy the debuggability crowd.

Of course, another possible position would be to ignore the debuggability crowd on the grounds that they are not a big or important group. I personally think that would be a mistake, but you may disagree on that point.

Seongbae

On 08 Nov 2007 12:50:17 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Alexandre Oliva <[EMAIL PROTECTED]> writes:
>
> > So... The compiler is outputting code that tells other tools where to
> > look for certain variables at run time, but it's putting incorrect
> > information there. How can you possibly argue that this is not a code
> > correctness issue?
>
> I don't see any point to going around this point again, so I'll just
> note that I disagree.
>
> > >> >> > We've fixed many many bugs and misoptimizations over the years
> > >> >> > due to NOTEs. I'm concerned that adding DEBUG_INSN in RTL
> > >> >> > repeats a mistake we've made in the past.
> > >> >>
> > >> >> That's a valid concern. However, per this reasoning, we might as well
> > >> >> push every operand in our IL to separate representations, because
> > >> >> there have been so many bugs and misoptimizations over the years,
> > >> >> especially when the representation didn't make transformations
> > >> >> trivially correct.
> > >>
> > >> > Please don't use strawman arguments.
> > >>
> > >> It's not, really. A reference to an object within a debug stmt or
> > >> insn is very much like any other operand, in that most optimizer
> > >> passes must keep them up to date. If you argue for pushing them
> > >> outside the IL, why would any other operands be different?
>
> > > I think you misread me. I didn't argue for pushing debugging
> > > information outside the IL. I argued against a specific
> > > implementation--DEBUG_INSN--based on our experience with similar
> > > implementations.
> >
> > Do you remember any other notes that contained actual rtx expressions
> > and expected optimization passes to keep them accurate?
>
> No.
>
> > Do you think
> > we'd gain anything by moving them to a separate, out-of-line
> > representation?
>
> I don't know. I don't see such a proposal on the table, and I don't
> have one myself, so I don't know how to evaluate it.
>
> Ian
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";