Re: Compiling with profiling flags

2006-10-18 Thread Seongbae Park

On 10/17/06, Revital1 Eres <[EMAIL PROTECTED]> wrote:


Hello,

Is there an option to change the name of the .gcno file that is generated
by using profiling
flags like -fprofile-generate and later used by -fprofile-use?
I read that "For each source file compiled with `-fprofile-arcs', an
accompanying `.gcda' file will be placed in the object file directory."
- Can I change it such that the .gcda will be named as I wish?

Thanks


As far as I know, there's no easy way to do that.

However, by setting the environment variable GCOV_PREFIX
during the execution of the instrumented program,
you can redirect where the gcda files are stored.
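
For what it's worth, the way GCOV_PREFIX and GCOV_PREFIX_STRIP combine
with the compiled-in absolute path can be sketched roughly like this
(a simplified model for illustration only, not the actual libgcov source):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of how a GCOV_PREFIX/GCOV_PREFIX_STRIP pair rewrites the
   compiled-in absolute .gcda path at run time.  Simplified model,
   not the real libgcov code.  STRIP removes that many leading path
   components before the prefix is prepended.  */
void redirect_gcda(const char *prefix, int strip,
                   const char *orig, char *out, size_t outlen)
{
  const char *p = orig;
  /* Skip STRIP leading components of the original absolute path.  */
  while (strip-- > 0 && *p == '/')
    {
      const char *slash = strchr(p + 1, '/');
      if (!slash)
        break;
      p = slash;
    }
  snprintf(out, outlen, "%s%s", prefix, p);
}
```

So with GCOV_PREFIX=/tmp/prof and GCOV_PREFIX_STRIP=2, a file that
would have been written to /home/user/obj/foo.gcda ends up under
/tmp/prof/obj/foo.gcda instead.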

Alas, I don't know of any way to use such out-of-place gcda files
with -fprofile-use.
I have changes that add an option to enable exactly that,
but lacking the copyright assignment, I can't send them as a patch.
These changes also include some other profile-related enhancements
mentioned in:

http://gcc.gnu.org/wiki/ProfileFeedbackEnhancements
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: GCC optimizes integer overflow: bug or feature?

2006-12-20 Thread Seongbae Park

On 12/20/06, Dave Korn <[EMAIL PROTECTED]> wrote:
...

> We (in a major, commercial application) ran into exactly this issue.
> 'asm volatile("lock orl $0,(%%esp)"::)' is your friend when this happens
> (it is a barrier across which neither the compiler nor CPU will reorder
> things). Failing that, no-op cross-library calls (that can't be inlined)
> seem to do the trick.

  This simply means you have failed to correctly declare a variable volatile
that in fact /is/ likely to be spontaneously changed by a separate thread of
execution.


The C or C++ standard doesn't define ANYTHING related to threads,
and thus anything related to threads is beyond the standard.
If you think volatile means something in an MT environment,
think again. You can deduce certain aspects (e.g. the guaranteed
appearance of stores and loads), but nothing beyond that.
Add a memory model to the mix, and you're way beyond what the language says,
and you need to rely on non-standard, non-portable facilities,
if provided at all.
Even in a single-threaded environment, what exactly volatile means
is not quite clear in the standard (except for the setjmp/longjmp-related aspects).
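
A minimal sketch of the classic trap (a hypothetical producer/consumer
pair, made up for illustration): volatile forces the loads and stores
of the flag itself to appear, but neither the compiler nor the CPU is
obliged to keep the plain store to `data` ordered before the volatile
store to `ready`.

```c
/* What volatile does NOT buy you in a multi-threaded program.
   Hypothetical producer/consumer pair for illustration only.  */
int data;                /* plain variable */
volatile int ready;      /* volatile flag */

void producer(void)
{
  data = 42;     /* may be reordered past the store to `ready` by the
                    compiler or the CPU, since `data` is not volatile */
  ready = 1;     /* guaranteed to be emitted as a store, nothing more */
}

int consumer(void)
{
  while (!ready) /* guaranteed to be re-loaded on each iteration */
    ;
  return data;   /* on another thread, may still observe a stale value */
}
```

Run sequentially on one thread this is fine; the problem only appears
with real concurrency, which is exactly the part the standard says
nothing about.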

I liked the following paper (for general users,
not for the compiler developers, mind you):

http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Seongbae Park

On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
...

> the much more often reported problems are with
> -fstrict-aliasing, and this one also doesn't get any
> special treatment by autoconf.

That's a good point, and it somewhat counterbalances the
opposing point that -O2 does not currently imply
'-ffast-math'ish optimizations even though the C standard
would allow it to.


Can you point me to the relevant section/paragraph in C99 standard
where it allows the implementation to do -ffast-math style optimization ?
C99 Annex F.8 quite clearly says the implementation can't,
as long as it claims any conformity to IEC 60559.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Seongbae Park

On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote:

"Seongbae Park" <[EMAIL PROTECTED]> writes:

> On 12/29/06, Paul Eggert <[EMAIL PROTECTED]> wrote:
>> -O2 does not currently imply '-ffast-math'ish optimizations even
>> though the C standard would allow it to.
>
> Can you point me to the relevant section/paragraph in C99 standard
> where it allows the implementation to do -ffast-math style optimization ?
> C99 Annex F.8 quite clearly says the implementation can't,
> as long as it claims any conformity to IEC 60559.

This is more of a pedantic standards question than a
real-world programming question, but I'll answer anyway.

C99 does not require implementations to conform to IEC 60559
as specified in C99 Annex F.  It's optional.


Similarly, C99 does not require implementations to conform
to LIA-1 wrapping semantics as specified in C99 Annex H.
That's optional, too.

The cases are not entirely equivalent, as Annex F is
normative but Annex H is informative.
But as far as the
standard is concerned, it's clear that gcc could enable many
non-IEEE optimizations (including some of those enabled by
-ffast-math); that would conform to the minimal standard.


Not when __STDC_IEC_559__ is defined,
which is the case for most modern glibc+gcc combos.
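
Whether the implementation claims Annex F conformance can be checked at
compile time; a small sketch (the macro is defined by the implementation
itself, on glibc systems via its predefined headers):

```c
/* Report whether this implementation claims IEC 60559 (IEEE 754)
   conformance per C99 Annex F.  __STDC_IEC_559__ is predefined by
   the implementation; modern glibc+gcc defines it to 1.  */
int claims_iec559(void)
{
#ifdef __STDC_IEC_559__
  return 1;
#else
  return 0;
#endif
}
```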
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Seongbae Park

On 30 Dec 2006 03:20:11 +0100, Gabriel Dos Reis
<[EMAIL PROTECTED]> wrote:
...

The C standard, in effect, has an appendix (Annex H) that was not
there in the C89 edition, and that talks about the very specific issue
at hand

   H.2.2  Integer types

   [#1] The signed C integer types int, long int, long long int,
   and the corresponding unsigned types are compatible with LIA-1.
   If an implementation adds support for the LIA-1 exceptional
   values ``integer_overflow'' and ``undefined'', then those types
   are LIA-1 conformant types.  C's unsigned integer types are
   ``modulo'' in the LIA-1 sense in that overflows or out-of-bounds
   results silently wrap.  An implementation that defines signed
   integer types as also being modulo need not detect integer
   overflow, in which case, only integer divide-by-zero need be
   detected.


which clearly says LIA-1 isn't a requirement - notice the "if" in the
second sentence.
H.1 makes it clear that the entire Annex H doesn't add any extra rule
to the language but merely describes what C is in regard to LIA-1.
H.2 doubly makes it clear that C as defined isn't LIA-1 conformant
- again, notice the "if" in H.2p1.
The second sentence of H.3p1 confirms this again:

  C's operations are compatible with LIA−1 in that C
  allows an implementation to cause a notification to occur
  when any arithmetic operation
  returns an exceptional value as defined in LIA−1 clause 5.

i.e. "compatible" means C's definition doesn't prevent
a LIA-1 conformant implementation.
In other words, every LIA-1 conformant compiler is conformant to C99
in terms of arithmetic and types.
However, not every C99 conformant compiler is LIA-1 conformant.
C isn't conformant to LIA-1 but merely compatible,
exactly because of the undefined aspect.
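
The practical consequence of C leaving signed overflow undefined
(rather than requiring LIA-1 modulo semantics) is that gcc may assume
it never happens; a small made-up example:

```c
/* Because signed overflow is undefined behavior, gcc at -O2 may fold
   this whole function to `return 1;' -- x + 1 > x is treated as
   always true for signed x, since the overflowing case need not be
   considered.  */
int always_true(int x)
{
  return x + 1 > x;
}

/* The unsigned analogue must wrap (modulo semantics are required by
   the standard), so this one is NOT foldable to a constant: it is
   false when x == UINT_MAX.  */
int not_always_true(unsigned x)
{
  return x + 1 > x;
}
```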

That's enough playing language lawyer for one day.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Seongbae Park

On 12/31/06, Daniel Berlin <[EMAIL PROTECTED]> wrote:
...

> I added -fwrapv to the Dec30 run of SPEC at
> http://www.suse.de/~gcctest/SPEC/CFP/sb-vangelis-head-64/recent.html
> and
> http://www.suse.de/~gcctest/SPEC/CINT/sb-vangelis-head-64/recent.html

Note the distinct drop in performance across almost all the benchmarks
on Dec 30, including popular programs like bzip2 and gzip.


Also, this is only on x86 -
other targets that benefit more from software pipelining/modulo scheduling
may suffer even more than x86, especially on the FP side.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: gcc, mplayer and profile (mcount)

2007-01-03 Thread Seongbae Park

On 03 Jan 2007 10:07:57 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Adam Sulmicki <[EMAIL PROTECTED]> writes:

>   In the spirit of making OSS better, I took the extra effort to report
>   findings back to both lists. In reward I got flamed on both lists.

You got flamed on the gcc list?  I don't see any flames there.  All I
told you was to use the gcc-help mailing list, which was correct.  We
have different mailing lists for a reason.  Sending e-mail to the
wrong list simply wastes people's time.

I also did my best to answer your question.  In your original message
you didn't mention that there was an mcount variable in your program,
so it's not surprising that I wasn't able to identify the problem.


>   * Rename mcount in mplayer/libmenu/menu.c to something else.
>
>   * document mcount in gcc man page
>
>   * gcc prints warning.
>
>   * do nothing.
>
> I can imagine that gcc developers might have intentionally left
> mcount() visible to user space so that users can replace gcc's mcount()
> with their own implementation.

gcc does not provide the function which in this case is called
mcount().  That function comes from the C library.

The C library used on GNU/Linux systems, glibc, actually provides the
function _mcount, with a weak alias named mcount.  So this seems to be
a bug in gcc: it should be calling _mcount.

In fact, by default, gcc for the i386 targets will call _mcount.  gcc
for i386 GNU/Linux targets was changed to call mcount instead of
_mcount with this patch:

Thu Mar 30 06:20:36 1995  H.J. Lu   ([EMAIL PROTECTED])

* configure (i[345]86-*-linux*): Set xmake_file=x-linux,
tm_file=i386/linux.h, and don't set extra_parts.
(i[345]86-*-linux*aout*): New configuration.
(i[345]86-*-linuxelf): Deleted.
* config/linux{,-aout}.h, config/x-linux, config/xm-linux.h: New files.
* config/i386/linux-aout.h: New file.
* config/i386/linux.h: Extensive modifications to use ELF format
as default.
(LIB_SPEC): Don't use libc_p.a for -p. don't use libg.a
unless for -ggdb.
(LINUX_DEFAULT_ELF): Defined.
* config/i386/linuxelf.h,config/i386/x-linux: Files deleted.
* config/i386/xm-linux.h: Just include xm-i386.h and xm-linux.h.

I believe that was during the time H.J. was maintaining a separate
branch of glibc for GNU/Linux systems.  Presumably his version
provided mcount but not _mcount.  I haven't tried to investigate
further.

In any case clearly gcc for i386 GNU/Linux systems today should call
_mcount rather than mcount.  I will make that change.


I remember someone wanting to provide his own mcount().
Presumably, mcount() is weak in libc to cover such a use case ?
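
The glibc arrangement Ian describes can be sketched with GCC's alias
attribute (a toy model with made-up names; glibc's real _mcount lives
in assembler and has special calling conventions):

```c
/* Toy model of glibc's arrangement: the real entry point is the
   strongly-defined function, and the public name is a weak alias
   for it.  A user who defines their own strong `mcount' overrides
   the weak alias without disturbing the real entry point.
   Names below are illustrative, not glibc's.  */
int mcount_calls;

void my_mcount(void)
{
  mcount_calls++;      /* stand-in for the real profiling work */
}

/* Weak alias: resolves to my_mcount unless a strong definition of
   mcount appears elsewhere.  */
void mcount(void) __attribute__((weak, alias("my_mcount")));
```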
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: RFC: Mark a section to be discarded for DSO and executable

2007-01-09 Thread Seongbae Park

On 09 Jan 2007 10:09:35 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

"H. J. Lu" <[EMAIL PROTECTED]> writes:

> With LTO, an object file may contain sections with IL, which
> can be discarded when building DSO and executable. Currently we can't
> mark such sections with gABI. With GNU linker, we can use a
> linker script to discard such sections. But it will be more generic
> to make a section to be discarded for DSO and executable in gABI.
> In that case, we don't need a special linker script to discard
> such sections. Something like
>
> #define SHF_DISCARD   0x800
>
> would work.

That is not strictly required for LTO as I see it.  With LTO, the lto
program is going to read the .o files with the IL information.  It
will then generate a new .s file to pass to the assembler.  The IL
information will never go through the linker.


Not only is this not a requirement, there's a scenario where
you *want* to keep the IL inside LTO-optimized DSOs and executables
for re-optimization.
With LTO and IL inside executables and DSOs,
it becomes possible to do profile collection without any source code
and re-optimize the binary, i.e. an executable can be released with IL inside,
and the user of the binary can choose to do profile collection
on their own input and re-optimize the binary to squeeze out more performance.
This scenario also improves the usability of profile-feedback directed
optimization (by not requiring source code access and the whole
build environment for profile feedback optimization).


Of course, it is also possible that LTO .o files with IL information
will be passed directly to the linker, for whatever reason.  In that
case, we may want the linker to remove the IL information.  This is
not substantially different from the linker's current ability to strip
debugging information on request.


It should be optional for the linker to remove the sections containing IL.
We probably want a new option for strip to remove such sections,
as well as a linker option to do the same.
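
Until something like SHF_DISCARD exists, the GNU toolchain equivalent
today is a named section plus a linker-script /DISCARD/ rule; a sketch
(the section name is made up for illustration):

```c
/* Place an IL-like blob in its own named section (hypothetical name).
   With the GNU linker, a script rule such as

       /DISCARD/ : { *(.myvendor.il) }

   drops the section from the final DSO/executable; without the rule
   it is kept, which is what you'd want for the re-optimization
   scenario described above.  */
__attribute__((section(".myvendor.il")))
const unsigned char saved_il[] = { 0xde, 0xad, 0xbe, 0xef };

unsigned long saved_il_size(void)
{
  return sizeof saved_il;
}
```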


So if you want to propose a solution for that, I think you should
consider how it can be used for debugging information as well.  And I
don't think SHF_DISCARD is a good name.  We don't want to arbitrarily
discard the data, we want to discard it in certain specific scenarios.

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Signed int overflow behaviour in the security context

2007-01-26 Thread Seongbae Park

On 1/26/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:

>
> > Every leading C compiler has for years done things like this to boost
> > performance on scientific codes.
>
> The Sun cc is a counter-example.  And even then, authors of scientific
> code usually do read the compiler manual, and will discover any
> additional optimizer flags.
>
Errr, actually, Seongbae, who worked for Sun on Sun CC until very
recently, says otherwise, unless i'm mistaken.

Seongbae, didn't you say that Sun's compiler uses the fact that signed
overflow is undefined when performing optimizations?


Correct.
Sun's SPARC backend even had a compiler flag that tells the optimizer
to ignore overflow cases for unsigned integers,
which I believe was used in some of the SPEC CPU submissions.

There's no equivalent flag for signed integers because it's the default,
and we've never had a problem with it -
there were bugs filed against it, and when customers were told
signed integer overflow is undefined in the standard,
they went back and fixed their code.
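
This matters in the security context because overflow checks written in
the overflowing style are exactly what the optimizer may delete; a
hypothetical broken check next to a safe one:

```c
#include <limits.h>

/* BROKEN: tests for overflow by performing the overflow.  Since
   signed overflow is undefined, the compiler may assume
   `len + 100 < len' is always false and delete the check entirely
   at -O2.  */
int broken_check(int len)
{
  if (len + 100 < len)
    return -1;          /* "overflow detected" -- may be optimized away */
  return 0;
}

/* OK: tests the same condition without ever overflowing, so the
   optimizer cannot remove it.  */
int safe_check(int len)
{
  if (len > INT_MAX - 100)
    return -1;
  return 0;
}
```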

I believe HP's compiler does something similar,
although they also provide an option to override the default behavior
of ignoring integer overflow at high optimization levels
(see their documentation about the +Ointeger_overflow= flag).

IBM xlc takes a very similar approach, IIRC.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: 2007 GCC Developers Summit

2007-01-29 Thread Seongbae Park

On 1/29/07, Diego Novillo <[EMAIL PROTECTED]> wrote:

Ben Elliston wrote on 01/28/07 17:45:

> One idea I've always pondered is to have brief (perhaps 1-2 hr)
> tutorials, given by people in their area of expertise, as a means for
> other developers to come up to speed on a topic that interests them.  Is
> this something that appeals to others?
>
Sounds good to me.  For instance, the new java front end, a description
of the new build system, etc.


Although it's not something new,
I'd be interested in a tutorial on loop optimizations (IV opt and all related).
Based on my understanding, the topic might be of interest
to some other people as well.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Some thoughts and quetsions about the data flow infrastracture

2007-02-13 Thread Seongbae Park

On 2/13/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
...

 I am only afraid that a solution for a faster infrastructure
(e.g. another slimmer data representation) might change the interface
considerably.  I am not sure that I can convince you of this.  But I am
more worried about the 4.3 release and I really believe that inclusion of
the data flow infrastructure should be the 1st step of stage 1 to give
people more time to solve at least some problems.


Vlad,

I'm really interested in hearing what aspect of the current interface
is not right. Can you give us at least a rough sketch of
what a slimmer (hence better) data representation would look like,
and why the current interface won't be ideal for it ?
It's not too late - if there's anything we can do to change
the interface so that it can accommodate
a potentially better implementation in the future,
I won't object to it - it's just that I haven't talked to you
to figure out what you have in mind.


Saying that, I hurt the feelings of some people who put a lot of effort into the
infrastructure, like Danny, Ken, and Seongbae, and I am sorry for that.


No feelings hurt. Thanks for all your feedback.
More eyes, especially experienced ones like yours, can only help.
Also, thanks for trying out DF on Core2
- we need to look more closely at why those regressions are
there and what we can do to fix them.
Before such an evaluation, I can't tell whether this is a serious fundamental
problem or something easily fixable that we haven't gotten to yet.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: bad edge created in CFG ?

2007-03-14 Thread Seongbae Park

On 3/14/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

Hello,

I have used -da and -dv to produce vcg files of the CFG of this simple program:

#include <stdio.h>

int main(int argc, char **argv)
{
if(argc)
printf("positive\n");
else
printf("zero\n");

return 0;
}

I expected to get a CFG as follows:

          --------
          | BB 0 |
          --------
           /    \
    --------    --------
    | BB 1 |    | BB 2 |
    --------    --------
           \    /
          -------
          | END |
          -------

But instead, I was surprised to get this CFG:

          --------
          | BB 0 |
          --------
           /    \
    --------    --------
    | BB 1 | -> | BB 2 |
    --------    --------
           \    /
          -------
          | END |
          -------

as if one case of the "if" can lead to the other !

Can someone please explain to me why it is so ?

I am attaching the VCG representation, the VCG text file, and the original
test program..

Thank You
sunzir


I don't know what kind of vcg viewer/converter you're using,
but set it to ignore class 3 edges - you'll get what you expected.

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: register reload problem in global register allocation

2007-03-21 Thread Seongbae Park

On 3/21/07, wonsubkim <[EMAIL PROTECTED]> wrote:

I have some problems in global register allocation phase.

I have described some simple architecture using machine description and target
macro file. I use gnu GCC version 4.1.1.

But a "can't combine" message is printed out in the *.c.37.greg file in the
global register allocation phase. After tracing the error message, I
found it comes from reload_as_needed. But I do not understand the reason
for this problem, and I can't see the relationship between register
reload and my description.


What is your REAL problem ?
I doubt "can't combine" is the real immediate symptom,
as it's most likely just a dump of rld[].nocombine.
Show us at least your error message and the rtl
that you think is causing the problem.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Proposal: changing representation of memory references

2007-04-04 Thread Seongbae Park

On 4/4/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:

Hello,

at the moment, any pass that needs to process memory references are
complicated (or restricted to handling just a limited set of cases) by
the need to interpret the quite complex representation of memory
references that we have in gimple.  For example, there are about 1000
lines of quite convoluted code in tree-data-ref.c and about 500 lines in
tree-ssa-loop-ivopts.c dealing with parsing and analysing memory
references.

I would like to propose (and possibly also work on implementing) using a
more uniform representation, described in more details below.  The
proposal is based on the previous discussions
(http://gcc.gnu.org/ml/gcc/2006-06/msg00295.html) and on what I learned
about the way memory references are represented in ICC.
It also subsumes the patches of Daniel to make p[i] (where p is pointer)
use ARRAY_REFs/MEM_REFs.

I am not sure whether someone is not already working on something
similar, e.g. with respect to LTO (where the mentioned discussion
started), or the gimple-tuples branch?

Proposal:

For each memory reference, we remember the following information:

-- base of the reference
-- constant offset
-- vector of indices
-- type of the accessed location
-- original tree of the memory reference (or another summary of the
  structure of the access, for aliasing purposes)
-- flags

for each index, we remember
-- lower and upper bound
-- step
-- value of the index

The address of the reference can be computed as

base + offset + sum_{idx} offsetof(idx)

where offsetof(idx) = (value - lower) * step

For example, a.x[i][j].y  would be represented as
base = &a
offset = offsetof (x) + offsetof (y);
indices:
  lower = 0 upper = ? step = sizeof (a.x[i]) value = i
  lower = 0 upper = ? step = sizeof (a.x[j]) value = j

p->x would be represented as
base = p;
offset = offsetof (x);
indices: empty

p[i] as
base = p;
offset = 0
indices:
  lower = 0 upper = ? step = sizeof (p[i]) value = i

Remarks:
-- it would be guaranteed that the indices of each memory reference are
   independent, i.e., that &ref[idx1][idx2] == &ref[idx1'][idx2'] only
   if idx1 == idx1' and idx2 == idx2'; this is important for dependency
   analysis (and for this reason we also need to remember the list of
   indices, rather than trying to reconstruct them from the address).


I haven't completely thought this through,
but how would this property help in the presence of array flattening ?
e.g. suppose two adjacent loops, both two levels deep,
and only one of them is flattened; then
one loop will have one-dimensional accesses (hence only one index)
while the other loop will have two-dimensional ones.
In other words, &ref[idx1] == &ref[idx1'][idx2'] can happen.
So most likely we'll need to be able to compare the linearized address form,
rather than simply comparing vectors of indices.

This is not to say I don't like your proposal (I actually do),
but I just want to understand what properties
you're expecting from the representation
(and I think this has implications for the canonicalization of the references
you mentioned below).

Another question is, how would the representation look
for more complicated address calculations
e.g. a closed hashtable access like:

table[hash_value % hash_size]

and would it help in such cases ?


-- it would be guaranteed that the computations of the address do not
   overflow.
-- possibly there could be a canonicalization pass that, given

   for (p = q; p < q + 100; p++)
 p->x = ...  {represented the way described above}

   would transform it to

   for (p = q, i = 0; p < q + 100; p++, i++)
 {base = q, offset = offsetof(x), indices: lower = 0 upper = ? step = sizeof (*p) value = i}

   so that dependence analysis and friends do not have to distinguish
   between accesses through pointers and arrays
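
The record Zdenek describes could be sketched as a plain C structure
(field names and types below are my own, just to make the shape
concrete - they are not from any actual patch):

```c
/* A sketch of the proposed memory-reference representation.
   Names are illustrative only.  */
struct mem_ref_index {
  long lower;     /* lower bound of the index */
  long upper;     /* upper bound, or -1 if unknown */
  long step;      /* size in bytes of one step of this index */
  long value;     /* index value (an SSA name in real life) */
};

struct mem_ref {
  void *base;                     /* base of the reference */
  long offset;                    /* constant byte offset */
  int n_indices;
  struct mem_ref_index *indices;  /* one entry per dimension */
  /* ... accessed type, original tree, flags ... */
};

/* Address = base + offset + sum over indices of (value - lower) * step.  */
long mem_ref_displacement(const struct mem_ref *ref)
{
  long disp = ref->offset;
  for (int i = 0; i < ref->n_indices; i++)
    disp += (ref->indices[i].value - ref->indices[i].lower)
            * ref->indices[i].step;
  return disp;
}
```

For the p[i] example above (one index, step sizeof (p[i])), the
displacement is just (i - 0) * sizeof (p[i]).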

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: DF-branch benchmarking on SPEC2000

2007-04-23 Thread Seongbae Park

On 4/23/07, Vladimir Makarov <[EMAIL PROTECTED]> wrote:
...

  To improve the scores, I'd recommend paying attention to the big
degradations in SPEC score:

9% perlbmk degradation on Pentium4
3% fma3d degradation on Core2
3% eon and art degradation on Itanium
3% gap and wupwise degradation on PPC64.


Vlad,

Thanks a LOT for the measurement and the summary.
Thanks to your previous report, I've been looking at wupwise on PPC64,
and have found at least one missed optimization opportunity.
I'm testing the fix.
Hopefully it will address some of the remaining regressions above.

As for the perlbmk slowdown on P4,
my initial guess is that it might be due to cross-jumping or block ordering
- those are areas where I noticed the dataflow branch generates
slightly different code than mainline.
I didn't try to narrow down where the difference comes from,
as most of the differences seemed unimportant.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Backport fix for spurious anonymous ns warnings PR29365 to 4.2?

2007-05-01 Thread Seongbae Park

On 5/1/07, Andrew Pinski <[EMAIL PROTECTED]> wrote:

On 01 May 2007 14:28:07 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> I agree that it would be appropriate to backport the patch to gcc 4.2.

Lets first get the patch which fixes the ICE regression that this
patch causes approved :).

Which can be found at:
http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01746.html

Thanks,
Andrew Pinski


Thanks for the plug, Andrew.

C++ maintainers,
Please consider this as another ping for my patch :)
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Backport fix for spurious anonymous ns warnings PR29365 to 4.2?

2007-05-02 Thread Seongbae Park

On 5/2/07, Mark Mitchell <[EMAIL PROTECTED]> wrote:

Seongbae Park wrote:
> On 5/1/07, Andrew Pinski <[EMAIL PROTECTED]> wrote:
>> On 01 May 2007 14:28:07 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
>> > I agree that it would be appropriate to backport the patch to gcc 4.2.
>>
>> Lets first get the patch which fixes the ICE regression that this
>> patch causes approved :).
>>
>> Which can be found at:
>> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01746.html

This patch is OK for mainline.


Attached is the patch I committed just now.


As for backporting to 4.2, this isn't a regression, so the default
answer would be "no".  I'm unconvinced that this is a sufficiently
serious problem to merit violating that policy; after all, we're only
talking about a spurious warning.  (However, the other bug here is that
we don't have a warning option for this, so users can't use
-Wno- to turn this off.)

In any case, we're not going to do this for 4.2.0.  As per the policy I
posted recently on PRs, please find a C++ maintainer who wants to argue
for backporting this and ask them to mark the PR as P3 with an argument
as to why this is important to backport.

Thanks,


My guess is that this will be a big enough nuisance to be worth backporting,
but I agree that this isn't a regression
- I won't bother the other C++ maintainers over this.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";
Index: gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h
===
--- gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h   (revision 0)
+++ gcc/testsuite/g++.dg/warn/anonymous-namespace-2.h   (revision 0)
@@ -0,0 +1,3 @@
+namespace {
+  struct bad { };
+}
Index: gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C
===
--- gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C   (revision 0)
+++ gcc/testsuite/g++.dg/warn/anonymous-namespace-2.C   (revision 0)
@@ -0,0 +1,29 @@
+// Test for the warning of exposing types from an anonymous namespace
+// { dg-do compile }
+//
+#include "anonymous-namespace-2.h"
+
+namespace {
+struct good { };
+}
+
+struct g1 {
+good * A;
+};
+struct b1 { // { dg-warning "uses the anonymous namespace" }
+bad * B;
+};
+
+struct g2 {
+good * A[1];
+};
+struct b2 { // { dg-warning "uses the anonymous namespace" }
+bad * B[1];
+};
+
+struct g3 {
+good (*A)[1];
+};
+struct b3 { // { dg-warning "uses the anonymous namespace" }
+bad (*B)[1];
+};
Index: gcc/cp/decl2.c
===
--- gcc/cp/decl2.c  (revision 124362)
+++ gcc/cp/decl2.c  (working copy)
@@ -1856,7 +1856,7 @@ constrain_class_visibility (tree type)
   for (t = TYPE_FIELDS (type); t; t = TREE_CHAIN (t))
 if (TREE_CODE (t) == FIELD_DECL && TREE_TYPE (t) != error_mark_node)
   {
-   tree ftype = strip_array_types (TREE_TYPE (t));
+   tree ftype = strip_pointer_or_array_types (TREE_TYPE (t));
int subvis = type_visibility (ftype);
 
if (subvis == VISIBILITY_ANON)
Index: gcc/c-common.c
===
--- gcc/c-common.c  (revision 124362)
+++ gcc/c-common.c  (working copy)
@@ -3894,6 +3894,15 @@ strip_pointer_operator (tree t)
   return t;
 }
 
+/* Recursively remove pointer or array type from TYPE. */
+tree
+strip_pointer_or_array_types (tree t)
+{
+  while (TREE_CODE (t) == ARRAY_TYPE || POINTER_TYPE_P (t))
+t = TREE_TYPE (t);
+  return t;
+}
+
 /* Used to compare case labels.  K1 and K2 are actually tree nodes
representing case labels, or NULL_TREE for a `default' label.
Returns -1 if K1 is ordered before K2, -1 if K1 is ordered after
Index: gcc/c-common.h
===
--- gcc/c-common.h  (revision 124362)
+++ gcc/c-common.h  (working copy)
@@ -727,6 +727,7 @@ extern bool c_promoting_integer_type_p (
 extern int self_promoting_args_p (tree);
 extern tree strip_array_types (tree);
 extern tree strip_pointer_operator (tree);
+extern tree strip_pointer_or_array_types (tree);
 extern HOST_WIDE_INT c_common_to_target_charset (HOST_WIDE_INT);
 
 /* This is the basic parsing function.  */


Re: Optimize flag breaks code on many versions of gcc (not all)

2006-06-18 Thread Seongbae Park

On 6/18/06, Paolo Carlini <[EMAIL PROTECTED]> wrote:

Zdenek Dvorak wrote:

>... I suspect there is something wrong with your
>code (possibly invoking some undefined behavior, using uninitialized
>variable, sensitivity to rounding errors, or something like that).
>
>
A data point apparently in favor of this suspect is that the "problem"
goes away if double is replaced everywhere with long double...

Paolo.


As suspected by everybody, this is a bug in the code.


From your original code:


   110  coord[i]=start[i]+maxT[whichPlane]*dir[i];
   111  //  Uncomment one of these to make the program work.
   112  //sleep(0);
   113  //cerr << "";
   114  if ((coord[i]ur[i])) {
   115// outside box
   116return false;
   117  }

The compiler is allowed to calculate coord[i] at line 110 in 80-bit
precision and use that in the comparison at line 114.
One way to see this impact, without debugging at the assembly level,
is to change the code to:

   110  long double temp;
   111  temp = start[i]+maxT[whichPlane]*dir[i];
   112  //  Uncomment one of these to make the program work.
   113  //sleep(0);
   114  //cerr << "";
   115  if ((tempur[i])) {

If you try this code, you'll find that this always fails
and if you inspect temp and ur[i], you'll find that they are very close -
within 1 ULP of double precision.

One bandaid for this problem is to use -ffloat-store
but you'll suffer the performance penalty and it won't really fix
the root cause - which is your code.
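
The underlying trap exists even without x87 excess precision: computed
doubles that look equal in the source need not compare equal, so
comparisons should use a tolerance. A standard illustration (not taken
from the original program):

```c
#include <math.h>

/* 0.1, 0.2 and 0.3 are not exactly representable in binary, and the
   rounded double sum 0.1 + 0.2 is not the same double as 0.3.
   Exact equality on computed floating-point values is fragile;
   compare with a tolerance instead.  */
int naive_equal(double a, double b)
{
  return a == b;
}

int close_enough(double a, double b, double tol)
{
  return fabs(a - b) <= tol;
}
```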
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Optimize flag breaks code on many versions of gcc (not all)

2006-06-19 Thread Seongbae Park

On 6/19/06, Dave Korn <[EMAIL PROTECTED]> wrote:

On 19 June 2006 00:04, Paolo Carlini wrote:

> Zdenek Dvorak wrote:
>
>> ... I suspect there is something wrong with your
>> code (possibly invoking some undefined behavior, using uninitialized
>> variable, sensitivity to rounding errors, or something like that).
>>
>>
> A data point apparently in favor of this suspect is that the "problem"
> goes away if double is replaced everywhere with long double...
>
> Paolo.

  Is this another case of http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
then?

cheers,
  DaveK


It is the same case. Fundamentally, this is not fixable by the compiler alone
without a significant performance penalty.
There are very few implementations [1] that are completely IEEE 754 conformant,
and making them so is often prohibitively expensive,
hence it's not done, or at least not by default.
So whenever you're programming floating-point code,
you need to be aware of the caveats of a particular implementation.
Besides x86's well-known extended precision issue,
other processors have things like flushing denormal inputs to zero,
or a multiply-add instruction that is not equivalent to a separate multiply and add.
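
A concrete sketch of the precision point: the same expression can give
a different answer depending on the precision it is evaluated in, which
is exactly what happens behind the programmer's back when the x87 keeps
intermediates in 80-bit registers (this example makes the extended
evaluation explicit via long double, and assumes long double is wider
than double, as on x86):

```c
/* With a = 1 + 2^-27 and b = 1 - 2^-27, the exact product is
   1 - 2^-54.  That value needs 55 significand bits: it rounds to
   exactly 1.0 in double (53 bits) but fits in x86's 80-bit long
   double (64 bits).  So the "same" computation yields 0 in double
   and a nonzero residue in extended precision.  */
double product_minus_one_double(void)
{
  double a = 1.0 + 0x1p-27;
  double b = 1.0 - 0x1p-27;
  double p = a * b;           /* rounds to exactly 1.0 */
  return p - 1.0;             /* 0.0 */
}

long double product_minus_one_extended(void)
{
  long double a = 1.0L + 0x1p-27L;
  long double b = 1.0L - 0x1p-27L;
  return a * b - 1.0L;        /* exact residue, -2^-54 */
}
```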

I think it's not fair to expect gcc to somehow "fix" this whole mess alone.
Of course, whenever there's a reasonable workaround for a particular issue,
I'm sure gcc developers will try to accommodate it,
but IMHO this one (bug 323) isn't such a case.

[1] by implementation, I mean the combination of:
microprocessor, OS, compiler and runtime libraries..
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Project RABLET

2006-06-23 Thread Seongbae Park

On 6/23/06, Andrew MacLeod <[EMAIL PROTECTED]> wrote:
...

1 - One of the core themes in RABLE was very early selection of
instructions from patterns.  RTL patterns are initially chosen by the
EXPAND pass. EXPAND tends to generate better rtl patterns by being
handed complex trees which it can process and get better combinations.

  When TREE-SSA was first implemented, we got very poor RTL because
expand was seeing very small trees.  TER (Temporary Expression
Replacement) was invented, which mashed any single-def/single-use
ssa_names together into more complex trees. This gave expand a better
chance of selecting better instructions, and made a huge difference.


Have you considered using BURG/IBURG-style tree-pattern-matching
instruction selection ?

http://www.cs.princeton.edu/software/iburg/

That approach can certainly provide low-register-pressure,
high-quality instruction selection.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: unable to detect exception model

2006-06-25 Thread Seongbae Park

On 6/25/06, Eric Botcazou <[EMAIL PROTECTED]> wrote:

> So, something obviously wrong with
>
> struct max_alignment {
>  char c;
>  union {
>HOST_WIDEST_INT i;
>long double d;
>  } u;
> };
>
> /* The biggest alignment required.  */
>
> #define MAX_ALIGNMENT (offsetof (struct max_alignment, u))
>
> for SPARC 32bit?

I don't think so, the ABI says 8 in both cases.

Note that bootstrap doesn't fail on SPARC/Solaris 2.[56] and (presumably)
SPARC/Linux, which have HOST_WIDE_INT == 32, whereas SPARC/Solaris 7+ have
HOST_WIDE_INT == 64.  All are 32-bit compilers.

Bootstrap doesn't fail on SPARC64/Solaris 7+ either, for which the ABI says 16
for the alignment in both cases.  They are 64-bit compilers.


SPARC psABI 3.0 (the 32-bit version) defines long double as 8-byte aligned.
SCD 4.2, the 64-bit version, defines long double as 16-byte aligned, with some caveats
(which essentially say long double can be 8-byte aligned in some cases
- the Fortran common block case - but the compiler should assume
16-byte unless it can prove otherwise).
With the 32-bit ABI, there's also a possibility of "double"s being only 4-byte aligned
when a double is passed on the stack.

I don't know enough about gcc's gc to know whether the above can trip it up,
but the memory allocation (malloc and the like) shouldn't be a
problem as long as
it returns 8-byte aligned blocks on 32-bit and 16-byte aligned on 64-bit.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: externs and thread local storage

2006-07-02 Thread Seongbae Park

On 7/1/06, Gary Funck <[EMAIL PROTECTED]> wrote:
...

What are the technical reasons for the front-end enforcing this restriction,
when apparently some linkers will handle the TLS linkage fine?  If in fact
it is required that __thread be added to the extern, is the compiler simply
accommodating a limitation/bug in the linker?


Because the compiler has to generate different code
for accesses to __thread vs non __thread variable:
# cat -n t.c
1  extern  int i1;
2  extern __thread int i2;
3
4  int func1() { return i1; }
5  int func2() { return i2; }
# gcc -O -S t.c
# cat t.s
    .file   "t.c"
    .text
.globl func1
    .type   func1,@function
func1:
    pushl   %ebp
    movl    %esp, %ebp
    movl    i1, %eax
    leave
    ret
.Lfe1:
    .size   func1,.Lfe1-func1
.globl func2
    .type   func2,@function
func2:
    pushl   %ebp
    movl    %esp, %ebp
    movl    [EMAIL PROTECTED], %eax
    movl    %gs:(%eax), %eax
    leave
    ret
.Lfe2:
    .size   func2,.Lfe2-func2
    .ident  "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"
#

--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: externs and thread local storage

2006-07-02 Thread Seongbae Park

On 7/2/06, Gary Funck <[EMAIL PROTECTED]> wrote:
...

Seongbae Park wrote:
> Because the compiler has to generate different code
> for accesses to __thread vs non __thread variable

In my view, this is implementation-defined, and generally can vary
depending upon the underlying linker and OS technology.  Further,
there is at least one known platform (IA64) which seems to not impose
this restriction.  A few implementation techniques come to mind,
where the programmer would not need to explicitly tag 'extern's
with __thread:


That's the only platform I know of that doesn't require a different sequence.
Should we make the language rules such that
it's easy to implement on one platform but not on the others,
or such that it's easy to implement on almost all platforms ?

Also, what is the benefit of allowing a mismatch between
the declaration and the definition of __thread vs. non-__thread ?
It only makes reading the code more difficult
because it confuses everybody - you need to look at the definition
as well as the declaration to know whether you're dealing
with a thread-local variable or not, which is BAD.

...proposed scheme snipped...

The question to me is not whether it's doable, but whether it's worth doing
- I see only downside and no upside to allowing the mismatch.
If you're convinced that this is a really useful thing for a
particular platform,
why don't you create a new language extension flag that allows this,
and make it default on that platform ?
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: externs and thread local storage

2006-07-02 Thread Seongbae Park

On 7/2/06, Gary Funck <[EMAIL PROTECTED]> wrote:
...

More to the point, I think it is rather too bad that the
extern has to have the __thread attribute at all, and would
have hoped that the linker and OS could have collaborated to
make this transparent, in the same way that data can be arranged
in separately linked sections without impacting the way
the external references are written.  Thus, implementation
is separated from interface.


A TLS variable and a non-TLS variable have totally different behaviors,
and hence they are not interchangeable.
Therefore, the TLS aspect should not be made "transparent" -
i.e. the programmer has to treat a global variable differently
depending on whether or not it is TLS,
and getting a different one when you expected otherwise will cause
subtle (well, actually not so subtle, since it's likely to blow up in your face)
runtime problems.


 I think the requirement to apply
__thread to an extern should also be target-specific.


As I said, you're welcome to implement a new option
(either a runtime option or a compile time configuration option)
that will allow mixing TLS vs non-TLS.
Whether or not it should be enabled for a particular platform
should be a matter of discussion, and whether or not that patch will be
accepted in the mainline will be yet another.

Because of the reasons I stated above, I think it's a bad idea in general
and I'll oppose it for any of the platforms I care about.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: externs and thread local storage

2006-07-03 Thread Seongbae Park

On 7/3/06, Gary Funck <[EMAIL PROTECTED]> wrote:


Seongbae Park wrote:
> As I said, you're welcome to implement a new option
> (either a runtime option or a compile time configuration option)
> that will allow mixing TLS vs non-TLS.

In a way, we've already done that -- in an experimental dialect of "C"
called UPC.  When compiled for pthreads, all file scope data not
declared in a system header file is made thread local, and in fact all
data referenced through externs is also made thread local.  There is a new
syntax (a "shared" qualifier) used by the programmer to identify
objects shared across all threads.  Sounds a little scary, but works
amazingly well.  Because the tagging of data as __thread local is done
by the compiler transparently, I tend to think that we probably stress
the TLS feature more than most.


In UPC, anything that's not TLS (or in UPC terms, "private")
is marked explicitly as "shared". So it's NOT transparent in any sense
of the word.
See, you have two choices - either
1) make every global variable TLS by default and mark only non-TLS (UPC)
or
2) vice versa (C99).

It is not sane to allow the TLS/non-TLS attribute to change underneath you
- which is what you proposed.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: externs and thread local storage

2006-07-03 Thread Seongbae Park

On 7/3/06, Gary Funck <[EMAIL PROTECTED]> wrote:
...

Operations on UPC's shared objects have different semantics than
regular "C".  Array indexing and pointer operations span threads.




Thus A[i] is on one thread but A[i] will (by default) take you to
the next (cyclic) thread.  Since the semantics are different,


This is totally off-topic, but the layout in UPC doesn't really
affect the semantics.
The layout, or the distribution, is primarily there for performance,
not for correctness.
And it really doesn't behave any differently (except for the extra restrictions
and checks put there to make programmers think about data distribution
across threads).

A[10] is A[10], the same element of the array A,
regardless of which thread has affinity to that element.
That's actually the whole point of UPC
- the code is mostly semantically the same as regular C.
But that's grossly off-topic, so please don't follow up on this.


the programmer needs to know that -- it affects the API.  TLS objects
behave like regular "C" objects, at least from the perspective of
the referencing thread.


They don't behave the same way from any perspective.
A non-TLS variable can be changed underneath you by other threads, whereas a TLS one cannot
(except when a pointer to a TLS variable has escaped to other threads,
which UPC prevents by extra casting restrictions on shared and private pointers).
It has a serious impact on how you code around the variable
- whether you need locking and/or some sort of atomic update or not,
whether you can use it to communicate between threads or not,
how the variable is initialized, etc.

Or, since you seem familiar with UPC:

In UPC:

   int local_counter;
   shared int global_counter;

is exactly semantically equivalent to:

   int __thread local_counter;
   int global_counter;

in C99 w/ __thread extension.
Your proposal is equivalent, in UPC, to allowing
a global variable to be declared as private, and then later
turned into a shared one in the definition, or vice versa.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: How to insert dynamic code? (continued)

2006-07-13 Thread Seongbae Park

On 7/13/06, jacob navia <[EMAIL PROTECTED]> wrote:

Daniel Jacobowitz wrote:

>On Thu, Jul 13, 2006 at 05:06:25PM +0200, jacob navia wrote:
>
>
>>>So, what happens when _Unwind_Find_registered_FDE is called?  Does it
>>>find the EH data you have registered?
>>>
>>>
>>>
>>>
>>>
>>Yes but then it stops there instead of going upwards and finding the catch!
>>It is as my insertion left the list of registered routines in a bad state.
>>
>>I will look again at this part (the registering part) and will try to
>>find out what
>>is going on.
>>
>>
>
>It sounds to me more like it used your data, and then was left pointing
>somewhere garbage, not to the next frame.  That is, it sounds like
>there's something wrong with your generated unwind tables.  That's the
>usual cause for unexpected end of stack.
>
>
>
Yeah...

My fault obviously, who else?

Problem is, there is so much undocumented stuff that I do not see how I
could
avoid making a mistake here.

1) I generate exactly the same code now as gcc:

Prolog:

push %ebp
movq  %rsp,%rbp
subq  xxx,%rsp

and I do not touch the stack any more. Nothing is pushed; in the "xxx",
the stack space for argument pushing
is already reserved, just as gcc does. This took me 3
weeks to do.

Now, I write my stuff as follows:
1) CIE
2) FDE for function 1
   . 1 fde for each function
3) Empty FDE to zero terminate the stuff.
4) Table of pointers to the CIE, then to the FDE

p = result.FunctionTable;  // Starting place, where the CIE, then the FDEs, are written
p = WriteCIE(p);           // Write the CIE first
pFI = DefinedFunctions;
nbOfFunctions = 0;
pFdeTable[nbOfFunctions++] = result.FunctionTable;
while (pFI) {              // For each function, write an FDE
    fde_start = p;
    p = Write32(0, p);                        // Reserve room for the length field (4 bytes)
    p = Write32(p - result.FunctionTable, p); // Write the offset to the CIE
    symbolP = pFI->FunctionInfo.AssemblerSymbol;
    adr = (long long)symbolP->SymbolValue;
    adr += (unsigned long long)code_start;    // code_start points to the JITted code
    p = Write64(adr, p);
    p = Write64(pFI->FunctionSize, p);        // Write the length of the function in bytes
    *p++ = 0x41;                              // Write the opcodes
    *p++ = 0x0e;                              // These opcodes are the same as gcc writes
    *p++ = 0x10;
    *p++ = 0x86;
    *p++ = 0x02;
    *p++ = 0x43;
    *p++ = 0x0d;
    *p++ = 0x06;
    p = align8(p);
    Write32((p - fde_start) - 4, fde_start);  // Fix up the length of the FDE
    pFdeTable[nbOfFunctions] = fde_start;     // Save a pointer to it in the table
    nbOfFunctions++;
    pFI = pFI->Next;                          // Loop
}

The WriteCIE function is this:

static unsigned char *WriteCIE(unsigned char *start)
{
    start = Write32(0x14, start);
    start = Write32(0, start);
    *start++ = 1;     // version 1
    *start++ = 0;     // no augmentation
    *start++ = 1;
    *start++ = 0x78;
    *start++ = 0x10;
    *start++ = 0xc;
    *start++ = 7;
    *start++ = 8;
    *start++ = 0x90;
    *start++ = 1;
    *start++ = 0;
    *start++ = 0;
    start = Write32(0, start);
    return start;
}

I hope this is OK...

jacob


The above code looks incorrect, for various reasons,
not the least of which is that you're assuming the CIE/FDE are fixed-length.
There are various factors that affect the FDE/CIE
depending on PIC/non-PIC, C or C++, 32-bit/64-bit, etc. -
some of them must be invariant for your JIT, but some of them may not be.
Also, some of the data are encoded as uleb128
(see the DWARF spec for the details of LEB128 encoding),
which is a variable-length encoding whose length depends on the value.

In short, you'd better start looking at how the CIE/FDE structures are *logically*
laid out - otherwise you won't be able to generate correct entries.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Followup: [g++] RFH: Is there a way to make gcc place global const class objects in .rodata ?

2006-08-09 Thread Seongbae Park

On 8/9/06, Haase Bjoern (PT-BEU/EMT) <[EMAIL PROTECTED]> wrote:

I realized just a bit too late that there is an open bug report for the
issue:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=4131

> Bjoern Haase wrote
>Hello,
>
>Is there a way to place global const objects in .rodata, thus avoiding
construction at program start?

So: Sorry for the noise on the list.

According to PR4131, there is currently no solution for this issue. The
".rodata" placement would be highly desirable for my current projects.
This means that I would be willing to try to implement the required
enhancements myself.

Since so far I know only about how the back-ends work, I'd appreciate a
hint on where to start, which part of the code to look at, etc.

Bjoern.


Of course, it's best if this can be implemented in the compiler,
but if the size and the number of the read-only data are manageable,
you can do this by hand -
inspect the layout of the class by writing a test program
and looking at how the fields are laid out,
and replicate the same data in the assembly.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: gcc trunk vs python

2006-08-28 Thread Seongbae Park

On 8/27/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
...

I have not received reports about bugs in the offending code when
compiled with other compilers.


I do know at least one other compiler that does this,
and at least one significant project (which is actually quite a bit
larger than Python)
that suffered a problem similar to what Python is suffering -
in short, this isn't new, and this isn't even gcc-specific (or gcc's fault).
And Python isn't the first program to suffer from this either.


> code.  But this particular optimization will let us do a much better
> job on very simple loops like
> for (int i = 0; i != n; i += 5)
> The issue with a loop like this is that if signed overflow is defined
> to be like unsigned overflow, then this loop could plausibly wrap
> around INT_MAX and then terminate.  If signed overflow is undefined,
> then we do not have to worry about and correctly handle that case.

That seems to me rather obviously broken code unless the programmer
has proven to himself that n is a multiple of 5. So why bother
attempting to optimize it?


There is legitimate code with a similar form,
like a pointer increment compared against an "end" pointer,
which happens often in string-manipulation or buffer-management code.
More importantly, some C++ STL iterators often end up in such a form.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: First cut on outputing gimple for LTO using DWARF3. Discussion invited!!!!

2006-08-30 Thread Seongbae Park

On 8/30/06, Mark Mitchell <[EMAIL PROTECTED]> wrote:
...

I guess my overriding concern is that we're focusing heavily on the data
format here (DWARF?  Something else?  Memory-mappable?  What compression
scheme?) and we may not have enough data.  I guess we just have to pick
something and run with it.  I think we should try to keep that code as
as separate as possible so that we can recover easily if whatever we
pick turns out to be (another) bad choice. :-)


At the risk of stating the obvious and also repeating myself,
please allow me to give my thoughts on this issue.

I think we should go even a step further than "try to keep the code
as separate as possible".
We should try to come up with a set of
procedural and data-structure interfaces for the input/output
of the program structure,
and try to *completely* separate the optimization/data-structure cleanup work
from the encoding/decoding.

Besides the basic requirement of being able to pass through
enough information to produce a valid program,
I think there is a critical requirement
for implementing inter-module/inter-procedural optimization efficiently
- that the I/O interface allow efficient
iteration through module/procedure-level information
without reading each and every module/procedure body
(as Ken mentioned).

There is a certain amount of information per object/procedure that
is accessed during different optimizations, with sufficiently
different patterns -
e.g. the type tree is naturally object-level information
that we may want to go through for each and every object file
without reading all function bodies,
and other function-level information, such as caller/callee information,
would be useful without the function body.

We'll need to identify such information (in other words,
the requirements of the interprocedural optimizations/analyses)
so that the new interface provides ways to walk through it
without loading entire function bodies - even with a large address space,
if the data is scattered everywhere, it becomes extremely inefficient
on modern machines to go through it,
so it's actually more important to identify what logical information
we want to access during various interprocedural optimizations,
and the I/O interface needs to handle that efficiently.

This requirement should dictate how we encode/lay out the data
on disk, before anything else - and also how the information is
presented to the actual inter-module optimizations/analyses.

Also, part of defining the interface would involve restricting
the existing structures (e.g. GIMPLE) to a possibly more limited form
than what's currently allowed. By virtue of having an interface
that separates the encoding/decoding from the rest of the compilation,
we can throw away and recompute certain information
(e.g. often the control flow graph can be recovered,
and hence does not need to be encoded),
but those details can be worked out as the implementation of the I/O interface
takes shape.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Merging identical functions in GCC

2006-09-15 Thread Seongbae Park

On 15 Sep 2006 13:54:00 -0700, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:

Laurent GUERBY <[EMAIL PROTECTED]> writes:

> On Fri, 2006-09-15 at 09:27 -0700, Ian Lance Taylor wrote:
> > I think Danny has a 75% implementation based on hashing the RTL for a
> > section and using that to select the COMDAT section signature.  Or the
> > hashing can be done in the linker, but it still needs compiler help to
> > know when it is permissible.
>
> For code sections (I assume read-only), isn't the linker always able to
> merge identical ones? What can the compiler do better than the linker?

The linker can't merge just any identical code section, it can only
merge code sections where that is permitted.  For example, consider:

int foo() { return 0; }
int bar() { return 0; }
int quux(int (*pfn)()) { return pfn == &foo; }

Compile with -ffunction-sections.  If the linker merges sections, that
program will break.

I think that in general it would be safe to merge any equivalent
COMDAT code section, though.


Just like with most other compiler optimizations,
the as-if rule would apply here as well.
If the linker can fake the above, or prevent the optimization from happening
in cases like the above, the optimization can be safe
(e.g. if there's no relocation against the function other than direct calls
and the function symbol is "hidden" (as in linker scoping),
the linker can know that the address of the function is not taken).

However, I'd like to note that the debug information could become screwy
under this optimization (as usual with any optimization of this sort)
and could potentially make the debugger's life somewhat miserable
(e.g. template instantiations over different pointer types may end up having
exactly the same code and thus be eligible for merging by the linker;
the debugger then may not be able to tell what the real type of the pointer is,
and you'll get "impossible" stack traces, etc.).
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: reducing stack size by breaking SPARC ABI for OS-less environments

2008-12-15 Thread Seongbae Park 박성배 朴成培
On Mon, Dec 15, 2008 at 2:20 PM, David Meggy  wrote:
> Hi, I'm working on a very embedded project where we have no operating
> system, and there is no window overflow trap handler.  I'm really
> stretched for memory and I'd like to reduce the stack size.  I haven't
> been able to find anyone else who is looking at reducing the stack
> usage by Google searching, but it seems like there is room to be saved,
> and I need the extra space.
>
> As I have no window overflow trap handler the space reserved on the
> stack for saving in and local registers is just wasted memory.  Is there
> any way I can reclaim this space by forcing the compiler to not honour
> the standard SPARC ABI?
>
> Would it be possible to take this one step further and drop the stack
> space for arguments?  We don't have functions with large numbers of
> arguments, so all arguments should be passed in registers.
>
> Although only a single word, the One-word hidden parameter, seems like a
> waste if none of our code ever uses it.
>
> David

With gcc 3.x, you can use -mflat, which would do what you want.
gcc 4.x completely dropped support for -mflat, so unless you (or
someone) work on reintroducing it,
IMHO, you're out of luck with gcc 4.x.

Seongbae


Re: fbranch-probabilities bug

2009-01-08 Thread Seongbae Park 박성배 朴成培
This is the intended behavior, though now I see that the documentation
isn't very clear.
You need to use -fprofile-use - the typical usage scenario is to
compile with -fprofile-generate
to build an executable to do profile collection, and then compile with
-fprofile-use
to build optimized code using the profile data.

Seongbae

On Thu, Jan 8, 2009 at 6:30 AM, Hariharan Sandanagobalane
 wrote:
> Hi Seongbae,
> I was doing some work on profiling for picochip, when i noticed what looks
> to me like a bug. It looks to me that using fbranch-probabilities on the
> commandline (after a round of profile-generate or profile-arcs) would just
> not work on any target. Reason..
>
> Coverage.c:1011
>
>  if (flag_profile_use)
>read_counts_file ();
>
> Should this not be
>
>  if (flag_profile_use || flag_branch_probabilities)  // Maybe more flags
>read_counts_file ();
>
> ??
>
> Of course, i hit the problem later on since the counts were not read, it
> just assumed that the .gcda file were not available, when it actually was.
>
> Thanks
> Hari
>


Re: fbranch-probabilities bug

2009-01-08 Thread Seongbae Park 박성배 朴成培
On Thu, Jan 8, 2009 at 10:11 AM, Hariharan  wrote:
> Hi Seongbae,
> Does that mean that someone cant use the profile just to annotate branches
> (and get better code by that), without having to get the additional baggage
> of "unroll-loops", "peel-loops" etc?

You can do that by selectively turning optimizations off (e.g.
-fprofile-use -fno-unroll-loops -fno-peel-loops ).

> In my case, i am interested in not bloating the code size, but get any
> performance that is to be had from profiling. Is that possible?
>
> Note: My profile generate phase was also just -fprofile-arcs since i am not
> interested in other kinds of profile.

Have you measured the impact on the performance and the code size from
using full -fprofile-generate/-fprofile-use ?
If yes, and you have seen any performance degradation or unnecessary
code bloat from other optimizations,
please file a bug.
If not, then I'd say you probably want to try measuring it - in
particular, value profiling has been
becoming more and more useful. And in my experience, the majority of the
code-size increase, as well as the performance benefit,
with -fprofile-use comes from extra inlining (which -fprofile-arcs
then -fbranch-probabilities also enables).

Seongbae


Re: Excess registers pushed - regs_ever_live not right way?

2008-02-27 Thread Seongbae Park (박성배, 朴成培)
On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson
<[EMAIL PROTECTED]> wrote:
> Register saves by the prolog (pushes) are typically made with reference to
>  "df_regs_ever_live_p()" or "regs_ever_live".
>
>  If my understanding is correct,  these calls reflect register USEs and
>  not register DEFs. So if register is used in a function, but not
>  otherwise changed, it will get pushed unnecessarily on stack by prolog.

This implies that the register is either a global register
or a parameter register; in either case it won't be saved/restored
as a callee save.
What kind of register is it, and how come there's only a use of it in a function
but it's not a global ?

Seongbae


Re: Excess registers pushed - regs_ever_live not right way?

2008-02-27 Thread Seongbae Park (박성배, 朴成培)
You can use DF_REG_DEF_COUNT() - if this is indeed a parameter register,
there should be only one def (artificial def) or no def at all.
Or if you want to see all defs for the reg,
follow DF_REG_DEF_CHAIN().

Seongbae

On Wed, Feb 27, 2008 at 6:03 PM, Andrew Hutchinson
<[EMAIL PROTECTED]> wrote:
> Register contains  parameter that is passed to function. This register
>  is not part of call used set.
>
>  If this type of register were modified by function, then it would be
>  saved by function.
>
>  If this register is not modified by function, it should not be saved.
>  This is true even if function is not a leaf function (as same register
>  would be preserved by deeper calls)
>
>
>  Andy
>
>
>
>
>
>  Seongbae Park (박성배, 朴成培) wrote:
>  > On Wed, Feb 27, 2008 at 5:16 PM, Andrew Hutchinson
>  > <[EMAIL PROTECTED]> wrote:
>  >
>  >> Register saves by prolog (pushes) are typically made with reference to
>  >>  "df_regs_ever_live_p()" or  "regs_ever_live. "||
>  >>
>  >>  If my understanding is correct,  these calls reflect register USEs and
>  >>  not register DEFs. So if register is used in a function, but not
>  >>  otherwise changed, it will get pushed unnecessarily on stack by prolog.
>  >>
>  >
>  > This implies that the register is either a global register
>  > or a parameter register, in either case it won't be saved/restored
>  > as callee save.
>  > What kind of a register is it and how com there's only use of it in a
>  > function but it's not a global ?
>  >
>  > Seongbae
>  >
>  >
>



-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Excess registers pushed - regs_ever_live not right way?

2008-03-01 Thread Seongbae Park (박성배, 朴成培)
2008/3/1 Andrew Hutchinson <[EMAIL PROTECTED]>:
> I'm am still struggling with a good solution that avoids unneeded saves
>  of parameter registers.
>
>  To solve problem all I need to know are the registers actually used for
>  parameters. Since the caller assumes all of these are clobbered by
>  callee - they should never need to be saved.

I'm totally confused about what the problem is here.
I thought you were seeing extra callee-save register saves/restores in the prologue,
but now it sounds like you're seeing extra caller-save register saves/restores.
Which one are you trying to solve, and what kind of target is this ?

>  DF_REG_DEF_COUNT is showing 1 artificial def for all POTENTIAL parameter
>  registers - not just the ones that are really used (since it uses target
>  FUNCTION_ARG_REGNO_P to get parameter registers)

You said you wanted to know if there's a def of a register within a function.
For an incoming parameter, there will be one artificial def,
and if there's no other def, it means there's no real def
of the register within the function.

>  So the DF artificial defs are useless in trying to find real parameter
>  registers.

I don't understand what you mean by this. What do you mean by
"real parameter register" ?

>  That seem to require going over all DF chains to work out which
>  registers are externally defined. DF does not solve problem for me.

What do you mean by "externally defined" ?
DF may not solve the problem for you,
but now I'm completely lost on what your problem is.

>  There has got to be an easier way of finding parameter registers used by
>  function.

If you want to find all the uses (use as in reading a register but not
writing to it),
you should look at the USE chain, not the DEF chain, naturally.

Seongbae


Re: Test Harness and SPARC VIS Instructions

2008-03-13 Thread Seongbae Park (박성배, 朴成培)
On Thu, Mar 13, 2008 at 11:31 AM, Joel Sherrill
<[EMAIL PROTECTED]> wrote:
> Hi,
>
>  Moving on the SPARC, I see a lot of similar
>  unsupported instruction failures.  I only
>  see a single sparc feature test.  It is for
>  "-multrasparc -mvis" and it is actually
>  passing on the sparc instruction simulator in
>  gdb. It doesn't make me feel good that this
>  part passes since I thought SIS was a vanilla
>  V7 simulator. I think this test isn't tight enough:
>
>  proc check_effective_target_ultrasparc_hw { } {
> return [check_runtime ultrasparc_hw {
> int main() { return 0; }
> } "-mcpu=ultrasparc"]
>  }
>
>  For sure though, SIS does NOT support VIS and
>  there is no test for that.  This leads to this:
>
>  Starting program:
>  /home/joel/work-gnat/svn/b-gcc1-sparc/gcc/testsuite/gcc/pr18400.exe
>  Unexpected trap (0x 2) at address 0x02001318
>  illegal instruction
>
>  Program exited normally.
>  (gdb) disassemble 0x02001318
>  Dump of assembler code for function check_vect:
>  ...
>  0x02001318 : impdep1  62, %g0, %g0, %g0
>  ...
>
>  Can someone familiar with VIS provide an instruction
>  that is OK to do a run-time test with to check if
>  it is supported?

I don't think there's any user level instruction/register to do that.
You'll have to catch the illegal instruction trap :(

Seongbae


Re: Test Harness and SPARC VIS Instructions

2008-03-13 Thread Seongbae Park (박성배, 朴成培)
2008/3/13 Joel Sherrill <[EMAIL PROTECTED]>:
>
> Seongbae Park (박성배, 朴成培) wrote:
>  > On Thu, Mar 13, 2008 at 11:31 AM, Joel Sherrill
>  > <[EMAIL PROTECTED]> wrote:
>  >
>  >> Hi,
>  >>
>  >>  Moving on the SPARC, I see a lot of similar
>  >>  unsupported instruction failures.  I only
>  >>  see a single sparc feature test.  It is for
>  >>  "-multrasparc -mvis" and it is actually
>  >>  passing on the sparc instruction simulator in
>  >>  gdb. It doesn't make me feel good that this
>  >>  part passes since I thought SIS was a vanilla
>  >>  V7 simulator. I think this test isn't tight enough:
>  >>
>  >>  proc check_effective_target_ultrasparc_hw { } {
>  >> return [check_runtime ultrasparc_hw {
>  >> int main() { return 0; }
>  >> } "-mcpu=ultrasparc"]
>  >>  }
>  >>
>  >>  For sure though, SIS does NOT support VIS and
>  >>  there is no test for that.  This leads to this:
>  >>
>  >>  Starting program:
>  >>  /home/joel/work-gnat/svn/b-gcc1-sparc/gcc/testsuite/gcc/pr18400.exe
>  >>  Unexpected trap (0x 2) at address 0x02001318
>  >>  illegal instruction
>  >>
>  >>  Program exited normally.
>  >>  (gdb) disassemble 0x02001318
>  >>  Dump of assembler code for function check_vect:
>  >>  ...
>  >>  0x02001318 : impdep1  62, %g0, %g0, %g0
>  >>  ...
>  >>
>  >>  Can someone familiar with VIS provide an instruction
>  >>  that is OK to do a run-time test with to check if
>  >>  it is supported?
>  >>
>  >
>  > I don't think there's any user level instruction/register to do that.
>  > You'll have to catch the illegal instruction trap :(
>  >
>  >
>  I have learned a lot the past few days so let me see if I can explain
>  what I have learned. :-D
>
>  Depending upon the test and target architecture, there are various
>  mechanisms to prevent the execution of code which is clearly not
>  supported on a particular target board/cpu (as opposed to the compiler
>  target).
>
>  + many architectures check that a multilib flag is supported
>  + Some do a run-time check. x86 uses cpuid and feature check
>  to avoid things at run-time
>  + Some run a test and let it die on the target board. This is used
>  by Altivec and ARM Neon.
>
>  The last alternative is what will have to be done here. I just
>  need the single easiest VIS instruction to force the failure.

I see. What I meant was that the second bullet item above doesn't exist for SPARC.
But for the third bullet you can use, e.g., the "fzero" instruction (which is VIS 1.0),
assuming the target board will correctly trigger illtrap.

Seongbae


Re: Test Harness and SPARC VIS Instructions

2008-03-13 Thread Seongbae Park (박성배, 朴成培)
On Thu, Mar 13, 2008 at 12:32 PM, Joel Sherrill
<[EMAIL PROTECTED]> wrote:
>
> Uros Bizjak wrote:
>  > Hello!
>  >
>  >
>  >> Can someone familiar with VIS provide an instruction
>  >> that is OK to do a run-time test with to check if
>  >> it is supported?
>  >>
>  >
>  > Perhaps this fragment from testsuite/gcc.dg/vect/tree-vect.h may help:
>  >
>  > #elif defined(__sparc__)
>  >   asm volatile (".word\t0x81b007c0");
>  >
>  Thanks. That helped a lot.  Now I only see these cases on vect.exp
>
>  ==
>
>  This one looks like another test slipping another unsupported
>  instruction by.
>
>  0x020012b8 :   bne,pn   %icc, 0x200138c 
>
>  Is this UltraSPARC and not V7?  Do we need two bad instructions
>  in the test case?

Branch with prediction is v9; even v8 doesn't have it. No v8 processor has VIS,
so this test is actually sufficient. If you really want a v7 check (rather than
a v8 check), you should use something else, such as umul,
which is only available on v8 and up.

Seongbae


Re: Miscompilations for cris-elf on dataflow-branch

2007-06-10 Thread Seongbae Park (박성배, 朴成培)

Thanks for the detailed instructions on how to reproduce it
- I have successfully reproduced the problem, and narrowed it down
to combine that's deleting the insn in question.
Hopefully I'll be able to figure out what's wrong soon.

Seongbae

On 6/10/07, Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:

I hear dataflow-branch is near merge to trunk, so I thought it'd
be about time to verify that it works for the targets I
maintain...

Comparing dataflow-branch with trunk, both r125590, I see these
regressions (alas no improvements) on the branch for cris-elf
cross from x86_64-unknown-linux-gnu (Debian etch, I think):

gcc.sum gcc.c-torture/execute/20020201-1.c
gcc.sum gcc.c-torture/execute/20041011-1.c
gcc.sum gcc.c-torture/execute/920501-8.c
gcc.sum gcc.c-torture/execute/920726-1.c
gcc.sum gcc.c-torture/execute/ashldi-1.c
gcc.sum gcc.c-torture/execute/ashrdi-1.c
gcc.sum gcc.c-torture/execute/builtin-bitops-1.c
gcc.sum gcc.c-torture/execute/builtins/pr23484-chk.c
gcc.sum gcc.c-torture/execute/builtins/snprintf-chk.c
gcc.sum gcc.c-torture/execute/builtins/sprintf-chk.c

Though repeatable by anyone (see for example
<http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01571.html>), all
are unfortunately execution failures, so I thought best to do
some preliminary analysis.

Looking at the source code for what the tests have in common, it
seems all either use sprintf "%d" or a DImode shift operation
(requiring a library call).  My money is on all being the same
one bug.

Here's a cut-down test-case, derived from
gcc.c-torture/execute/builtin-bitops-1.c:

--
static int
my_popcountll (unsigned long long x)
{
  int i;
  int count = 0;
  for (i = 0; i < 8 * sizeof (unsigned long long); i++)
if (x & ((unsigned long long) 1 << i))
  count++;
  return count;
};

extern void abort (void);
extern void exit (int);

int
main (void)
{
  int i;

  if (64 != my_popcountll (0xffffffffffffffffULL))
abort ();;

  exit (0);
}
--

Here's the assembly diff to trunk, revisions as per above,
option -Os as mentioned below:

--- lshr1.s.trunk   2007-06-11 03:49:21.0 +0200
+++ lshr1.s.df  2007-06-11 03:49:59.0 +0200
@@ -15,7 +15,6 @@ _main:
move.d ___lshrdi3,$r2
moveq -1,$r10
 .L7:
-   move.d $r10,$r11
move.d $r0,$r12
jsr $r2
btstq (1-1),$r10

To repeat this without building a complete toolchain, have a gcc
svn checkout with those darned mpfr and gmp available somewhere
(added in that checkout or installed on the host system), then
do, in an empty directory:
 /path/to/gcctop/configure --target=cris-elf --enable-languages=c && make all-gcc
This will give you a cc1, which you know how to handle. :)

To repeat with the program above named lshr1.c, just use:

 ./cc1 -Os lshr1.c

The lost insn, numbered 61 in both trees, loads the high part of
that all-bits operand to the register in which that part of the
parameter is passed to the DImode left-shift library function
__lshrdi3.  From the dump-file it seems the first pass it is
lost, is combine.

Let me know if I can be of help.

brgds, H-P




--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Miscompilations for cris-elf on dataflow-branch

2007-06-10 Thread Seongbae Park (박성배, 朴成培)

This little patch:

diff -r 9e2b1e62931a gcc/combine.c
--- a/gcc/combine.c Wed Jun 06 23:08:38 2007 +
+++ b/gcc/combine.c Mon Jun 11 05:39:25 2007 +
@@ -4237,7 +4237,7 @@ subst (rtx x, rtx from, rtx to, int in_d

 So force this insn not to match in this (rare) case.  */
  if (! in_dest && code == REG && REG_P (from)
-  && REGNO (x) == REGNO (from))
+  && reg_overlap_mentioned_p (x, from))
return gen_rtx_CLOBBER (GET_MODE (x), const0_rtx);

  /* If this is an object, we are done unless it is a MEM or LO_SUM, both

should fix the problem (thanks to Ian Lance Talyor and Andrew Pinski
for helping me debug the problem on IRC).
I've started the bootstrap/regtest on x86-64.
I'd appreciate it if you can test this on cris.
Although the change is approved by Ian already,
I think I'll hold the patch till the dataflow merge happens.
Thanks,

Seongbae

On 6/10/07, Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]> wrote:

Thanks for the detailed instruction on how to reproduce it
- I have successfully reproduced the problem, and narrowed it down
to combine that's deleting the insn in question.
Hopefully I'll be able to figure out what's wrong soon.

Seongbae

On 6/10/07, Hans-Peter Nilsson <[EMAIL PROTECTED]> wrote:
> I hear dataflow-branch is near merge to trunk, so I thought it'd
> be about time to verify that it works for the targets I
> maintain...
>
> Comparing dataflow-branch with trunk, both r125590, I see these
> regressions (alas no improvements) on the branch for cris-elf
> cross from x86_64-unknown-linux-gnu (Debian etch, I think):
>
> gcc.sum gcc.c-torture/execute/20020201-1.c
> gcc.sum gcc.c-torture/execute/20041011-1.c
> gcc.sum gcc.c-torture/execute/920501-8.c
> gcc.sum gcc.c-torture/execute/920726-1.c
> gcc.sum gcc.c-torture/execute/ashldi-1.c
> gcc.sum gcc.c-torture/execute/ashrdi-1.c
> gcc.sum gcc.c-torture/execute/builtin-bitops-1.c
> gcc.sum gcc.c-torture/execute/builtins/pr23484-chk.c
> gcc.sum gcc.c-torture/execute/builtins/snprintf-chk.c
> gcc.sum gcc.c-torture/execute/builtins/sprintf-chk.c
>
> Though repeatable by anyone (see for example
> <http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01571.html>), all
> are unfortunately execution failures, so I thought best to do
> some preliminary analysis.
>
> Looking at the source code for what the tests have in common, it
> seems all either use sprintf "%d" or a DImode shift operation
> (requiring a library call).  My money is on all being the same
> one bug.
>
> Here's a cut-down test-case, derived from
> gcc.c-torture/execute/builtin-bitops-1.c:
>
> --
> static int
> my_popcountll (unsigned long long x)
> {
>   int i;
>   int count = 0;
>   for (i = 0; i < 8 * sizeof (unsigned long long); i++)
> if (x & ((unsigned long long) 1 << i))
>   count++;
>   return count;
> };
>
> extern void abort (void);
> extern void exit (int);
>
> int
> main (void)
> {
>   int i;
>
>   if (64 != my_popcountll (0xffffffffffffffffULL))
> abort ();;
>
>   exit (0);
> }
> --
>
> Here's the assembly diff to trunk, revisions as per above,
> option -Os as mentioned below:
>
> --- lshr1.s.trunk   2007-06-11 03:49:21.0 +0200
> +++ lshr1.s.df  2007-06-11 03:49:59.0 +0200
> @@ -15,7 +15,6 @@ _main:
> move.d ___lshrdi3,$r2
> moveq -1,$r10
>  .L7:
> -   move.d $r10,$r11
> move.d $r0,$r12
> jsr $r2
> btstq (1-1),$r10
>
> To repeat this without building a complete toolchain, have a gcc
> svn checkout with those darned mpfr and gmp available somewhere
> (added in that checkout or installed on the host system), then
> do, in an empty directory:
>  /path/to/gcctop/configure --target=cris-elf --enable-languages=c && make all-gcc
> This will give you a cc1, which you know how to handle. :)
>
> To repeat with the program above named lshr1.c, just use:
>
>  ./cc1 -Os lshr1.c
>
> The lost insn, numbered 61 in both trees, loads the high part of
> that all-bits operand to the register in which that part of the
> parameter is passed to the DImode left-shift library function
> __lshrdi3.  From the dump-file it seems the first pass it is
> lost, is combine.
>
> Let me know if I can be of help.
>
> brgds, H-P
>


--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";




--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Some regressions from the dataflow merge

2007-06-12 Thread Seongbae Park (박성배, 朴成培)

On 6/12/07, Richard Guenther <[EMAIL PROTECTED]> wrote:

On Tue, 12 Jun 2007, Richard Guenther wrote:

>
> On ia64 SPEC2000 I see fma3d and applu now miscompare.

On x86_64 186.wupwise ICEs with -O2 -ffast-math and FDO:

/gcc/spec/sb-haydn-fdo-64/x86_64/install-200706120559/bin/gfortran -c -o
zscal.o -fprofile-use -O2 -ffast-math  zscal.f
Error from fdo_make_pass2 'specmake -j2 FDO=PASS2 build 2>
fdo_make_pass2.err | tee fdo_make_pass2.out':
zgemm.f: In function 'zgemm':
zgemm.f:413: internal compiler error: in remove_insn, at emit-rtl.c:3597
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Likewise for 176.gcc:

combine.c: In function 'simplify_comparison':
combine.c:9697: internal compiler error: in remove_insn, at
emit-rtl.c:3597
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [combine.o] Error 1


Sounds like there's a pass that is emitting a barrier during cfglayout mode.
I'll look at them.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Some regressions from the dataflow merge

2007-06-14 Thread Seongbae Park (박성배, 朴成培)

On 6/14/07, Richard Guenther <[EMAIL PROTECTED]> wrote:

On Wed, 13 Jun 2007, Kenneth Zadeck wrote:

> Richard Guenther wrote:
> > On Tue, 12 Jun 2007, Richard Guenther wrote:
> >
> >
> >> On ia64 SPEC2000 I see fma3d and applu now miscompare.
> >>
> >
> > On x86_64 186.wupwise ICEs with -O2 -ffast-math and FDO:
> >
> > /gcc/spec/sb-haydn-fdo-64/x86_64/install-200706120559/bin/gfortran -c -o
> > zscal.o-fprofile-use -O2 -ffast-math  zscal.f
> > Error from fdo_make_pass2 'specmake -j2 FDO=PASS2 build 2>
> > fdo_make_pass2.err | tee fdo_make_pass2.out':
> > zgemm.f: In function 'zgemm':
> > zgemm.f:413: internal compiler error: in remove_insn, at emit-rtl.c:3597
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See http://gcc.gnu.org/bugs.html> for instructions.
> >
> > Likewise for 176.gcc:
> >
> > combine.c: In function 'simplify_comparison':
> > combine.c:9697: internal compiler error: in remove_insn, at
> > emit-rtl.c:3597
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See http://gcc.gnu.org/bugs.html> for instructions.
> > specmake: *** [combine.o] Error 1
> >
> > Richard.
> >
> Richard,
>
> did these two regression go away?  We committed a change to gcse that
> should have fixed this bug.

These two are gone now.  The ia64 miscompares still are there.


I'm looking at it. It is either a scheduler problem,
or some other problem downstream triggered by the scheduler.
However, I'm having a hard time adding a fine-grained dbg_cnt to our scheduler
- do you know who might be interested in adding an insn-level
dbg_cnt in the scheduler? The current dbg_cnt (sched_insn) causes an ICE :(
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Fixing m68hc11 reorg after dataflow merge

2007-06-16 Thread Seongbae Park (박성배, 朴成培)

On 6/16/07, Rask Ingemann Lambertsen <[EMAIL PROTECTED]> wrote:

   I need some help making m68hc11_reorg() work after the dataflow merge, in
particular this bit:

  /* Re-create the REG_DEAD notes.  These notes are used in the machine
 description to use the best assembly directives.  */
  if (optimize)
{
  /* Before recomputing the REG_DEAD notes, remove all of them.
 This is necessary because the reload_cse_regs() pass can
 have replaced some (MEM) with a register.  In that case,
 the REG_DEAD that could exist for that register may become
 wrong.  */
  for (insn = first; insn; insn = NEXT_INSN (insn))
{
  if (INSN_P (insn))
{
  rtx *pnote;

  pnote = &REG_NOTES (insn);
  while (*pnote != 0)
{
  if (REG_NOTE_KIND (*pnote) == REG_DEAD)
*pnote = XEXP (*pnote, 1);
  else
pnote = &XEXP (*pnote, 1);
}
}
}

  life_analysis (PROP_REG_INFO | PROP_DEATH_NOTES);
}

--
Rask Ingemann Lambertsen


Try:

df_note_add_problem ();
df_analyze ();
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Fwd: INCOMING_RETURN_ADDR_RTX in ia64.h

2007-06-18 Thread Seongbae Park (박성배, 朴成培)

Forwarding to gcc@, as this might be interesting to other people,
and I'd like to ask whoever is working on ia64 to take this issue up.

Seongbae

-- Forwarded message --
From: Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]>
Date: Jun 18, 2007 12:44 AM
Subject: INCOMING_RETURN_ADDR_RTX in ia64.h
To: [EMAIL PROTECTED]
Cc: Kenneth Zadeck <[EMAIL PROTECTED]>, "Berlin, Daniel"
<[EMAIL PROTECTED]>, Ian Lance Taylor <[EMAIL PROTECTED]>, Paolo
Bonzini <[EMAIL PROTECTED]>, Richard Guenther
<[EMAIL PROTECTED]>


Hi Jim,

This is an analysis of a correctness bug on ia64,
whose root cause has to do with your code.

After dataflow merge, Richard Guenther reported
two runtime failures in SPEC CPU2000 programs on ia64.
It turned out to be related to the following code you wrote
(or at least committed, according to svn). From ia64.h:

   928 /* A C expression whose value is RTL representing the location of the incoming
   929    return address at the beginning of any function, before the prologue.  This
   930    RTL is either a `REG', indicating that the return value is saved in `REG',
   931    or a `MEM' representing a location in the stack.  This enables DWARF2
   932    unwind info for C++ EH.  */
   933 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (VOIDmode, BR_REG (0))
   934
   935 /* ??? This is not defined because of three problems.
   936    1) dwarf2out.c assumes that DWARF_FRAME_RETURN_COLUMN fits in one byte.
   937    The default value is FIRST_PSEUDO_REGISTER which doesn't.  This can be
   938    worked around by setting PC_REGNUM to FR_REG (0) which is an otherwise
   939    unused register number.
   940    2) dwarf2out_frame_debug core dumps while processing prologue insns.  We
   941    need to refine which insns have RTX_FRAME_RELATED_P set and which don't.
   942    3) It isn't possible to turn off EH frame info by defining DWARF2_UNWIND_INFO
   943    to zero, despite what the documentation implies, because it is tested in
   944    a few places with #ifdef instead of #if.  */
   945 #undef INCOMING_RETURN_ADDR_RTX

Here, because INCOMING_RETURN_ADDR_RTX is ultimately undef'ed,
dataflow doesn't see any definition of the return address register,
and happily treats b0 as not live throughout the function body.
Then the global allocator, guided by this information, allocates
b0 for something else, leading to corruption of the return address.

Removing the undef leads to an ICE in dwarf2out.c (probably due to problem 2
in your comment?).

Certainly from the new dataflow point of view,
we need this macro to be defined,
because, otherwise, the use of b0 in the return is considered a use without def.
Previously, flow() didn't consider uninitialized registers
and just having a use of b0 in the return was sufficient,
as it made b0 live across the entire function
thanks to flow's backward-only liveness analysis.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: virtual stack regs.

2007-06-19 Thread Seongbae Park (박성배, 朴成培)

On 6/19/07, Rask Ingemann Lambertsen <[EMAIL PROTECTED]> wrote:
..

   Hmm, how do you handle arg_pointer_rtx, frame_pointer_rtx and the like?
They are all uninitialized until the prologue is emitted, which is some time
after reload.


ARG_POINTER_REGNUM is included in the artificial defs of all blocks
(which I think is overly conservative - just having them
in the entry block def should be enough).
Hence, from the dataflow point of view, they are always considered initialized.

I think we should probably do something similar
for VIRTUAL_STACK_*_REGNUM.


> 5) How can I tell if a reg is a virtual_stack_reg?

   FIRST_VIRTUAL_REGISTER <= regno <= LAST_VIRTUAL_REGISTER

--
Rask Ingemann Lambertsen


--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: class 3 edges

2007-06-19 Thread Seongbae Park (박성배, 朴成培)

On 6/19/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

hello,

when I compile with -dv -fdump-rtl-* I somtimes see in the VCG files
some edges that have no meaning in the flow of the program.
these edges are always green and class 3.

what are those edges ? what is their purposes ?

thank you
sunzir


See gcc/graph.c:print_rtl_graph_with_bb().
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Help on testsuite failures, only with optimization & bootstrap

2007-06-26 Thread Seongbae Park (박성배, 朴成培)

On 6/26/07, Steve Ellcey <[EMAIL PROTECTED]> wrote:

After the dataflow merge (and after doing a couple of other patches that
were needed just to boostrap GCC on IA64 HP-UX), I am still getting some
failures in the GCC testsuite and am hoping for some advise / help on
figuring out what is going on.

A bunch of tests like the following one:

void __attribute__ ((__noinline__)) foo (int xf) {}
int main() { foo (1); return 0; }

Are failing because they generate the following warning:

$ obj_gcc/gcc/cc1 -quiet -O2 x.c
x.c:2: warning: inline function 'foo' given attribute noinline

Now, the problem here is, of course, that foo was not declared to be
inline and the warning should not be happening.  If I recompile the GCC
file c-decl.c with -O2 -fno-tree-dominator-opts (instead of just -O2)
then the resulting GCC will not generate the warning message.  But so
far I haven't been able to look at the tree dump files and see where
things are going wrong.  It doesn't help that the dominator pass gets
run 3 times and I am not sure which one is causing the problem.

Unfortunately, I am only seeing this on IA64 HP-UX.  It does not happen
on IA64 Linux.

Does anyone have any advice / ideas / recommendations on how to debug
this problem?

Steve Ellcey
[EMAIL PROTECTED]


If you want to find out exactly which invocation of the dominator pass
is causing the problem,
I recommend adding a new dbg_cnt, something like (untested):

diff -r d856dc0baad4 gcc/dbgcnt.def
--- a/gcc/dbgcnt.defWed Jun 27 01:21:13 2007 +
+++ b/gcc/dbgcnt.defTue Jun 26 21:17:55 2007 -0700
@@ -84,3 +84,5 @@ DEBUG_COUNTER (tail_call)
DEBUG_COUNTER (tail_call)
DEBUG_COUNTER (global_alloc_at_func)
DEBUG_COUNTER (global_alloc_at_reg)
+DEBUG_COUNTER (uncprop_at_func)
+DEBUG_COUNTER (dominator_at_func)
diff -r d856dc0baad4 gcc/tree-ssa-dom.c
--- a/gcc/tree-ssa-dom.cWed Jun 27 01:21:13 2007 +
+++ b/gcc/tree-ssa-dom.cTue Jun 26 21:18:26 2007 -0700
@@ -44,6 +44,7 @@ Boston, MA 02110-1301, USA.  */
#include "tree-ssa-propagate.h"
#include "langhooks.h"
#include "params.h"
+#include "dbgcnt.h"

/* This file implements optimizations on the dominator tree.  */

@@ -365,7 +366,7 @@ static bool
static bool
gate_dominator (void)
{
-  return flag_tree_dom != 0;
+  return flag_tree_dom != 0 && dbg_cnt (dominator_at_func);
}

struct tree_opt_pass pass_dominator =
diff -r d856dc0baad4 gcc/tree-ssa-uncprop.c
--- a/gcc/tree-ssa-uncprop.cWed Jun 27 01:21:13 2007 +
+++ b/gcc/tree-ssa-uncprop.cTue Jun 26 21:18:35 2007 -0700
@@ -40,6 +40,7 @@ Boston, MA 02110-1301, USA.  */
#include "tree-pass.h"
#include "tree-ssa-propagate.h"
#include "langhooks.h"
+#include "dbgcnt.h"

/* The basic structure describing an equivalency created by traversing
   an edge.  Traversing the edge effectively means that we can assume
@@ -604,7 +605,7 @@ static bool
static bool
gate_uncprop (void)
{
-  return flag_tree_dom != 0;
+  return flag_tree_dom != 0 && dbg_cnt (uncprop_at_func);
}

struct tree_opt_pass pass_uncprop =


This will at least allow you to fairly quickly find which invocation of the pass
is causing the problem, by doing a binary search on "n"
by adding the following extra flag to the compilation line:

-fdbg-cnt=uncprop_at_func:n
or
-fdbg-cnt=dominator_at_func:n

Of course, once you narrowed it down to that level,
you'll most likely still need to narrow it down further
but you'll have a better chance (you may want to add
another more fine grained dbgcnt for that).
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: -dv -fdump-rtl-all question

2007-07-18 Thread Seongbae Park (박성배, 朴成培)

On 7/18/07, Sunzir Deepur <[EMAIL PROTECTED]> wrote:

Hi list,

Is it ok to assume that when I compile a C file (that is guranteed to have
some code in it) under the following flags, I always get the mentioned
VCG file (and do not get a bigger one) ?

Flags  Maximum VCG file that is always created
===
"  -dv -fdump-rtl-all"  .49.stack.vcg
"-O1 -dv -fdump-rtl-all"  .49.stack.vcg
"-O2 -dv -fdump-rtl-all"  .50.compgotos.vcg
"-O3 -dv -fdump-rtl-all"  .50.compgotos.vcg

So basically I want to assume the maximum vcg file that is created
is a function of the optimizations and not a function of the source file..


Why?
There's no guarantee, although in practice, for a particular version
of gcc on a given platform with given options, it probably is.
--
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: RFC: Rename Non-Autpoiesis maintainers category

2007-07-30 Thread Seongbae Park (박성배, 朴成培)
On 7/30/07, Diego Novillo <[EMAIL PROTECTED]> wrote:
> On 7/27/07 9:58 AM, Zdenek Dvorak wrote:
> > Hello,
> >
> >> I liked the idea of 'Reviewers' more than any of the other options.
> >> I would like to go with this patch, unless we find a much better
> >> option?
> >
> > to cancel this category of maintainers completely?
>
> An interesting idea, but let's discuss that issue separately.  In this
> thread I'm only interested in changing the name of this category.  Not
> discuss whether the category should exist at all.
>
> Since I have not heard any strong opposition to changing the category
> name to 'Reviewers', I will go ahead with this patch later this week.
>
>
> Index: MAINTAINERS
> ===
> --- MAINTAINERS (revision 126951)
> +++ MAINTAINERS (working copy)
> @@ -231,7 +231,7 @@
> maintainers need approval to check in algorithmic changes or changes
> outside of the parts of the compiler they maintain.
>
> -   Non-Autopoiesis Maintainers
> +   Reviewers
>
>  dataflow   Daniel Berlin   [EMAIL PROTECTED]
>  dataflow   Paolo Bonzini   [EMAIL PROTECTED]
> @@ -251,10 +251,9 @@
>  FortranPaul Thomas [EMAIL PROTECTED]
>
>
> -Note that individuals who maintain parts of the compiler as
> -non-autopoiesis maintainers need approval changes outside of the parts
> -of the compiler they maintain and also need approval for their own
> -patches.
> +Note that individuals who maintain parts of the compiler as reviewers
> +need approval for changes outside of the parts of the compiler they
> +maintain and also need approval for their own patches.

Now that the name has been changed to reviewer, I think
the following wording is slightly better:

While reviewers can approve changes in the parts of the compiler
they maintain,
they still need approval for their own patches from other maintainers
or reviewers.

>  Write After Approval(last name alphabetical
> order)
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: i seem to have hit a problem with my new conflict finder.

2007-08-17 Thread Seongbae Park (박성배, 朴成培)
On 8/17/07, Kenneth Zadeck <[EMAIL PROTECTED]> wrote:
...
> we should talk.  I am avail today.  i am leaving on vacation tomorrow
> for a week.

Please send me the patch before you leave (and please leave valinor
turned on) - I'll give it a look while you're gone.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: question about rtl loop-iv analysis

2007-08-28 Thread Seongbae Park (박성배, 朴成培)
On 8/28/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
...
> that obviously is not the case here, though.  Do you (or someone else
> responsible for df) have time to have a look at this problem?
> Otherwise, we may discuss it forever, but we will not get anywhere.
>
> Zdenek

Open a PR and assign it to me, if you're not in a hurry -
I should be able to look at it next week.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Profile information - CFG

2007-09-28 Thread Seongbae Park (박성배, 朴成培)
On 9/27/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote:
> Hello,
> I am implementing support for PBO on picochip port of GCC (not yet
> submitted to mainline).
>
> I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the
> profile information, the former containing the flow graph
> information(compile-time) and later containing the edge profile
> information(run-time). The CFG information seems to be getting emitted
> quite early in the compilation process(pass_tree_profile). Is the
> instrumentation also done at this time? If it is, as later phases change

Yes.

> CFG, how is the instrumentation code sanity maintained? If it isnt, How

Instrumentation code sanity is naturally maintained, since the counter
updates are loads and stores of global variables. The compiler
transformations preserve the original semantics of the input,
and because profile counters are global variables,
updates to them are preserved so that they behave as unoptimized code would.

> would you correlate the CFG in gcno file to the actual CFG at
> execution(that produces the gcda file)?

> As for our port's case, we are already able to generate profile
> information using our simulator/hardware, and it is not-too-difficult
> for me to format that information into .gcno and .gcda files. But, i
> guess the CFG that i would have at runtime would be quite different from
> the CFG at initial phases of compilation (even at same optimization
> level). Any suggestions on this? Would i be better off keeping the gcno
> file that GCC generates, try to match the runtime-CFG to the one on the
> gcno file and then write gcda file accordingly?

You're not just better off - you *need* to provide information that matches
what's in the gcno file; otherwise gcc can't read or use that gcda.
How you match the gcno is a different problem:
there's no guarantee that you'll be able to recover
enough information from gcc's output assembly,
because without instrumentation, gcc can optimize away the control flow.

pass_tree_profile is when both the instrumentation (with -fprofile-generate)
and reading of the profile data (with -fprofile-use) are done.
The CFG has to remain the same between generate and use
 - otherwise the compiler isn't able to use the profile data.

Seongbae


Re: Profile information - CFG

2007-10-05 Thread Seongbae Park (박성배, 朴成培)
On 10/5/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote:
>
>
> Seongbae Park (???, ???) wrote:
> > On 9/27/07, Hariharan Sandanagobalane <[EMAIL PROTECTED]> wrote:
> >> Hello,
> >> I am implementing support for PBO on picochip port of GCC (not yet
> >> submitted to mainline).
> >>
> >> I see that GCC generates 2 files, xx.gcno and xx.gcda, containing the
> >> profile information, the former containing the flow graph
> >> information(compile-time) and later containing the edge profile
> >> information(run-time). The CFG information seems to be getting emitted
> >> quite early in the compilation process(pass_tree_profile). Is the
> >> instrumentation also done at this time? If it is, as later phases change
> >
> > Yes.
> >
> >> CFG, how is the instrumentation code sanity maintained? If it isnt, How
> >
> > Instrumentation code sanity is naturally maintained
> > since those are global load/stores. The compiler transformations naturally
> > preserve the original semantic of the input
> > and since profile counters are global variables,
> > update to those are preserved to provide what unoptimized code would do.
> >
> >> would you correlate the CFG in gcno file to the actual CFG at
> >> execution(that produces the gcda file)?
> >
> >> As for our port's case, we are already able to generate profile
> >> information using our simulator/hardware, and it is not-too-difficult
> >> for me to format that information into .gcno and .gcda files. But, i
> >> guess the CFG that i would have at runtime would be quite different from
> >> the CFG at initial phases of compilation (even at same optimization
> >> level). Any suggestions on this? Would i be better off keeping the gcno
> >> file that GCC generates, try to match the runtime-CFG to the one on the
> >> gcno file and then write gcda file accordingly?
> >
> > Not only better off, you *need* to provide information that matches
> > what's in gcno, otherwise gcc can't read that gcda nor use it.
> > How you match gcno is a different problem
> > - there's no guarantee that you'll be able to recover
> > enough information from the output assembly of gcc,
> > because without instrumentation, gcc can optimize away the control flow.
> >
> > pass_tree_profile is when both the instrumentation (with -fprofile-generate)
> > and reading of the profile data (with -fprofile-use) are done.
> > The CFG has to remain the same between generate and use
> >  - otherwise the compiler isn't able to use the profile data.
>
> Thanks for your help, seongbae.
>
> I have managed to get the profile information formatted in the way .gcda
> would look. But, does GCC expect the profile to be accurate? Would it
> accept profile data that came out of sampling?
>
> -Hari

GCC expects the profile to be "flow consistent",
i.e. at any basic block, the sum of the execution counts of all incoming
edges has to equal that of the outgoing edges.

I have a patch adding a new option to tolerate inconsistency
but it will have to wait for stage1 opening in 4.4.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: df_insn_refs_record's handling of global_regs[]

2007-10-16 Thread Seongbae Park (박성배, 朴成培)
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: David Miller <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 03:12:23 -0700 (PDT)
>
> > I have a bug I'm trying to investigate where, starting in gcc-4.2.x,
> > the loop invariant pass considers a computation involving a global
> > register variable as invariant across a call.  The basic structure
> > of the code is:
>
> Here is the most simplified test case I could come up with,
> compile it with "-m64 -Os" on sparc.  expression(regval) is
> moved to before the loop by loop-invariant
>
> register unsigned long regval asm("g5");
>
> extern void cond_resched(void);
>
> unsigned int var;
>
> void *expression(unsigned long regval)
> {
>   void *ret;
>
>   __asm__("" : "=r" (ret) : "0" (&var));
>   return ret + regval;
> }
>
> void func(void **pp)
> {
>   int i;
>
>   for (i = 0; i < 56; i++) {
> cond_resched();
> *pp = expression(regval);
>   }
> }

loop-invariant.c uses ud-chains,
so if there's something wrong with the chains,
it could go nuts.
Can you send me the rtl dump of loop2_invariant pass ?
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: df_insn_refs_record's handling of global_regs[]

2007-10-16 Thread Seongbae Park (박성배, 朴成培)
On 10/16/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 21:53:37 -0700
>
> Annyoung haseyo, Park-sanseng-nim,

:)

> > loop-invariant.c uses ud-chains,
> > So if there's something wrong with the chain,
> > it could go nuts.
> > Can you send me the rtl dump of loop2_invariant pass ?
>
> I have found the problem, and yes it has to do with the ud chains.
>
> Because global registers are only marked via df_invalidated_by_call,
> they get the DF_REF_MAY_CLOBBER flag.
>
> This flag causes the dataflow problem solver to not add the global
> register definitions to the generator set.  Specifically I am
> talking about the code in df_rd_bb_local_compute_process_def(), it
> says:
>
> if (!(DF_REF_FLAGS (def)
>   & (DF_REF_MUST_CLOBBER | DF_REF_MAY_CLOBBER)))
>   bitmap_set_bit (bb_info->gen, DF_REF_ID (def));
>
> Global registers don't get clobbered by calls, they are potentially
> set as a side effect of calling them.  And they are set to valid
> values we might actually depend upon as inputs later.
>
> I tried a potential fix, which is to change df_insn_refs_record(),
> such that it handles global registers instead like this:
>
> if (global_regs[i])
> df_ref_record (dflow, regno_reg_rtx[i], &regno_reg_rtx[i],
>  bb, insn, DF_REF_REG_DEF, 0, true);
>
> and this made the illegal loop-invariant transformation no longer
> occur in my test case.

Did you replace the DF_REF_REG_USE with DEF ?
If so, that's not correct.  We need to add DEF as well as USE:

diff -r fd0f94fbe89d gcc/df-scan.c
--- a/gcc/df-scan.c Wed Oct 10 03:32:43 2007 +
+++ b/gcc/df-scan.c Tue Oct 16 22:52:44 2007 -0700
@@ -3109,8 +3109,13 @@ df_get_call_refs (struct df_collection_r
  so they are recorded as used.  */
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 if (global_regs[i])
-  df_ref_record (collection_rec, regno_reg_rtx[i],
-NULL, bb, insn, DF_REF_REG_USE, flags);
+  {
+df_ref_record (collection_rec, regno_reg_rtx[i],
+  NULL, bb, insn, DF_REF_REG_USE, flags);
+df_ref_record (collection_rec, regno_reg_rtx[i],
+  NULL, bb, insn, DF_REF_REG_DEF, flags);
+  }
+

   is_sibling_call = SIBLING_CALL_P (insn);
   EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)


Then, we'll need to change the df_invalidated_by_call loop
not to add global_regs[] again (with MAY_CLOBBER bits).
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: Plans for Linux ELF "i686+" ABI ? Like SPARC V8+ ?

2007-10-16 Thread Seongbae Park (박성배, 朴成培)
On 10/14/07, Darryl L. Miles <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> On SPARC there is an ABI that is V8+ which allows the linking (and
> mixing) of V8 ABI but makes uses of features of 64bit UltraSparc CPUs
> (that were not available in the older 32bit only CPUs).  Admittedly
> looking at the way this works it could be said that Sun had a certain
amount of forward thinking when they developed their 32bit ABI (this is
> not true of the 32bit Intel IA32 ABIs that exist).

Sun didn't have much forward thinking with their 32-bit ABI.
It's just that their 32-bit ISA was relatively amenable
to 64-bit extension with 32-bit ABI compatibility,
which can not be said for IA32.

> Are there any plans for a plan a new Intel IA32 ABI that is designed
> solely to run on 64bit capable Intel IA32 CPUs (that support EMT64) ?
> Userspace would have 32bit memory addressing, but access to more
> registers, better function call conventions, etc...
>
> This would be aimed to replace the existing i386/i686 ABIs on the
> desktop and would not be looking to maintain backward compatibility
> directly.
>
> My own anecdotal evidence is that I've been using a x86_64 distribution
> (with dual 64bit and 32bit runtime support) for a few years now and have
> found performance to be lacking in my two largest footprint applications
> (my browser and my development IDE totaling 5Gb of footprint between
> them).   I recently converted both these applications from their 64bit
> versions to 32bit (they are the only 32bit applications running) and the
> overall interactive performance has improved considerably possibly due
> to the reduced memory footprint alone, a 4.5 Gb footprint 64bit
> application is equivalent to a 2 Gb footprint 32bit application in these
> application domains.
>
> Maybe someone knows of a published white paper examining what the
> implications and benefits of a move in this direction would be.  Maybe
> just using the existing 64bit ABI with 32bit void pointers (and longs)
> is as good a specification as any.
>
> RFCs,
>
> Darryl

A more appropriate comparison is probably against MIPS,
with its two 32-bit ABIs (O32 and N32).
Essentially you're asking for an N32 equivalent.

My bet is that most people simply don't care enough about
the performance differential.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: df_insn_refs_record's handling of global_regs[]

2007-10-19 Thread Seongbae Park (박성배, 朴成培)
On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Fri, 19 Oct 2007 17:25:14 -0700
>
> > If you're not in a hurry, can you wait
> > till I run the regtest against 4.2 on x86-64 ?
> > I've already discussed the patch with Kenny
> > and we agreed that this is the right approach,
> > but I'd like to see the clean regtest on x86 for both 4.2 and 4.3
> > before I approve.
> > Thanks,
>
> I am in no rush, please let me know if you want some help
> tracking down the failure you are seeing.
>
> Since you say it is a libgomp failure... I wonder if some of
> the atomic primitives need some side effect markings which
> are missing and thus exposed by not clobbering global regs
> at call sites any more.

It looks like it's just a flaky test - it randomly fails on my test machine
with or without the patch (for the interested: it's omp_parse3.f90 with -O0).
I haven't started 4.2 testing yet - I'll let you know when I get that done.
-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";


Re: df_insn_refs_record's handling of global_regs[]

2007-10-19 Thread Seongbae Park (박성배, 朴成培)
On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> Date: Tue, 16 Oct 2007 22:56:49 -0700
>
> > Did you replace the DF_REF_REG_USE with DEF ?
> > If so, that's not correct.  We need to add DEF as well as USE:
>  ...
> > Then, we'll need to change the df_invalidated_by_call loop
> > not to add global_regs[] again (with MAY_CLOBBER bits).
>
> Seongbae-ssi, I've done full regstraps of mainline with the
> following patch on sparc-linux-gnu and sparc64-linux-gnu.

I've been testing this on x86-64 on top of 4.3,
and I see one regression in libgomp; I'm trying to find out
whether it's due to this patch or to some external cause.

> Do you mind if I check in this fix?  I would also like to
> pursue getting this installed on the gcc-4.2 branch as well,
> as I stated I've already done several regstraps of the 4.2
> backport of this fix as well.

If you're not in a hurry, can you wait
till I run the regtest against 4.2 on x86-64 ?
I've already discussed the patch with Kenny
and we agreed that this is the right approach,
but I'd like to see the clean regtest on x86 for both 4.2 and 4.3
before I approve.
Thanks,

Seongbae

> Thank you.
>
> 2007-10-18  Seongbae Park <[EMAIL PROTECTED]>
> David S. Miller  <[EMAIL PROTECTED]>
>
> * df-scan.c (df_get_call_refs): Mark global registers as both a
> DF_REF_REG_USE and a non-clobber DF_REF_REG_DEF.
>
> --- df-scan.c.ORIG  2007-10-18 16:56:19.0 -0700
> +++ df-scan.c   2007-10-18 16:56:50.0 -0700
> @@ -3109,18 +3109,22 @@
>   so they are recorded as used.  */
>for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
>  if (global_regs[i])
> -  df_ref_record (collection_rec, regno_reg_rtx[i],
> -NULL, bb, insn, DF_REF_REG_USE, flags);
> +  {
> +   df_ref_record (collection_rec, regno_reg_rtx[i],
> +  NULL, bb, insn, DF_REF_REG_USE, flags);
> +   df_ref_record (collection_rec, regno_reg_rtx[i],
> +  NULL, bb, insn, DF_REF_REG_DEF, flags);
> +  }
>
>is_sibling_call = SIBLING_CALL_P (insn);
>EXECUTE_IF_SET_IN_BITMAP (df_invalidated_by_call, 0, ui, bi)
>  {
> -  if ((!bitmap_bit_p (defs_generated, ui))
> +  if (!global_regs[ui]
> + && (!bitmap_bit_p (defs_generated, ui))
>   && (!is_sibling_call
>   || !bitmap_bit_p (df->exit_block_uses, ui)
>   || refers_to_regno_p (ui, ui+1,
> current_function_return_rtx, NULL)))
> -
>  df_ref_record (collection_rec, regno_reg_rtx[ui],
>NULL, bb, insn, DF_REF_REG_DEF, DF_REF_MAY_CLOBBER | flags);
>  }


Re: df_insn_refs_record's handling of global_regs[]

2007-10-22 Thread Seongbae Park (박성배, 朴成培)
Hi Dave,

On x86-64, no regression in 4.2 with the patch.
So both 4.2 and mainline patches are OK.

I'd appreciate it if you could add the testcase
- it's up to you whether to add it in a separate patch or with this patch.
Thanks for fixing it.

Seongbae

On 10/19/07, Seongbae Park (박성배, 朴成培) <[EMAIL PROTECTED]> wrote:
> On 10/19/07, David Miller <[EMAIL PROTECTED]> wrote:
> > From: "Seongbae Park (박성배, 朴成培)" <[EMAIL PROTECTED]>
> > Date: Fri, 19 Oct 2007 17:25:14 -0700
> >
> > > If you're not in a hurry, can you wait
> > > till I run the regtest against 4.2 on x86-64 ?
> > > I've already discussed the patch with Kenny
> > > and we agreed that this is the right approach,
> > > but I'd like to see the clean regtest on x86 for both 4.2 and 4.3
> > > before I approve.
> > > Thanks,
> >
> > I am in no rush, please let me know if you want some help
> > tracking down the failure you are seeing.
> >
> > Since you say it is a libgomp failure... I wonder if some of
> > the atomic primitives need some side effect markings which
> > are missing and thus exposed by not clobbering global regs
> > at call sites any more.
>
> It looks like it's just a flaky test - it randomly fails on my test machine
> with or without the patch (for the interested: it's omp_parse3.f90 with -O0).
> I haven't started 4.2 testing yet - I'll let you know when I get that done.


Re: A question about df

2007-10-24 Thread Seongbae Park (박성배, 朴成培)
On 10/24/07, Revital1 Eres <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> While testing a patch for the SMS I got an ICE which seems
> to be related to the fact we build def-use chains only
> and not use-def chains.  (removed in the following patch -
> http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01682.html)

> The problem arises when we delete an insn from the df that contains a
> use but do not update the def-use chain of its def, as we do not have
> the use-def chain to reach its def.  This later causes a problem when
> we try to dump the def-use chain of its def.

I'm sorry but I don't understand the description of the problem.
What do you mean by "dump" and what problem does this "dump" cause ?

> So, it seems that when asking for only def-use problem and later dump
> the function we should ask for use-def problem as well to avoid cases
> like the above.

The df chain dump routines are supposed to handle DU-only or UD-only cases
properly. If that's not the case, please send us a testcase
(and preferably file a bugzilla report). Thanks,

Seongbae


Re: A question about df

2007-10-24 Thread Seongbae Park (박성배, 朴成培)
On 10/24/07, Revital1 Eres <[EMAIL PROTECTED]> wrote:
> > > The problem arises when we delete an insn from the df that contains a
> > > use but do not update the def-use chain of its def, as we do not have
> > > the use-def chain to reach its def.  This later causes a problem when
> > > we try to dump the def-use chain of its def.
> >
> > I'm sorry but I don't understand the description of the problem.
> > What do you mean by "dump" and what problem does this "dump" cause ?
>
> By dump I mean printing the function including all the DU chain info
> (in TODO_dump_func at the end of the pass).  This causes a problem in
> our case because an insn with a use was deleted (in df_insn_delete)
> without unlinking it from the def-use chain of its def (because we
> cannot access the def using a use-def chain - we do not build it).
> Once we want to print the def's def-use chain we get an ICE.  Hope this
> explains the problem better.

I see. You're right. However, once an insn is deleted,
the chains are not correct anyway (e.g. deleting an insn
could expose a def to new uses that it didn't reach before),
so it seems incorrect to keep using the chains without rebuilding them.

Seongbae


Re: Designs for better debug info in GCC

2007-11-08 Thread Seongbae Park (박성배, 朴成培)
I think both sides are talking past each other, partly because they have
two different goals in mind. IMHO, there are two extremes when it comes
to so-called debugging of optimized code.
One camp wants full debuggability (let's call them the debuggability
crowd): they want to know the value of any valid program state anywhere,
to set a breakpoint anywhere, and to be able to change program state
anywhere, as if there were an assignment at the point where the debugger
stopped the program. This camp still wants better performance (like
everyone else), but they won't sacrifice debuggability for it,
because they rely on it.

The other camp is the performance crowd: they want the absolute best
performance, but they still want as much debug information as possible.
Most people fall into this camp, and this is what gcc has implemented.
This camp doesn't want the generated code changed just to get better
debugging information.

Of course, the real world is somewhere in between, but in practice most
people fall into the latter group (aka the performance crowd).
Alexandre's proposal would make it possible to make the debuggability
crowd happy, at some unknown compile-time/runtime cost and maintenance
cost.

Richard's proposal (from what I can understand) would make the
performance crowd happy, since it would be less costly to implement than
Alexandre's and would provide incrementally better debugging information
than we have now, but it doesn't seem that it would make the
debuggability crowd happy (or at least the extremists among them).

So I think the difference of opinion isn't so much about whether
Alexandre's proposal is good or bad, but about whether we aim to make
the debuggability crowd happy, the performance crowd happy, or both.
Ideally we should serve both groups of users, but there's a non-trivial
ongoing maintenance cost to having two different approaches.

So I'd like to ask both Alexandre and Richard whether each can satisfy
the other camp: Alexandre, whether he can tweak his proposal so that the
compile-time cost stays comparable to what we have now, with similar or
better debug information and a reasonable maintenance cost; and Richard,
whether his proposal can satisfy the debuggability crowd.
Of course, another possible position is to ignore the debuggability
crowd on the grounds that they are not important or numerous.
I personally think that would be a mistake, but you may disagree.

Seongbae

On 08 Nov 2007 12:50:17 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Alexandre Oliva <[EMAIL PROTECTED]> writes:
>
> > So...  The compiler is outputting code that tells other tools where to
> > look for certain variables at run time, but it's putting incorrect
> > information there.  How can you possibly argue that this is not a code
> > correctness issue?
>
> I don't see any point to going around this point again, so I'll just
> note that I disagree.
>
>
> > >> >> > We've fixed many many bugs and misoptimizations over the years
> > >> >> > due to NOTEs.  I'm concerned that adding DEBUG_INSN in RTL
> > >> >> > repeats a mistake we've made in the past.
> > >> >>
> > >> >> That's a valid concern.  However, per this reasoning, we might as well
> > >> >> push every operand in our IL to separate representations, because
> > >> >> there have been so many bugs and misoptimizations over the years,
> > >> >> especially when the representation didn't make transformations
> > >> >> trivially correct.
> > >>
> > >> > Please don't use strawman arguments.
> > >>
> > >> It's not, really.  A reference to an object within a debug stmt or
> > >> insn is very much like any other operand, in that most optimizer
> > >> passes must keep them up to date.  If you argue for pushing them
> > >> outside the IL, why would any other operands be different?
> >
> > > I think you misread me.  I didn't argue for pushing debugging
> > > information outside the IL.  I argued against a specific
> > > implementation--DEBUG_INSN--based on our experience with similar
> > > implementations.
> >
> > Do you remember any other notes that contained actual rtx expressions
> > and expected optimization passes to keep them accurate?
>
> No.
>
> > Do you think
> > we'd gain anything by moving them to a separate, out-of-line
> > representation?
>
> I don't know.  I don't see such a proposal on the table, and I don't
> have one myself, so I don't know how to evaluate it.
>
> Ian
>



-- 
#pragma ident "Seongbae Park, compiler, http://seongbae.blogspot.com";