Preventing link time optimization from inlining

2012-12-20 Thread Adam
Hi,

When using -flto is there a way to tell gcc to not inline a particular 
function? attribute noinline appears to have no effect. I am using gcc 4.7.2. 
The use case is for certain functions that cause optimization problems when 
they are inlined. An example is when the function is throwing a c++ exception. 

Thanks,
Adam


-pg problem

2007-01-02 Thread Adam Sulmicki


hello,
can someone help me understand what is going on?

are the -p and -pg options supposed to work?



# gcc -v
Using built-in specs.
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
--with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.1 20061011 (Red Hat 4.1.1-30)


the C fragment (from menu.c of mplayer)

#define BUF_MIN 128
#define BUF_MAX BUF_STEP*1024
int menu_init(char* cfg_file) {
  char* buffer = NULL;
  int bl = BUF_STEP, br = 0;
  int f, fd;
#ifndef HAVE_FREETYPE
  if(vo_font == NULL)
return 0;
#endif



# cc -c -I. -I.. -Wdeclaration-after-statement  -march=pentium2
-mtune=pentium2 -pipe -ffast-math -ggdb  -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/X11R6/include -pthread
-I/usr/include/kde/artsc -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -I/usr/include/freetype2 -o menu.o menu.c
-save-temps
cc: warning: -pipe ignored because -save-temps specified
cc: warning: -pipe ignored because -save-temps specified

# objdump -S -D menu.o -G -C -t -x | less

03a5 :

/// This will build the menu_defs list from the cfg file
#define BUF_STEP 1024
#define BUF_MIN 128
#define BUF_MAX BUF_STEP*1024
int menu_init(char* cfg_file) {
 3a5:   55  push   %ebp
 3a6:   89 e5   mov%esp,%ebp
 3a8:   83 ec 38sub$0x38,%esp
  char* buffer = NULL;
 3ab:   c7 45 e8 00 00 00 00movl   $0x0,0xffe8(%ebp)
  int bl = BUF_STEP, br = 0;
 3b2:   c7 45 ec 00 04 00 00movl   $0x400,0xffec(%ebp)
 3b9:   c7 45 f0 00 00 00 00movl   $0x0,0xfff0(%ebp)
  int f, fd;
#ifndef HAVE_FREETYPE
  if(vo_font == NULL)
return 0;
#endif


now repeat above with extra "-p" flag

# cc -c -I. -I.. -Wdeclaration-after-statement  -march=pentium2
-mtune=pentium2 -pipe -ffast-math -ggdb  -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -I/usr/X11R6/include -pthread
-I/usr/include/kde/artsc -I/usr/include/glib-2.0
-I/usr/lib/glib-2.0/include -I/usr/include/freetype2 -o menu.o menu.c
-save-temps -p
cc: warning: -pipe ignored because -save-temps specified
cc: warning: -pipe ignored because -save-temps specified

# objdump -S -D menu.o -G -C -t -x | less

/// This will build the menu_defs list from the cfg file
#define BUF_STEP 1024
#define BUF_MIN 128
#define BUF_MAX BUF_STEP*1024
int menu_init(char* cfg_file) {
 3aa:   55  push   %ebp
 3ab:   89 e5   mov%esp,%ebp
 3ad:   83 ec 38sub$0x38,%esp
 3b0:   e8 00 00 00 00  call   3b5 
3b1: R_386_PC32 .bss
  char* buffer = NULL;
 3b5:   c7 45 e8 00 00 00 00movl   $0x0,0xffe8(%ebp)
  int bl = BUF_STEP, br = 0;
 3bc:   c7 45 ec 00 04 00 00movl   $0x400,0xffec(%ebp)
 3c3:   c7 45 f0 00 00 00 00movl   $0x0,0xfff0(%ebp)
  int f, fd;
#ifndef HAVE_FREETYPE
  if(vo_font == NULL)
return 0;
#endif

now WTF is the line 3b0 ???
ideas?

----



--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


Re: -pg problem

2007-01-02 Thread Adam Sulmicki

On Tue, 2 Jan 2007, Ian Lance Taylor wrote:


Adam Sulmicki <[EMAIL PROTECTED]> writes:


 can someone help me understand what is going on?

 are the -p and -pg options supposed to work?


This question is not appropriate for gcc@gcc.gnu.org, which is a
mailing list for gcc developments.  It is appropriate for
[EMAIL PROTECTED]

-p does work on GNU/Linux systems.


  3b0:   e8 00 00 00 00  call   3b5 
 3b1: R_386_PC32 .bss


That is presumably the call to mcount which records the profiling
information.


except that :

# grep mcount menu.c
static int mcount = 0;


--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


Re: -pg problem

2007-01-02 Thread Adam Sulmicki

 can someone help me understand what is going on?

 are the -p and -pg options supposed to work?


This question is not appropriate for gcc@gcc.gnu.org, which is a
mailing list for gcc developments.  It is appropriate for
[EMAIL PROTECTED]


I apologize for this.


-p does work on GNU/Linux systems.


  3b0:   e8 00 00 00 00  call   3b5 
 3b1: R_386_PC32 .bss


That is presumably the call to mcount which records the profiling
information.


except that :

# grep mcount menu.c
static int mcount = 0;


Just to wrap things up. I have renamed all occurences of "mcount" in 
sources to something else, and the program does not crash anymore.


Thanks a lot for help!

--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


Re: running bprob.exp tests in a cross-testing environment

2007-01-02 Thread Adam Nemet
Ben Elliston <[EMAIL PROTECTED]> writes:
> I see a couple of solutions, but would like to discuss them here before
> working on a patch:
>
> 1. have the bprob.exp test driver create the appropriate directory
>tree on the target (and remove it when finished); or
>
> 2. set GCOV_PREFIX when running the test case so that the home
>directory on the target is prefixed.  The test harness would
>need to also prefix the .gcda filename when fetching the data
>file from the target.

If it is not too late I'd prefer the latter.  If I understand the
problem correctly the former would still fail if the test user is not
privileged enough to recreate the directory structure under /.

Thanks for working on this.

Adam


gcc, mplayer and profile (mcount)

2007-01-03 Thread Adam Sulmicki


Hello folks,

This is my last post on the subject of mcount.

I have spent a quite bit of time on this to find out
that the results of myserious crashes is the mcount variable.
(with help from Ian Lance Taylor).

I have reported the issue to both gcc and mplayer lists, and
from _BOTH_ lists I got reply that they do not feel it is their
problem and blamed the other package.

WELL, as far as I'm concerned the issue is solved, I know how to
fix it and works for me.

In spirit making OSS better, I took the extra effor to report
findings back to both lists. In reward I got flamed on both list.

The Issue:

Anyway, the scop is that once I try to compile mplayer with
-pg option (or -p) then it will start myseriously crash on
mplayer/libmenu/menu.c:menu_init().

It comes out that  in mplayer/libmenu/menu.c there's in use
"mcount" variable:

# grep mcount menu.c
static int mcount = 0;

however, once "-pg" or "-p" options are enabled,
gcc uses mcount _function_ for its internal profiling.

The results are mysterious crashes on menu_init().

Suggested solutions:

* Rename mcount in mplayer/libmenu/menu.c to something else.

* document mcount in gcc man page

* gcc prints warning.

* do nothing.

I can immagine that gcc developers might have intentionally left mcount() 
visible to user space so that user can replace gcc's mcount() with their 
own implementation.


Just FWIW, I don't care if anything will get done about it.

--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


Re: gcc, mplayer and profile (mcount)

2007-01-03 Thread Adam Sulmicki

On Wed, 3 Jan 2007, Ian Lance Taylor wrote:


I told you was to use the gcc-help mailing list, which was correct.



So this seems to be a bug in gcc: it should be calling _mcount.


It just that it is my impression that gcc list is more 
appropriate for gcc bugs than gcc-help.


I also did my best to answer your question.  In your original message 
you didn't mention that there was an mcount variable in your program, so 
it's not surprising that I wasn't able to identify the problem.


Yeah, my mistake, back then I did not know that gcc used mcount() for its 
own purposes.


On Wed, 3 Jan 2007, Andrew Haley wrote:


Trivia time: what is the longest delay between a bug being committed
to gcc before someone notices and a fix being committed?  This one is
eleven years and eight months.  I wonder if we have a record.


not gcc related, but here's my own record :

http://www.cs.unm.edu/~sulmicki/eax/patches/xterm/xterm2.diff

this bug as present in xterm ever since it ran on linux for the first 
time, up until 2001. No idea how long it is, but at least a decade.


--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


RE: gcc, mplayer and profile (mcount)

2007-01-03 Thread Adam Sulmicki

On Wed, 3 Jan 2007, Dave Korn wrote:

 Is that your idea of an apology?  Regardless of topicality there's no 
reasonable reading of Ian's words as a flame, they were entirely polite 
and well-measured, and you should withdraw your baseless accusation and 
say sorry rather than trying to rationalise it.


Fine.

I am sorry, and I apoligize.

Adam


--
Adam Sulmicki
http://www.eax.com  The Supreme Headquarters of the 32 bit registers


Calling a pure virtual function

2005-07-09 Thread Adam Nielsen
Hi all,

I was expecting the following code snippet to work, so am I doing
something wrong, or is there an issue with GCC?  I was under the
impression that this is allowed, according to
http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.1

It seems like GCC initially allows it as it starts to compile okay, but
then I get an undefined reference error from the linker (because it
seems to be actually calling Base::number(), which obviously won't work
as it's a pure virtual function.)

I was only able to try this with GCC 3.2.3 (i486 target) and 3.3.6
(djgpp/msdos target) and both versions give the same error.

Thanks,
Adam.

--

#include 

using namespace std;

class Base {
  public:
Base()
{
  cout << "This is class " << this->number();
}

virtual int number() = 0;
};

class One: virtual public Base {
  public:
int number()
{
  return 1;
}
};

int main(void)
{

  // Correctly fails stating Base is abstract
//  Base *b = new Base();

  // Won't compile giving undefined reference to Base::number()
  One *o = new One();

  return 0;
}


Re: Calling a pure virtual function

2005-07-09 Thread Adam Nielsen
> Roughly speaking, when number() is invoked, the object still has type
> Base (with a corresponding vtable).  One's constructor will change the
> type once the Base part has been constructed.

Aha yes, I didn't even think of that!  Thanks Florian and Lion for your
helpful answers!

> The following FAQ entry covers this:
> http://www.parashift.com/c++-faq-lite/strange-inheritance.html#faq-23.3

Hmm, I should've read just a little further... ;-)

It still makes me wonder whether GCC is reporting the correct error for
this mistake though, I would've expected a compiler error (something
along the lines of 'you can't call a pure virtual function') rather than
a linker error.  Especially as GCC should be able to tell at compile
time the base constructor is calling a pure virtual function.  I guess
it's treating the constructor like any other function, where this
behaviour would be permitted.

Either way, that solved the problem for me, so thanks again!

Cheers,
Adam.


Re: 64-bit on A64 running 32-bit

2005-07-12 Thread Adam Wood


On 12 Jul 2005, at 21:58, David Rasmussen wrote:

If I am running Windows XP or some regular 32-bit Linux  
distribution, can I still use gcc 3.4.x to compile a 64-bit A64  
version of my code?


I have a program that uses 64-bit integers heavily, and I would  
like it to utilize 64-bit mode in A64, even if this A64 is running  
a 32-bit OS.


Is this possible? Or is 32-bit and 64-bit entirely different  
"modes" on A64?


/David



They are different modes -- you need an OS that can transition to the  
64 bit modes, so that's Windows 64 bit or a 64-bit Linux  
distribution, for example.


--
Adam


MIPS: comparison modes in conditional branches

2005-12-04 Thread Adam Nemet
Hi,

I am trying to understand why the MIPS backend's handling of
comparison modes seems to be different from what rtl.texi says:

  The mode of the comparison operation is independent of the mode
  of the data being compared.  If the comparison operation is being tested
  (e.g., the first operand of an @code{if_then_else}), the mode must be
  @code{VOIDmode}.

In the MIPS backend conditional branches based on integer comparisons
are valid only if the mode of the comparison matches the mode of the
operands, e.g. in 64-bit mode (if_then_else (eq:DI (reg:DI 1)
(const_int 0)) ...).

This is relevant because combine could optimize this:

  (set (reg:DI 2) (and:DI (reg:DI 1) (const_int 1)))
  (set (reg:SI 3) (truncate:SI (reg:DI 2)))
  (set (pc) (if_then_else (eq:SI (reg:SI 3) (const_int 0)) ...))

into this:

  (set (reg:DI 2) (and:DI (regDI 1) (const_int 1)))
  (set (pc) (if_then_else (eq:SI (reg:DI 2) (const_int 0)) ...))

I think this is a valid transformation on MIPS64 too.  In fact I
believe that as long as we compare valid SImode or DImode values the
64-bit comparison should work fine.

If I remove the comparison modes from the conditional branch patterns
regression testing on mips64-elf does not show any problem (diff is
attached for reference).  When I however compare the assembly output
it is clear that some of the changes are incorrect.  This is
a simplified testcase illustrating the problem:

  f (long long a)
  {
if ((a & 0x) != 0)
  abort ();
  }
  
  long long a = 0x12345678LL;
  main ()
  {
f (a);
  }

What happens is that combine.c:simplify_comparison  noticing the
zero-extension transforms this:

  (ne:DI (and:DI (reg:DI 4) (const_int 0x)) (const_int 0))

into this:

  (ne:DI (reg:SI 4) (const_int 0))

And now we do allow comparison like this in conditional branches.

On targets where DI->SI mode-switch is not a nop I think the
transformation in simplify_comparison is incorrect.  We need an
explicit truncate here to enforce valid SI values:

  (ne:DI (truncate:SI (reg:DI 4)) (const_int 0)).

Now if I am correct and this last thing is really a bug then the
obvious question is whether it has anything to do with the more
restrictive form for conditional branches on MIPS64?  And of course if
I fix it then whether it would be OK to lift the mode restrictions in
the conditional branch patterns.

Thanks,
Adam

Index: mips.md
===
--- mips.md (revision 107758)
+++ mips.md (working copy)
@@ -4291,7 +4291,7 @@
 (define_insn "*branch_zero"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "comparison_operator"
+(match_operator 0 "comparison_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (const_int 0)])
 (label_ref (match_operand 1 "" ""))
@@ -4311,7 +4311,7 @@
 (define_insn "*branch_zero_inverted"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "comparison_operator"
+(match_operator 0 "comparison_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (const_int 0)])
 (pc)
@@ -4333,7 +4333,7 @@
 (define_insn "*branch_equality"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "equality_operator"
+(match_operator 0 "equality_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (match_operand:GPR 3 "register_operand" "d")])
 (label_ref (match_operand 1 "" ""))
@@ -4353,7 +4353,7 @@
 (define_insn "*branch_equality_inverted"
   [(set (pc)
(if_then_else
-(match_operator:GPR 0 "equality_operator"
+(match_operator 0 "equality_operator"
 [(match_operand:GPR 2 "register_operand" "d")
  (match_operand:GPR 3 "register_operand" "d")])
 (pc)


Re: MIPS: comparison modes in conditional branches

2005-12-06 Thread Adam Nemet
Jim Wilson writes:
> Yes, it looks like fixing the combiner problem would make it possible to 
> remove the mistaken mode checks.

Thanks very much, Jim.  I will work toward removing these then.

Adam


Re: [C++ Patch] Remove is_auto_or_concept, etc

2017-05-16 Thread Adam Butcher
Sorry for not getting back to your original post Paolo.  I haven't been 
picking up mails for a while.


On 2017-05-01 16:56, Jason Merrill wrote:
On Thu, Apr 27, 2017 at 2:02 PM, Paolo Carlini 
 wrote:

On 26/04/2017 12:32, Paolo Carlini wrote:


in 2013 (2013-09-16) Adam added two slightly obscure functions and I 
can't

find much around in terms of rationale, etc:

is_auto_or_concept (const_tree type)

type_uses_auto_or_concept (tree type)



I don't think they were there in the original 2009 version on the old 
lambdas branch -- but it stagnated somewhat and when I remade it against 
mainline (4.something) it was around the time that concepts were showing 
similar semantics (in terms of implicit/abbrievated templates).  I made 
a cardinal sin and introduced an overly-generic function name expecting 
that a future concepts implementation would need to trigger at the same 
point of the parse too.  I.e. if a concept name or 'auto' is seen we 
inject an implicit template parameter (or make the "plain" function into 
a template at that point).


That intent was not well documented or published (other than in the API 
name) and, since -fconcepts is now working without any calls to this 
function, it's clearly not been necessary or has been more naturally 
done in a different way.



The latter seems completely unused (it's meant for debugging 
purposes?);




Quite possibly for debugging though maybe some refactoring got rid of 
the need for it and neglected to bin it.




the former evidently simply forwards to is_auto, and we end up in the
front-end with uses of both, which in fact are equivalent, which 
seems
weird: IMHO, if they are actually equivalent in our implementation we 
should

clearly explain that in the comment and have only one. Or what?



... replying to myself, in practice we could do the below, which 
certainly
passes testing, and in fact now seems to me even more obvious than I 
thought

a couple of days ago...



Definitely OK to bin.  No point in having dead or confusing code; it's 
complicated enough as it is.  :)




Re: Code optimization with GCSE

2009-07-15 Thread Adam Nemet
Jean Christophe Beyler  writes:
> uint64_t foo (void)
> {
> return data[0] + data[1] + data[2];
> }
>
> And this generates :
>
> la  r9,data
> la  r7,data+8
> ldd r6,0(r7)
> ldd r8,0(r9)
> ldd r7,16(r9)
>
> I'm trying to see if there is a problem with my rtx costs function
> because again, I don't understand why it would generate 2 la instead
> of using an offset of 8 and 16.

You probably want to look at the RTL dumps.  This code should have been
expanded with the correct offsets (at least that is what happens on
MIPS).  I don't see how later passes would modify the code other than
removing 2 of the 3 "la rX, data" insns.

Adam


Re: Code optimization with GCSE

2009-07-16 Thread Adam Nemet
Jean Christophe Beyler writes:
> As we can see, all three are using the symbol_ref data before adding
> their offset. But after cse, we get this:
> 
> (insn 5 2 6 2 ex1b.c:8 (set (reg/f:DI 74)
> (const:DI (plus:DI (symbol_ref:DI ("data") )
> (const_int 8 [0x8] 71 {movdi_internal} (nil))
> 
> (insn 6 5 7 2 ex1b.c:8 (set (reg/f:DI 75)
> (symbol_ref:DI ("data") )) 71
> {movdi_internal} (nil))

You will have to debug CSE.  The actual replacment is done in this loop in
cse_insn:

  /* Terminate loop when replacement made.  This must terminate since
 the current contents will be tested and will always be valid.  */
  while (1)
{

Adam


Re: [lambda] Segmentation fault in simple lambda program

2009-07-29 Thread Adam Butcher
dded to the capture if it was a
+ * 'default capture' -- note that this would not be necessary if
+ * the stack-pointer variant was implemented -- since the layout
+ * would be known.
+ * Relayingout here might have nasty effect if one were to query
+ * sizeof *this from within the body -- would that even be
+ * possible -- *this would refer to the lambda or the enclosing
+ * class instance -- if there was one.???
+ *
+ * NOTE: I think this is a redefinition error; I'm just faking that
+ * it isn't by clearing the size node... need to use stack pointer
+ * method.  But that will likely bring its own issues -- not with
+ * class layout though.
+ */
+TYPE_SIZE (type) = NULL_TREE;
+finish_struct_1 (type);
+
 parser->in_function_body = saved_in_function_body;
 parser->num_template_parameter_lists = saved_num_template_parameter_lists;
 --parser->num_classes_being_defined;
#

The way 'default reference capture' is implemented on the lambda branch seems 
to be kind of reactive.  I would expect
that inheriting the stack somehow (maybe using just a stack pointer) would be 
better but without more investigation I
don't know if that is possible or how one would go about doing it.  There is 
also the problem that the lambda would
want its own stack also so its unlikely that you'd be able to simply drop it 
into the same frame as the enclosing
function (though it may not be any different to having a nested scope in a 
function in terms of identifier ref'ing). 
Here is a comment I added into the code as I was going through -- apologies for 
not rewriting!

gcc/cp/semantics.c:5308 (working copy)
/*
 * Rather than looping through the identifiers used in the scope of
 * the lambda and synthesizing individual captures of them, it would
 * be better to ref the stack frame as transparently as possible...
 * e.g. given just three ints i, j and k in scope:
 * Instead of transforming:
 *
 *[&] { i = 1; j = 2; k = 3; };
 *
 * into
 *
 *[&i,&j,&k] { i = 1; j = 2; k = 3; };
 *
 * and thus storing three pointers to int, transform it into:
 *
 *[sp=enclosing-stack-pointer] { var-from-stack(i,sp) = 1;
 *   var-from-stack(j,sp) = 2;
 *   var-from-stack(k,sp) = 3; };
 *
 * and only store one pointer.  This presumes that var-from-stack
 * can be achieved with no (or negligible) cost (i.e. no worse than
 * the individually captured version above.  Since std algorithms
 * generally take and return copies of the function object passed
 * in it is useful to keep them as small as possible.  One note,
 * are reference_closures still in the standard?
 *
 * I don't know if its possible but it may be possible to 'append'
 * the lambda's stack to the existing scope rather than creating a
 * new constrained scope -- from a logical point of view.  This
 * might be difficult if the compiler's design doesn't lend itself
 * to working like that.
 */

Sorry that this is a bit a of a brain dump.  I'm just finding my way around and 
there is obviously at least some
interest here.  Maybe one of you could try adding the finish_struct_1() call 
and see how much further you get.

Regards,
Adam




Re: [lambda] Segmentation fault in simple lambda program

2009-08-04 Thread Adam Butcher
Jason Merrill wrote:
> On 08/03/2009 09:36 PM, Adam Butcher wrote:
>> Thanks.  I haven't any copyright assignments on file -- this is my first 
>> dabbling with gcc and I've been doing it
>> mostly to experiment with C++ lambda support and non-standard extensions 
>> such as polymorphic lambda support.
>
> OK.  We'll need an assignment in order to incorporate your changes into
> GCC.  If you want to assign all present and future changes to the
> compiler, you could use the form
>
OK I've kicked that off.

>> I've been working in my own repository which I rebased earlier today from 
>> trunk (r150148).  I've attached the
>> rebase including the fix for generating correct copying of lambda classes 
>> (re-laying out after parsing body).

> Hmm, having the new changes folded into the old changes is kind of
> inconvenient, but I suppose I can deal.  It's moot anyway until your
> assignment is on file.
>
Yes sorry about that.  I appreciate the issue.  I had taken a branch of trunk 
and applied the lambda changes to it to
keep only lambda changes on my working branch (allowing simpler future 
rebaseing).  There were a number of things I
had to change to to get the lambda changes a) building and b) working with test 
programs.  Unfortunately I did all
this in one commit.  Not very helpful.  I will mail you my 'fixes' diff against 
the latest lambda branch head
independent of the rebase noise if you want.

> Incidentally, comments like
>
> /*
>   *
>   *
>   */
>
> aren't customary in GCC sources; that's just something John did that
> we'll need to clean up before the code goes into the trunk.  There are
> other formatting irregularities, some of which I've corrected.
>
Ah I see.  I was following the principle of following the nearest style!  I 
have reworked my changes in this style.

>> Experimenting with a working version and seeing it's issues will be useful 
>> to me.  To others to maybe.  With
>> concepts
>> gone from C++0x and being reworked for C++15(?) maybe support for 
>> polymorphic lambdas could be reintroduced? --
>> though
>> I'm sure its much too late for that and that its likely been around the buoy 
>> many times.  From what I have read I
>> got
>> the idea that the Callable concept was the primary reason for polymorphic 
>> lambdas not being accepted.
>
> I don't know what the reasoning was there, but people have been somewhat
> conservative about what usages of lambdas are allowed for fear of
> unforseen implementation issues.  Certainly having a working
> implementation would go a lot toward convincing people to allow it, even
> if it doesn't make it into C++0x.
>
Hopefully.  From my point of view the class generated by a lambda expression 
should be equivalent to something you
could write yourself -- aside from the single stack-pointer reference 
optimization which only a compiler could achieve
-- the class has a name, albeit invisible to the user (except in 
errors/warnings), and instances of should be useable
just as if there were user-defined functors.  I'm sure there are things I've 
overlooked but hopefully this
proof-of-concept will help to allay people's fears.

>> Implied template typename arguments via auto are not currently supported.  
>> The syntax parses but I haven't yet
>> synthesized the template arguments and therefore not replaced the auto's 
>> with them so it doesn't compile.
>
> Since templates work so differently from normal functions, I'm a little
> uncomfortable with the idea of templates that don't involve any template
> syntax, just the use of auto in the parameter list.  But I'm open to
> giving it a try, at least in the lambda context.  Maybe outside of
> lambda it could be used with a small template introducer...
>
I did think about auto as a shortcut for template typename outside of lambdas 
but I figured that was pushing it a bit
too far.  It wouldn't be obvious that a function declared in that way was 
actually a function template.  The benefit
of using it in lambdas is that the call operator is never 'seen', so whether 
its a template or not shouldn't affect
the caller providing the input arguments are compatible.

>> I'd welcome feedback on the syntax and semantics and of course any 
>> fundamental things I've missed by coming up with
>> such a syntax!
>>
>>  From a grammar point of view I've just expanded 
>> lambda-parameter-declaration to start with:
>>lambda-template-param-decl [opt]
>> where lambda-template-param-decl is:
>><  template-parameter-list>
>
> Makes sense.  I suppose

Re: [lambda] Segmentation fault in simple lambda program

2009-08-04 Thread Adam Butcher
Hi Jason,

Pending response from assign at gnu dot org, I've attached diffs made against 
your latest lambda head.  They are
cleaned up a little to from the previous diffs against trunk.  First is the 
re-layout fix, second is experimental
polymorphic lambda support.

Cheers,
Adam


0001-Re-laid-out-lambda-class-after-parsing-body.patch
Description: Binary data


0002-First-pass-polymorphic-lambda-support.patch
Description: Binary data


Re: [lambda] Segmentation fault in simple lambda program

2009-08-06 Thread Adam Butcher
Adam Butcher wrote:
>John Freeman wrote:
>>
>> I just inspected my code again.  The call to layout_class_type at the
>> beginning of finish_lambda_function_body at semantics.c:5241 was
>> intended to recalculate offsets to members in the case of default captures.
>>
>> Here is the complete order of the pertinent function calls (file
>> location has nothing to do with call order; I supply them to help the
>> reader to follow):
>>
>> finish_struct @ parser.c:6985
>>
>> cp_parser_lambda_body @ parser.c:6992
>> --> finish_lambda_function_body @ parser.c:7380
>> > layout_class_type @ semantics.c:5241
>>
>> finish_struct_1 @  I don't see this added yet.  I've checked out
>> revision 150396.
>>
>I think Jason's waiting for the formality of copyright assignment to be 
>finalized.   I attached my patches against the
>latest lambda branch head in the following mail if you want to try them out:
>  http://gcc.gnu.org/ml/gcc/2009-08/msg00058.html
>
I see you've committed the fix.  Great.  Much better to do the relayout in 
semantics.c where the previous layout stuff
was than in the parser.  I take you're point on it potentially being overkill 
but at least it means that user programs
that copy can work.

I guess this thread is done with now that the fix has been committed.  I should 
start another to discuss polymorphic
lambda experimentation and implicit template parameters.  BTW I have got the 
latter working now -- to a certain (read
limited and buggy) extent.

The 'implicit template parameter via auto' addition is literally a quick hack 
for me to investigate what functionally
needs to occur to achieve it -- the implementation is not pleasant by any means 
as yet.

I've attached my two diffs made against the latest lambda head.  First is 
explicit polymorphic lambda support via the
additional template parameter syntax, second is the very hacky 
'for-discovery-purposes-only' prototype for typename
inference.  The examples below demonstrate the supported syntaxes.

  1.  []  (T const& t, U u) { return t + u; }
  2.  [] (auto const& t, auto u) { return t + u; }
  3.  []  (T const& t, auto u) { return t + u; }

Currently for auto typename inference, cv-qualifiers (and other bits like 
attributes) are lost but I'll come to that
when I rewrite it all in light of what I have found out.  Just thought I'd 
share this functional-approximation to a
solution.  As a result of the aforementioned bug, although 1. and 3. produce 
effectively the same code, 2. ends up
being equivalent to:

  []  (__AutoT1& t, __AutoT2 u) { 
return t + u; }

rather than the expected:

  []  (__AutoT1 const& t, __AutoT2 u) 
{ return t + u; }

There's a number of things I'm not sure about regarding location of the 
implementation (parser.c, semantics.c, decl.c
etc).

One thing I'm worried about is that I'm using make_tree_vec() with a length one 
greater than that of the previous
vector in order to grow the template parameter list whilst parsing function 
arguments.  This seems inefficient and
ugly.  Not least as there seems to be no way to ditch the old tree-vec.  I can 
ggc_free it but that won't do any
housekeeping of the tree counts and sizes.  It looks like tree-vecs are only 
supposed to be alloc'd into the pool
(zone?) and never removed.  In this case, incrementally adding additional 
parameters, you get allocs like:

  [--tree-vec-1--]
  [--tree-vec-1--] [-+-tree-vec-2-+-]
  [--tree-vec-1--] [-+-tree-vec-2-+-] [-++-tree-vec-3-++-]

And all you want is the last one.  I appreciate that its probably done this way 
to avoiding full fragmentation
management but I'd expect this sort of thing may happen often (or maybe it 
shouldn't!).  Off the top of my head, one
solution would be to add tree_vec_resize() which would realloc the memory and 
update the counts/sizes iff the tree-vec
were the last in the list.  The fallback behaviour if it weren't the last would 
be to do the previous manual
make_tree_vec() behaviour.  Something like:

   [--tv1--]

tv1 = tree_vec_resize (tv1, TREE_VEC_LENGTH (tv1) + 1);

   [-+-tv1-+-]

tv1 = tree_vec_resize (tv1, TREE_VEC_LENGTH (tv1) + 1);

   [-++-tv1-++-]

tv2 = make_tree_vec (n)

   [-++-tv1-++-] [--tv2--]

tv1 = tree_vec_resize (tv1, TREE_VEC_LENGTH (tv1) + 1);

   [-++-tv1-++-] [--tv2--] [-+++-tv1-+++-]
   ^   ^
   | no longer |
   | referenced|

This seems to work optimally in the case of incremental addition to the last 
tree-vec -- you are left with the minimum
tree-vecs necessary to avoid full fragmentation handling, and it degrades to 
supports the effect of the manual method
of calling make_tree_vec (old_size + n).

Maybe a doubling size on realloc and starting at, say, 4 elements could make 
realloc'ing more efficient -- 

[lambda] Latest experimental polymorphic lambda patches

2009-08-10 Thread Adam Butcher
Attached are my latest experimental polymorphic lambda patches against the 
latest lambda branch.  Template parameters
implied by auto in the lambda's call parameter list no longer lose qualifiers.  
The following examples produce
equivalent functions:

   1.   [] (auto x, auto& y, auto const& z) { return x + y + z; }
   2.   []  (X x, Y& y, Z const& z) { 
return x + y + z; }
   3.   []  (auto x, Y& y, auto const& z) { return x + y + z; }

Note that using an explicit template parameter is only really useful if you 
wish to either a) ensure consistency among
a number of call parameters, b) use a non-type template parameter or c) specify 
a call parameter type constraint (or
other complex parameter type).

I have flattened the latest auto-template-inference changes with the previous 
one to remove complexity due to
re-arrangement.  In particular it no longer abuses decl.c -- those changes have 
currently moved into parser.c but it
is pretty clear that they have more in common with pt.c owing to a need to pull 
in some local static functions from
that file.  I intend to split or move parts (or most) into pt.c on my next 
attempt (if I get time to make one!)

There are currently many issues:
  - efficiency of template parameter list growing.
  - implicit return type deduction doesn't work from
inside a template context if the lambda's return
expression involves a dependent type.  Specifying
decltype explicitly in these contexts is a
workaround.
  - dependent inferred return type needs to be
deferred and decayed.  This may go some way (all
the way?) to solving the previous point.
  - location of implementation.
  - only a few use-cases have been considered.


Adam


Summary:

Subject: [PATCH] First pass polymorphic lambda support.

This commit adds experimental support for an optional template-parameter-list 
in angle-brackets at the start of a
lambda-parameter-declaration.
---
Subject: [PATCH] Restored decltype in lambda return type deduction for when 
expr is dependent.
---
Subject: [PATCH] Second version of typename inference from auto parameters in 
lambda call operator.

Still quite hacky -- though better than previous.  No longer loses
qualifiers on replaced auto parameters so is functionally closer to
what's really needed.

 - This is just a behavioural proof to find out how things work.
 - Need to shuffle some stuff into pt.c and do away with code dup.
 - Not sure how to free tree_vec's and tidy up the counts and sizes
   (looks like they're only intended to grow.)
 - Added `type_decays_to (non_reference (finish_decltype_type' as
   suggested by Jason.  Currently doesn't remove cv-quals from
   non-class types though.  Need to treat implicit return type
   differently for dependent types -- should defer and mark that it
   needs to be computed later.
---


0001-First-pass-polymorphic-lambda-support.patch
Description: Binary data


0002-Restored-decltype-in-lambda-return-type-deduction-fo.patch
Description: Binary data


0003-Second-version-of-typename-inference-from-auto-param.patch
Description: Binary data


Re: [lambda] Latest experimental polymorphic lambda patches

2009-08-11 Thread Adam Butcher
Thanks for the feedback.

Jason Merrill wrote:
>Adam Butcher wrote:
>> The following examples produce
>> equivalent functions:
>>
>>1.   [] (auto x, auto& y, auto const& z) { return x + y + z; }
>>2.   []  (X x, Y& y, Z const& z) { 
>> return x + y + z; }
>>3.   []  (auto x, Y& y, auto const& z) { return x + y + z; }
>
> IMO #3 should not be equivalent to the others; the auto template parms
> should come after Y.  And I believe that's currently the case with your
> patch.
>
Sorry, I wasn't clear.  I meant that they generate the same function from the 
user's point of view, not that their
internals are the same.  I didn't mean to suggest that the order of their 
template parameters would be the same.  It
was meant to demonstrate that using 'auto' and specifying an explicit unique 
typename are equivalent from a client
point-of-view.  You are correct that in #3's case the generated lambda is 
equivalent to:

   []  (__AutoT1 x, Y& y, 
__AutoT2 const& z) { return x + y + z; }

Which is, from the user's perspective, equivalent to the lambda functions 
defined by #1 and #2, just that the order of
the template arguments are different.

I accept that this does give a different function signature in terms of 
template parameter indexes.  I've assumed that
explicit specialization of the call operator is not useful and therefore the 
user would never see the final template
parameter list and would not need to understand its ordering.

> If you save up all the auto parms until the end and then assign indices
> and adjust the parm vector then, that will avoid reallocating the vector
> each time.
>
Yes that would be better.

> But don't worry about tidying tree_node_counts; it just tracks how many
> of a particular tree code we create, not how many we currently have.
> Normal GC throws away lots of stuff without adjusting the counts.
>
Ah okay.  Would it be worth enhancing the tree-vec interface to include block 
reallocation with size doubling and end
marking to allow for more efficient reallocation?  Such a structural change 
maybe hidden by the macro front-end.  I
wonder how many uses of make_tree_vec don't ggcfree their previous tree-vec 
when using it in a 'realloc' way.

A quick grep for 'make_tree_vec\>' in gcc reveals about 80 uses, many of which 
with an arithmetic expression as its
argument.  I wonder how many of these might benefit from such an allocation 
scheme and how many would be impaired by
it?

Maybe its an insignificant issue.

>> +  /* XXX: Maybe to loop rather than recurse here? */
>
> At -O2, the GCC optimizers should convert tail recursion into looping.
>
Great, no worries there then.

>> +  if (type_dependent_expression_p (expr))
>> +/* TODO: Should defer this until instantiation rather than using
>> + decltype.  */
>> +return_type = type_decays_to (non_reference (finish_decltype_type
>> +  (expr, /*id_expression_or_member_access_p=*/false)));
>
> Definitely need to defer it; type_decays_to and non_reference don't work
> on DECLTYPE_TYPE.
>
I thought as much -- I assume it's just trying to strip non-existent qualifiers 
from `decltype(expr)' which amounts to
a no-op.

Thanks again for the feedback.  I'll try to get deferred return type deduction 
working when I get some time.  Working
through that will probably end up sorting some of the issues with dependent 
return type deduction inside templates.

Cheers,
Adam




Re: Address as HImode when Pmode is QImode?

2009-08-13 Thread Adam Nemet
Markus L  writes:
> I run into an assert in convert_memory_address not beeing able to
> convert the address rtx (beeing HImode) into to Pmode (i.e. QImode). A
> few frames up the I can dump the tree node and it looks like the
> address calculations are done in HImode. Why is the address beeing
> calculated as unsigned long int and not as unsigned int which would be
> Pmode for my target? Is this expected (and my problem originates from
> elsewhere) or am I missing something obvious here?

What's your sizetype?  This could be related to:

http://gcc.gnu.org/ml/gcc/2008-05/msg00313.html

Adam


Re: IRA undoing scheduling decisions

2009-08-24 Thread Adam Nemet
"Charles J. Tabony"  writes:
> I see the same difference between GCC 4.3.2 and 4.4.0 when compiling
> for PowerPC, MIPS, ARM, and FR-V.

I can confirm this with mainline on MIPS/Octeon.  Can you please file a
bug.

Adam


Re: IRA undoing scheduling decisions

2009-08-25 Thread Adam Nemet
Charles J. Tabony writes:
> Filed as PR 41171.

Thanks.

> Is this actually a performance regression on MIPS?  I suspect either
> sequence would be the same performance on most machines.

Yes it is: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41171#c1

Adam


Using MEM_EXPR inside a call expression

2009-08-28 Thread Adam Nemet
On MIPS, PIC calls are indirect calls that need to be dispatched via an ABI
mandated register.  At expansion time we load the symbol into a pseudo then
expand the call.  There is a linker optimization that can turn these indirect
calls into direct calls under some circumstances.  This can improve branch
prediction (the real ABI requirement is that the PIC register is live on entry
to the callee).  To assist the linker we need to annotate the indirect call
with the function symbol.

Since the call is expanded early, during the various optimizations the meaning
of the call can change to no longer refer to the original function.  E.g.:

f (int i, int j)
{
  while (i--)
if (j)
  g ();
else
  h ();
}

You would hope that GCC would move the condition out of the loop and preset
the PIC register accordingly with only the indirect call in the loop body.
(No, this does not happen ATM.)

I've however seen this happening with cross-jumping.

AFAICT we have two options.  We either create a simple (local) dataflow in
md_reorg and annotate the calls with the result or set MEM_EXPR of the mem
inside the call to the function decl during expansion.

I have a patch for the latter (I used to do the former in our GCC).  It seems
to me that this perfectly fits with the definition of MEM_EXPR but I don't
think MEM_EXPR is ever used for mems inside calls.  In fact, I don't think any
of the MEM_ATTRS are meaningful in a call expression.

What's promising that cross-jumping treats them correctly.  As it merges
MEM_ATTRS it clears mismatching MEM_EXPRs no matter where the mem expression
is found in the insn.  And my patch bootstraps successfully.

So, is using MEM_EXPR for this a bad idea?

Adam


letter to GCC Steering Committee, Streamnovation Ltd.

2009-08-30 Thread Adam Rak
Dear GCC Steering Committee,

We are a forming company (StreamNovation Ltd.) from Hungary, and we
would like to ask your attitude about our plans. We are intending to
implement a plugin for GCC 4.5 which makes it possible to utilize the
GPU (graphics processing unit) semi-automatically (later
fully-automatically). We are planning to implement it as a free
software according to GPL, but we would also like to sell it in a box,
and get paid for personal support, guarantee and education.

The basic idea is to make the GPU kind of OpenMP (Open
Multi-Processing is an application programming interface (API) that
supports multi-platform shared memory multiprocessing programming in
C, C++). The GCC user marks the loop, and GCC creates a heterogeneous
binary code which runs the kernel on GPU, generated from the original
source code. We expect huge attention for this field in the next few
years. We think our plugin is in harmony with the GCC development
mission statement.

We are ready to follow any of your instructions according to GPL,
source code publicity, and legal compatibility with GPU manufacturers'
software.



Please let us know the conventions we have to follow and do not
hesitate to ask questions.

Best regards,

Ádám Rák

StreamNovation Ltd.

Práter u. 50/a. Budapest, H-1083 Hungary

Phone: +36 20 9677 199

Email: adam@streamnovation.com


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 08/28/2009 12:38 AM, Adam Nemet wrote:
> > ... To assist the linker we need to annotate the indirect call
> > with the function symbol.
> >
> > Since the call is expanded early...
> 
> Having experimented with this on Alpha a few years back,
> the only thing I can suggest is to not expand them early.
> 
> I use a combination of peep2's and normal splitters to
> determine if the post-call GP reload is needed, and to
> expand the call itself.

I see.  So I guess you're saying that there is little chance to optimize the
loop I had in my previous email ;(.

Now suppose we split late, shouldn't we still assume that data-flow can change
later.  IOW, wouldn't we be required to use the literal/lituse counting that
alpha does?

If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has the
benefit that it does not deem indirect calls as different when cross-jumping
compares the insns.  I don't know how important this is though.

Adam


Re: Using MEM_EXPR inside a call expression

2009-09-01 Thread Adam Nemet
Richard Henderson writes:
> On 09/01/2009 12:48 PM, Adam Nemet wrote:
> > I see.  So I guess you're saying that there is little chance to optimize the
> > loop I had in my previous email ;(.
> 
> Not at the rtl level.  Gimple-level loop splitting should do it though.
> 
> > Now suppose we split late, shouldn't we still assume that data-flow can 
> > change
> > later.  IOW, wouldn't we be required to use the literal/lituse counting that
> > alpha does?
> 
> If you split post-reload, data flow isn't going to change
> in any significant way.
> 
> > If yes then I guess it's still better to use MEM_EXPR.  MEM_EXPR also has 
> > the
> > benefit that it does not deem indirect calls as different when cross-jumping
> > compares the insns.  I don't know how important this is though.
> 
> It depends on how much benefit you get from the direct
> branch.  On alpha it's quite a bit, so we work hard to
> make sure that we can get one, if at all possible.

Thanks, RTH.

RichardS,

Can you comment on what RTH is suggesting?  Besides cross-jumping I haven't
seen indirect PIC calls get optimized much, and it seems that splitting late
will avoid the data-flow complications.

I can experiment with this but it would be nice to get some early buy-in.

BTW, I have the R_MIPS_JALR patch ready for submission but if we don't need to
worry about data-flow changes then using MEM_EXPR is not necessary.

Adam


Re: [LTO] Request for testing: Last merge from trunk before final merge

2009-09-30 Thread Adam Nemet
Diego Novillo  writes:
> If anyone has free cycles I would appreciate results from other
> ELF-capable targets.

The branch on mipsisa64-elf looks good (no regressions with languages
c,c++,objc):

  http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg02717.html

baseline:

  http://gcc.gnu.org/ml/gcc-testresults/2009-09/msg02777.html

In addition to what has already been reported we have these failures in
the new lto tests:

FAIL: g++.dg/lto/20081109-1 cp_lto_20081109-1_0.o-cp_lto_20081109-1_0.o link
FAIL: g++.dg/lto/20081118 cp_lto_20081118_0.o-cp_lto_20081118_1.o link
FAIL: g++.dg/lto/20081119-1 cp_lto_20081119-1_0.o-cp_lto_20081119-1_1.o link
FAIL: g++.dg/lto/20081120-1 cp_lto_20081120-1_0.o-cp_lto_20081120-1_1.o link
FAIL: g++.dg/lto/20081120-2 cp_lto_20081120-2_0.o-cp_lto_20081120-2_1.o link
FAIL: g++.dg/lto/20081123 cp_lto_20081123_0.o-cp_lto_20081123_1.o link
FAIL: g++.dg/lto/20081204-1 cp_lto_20081204-1_0.o-cp_lto_20081204-1_1.o link
FAIL: g++.dg/lto/20081219 cp_lto_20081219_0.o-cp_lto_20081219_1.o link
FAIL: g++.dg/lto/20090302 cp_lto_20090302_0.o-cp_lto_20090302_1.o link
FAIL: g++.dg/lto/20090313 cp_lto_20090313_0.o-cp_lto_20090313_1.o link
FAIL: gcc.dg/lto/20081120-1 c_lto_20081120-1_0.o-c_lto_20081120-1_1.o link
FAIL: gcc.dg/lto/20081120-2 c_lto_20081120-2_0.o-c_lto_20081120-2_1.o link
FAIL: gcc.dg/lto/20081204-1 c_lto_20081204-1_0.o-c_lto_20081204-1_1.o link
FAIL: gcc.dg/lto/20081212-1 c_lto_20081212-1_0.o-c_lto_20081212-1_0.o link

  /home/anemet/src/gcc-lto/mipsisa64-elf/gcc/../ld/ld-new:
  /home/anemet/src/gcc-lto/mipsisa64-elf/mipsisa64-elf/./libgloss/mips/crt0.o:
  relocation R_MIPS_HI16 against `hardware_hazard_hook' can not be used
  when making a shared object; recompile with -fPIC

FAIL: g++.dg/lto/20081109 cp_lto_20081109_0.o-cp_lto_20081109_1.o execute -O0 
-fwhopr

  terminate called after throwing an instance of 'int'

FAIL: g++.dg/lto/20081109 cp_lto_20081109_0.o-cp_lto_20081109_1.o link,  
(internal compiler error)
FAIL: g++.dg/lto/20090311-1 cp_lto_20090311-1_0.o-cp_lto_20090311-1_1.o link,  
(internal compiler error)
FAIL: gcc.dg/lto/20081112 c_lto_20081112_0.o-c_lto_20081112_1.o link,  
(internal compiler error)
FAIL: gcc.dg/lto/20081118 c_lto_20081118_0.o-c_lto_20081118_2.o link,  
(internal compiler error)

  lto1: internal compiler error: Segmentation fault
  Please submit a full bug report,
  with preprocessed source if appropriate.
  See  for instructions.

FAIL: g++.dg/lto/20090311 cp_lto_20090311_0.o-cp_lto_20090311_1.o link
FAIL: g++.dg/lto/20090311 cp_lto_20090311_0.o-cp_lto_20090311_1.o link,  
(internal compiler error)
FAIL: g++.dg/lto/20090106 cp_lto_20090106_0.o-cp_lto_20090106_0.o link
FAIL: g++.dg/lto/20090106 cp_lto_20090106_0.o-cp_lto_20090106_0.o link

  (It's strange that these have the same name.)

  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xbc): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND1Ev.1183.1258'
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xc4): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND0Ev.1236.1255'
  collect2: ld returned 1 exit status
  compiler exited with status 1
  output is:
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xbc): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND1Ev.1183.1258'
  cp_lto_20090106_0.wpa.ltrans.o:(.rodata+0xc4): undefined reference to 
`_ZThn8_N1
  
00_GLOBAL__N__home_anemet_src_gcc_lto_combined_gcc_testsuite_g__.dg_lto_20090106
  _0.C__36F381533LMND0Ev.1236.1255'
  collect2: ld returned 1 exit status


Re: LTO and the inlining of functions only called once.

2009-10-13 Thread Adam Nemet
Paul Brook  writes:

>> Nothing you've said changes  the fact that there are a class of users
>> for whom that information is vital and we ought to spend some time
>> thinking about how to provide the information in a form they can
>> digest.  GCC dumps as they exist today are largely useless for a non-GCC
>> developer to use to understand why a particular transformation did or
>> did not occur in their code.  This has come up time and time again and
>> will continue to do so unless we find a way to provide visibility into
>> the optimizer's decision making.
>
> My guess is anyone inspecting the code and optimizer decisions at this level 
> is also going to want to strongarm the result they want when the compiler 
> makes the "wrong" decision. i.e. detailed unroller diagnostics are only of 
> limited use without (say) #pragma unroll.

We would also increase the chances of getting more precise bug reports
rather than "my code is slower with GCC 4.5 than it was with GCC 4.4".
IOW, we could push some of the initial investigation work over to the
user :).

Also with VTA we will hopefully be in better shape referencing
source-level constructs.

Adam


Re: Slightly offtpic: why svn stubbornly refuses to listen to ctrl-c?!?

2010-01-15 Thread Adam Butcher
On Fri, January 15, 2010 1:43 pm, Paolo Carlini wrote:
>
> I mean, why a well designed application should refuse to listen to
> ctrl-c when something goes wrong? Why every time for some reason it gets
> stuck, I have to kill it from another shell? That's definitely annoying.
>
If you're on a posix-compatible have you tried using SIGQUIT (CTRL-\ or CTRL-4) 
instead of SIGINT?

Adam




Re: Slightly offtpic: why svn stubbornly refuses to listen to ctrl-c?!?

2010-01-15 Thread Adam Butcher
On Fri, January 15, 2010 3:57 pm, Dave Korn wrote:
> Adam Butcher wrote:
>> On Fri, January 15, 2010 1:43 pm, Paolo Carlini wrote:
>>> I mean, why a well designed application should refuse to listen to ctrl-c
>>> when something goes wrong? Why every time for some reason it gets stuck,
>>> I have to kill it from another shell? That's definitely annoying.
>>>
>> If you're on a posix-compatible have you tried using SIGQUIT (CTRL-\ or
>> CTRL-4) instead of SIGINT?
>
>   Or kill -9 of course, but beware; Vincent LeFevre reported sandboxes
> corrupted beyond anything 'svn cleanup' could repair in one of the links I
> posted in another reply.
>
Sure, this is a last resort.  I only mentioned SIGQUIT because its
'typeable' so you wouldn't have the annoyance of going to a new shell
(or suspending svn), finding the pid and killing it (or use killall).
That's providing svn doesn't implement the same best-effort cancellation
checkpoints for SIGQUIT also of course.

Cheers,
Adam




Re: GCC-How does the coding style affect the insv pattern recognization?

2010-01-17 Thread Adam Nemet
fanqifei  writes:
> Paolo Bonzini said that insv instruction might be synthesized
> later by combine. But combine only works on at most 3 instructions and
> insv is not generated in such case.
> So exactly when will the insv pattern be recognized and how does
> the coding style affect it?

Sorry for jumping in late.  See make_file_assigment in combine.c.

The problem usually is that:

 (set A (ior (and B C1) OTHER))

can only be turned into a bit-insertion if A and B happen to be the same
pseudos.

Adam


Re: GCC-How does the coding style affect the insv pattern recognization?

2010-02-01 Thread Adam Nemet
fanqifei writes:
> 2010/1/18 Adam Nemet :
> > Sorry for jumping in late.  See make_file_assigment in combine.c.
> >
> > The problem usually is that:
> >
> >  (set A (ior (and B C1) OTHER))
> >
> > can only be turned into a bit-insertion if A and B happen to be the same
> > pseudos.
> >
> > Adam
> >
> I did found such kind of pattern for some simple C statements in rtl dump.
> Unfortunately, A and B are not same.
> Is it possible and easy to move B to A firstly and replace B with A in the 
> insn?
> Anyway, this is not required if it's impracticable.

It might work.  You could experiment with a define_split to do that.  Note
however that in terms of combine's notion of profitability, you will need to
have at the end eliminated some instructions.  (Also the overall cost of the
the new insns should be lower than the old ones.)

Alternatively you can recognize the above pattern, use a contraint to force
the pseudos to be allocated to the same hard register.  If successful you can
generate bit-inseration otherwise you split the pattern into the original
insns after reload.

Adam


Re: gen_lowpart called where 'truncate' needed?

2010-02-03 Thread Adam Nemet
Ian Lance Taylor  writes:
> Mat Hostetter  writes:
>
>> Since the high bits are already zero, that would be less efficient on
>> most platforms, so guarding it with something like this would probably
>> be smarter:
>>
>>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
>> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
>>
>> I'm happy to believe I'm doing something wrong in my back end, but I'm
>> not sure what that would be.  I could also believe these are obscure
>> edge cases no one cared about before.  Any tips would be appreciated.
>
> Interesting.  I think you are in obscure edge case territory.  Your
> suggestion makes sense to me, and in fact it should probably be put
> into gen_lowpart_common.

FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
!TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
this case we always want a subreg not a truncate for better code.  So I
don't think gen_lowpart_common is the right place to fix this.

I think the right fix is to call convert_to_mode or convert_move in the
expansion code which ensure the proper truncation.

Adam


Re: gen_lowpart called where 'truncate' needed?

2010-02-04 Thread Adam Nemet
Mat Hostetter writes:
> Adam Nemet  writes:
> 
> > Ian Lance Taylor  writes:
> > > Mat Hostetter  writes:
> > >
> > >> Since the high bits are already zero, that would be less efficient on
> > >> most platforms, so guarding it with something like this would probably
> > >> be smarter:
> > >>
> > >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> > >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> > >>
> > >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> > >> not sure what that would be.  I could also believe these are obscure
> > >> edge cases no one cared about before.  Any tips would be appreciated.
> > >
> > > Interesting.  I think you are in obscure edge case territory.  Your
> > > suggestion makes sense to me, and in fact it should probably be put
> > > into gen_lowpart_common.
> > 
> > FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
> > !TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
> > MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
> > this case we always want a subreg not a truncate for better code.  So I
> > don't think gen_lowpart_common is the right place to fix this.
> >
> > I think the right fix is to call convert_to_mode or convert_move in the
> > expansion code which ensure the proper truncation.
> 
> That would yield correct code, but wouldn't it throw away the fact
> that the high bits are already known to be zero, and yield redundant
> zero-extension on some platforms?  I'm guessing that's why the code was
> originally written to call convert_lowpart rather than convert_to_mode.

convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION.

> And just to add a minor wrinkle: for the widen_bswap case, which
> produces (bswap64 (x) >> 32), the optimal thing is actually to use
> ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
> SIGN_EXTEND, and then call convert_lowpart as now.

Agreed but we sort of do this due to this pattern in mips.md:

(define_insn "*lshr32_trunc"
  [(set (match_operand:SUBDI 0 "register_operand" "=d")
(truncate:SUBDI
  (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
   (const_int 32))))]
  "TARGET_64BIT && !TARGET_MIPS16"
  "dsra\t%0,%1,32"
  [(set_attr "type" "shift")
   (set_attr "mode" "")])


That said with mode_rep_extended this optimization could now be performed by
the middle-end in simplify-rtx.c:

(truncate:SI (lshiftrt:DI .. 32))
->
(subreg:SI (ashiftrt:DI .. 32) )

if targetm.mode_rep_extended (SImod, DImode) == SIGN_EXTEND.

Adam


Re: Zero extractions and zero extends

2010-02-11 Thread Adam Nemet
Jean Christophe Beyler  writes:
> typedef struct sTestUnsignedChar {
> uint64_t a:1;
> }STestUnsignedChar;
>
> uint64_t getU (STestUnsignedChar a)
> {
> return a.a;
> }
>
>
> I get this in the DCE pass :
> (insn 6 3 7 2 bitfield2.c:8 (set (subreg:DI (reg:QI 75) 0)
> (zero_extract:DI (reg/v:DI 73 [ a ])
> (const_int 1 [0x1])
> (const_int 0 [0x0]))) 63 {extzvdi} (expr_list:REG_DEAD
> (reg/v:DI 73 [ a ])
> (nil)))
>
> (insn 7 6 12 2 bitfield2.c:8 (set (reg:DI 74)
> (zero_extend:DI (reg:QI 75))) 51 {zero_extendqidi2}
> (expr_list:REG_DEAD (reg:QI 75)
> (nil)))
>
>
> (on the x86 port, I get a and instead of the zero_extract)
>
> However, on the combine pass both stay, whereas in the x86 port, the
> zero_extend is removed. Where is this decided exactly ?
> I've checked the costs of the instructions, I have the same thing as
> the x86 port.

Maybe it is turned into an (and:DI .. (const_int 1)) and you don't
recognize that?  Check your combine dump file, that should tell you what
is the pattern that combine came up with while dealing with these two
insns.

Adam


FRAME_GROWS_DOWNWARD and expand_stack_vars

2008-12-07 Thread Adam Nemet
I am working on FRAME_GROWS_DOWNWARD for MIPS and I am seeing two performance
issues with the code generated.  The first one has to do with the order how
expand_stack_vars places locals on the stack.

Consider this function (simplified from CSiBE's
replaypc-0.4.0.preproc:find-GOPs):

  f ()
  {
char buf[32768];
int i;
g (&buf, &i);
  }

expand_stack_vars allocates variables in increasing order of their size.  So
with !FRAME_GROWS_DOWNWARD smaller variables are closer to the stack pointer
making it more likely that they can be accesses with base + offset addressing.

With FRAME_GROWS_DOWNWARD the opposite happens so computing the address of
variable i now takes multiple instructions.

To fix this we can change the allocation with FRAME_GROWS_DOWNWARD (see the
patch below).  However, I can see that on some targets for example if the hard
frame pointer is used and it points somewhere above or at the local variables
(unlike on MIPS) then you'd still want the original order.  So maybe this
should really be controlled by another backend knob.

Comments?

Adam

* cfgexpand.c (expand_stack_vars): Expand variables in decreasing
order of their size if FRAME_GROWS_DOWNWARD.

Index: cfgexpand.c
===
--- cfgexpand.c (revision 142249)
+++ cfgexpand.c (working copy)
@@ -901,7 +901,13 @@ expand_stack_vars (bool (*pred) (tree))
 {
   HOST_WIDE_INT offset;
 
-  i = stack_vars_sorted[si];
+  /* If the frame grows downward allocate larger object first, away from
+the stack pointer.  We have a better chance this way to access more
+locals with base + offset addressing.  */
+  if (FRAME_GROWS_DOWNWARD)
+   i = stack_vars_sorted[n - 1 - si];
+  else
+   i = stack_vars_sorted[si];
 
   /* Skip variables that aren't partition representatives, for now.  */
   if (stack_vars[i].representative != i)



Re: gcc -r

2008-12-09 Thread Adam Nemet
Ian Lance Taylor <[EMAIL PROTECTED]> writes:
> Ralf Corsepius <[EMAIL PROTECTED]> writes:
>
>> So what would you recommend: To use "gcc -r" or "gcc -Wl,-r" ?
>
> Ah, when you put the question like that, I would recommend "ld -r".
> This is the one case where you get no advantage from using the gcc
> driver to invoke the linker, and it can actually mess you up by adding
> unexpected files to the command line.

If a different linker emulation needs to be selected you're still better off
using gcc.  E.g., on linux64 n64 this:

  gcc -nostdlib -Wl,-r -mabi=64 ...

is probably easier to remember than this:

  ld -melf64btsmip -r ...

Adam


Re: gcc -r

2008-12-09 Thread Adam Nemet
Ralf Corsepius writes:
> So, my questions actually were aiming at
> 
> * whether
>  "gcc ... -nostdlib -r" 
> and 
> "gcc ... -nostdlib -Wl,-r"
> are equivalent 
> 
> * if the fact that "gcc -r" appears to work, can be exploited or whether
> this is a "random accident" and/or intentionally undocumented feature,
> which may go away at any time.

Right and Ian has already answered this.  I just wanted to add that AFAIC the
most portable way to specify a partial link is still through the gcc driver.

(Whether you use -Wl,-r or -r depends on your belief if this switch would ever
be used for something else in GCC.)

Adam


Missed bitfield packing?

2009-01-15 Thread Adam Nemet
struct s
{
  char a:4;
  char b:8;
  char c:4;
} __attribute__ ((packed))

is 3 bytes long because b gets pushed to the next byte boundary.

You would think that similarly:

struct t
{
  char  a:4;
  short b:16;
  char  c:4;
} __attribute__ ((packed))

would come out to be 4 bytes long.  However, that's not the case because here,
b gets bit-packed.

The reason for this behavior is that finish_struct does not propagate packed
to fields whose type has alignment <= BITS_PER_UNIT:

  for (x = fieldlist; x; x = TREE_CHAIN (x))
{
  if (TREE_TYPE (x) == error_mark_node)
continue;

  DECL_CONTEXT (x) = t;

  if (TYPE_PACKED (t) && TYPE_ALIGN (TREE_TYPE (x)) > BITS_PER_UNIT)
DECL_PACKED (x) = 1;

Is this intentional or an oversight?  Shouldn't the alignment check be
disregarded for DECL_BIT_FIELDs?

Also note that extend.texi seems to be pretty clear about packed bitfields:

  @item packed
  @cindex @code{packed} attribute
  The @code{packed} attribute specifies that a variable or structure field
  should have the smallest possible alignment---one byte for a variable,
  and one bit for a field, unless you specify a larger value with the
  @code{aligned} attribute.

Adam


Re: New GCC Runtime Library Exception

2009-01-27 Thread Adam Nemet
Gerald Pfeifer  writes:
> +January 27, 2008

2009 ;)

Adam


MIPS64 -msym32 and DWARF2_ADDR_SIZE

2009-01-30 Thread Adam Nemet
-msym32 changes DWARF's address_size from 64 bits to 32 bits.  This means that
while symbols are 64-bit (due to ELF64), target addresses in the debug info
are 32-bit.

There is support for this in DWARF of course in fact you can specify different
address_size for each compilation unit which nicely maps with -msym32 being
link-compatible with regular N64 objects.

However, this asymmetry exposed several bugs in binutils.  Also, as I just
discovered today, dwarfdump (libdwarf) has no support for changing the
address_size between compilation units and in fact derives address_size from
the ELF class (ELF64/ELF32).  (Obviously, that's a bug in libdwarf.)

So my question is whether the saving in the size of the debug info with
-msym32 is really worth the trouble here or should we just start generating
64-bit addresses with -msym32?

Adam


Re: MIPS64 -msym32 and DWARF2_ADDR_SIZE

2009-02-01 Thread Adam Nemet
Richard Sandiford writes:
> How about the patch below?  I'll apply it in the next couple of days
> if there are no objections.

Thanks for patch.  I also like the new comments you added.

Adam


Re: About -fstack-protector in mips

2009-02-05 Thread Adam Nemet
Yoriko Komatsuzaki  writes:
> Could you tell me why it doesn't work well around FRAME_GROWS_DOWNWARD
> on mips ?

I have a WIP patch for this but was holding back mostly because of stage3/4
and that I was trying to make FRAME_GROWS_DOWNWARD the default and I was
running into performance issues.

The patch below enables FRAME_GROWS_DOWNWARD with -mframe-grows-downward
(mostly for testing) and with -fstack-protector.

Adam

Index: config/mips/mips.opt
===
--- config/mips/mips.opt(revision 142249)
+++ config/mips/mips.opt(working copy)
@@ -283,3 +283,6 @@ Perform VR4130-specific alignment optimi
 mxgot
 Target Report Var(TARGET_XGOT)
 Lift restrictions on GOT size
+
+mframe-grows-downward
+Target Var(flag_frame_grows_downward) Init(1)
Index: config/mips/mips.c
===
--- config/mips/mips.c  (revision 142249)
+++ config/mips/mips.c  (working copy)
@@ -274,10 +274,10 @@ struct mips_frame_info GTY(()) {
   HOST_WIDE_INT gp_sp_offset;
   HOST_WIDE_INT fp_sp_offset;
 
-  /* The offset of arg_pointer_rtx from frame_pointer_rtx.  */
+  /* The offset of arg_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT arg_pointer_offset;
 
-  /* The offset of hard_frame_pointer_rtx from frame_pointer_rtx.  */
+  /* The offset of hard_frame_pointer_rtx from the bottom of the frame.  */
   HOST_WIDE_INT hard_frame_pointer_offset;
 };
 
@@ -8593,10 +8593,9 @@ mips_compute_frame_info (void)
 
   cfun->machine->global_pointer = mips_global_pointer ();
 
-  /* The first STARTING_FRAME_OFFSET bytes contain the outgoing argument
- area and the $gp save slot.  This area isn't needed in leaf functions,
- but if the target-independent frame size is nonzero, we're committed
- to allocating it anyway.  */
+  /* The first bytes contain the outgoing argument area and the $gp save slot.
+ This area isn't needed in leaf functions, but if the target-independent
+ frame size is nonzero, we're committed to allocating it anyway.  */
   if (size == 0 && current_function_is_leaf)
 {
   /* The MIPS 3.0 linker does not like functions that dynamically
@@ -8612,7 +8611,7 @@ mips_compute_frame_info (void)
   else
 {
   frame->args_size = crtl->outgoing_args_size;
-  frame->cprestore_size = STARTING_FRAME_OFFSET - frame->args_size;
+  frame->cprestore_size = MIPS_GP_SAVE_AREA_SIZE;
 }
   offset = frame->args_size + frame->cprestore_size;
 
@@ -8746,12 +8745,16 @@ mips_initial_elimination_offset (int fro
 
   mips_compute_frame_info ();
 
-  /* Set OFFSET to the offset from the soft frame pointer, which is also
- the offset from the end-of-prologue stack pointer.  */
+  /* Set OFFSET to the offset from the end-of-prologue stack pointer.  */
   switch (from)
 {
 case FRAME_POINTER_REGNUM:
-  offset = 0;
+  if (FRAME_GROWS_DOWNWARD)
+   offset = (cfun->machine->frame.args_size
+ + cfun->machine->frame.cprestore_size
+ + cfun->machine->frame.var_size);
+  else
+   offset = 0;
   break;
 
 case ARG_POINTER_REGNUM:
Index: config/mips/mips.h
===
--- config/mips/mips.h  (revision 142249)
+++ config/mips/mips.h  (working copy)
@@ -2035,14 +2035,20 @@ enum reg_class
 
 /* Stack layout; function entry, exit and calling.  */
 
+#define FRAME_GROWS_DOWNWARD (flag_frame_grows_downward || flag_stack_protect)
+
 #define STACK_GROWS_DOWNWARD
 
-/* The offset of the first local variable from the beginning of the frame.
-   See mips_compute_frame_info for details about the frame layout.  */
+#define MIPS_GP_SAVE_AREA_SIZE \
+  (TARGET_CALL_CLOBBERED_GP ? MIPS_STACK_ALIGN (UNITS_PER_WORD) : 0)
+
+/* The offset of the first local variable from the frame pointer.  See
+   mips_compute_frame_info for details about the frame layout.  */
 
-#define STARTING_FRAME_OFFSET  \
-  (crtl->outgoing_args_size\
-   + (TARGET_CALL_CLOBBERED_GP ? MIPS_STACK_ALIGN (UNITS_PER_WORD) : 0))
+#define STARTING_FRAME_OFFSET  \
+  (FRAME_GROWS_DOWNWARD\
+   ? 0 \
+   : crtl->outgoing_args_size + MIPS_GP_SAVE_AREA_SIZE)
 
 #define RETURN_ADDR_RTX mips_return_addr
 



Re: Constant folding and Constant propagation

2009-02-06 Thread Adam Nemet
Bernd Schmidt  writes:
> Take a look at reload_cse_move2add.

I don't think that powerful enough; it requires the same destination
registers:

  /* Try to transform (set (REGX) (CONST_INT A))
  ...
  (set (REGX) (CONST_INT B))
 to
  (set (REGX) (CONST_INT A))
  ...
  (set (REGX) (plus (REGX) (CONST_INT B-A)))

I think you really need the Joern's optmize_related_values patch.  Also see
PR33699.

Adam


Re: Constant folding and Constant propagation

2009-02-27 Thread Adam Nemet
"Rahul Kharche"  writes:
> GCSE won't help with your trimmed down example
>
> int main(void)
> {
> long a = 0xcafecafe;
>
> printf("Final: %lx %lx %lx\n", a, a+5, a+15);
> return EXIT_SUCCESS;
> }
>
> I believe Paolo's canon_reg solution together with tweaking
> rtx_cost of constants with outer code PLUS might help. Any
> luck with that pass?

As I understand Paolo's change is to move some of the functionality from cse.c
to a new LCM pass.  However cse.c does not handle the above yet.

I am actually looking at something similar for PR33699 for MIPS.  My plan is
to experiment extending cse.c with putting "anchor" constants to the available
expressions along with the original constant and then querying those later for
constant expressions.

Adam


Re: Constant folding and Constant propagation

2009-03-03 Thread Adam Nemet
Adam Nemet  writes:
> I am actually looking at something similar for PR33699 for MIPS.  My plan is
> to experiment extending cse.c with putting "anchor" constants to the available
> expressions along with the original constant and then querying those later for
> constant expressions.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00161.html.

Adam


Re: mips64*-*-linux multi arch handling

2009-03-04 Thread Adam Nemet
Thanks for measuring these.

Laurent GUERBY  writes:
> For bzip2 trunk regress badly in performance against 4.3.2
> but n32 is indeed relatively faster than 32 (still slower than 4.3.2).
> For gzip trunk matches 4.3.2 but n32 is slower than 32 and 64 even
> slower.

Last time I checked (on Octeon) gzip was completely dominated by the hash
table search of the deflation code.  The new ABIs shine when you have function
calls or 64-bit arithmetic, which I don't think is the case in gzip.

I might look at the 4.3.2->trunk regressions (if I can reproduce on Octeon).

Adam


Re: ARM compiler rewriting code to be longer and slower

2009-03-15 Thread Adam Nemet
Steven Bosscher  writes:
> On Sun, Mar 15, 2009 at 11:19 PM, Ramana Radhakrishnan
>  wrote:
>> I'm not sure about the best way to fix this but I've filed this for
>> the moment as
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39468
>
> This problem is reported every once in a while, all targets with small
> load-immediate instructions suffer from this, especially since GCC 4.0
> (i.e. since tree-ssa).  But it seems there is just not enough interest
> in having it fixed somehow, or someone would have taken care of it by
> now.
>
> I've summed up before how the problem _could_ be fixed, but I can't
> find where.  So here we go again.
>
> This could be solved in CSE by extending the notion of "related
> expressions" to constants that can be generated from other constants
> by a shift. Alternatively, you could create a simple, separate pass
> that applies CSE's "related expressions" thing in dominator tree walk.

See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for handling
something similar when related expressions differ by a small additive
constant.  I am planning to finish this and submit it for 4.5.

Adam


RE: ARM compiler rewriting code to be longer and slower

2009-03-16 Thread Adam Nemet
Ramana Radhakrishnan writes:
> [Resent because of account funnies. Apologies to those who get this twice]
> 
> Hi,
> 
> > > This problem is reported every once in a while, all targets with
> > small
> > > load-immediate instructions suffer from this, especially since GCC
> > 4.0
> > > (i.e. since tree-ssa).  But it seems there is just not enough
> > interest
> > > in having it fixed somehow, or someone would have taken care of it by
> > > now.
> > >
> > > I've summed up before how the problem _could_ be fixed, but I can't
> > > find where.  So here we go again.
> > >
> > > This could be solved in CSE by extending the notion of "related
> > > expressions" to constants that can be generated from other constants
> > > by a shift. Alternatively, you could create a simple, separate pass
> > > that applies CSE's "related expressions" thing in dominator tree
> > walk.
> > 
> > See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for
> > handling
> > something similar when related expressions differ by a small additive
> > constant.  I am planning to finish this and submit it for 4.5.
> 
> Wouldn't doing this in CSE only solve the problem within an extended basic
> block and not necessarily across the program ? Surely you'd want to do it
> globally or am I missing something very basic here ?

No, you're not.  There are plans moving some of what's in CSE to a new LCM
(global) pass.  Also note that for a global a pass you clearly need some more
sophisticated cost model for deciding when CSEing is beneficial.  On a
multi-scalar architecture, instructions synthesizing consts sometimes appear
to be "free" whereas holding a value a in a register for an extended period of
time is not.

Adam


Re: Constant folding and Constant propagation

2009-03-16 Thread Adam Nemet
Jean Christophe Beyler writes:
> I set up your patch and I get an internal error on this test program:

You're right.  I haven't handled the case properly when the constant itself
was an anchor constant (e.g. 0).  Try this version.

Adam


* cse.c (get_const_anchors): New function.
(insert_const_anchors): New function.
(cse_insn): Set src_related using anchor constants.  Insert
constant anchors into the table of available expressions.

* config/mips/mips.c (mips_rtx_costs): Make immediate-add even cheaper
than loading a simple constant into a register.

Index: gcc/cse.c
===
--- gcc.orig/cse.c  2009-03-08 12:16:56.0 -0700
+++ gcc/cse.c   2009-03-16 23:07:40.0 -0700
@@ -3961,6 +3961,55 @@ record_jump_cond (enum rtx_code code, en
 
   merge_equiv_classes (op0_elt, op1_elt);
 }
+
+#define TARGET_CONST_ANCHOR 0x8000
+
+/* Compute the upper and lower anchors for CST as base, offset pairs.  Return
+   NULL_RTX if CST is equal to an anchor.  */
+
+static rtx
+get_const_anchors (rtx cst, rtx *upper_base, HOST_WIDE_INT *upper_offs,
+  HOST_WIDE_INT *lower_offs)
+{
+  HOST_WIDE_INT n, upper, lower;
+
+  n = INTVAL (cst);
+  lower = n & ~(TARGET_CONST_ANCHOR - 1);
+  if (n == lower)
+return NULL_RTX;
+  upper = (n + (TARGET_CONST_ANCHOR - 1)) & ~(TARGET_CONST_ANCHOR - 1);
+
+  *upper_base = GEN_INT (upper);
+  *upper_offs = n - upper;
+  *lower_offs = n - lower;
+  return GEN_INT (lower);
+}
+
+/* Create equivalences between the two anchors of a constant value and the
+   corresponding register-offset expressions.  Use the register REG, which is
+   equivalent to the constant value CLASSP->exp.  */
+
+static void
+insert_const_anchors (rtx reg, struct table_elt *classp,
+ enum machine_mode mode)
+{
+  rtx lower_base, upper_base;
+  HOST_WIDE_INT lower_offs, upper_offs;
+  rtx lower_exp, upper_exp;
+  struct table_elt *celt;
+  rtx cst = classp->exp;
+
+  lower_base = get_const_anchors (cst, &upper_base, &upper_offs, &lower_offs);
+  if (!lower_base)
+return;
+  lower_exp = plus_constant (reg, -lower_offs);
+  upper_exp = plus_constant (reg, -upper_offs);
+
+  celt = insert (lower_base, NULL, HASH (lower_base, mode), mode);
+  insert (lower_exp, celt, HASH (lower_exp, mode), mode);
+  celt = insert (upper_base, NULL, HASH (upper_base, mode), mode);
+  insert (upper_exp, celt, HASH (upper_exp, mode), mode);
+}
 
 /* CSE processing for one instruction.
First simplify sources and addresses of all assignments
@@ -4595,6 +4644,67 @@ cse_insn (rtx insn)
}
 #endif /* LOAD_EXTEND_OP */
 
+  /* Try to express the constant using a register-offset expression using
+anchor constants.  */
+
+  if (!src_related && src_const && GET_CODE (src_const) == CONST_INT)
+   {
+ rtx lower_base, upper_base;
+ struct table_elt *lower_elt, *upper_elt, *elt;
+ HOST_WIDE_INT lower_offs, upper_offs, offs;
+
+ lower_base = get_const_anchors (src_const, &upper_base, &upper_offs,
+ &lower_offs);
+ if (lower_base)
+   {
+ lower_elt = lookup (lower_base, HASH (lower_base, mode), mode);
+ upper_elt = lookup (upper_base, HASH (upper_base, mode), mode);
+
+ /* Loop over LOWER_ELTs and UPPER_ELTs to find a reg-offset pair
+that we can use to express SRC_CONST.  */
+ elt = NULL;
+ if (lower_elt)
+   {
+ elt = lower_elt->first_same_value;
+ offs = lower_offs;
+   }
+ else if (upper_elt)
+   {
+ elt = upper_elt->first_same_value;
+ upper_elt = NULL;
+ offs = upper_offs;
+   }
+ while (elt)
+   {
+ if (REG_P (elt->exp)
+ || (GET_CODE (elt->exp) == PLUS
+ && REG_P (XEXP (elt->exp, 0))
+ && GET_CODE (XEXP (elt->exp, 1)) == CONST_INT))
+   {
+ rtx x = plus_constant (elt->exp, offs);
+ if (REG_P (x)
+ || (GET_CODE (x) == PLUS
+ && IN_RANGE (INTVAL (XEXP (x, 1)),
+  -TARGET_CONST_ANCHOR,
+  TARGET_CONST_ANCHOR - 1)))
+   {
+ src_related = x;
+ break;
+   }
+   }
+
+ if (!elt->next_same_value && upper_elt)
+   {
+ elt = upper_elt->first_same_value;
+ upper_elt = NULL;
+ 

fwprop and CSE const anchor opt

2009-04-02 Thread Adam Nemet
In order for my CSE const anchor patch to work I needed to drastically lower
the cost of immediate addition in the MIPS backend.  This was acceptable as a
proof of concept but not in general of course.

The problem is with "single-insn"/simple constants.  We would also like these
to use const anchors in the hope that the resulting expression would get
propagated into MEM expressions.  I was hoping that fwprop would do this
propagation for me.  However, since a single-insn constant load is cheaper
than an immediate addition (make sense), fwprop is free to propagate either
(1) into (2) or (2) into (3) here:

  (1) a <- C
  |
  +--> (2) b <- a + D
  ||
  |+--> (3) mem(b)
  |
  +--> (4) use(a)

Which one it does depends on which one it tries first.  Right now we go in
insn order so we'd do (1) to (2) preventing to do (2) to (3) later.

It seems to me that it would be preferable if fwprop tried propagating into
single uses first before trying to propagate into others.  This should improve
the chances of making more defs redundant.  With the example above, (2) would
be redundant after propagation.

Am I missing something?  If this sounds reasonable I can try to work out a
patch and see what happens.

Adam


Re: fwprop and CSE const anchor opt

2009-04-04 Thread Adam Nemet
Richard Sandiford writes:
> Adam Nemet  writes:
> > In order for my CSE const anchor patch to work I needed to drastically lower
> > the cost of immediate addition in the MIPS backend.  This was acceptable as 
> > a
> > proof of concept but not in general of course.
> >
> > The problem is with "single-insn"/simple constants.  We would also like 
> > these
> > to use const anchors in the hope that the resulting expression would get
> > propagated into MEM expressions.  I was hoping that fwprop would do this
> > propagation for me.  However, since a single-insn constant load is cheaper
> > than an immediate addition (make sense), fwprop is free to propagate either
> > (1) into (2) or (2) into (3) here:
> >
> >   (1) a <- C
> >   |
> >   +--> (2) b <- a + D
> >   ||
> >   |+--> (3) mem(b)
> >   |
> >   +--> (4) use(a)
> >
> > Which one it does depends on which one it tries first.  Right now we go in
> > insn order so we'd do (1) to (2) preventing to do (2) to (3) later.
> 
> Probably a dumb question, sorry, but is this only a problem for MIPS when
> C + D is in the range [0x8000, 0x]?  Or is the (1)->(2) substitution
> somehow succeeding in other cases?

Right and no.  I wasn't very precise saying single-insn constants; I should
have said single-insn constants produced with ORI using the zero register.
(The testcase in PR/33699 is really good :).)

Just to recap the other cases:

* Synthesizing multi-insns constants from const anchors make sense regardless
whether the constant is used as an address so I am adjusting the cost system
to make those stick independent of the context.  I still need to benchmark
this.

* For single-insn constants produced with ADDIU using the zero register, we
don't need const anchors because they can already be used in addresses
offsetting the zero register.

Adam


Re: fwprop and CSE const anchor opt

2009-04-05 Thread Adam Nemet
Richard Sandiford writes:
> Adam Nemet  writes:
> > * Synthesizing multi-insns constants from const anchors make sense 
> > regardless
> > whether the constant is used as an address so I am adjusting the cost system
> > to make those stick independent of the context.  I still need to benchmark
> > this.
> 
> Out of interest, what sort of changes did you need to make?  Do you care
> about the cases where rtx_costs is applied to a pattern that won't later
> be checked by recog (such as the calls from the expander)?  Or do you
> just care about the cases where rtx_costs is used to check whether a
> validate_change would be profitable?

I am glad you asked because I myself was going to ask you some questions in
this area ;).

I think the best way to formulate the profitability question for multi-insn
constants in CSE is this.  We found an expression using a const-anchor that is
equivalent to the REG_EQUAL note on the insn.  We go with the const-anchor
expression if it is cheaper than the REG_EQUAL expression.

Comparing the cost of the actual source in the insn against the const-anchor
expression would ignore the fact that the preceding instructions buiding up
the constants become redundant with the const-anchor expression.

I also have an alternative solution that feels more hackish and gives less
control to the backend but works with the current cost model.  For a
multi-insn constant if the current cost and the cost of the new expression are
equal we prefer the new expression.  This works for MIPS because AFAICT we
always replace a binary expression with another one of the same cost.

The problem with the first approach and I feel you were anticipating this with
your question is that multi-insn constants have a cost of zero in the MIPS
backend currently.  Or with the code from mips_rtx_costs:

case CONST_INT:
[...]
  /* When not optimizing for size, we care more about the cost
 of hot code, and hot code is often in a loop.  If a constant
 operand needs to be forced into a register, we will often be
 able to hoist the constant load out of the loop, so the load
 should not contribute to the cost.  */
  if (!optimize_size
  || mips_immediate_operand_p (outer_code, INTVAL (x)))
{
  *total = 0;
  return true;
}

That is really hard to beat for an immediate add, which is COSTS_N_INSNS (1).

Since you already offered :), can you please explain where the above code is
needed and whether we can refine the picture a little bit.  It definitely
feels like the above cost value is only true in certain contexts.  This might
be a naive comment but maybe somehow limiting the effect to that particular
context would be a good improvement.

Adam


Re: fwprop and CSE const anchor opt

2009-04-08 Thread Adam Nemet
Thank you very much.  This was very informative.

Richard Sandiford writes:
> If we have an instruction:
> 
> A: (set (reg Z) (plus (reg X) (const_int 0xdeadbeef)))
> 
> we will need to use something like:
> 
>(set (reg Y) (const_int 0xdead))
>(set (reg Y) (ior (reg Y) (const_int 0xbeef)))
> B: (set (reg Z) (plus (reg X) (reg Y)))
> 
> But if A is in a loop, the Y loads can be hoisted, and the cost
> of A is effectively the same as the cost of B.  In other words,
> the (il)legitimacy of the constant operand doesn't really matter.

My guess is that A not being a recognizable insn, this is relevant at RTL
expansion.  Is this correct?

> In summary, the current costs generally work because:
> 
>   (a) We _usually_ only apply costs to arbitrary instructions
>   (rather than candidate instruction patterns) before
>   loop optimisation.

I don't think I understand this point.  I see the part that the cost is
typically queried before loop optimization but I don't understand the
distinction between "arbitrary instructions" and "candidate instruction
patterns".  Can you please explain the difference?

>   (b) It doesn't matter what we return for invalid candidate
>   instruction patterns, because recog will reject them anyway.
> 
> So I suppose my next question is: are you seeing this problem with cse1
> or cse2?  The reasoning behind the zero cost might still be valid for
> REG_EQUAL notes in cse1.  However, it's probably not right for cse2,
> which runs after loop hoisting.

I am seeing it with both, so at least at cse2 we could do it with this.

> Perhaps we could add some kind of context parameter to rtx_costs
> to choose between the hoisting and non-hoisting cost.  As well as
> helping with your case, it could let us use the non-hoisting cost
> before loop optimisation in cases where the insn isn't going to
> go in a loop.  The drawback is that we then have to replicate
> even more of the .md file in rtx_costs.
> 
> Alternatively, perhaps we could just assume that rtx_costs always
> returns the hoisted cost when optimising for speed, in which case
> I think your alternative solution would be theoretically correct
> (i.e. not a hack ;)).

OK, I think I am going to propose this in the patch then.  It might still be
interesting to experiment with providing more context to rtx_costs.

> E.g. suppose we're deciding how to implement an in-loop multiplication.
> We calculate the cost of a multiplication instruction vs. the cost of a
> shift/add sequence, but we don't consider whether any of the backend-specific
> shift/add set-up instructions could be hoisted.  This would lead to us
> using multiplication insns in cases where we don't want to.
> 
> (This was one of the most common situations in which the zero cost helped.)

I am not sure I understand this.  Why would we decide to hoist suboperations
of a multiplication?  If it is loop-variant then even the suboperations are
loop-variant whereas if it is loop-invariant then we can hoist the whole
operation.  What am I missing?

Adam


cleanup tests failing on MIPS64

2009-04-10 Thread Adam Nemet
For two testresults now the cleanup tests are failing in both gcc and g++:

  http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg01031.html
  http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg00592.html

I waited for another testresults because there were some bug fixes in this
area after the eh changes.

Does somebody know what's going on?  I'll look at it otherwise.

Adam


Re: cleanup tests failing on MIPS64

2009-04-11 Thread Adam Nemet
Adam Nemet  writes:
> For two testresults now the cleanup tests are failing in both gcc and g++:
>
>   http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg01031.html
>   http://gcc.gnu.org/ml/gcc-testresults/2009-04/msg00592.html
>
> I waited for another testresults because there were some bug fixes in this
> area after the eh changes.
>
> Does somebody know what's going on?  I'll look at it otherwise.

I am not exactly sure what has exposed this but the bug seems to be old.
can_throw_external in except.c does not look at the branch delay slot (second
entry in a SEQUENCE) to determine whether the insn may throw or not.

In gcc.dg/cleanup-8.c for example after inlining fn3, the trapping store is
moved to the delay slot of abort, which is a nothrow function so we decide
that fn2 can't throw and then ultimately remove the eh region around fn1 in
fn0.

Adam


Re: cleanup tests failing on MIPS64

2009-04-11 Thread Adam Nemet
Adam Nemet  writes:
> I am not exactly sure what has exposed this but the bug seems to be old.
> can_throw_external in except.c does not look at the branch delay slot (second
> entry in a SEQUENCE) to determine whether the insn may throw or not.
>
> In gcc.dg/cleanup-8.c for example after inlining fn3, the trapping store is
> moved to the delay slot of abort, which is a nothrow function so we decide
> that fn2 can't throw and then ultimately remove the eh region around fn1 in
> fn0.

I forgot to mention that I started testing a patch.

Adam


Re: cleanup tests failing on MIPS64

2009-04-12 Thread Adam Nemet
John David Anglin writes:
> The same tests now fail on hppa.  This is PR 39651.  I'm fairly certain
> this was introduced by the following change:

I put this PR in the checkin that was just approved on gcc-patc...@.  Please
close the bug if it fixes the failures on hppa too.

Adam


Re: Forgetting return values

2009-05-28 Thread Adam Nemet
Jamie Prescott  writes:
> static inline int set_prop(char const *path, char const *name,
>   void const *data, int size)
> {
> int error;
>
> asm volatile ("int\t11\n\t"
>   : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
> "a3" (size));
>
> return error;
> }
>
> extern int calc(int);
>
> int proc(int i)
> {
> int j = calc(i);
>
> return set_prop(0, 0, &j, sizeof(int));
> }
...
>
> Why is the memory clobber required, and why GCC does not understand to
> sync the value to memory when passing the address to a function?

Because you never inform GCC that you will use the value at
address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.

Adam


Re: Forgetting return values

2009-05-28 Thread Adam Nemet
Jamie Prescott writes:
> > From: Adam Nemet 
> > Jamie Prescott writes:
> > > static inline int set_prop(char const *path, char const *name,
> > >   void const *data, int size)
> > > {
> > > int error;
> > >
> > > asm volatile ("int\t11\n\t"
> > >   : "=a0" (error): "a0" (path), "a1" (name), "a2" 
> > > (data),
> > > "a3" (size));
> > >
> > > return error;
> > > }
> > >
> > > extern int calc(int);
> > >
> > > int proc(int i)
> > > {
> > > int j = calc(i);
> > >
> > > return set_prop(0, 0, &j, sizeof(int));
> > > }
> > ...
> > >
> > > Why is the memory clobber required, and why GCC does not understand to
> > > sync the value to memory when passing the address to a function?
> > 
> > Because you never inform GCC that you will use the value at
> > address *NAME.  Try to use "m"(*name) rather than "a1"(name) in the asm.
> 
> That's 'data', not 'name'. But OK, got it. unfortunately, I cannot use "m" 
> since
> that value need to go into a specific register.
> Any other solution?

You can also have it *in addition* as an argument to the asm that's never
used:

asm volatile ("int\t11\n\t"
  : "=a0" (error): "a0" (path), "a1" (name), "a2" (data),
"a3" (size), "m"(*data));

after changing data's type into something that can be dereferenced.

Adam


Re: From regno to pseudo?

2009-06-03 Thread Adam Nemet
Steven Bosscher  writes:
> Is there a way to get the REG for a given regno?  I am building a
> register renumbering map that is just a pair of unsigned int
> , but I can't figure out how to get the REG for
> new_regno without remembering a pointer to it myself. Is there an
> easier/better way?

regno_reg_rtx in emit-rtl.c?

Adam


STRIP_NOPS and lower-precision/bit-field types

2009-06-04 Thread Adam Nemet
In this testcase:

  struct s
  {
unsigned long a:2;
unsigned long b:40;
unsigned long c:22;
  };
  
  f (struct s s)
  {
g (((unsigned long) (s.b-8)) + 8);
  }

fold breaks the code.  Specifically, after a few transformations we're asked
to fold this.  (I changed the big constant to be in symbolic form for better
understanding):

 D.40719;
   (uint64_t) (D.40719 + ((1<<40) - 8)) + 8

The cast is removed by STRIP_NOPS so with explicit precision and signedness on
the operations we have (the constants are of types of the corresponding
operation):

   (D.40719 +:u40 ((1<<40) - 8)) +:u64 8

split_trees/associate_trees code combines the two constant resulting in:

  (uint64_t) D.40719 +:64 1<<40

Ouch.

The problem is obviously that we remove a zero-extension that would ensure the
modulo-2^40 nature of the addition (remove the overflow).

I see two ways to fix this:

1. Change STRIP_NOPS not to remove the cast if it's changing the precision
(not just when it changes machine modes).  And then audit code quality where
we need to and are allowed to look through these casts.

The above case does not qualify because the first operation may not overflow
into the >40 bits whereas the second may.  For example with 2 as s.b the
substraction should yield 0xff,,fffa and the addition 0x100,,0002.
(Note that there in *no* default integer promotion on bitfield types wider
than int [1].)

2. Just fix this particular case of folding (split_trees/associate_trees) and
hope everything else works ;).

I am for 1, FWIW, but I know this issue has some history [2], so I'd like to
hear others' opinion before I started working on a patch.

Adam

[1] http://gcc.gnu.org/ml/gcc/2008-01/msg00215.html
[2] http://gcc.gnu.org/ml/gcc-patches/2004-07/msg01478.html


Re: Bootstrap failures on solaris

2009-06-09 Thread Adam Nemet
Revital1 Eres  writes:
> Hello,
>
>> i386-pc-solaris:
>>
>> cc1: warnings being treated as errors
>> /export/home/arth/gnu/gcc.git/gcc/tree-ssa-loop-prefetch.c: In function
>> 'loop_prefetch_arrays':
>> /export/home/arth/gnu/gcc.git/gcc/tree-ssa-loop-prefetch.c:1589:7: error:
>
>> format '%ld' expects type 'long int', but argument 5 has type 'long long
> int'
>> make[3]: *** [tree-ssa-loop-prefetch.o] Error 1
>> make[3]: *** Waiting for unfinished jobs
>
> I get this error on ppc also.

See http://gcc.gnu.org/ml/gcc-patches/2009-06/msg00712.html

Adam


Re: increasing the number of GCC reviewers

2009-06-09 Thread Adam Nemet
Andrew Haley  writes:
> We need something more like "I think Fred Bloggs knows gcc well enough
> to approve patches to reload" or "I am Fred Bloggs and I know gcc well
> enough to approve patches to reload."

And whom should such email be sent to?  The SC is best reached on gcc@
but I don't think that recommending someone publicly is necessarly a
good idea.  E.g. what if the SC does not appoint the person; does that
mean that the SC decided that he or she was not qualified enough?

IMO the best way would be to nominate someone to the SC directly and
then if the SC decides to support the nomination they can check with the
person if he or she would accept the appointment.

Adam


Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
I am trying to understand the checkin by Jeff Law from about 11 years ago:

   r19204 | law | 1998-04-14 01:04:21 -0700 (Tue, 14 Apr 1998) | 4 lines
   
   
   * combine.c (simplify_rtx, case TRUNCATE): Respect value of
   TRULY_NOOP_TRUNCATION.
   
   
   Index: combine.c
   ===
   --- combine.c(revision 19018)
   +++ combine.c(revision 19204)
   @@ -3736,7 +3736,9 @@ simplify_rtx (x, op0_mode, last, in_dest
  if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
break;

   -  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
   +  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT
   +  && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
   +GET_MODE_BITSIZE (GET_MODE (XEXP (x, 0)
SUBST (XEXP (x, 0),
   force_to_mode (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
  GET_MODE_MASK (mode), NULL_RTX, 0));

This optimization simplifies the input to a truncate by only computing bits
that won't be eliminated by the truncation.  Normally these are the bits in
the output mode mask.  Note that the optimization does not change the truncate
into a low-part subreg, which would pretty automatically warrant the
TRULY_NOOP_TRUNCATION check.

Specifically I am curious if this was prompted by MIPS (or SH) or maybe some
other target that have a notion of truncate different from MIPS.

force_to_mode was broken WRT truncate before my patch in 2006:

  http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00553.html

so maybe the Jeff's patch was just trying to avoid the call to force_to_mode.

Anyhow, I don't think that the above transformation is illegal for MIPS64.

When truncating to SI mode the bits outside the resulting SI mode will all be
copies of the SI mode sign bit so their input values are obviously irrelevant.

For QI and HI modes the representation requires the upper 32 bits in the
64-bit word to be copies of the 31st bit.  The transformation might change the
31 bits so it can affect the the resulting bits in the 64-bit word.  This
however should have no visible effect.  Operations on QI- and HI-mode values
are performed in SI mode with proper extensions before the operation if bits
outside the mode would affect the result.

I also tried a few things after removing the above check.  Boostrapping and
regtesting on mips64octeon-linux was successful so was regtesting on
mipsisa64r2-elf.  The assembly output of gcc.c-torture/execute/*.c improved
as:

  13 files changed, 73 insertions(+), 99 deletions(-)

The changes typically remove zero and sign-extensions before a subsequent
truncation.

Any further information on this subject would be very much appreciated.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Ian Lance Taylor writes:
> I agree that this patch looks wrong in todays compiler.  There should be
> no need to call TRULY_NOOP_TRUNCATION if you are in a TRUNCATE anyhow.

Thanks.

Do you think we can assume this for TRUNCATEs in general or only for MIPS-like
TRUNCATEs?

I can't think of why it would be useful to represent a mode so that bits
outside the mode mask don't either depent on bits inside the mask or are
don't-care bits.  IOW, can we assume that for any TRUNCATE from wider mode W
to narrower mode N the following holds:

  (truncate:N expr:W) == (truncate:N (and:W expr:W GET_MODE_MASK(Nmode)))

?  Where == is not necessarily identical bit representation of the object
holding the value (e.g. QI HI values in MIPS) but that they are
indistinguishable in the operations that are defined on them.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Jeff Law writes:
> Ian Lance Taylor wrote:
> > Adam Nemet  writes:
> >
> >   
> >> I am trying to understand the checkin by Jeff Law from about 11 years ago:
> >>
> >>r19204 | law | 1998-04-14 01:04:21 -0700 (Tue, 14 Apr 1998) | 4 lines
> >>
> >>
> >>* combine.c (simplify_rtx, case TRUNCATE): Respect value of
> >>TRULY_NOOP_TRUNCATION.
> >>
> >>
> >>Index: combine.c
> >>===
> >>--- combine.c   (revision 19018)
> >>+++ combine.c   (revision 19204)
> >>@@ -3736,7 +3736,9 @@ simplify_rtx (x, op0_mode, last, in_dest
> >>   if (GET_MODE_CLASS (mode) == MODE_PARTIAL_INT)
> >>break;
> >> 
> >>-  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
> >>+  if (GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT
> >>+ && TRULY_NOOP_TRUNCATION (GET_MODE_BITSIZE (mode),
> >>+   GET_MODE_BITSIZE (GET_MODE (XEXP 
> >> (x, 0)
> >>SUBST (XEXP (x, 0),
> >>   force_to_mode (XEXP (x, 0), GET_MODE (XEXP (x, 0)),
> >>  GET_MODE_MASK (mode), NULL_RTX, 0));
> >>
> >> This optimization simplifies the input to a truncate by only computing bits
> >> that won't be eliminated by the truncation.  Normally these are the bits in
> >> the output mode mask.  Note that the optimization does not change the 
> >> truncate
> >> into a low-part subreg, which would pretty automatically warrant the
> >> TRULY_NOOP_TRUNCATION check.
> >> 
> >
> > I agree that this patch looks wrong in todays compiler.  There should be
> > no need to call TRULY_NOOP_TRUNCATION if you are in a TRUNCATE anyhow.
> >   
> Based on reviewing my old notes, we do have to ensure that combine 
> doesn't replace a TRUNCATE with a SUBREG as that can result in having a 
> 32bit value that isn't sign-extended, which clearly causes MIPS64 ports 
> grief.

Thanks for the long information in your other reply.

As I said in the original email, we are not turning a TRUNCATE into a SUBREG
in this transformation.  We're just simplifying the input expression to
truncate with the knowledge that only the truncated mode bits are relevant
from the input.  At the end we are still left with a truncate expression but
possibly with a simpler operand.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-16 Thread Adam Nemet
Ian Lance Taylor writes:
> truncate has a machine independent meaning.

Yes, I guess with your definition below it does.  It's interesting though that
Jim had said the opposite in the excerpts posted by Jeff:

  And a later message from Jim:
  
  Truncate converts a value from a larger to a smaller mode in a machine
  dependent way.
  
  A subreg just chops off the high bits.  A truncate does not.  The name might
  be a little confusing, but the whole point of truncate is to have something
  that operates differently than a subreg.
  
  Combine is clearly making an invalid transformation.
 
> Yes.  The bits in Nmode's mask are determined by the truncate.  The
> other bits are don't-care.  If the result of the truncate happens to
> wind up in a register, then in some cases PROMOTE_MODE will apply.

And IIUC this don't-care nature of the other bits that allows backends to
define the upper bits.  For example to have sign-bit copies there in registers
to enforce the MIPS64 SI mode representation.  And treating the don't care
bits outside SI mode in this way is true for any other SI-mode operations
performed on registers not just truncate, right?  Hmm, nice.

Adam


Re: Rationale for an old TRUNCATE patch

2009-06-17 Thread Adam Nemet
Ian Lance Taylor writes:
> I'm not entirely sure, but I don't think Jim said the opposite.  He said
> that the way truncate works is machine dependent.  I said that the
> output of truncate is machine independent.  Since truncate is only
> defined for fixed-point modes, I think both statements are true.

OK but in that way every operation is machine dependent not just truncate.
BTW, why is being fixed-point relevant here?

>From that little excerpt I just gathered that maybe my misunderstanding of
treating truncate as a blackbox was not completely without precedence.  But
anyway I think I understand now.  JTBS, can you agree with other statement in
my email?:

> And IIUC this don't-care nature of the other bits that allows backends to
> define the upper bits.  For example to have sign-bit copies there in registers
> to enforce the MIPS64 SI mode representation.  And treating the don't care
> bits outside SI mode in this way is true for any other SI-mode operations
> performed on registers not just truncate, right?  Hmm, nice.

Thanks for all the explanations.

Adam


Re: Phase 1 of gcc-in-cxx now complete

2009-06-26 Thread Adam Nemet
Ian Lance Taylor  writes:
> I would like to encourage people to try using --enable-build-with-cxx in
> other configuration--other bootstraps, cross-compilers--to see how well
> it works.  Please let me know if you run into problems that you don't
> know how, or don't have time, to fix.

With GMP 4.2.1 build fails like this:

g++ -mabi=n32 -c  -g -DIN_GCC   -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I../../src/gcc 
-I../../src/gcc/. -I../../src/gcc/../include -I../../src/gcc/../libcpp/include  
-I../../src/gcc/../libdecnumber -I../../src/gcc/../libdecnumber/dpd 
-I../libdecnumber
../../src/gcc/c-lang.c -o c-lang.o
In file included from ../../src/gcc/double-int.h:24,
 from ../../src/gcc/tree.h:30,
 from ../../src/gcc/c-lang.c:27:
//usr/include/gmp.h:515: error: 'std::FILE' has not been declared
make[3]: *** [c-lang.o] Error 1
make[3]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux/gcc'
make[2]: *** [all-stage1-gcc] Error 2
make[2]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux'
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory `/mnt/src/gcc-tmp/mips64octeon-linux'
make: *** [all] Error 2

We need at least GMP 4.2.3 with --enable-build-with-cxx:

>From <http://gmplib.org/gmp4.2.html>:

Changes in GMP 4.2.3
Bugs:

...
* For C++, gmp.h now includes cstdio, improving compiler
compatibility. 
...

Adam


Re: Phase 1 of gcc-in-cxx now complete

2009-06-27 Thread Adam Nemet
Ian Lance Taylor  writes:
> I would like to encourage people to try using --enable-build-with-cxx in
> other configuration--other bootstraps, cross-compilers--to see how well
> it works.  Please let me know if you run into problems that you don't
> know how, or don't have time, to fix.

MIPS bootstraps fine with --enable-build-with-cxx:

  http://gcc.gnu.org/ml/gcc-testresults/2009-06/msg02323.html

I don't know if the new failures are related to C++; I will do a C build
later and compare.

Ian, thanks for your C++ work!

Adam


How is the definition of stack canary on MIPS arch?

2010-08-30 Thread Adam Jiang
Hello,

How is the definition of stack canery on MIPS arch? Where can I find a
specific on this topic?

When I read the source in Linux kerne, it was said that stack canary for
implementing stack protector is defined as an offset to %gs on x86
architecture. How about stack canary defined on MIPS?

Best regards,
/Adam



Re: How is the definition of stack canary on MIPS arch?

2010-08-30 Thread Adam Jiang
On Mon, Aug 30, 2010 at 10:43:44AM -0700, David Daney wrote:
> On 08/30/2010 09:46 AM, Richard Henderson wrote:
> >On 08/30/2010 03:45 AM, Adam Jiang wrote:
> >>When I read the source in Linux kerne, it was said that stack canary for
> >>implementing stack protector is defined as an offset to %gs on x86
> >>architecture. How about stack canary defined on MIPS?
> >
> >It's not implemented for MIPS.
> >
> >
> 
> For the Linux kernel, the MIPS stack canary would be a constant
> offset (that depends on PAGE_SIZE) from register $28.
> 
> David Daney

Thanks, David and Richard.

Is there code, doc or anything on this topic I can refer to? Is it
defined in gcc internally or in kernel source itself? Would you please
redirect me to the right place?

I have done some research by googling. Here are what I've gotten.

http://www.trl.ibm.com/projects/security/ssp/main.html
http://www.trl.ibm.com/projects/security/ssp/
http://lxr.linux.no/linux+v2.6.35/arch/x86/include/asm/stackprotector.h

However, it seems there is no documents about how this is done on MIPS.
Do I miss something?

/Adam


Re: How is the definition of stack canary on MIPS arch?

2010-09-06 Thread Adam Jiang
On Tue, Aug 31, 2010 at 09:12:57AM -0700, David Daney wrote:
> On 08/30/2010 08:36 PM, Adam Jiang wrote:
> >On Mon, Aug 30, 2010 at 10:43:44AM -0700, David Daney wrote:
> >>On 08/30/2010 09:46 AM, Richard Henderson wrote:
> >>>On 08/30/2010 03:45 AM, Adam Jiang wrote:
> >>>>When I read the source in Linux kerne, it was said that stack canary for
> >>>>implementing stack protector is defined as an offset to %gs on x86
> >>>>architecture. How about stack canary defined on MIPS?
> >>>
> >>>It's not implemented for MIPS.
> >>>
> >>>
> >>
> >>For the Linux kernel, the MIPS stack canary would be a constant
> >>offset (that depends on PAGE_SIZE) from register $28.
> >>
> >>David Daney
> >
> >Thanks, David and Richard.
> >
> >Is there code, doc or anything on this topic I can refer to? Is it
> >defined in gcc internally or in kernel source itself? Would you please
> >redirect me to the right place?
> >
> 
> I am unaware of any documents.  The MIPS Linux kernel ABI is not
> really documented anywhere, one learns it by studying and hacking on
> the source code.
> 
> 32-bit kernels use a variant of the o32 ABI, 64-bit kernels use a
> variant of n64.  Both dedicate register $28 as a pointer to the
> thread area of which the stack is a part.
> 
> The form any stack canary for the MIPS Linux kernel will be
> determined by whomever implements it.
> 
> >I have done some research by googling. Here are what I've gotten.
> >
> >http://www.trl.ibm.com/projects/security/ssp/main.html
> >http://www.trl.ibm.com/projects/security/ssp/
> >http://lxr.linux.no/linux+v2.6.35/arch/x86/include/asm/stackprotector.h
> >
> >However, it seems there is no documents about how this is done on MIPS.
> >Do I miss something?
> >
> 
> At RTH said, "It's not implemented for MIPS.", so there was really
> nothing to miss.

I found this.

http://gcc.gnu.org/ml/gcc-patches/2009-04/msg02164.html

However, my toolchain which I downloaded from CodeSourcery complained
-fstack-protector was not supported when I tried this out. So, my
question is what's the meaning by "It's not implemented for MIPS".

Linaro added stack protector in Linux kernel about two month ago. This
patch has been pulled into a new release. Does this mean there is only
one step left on Linux Kernel side to get support SSP on MIPS?

/Adam

> 
> David Daney


[Patch] [C++0x] Support decltype as base-specifier (Bug 42603)

2011-01-20 Thread Adam Butcher
Hi there,

Attached is a tentative patch to support using decltype as a
base-type-specifier (Re:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42603).

It should handle also the case where decltype is used as the start of a
nested-name-specifier when this is supported in the future.

Any feedback welcome -- I have done only basic testing -- it appears to
work okay for what I want (so far at least).

Regards,
Adam

PS. I already have sorted copyright assignment a while ago so you can go
ahead and apply this at your convenience if it is any good (subject to
review of course).


0001-C-0x-Support-decltype-as-base-specifier.patch
Description: Binary data


Re: [Patch] [C++0x] Support decltype as base-specifier (Bug 42603)

2011-01-20 Thread Adam Butcher
On Thu, January 20, 2011 4:43 pm, Jonathan Wakely wrote:
> On 20 January 2011 14:26, Adam Butcher wrote:
> > Hi there,
> >
> > Attached is a tentative patch ...
> > [snip]
>
> thanks for working on this.  Patches should be sent to the gcc-patches
> list for review, you could then add the URL of the archived email to
> the bugzilla report.
>
Thanks Jonathan, Paolo had already mailed me off list about this.  I
have posted the original mail and a further patch to handle decltype in
nested-name-specifiers to gcc-patches.

I will update the related bugzilla issues 42603 and 6709 also.

Cheers
Adam




Re: Reconsidering gcjx

2006-01-28 Thread Adam Megacz

Tom Tromey <[EMAIL PROTECTED]> writes:
> I think our technical approach should be to have ecj emit class files,
> which would then be compiled by jc1.  In particular I think we could
> change ecj to emit a single .jar file. 

I (and David Crawshaw) have actually done this.

  http://tool.ibex.org/

The Eclipse compiler is great.  I really like it.

Plus, lots of tools are designed to use Java bytecode as an
intermediate format (however ill-suited it may be for that task).
Making bytecode the primary input format for gcj would make coupling
such tools to gcj a less out-there-in-left-field sort of thing.

  - a



C question: typecast changes behavior with optimizations enabled.

2006-09-26 Thread Adam Dickmeiss

Consider the attached which sweeps through an array of chars..

1) If a typecast is used (CAST defined), the *src is not updated and 
main will see (sz == 0).


2) If no typecast is used, *src is updated, and size == 1.

I expected that *src would be updated in both cases (hence the assert).. 
But I am probably wrong.. and would like to know why:-)


[EMAIL PROTECTED]:~$ gcc -O3 o2tst.c  && ./a.out
a.out: o2tst.c:22: main: Assertion `sz == 1' failed.
Aborted
[EMAIL PROTECTED]:~$ gcc -O2 o2tst.c  && ./a.out
a.out: o2tst.c:22: main: Assertion `sz == 1' failed.
Aborted
[EMAIL PROTECTED]:~$ gcc -O o2tst.c  && ./a.out
[EMAIL PROTECTED]:~$ gcc -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v 
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr 
--enable-shared --with-system-zlib --libexecdir=/usr/lib 
--without-included-gettext --enable-threads=posix --enable-nls 
--program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-mpfr --with-tune=i686 
--enable-checking=release i486-linux-gnu

Thread model: posix
gcc version 4.1.2 20060901 (prerelease) (Debian 4.1.1-13)

Same behavior on
[EMAIL PROTECTED]:~/proj/idzebra-2.0.2/isamb$ gcc -v
Reading specs from /usr/local/lib/gcc/sparc-sun-solaris2.9/3.4.6/specs
Configured with: ../configure --with-as=/usr/ccs/bin/as 
--with-ld=/usr/ccs/bin/ld --enable-shared --enable-languages=c,c++,f77

Thread model: posix
gcc version 3.4.6

/ Adam
#include 
#include 

#define CAST (const unsigned char **)
// #define CAST 

static void decode_ptr(const char **src)
{
unsigned char c;
while ((c = *(* CAST src)++))
	;
}

int main(int argc, char **argv)
{
const char *src0 = "";
const char *src = src0;
size_t sz;

decode_ptr(&src);
sz = src - src0;
assert(sz == 1);
}



Re: [lambda] Latest experimental polymorphic lambda patches

2013-04-27 Thread Adam Butcher

Hi Jason,

On 23.04.2013 14:42, Jason Merrill wrote:

On 22.04.2013 17:42, Jason Merrill wrote:
> On 08/10/2009 08:33 PM, Adam Butcher wrote:
> > Attached are my latest experimental polymorphic lambda patches
> > against the latest lambda branch.
>
> Polymorphic lambdas were voted in for C++14 at the meeting this 
past

> week; are you interested in resuming this work?
>

Yes very interested.  I have been meaning to get around to remaking
the patch against 4.8 (and previously 4.7) for ages now.  Though
getting the time to do so has been, and will likely continue to be, a
problem what with work, fatherhood and other commitments.

Since the gcc/cp code base is significantly different now from the old
lambda branch where the changes were originally made in 2009, my
intent was to remake the changes from scratch implementing the
entirety of N3559 (now N3649) and, at least, the explicit template
parameter specifier syntax from N3560 (though I'm very interested
getting the single expression body syntax working also).

I have no problem with someone else getting on with this as I can't
guarantee consistent availability for it but I will try to get some
time to at least look at my previous aborted efforts to get the
changes applied to the more recent mainlines (I have some git repos
around here and at work with some incomplete changes).  I will see how
far I got with these and maybe post a few patches.  It is highly
likely though that these were just abortive attempts to merge the
previous stuff or incomplete restarts.


> The proposal will be at
>
>  http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3649.html
>
It's now been posted at http://isocpp.org/files/papers/N3649.html



Cheers,
Adam




Re: [lambda] First stage remake of old lambda patches

2013-06-12 Thread Adam Butcher
Hi Jason,

I did find some code in the end but it wasn't much use due to the
changes in gcc/cp since the old lambda branch from which the original
patch was based.

I've since been finding 10 minutes here and there to have a hack about
and have finally got the first stage of generic lambda support
functional against GCC's trunk.

It currently only supports the explicit template-style syntax of the
second (non approved) paper (and my original patch) but I believe this
is good place to start even if it is considered a GNU extension for
now.  I will try to reapply the 'auto' syntax patch at some point but
that is syntactic rather than a behavioral.

I want to fix up the stateless lambda ptr-to-fn conversion-op stuff
next which, I think, is currently the biggest wheel off mechanically.

I've no idea how well this will deal with parameter packs or more
exotic use cases than the simple test program below but I'd welcome
any feedback at this stage.

I do feel like I've gone backward with this but I think it's worth
getting the mechanical side working right with the latest GCC before
integrating the 'auto' parameter patch.

Cheers,
Adam

## begin test program ##

// to check generated asm
//
volatile int E = 1;
volatile int F = 2;
volatile int G = 3;
volatile int H = 4;

// instantiation generates a -Wwrite-strings warning showing the type
// bindings for T and U (currently not displayed in generic lambda
// diagnostics) as part of the synthesized constructor.
//
template 
struct diagnostic { char* x = ""; };

int main()
{
  int i = 1;

  // e: monomorphic stateless lambda
  // f: monomorphic closure
  // g: polymorphic stateless lambda
  // (Note: explicit return type specified to avoid current
  //conversion-op bug)
  // h: polymorphic closure
  //
  auto e = []  (char a, char b) { return E + a + b; };
  auto f = [i] (char a, char b) { return F + i + a + b; };
  auto g = []   (T a, U b) -> double { 
diagnostic(); return G + a + b; };
  auto h = [i]  (T a, U b) { diagnostic(); return 
H + i + a + b; };

  // SEGV currently: conversion-op not implemented correctly yet:
  //   int (*p) (char, float) = g;
  //   int (*q) (int, double) = g;
  
  return g (1.0, 2) + h (2.f, '\003');
}

## end test program ##




[PATCH] Remake initial changes from old lambda branch to support explicit lambda template support and fixup against mainline gcc/cp changes.

2013-06-12 Thread Adam Butcher
This only supports the explicit template parameter syntax and does not
correctly support conversion to static ptr-to-function for generic
stateless lambdas.
---
 gcc/cp/mangle.c|  2 ++
 gcc/cp/parser.c| 43 +--
 gcc/cp/semantics.c | 24 
 3 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/gcc/cp/mangle.c b/gcc/cp/mangle.c
index 8da62b5..4d4c0fd 100644
--- a/gcc/cp/mangle.c
+++ b/gcc/cp/mangle.c
@@ -1438,6 +1438,8 @@ write_closure_type_name (const tree type)
   MANGLE_TRACE_TREE ("closure-type-name", type);
 
   write_string ("Ul");
+  if (TREE_CODE (fn) == TEMPLATE_DECL) // XXX: should we bother mangling 
generic lambdas?
+fn = DECL_TEMPLATE_RESULT (fn);
   write_method_parms (parms, /*method_p=*/1, fn);
   write_char ('E');
   write_compact_number (LAMBDA_EXPR_DISCRIMINATOR (lambda));
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index 319da21..407dca3 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -8668,6 +8668,7 @@ cp_parser_lambda_introducer (cp_parser* parser, tree 
lambda_expr)
 /* Parse the (optional) middle of a lambda expression.
 
lambda-declarator:
+ < template-parameter-list [opt] >
  ( parameter-declaration-clause [opt] )
attribute-specifier [opt]
mutable [opt]
@@ -8687,10 +8688,26 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, 
tree lambda_expr)
   tree param_list = void_list_node;
   tree attributes = NULL_TREE;
   tree exception_spec = NULL_TREE;
+  tree template_param_list = NULL_TREE;
   tree t;
 
-  /* The lambda-declarator is optional, but must begin with an opening
- parenthesis if present.  */
+  /* The template-parameter-list is optional, but must begin with
+ an opening angle if present.  */
+  if (cp_lexer_next_token_is (parser->lexer, CPP_LESS))
+{
+  cp_lexer_consume_token (parser->lexer);
+
+  template_param_list = cp_parser_template_parameter_list (parser);
+
+  cp_parser_skip_to_end_of_template_parameter_list (parser);
+
+  /* We just processed one more parameter list.  */
+  ++parser->num_template_parameter_lists;
+}
+
+  /* The parameter-declaration-clause is optional (unless
+ template-parameter-list was given), but must begin with an
+ opening parenthesis if present.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
 {
   cp_lexer_consume_token (parser->lexer);
@@ -8736,6 +8753,8 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree 
lambda_expr)
 
   leave_scope ();
 }
+  else if (template_param_list != NULL_TREE) // generate diagnostic
+cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN);
 
   /* Create the function call operator.
 
@@ -8779,6 +8798,23 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree 
lambda_expr)
DECL_ARTIFICIAL (fco) = 1;
/* Give the object parameter a different name.  */
DECL_NAME (DECL_ARGUMENTS (fco)) = get_identifier ("__closure");
+   if (template_param_list != NULL_TREE)
+ {
+   tree saved_current_function_decl = current_function_decl;
+
+   /* Clear current function decl to allow check_member_template
+  to pass.  Currently it rejects templates inside functions.
+  Couldn't figure out a clean way to test for lambda inside
+  check_member_template.  */
+   current_function_decl = NULL_TREE;
+   fco = finish_member_template_decl (fco);
+   current_function_decl = saved_current_function_decl;
+
+   --parser->num_template_parameter_lists;
+
+   finish_template_decl (template_param_list);
+
+ }
   }
 
 finish_member_declaration (fco);
@@ -8822,6 +8858,9 @@ cp_parser_lambda_body (cp_parser* parser, tree 
lambda_expr)
 tree compound_stmt;
 tree cap;
 
+if (TREE_CODE (fco) == TEMPLATE_DECL)
+  fco = DECL_TEMPLATE_RESULT (fco);
+
 /* Let the front end know that we are going to be defining this
function.  */
 start_preparsed_function (fco,
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index b5c3b0a..db5ba7b 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -9135,7 +9135,7 @@ lambda_function (tree lambda)
  /*protect=*/0, /*want_type=*/false,
  tf_warning_or_error);
   if (lambda)
-lambda = BASELINK_FUNCTIONS (lambda);
+lambda = OVL_CURRENT (BASELINK_FUNCTIONS (lambda));
   return lambda;
 }
 
@@ -9381,6 +9381,8 @@ build_capture_proxy (tree member)
 
   closure = DECL_CONTEXT (member);
   fn = lambda_function (closure);
+  if (TREE_CODE (fn) == TEMPLATE_DECL)
+fn = DECL_TEMPLATE_RESULT (fn);
   lam = CLASSTYPE_LAMBDA_EXPR (closure);
 
   /* The proxy variable forwards to the capture field.  */
@@ -9795,7 +9797,8 @@ maybe_add_lambda_conv_op (tree type)
   if (processing_template_decl)
 return;
 
-  if (DECL_INITIAL (callop) == NULL_TREE)
+  if (TREE_CODE (callop) != TEMPLATE

Re: Preventing link time optimization from inlining

2012-12-21 Thread Adam Lewis
On Fri, Dec 21, 2012 at 4:41 AM, Richard Biener
 wrote:
>
> On Thu, Dec 20, 2012 at 5:33 PM, Adam  wrote:
> > Hi,
> >
> > When using -flto is there a way to tell gcc to not inline a particular 
> > function? attribute noinline appears to have no effect. I am using gcc 
> > 4.7.2. The use case is for certain functions that cause optimization 
> > problems when they are inlined. An example is when the function is throwing 
> > a c++ exception.
>
> attribute noinline should have the desired effect (you may need
> noclone as well).
>


Richard,

Doesn't seem to work for me. Here is what I am trying:
func.hpp ( I have tried this with various combinations and orderings
of noinline, noclone ):

void  my__func() __attribute__((noinline)) __attribute__ ((noclone));

func.cpp:
#include 
void my__func() { printf("prnting"); }

main.cpp
#include "func.hpp"
int main() { my__func(); return 0; }


then:
g++ -O3 -g -flto main.cpp func.cpp -o lto_inline_test

I load up gdb, break main, run. disassemble and see that it is
directly calling printf instead of my__func.

gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.7.2-11precise2'
--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.7 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-gnu-unique-object --enable-plugin --enable-objc-gc
--disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-11precise2)

Any thoughts?




> Richard.
>
> > Thanks,
> > Adam


Re: Preventing link time optimization from inlining

2012-12-21 Thread Adam Lewis
On Fri, Dec 21, 2012 at 11:08 AM, Richard Günther
 wrote:
> Adam Lewis  wrote:
>
>>On Fri, Dec 21, 2012 at 4:41 AM, Richard Biener
>>wrote:
>>
>>> On Thu, Dec 20, 2012 at 5:33 PM, Adam  wrote:
>>> > Hi,
>>> >
>>> > When using -flto is there a way to tell gcc to not inline a
>>particular
>>> function? attribute noinline appears to have no effect. I am using
>>gcc
>>> 4.7.2. The use case is for certain functions that cause optimization
>>> problems when they are inlined. An example is when the function is
>>throwing
>>> a c++ exception.
>>>
>>> attribute noinline should have the desired effect (you may need
>>> noclone as well).
>>>
>>> Richard.
>>>
>>
>>
>>Richard,
>>
>>Doesn't seem to work for me. Here is what I am trying:
>>func.hpp ( I have tried this with various combinations and orderings of
>>noinline, noclone ):
>>
>>void  my__func() __attribute__((noinline)) __attribute__ ((noclone));
>>
>>func.cpp:
>>#include 
>>void my__func() { printf("prnting"); }
>
> The attribute is missing here. Or the include.


Of course I missed the include, it works now. Sorry for wasting your time!

>
> Richard.
>>
>>main.cpp
>>#include "func.hpp"
>>int main() { my__func(); return 0; }
>>
>>
>>then:
>>g++ -O3 -g -flto main.cpp func.cpp -o lto_inline_test
>>
>>I load up gdb, break main, run. disassemble and see that it is directly
>>calling printf instead of my__func.
>>
>>gcc -v:
>>Using built-in specs.
>>COLLECT_GCC=gcc
>>COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
>>Target: x86_64-linux-gnu
>>Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
>>4.7.2-11precise2'
>>--with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
>>--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
>>--program-suffix=-4.7 --enable-shared --enable-linker-build-id
>>--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
>>--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
>>--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
>>--enable-libstdcxx-debug --enable-libstdcxx-time=yes
>>--enable-gnu-unique-object --enable-plugin --enable-objc-gc
>>--disable-werror --with-arch-32=i686 --with-abi=m64
>>--with-multilib-list=m32,m64 --with-tune=generic
>>--enable-checking=release
>>--build=x86_64-linux-gnu --host=x86_64-linux-gnu
>>--target=x86_64-linux-gnu
>>Thread model: posix
>>gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-11precise2)
>>
>>Any thoughts?
>>
>>
>>
>>
>>
>>>
>>> > Thanks,
>>> > Adam
>>>
>
>
> --
> Diese Nachricht wurde von meinem Android-Mobiltelefon mit K-9 Mail gesendet.


Re: About the gcc Warning: setting incorrect section attributes for .note

2008-05-22 Thread Adam Nemet
"XU SHENG" <[EMAIL PROTECTED]> writes:
>   int priv_dat  __attribute__ ((section(".note"))) = MAGIC;
>
...
>
>   It's clear to me that only section marked attribute with the
> startup of ".note" can be compiled to section with type SHT_NOTE in ELF
> file. Then only problem is assembler in gcc assumed ".note" section as
> "aw" (allocate/write) as default property, but actually, the .note
> section can be only marked with "a" (note section is readonly). If I
> modify the temporary .s export by gcc assembler by delete "w" property,
> the warning is disappeared.

Did you try making priv_dat const?

Adam


Re: newlib & libgcov

2008-06-12 Thread Adam Nemet
"Bingfeng Mei" <[EMAIL PROTECTED]> writes:
> Both -Dinhibit_libc and -DL_gcov are used as shown in our building log
> file.  I guess -Dinhibit_libc is added because we used newlibc instead
> of glibc. I tried to grep these functions in newlibc and didn't find
> them.  My question is how to enable gcov with newlibc.  Do I need to
> write my own versions of them?  Any suggestion is greatly appreciated.  

You need to configure with --with-headers so that the headers are used during
the build and that inhibit_libc is not defined.  I think this is what the
crosstool-newlib script does as well.

Adam


Re: RFC: Adding non-PIC executable support to MIPS

2008-07-02 Thread Adam Nemet
Richard Sandiford writes:
> However, IMO, your argument about MTI being the central authority
> is a killer one.  The purpose of the GNU tools should be to follow
> appropriate standards where applicable (and extend them where it
> seems wise).  So from that point of view, I agree that the GNU tools
> should follow the ABI that Nigel and MTI set down.  Consider my
> patch withdrawn.

While I'm not entirely clear how this decision came about I'd like to point
out that it is unfortunate that MTI had not sought wider consensus for this
ABI extension among MIPS implementors and the community.

We would not be in this situation with duplicated efforts and much frustration
if this proposal had been circulated properly ahead of time.

> I've been thinking about that a lot recently, since I heard about
> your implementation.  I kind-of guessed it had been agreed with MTI
> beforehand (although I hadn't realised MTI themselves had written
> the specification).  Having thought it over, I think it would be best
> if I stand down as a MIPS maintainer and if someone with the appropriate
> commercial connections is appointed instead.  I'd recommend any
> combination of yourself, Adam Nemet and David Daney (subject to
> said people being willing, of course).

Richard, while I understand your frustration I really hope that you will
reconsider your decision and remain the MIPS maintainer.  I think there is a
chance that if the community expresses that MTI should seek broader consensus
for such proposals they will do so in the future.

Your expertise as the GCC maintainer has improved the backend tremendously and
and you should be given all the information necessary to continue your great
work.

Adam


New assert in haifa-sched.c

2008-09-04 Thread Adam Nemet
haifa-sched.c:
  2302/* Let the target filter the search space.  */
  2303for (i = 1; i < ready->n_ready; i++)
  2304  if (!ready_try[i])
  2305{
  2306  insn = ready_element (ready, i);
  2307  
  2308  gcc_assert (INSN_CODE (insn) >= 0
  2309  || recog_memoized (insn) < 0);

I am hitting this assert with the Octeon pipeline patch.  Isn't the check on
recog_memoized reversed?  Or are we really asserting here that the insn has
either been recognized earlier or won't be recognized now either?!

Adam


Re: New assert in haifa-sched.c

2008-09-05 Thread Adam Nemet
Maxim Kuvyrkov writes:
> Yes, the assert is really checking exactly that.  Several pieces of 
> haifa-sched.c assume that the instruction has been recognized during 
> scheduler initialization to speed up checking if instruction is normal 
> or some kind of use/clobber/asm.

Thanks for the info but I can't seem to find the code where this is supposed
to be happening.  Can you point me to the code?

Adam


Re: New assert in haifa-sched.c

2008-09-05 Thread Adam Nemet
Maxim Kuvyrkov writes:
> I'm not 100% sure about current state of things, considering recent 
> merge of sel-sched, but before that it was:
> 
> set_priorities() -> priority() -> dep_cost() -> recog_memoized().

I don't think that was the case for all insns even before the patch.  The only
new thing is the assert which now catches this.

If the consumer is an asm just like in gcc.dg/tree-ssa/stdarg-2.c:f10() then
we would not call recog on the producer inside dep_cost*.

The patch below fixes the issue for me.  I am going to test this if it looks
good to people.

Adam

Index: haifa-sched.c
===
--- haifa-sched.c   (revision 139918)
+++ haifa-sched.c   (working copy)
@@ -646,7 +646,8 @@ insn_cost (rtx insn)
 
 /* Compute cost of dependence LINK.
This is the number of cycles between instruction issue and
-   instruction results.  */
+   instruction results.  We also use this function to call
+   recog_memoized on all insns.  */
 int
 dep_cost_1 (dep_t link, dw_t dw)
 {
@@ -657,7 +658,10 @@ dep_cost_1 (dep_t link, dw_t dw)
  This allows the computation of a function's result and parameter
  values to overlap the return and call.  */
   if (recog_memoized (used) < 0)
-cost = 0;
+{
+  cost = 0;
+  recog_memoized (DEP_PRO (link));
+}
   else
 {
   rtx insn = DEP_PRO (link);
@@ -2305,6 +2309,8 @@ choose_ready (struct ready_list *ready, 
  {
insn = ready_element (ready, i);
 
+   /* If this insn is recognizable we should have already
+  recognized it in dep_cost_1.  */
gcc_assert (INSN_CODE (insn) >= 0
|| recog_memoized (insn) < 0);
 



Re: Bootstrap failure in sparc-sun-solaris2.10

2008-09-12 Thread Adam Nemet
Eric Botcazou <[EMAIL PROTECTED]> writes:
>> Applying the patch Adam created and posted in the message below resolved
>> the issue and the compiler successfully bootstrapped:
>>
>> http://gcc.gnu.org/ml/gcc/2008-09/msg00139.html
>
> Thanks for reporting this.  I now can close PR 37424.
>
>> There was one reply to this message; I don't know if the patch is being
>> reworked or been formally submitted yet, but it did fix my build.
>
> OK, I'll take a look.

Yes it was formally submitted here, no review so far:

  http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00574.html

Adam


Re: New assert in haifa-sched.c

2008-09-16 Thread Adam Nemet
Eric Botcazou writes:
> > Yes, the assert is really checking exactly that.  Several pieces of
> > haifa-sched.c assume that the instruction has been recognized during
> > scheduler initialization to speed up checking if instruction is normal
> > or some kind of use/clobber/asm.
> 
> What happens if an instruction hasn't been recognized upon reaching these 
> parts of haifa-sched.c?  Will it be only mis-scheduled, i.e. will this only 
> result in inferior, but still correct code?  If so, the assertion shouldn't 
> be enabled in release mode but only if ENABLE_CHECKING is defined.

Well, I can't bootstrap as it is so it would be nice if this was fixed
regardless of the checking level.  (And this is what my patch does in
http://gcc.gnu.org/ml/gcc-patches/2008-09/msg00574.html.)  Another option
would be to completely remove the assert.

Adam


FAIL: gcc.target/mips/octeon-exts-2.c scan-assembler-times

2008-09-22 Thread Adam Nemet
Hi David,

David Daney writes:
> gcc.target/mips/octeon-exts-2.c is failing when configured --with-arch=sb1
> 
> Do you know if it is failing universally or only on non-octeon targets?

It's failing on little-endian.  There is also another test that's failing on
64-bit little-endian.  I will submit a patch later.  I was just waiting for
the other Octeon patches to settle.  Thanks for the note.

Adam


Incorrect bitfield aliasing with Tree SSA

2007-06-15 Thread Adam Nemet
get_alias_set and internally record_component_aliases makes
assumptions about the IR that are only valid in RTL.  As a consequence
the constant 1 is propagated into the function call in the example
below (I found this issue with 4.1 but it is also present on trunk):

  struct s
  {
long long a:12;
long long b:12;
long long c:40;
  };
  
  struct s s, *p = &s;
  
  f ()
  {
p->a = 1;
s.a = 2;
s.b = 3;
use (p->a, s.b);
  }

or with VOPS:

  # VUSE 
  p.0_1 = p;
  # SMT.7_9 = VDEF<- missing SFT.2 VDEF
  p.0_1->a = 1;
  # SFT.2_11 = VDEF 
  s.a = 2;
  # SFT.1_13 = VDEF 
  s.b = 3;
  # p_14 = VDEF 
  # SFT.1_15 = VDEF 
  # SFT.2_16 = VDEF 
  # SMT.7_17 = VDEF 
  use (1, 3) [tail call];
 

In RTL the expmed.c functions extract_bit_field and store_bit_field
change the alias set of a bit-extraction or bit-insertion expression
of a memory operand to 0 because the actual RTL generated will access
adjacent fields.  Therefore there is no need to represent alias-subset
relationship between non-addressable fields and their containing
structure and record_component_aliases will skip such fields.

First I was trying to remove this assumption even for RTL but that
increased memory consumption on the cc1 preprocessed files by 1%.

The other idea I had was to make this check dependent on the IR type.
The problem is that the function tree_register_cfg_hooks setting the
IR type is called only after gimplification whereas get_alias_set is
called during.  Would it be acceptable to call tree_register_cfg_hooks
earlier?  Hopefully, this will be a minimal change and would make it
easy to backport it to 4.1 and 4.2.

Another option would be to have a Tree-SSA-specific version of
get_alias_set and change all the invocations in the tree-* modules.

Do people have other recommendations or a preference?

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-15 Thread Adam Nemet
Daniel Berlin writes:
> On 6/15/07, Adam Nemet <[EMAIL PROTECTED]> wrote:
> > get_alias_set and internally record_component_aliases makes
> > assumptions about the IR that are only valid in RTL.
> 
> What is this assumption, exactly?

That non-addressable fields are always accessed through alias set 0.
For example:

  X: (set (mem A) ...)
  Y: (set (zero_extract (mem B) 2 2) ...)

Let's say that A and B both point to the same type and have the same
alias set 1.  The bitfield store in Y would normally have the alias
set of the field, let's say 2.  Normally 2 would have to be recorded
as an alias subset of 1.  This is not done in RTL and the comment
before record_component_aliases mentions this:

/* Record that component types of TYPE, if any, are part of that type for
   aliasing purposes.  For record types, we only record component types
   for fields that are marked addressable.

If I am not mistaken this is because the functions expanding RTL
ensure that the alias set of the mem operand in Y has alias set 0.  Or
to quote expmed.c:

  /* We may be accessing data outside the field, which means
 we can alias adjacent data.  */
  if (MEM_P (op0))
{
  op0 = shallow_copy_rtx (op0);
  set_mem_alias_set (op0, 0);
      set_mem_expr (op0, 0);
}

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-17 Thread Adam Nemet
Eric Botcazou writes:
> For the Ada testcase:
> 
> ;; s->i = 0
> (insn 7 6 0 p.adb:5 (set (mem/s/j:SI (reg/v/f:DI 59 [ s ]) [4 .i+0 
> S4 A32])
> (const_int 0 [0x0])) -1 (nil))
> 
> ;; *p = 1
> (insn 8 7 0 p.adb:6 (set (mem:SI (reg/v/f:DI 60 [ p ]) [2 S4 A32])
> (const_int 1 [0x1])) -1 (nil))
> 
> ;; return s->i
> (insn 9 8 10 p.adb:6 (set (reg:SI 62)
> (mem/s/j:SI (reg/v/f:DI 59 [ s ]) [4 .i+0 S4 A32])) -1 
> (nil))
> 
> i.e. s->i is accessed with the alias set of 'S'.

Thanks, that helped.  I think you're right.  Obviously we don't have
this issue with bitfields in C.

I am trying now to prototype a new approach along the lines of
returning true in component_uses_parent_alias_set for SFT's with
DECL_NONADDRESSABLE_P.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet
Richard Kenner writes:
> But there's also an implementation issue: where do you propose to *store*
> this fake alias set?  There is no such field as DECL_ALIAS_SET.

I am glad to see we are converging toward implementation issues now!

I am storing it in a new field under the alias_set_entry:

  get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.

Adam


Re: Incorrect bitfield aliasing with Tree SSA

2007-06-18 Thread Adam Nemet
Richard Kenner writes:
> > I am glad to see we are converging toward implementation issues now!
> > 
> > I am storing it in a new field under the alias_set_entry:
> > 
> >   get_alias_set_entry (TYPE_ALIAS_SET (t))->nonaddr_alias_set.
> 
> Where T is which type? 

Type of the expression passed to get_alias_set.  And without the
component_uses_parent_alias_set loop.

Adam


  1   2   >