atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Jay K

gcc-4.5/gcc/config/i386/i386.md:

;; %%% bts, btr, btc, bt.
;; In general these instructions are *slow* when applied to memory,
;; since they enforce atomic operation. When applied to registers,


I haven't found documented confirmation that these instructions are atomic 
without a lock prefix,
having checked Intel and AMD documentation and random web searching.
They are mentioned as instructions that can be used with lock prefix.


- Jay 


Re: atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Rick C. Hodgin
> ;; %%% bts, btr, btc, bt.
> ;; In general these instructions are *slow* when applied to memory,
> ;; since they enforce atomic operation. When applied to registers,
> 
> I haven't found documented confirmation that these instructions are atomic 
> without a lock prefix,
> having checked Intel and AMD documentation and random web searching.
> They are mentioned as instructions that can be used with lock prefix.

They do not automatically lock the bus.  They will lock the bus with the
explicit LOCK prefix, and BTS is typically used for an atomic read/write
operation.

- Rick
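
To make the distinction above concrete, here is a minimal sketch that is not from the thread. It assumes a C11 compiler with <stdatomic.h>; the function names and the WORD_BITS macro are made up for illustration. The first function is a plain read-modify-write with no atomicity guarantee (the situation for bts without a LOCK prefix), while the second asks the compiler for an indivisible update, which on x86 may be implemented with a LOCK-prefixed instruction.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Number of bits in one bitmap word (illustrative helper). */
#define WORD_BITS (sizeof(size_t) * CHAR_BIT)

/* Plain read-modify-write: sets bit b and returns its previous value.
   Nothing here is atomic; another thread could modify the word between
   the load and the store. */
int test_and_set_plain(size_t *words, size_t b)
{
  assert(words != NULL);
  size_t mask = (size_t)1 << (b % WORD_BITS);
  size_t old = words[b / WORD_BITS];
  words[b / WORD_BITS] = old | mask;
  return (old & mask) != 0;
}

/* Atomic variant: atomic_fetch_or performs the read-modify-write as one
   indivisible operation and returns the previous word. */
int test_and_set_atomic(_Atomic size_t *words, size_t b)
{
  assert(words != NULL);
  size_t mask = (size_t)1 << (b % WORD_BITS);
  size_t old = atomic_fetch_or(&words[b / WORD_BITS], mask);
  return (old & mask) != 0;
}
```

Both functions return 0 the first time a bit is set and 1 afterwards; only the second is safe when several threads share the bitmap.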



RE: atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Jay K


> Subject: Re: atomicity of x86 bt/bts/btr/btc?
> From: foxmuldrsterm
> To: jay.krell
> CC: gcc@gcc.gnu.org
> Date: Tue, 19 Oct 2010 02:52:34 -0500
>
> > ;; %%% bts, btr, btc, bt.
> > ;; In general these instructions are *slow* when applied to memory,
> > ;; since they enforce atomic operation. When applied to registers,
> >
> > I haven't found documented confirmation that these instructions are atomic 
> > without a lock prefix,
> > having checked Intel and AMD documentation and random web searching.
> > They are mentioned as instructions that can be used with lock prefix.
>
> They do not automatically lock the bus. They will lock the bus with the
> explicit LOCK prefix, and BTS is typically used for an atomic read/write
> operation.
>
> - Rick

 
Thanks Rick.
I'll go back to using them.
I'm optimizing mainly for size.
The comment should perhaps be amended.
The "since they enforce atomic operation" part seems wrong.
 
 - Jay


RE: atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Rick C. Hodgin
> > They do not automatically lock the bus. They will lock the bus with the
> > explicit LOCK prefix, and BTS is typically used for an atomic read/write
> > operation.

> Thanks Rick.
> I'll go back to using them.
> I'm optimizing mainly for size.
> The comment should perhaps be amended.
> The "since they enforce atomic operation" part seems wrong.

Np.  For citation, see here (page 166).

   http://www.intel.com/Assets/PDF/manual/253666.pdf

- Rick




RE: atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Jay K


> Subject: RE: atomicity of x86 bt/bts/btr/btc?
> From: foxmuldrster
> To: jay
> CC: gcc
> Date: Tue, 19 Oct 2010 03:05:26 -0500
>
> > > They do not automatically lock the bus. They will lock the bus with the
> > > explicit LOCK prefix, and BTS is typically used for an atomic read/write
> > > operation.
>
> > Thanks Rick.
> > I'll go back to using them.
> > I'm optimizing mainly for size.
> > The comment should perhaps be amended.
> > The "since they enforce atomic operation" part seems wrong.
>
> Np. For citation, see here (page 166).
>
> http://www.intel.com/Assets/PDF/manual/253666.pdf
>
> - Rick

 
Yep that was one of my references.
 
 
It might be nice if optimizing for size would use them with code like e.g.:
 
 
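#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Set bit b in the bitmap a. */
void set_bit(size_t* a, size_t b)
{
  const unsigned c = sizeof(size_t) * CHAR_BIT;
  a[b / c] |= (((size_t)1) << (b % c));
}
 
/* Clear bit b in the bitmap a. */
void clear_bit(size_t* a, size_t b)
{
  const unsigned c = sizeof(size_t) * CHAR_BIT;
  a[b / c] &=  ~(((size_t)1) << (b % c));
}
 
/* Return 1 if bit b in the bitmap a is set, 0 otherwise. */
int get_bit(size_t* a, size_t b)
{
  const unsigned c = sizeof(size_t) * CHAR_BIT;
  return !!(a[b / c] & (((size_t)1) << (b % c)));
}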
 
 
 - Jay


Bug in expand_builtin_setjmp_receiver ?

2010-10-19 Thread Frederic Riss
Hi,
in builtins.c:expand_builtin_setjmp_receiver I see the following code:

  /* Now put in the code to restore the frame pointer, and argument
     pointer, if needed.  */
#ifdef HAVE_nonlocal_goto
  if (! HAVE_nonlocal_goto)
#endif
    {
      emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
      /* This might change the hard frame pointer in ways that aren't
         apparent to early optimization passes, so force a clobber.  */
      emit_clobber (hard_frame_pointer_rtx);
    }

Shouldn't that code somehow honor STARTING_FRAME_OFFSET ?
Or maybe virtual_stack_vars_rtx shouldn't include that offset? The
thing is I'm playing with a port these days, and I'm not able to make
all the testsuite pass when STARTING_FRAME_OFFSET is not 0. I tracked
the failures down to that move instruction that makes following stack
vars accesses fail when there's an offset between the frame pointer
and the first stack variable.

Thanks!
Fred


Re: secondary reload via 2 intermediary registers

2010-10-19 Thread Alex Turjan
Hi Jeff,
Thanks for the answer. I managed to make use of an architecture trick that
allows me to do the secondary reload via only one intermediary register, but I
still have some comments on what you wrote.

> > 1. Is it possible to do the secondary reload via 2 intermediary
> > registers? As far as I can see the insn that implements the
> > secondary reload has to have 3 operands.
> Make the scratch/intermediary register double-sized so that
> you get a pair of registers instead of a single register for
> the scratch/intermediary.

In my case the intermediary registers were from two different register files 
holding data of different modes. To do what you suggest I would have had to 
interleave the two register files and introduce a new mode that occupies a pair 
of regs - one from each of the register files. 

> > 2. Is it possible for an instruction emitted during the secondary
> > reload to get reloaded as well? And if yes, how?
> Typically you need to ensure that your reload_xxx expanders
> generate RTL which does not need further reloading.
> This makes handling secondary reloads rather complex in some
> cases.
I asked because I was not able to do the secondary reload via 2
intermediary registers. I therefore tried to provide the two registers in
stages: a first secondary reload supplied the first register and generated
two instructions, one of which in turn needed a secondary reload, where I
was planning to provide the second register. However, this mechanism did
not work.



  


Re: atomicity of x86 bt/bts/btr/btc?

2010-10-19 Thread Ian Lance Taylor
Jay K  writes:

> It might be nice if optimizing for size would use them with code like e.g.:

I encourage you to file a missed-optimization bug at
http://gcc.gnu.org/bugzilla , so that this is not forgotten.

Ian


Re: Bug in expand_builtin_setjmp_receiver ?

2010-10-19 Thread Ian Lance Taylor
Frederic Riss  writes:

> in builtins.c:expand_builtin_setjmp_receiver I see the following code:
>
>   /* Now put in the code to restore the frame pointer, and argument
>      pointer, if needed.  */
> #ifdef HAVE_nonlocal_goto
>   if (! HAVE_nonlocal_goto)
> #endif
>     {
>       emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
>       /* This might change the hard frame pointer in ways that aren't
>          apparent to early optimization passes, so force a clobber.  */
>       emit_clobber (hard_frame_pointer_rtx);
>     }
>
> Shouldn't that code somehow honor STARTING_FRAME_OFFSET ?
> Or maybe virtual_stack_vars_rtx shouldn't include that offset? The
> thing is I'm playing with a port these days, and I'm not able to make
> all the testsuite pass when STARTING_FRAME_OFFSET is not 0. I tracked
> the failures down to that move instruction that makes following stack
> vars accesses fail when there's an offset between the frame pointer
> and the first stack variable.

It should not be necessary to use STARTING_FRAME_OFFSET when using
virtual_stack_vars_rtx, as it should be added in by the vregs pass.  See
instantiate_new_reg, and note that var_offset is set to
STARTING_FRAME_OFFSET.

However, I agree that it does seem that it should be added to or
subtracted from hard_frame_pointer_rtx before setting
virtual_stack_vars_rtx, or something.  I only see one existing target
which sets STARTING_FRAME_OFFSET to a non-zero value and does not have a
nonlocal_goto expander: lm32.  It would be interesting to know whether
that target works here.

Ian
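
The concern in this exchange can be pictured with a toy arithmetic model. This is plain C over integers, not GCC internals. It assumes, following the reply above, that stack variables are addressed relative to a base equal to the frame pointer plus STARTING_FRAME_OFFSET, and that the frame pointer and the hard frame pointer coincide. All names are hypothetical, and the thread leaves open in which direction a real fix would apply the offset.

```c
#include <assert.h>

/* Toy model only: addresses are plain integers, not RTL. */
typedef long addr;

/* Assumed relationship: base for stack variables is the frame pointer
   plus the starting frame offset. */
addr stack_vars_base(addr frame_pointer, long starting_frame_offset)
{
  assert(frame_pointer >= 0);
  return frame_pointer + starting_frame_offset;
}

/* Base obtained if the receiver copies the hard frame pointer into the
   stack-vars base literally, ignoring the offset. */
addr base_after_literal_move(addr hard_frame_pointer)
{
  return hard_frame_pointer;
}

/* Base obtained if the offset is accounted for when restoring. */
addr base_after_adjusted_move(addr hard_frame_pointer,
                              long starting_frame_offset)
{
  return hard_frame_pointer + starting_frame_offset;
}
```

In this model the literal move and the expected base differ by exactly the offset, so they only agree when STARTING_FRAME_OFFSET is 0, which matches the reported symptom.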


Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Jie Zhang

On 10/18/2010 03:41 PM, Andrey Belevantsev wrote:
> On 18.10.2010 11:31, Jie Zhang wrote:
> > Hi Andrey,
> >
> > On 10/18/2010 03:13 PM, Andrey Belevantsev wrote:
> > > Hi Jie,
> > >
> > > On 18.10.2010 10:49, Jie Zhang wrote:
> > > > When this error happens, FENCE_ISSUED_INSNS (fence) is 2 and
> > > > issue_rate is 1. PowerPC 8540 is capable to issue 2 instructions
> > > > in one cycle, but rs6000_issue_rate lies to scheduler that it can
> > > > only issue 1 instruction before register relocation is done. See
> > > > the following code:
> > >
> > > See PR 45352. I've tried to fix this in the selective scheduler by
> > > modeling the lying behavior in line with the haifa scheduler. Let me
> > > know if the last patch from the PR audit trail doesn't work for you.
> > >
> > > In addition, after the above patch goes in, I can make the selective
> > > scheduler not try to jump through the hoops with putting correct sched
> > > cycles on insns for targets which don't need it in their target_finish
> > > hook. I guess powerpc needs this though, but x86-64 (for which PR 45342
> > > was opened) almost surely does not.
> >
> > Thanks for your reply. I just tried. That patch does not help for this
> > issue.
>
> I see, I didn't touch the failing assert with the patch. Can you just
> remove the assert and see if that helps for you? I cannot think of how
> it can be relaxed and still be useful.

Removing the failing assert fixes the test case. But I wonder why not 
just get max_issue correct. I'm testing the attached patch. IMHO, 
max_issue looks confusing.


 * The concept of ISSUE POINT has never been used since the code landed
in the repository.

 * The comment just before the function says that MAX_POINTS is the sum
of points of all instructions in READY, but that does not match the
code. The code only sums the points of the first MORE_ISSUE
instructions. If ISSUE_POINTS later becomes non-uniform, that piece of
code should be redesigned.

So I think it's good to remove it now. And "top - choice_stack" is a
good replacement for top->n, so we can remove field n from struct
choice_entry, too.

Now I'm looking at the MIPS target to find out why this change would
cause PR37360.


   /* ??? We used to assert here that we never issue more insns than issue_rate.
      However, some targets (e.g. MIPS/SB1) claim lower issue rate than can be
      achieved to get better performance.  Until these targets are fixed to use
      scheduler hooks to manipulate insns priority instead, the assert should
-     be disabled.
-
-     gcc_assert (more_issue >= 0);  */
+     be disabled.  */


--
Jie Zhang
CodeSourcery

	* haifa-sched.c (ISSUE_POINTS): Remove.
	(struct choice_entry): Remove field n.
	(max_issue): Don't issue more than issue_rate instructions.

Index: haifa-sched.c
===
--- haifa-sched.c	(revision 165642)
+++ haifa-sched.c	(working copy)
@@ -199,10 +199,6 @@ struct common_sched_info_def *common_sch
 /* The minimal value of the INSN_TICK of an instruction.  */
 #define MIN_TICK (-max_insn_queue_index)
 
-/* Issue points are used to distinguish between instructions in max_issue ().
-   For now, all instructions are equally good.  */
-#define ISSUE_POINTS(INSN) 1
-
 /* List of important notes we must keep around.  This is a pointer to the
last element in the list.  */
 rtx note_list;
@@ -2401,8 +2397,6 @@ struct choice_entry
   int index;
   /* The number of the rest insns whose issues we should try.  */
   int rest;
-  /* The number of issued essential insns.  */
-  int n;
   /* State after issuing the insn.  */
   state_t state;
 };
@@ -2444,8 +2438,7 @@ static int cached_issue_rate = 0;
insns is insns with the best rank (the first insn in READY).  To
make this function tries different samples of ready insns.  READY
is current queue `ready'.  Global array READY_TRY reflects what
-   insns are already issued in this try.  MAX_POINTS is the sum of points
-   of all instructions in READY.  The function stops immediately,
+   insns are already issued in this try.  The function stops immediately,
if it reached the such a solution, that all instruction can be issued.
INDEX will contain index of the best insn in READY.  The following
function is used only for first cycle multipass scheduling.
@@ -2458,7 +2451,7 @@ int
 max_issue (struct ready_list *ready, int privileged_n, state_t state,
 	   int *index)
 {
-  int n, i, all, n_ready, best, delay, tries_num, max_points;
+  int i, all, n_ready, best, delay, tries_num;
   int more_issue;
   struct choice_entry *top;
   rtx insn;
@@ -2477,25 +2470,15 @@ max_issue (struct ready_list *ready, int
 }
 
   /* Init max_points.  */
-  max_points = 0;
   more_issue = issue_rate - cycle_issued_insns;
 
   /* ??? We used to assert here that we never issue more insns than issue_rate.
  However, some targets (e.g. MIPS/SB1) claim lower issue rate than can be
  achieved to get better performance.  Until these targets are fixed to use
  scheduler hooks to manipulate insns priority instead, the a

Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Andrey Belevantsev

On 19.10.2010 17:57, Jie Zhang wrote:

> Removing the failing assert fixes the test case. But I wonder why not just
> get max_issue correct. I'm testing the attached patch. IMHO, max_issue
> looks confusing.
>
> * The concept of ISSUE POINT has never been used since the code landed in
> the repository.
>
> * The comment just before the function says that MAX_POINTS is the sum
> of points of all instructions in READY, but that does not match the
> code. The code only sums the points of the first MORE_ISSUE
> instructions. If ISSUE_POINTS later becomes non-uniform, that piece of
> code should be redesigned.
>
> So I think it's good to remove it now. And "top - choice_stack" is a good
> replacement for top->n, so we can remove field n from struct choice_entry,
> too.
>
> Now I'm looking at the MIPS target to find out why this change would
> cause PR37360.

I agree that ISSUE_POINTS can be removed, as it was not used (maybe Maxim 
can comment more on this).  However, the assert is not about the points but 
exactly about the situation when a target is lying to the compiler about 
its issue rate.
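
As a small illustration of the situation described here, the sketch below plugs in the numbers reported earlier in the thread (2 insns already issued, a reported issue rate of 1). It is not scheduler code: reported_issue_rate is a hypothetical stand-in for a target hook that under-reports before reload, and only the expression issue_rate - cycle_issued_insns is taken from the patch.

```c
#include <assert.h>

/* Hypothetical stand-in for a target hook that claims an issue rate of 1
   before reload and the true rate of 2 afterwards. */
int reported_issue_rate(int reload_completed)
{
  return reload_completed ? 2 : 1;
}

/* Same expression as in max_issue:
   more_issue = issue_rate - cycle_issued_insns. */
int more_issue(int issue_rate, int cycle_issued_insns)
{
  assert(cycle_issued_insns >= 0);
  return issue_rate - cycle_issued_insns;
}

/* Nonzero when the disabled check gcc_assert (more_issue >= 0) would hold. */
int assert_would_hold(int issue_rate, int cycle_issued_insns)
{
  return more_issue(issue_rate, cycle_issued_insns) >= 0;
}
```

With the reported rate of 1 and 2 insns already issued, more_issue is -1 and the check fails; with the true rate of 2 it holds.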


The ideal situation is that we agree that this should never happen, but 
then you need to fix all targets that use this trick, and it seems that 
there is at least mips, ppc, and x86-64 (which is why I pointed you to 
45352).  The fix would be to find out why claiming the true issue rate 
degrades performance and to implement the proper scheduling hooks for 
changing priority of some insns, or to enable -fsched-pressure for the 
offending targets.


This is a lot of work, which is why this assert was installed in max_issue 
for a relatively short amount of time.  Maybe it's time to try again, but 
let's have a consensus first that this assert should never trigger by 
design and we have enough flexibility in the scheduler to provide legal 
means to achieve the same performance effect.


Andrey





/* ??? We used to assert here that we never issue more insns than issue_rate.
However, some targets (e.g. MIPS/SB1) claim lower issue rate than can be
achieved to get better performance. Until these targets are fixed to use
scheduler hooks to manipulate insns priority instead, the assert should
- be disabled.
-
- gcc_assert (more_issue >= 0); */
+ be disabled. */






Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Maxim Kuvyrkov

On 10/19/10 6:16 PM, Andrey Belevantsev wrote:
...

> I agree that ISSUE_POINTS can be removed, as it was not used (maybe
> Maxim can comment more on this).


I too agree with removing ISSUE_POINTS, it never found any use.

Regards,

--
Maxim Kuvyrkov
CodeSourcery
ma...@codesourcery.com
(650) 331-3385 x724


Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Jie Zhang

On 10/19/2010 10:16 PM, Andrey Belevantsev wrote:

> On 19.10.2010 17:57, Jie Zhang wrote:
> > Removing the failing assert fixes the test case. But I wonder why not
> > just get max_issue correct. I'm testing the attached patch. IMHO,
> > max_issue looks confusing.
> >
> > * The concept of ISSUE POINT has never been used since the code landed
> > in the repository.
> >
> > * The comment just before the function says that MAX_POINTS is the sum
> > of points of all instructions in READY, but that does not match the
> > code. The code only sums the points of the first MORE_ISSUE
> > instructions. If ISSUE_POINTS later becomes non-uniform, that piece of
> > code should be redesigned.
> >
> > So I think it's good to remove it now. And "top - choice_stack" is a
> > good replacement for top->n, so we can remove field n from struct
> > choice_entry, too.
> >
> > Now I'm looking at the MIPS target to find out why this change would
> > cause PR37360.
>
> I agree that ISSUE_POINTS can be removed, as it was not used (maybe
> Maxim can comment more on this). However, the assert is not about the
> points but exactly about the situation when a target is lying to the
> compiler about its issue rate.
>
> The ideal situation is that we agree that this should never happen,
> but then you need to fix all targets that use this trick, and it seems
> that there is at least mips, ppc, and x86-64 (which is why I pointed you
> to 45352). The fix would be to find out why claiming the true issue rate
> degrades performance and to implement the proper scheduling hooks for
> changing priority of some insns, or to enable -fsched-pressure for the
> offending targets.

I agree. But I still have a question about TARGET_SCHED_ISSUE_RATE. 
According to my understanding of gccint:


[quote]
Target Hook: int TARGET_SCHED_ISSUE_RATE (void)
[snip]
Although the insn scheduler can define itself the possibility of issue 
an insn on the same cycle, the value can serve as an additional 
constraint to issue insns on the same simulated processor cycle

[snip]
[/quote]

it should be allowed to be defined smaller than the issue rate defined 
by the scheduler DFA. So even if the backend defines a DFA which is 
capable of issuing 4 instructions in one cycle but also defines 
TARGET_SCHED_ISSUE_RATE as 3, the scheduler should restrict the number 
of instructions issued in one cycle to 3 instead of 4.


So I think this assert should hold even if the backend lies to the scheduler 
about the issue rate. Fixing the lies is another problem.
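
The reading of the hook argued for above can be sketched as a simple clamp. This is a hypothetical model, not the max_issue implementation: dfa_capacity stands for what the DFA description could accept in one cycle, issue_rate for the value of TARGET_SCHED_ISSUE_RATE, and n_ready for the number of ready insns.

```c
#include <assert.h>

/* Smaller of two ints. */
int min_int(int a, int b)
{
  return a < b ? a : b;
}

/* Number of insns issued in one cycle when the issue rate acts as an
   additional constraint on top of the DFA. */
int insns_issued_this_cycle(int dfa_capacity, int issue_rate, int n_ready)
{
  assert(dfa_capacity >= 0 && issue_rate >= 0 && n_ready >= 0);
  return min_int(n_ready, min_int(dfa_capacity, issue_rate));
}
```

Under this model the number of issued insns never exceeds issue_rate, so issue_rate minus the issued count stays non-negative regardless of what the DFA alone would allow.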


With the attached draft patch, we can enable the assert in max_issue 
without regression on PR37360.



> This is a lot of work, which is why this assert was installed in
> max_issue for relatively short amount of time. Maybe it's time to try
> again, but let's have a consensus first that this assert should never
> trigger by design and we have enough flexibility in the scheduler to
> provide legal means to achieve the same performance effect.


Agree.


Regards,
--
Jie Zhang
CodeSourcery
diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index b13d648..7653941 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -589,6 +589,10 @@ static const char *mips_hi_relocs[NUM_SYMBOL_TYPES];
 /* Target state for MIPS16.  */
 struct target_globals *mips16_globals;
 
+/* Cached value of can_issue_more. This is cached in mips_variable_issue hook
+   and returned from mips_sched_reorder2.  */
+static int cached_can_issue_more;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class mips_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   LEA_REGS,	LEA_REGS,	M16_REGS,	V1_REG,
@@ -12439,8 +12443,8 @@ mips_sched_init (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 /* Implement TARGET_SCHED_REORDER and TARGET_SCHED_REORDER2.  */
 
 static int
-mips_sched_reorder (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
-		rtx *ready, int *nreadyp, int cycle ATTRIBUTE_UNUSED)
+mips_sched_reorder_1 (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
+		  rtx *ready, int *nreadyp, int cycle ATTRIBUTE_UNUSED)
 {
   if (!reload_completed
   && TUNE_MACC_CHAINS
@@ -12455,10 +12459,25 @@ mips_sched_reorder (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 
   if (TUNE_74K)
 mips_74k_agen_reorder (ready, *nreadyp);
+}
 
+
+static int
+mips_sched_reorder (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
+		rtx *ready, int *nreadyp, int cycle ATTRIBUTE_UNUSED)
+{
+  mips_sched_reorder_1 (file, verbose, ready, nreadyp, cycle);
   return mips_issue_rate ();
 }
 
+static int
+mips_sched_reorder2 (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
+		 rtx *ready, int *nreadyp, int cycle ATTRIBUTE_UNUSED)
+{
+  mips_sched_reorder_1 (file, verbose, ready, nreadyp, cycle);
+  return cached_can_issue_more;
+}
+
 /* Update round-robin counters for ALU1/2 and FALU1/2.  */
 
 static void
@@ -12516,6 +12535,7 @@ mips_variable_issue (FILE *file ATTRIBUTE_UNUSED, int verbose ATTRIBUTE_UNUSED,
 	  || recog_memoized (ins

GCC RTX generation question

2010-10-19 Thread Radu Hobincu
Hello,

I wrote here a few months ago. I'm trying to port GCC to a simple
RISC machine and I have two problems I don't seem to be able to fix. I'm
using gcc 4.4.3 both as the build compiler and as the source base.

1. I have the following code:

---
extern void doSmth();

void bugTest(){
  doSmth();
}
---

It compiles fine with -O0, but when I try to use -O3, I get the following
compiler error:

---
test0.c:13: error: unrecognizable insn:
(call_insn 7 6 8 3 test0.c:12 (call (mem:SI (mem:SI (reg/f:SI 41) [0 S4 A32]) [0 S4 A32])
(const_int 0 [0x0])) -1 (nil)
(nil))
test0.c:13: internal compiler error: in extract_insn, at recog.c:2048
---

I don't understand why the compiler generates (call (mem (mem (reg) )))...
Also, I was under the impression that any address should be checked by
the GO_IF_LEGITIMATE_ADDRESS macro, but I checked and the macro doesn't
receive a (mem (reg)) rtx to verify. This is most likely a failure on my
part to describe something correctly, but the error message isn't very
explicit.


2. I have another piece of code that fails to compile with -O3.

---
struct desc{
  int int1;
  int int2;
  int int3;
};

int bugTest(struct desc *tDesc){
  return *((int*)(tDesc->int1 + 16));
}
---

This time the compiler crashes with a segmentation fault. From what I
could dig up with gdb, the compiler tries to make a LIBCALL for a
memcopy, but I'm not really sure why. The back-trace of the crash is at
the end of this message.

If someone could give me a hint or two, it would be greatly appreciated.

Thanks,
Radu



assign_temp (type_or_decl=0x0, keep=0, memory_required=1, dont_promote=1)
at ../../gcc-4.4.3/gcc/function.c:889
889   if (DECL_P (type_or_decl))
(gdb) bt
#0  assign_temp (type_or_decl=0x0, keep=0, memory_required=1,
dont_promote=1) at ../../gcc-4.4.3/gcc/function.c:889
#1  0x081312cd in emit_push_insn (x=0xb7d0a5c0, mode=SImode, type=0x0,
size=0xb7c912d8, align=8, partial=0, reg=0x0, extra=0,
args_addr=0xb7c92290, args_so_far=0xb7c912b8, reg_parm_stack_space=0,
alignment_pad=0xb7c912b8) at ../../gcc-4.4.3/gcc/expr.c:3756
#2  0x080cf0cb in emit_library_call_value_1 (retval=,
orgfun=, value=,
fn_type=LCT_NORMAL, outmode=VOIDmode, nargs=3,
p=0xbfffef60 "\300\245з\006") at ../../gcc-4.4.3/gcc/calls.c:3701
#3  0x080cf8ed in emit_library_call (orgfun=0xb7cce7a0,
fn_type=LCT_NORMAL, outmode=VOIDmode, nargs=3) at
../../gcc-4.4.3/gcc/calls.c:3952
#4  0x08124d31 in expand_assignment (to=0xb7c940f0, from=0xb7c9a5a0,
nontemporal=0 '\000') at ../../gcc-4.4.3/gcc/expr.c:4381
#5  0x08126803 in expand_expr_real_1 (exp=0xb7c95750, target=, tmode=, modifier=EXPAND_NORMAL,
alt_rtl=0x0) at ../../gcc-4.4.3/gcc/expr.c:9257
#6  0x0812b4ed in expand_expr_real (exp=0xb7c95750, target=0xb7c912b8,
tmode=VOIDmode, modifier=EXPAND_NORMAL, alt_rtl=0x0) at
../../gcc-4.4.3/gcc/expr.c:7129
#7  0x0823bd9b in expand_expr (exp=0xb7c95750) at
../../gcc-4.4.3/gcc/expr.h:539
#8  expand_expr_stmt (exp=0xb7c95750) at ../../gcc-4.4.3/gcc/stmt.c:1352



Re: 4-15% speed-up in std::sort special case - is it worth the effort?

2010-10-19 Thread Gabriel Dos Reis
On Wed, Aug 25, 2010 at 2:44 AM, Jaroslav Hajek  wrote:
> Hi all,
>
> I've been experimenting with sorting recently and I have noticed a
> possibility to slightly optimize the sorting of std::pair values using
> the default < operator. This is, I believe, a common usage case to
> retrieve sorting indices (better locality of reference than sorting
> via pointers). At least I usually do it that way :)
>
> The idea is simple: we know how the < operator for std::pairs looks
> like; basically it's
> x.first < y.first || (! (y.first < x.first) && x.second < y.second)
>

std::pair can be specialized (along with user defined comparators.)
How are you handling that?
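
For reference, the lexicographic ordering quoted above can be sketched in C with qsort. This only illustrates the formula from the quoted message; it cannot show the std::pair specialization issue raised in the reply, and the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for a pair of ints. */
struct int_pair { int first; int second; };

/* x < y in the lexicographic sense:
   x.first < y.first || (!(y.first < x.first) && x.second < y.second) */
int pair_less(const struct int_pair *x, const struct int_pair *y)
{
  return x->first < y->first
         || (!(y->first < x->first) && x->second < y->second);
}

/* qsort-style three-way comparator built only from pair_less:
   negative if x < y, positive if y < x, zero otherwise. */
int pair_compare(const void *a, const void *b)
{
  const struct int_pair *x = a;
  const struct int_pair *y = b;
  assert(x != NULL && y != NULL);
  return pair_less(y, x) - pair_less(x, y);
}
```

A call such as qsort(v, n, sizeof v[0], pair_compare) then orders the pairs by first, breaking ties by second.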


Problem with equivalent memory handling

2010-10-19 Thread Jeff Law
 Looking for advice here -- while I haven't seen this bug trigger in 
the mainline, it triggers  with the range splitting code I've been 
working on.


Reload has the ability to replace a pseudo with its equivalent memory 
location.  This is fine and good.


Imagine:

  1. We have a pseudo (call it pseudo A) with a read-only memory 
equivalent.  Pseudo A does not get a hard reg


  2. Pseudo A crosses a call (because the memory is readonly, we will 
not invalidate the equivalency)


  3. The equivalent memory address references another pseudo (call it 
pseudo B)


  4. Pseudo B does not cross calls and is assigned a call-clobbered 
hard reg.


  5. reload replaces pseudo A with its equivalent memory form and in 
doing so lengthens the lifetime of pseudo B and causes pseudo B to be 
live across a call.


Obviously this is bad.  The question is where/how do we want to catch it.

The easy solution is to go ahead and invalidate the equivalency once we 
notice that pseudo A crosses a call, even though the memory equivalent 
is readonly.  Seems a little harsh, but it's a one line change.
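
The five steps above can be pictured with a toy live-range model. This is a sketch under simplifying assumptions, not reload code: program points are plain integers, each pseudo has a single live range, and all names are hypothetical.

```c
#include <assert.h>

/* Toy model: a pseudo is live over the program points [start, end]. */
struct live_range { int start; int end; };

/* Nonzero if the range is live across a call at program point call_pos. */
int crosses_call(struct live_range r, int call_pos)
{
  return r.start < call_pos && call_pos < r.end;
}

/* Replacing pseudo A by a memory reference whose address uses pseudo B
   turns every use of A into a use of B, so B must stay live until the
   last use of A. */
struct live_range extend_for_equiv(struct live_range b, struct live_range a)
{
  assert(a.start <= a.end && b.start <= b.end);
  if (a.end > b.end)
    b.end = a.end;
  return b;
}

/* The "easy solution": drop the equivalence when A crosses the call. */
int keep_equivalence(struct live_range a, int call_pos)
{
  return !crosses_call(a, call_pos);
}
```

With A live over [2, 20], B live over [1, 5] and a call at point 10, B does not cross the call until the substitution extends it to [1, 20]; the easy solution rejects the equivalence for exactly this A.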




Other approaches?

Jeff






Plug-in Licensing

2010-10-19 Thread Justin Seyster
I'm getting ready to release plug-in code, and I want to have a very
clear idea about licensing before I release.  I'm leaning towards
releasing everything as GPLv3, but I do want to know exactly what is
and isn't allowed.

I know this issue was debated quite intensely before plug-in support
got added, but my understanding is that there was a final consensus.
I can't find one document though that explains exactly what this
consensus was.

I vaguely remember a proposal that there would be no restriction on
plug-in licensing but that non-free plug-ins could only be used to
compile Free software, but that's not documented anywhere I can find.

GCC itself now requires that plug-ins export a
plugin_is_gpl_compatible symbol, which implies that the plug-in's
license need only be compatible with the GPL.  Is it ok to release
LGPL- or BSD-licensed plug-ins?
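
For readers who have not seen the symbol mentioned above, here is a minimal plug-in skeleton. It is a sketch only: a real plug-in would include "gcc-plugin.h" from the GCC plug-in headers instead of the two forward declarations, which are used here so the fragment compiles on its own, and a real plugin_init would register callbacks rather than return immediately.

```c
#include <stddef.h>

/* In a real plug-in these types come from "gcc-plugin.h"; they are only
   forward-declared here so the sketch compiles without the GCC headers. */
struct plugin_name_args;
struct plugin_gcc_version;

/* GCC looks this symbol up when loading the plug-in and refuses to load
   a plug-in that does not export it.  Only its presence matters. */
int plugin_is_gpl_compatible;

/* Entry point called by GCC after loading; returns 0 on success. */
int plugin_init(struct plugin_name_args *plugin_info,
                struct plugin_gcc_version *version)
{
  (void)plugin_info;
  (void)version;
  return 0;
}
```

Exporting the symbol is an assertion by the plug-in author about the license; which licenses may make that assertion is the question discussed in this thread.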

My understanding is that, in general, only GPLv3 code can link against
GPLv3 code, which would imply that my plug-in code must be GPLv3.

The reason I ask is that other users of my code might want to make
derived works from it, and I want to be able to give them clear
answers about what licensing options they have.  Thanks!
--Justin


Re: Plug-in Licensing

2010-10-19 Thread Basile Starynkevitch
On Tue, 19 Oct 2010 16:05:51 -0400
Justin Seyster  wrote:

> I'm getting ready to release plug-in code, and I want to have a very
> clear idea about licensing before I release.  I'm leaning towards
> releasing everything as GPLv3, but I do want to know exactly what is
> and isn't allowed.

A definitive legal answer should be sought from expensive lawyers,
after showing them http://www.gnu.org/licenses/gcc-exception.html

I am not a lawyer (and not a US citizen), and I don't even understand all
the details of GPLv3. My naive answer is that plugins which are
GPLv3 are ok (in particular when they are used to compile GPL
software). Perhaps even any plugin with a licence compatible with GPLv3
is ok, but I cannot explain what that means :-)

And that also applies to anything dlopen-ed by a GCC plugin or a GCC
branch such as a MELT module, see http://gcc-melt.org/ or
http://gcc.gnu.org/wiki/MELT or come hear my MELT tutorial at the GCC
Summit next week :)

So my naive advice to you is to release your code as GPLv3. It could
happen that I am wrong, so let's just say that it is my wish.

The other advantage of GPLv3 plugins is that they play by the rules
wanted by most GCC contributors, and that GCC maintainers could have a
glance inside them. I would be happy to be able to do that.

Cheers.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: GCC RTX generation question

2010-10-19 Thread Ian Lance Taylor
"Radu Hobincu"  writes:

> 1. I have the following code:
>
> ---
> extern void doSmth();
>
> void bugTest(){
>   doSmth();
> }
> ---
>
> It compiles fine with -O0, but when I try to use -O3, I get the following
> compiler error:
>
> -
> test0.c:13: error: unrecognizable insn:
> (call_insn 7 6 8 3 test0.c:12 (call (mem:SI (mem:SI (reg/f:SI 41) [0 S4
> A32]) [0 S4 A32])
> (const_int 0 [0x0])) -1 (nil)
> (nil))
> test0.c:13: internal compiler error: in extract_insn, at recog.c:2048
> -
>
> I don't understand why the compiler generates (call (mem (mem (reg) )))...
> and also, I was under the impression that any address should checked by
> the GO_IF_LEGITIMATE_ADDRESS macro, but I checked and the macro doesn't
> receive a (mem (reg)) rtx to verify. This is most likely a failure of my
> part to describe something correctly, but the error message isn't very
> explicit.

This looks like gcc is loading the function address from memory.  Is
that required for your target?  Assuming it is, then the problem seems
to be that the operand predicate for your call instruction accepts
(mem:SI (mem:SI (reg:SI 41))).  That seems odd.


> 2. I have another piece of code that fails to compile with -O3.
>
> -
> struct desc{
>   int int1;
>   int int2;
>   int int3;
> };
>
> int bugTest(struct desc *tDesc){
>   return *((int*)(tDesc->int1 + 16));
> }
> --

That code looks awfully strange.  Is that an integer or a pointer?
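
If the intent of the questioned expression was to treat the field as an address, one more explicit spelling is sketched below. This is a guess at the intent, not a diagnosis of the crash: the struct desc_addr type and the function name are hypothetical, and the integer-to-pointer conversion is implementation-defined, although it behaves as expected on common platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of the struct from the question: the first field
   is an integer type wide enough to hold an address. */
struct desc_addr {
  uintptr_t base;  /* address stored as an integer */
  int int2;
  int int3;
};

/* Reads the int located 16 bytes past the stored address. */
int load_int_at_base_plus_16(const struct desc_addr *d)
{
  int value;
  assert(d != NULL);
  /* memcpy avoids alignment and aliasing assumptions about the source. */
  memcpy(&value, (const void *)(d->base + 16), sizeof value);
  return value;
}
```

With a plain int field, as in the original, the value is both too narrow for an address on 64-bit hosts and ambiguous to the reader, which is what the question above points at.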

> This time the compiler crashes with a segmentation fault. From what I
> could dig up with gdb, the compilers tries to make a LIBCALL for a
> memcopy, but I'm not really sure why. At the end is the back-trace of the
> crash.

gcc is invoking memmove.  This is happening in the return statement.
For some reason gcc thinks that the function returns a struct.  Your
example does not return a struct.  I cannot explain this.

Ian


Re: Problem with equivalent memory handling

2010-10-19 Thread Ian Lance Taylor
Jeff Law  writes:

> Reload has the ability to replace a pseudo with its equivalent memory
> location.  This is fine and good.
>
> Imagine:
>
>   1. We have a pseudo (call it pseudo A) with a read-only memory
> equivalent.  Pseudo A does not get a hard reg
>
>   2. Pseudo A crosses a call (because the memory is readonly, we will
> not invalidate the equivalency)
>
>   3. The equivalent memory address references another pseudo (call it
> pseudo B)
>
>   4. Pseudo B does not cross calls and is assigned a call-clobbered
> hard reg.
>
>   5. reload replaces pseudo A with its equivalent memory form and in
> doing so lengthens the lifetime of pseudo B and causes pseudo B to be
> live across a call.
>
> Obviously this is bad.  The question is where/how do we want to catch it.
>
> The easy solution is to go ahead and invalidate the equivalency once
> we notice that pseudo A crosses a call, even though the memory
> equivalent is readonly.  Seems a little harsh, but it's a one line
> change.

If you can spot this before reload, then you can load the memory address
into a new pseudo C, and let the register allocator decide whether to
save C on the stack or rematerialize it as needed.


Otherwise, if you don't invalidate the equivalency, you are effectively
extending the lifetime of pseudo B so that it lives across a call.  If
there is any register pressure, B is going to go onto the stack or is
going to force something else onto the stack.  At that point, you would
have been better off just putting the memory address on the stack
instead of pseudo B.  So I think that invalidating the equivalency
across a function call is probably the right thing to do.

Ian


Re: Plug-in Licensing

2010-10-19 Thread Ian Lance Taylor
Justin Seyster  writes:

> I'm getting ready to release plug-in code, and I want to have a very
> clear idea about licensing before I release.  I'm leaning towards
> releasing everything as GPLv3, but I do want to know exactly what is
> and isn't allowed.

GPLv3 is fine.

> I know this issue was debated quite intensely before plug-in support
> got added, but my understanding is that there was a final consensus.
> I can't find one document though that explains exactly what this
> consensus was.

The document is here:

http://www.gnu.org/licenses/gcc-exception.html

See also the rationale and FAQ that it links to.

Basically, if you use a plugin with gcc, and the plugin is not
GPL-compatible, then the resulting compiled code is covered by the GPL.

> I vaguely remember a proposal that there would be no restriction on
> plug-in licensing but that non-free plug-ins could only be used to
> compile Free software, but that's not documented anywhere I can find.

That's pretty much it.

> GCC itself now requires that plug-ins export a
> plugin_is_gpl_compatible symbol, which implies that the plug-in's
> license need only be compatible with the GPL.  Is it ok to release
> LGPL- or BSD-licensed plug-ins?

Sure, both of those licenses are GPL-compatible.
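
For reference, a minimal sketch of how a plug-in exports that symbol.  A
real plug-in would include gcc-plugin.h; the forward declarations below are
a stand-in (an assumption made only so the fragment compiles on its own),
and the body of plugin_init is a placeholder.

```c
/* Minimal plug-in skeleton (sketch only).  A real plug-in would
   include gcc-plugin.h; these forward declarations stand in for it
   so that the fragment compiles by itself.  */
struct plugin_name_args;
struct plugin_gcc_version;

/* GCC looks this symbol up when it loads the plug-in and refuses to
   load the plug-in if the symbol is missing.  Its value is not used.  */
int plugin_is_gpl_compatible;

/* Entry point called by GCC after loading.  Returning 0 reports
   success.  A real plug-in would register its callbacks here.  */
int
plugin_init (struct plugin_name_args *plugin_info,
             struct plugin_gcc_version *version)
{
  (void) plugin_info;
  (void) version;
  return 0;
}
```

The symbol only asserts GPL compatibility; it says nothing about which
GPL-compatible license the plug-in actually uses.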

> My understanding is that, in general, only GPLv3 code can link against
> GPLv3 code, which would imply that my plug-in code must be GPLv3.

That is incorrect.  You can link code under any GPL-compatible license
with GPLv3 code, and the resulting executable will be covered by the union
of both licenses.  Since GPLv3 tends to be stricter than any
GPL-compatible license, this generally means that the result is under
GPLv3.  There is a (non-exhaustive) list of GPL-compatible licenses
here:

http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses

Ian


Re: how to initialize a pointer using data page mode

2010-10-19 Thread Phung Nguyen
> It seems that you want to generate two .int statements.  My question is
> whether you can load those in a single load instruction, or whether you
> also need to generate multiple load instructions.
 I need to generate multiple load instructions


Re: *_ALIGN_MAX_SKIP macros

2010-10-19 Thread DJ Delorie

> This is OK if you add LABEL_ALIGN_MAX_SKIP, LOOP_ALIGN_MAX_SKIP,
> LABEL_ALIGN_AFTER_BARRIER_MAX_SKIP, and JUMP_ALIGN_MAX_SKIP to the
> 
> /* Old target macros that have moved to the target hooks structure.  */
> 
> #pragma GCC poison list in system.h.

Thanks, committed with that change.


Re: Plug-in Licensing

2010-10-19 Thread Justin Seyster
Thanks for this advice.  The link to the GCC Exception was especially helpful.

The trick here is that I'm actually releasing a library designed to be
linked into plug-ins.  I want the library itself to be copyleft but
for plug-in authors to retain any licensing flexibility that they
would have when releasing a stand-alone GCC plug-in.

It sounds like the GPLv3 will do that for me, so that's my plan unless
somebody corrects me.
--Justin

On Tue, Oct 19, 2010 at 4:49 PM, Ian Lance Taylor  wrote:
> Justin Seyster  writes:
>
>> I'm getting ready to release plug-in code, and I want to have a very
>> clear idea about licensing before I release.  I'm leaning towards
>> releasing everything as GPLv3, but I do want to know exactly what is
>> and isn't allowed.
>
> GPLv3 is fine.
>
>> I know this issue was debated quite intensely before plug-in support
>> got added, but my understanding is that there was a final consensus.
>> I can't find one document though that explains exactly what this
>> consensus was.
>
> The document is here:
>
> http://www.gnu.org/licenses/gcc-exception.html
>
> See also the rationale and FAQ that it links to.
>
> Basically, if you use a plugin with gcc, and the plugin is not
> GPL-compatible, then the resulting compiled code is covered by the GPL.
>
>> I vaguely remember a proposal that there would be no restriction on
>> plug-in licensing but that non-free plug-ins could only be used to
>> compile Free software, but that's not documented anywhere I can find.
>
> That's pretty much it.
>
>> GCC itself now requires that plug-ins export a
>> plugin_is_gpl_compatible symbol, which implies that the plug-in's
>> license need only be compatible with the GPL.  Is it ok to release
>> LGPL- or BSD-licensed plug-ins?
>
> Sure, both of those licenses are GPL-compatible.
>
>> My understanding is that, in general, only GPLv3 code can link against
>> GPLv3 code, which would imply that my plug-in code must be GPLv3.
>
> That is incorrect.  You can link code under any GPL-compatible license
> with GPLv3 code, and the resulting executable will be covered by the union
> of both licenses.  Since GPLv3 tends to be stricter than any
> GPL-compatible license, this generally means that the result is under
> GPLv3.  There is a (non-exhaustive) list of GPL-compatible licenses
> here:
>
> http://www.gnu.org/licenses/license-list.html#GPLCompatibleLicenses
>
> Ian
>


Re: Questions about selective scheduler and PowerPC

2010-10-19 Thread Paul Brook
> [quote]
> Target Hook: int TARGET_SCHED_ISSUE_RATE (void)
> [snip]
> Although the insn scheduler can define itself the possibility of issue
> an insn on the same cycle, the value can serve as an additional
> constraint to issue insns on the same simulated processor cycle
> [snip]
> [/quote]
> 
> it should be allowed to be defined smaller than the issue rate defined
> by the scheduler DFA. So even if the backend defines a DFA which is
> capable to issue 4 instructions in one cycle but it also defines
> TARGET_SCHED_ISSUE_RATE to 3, the scheduler should restrict the number
> of instructions issued in one cycle to 3 instead of 4.

FWIW I have a strong suspicion that there are scheduler descriptions in the 
ARM backend that rely on TARGET_SCHED_ISSUE_RATE instead of explicitly 
modelling the issue unit.
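
A toy model of the constraint being discussed, not GCC code: the function
names are made up for illustration, and it assumes all insns are
independent and both widths are positive.

```c
/* Toy model, not GCC code: the number of insns issued per cycle is
   limited by both the DFA description and the issue-rate hook.  */
int
effective_issue_width (int dfa_width, int issue_rate)
{
  return issue_rate < dfa_width ? issue_rate : dfa_width;
}

/* Cycles needed to issue N_INSNS independent insns under that limit
   (rounds up).  Assumes both widths are positive.  */
int
cycles_to_issue (int n_insns, int dfa_width, int issue_rate)
{
  int width = effective_issue_width (dfa_width, issue_rate);
  return (n_insns + width - 1) / width;
}
```

With a DFA that allows 4 insns per cycle and an issue rate of 3, twelve
independent insns take 4 cycles in this model rather than 3.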

Paul


Hooks, macros and target configuration

2010-10-19 Thread Joseph S. Myers
My ongoing work to implement the multilib selection changes described
at  will in due
course require option-related hooks to be shared between the driver
and the compilers proper (cc1 etc.).  As we do not currently have a
hooks system in the driver, it seems appropriate to consider what
design we want for hooks extended into this part of the compiler -
and, more generally, what we would like the system for target
configuration to look like.

(In this message I only consider designs for C.  It is certainly
possible that in future a native C++ approach with less heavy use of
macros may be used, but I think the same general issues arise.)

The basic design for target hooks was implemented by Neil Booth in
June 2001 following a discussion
 I started
(this was not the first time the principle of moving away from macros
had come up), with langhooks following later that year, the
targhooks.c system for incremental transition of individual macros to
hooks being added in 2003, and automatic generation of much of the
boilerplate code from a single .def file being added much more
recently by Joern Rennecke; we now have about 300 target hooks.  The
point that such functions should be linkable into the driver came up
in the second message of that thread.

The motivations for moving from macros to hooks remain as discussed
then: cleaner design, better-specified interfaces with prototypes (so
eliminating one cause of warnings building target-independent files
only when configured for some targets, or errors from code conditioned
with #if) and potentially the ability to swap out target structures
for multi-target compiler binaries.

In general, the existing target hooks are defined through #define
within the target .c file, before the targetm variable is defined with
its initializer.  This works well for hooks that largely depend on the
target architecture alone, not on the target OS.  There are some
exceptions where OS-dependent hooks are defined in the .h files listed
in $tm_file, and some cases where targets modify hooks at runtime
(this should generally be for hooks that are integer or string
constants rather than functions, though I have not checked whether any
function hooks are also being modified at runtime).  Such cases would
make a multi-target compiler (with each source file compiled only
once, but multiple target OSes for a single architecture) a bit
harder; targetm would need to move to its own .c file, with only that
.c file being compiled separately for each target.
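
A self-contained sketch of that pattern, in one file for brevity.  All
names here (target_hooks_sketch, TARGET_SKETCH_*, mytarget_issue_rate) are
invented for illustration and are not the real GCC identifiers.

```c
/* Part 1: what a shared header would provide: the hook structure,
   the default implementations and the default initializer macros.  */
struct target_hooks_sketch
{
  int (*issue_rate) (void);
  const char *comment_start;
};

int default_issue_rate (void) { return 1; }

#define TARGET_SKETCH_ISSUE_RATE default_issue_rate
#define TARGET_SKETCH_COMMENT_START ";"
#define TARGET_SKETCH_INITIALIZER \
  { TARGET_SKETCH_ISSUE_RATE, TARGET_SKETCH_COMMENT_START }

/* Part 2: what the target .c file would contain: an override of one
   hook, placed before the structure is defined.  */
int mytarget_issue_rate (void) { return 3; }

#undef TARGET_SKETCH_ISSUE_RATE
#define TARGET_SKETCH_ISSUE_RATE mytarget_issue_rate

/* The initializer macro is expanded here, so it picks up the
   override above and the default for everything else.  */
struct target_hooks_sketch targetm_sketch = TARGET_SKETCH_INITIALIZER;
```

Because the overrides are preprocessor definitions, they are fixed when
this one file is compiled, which is why a multi-target binary would need
that file built once per target.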

Some target macros - which should become hooks in some form - are much
more dependent on the target OS.  This applies, in particular, to all
the various specs used by the driver.  Given this, my inclination is
that the driver's targetm structure should be defined in its own .c
file, with the macros providing its initializer generally coming from
.h files.  This isn't very far distant from the present system for
specs (where the macros initialize variables and otherwise the
variables are generally what's used, potentially being modified at
runtime when a specs file is read).

The advantage as I see it would come if the target .h files could be
split up, with the driver-only defines coming from a separate set of
headers from those used in the core compiler.  The ones used in the
core compiler would likely be simpler and have fewer OS dependencies.
It might make it easier to move towards using these headers
consistently in the order given in config.gcc as the preferred order -
with the aim being to avoid architecture-specific cases in config.gcc
mentioning architecture-independent headers at all.  (A case statement
over target architectures should deal with architecture-specific
configuration; one over OSes should deal with OS-specific
configuration; one over pairs should deal with the combination; and
target-independent code should put together the lists from each of
those case statements in the standard order.)

This separation of configuration for different purposes is closely
related to the issues with hooks for option handling.  Target .h files
contain configuration used in various ways, with some macros used in more than
one way:

* Definitions for use in the target's own .c files.

* Definitions for use in code in the target's .md files.

* Definitions for use in code in the core compiler (middle-end and
  front ends).  tm_p.h has prototypes for use in such code, and an
  intermediate goal in converting macros to hooks might be to convert
  every macro that uses a function on any target, so that tm_p.h is
  only included in the target's .c files and in the generated files
  containing code from the .md files.

* Definitions for use in the driver.

* Definitions for use in collect2.  These overlap with those used in
  the driver (e.g. MD_EXEC_PREFIX), and probably with those used
  elsewhere (e.g. OBJECT_FORMAT_*).

  My inclination is to say that

Re: how to initialize a pointer using data page mode

2010-10-19 Thread Ian Lance Taylor
Phung Nguyen  writes:

>> It seems that you want to generate two .int statements.  My question is
>> whether you can load those in a single load instruction, or whether you
>> also need to generate multiple load instructions.
>  I need to generate multiple load instructions

In that case, you need a lot more than just changing the .int pseudo-op.
You probably need to change your move expander to recognize this case
and split it up into the two constants, then you need to generate the
insns to load the constants and merge them together.  That isn't going
to play very well with the RTL optimizers, and it isn't going to be easy
to write, but it should work.
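
As a rough illustration of the arithmetic only: the sketch below assumes a
32-bit address split into two 16-bit constants.  The real field widths and
the merge operation depend on your target, and the function names are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical split of a 32-bit address into two 16-bit constants.
   The real widths depend on the target.  */
uint16_t high_part (uint32_t addr) { return (uint16_t) (addr >> 16); }
uint16_t low_part (uint32_t addr) { return (uint16_t) (addr & 0xffffu); }

/* What the generated insns would have to compute: load each
   constant separately, then merge them back into one value.  */
uint32_t
merge_parts (uint16_t hi, uint16_t lo)
{
  return ((uint32_t) hi << 16) | lo;
}
```

The move expander would emit one load per part plus the merge, rather than
a single move of the full constant.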

As far as I know, you are in an area of gcc which is not well trodden.
You're going to have to do a lot of digging and debugging to get this to
work.

Ian


Re: Plug-in Licensing

2010-10-19 Thread Ian Lance Taylor
Justin Seyster  writes:

> Thanks for this advice.  The link to the GCC Exception was especially helpful.
>
> The trick here is that I'm actually releasing a library designed to be
> linked into plug-ins.  I want the library itself to be copyleft but
> for plug-in authors to retain any licensing flexibility that they
> would have when releasing a stand-alone GCC plug-in.
>
> It sounds like the GPLv3 will do that for me, so that's my plan unless
> somebody corrects me.

I would recommend using the GPLv3 with an explicit reference to the GCC
Runtime Library Exception, as in:

XXX is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

XXX is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

Under Section 7 of GPL version 3, you are granted additional
permissions described in the GCC Runtime Library Exception, version
3.1, as published by the Free Software Foundation.

You should have received a copy of the GNU General Public License and
a copy of the GCC Runtime Library Exception along with this program;
see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
.

Ian


gcc-4.4-20101019 is now available

2010-10-19 Thread gccadmin
Snapshot gcc-4.4-20101019 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.4-20101019/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.4 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_4-branch 
revision 165707

You'll find:

 gcc-4.4-20101019.tar.bz2 Complete GCC (includes all of below)

  MD5=21f18a592410c439ea659f076a839f0d
  SHA1=703d8bdb8c3a45e4438ae2215c98608fe1565dec

 gcc-core-4.4-20101019.tar.bz2 C front end and core compiler

  MD5=6fa026b2d18859f1fa090b3ee0629f69
  SHA1=001028d42233d3fa2b94c46e49cae3ac1b88561e

 gcc-ada-4.4-20101019.tar.bz2 Ada front end and runtime

  MD5=f5b47aa2b2d8f10260252aa5301b6053
  SHA1=57b9e6feb36eaa1f9c5d430b1e1ac3df0c602c98

 gcc-fortran-4.4-20101019.tar.bz2 Fortran front end and runtime

  MD5=ccded0001858c7b64b02fa1c503b195d
  SHA1=bdffc8528d980b0b90f3cccb9d69376621c74992

 gcc-g++-4.4-20101019.tar.bz2 C++ front end and runtime

  MD5=e0f9f87fc2146f61b0006c8435edce21
  SHA1=7e3a44eba0ef0b09d0b0df54cb408aeb48995d44

 gcc-java-4.4-20101019.tar.bz2 Java front end and runtime

  MD5=d673f10eec878e5712b431d0ca194e8a
  SHA1=016b01df711ae77fcca23cc84f6301ae9e83ff1d

 gcc-objc-4.4-20101019.tar.bz2 Objective-C front end and runtime

  MD5=51b9b8bc40349e95c228f3b27308728b
  SHA1=e98a456fbd762940da34ab0b951bde9bacac10e0

 gcc-testsuite-4.4-20101019.tar.bz2   The GCC testsuite

  MD5=480512f2014c1f079fa4af015be0
  SHA1=bab56db685b7e6e1ee2bf472c1af754c000f6682

Diffs from 4.4-20101012 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.4
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.