[Fortran-dev][Patch] Fix cshift1
This patch fixes the stride setting for cshift1; hence, it fixes
gfortran.dg/optional_dim_3.f90.

Built and regtested on x86-64-linux - 13 failing tests remain.
OK?

Tobias

2012-07-15  Tobias Burnus

        * m4/cshift1.m4 (cshift1): Correctly set stride multiplier.
        * generated/cshift1_16.c: Regenerate.
        * generated/cshift1_4.c: Regenerate.
        * generated/cshift1_8.c: Regenerate.

Index: libgfortran/m4/cshift1.m4
===
--- libgfortran/m4/cshift1.m4 (Revision 189480)
+++ libgfortran/m4/cshift1.m4 (Arbeitskopie)
@@ -80,22 +80,18 @@ cshift1 (gfc_array_char * const restrict ret,
   if (ret->base_addr == NULL)
     {
       int i;
+      index_type sm, ext;
 
       ret->base_addr = xmalloc (size * arraysize);
       ret->offset = 0;
       ret->dtype = array->dtype;
+      sm = sizeof ('atype_name`);
+      ext = 1;
       for (i = 0; i < GFC_DESCRIPTOR_RANK (array); i++)
         {
-          index_type ext, sm;
-
+          sm *= ext;
           ext = GFC_DESCRIPTOR_EXTENT (array, i);
-          if (i == 0)
-            sm = 1;
-          else
-            sm = GFC_DESCRIPTOR_EXTENT (ret, i-1)
-                 * GFC_DESCRIPTOR_SM (ret, i-1);
-
           GFC_DIMENSION_SET (ret->dim[i], 0, ext, sm);
         }
     }
Index: libgfortran/generated/cshift1_16.c
===
--- libgfortran/generated/cshift1_16.c (Revision 189480)
+++ libgfortran/generated/cshift1_16.c (Arbeitskopie)
@@ -79,22 +79,18 @@ cshift1 (gfc_array_char * const restrict ret,
   if (ret->base_addr == NULL)
     {
       int i;
+      index_type sm, ext;
 
       ret->base_addr = xmalloc (size * arraysize);
       ret->offset = 0;
       ret->dtype = array->dtype;
+      sm = sizeof (GFC_INTEGER_16);
+      ext = 1;
       for (i = 0; i < GFC_DESCRIPTOR_RANK (array); i++)
         {
-          index_type ext, sm;
-
+          sm *= ext;
           ext = GFC_DESCRIPTOR_EXTENT (array, i);
-          if (i == 0)
-            sm = 1;
-          else
-            sm = GFC_DESCRIPTOR_EXTENT (ret, i-1)
-                 * GFC_DESCRIPTOR_SM (ret, i-1);
-
           GFC_DIMENSION_SET (ret->dim[i], 0, ext, sm);
         }
     }
Index: libgfortran/generated/cshift1_4.c
===
--- libgfortran/generated/cshift1_4.c (Revision 189480)
+++ libgfortran/generated/cshift1_4.c (Arbeitskopie)
@@ -79,22 +79,18 @@ cshift1 (gfc_array_char * const restrict ret,
   if (ret->base_addr == NULL)
     {
       int i;
+      index_type sm, ext;
 
       ret->base_addr = xmalloc (size * arraysize);
       ret->offset = 0;
       ret->dtype = array->dtype;
+      sm = sizeof (GFC_INTEGER_4);
+      ext = 1;
       for (i = 0; i < GFC_DESCRIPTOR_RANK (array); i++)
         {
-          index_type ext, sm;
-
+          sm *= ext;
           ext = GFC_DESCRIPTOR_EXTENT (array, i);
-          if (i == 0)
-            sm = 1;
-          else
-            sm = GFC_DESCRIPTOR_EXTENT (ret, i-1)
-                 * GFC_DESCRIPTOR_SM (ret, i-1);
-
           GFC_DIMENSION_SET (ret->dim[i], 0, ext, sm);
         }
     }
Index: libgfortran/generated/cshift1_8.c
===
--- libgfortran/generated/cshift1_8.c (Revision 189480)
+++ libgfortran/generated/cshift1_8.c (Arbeitskopie)
@@ -79,22 +79,18 @@ cshift1 (gfc_array_char * const restrict ret,
   if (ret->base_addr == NULL)
     {
       int i;
+      index_type sm, ext;
 
       ret->base_addr = xmalloc (size * arraysize);
       ret->offset = 0;
       ret->dtype = array->dtype;
+      sm = sizeof (GFC_INTEGER_8);
+      ext = 1;
       for (i = 0; i < GFC_DESCRIPTOR_RANK (array); i++)
         {
-          index_type ext, sm;
-
+          sm *= ext;
           ext = GFC_DESCRIPTOR_EXTENT (array, i);
-          if (i == 0)
-            sm = 1;
-          else
-            sm = GFC_DESCRIPTOR_EXTENT (ret, i-1)
-                 * GFC_DESCRIPTOR_SM (ret, i-1);
-
           GFC_DIMENSION_SET (ret->dim[i], 0, ext, sm);
         }
     }
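For readers following the cshift1 change, here is a minimal standalone sketch of the same running-product idea: the stride multiplier of dimension i is the element size in bytes times the product of all lower extents. The struct `dim_sketch` and the function `set_contiguous_dims` are hypothetical stand-ins for illustration only; they are not the libgfortran descriptor or its macros.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one dimension of an array
   descriptor; the real libgfortran descriptor has more fields.  */
typedef struct
{
  ptrdiff_t lower_bound;
  ptrdiff_t extent;
  ptrdiff_t sm;                 /* stride multiplier, in bytes */
} dim_sketch;

/* Fill DIM[0..RANK-1] for a contiguous, column-major array whose
   elements are ELEM_SIZE bytes wide.  */
static void
set_contiguous_dims (dim_sketch *dim, int rank,
                     const ptrdiff_t *extents, size_t elem_size)
{
  ptrdiff_t sm = (ptrdiff_t) elem_size;
  ptrdiff_t ext = 1;
  int i;

  for (i = 0; i < rank; i++)
    {
      sm *= ext;                /* bytes spanned by all lower dimensions */
      ext = extents[i];
      dim[i].lower_bound = 0;
      dim[i].extent = ext;
      dim[i].sm = sm;
    }
}
```

For a 3x4x5 array of 4-byte elements this gives byte multipliers 4, 12 and 48, whereas starting the product at 1 would give element counts rather than bytes.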
Re: [PATCH] Improve andq $0xffffffff, %reg handling (PR target/53110)
On Sun, Jul 15, 2012 at 1:56 AM, H.J. Lu wrote:
> On Wed, Apr 25, 2012 at 12:14 PM, Jakub Jelinek wrote:
>
>> We have a splitter for reg1 = reg2 & 0xffffffff, but only if regnums
>> are different. But movl %edi, %edi is a cheaper variant of
>> andq $0xffffffff, %rdi even with the same register and doesn't clobber
>> flags, so this patch attempts to expand it as a zero extension early.
>>
>> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>>
>> 2012-04-25 Jakub Jelinek
>>
>> PR target/53110
>> * config/i386/i386.md (and<mode>3): For andq $0xffffffff, reg
>> instead expand it as zero extension.
>>
>> --- gcc/config/i386/i386.md.jj 2012-04-25 12:14:54.0 +0200
>> +++ gcc/config/i386/i386.md 2012-04-25 14:50:48.708925963 +0200
>> @@ -7694,7 +7694,17 @@ (define_expand "and<mode>3"
>>      (and:SWIM (match_operand:SWIM 1 "nonimmediate_operand")
>>                (match_operand:SWIM 2 "")))]
>>    ""
>> -  "ix86_expand_binary_operator (AND, <MODE>mode, operands); DONE;")
>> +{
>> +  if (<MODE>mode == DImode
>> +      && GET_CODE (operands[2]) == CONST_INT
>> +      && INTVAL (operands[2]) == (HOST_WIDE_INT) 0xffffffff
>> +      && REG_P (operands[1]))
>> +    emit_insn (gen_zero_extendsidi2 (operands[0],
>> +                                     gen_lowpart (SImode, operands[1])));
>> +  else
>> +    ix86_expand_binary_operator (AND, <MODE>mode, operands);
>> +  DONE;
>> +})
>>
>>  (define_insn "*anddi_1"
>>    [(set (match_operand:DI 0 "nonimmediate_operand" "=r,rm,r,r")
>>
>> Jakub
>
> Can it be backported to 4.7 branch? It also fixed:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53961
>
> on hjl/x32/gcc-4_7-branch.

I have backported the patch to 4.7 branch.

OTOH, those LEA patterns are really in need of some cleanup in the future.

Uros.
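The equivalence the expander relies on can be checked in plain C. This is only an illustrative sketch: masking a 64-bit value with 0xffffffff gives the same result as truncating it to 32 bits and zero-extending it again. The function names are made up for this example, and which instruction a compiler actually emits for either form is not guaranteed.

```c
#include <stdint.h>

/* Mask form: the value that "andq $0xffffffff, %reg" computes.  */
static uint64_t
mask_low32 (uint64_t x)
{
  return x & UINT64_C (0xffffffff);
}

/* Zero-extension form: truncate to 32 bits, then widen again.  */
static uint64_t
zext_low32 (uint64_t x)
{
  return (uint64_t) (uint32_t) x;
}
```

Both functions return the same value for every input, which is why the AND can be expanded as a zero extension.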
[Fortran-dev][committed] Fix associate
The following patch compares the stride multiplier rather than the
stride; that's not only faster as it avoids a useless division, but it
also fixes gfortran.dg/associated_2.f90 by not dividing by zero.

Built and regtested on x86-64-gnu-linux, and committed as Rev. 189492.

Remaining 12 (branch-only) regressions (without pending patches):

gfortran.dg/auto_char_dummy_array_1.f90
gfortran.dg/auto_char_len_3.f90
gfortran.dg/class_array_1.f03
gfortran.dg/class_array_2.f03
gfortran.dg/class_array_3.f03
gfortran.dg/class_to_type_1.f03
gfortran.dg/proc_decl_23.f90
gfortran.dg/select_type_26.f03
gfortran.dg/select_type_27.f03
gfortran.dg/read_eof_all.f90
gfortran.dg/transfer_intrinsic_3.f90
gfortran.dg/subref_array_pointer_2.f90

I think when we are down to zero (branch-only) regressions, we should
try a bunch of real-world programs - there are probably more issues.

Tobias

2012-07-15  Tobias Burnus

        * trans-intrinsic.c (gfc_conv_associated): Compare sm
        instead of stride.

2012-07-15  Tobias Burnus

        * intrinsics/associated.c (associated): Compare sm
        instead of stride.

Index: gcc/fortran/trans-intrinsic.c
===
--- gcc/fortran/trans-intrinsic.c (Revision 189481)
+++ gcc/fortran/trans-intrinsic.c (Arbeitskopie)
@@ -5849,19 +5849,19 @@ gfc_conv_associated (gfc_se *se, gfc_exp
           se->expr = fold_build2_loc (input_location, TRUTH_AND_EXPR,
                                       boolean_type_node, tmp, tmp2);
         }
       else
         {
           /* An array pointer of zero length is not associated if target is
              present.  */
           arg1se.descriptor_only = 1;
           gfc_conv_expr_lhs (&arg1se, arg1->expr);
-          tmp = gfc_conv_descriptor_stride_get (arg1se.expr,
+          tmp = gfc_conv_descriptor_sm_get (arg1se.expr,
                                             gfc_rank_cst[arg1->expr->rank - 1]);
           nonzero_arraylen = fold_build2_loc (input_location, NE_EXPR,
                                               boolean_type_node, tmp,
                                               build_int_cst (TREE_TYPE (tmp), 0));
 
           /* A pointer to an array, call library function _gfor_associated.  */
           gcc_assert (ss2 != gfc_ss_terminator);
           arg1se.want_pointer = 1;
           gfc_conv_expr_descriptor (&arg1se, arg1->expr, ss1);
Index: libgfortran/intrinsics/associated.c
===
--- libgfortran/intrinsics/associated.c (Revision 189480)
+++ libgfortran/intrinsics/associated.c (Arbeitskopie)
@@ -42,17 +42,17 @@ associated (const gfc_array_void *pointe
 
   rank = GFC_DESCRIPTOR_RANK (pointer);
   for (n = 0; n < rank; n++)
     {
       long extent;
       extent = GFC_DESCRIPTOR_EXTENT(pointer,n);
 
       if (extent != GFC_DESCRIPTOR_EXTENT(target,n))
         return 0;
-      if (GFC_DESCRIPTOR_STRIDE(pointer,n) != GFC_DESCRIPTOR_STRIDE(target,n) && extent != 1)
+      if (GFC_DESCRIPTOR_SM (pointer,n) != GFC_DESCRIPTOR_SM (target,n) && extent != 1)
         return 0;
       if (extent <= 0)
         return 0;
     }
 
   return 1;
 }
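As a rough illustration of the library-side check, the sketch below compares per-dimension extents and byte stride multipliers directly. No division by the element size takes place, so nothing divides by zero. The type `dim_sketch` and the function `same_shape_and_layout` are hypothetical and much simpler than the real descriptor code.

```c
#include <stddef.h>

/* Hypothetical per-dimension record: extent plus byte stride multiplier.  */
typedef struct
{
  ptrdiff_t extent;
  ptrdiff_t sm;
} dim_sketch;

/* Return 1 if both descriptors have the same extents and, for every
   dimension with more than one element, the same stride multiplier.
   Dimensions with extent <= 0 make the result 0, as in the patch.  */
static int
same_shape_and_layout (const dim_sketch *pointer, const dim_sketch *target,
                       int rank)
{
  int n;

  for (n = 0; n < rank; n++)
    {
      ptrdiff_t extent = pointer[n].extent;

      if (extent != target[n].extent)
        return 0;
      /* The multiplier is irrelevant when there is only one element.  */
      if (pointer[n].sm != target[n].sm && extent != 1)
        return 0;
      if (extent <= 0)
        return 0;
    }

  return 1;
}
```

Comparing the multipliers is sufficient here because both descriptors refer to elements of the same size; that is an assumption of this sketch rather than something the code checks.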
Re: [Fortran-dev][Patch] Fix cshift1
Hi Tobias,

> This patch fixes the stride setting for cshift1; hence, it fixes
> gfortran.dg/optional_dim_3.f90.
>
> Built and regtested on x86-64-linux - 13 failing tests remain.
> OK?

OK. Thanks for the patch!

Thomas
Re: [testsuite] Allow for / comments in g++.dg/debug/dwarf2/pubnames-2.C
Installed. Andreas. * g++.dg/debug/dwarf2/pubnames-2.C: Support all known comment characters. diff --git a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C index 375b856..3b7f95e 100644 --- a/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C +++ b/gcc/testsuite/g++.dg/debug/dwarf2/pubnames-2.C @@ -1,63 +1,63 @@ // { dg-do compile } // { dg-options "-gpubnames -gdwarf-4 -std=c++0x -dA" } // { dg-final { scan-assembler ".section\t.debug_pubnames" } } -// { dg-final { scan-assembler "\"\\(anonymous namespace\\)0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::G_A0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::G_B0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::G_C0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::\\(anonymous namespace\\)0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"F_A0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"F_B0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"F_C0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"inline_func_10\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::c1::c10\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::c1::~c10\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::c1::val0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"check_enum0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"main0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler 
"\"two::c2::c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"check0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"check \\>0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"check \\>0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"check \\>0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::val0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::val0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::val0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"__static_initialization_and_destruction_00\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::~c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::~c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2::~c20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"_GLOBAL__sub_I__ZN3one3c1vE0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"anonymous_union_var0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::ci0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2v10\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2v20\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"two::c2v30\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::c1v0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"one::\\(anonymous 
namespace\\)::one_anonymous_var0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"\\(anonymous namespace\\)::c1_count0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"\\(anonymous namespace\\)::c2_count0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"\\(anonymous namespace\\)::three0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } -// { dg-final { scan-assembler "\"\\(anonymous namespace\\)::three::anonymous_three_var0\"+\[ \t\]+\[#;/]+\[ \t\]+external name" } } +// { dg-final { scan-assembler "\"\\(anonymous namespace\\)0\"+\[ \t\]+\[#;/|@!]+\[ \t\]+external name" } } +// { dg-final { scan-assembler "\"one0\"+\[ \t\]+\[#;/|@!]+\[ \t\]+external name" } } +// { dg-final { scan-assembler "
[Fortran-dev][Patch] Some (ubound-lbound+1) -> extent cleanup
This patch cleans up the source code and generated code ("dump") by
replacing (ubound-lbound+1) calculations with a direct read of the
"extent". Apart from faster -O0 performance and a few cycles saved
during code generation, the resulting code should be the same.

The only real code change I made was to gfc_grow_array. I didn't
understand the previous code; I think the current code makes more sense.
I believe the old code did:

  desc->extent = desc.extent + extra;
  realloc (&desc->data, (desc.extent + 1)*element_size);

while I think the latter should be "+ extra" and not "+ 1". From the
callee, I couldn't deduce whether extra is always unity in practice; in
any case, it doesn't look like it.

For fcncall_realloc_result, I didn't check whether the new code
effectively matches the old one; I just followed the comment, which
states: "Check that the shapes are the same between lhs and expression.".

There are some more cases, but there the code wasn't as obvious. For
instance "extent = ubound - lbound"; there I was unsure whether that's a
bug (missing "+1") or valid.

Built and regtested with no new failures.
OK for the branch?

Tobias

2012-07-15  Tobias Burnus

        * trans-intrinsic.c (gfc_conv_intrinsic_size,
        gfc_conv_intrinsic_sizeof): Replace (ubound-lbound+1)
        calculation by "extent".
        * trans-expr.c (fcncall_realloc_result): Ditto.
        * trans-io.c (gfc_convert_array_to_string): Ditto.
        * trans-openmp.c (gfc_omp_clause_default_ctor,
        gfc_omp_clause_copy_ctor, gfc_omp_clause_assign_op,
        gfc_trans_omp_array_reduction): Ditto.
        * trans-array.c (array_parameter_size): Ditto.
        (gfc_grow_array): Ditto - and fix size calculation for realloc.

2012-07-15  Tobias Burnus

        * gfortran.dg/array_section_2.f90: Update scan-tree-dump pattern.

Index: gcc/fortran/trans-intrinsic.c === --- gcc/fortran/trans-intrinsic.c (Revision 189492) +++ gcc/fortran/trans-intrinsic.c (Arbeitskopie) @@ -5145,17 +5145,8 @@ gfc_conv_intrinsic_size (gfc_se * se, gfc_expr * e if (se->expr == NULL_TREE) { - tree ubound, lbound; - - arg1 = build_fold_indirect_ref_loc (input_location, - arg1); - ubound = gfc_conv_descriptor_ubound_get (arg1, argse.expr); - lbound = gfc_conv_descriptor_lbound_get (arg1, argse.expr); - se->expr = fold_build2_loc (input_location, MINUS_EXPR, - gfc_array_index_type, ubound, lbound); - se->expr = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, - se->expr, gfc_index_one_node); + arg1 = build_fold_indirect_ref_loc (input_location, arg1); + se->expr = gfc_conv_descriptor_extent_get (arg1, argse.expr); se->expr = fold_build2_loc (input_location, MAX_EXPR, gfc_array_index_type, se->expr, gfc_index_zero_node); @@ -5194,8 +5185,6 @@ gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *e tree source_bytes; tree type; tree tmp; - tree lower; - tree upper; int n; arg = expr->value.function.actual->expr; @@ -5240,12 +5229,7 @@ gfc_conv_intrinsic_sizeof (gfc_se *se, gfc_expr *e { tree idx; idx = gfc_rank_cst[n]; - lower = gfc_conv_descriptor_lbound_get (argse.expr, idx); - upper = gfc_conv_descriptor_ubound_get (argse.expr, idx); - tmp = fold_build2_loc (input_location, MINUS_EXPR, - gfc_array_index_type, upper, lower); - tmp = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, tmp, gfc_index_one_node); + tmp = gfc_conv_descriptor_extent_get (argse.expr, idx); tmp = fold_build2_loc (input_location, MULT_EXPR, gfc_array_index_type, tmp, source_bytes); gfc_add_modify (&argse.pre, source_bytes, tmp); Index: gcc/fortran/trans-expr.c === --- gcc/fortran/trans-expr.c (Revision 189481) +++ gcc/fortran/trans-expr.c (Arbeitskopie) @@ -6476,19 +6481,10 @@ fcncall_realloc_result (gfc_se *se, int rank) for (n = 0 ; n < rank; n++) { tree tmp1; - tmp = gfc_conv_descriptor_lbound_get (desc, 
gfc_rank_cst[n]); - tmp1 = gfc_conv_descriptor_lbound_get (res_desc, gfc_rank_cst[n]); - tmp = fold_build2_loc (input_location, MINUS_EXPR, - gfc_array_index_type, tmp, tmp1); - tmp1 = gfc_conv_descriptor_ubound_get (desc, gfc_rank_cst[n]); - tmp = fold_build2_loc (input_location, MINUS_EXPR, - gfc_array_index_type, tmp, tmp1); - tmp1 = gfc_conv_descriptor_ubound_get (res_desc, gfc_rank_cst[n]); - tmp = fold_build2_loc (input_location, PLUS_EXPR, - gfc_array_index_type, tmp, tmp1); + tmp = gfc_conv_descriptor_extent_get (desc, gfc_rank_cst[n]); + tmp1 = gfc_conv_descriptor_extent_get (res_desc, gfc_rank_cst[n]); tmp = fold_build2_loc (input_location, NE_EXPR, - boolean_type_node, tmp, - gfc_index_zero_node); + boolean_type_node, tmp, tmp1); tmp = gfc_evaluate_now (tmp, &se->post); zero_cond = fold_build2_loc (input_lo
[patch, Fortran] Fix PR 53824
Hello world,

this fixes an ICE with allocation of coarrays. Regression-tested.
OK for trunk? What about 4.7?

        Thomas

2012-07-15  Thomas König

        PR fortran/53824
        * resolve.c (resolve_allocate_deallocate): If both
        start indices are NULL, skip the test for equality.

2012-07-15  Thomas König

        PR fortran/53824
        * gfortran.dg/coarray_allocate_1.f90: New test.

Index: resolve.c
===
--- resolve.c (Revision 189478)
+++ resolve.c (Arbeitskopie)
@@ -7326,8 +7326,8 @@ resolve_allocate_deallocate (gfc_code *code, const
         }
     }
 
-  /* Check that an allocate-object appears only once in the statement.
-     FIXME: Checking derived types is disabled.  */
+  /* Check that an allocate-object appears only once in the statement.  */
+
   for (p = code->ext.alloc.list; p; p = p->next)
     {
       pe = p->expr;
@@ -7377,9 +7377,10 @@ resolve_allocate_deallocate (gfc_code *code, const
                     {
                       gfc_array_ref *par = &(pr->u.ar);
                       gfc_array_ref *qar = &(qr->u.ar);
-                      if (gfc_dep_compare_expr (par->start[0],
-                                                qar->start[0]) != 0)
-                          break;
+                      if ((par->start[0] != NULL || qar->start[0] != NULL)
+                          && gfc_dep_compare_expr (par->start[0],
+                                                   qar->start[0]) != 0)
+                        break;
                     }
                 }
               else

! { dg-do compile }
! { dg-options "-fcoarray=single" }
! PR 53824 - this used to ICE.
! Original test case by Vladimír Fuka
program Jac
  implicit none

  integer,parameter:: KND=KIND(1.0)

  type Domain
    real(KND),dimension(:,:,:),allocatable:: A,B
    integer :: n=64,niter=2,blockit=1000
    integer :: starti,endi
    integer :: startj,endj
    integer :: startk,endk
    integer,dimension(:),allocatable :: startsi,startsj,startsk
    integer,dimension(:),allocatable :: endsi,endsj,endsk
  end type

  type(Domain),allocatable :: D[:,:,:]
!   real(KND),codimension[*] :: sumA,sumB,diffAB
  integer i,j,k,ncom
  integer nims,nxims,nyims,nzims
  integer im,iim,jim,kim
  character(20):: ch

  nims = num_images()
  nxims = nint(nims**(1./3.))
  nyims = nint(nims**(1./3.))
  nzims = nims / (nxims*nyims)

  im = this_image()
  if (im==1) write(*,*) "n: [",nxims,nyims,nzims,"]"

  kim = (im-1) / (nxims*nyims) + 1
  jim = ((im-1) - (kim-1)*(nxims*nyims)) / nxims + 1
  iim = (im-1) - (kim-1)*(nxims*nyims) - (jim-1)*(nxims) + 1

  write (*,*) im,"[",iim,jim,kim,"]"

  allocate(D[nxims,nyims,*])

  ncom=command_argument_count()
  if (command_argument_count() >=2) then
    call get_command_argument(1,value=ch)
    read (ch,*) D%n
    call get_command_argument(2,value=ch)
    read (ch,*) D%niter
    call get_command_argument(3,value=ch)
    read (ch,*) D%blockit
  end if

  allocate(D%startsi(nxims))
  allocate(D%startsj(nyims))
  allocate(D%startsk(nzims))
  allocate(D%endsi(nxims))
  allocate(D%endsj(nyims))
  allocate(D%endsk(nzims))

  D%startsi(1) = 1
  do i=2,nxims
    D%startsi(i) = D%startsi(i-1) + D%n/nxims
  end do
  D%endsi(nxims) = D%n
  D%endsi(1:nxims-1) = D%startsi(2:nxims) - 1

  D%startsj(1) = 1
  do j=2,nyims
    D%startsj(j) = D%startsj(j-1) + D%n/nyims
  end do
  D%endsj(nyims) = D%n
  D%endsj(1:nyims-1) = D%startsj(2:nyims) - 1

  D%startsk(1) = 1
  do k=2,nzims
    D%startsk(k) = D%startsk(k-1) + D%n/nzims
  end do
  D%endsk(nzims) = D%n
  D%endsk(1:nzims-1) = D%startsk(2:nzims) - 1

  D%starti = D%startsi(iim)
  D%endi = D%endsi(iim)
  D%startj = D%startsj(jim)
  D%endj = D%endsj(jim)
  D%startk = D%startsk(kim)
  D%endk = D%endsk(kim)

  write(*,*) D%startsi,D%endsi
  write(*,*) D%startsj,D%endsj
  write(*,*) D%startsk,D%endsk

  !$hmpp JacKernel allocate, args[A,B].size={0:D%n+1,0:D%n+1,0:D%n+1}
  allocate(D%A(D%starti-1:D%endi+1,D%startj-1:D%endj+1,D%startk-1:D%endk+1),&
           D%B(D%starti-1:D%endi+1,D%startj-1:D%endj+1,D%startk-1:D%endk+1))
end program Jac
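The shape of the guard in the resolve.c hunk can be shown in isolation. The sketch below is an assumption-laden illustration, not the gfortran code: expressions are modelled as C strings, and `compare_expr_sketch` is a hypothetical helper that cannot give a meaningful answer when both operands are absent. It does not model how gfc_dep_compare_expr itself behaves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison helper: returns nonzero when the two
   expressions differ.  With a missing operand it simply reports
   "different", which is wrong when both are missing.  */
static int
compare_expr_sketch (const char *a, const char *b)
{
  if (a == NULL || b == NULL)
    return 1;
  return strcmp (a, b) != 0;
}

/* Guarded comparison in the style of the patch: when both start
   indices are absent the comparison is skipped, so two absent
   indices never count as a difference.  */
static int
starts_differ (const char *p_start, const char *q_start)
{
  return (p_start != NULL || q_start != NULL)
         && compare_expr_sketch (p_start, q_start) != 0;
}
```

The short-circuit evaluation of `&&` is what keeps the helper from being called at all in the both-absent case.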
Re: G++ namespace association extension
On Tue, 10 Jul 2012, Jonathan Wakely wrote:
>> Yes, but people should use inline namespaces instead; we should deprecate
>> this form and then remove it in 4.9.
>
> * doc/extend.texi (Namespace Association): Alter cautionary text.

I think this also should go into the GCC 4.8 release notes
(gcc-4.8/changes.html)?

Gerald
Re: [Fortran-dev][Patch] Some (ubound-lbound+1) -> extent cleanup
On 15/07/2012 13:24, Tobias Burnus wrote:
> This patch cleans up the source code and generated code ("dump") by
> changing (ubound-lbound+1) calculations to directly taking the "extent".
> Except for a faster -O0 performance and saving some cycles during code
> generation, the code should be the same.
>
A small, yet welcome simplification :-).

>
> The only real code change I did was to gfc_grow_array. I didn't
> understand the previous code, I think the current code makes more sense.
> I believe the old code did:
>
> desc->extent = desc.extent + extra;
> realloc (&desc->data, (desc.extent + 1)*element_size);
If you are talking about the old code, I think it does:

  desc.ubound[0] += extra;
  realloc (desc.data, (desc.ubound[0]+1)*element_size);

>
> while I think the latter should be "+ extra" and not "+ 1". From the
> callee, I couldn't deduce whether extra is always unity in practice, in
> any case it doesn't look like.
>
I think the old code is correct, under the assumption that desc.rank is
1, and desc.lbound[0] is 0, which is the case in the contexts where the
function is called (array constructors).

>
> For fcncall_realloc_result, I didn't check whether the new code
> effectively matches the old one, I just followed the comment which
> states: "Check that the shapes are the same between lhs and expression.".
>
I think the old code is:

  res_desc.ubound[n] - res_desc.lbound[n] - (desc.ubound[n] - desc.lbound[n]) != 0

while the new one is:

  res_desc.extent[n] != desc.extent[n]

>
> There are some more cases, but there the code wasn't as obvious. For
> instance "extent = ubound - lbound"; there I was unsure whether that's a
> bug (missing "+1") or valid.
>
>
> Build and regtested with no new failures.
> OK for the branch?
>
Yes, thanks.

Mikael
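The two equivalences discussed in this reply can be checked with a small C sketch. It assumes the usual definition extent = ubound - lbound + 1 and ignores any clamping of negative extents; `bounds_sketch` and the function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical bounds of one dimension.  */
typedef struct
{
  ptrdiff_t lbound;
  ptrdiff_t ubound;
} bounds_sketch;

/* Extent under the usual definition (no clamping at zero).  */
static ptrdiff_t
extent_of (bounds_sketch b)
{
  return b.ubound - b.lbound + 1;
}

/* Shape check in the style of the old code: compare ubound - lbound.  */
static int
shapes_differ_old (bounds_sketch res, bounds_sketch desc)
{
  return (res.ubound - res.lbound) - (desc.ubound - desc.lbound) != 0;
}

/* Shape check in the style of the new code: compare extents.  */
static int
shapes_differ_new (bounds_sketch res, bounds_sketch desc)
{
  return extent_of (res) != extent_of (desc);
}

/* Element count used for the reallocation when lbound is 0:
   ubound + 1 after growing by EXTRA.  */
static ptrdiff_t
grown_count_zero_based (bounds_sketch b, ptrdiff_t extra)
{
  b.ubound += extra;
  return b.ubound + 1;
}
```

The "+1" cancels in the shape comparison, so both checks agree, and with a lower bound of 0 the value ubound + 1 is exactly the grown extent, which matches the argument that the old gfc_grow_array size was correct under that assumption.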
[wwwdocs] Add note about C++11 ABI to gcc-4.7/changes.html
Added a caveat to the gcc-4.7/changes.html page about C++11 ABI
incompatibilities.

Committed to wwwdocs.

Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.120
diff -u -r1.120 changes.html
--- htdocs/gcc-4.7/changes.html 20 Jun 2012 23:34:49 - 1.120
+++ htdocs/gcc-4.7/changes.html 15 Jul 2012 15:04:29 -
@@ -97,6 +97,17 @@
 It is no longer possible to use the "l" constraint in MIPS16 asm
 statements.
+GCC versions 4.7.0 and 4.7.1 had changes to the C++ standard library
+which affected the ABI in C++11 mode: a data member was added to
+std::list changing its size and altering the definitions of
+some member functions, and std::pair's move constructor was
+non-trivial which altered the calling convention for functions with
+std::pair arguments or return types. The ABI incompatibilities
+have been fixed for GCC version 4.7.2 but as a result C++11 code compiled
+with GCC 4.7.0 or 4.7.1 may be incompatible with C++11 code compiled with
+different GCC versions and with C++98/C++03 code compiled with any version.
+
+
 More information on porting to GCC 4.7 from previous versions
 of GCC can be found in the
 <a href="http://gcc.gnu.org/gcc-4.7/porting_to.html">porting
Re: G++ namespace association extension
On 15 July 2012 12:26, Gerald Pfeifer wrote:
> On Tue, 10 Jul 2012, Jonathan Wakely wrote:
>>> Yes, but people should use inline namespaces instead; we should deprecate
>>> this form and then remove it in 4.9.
>>
>> * doc/extend.texi (Namespace Association): Alter cautionary text.
>
> I think this also should go into the GCC 4.8 release notes
> (gcc-4.8/changes.html)?

I can do that too. There's no gcc-4.8 dir yet; do I need to copy over
the various other files from the gcc-4.7 dir, or can I just create
changes.html and leave the RM to do the rest at the appropriate time?
Re: [PATCH][MIPS] NetLogic XLP scheduling
Chung-Lin Tang writes:
> This patch adds scheduling support for the NetLogic XLP, including a new
> pipeline description, and associated changes.
>
> Asides from the new xlp.md description file, there are also some sync
> primitive attribute modifications, for better scheduling of sync loops
> (Maxim should be able to better explain this).

Rather than add a "type" attribute to each sync loop, please just add:

    (not (eq_attr "sync_mem" "none"))
    (symbol_ref "syncloop")

to the default value of the "type" attribute. You'll probably need
to swap the order of the sync* attributes with the "type" attribute
in order for this to compile.

The patch is effectively changing the type of the sync loops from
"unknown" to "syncloop". That's certainly OK, but you'll need to
add "syncloop" to the "unknown" reservations of all other schedulers
(except for generic.md, where what you've done instead is fine).

It might be easier if you split out the addition of syncloop as
a separate patch.

> Other generic changes include a new "hilo" insn attribute, to mark which
> of HI/LO does a m[ft]hilo insn access.

The way other schedulers handle this is with things like:

(define_insn_reservation "ir_sb1_mfhi" 1
  (and (eq_attr "cpu" "sb1,sb1a")
       (and (eq_attr "type" "mfhilo")
            (not (match_operand 1 "lo_operand"))))
  "sb1_ex1")

which seems simpler. mfhilo and mthilo are required to read operand 1
and write to operand 0 (respectively) in order to support this kind
of construct.

That said, even the above is a hold-over from when we tried to allow
high registers to store independent values. These days we can be
a bit more precise, as with the patch below.

(As the comment says:

;; If a doubleword move uses these expensive instructions,
;; it is usually better to schedule them in the same way
;; as the singleword form, rather than as "multi".

I'm continuing to assume that mflo and mtlo are the best type choices
for unsplit double-register moves. That path should be very rarely used
outside of MIPS16 anyway -- just by sched1 if hi and lo are exposed
directly -- and no current scheduler tries to model a doubleword
hi/lo move separately from single-register ones. The information
is available via the dword_mode attribute if required.)

Tested on mips64-elf, and by making sure that there were no changes
in -O2 output for a recent set of cc1 .ii files. Applied.

I'm probably punishing you for being honest here, but the only
other thing is that you've listed NetLogic Microsystems Inc. as one
of the authors. I think that means they'll need to sign a copyright
assignment. Have they already done that?

Thanks,
Richard

gcc/
        * config/mips/mips.md (move_type): Replace mfhilo and mthilo with
        mflo and mtlo.
        (type): Split mfhilo into mfhi and mflo. Split mthilo into mthi
        and mtlo. Adjust move_type->type mapping.
        (may_clobber_hilo): Split mthilo into mthi and mtlo.
        (*movdi_32bit, *movdi_32bit_mips16, *movdi_64bit, *movdi_64bit_mips16)
        (*mov<mode>_internal, *mov<mode>_mips16, *movhi_internal)
        (*movhi_mips16, *movqi_internal, *movqi_mips16): Use mtlo and mflo
        instead of mthilo and mfhilo.
        (mfhi<GPR:mode>_<HILO:mode>): Use mfhi instead of mfhilo.
        (mthi<GPR:mode>_<HILO:mode>): Use mthi instead of mthilo.
        * config/mips/mips-dsp.md (mips_extr_w, mips_extr_r_w, mips_extr_rs_w)
        (mips_extr_s_h, mips_extp, mips_extpdp, mips_shilo, mips_mthlip):
        Use mflo instead of mfhilo.
        * config/mips/10000.md (r10k_arith): Split mthilo.
        (r10k_mfhi, r10k_mflo): Use mfhi and mflo directly.
        * config/mips/sb1.md (ir_sb1_mfhi, ir_sb1_mflo): Likewise.
        (ir_sb1_mthilo): Split mthilo into mthi and mtlo.
        * config/mips/20kc.md (r20kc_imthilo, r20kc_imfhilo): Split mthilo
        and mfhilo.
        * config/mips/24k.md (r24k_int_mfhilo, r24k_int_mthilo): Likewise.
        * config/mips/4130.md (vr4130_class, vr4130_mfhilo, vr4130_mthilo):
        Likewise.
        * config/mips/4k.md (r4k_int_mthilo, r4k_int_mfhilo): Likewise.
        * config/mips/5400.md (ir_vr54_hilo): Likewise.
        * config/mips/5500.md (ir_vr55_mthilo, ir_vr55_mfhilo): Likewise.
        * config/mips/5k.md (r5k_int_mthilo, r5k_int_mfhilo): Likewise.
        * config/mips/7000.md (rm7_mthilo, rm7_mfhilo): Likewise.
        * config/mips/74k.md (r74k_int_mfhilo, r74k_int_mthilo): Likewise.
        * config/mips/9000.md (rm9k_mfhilo, rm9k_mthilo): Likewise.
        * config/mips/generic.md (generic_hilo): Likewise.
        * config/mips/loongson2ef.md (ls2_alu): Likewise.
        * config/mips/loongson3a.md (ls3a_mfhilo): Likewise.
        * config/mips/octeon.md (octeon_imul_o1, octeon_imul_o2)
        (octeon_mfhilo_o1, octeon_mfhilo_o2): Likewise.
        * config/mips/sr71k.md (ir_sr70_hilo): Likewise.
        * config/mips/xlr.md (xlr_hilo): Likewise.

Index: gcc/config/mips/mips.md
=
[SH] Remove old mov peepholes
Hello,

The attached patch removes old peephole patterns that seem to be unused.
Tested with 'make all'.
CSiBE result-size (-m4-single -ml -O2 -mpretend-cmove) does not show any
changes.

OK?

Cheers,
Oleg

ChangeLog:

        * config/sh/sh.md: Delete mov related define_peephole patterns.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md (revision 186311)
+++ gcc/config/sh/sh.md (working copy)
@@ -11726,73 +11726,9 @@
         (mem:HI (plus:SI (match_dup 1) (match_dup 2))))]
   "")
 
-;; These convert sequences such as `mov #k,r0; add r15,r0; mov.l @r0,rn'
-;; to `mov #k,r0; mov.l @(r0,r15),rn'.  These sequences are generated by
-;; reload when the constant is too large for a reg+offset address.
-
-;; ??? We would get much better code if this was done in reload.  This would
-;; require modifying find_reloads_address to recognize that if the constant
-;; is out-of-range for an immediate add, then we get better code by reloading
-;; the constant into a register than by reloading the sum into a register,
-;; since the former is one instruction shorter if the address does not need
-;; to be offsettable.  Unfortunately this does not work, because there is
-;; only one register, r0, that can be used as an index register.  This register
-;; is also the function return value register.  So, if we try to force reload
-;; to use double-reg addresses, then we end up with some instructions that
-;; need to use r0 twice.  The only way to fix this is to change the calling
-;; convention so that r0 is not used to return values.
- (define_peephole [(set (match_operand:SI 0 "register_operand" "=r") (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (mem:SI (match_dup 0)) - (match_operand:SI 2 "general_movsrc_operand" ""))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.l %2,@(%0,%1)") - -(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (match_operand:SI 2 "general_movdst_operand" "") - (mem:SI (match_dup 0)))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.l @(%0,%1),%2") - -(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (mem:HI (match_dup 0)) - (match_operand:HI 2 "general_movsrc_operand" ""))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.w %2,@(%0,%1)") - -(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (match_operand:HI 2 "general_movdst_operand" "") - (mem:HI (match_dup 0)))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.w @(%0,%1),%2") - -(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (mem:QI (match_dup 0)) - (match_operand:QI 2 "general_movsrc_operand" ""))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.b %2,@(%0,%1)") - -(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) - (set (match_operand:QI 2 "general_movdst_operand" "") - (mem:QI (match_dup 0)))] - "TARGET_SH1 && REGNO (operands[0]) == 0 && reg_unused_after (operands[0], insn)" - "mov.b @(%0,%1),%2") - 
-(define_peephole - [(set (match_operand:SI 0 "register_operand" "=r") - (plus:SI (match_dup 0) (match_operand:SI 1 "register_operand" "r"))) (set (mem:SF (match_dup 0)) (match_operand:SF 2 "general_movsrc_operand" ""))] "TARGET_SH1 && REGNO (operands[0]) == 0
[SH] Reorg some CONST_OK_ macros
Hello, This patch replaces usages of CONST_OK_FOR_I06 with satisfies_constraint_I06 and moves the CONST_OK_FOR_I10 macro to sh.c. Tested with 'make all-gcc'. OK? Cheers, Oleg ChangeLog: * config/sh/sh.h (CONST_OK_FOR_I06): Delete. (CONST_OK_FOR_I10): Move macro to ... * config/sh/sh.c: ... here. (sh_legitimate_index_p): Use satisfies_constraint_I06 instead of CONST_OK_FOR_I06. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c (revision 189427) +++ gcc/config/sh/sh.c (working copy) @@ -63,6 +63,9 @@ #define LSW (TARGET_LITTLE_ENDIAN ? 0 : 1) /* These are some macros to abstract register modes. */ +#define CONST_OK_FOR_I10(VALUE) (((HOST_WIDE_INT)(VALUE)) >= -512 \ + && ((HOST_WIDE_INT)(VALUE)) <= 511) + #define CONST_OK_FOR_ADD(size) \ (TARGET_SHMEDIA ? CONST_OK_FOR_I10 (size) : CONST_OK_FOR_I08 (size)) #define GEN_MOV (*(TARGET_SHMEDIA64 ? gen_movdi : gen_movsi)) @@ -9776,7 +9779,7 @@ /* Check if this is the address of an unaligned load / store. */ if (mode == VOIDmode) - return CONST_OK_FOR_I06 (INTVAL (op)); + return satisfies_constraint_I06 (op); size = GET_MODE_SIZE (mode); return (!(INTVAL (op) & (size - 1)) Index: gcc/config/sh/sh.h === --- gcc/config/sh/sh.h (revision 189427) +++ gcc/config/sh/sh.h (working copy) @@ -1213,12 +1213,8 @@ /* Defines for sh.md and constraints.md. */ -#define CONST_OK_FOR_I06(VALUE) (((HOST_WIDE_INT)(VALUE)) >= -32 \ - && ((HOST_WIDE_INT)(VALUE)) <= 31) #define CONST_OK_FOR_I08(VALUE) (((HOST_WIDE_INT)(VALUE))>= -128 \ && ((HOST_WIDE_INT)(VALUE)) <= 127) -#define CONST_OK_FOR_I10(VALUE) (((HOST_WIDE_INT)(VALUE)) >= -512 \ - && ((HOST_WIDE_INT)(VALUE)) <= 511) #define CONST_OK_FOR_I16(VALUE) (((HOST_WIDE_INT)(VALUE)) >= -32768 \ && ((HOST_WIDE_INT)(VALUE)) <= 32767)
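For readers unfamiliar with these macros, the range tests they perform can be sketched in standalone C. This is only an illustration under stated assumptions: `fits_signed_bits` is a hypothetical helper, not a GCC function, and `long long` stands in for `HOST_WIDE_INT`. The bounds it produces for 6 and 10 bits match the -32..31 and -512..511 limits in the macros above.

```c
#include <stdbool.h>

/* Hypothetical helper (not part of GCC): return true if VALUE fits in a
   signed two's-complement immediate field of BITS bits.
   Assumes 1 <= BITS < 63.  */
static bool
fits_signed_bits (long long value, int bits)
{
  long long lo = -(1LL << (bits - 1));      /* e.g. -32 for 6 bits */
  long long hi = (1LL << (bits - 1)) - 1;   /* e.g.  31 for 6 bits */
  return value >= lo && value <= hi;
}
```

With this helper, `fits_signed_bits (x, 6)` mirrors the old CONST_OK_FOR_I06 test and `fits_signed_bits (x, 10)` mirrors CONST_OK_FOR_I10.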
Re: [patch, Fortran] Fix PR 53824
Thomas Koenig wrote: this fixes an ICE with allocation of coarrays. Regression-tested. OK for trunk? What about 4.7? OK. Thanks for the patch. Regarding 4.7, I don't have a strong opinion. Given that it is a simple patch and given that (single-image) coarrays work rather well in 4.7, maybe one should. Tobias 2012-07-15 Thomas König PR fortran/53824 * resolve.c (resolve_allocate_deallocate): If both start indices are NULL, skip the test for equality. 2012-07-15 Thomas König PR fortran/53824 * gfortran.dg/coarray_allocate_1.f90: New test.
Re: PATCH: PR target/53383: Allow -mpreferred-stack-boundary=3 on x86-64
On Fri, 22 Jun 2012, H.J. Lu wrote: > I am not sure if news.html is the best place for this. news.html definitely is not a good place for this, cf. the comment in that file. ;-) > How about putting it in gcc-4.8/changes.html? Yes, that fits. > Does it look OK? Index: ./gcc-4.8/changes.html === +Allow -mpreferred-stack-boundary=3 for the x86-64 +architecture with SSE extensions disabled. Since x86-64 ABI require the...ABI requires +used in controlled environment where stack space is important limitation. is an important limitation +long double and __int128), leading to wrong results. You must build all ... And the header for the supersection was missing. All fixed with the patch below which I committed. Index: gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.4 diff -u -3 -p -r1.4 changes.html --- gcc-4.8/changes.html2 Jul 2012 11:35:28 - 1.4 +++ gcc-4.8/changes.html15 Jul 2012 21:23:19 - @@ -63,22 +63,22 @@ more information about requirements to b Java (GCJ) --> - IA-32/x86-64 Allow -mpreferred-stack-boundary=3 for the x86-64 -architecture with SSE extensions disabled. Since x86-64 ABI require -16 byte stack alignment, this is ABI incompatible and intended to be -used in controlled environment where stack space is important limitation. +architecture with SSE extensions disabled. Since the x86-64 ABI +requires 16 byte stack alignment, this is ABI incompatible and +intended to be used in controlled environments where stack space +is an important limitation. This option will lead to wrong code when functions compiled with 16 byte stack alignment (such as functions from a standard library) are called with misaligned stack. In this case, SSE instructions may lead to misaligned memory access traps. In addition, variable arguments will be handled incorrectly for 16 byte aligned objects (including x87 -long double and __int128), leading to wrong results. 
You must build all +long double and __int128), leading to +wrong results. You must build all modules with -mpreferred-stack-boundary=3, including any libraries. This includes the system libraries and startup modules.
[PATCH] Enable vectorizer cost model by default at -O3
The auto-vectorizer is overly aggressive when not constrained by the vectorizer cost model. Although the cost model is by no means perfect, it does a reasonable job of avoiding many poor vectorization decisions. Since the auto-vectorizer is enabled by default at -O3 and above, we should also enable the vectorizer cost model by default at -O3 and above. Bootstrapped and tested on powerpc64-unknown-linux-gnu with no new regressions. Ok for trunk? Thanks, Bill 2012-07-15 Bill Schmidt * opts.c (default_option): Add -fvect-cost-model to default options at -O3 and above. Index: gcc/opts.c === --- gcc/opts.c (revision 189481) +++ gcc/opts.c (working copy) @@ -501,6 +501,7 @@ static const struct default_options default_option { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_fgcse_after_reload, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_ftree_vectorize, NULL, 1 }, +{ OPT_LEVELS_3_PLUS, OPT_fvect_cost_model, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_fipa_cp_clone, NULL, 1 }, { OPT_LEVELS_3_PLUS, OPT_ftree_partial_pre, NULL, 1 },
[wwwdocs] SH - add 'b' target characteristic
Hello, If I'm not mistaken, the SH target does not use the '"* ..."' notation for output template code. The patch below updates the table in backends.html to reflect the current status. Cheers, Oleg Index: htdocs/backends.html === RCS file: /cvs/gcc/wwwdocs/htdocs/backends.html,v retrieving revision 1.45 diff -u -r1.45 backends.html --- htdocs/backends.html23 Feb 2012 13:24:36 - 1.45 +++ htdocs/backends.html15 Jul 2012 22:52:00 - @@ -96,7 +96,7 @@ pdp11|L ICqrcp e rs6000 | Q Cqr da s390 | ? Qqr p g bda e -sh | Q CB qr da +sh | Q CB qr bda sparc| Q CB qr pda spu | ? Q *C p g bd stormy16 | ???L FIC D l p m a
Re: G++ namespace association extension
On Sun, 15 Jul 2012, Jonathan Wakely wrote:
>> I think this also should go into the GCC 4.8 release notes
>> (gcc-4.8/changes.html)?
> I can do that too. There's no gcc-4.8 dir yet, do I need to copy over
> the various other files from the gcc-4.7 dir or can I just create
> changes.html and leave the RM to do the rest at the appropriate time?

If you run `cvs up -PAd` it should magically appear. :-)

Gerald
Re: [wwwdocs] Buildstat update for 4.7
On Sun, 1 Jul 2012, Tom G. Christensen wrote: > Latest results for 4.7.x Thanks, Tom! Gerald
Re: [wwwdocs] Document ARM/embedded-x_y-branch family
Hi Terry,

On Mon, 9 Jul 2012, Terry Guo wrote:
> As it becomes our long term goal to deliver arm-none-eabi tool chain
> based on GCC 4.6/4.7/4.8 and future branches, I am going to use the
> following pattern to document this branch family. Is it ok to commit?

Yep, this looks good (and sorry for somehow missing this at first).

The only question I'd have, that may make sense addressing in the description, is "What kind of changes do you expect to take there, but not the corresponding release branches? And why?"

Gerald
Re: [patch] PR web/53919 - Add note to install.texi
Hi Jonathan,

On Sat, 14 Jul 2012, Jonathan Wakely wrote:
> Attached this time, here's the original mail again:
>
> PR c++/53919
> * doc/install.texi (Installing GCC): Refer to instructions for
> released versions. Fix hyphenation.
>
> Whether or not we want the release-specific installation instructions
> online, I don't think it hurts to point out that the docs at
> http://gcc.gnu.org/install/ refer to the SVN trunk.

This is good, with one caveat. Can we please not refer to SVN? ;-)

This is one of those lessons learned, where just "current development sources" or similar says the same, de facto, and makes it easier to change things. (You'd be surprised how long we kept finding references to CVS.)

 The latest version of this document is always available at
 @uref{http://gcc.gnu.org/install/,,http://gcc.gnu.org/install/}.
+The latest version refers to the SVN development sources, instructions for
+specific released versions are included with the sources.

And here, instead of repeating "The latest version", how about just "It refers..."?

Thanks!
Gerald
Re: [SH] Reorg some CONST_OK_ macros
Oleg Endo wrote: > This patch replaces usages of CONST_OK_FOR_I06 with > satisfies_constraint_I06 and moves the CONST_OK_FOR_I10 macro to sh.c. > Tested with 'make all-gcc'. > > OK? OK. Regards, kaz
Re: [SH] Remove old mov peepholes
Oleg Endo wrote: > The attached patch removes old peephole patterns that seem to be unused. > Tested with 'make all'. CSiBE result-size (-m4-single -ml -O2 > -mpretend-cmove) does not show any changes. > > OK? OK. Regards, kaz
Re: [wwwdocs] SH - add 'b' target characteristic
Oleg Endo wrote: > If I'm not mistaken, the SH target does not use the '"* ..."' notation > for output template code. The patch below updates the table in > backends.html to reflect the current status. Looks obvious and OK to me. Regards, kaz
Re: [RFA/ARM 1/3] Add VFP support for VFMA and friends
On 5 July 2012 21:13, Matthew Gretton-Dann wrote:
> On 26/06/12 14:44, Richard Earnshaw wrote:
>>
>> On 25/06/12 15:59, Matthew Gretton-Dann wrote:
>>>
>>> All,
>>>
>>> This patch adds support to the ARM backend for generating floating-point
>>> fused multiply-accumulate.
>>>
>>> OK?
>>>
>>> gcc/ChangeLog:
>>>
>>> 2012-06-25 Matthew Gretton-Dann
>>>
>>> * config/arm/iterators.md (SDF): New mode iterator.
>>> (V_if_elem): Add support for SF and DF modes.
>>> (V_reg): Likewise.
>>> (F_w_constraint): New mode iterator attribute.
>>> (F_r_constraint): Likewise.
>>> (F_fma_type): Likewise.
>>> (F_target): Likewise.
>>> config/arm/vfp.md (fma4): New pattern.
>>> (*fmsub4): Likewise.
>>> (*fmnsub4): Likewise.
>>> (*fmnadd4): Likewise.
>>>
>>
>> F_target as an attribute name doesn't tell me anything useful. I
>> suggest F_maybe_not_df.
>>
>>> + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA "
>>
>>
>> This should be written as
>>
>> "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA && "
>>
>> Then the attribute should expand
>>
>>(define_mode_attr F_maybe_not_df [(SF "1") (DF "TARGET_VFP_DOUBLE")])
>>
>> As a style nit, I would also suggest using the iterator name when it
>> appears in the pattern name, even though it is redundant. This avoids
>> potential ambiguities when there are multiple iterators operating on
>> different expansions. That is, instead of:
>>
>> (define_insn "fma4"
>>
>> use:
>>
>> (define_insn "fma4"
>>
>> OK with those changes.
>>
>> R.
>>
>
> Now checked in with some changes (see attached patch for what was committed)
> - changes approved off list.

Hi Matt. Your new patterns require TARGET_HARD_FLOAT but the testsuite doesn't, giving failures when building for soft float[1] or softfp[2]. Which should it be?

-- Michael

[1] http://builds.linaro.org/toolchain/gcc-4.8~svn189401/logs/armv7l-natty-cbuild344-tcpanda06-armv5r2/gcc-testsuite.txt
[2] http://builds.linaro.org/toolchain/gcc-4.8~svn189401/logs/armv7l-natty-cbuild344-tcpanda02-cortexa9r1/gcc-testsuite.txt
CRIS atomics revisited 0/4: summary
These were spotted while debugging usage of atomics within glibc. The kinds of changes are microoptimizations, nanooptimizations, a buglet and a major issue. Micro: the load-store-conditional sequence for compare-and-swap I originally committed was an earlier version, which was later improved. Nanooptimizations: choosing better-fitting operands for the atomic operator insn. Buglet: a post-increment could have sneaked into the (non-atomic) arithmetic operator operand; better make it nonmemory_operand altogether. I also threw in use of the now generic need_atomic_barrier_p, let's call that a microoptimization. The major issue, giving up on alignment of atomic data by default, is last. Tested together and some separately, no regressions for cris-elf nor crisv32-elf. Committed separately. brgds, H-P
CRIS atomics revisited 1/4: use need_atomic_barrier_p.
Use the new need_atomic_barrier_p. gcc: * config/cris/sync.md ("atomic_fetch_") ("atomic_compare_and_swap"): Gate expand_mem_thread_fence calls on result of call to need_atomic_barrier_p. Index: config/cris/sync.md === --- config/cris/sync.md (revision 189499) +++ config/cris/sync.md (working copy) @@ -93,11 +93,15 @@ (define_expand "atomic_fetch_mode != QImode && TARGET_TRAP_UNALIGNED_ATOMIC) cris_emit_trap_for_misalignment (operands[1]); - expand_mem_thread_fence (mmodel); + if (need_atomic_barrier_p (mmodel, true)) +expand_mem_thread_fence (mmodel); + emit_insn (gen_cris_atomic_fetch__1 (operands[0], operands[1], operands[2])); - expand_mem_thread_fence (mmodel); + if (need_atomic_barrier_p (mmodel, false)) +expand_mem_thread_fence (mmodel); + DONE; }) @@ -196,13 +200,17 @@ (define_expand "atomic_compare_and_swap< if (mode != QImode && TARGET_TRAP_UNALIGNED_ATOMIC) cris_emit_trap_for_misalignment (operands[2]); - expand_mem_thread_fence (mmodel); + if (need_atomic_barrier_p (mmodel, true)) +expand_mem_thread_fence (mmodel); + emit_insn (gen_cris_atomic_compare_and_swap_1 (operands[0], operands[1], operands[2], operands[3], operands[4])); - expand_mem_thread_fence (mmodel); + if (need_atomic_barrier_p (mmodel, false)) +expand_mem_thread_fence (mmodel); + DONE; }) brgds, H-P
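The gating in this patch can be illustrated with a small standalone model. This is a sketch, assuming need_atomic_barrier_p follows the usual acquire/release rules (a fence before the operation for release semantics, a fence after it for acquire semantics, both for the stronger models, none for relaxed). The `enum model` and `barrier_needed` names are made up for the example and are not GCC's own.

```c
#include <stdbool.h>

/* Illustrative stand-in for GCC's enum memmodel.  */
enum model { RELAXED, CONSUME, ACQUIRE, RELEASE, ACQ_REL, SEQ_CST };

/* Sketch of the decision made before each fence.  PRE is true for the
   fence emitted before the atomic operation and false for the fence
   emitted after it.  */
static bool
barrier_needed (enum model m, bool pre)
{
  switch (m)
    {
    case RELAXED:
    case CONSUME:
      return false;   /* No ordering required on either side.  */
    case RELEASE:
      return pre;     /* Earlier accesses must not sink below.  */
    case ACQUIRE:
      return !pre;    /* Later accesses must not hoist above.  */
    case ACQ_REL:
    case SEQ_CST:
      return true;    /* Fence on both sides.  */
    }
  return true;        /* Unknown model: be conservative.  */
}
```

Under this model a relaxed atomic operation gets no fences at all, which is where the saving over the unconditional expand_mem_thread_fence calls comes from.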
CRIS atomics revisited 2/4: don't allow a memory operand (with possible side-effects)
Buglet in "atomic_compare_and_swap", allowing (in theory) a volatile or post-increment memory operand. Simplest and safest fixed by excluding all memory operands. gcc: * config/cris/sync.md ("atomic_compare_and_swap"): Change predicate to nonmemory_operand for operand 3. Add FIXME. ("cris_atomic_compare_and_swap_1"): Change predicates and constraints for operand 3 to exclude memory. Index: config/cris/sync.md === --- config/cris/sync.md (revision 189500) +++ config/cris/sync.md (working copy) @@ -184,11 +184,12 @@ (define_insn "cris_atomic_fetch_ over this, but having both would be ;; redundant. +;; FIXME: handle memory without side-effects for operand[3]. (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "register_operand") (match_operand:BWD 1 "register_operand") (match_operand:BWD 2 "memory_operand") - (match_operand:BWD 3 "general_operand") + (match_operand:BWD 3 "nonmemory_operand") (match_operand:BWD 4 "register_operand") (match_operand 5) (match_operand 6) @@ -218,7 +219,7 @@ (define_insn "cris_atomic_compare_and_sw [(set (match_operand:SI 0 "register_operand" "=&r") (unspec_volatile:SI [(match_operand:BWD 2 "memory_operand" "+Q") - (match_operand:BWD 3 "general_operand" "g")] + (match_operand:BWD 3 "nonmemory_operand" "ri")] CRIS_UNSPEC_ATOMIC_SWAP_BOOL)) (set (match_operand:BWD 1 "register_operand" "=&r") (match_dup 2)) (set (match_dup 2) brgds, H-P
CRIS atomics revisited 3/4: pattern improvements
Microoptimizations for the atomic patterns themselves. Constant operands are so common that it seems wasteful not to handle the most common cases and avoid wasting a register. gcc/testsuite: * gcc.target/cris/20011127-1.c: Adjust to %P being a valid register operand output modifier. gcc: * config/cris/cris.c (cris_print_operand) : New cases. * config/cris/sync.md (atomic_op_op_cnstr): New code_attr. (atomic_op_op_pred): Ditto. (atomic_op_mnem_pre_op2): Renamed from atomic_op_mnem_pre; to reflect the change to include %2 in expansion. All callers changed. (qm3): New mode_attr. ("atomic_fetch_"): Use as predicate for operand 2. ("cris_atomic_fetch__1"): Update FIXME. Use "" "" for predicate and constraint for operand 2. ("atomic_compare_and_swap"): Add FIXME. Change predicate to nonmemory_operand for operand 3. ("cris_atomic_compare_and_swap_1"): Change operand 3 to exclude memory. Improve emitted sync code for v10 and v32. Use instead of for size designator for cmp. Index: config/cris/cris.c === --- config/cris/cris.c (revision 189499) +++ config/cris/cris.c (working copy) @@ -981,6 +981,53 @@ cris_print_operand (FILE *file, rtx x, i fprintf (file, INTVAL (operand) < 0 ? "adds.w" : "addq"); return; +case 'P': + /* For const_int operands, print the additive mnemonic and the +modified operand (byte-sized operands don't save anything): + N=MIN_INT..-65536: add.d N + -65535..-64: subu.w -N + -63..-1: subq -N + 0..63: addq N + 64..65535: addu.w N + 65536..MAX_INT: add.d N. +(Emitted mnemonics are capitalized to simplify testing.) +For anything else (N.B: only register is valid), print "add.d". */ + if (REG_P (operand)) + { + fprintf (file, "Add.d "); + + /* Deal with printing the operand by dropping through to the +normal path. 
*/ + break; + } + else + { + int val; + gcc_assert (CONST_INT_P (operand)); + + val = INTVAL (operand); + if (!IN_RANGE (val, -65535, 65535)) + fprintf (file, "Add.d %d", val); + else if (val <= -64) + fprintf (file, "Subu.w %d", -val); + else if (val <= -1) + fprintf (file, "Subq %d", -val); + else if (val <= 63) + fprintf (file, "Addq %d", val); + else if (val <= 65535) + fprintf (file, "Addu.w %d", val); + return; + } + break; + +case 'q': + /* If the operand is an integer -31..31, print "q" else ".d". */ + if (CONST_INT_P (operand) && IN_RANGE (INTVAL (operand), -31, 31)) + fprintf (file, "q"); + else + fprintf (file, ".d"); + return; + case 'd': /* If this is a GOT symbol, force it to be emitted as :GOT and :GOTPLT regardless of -fpic (i.e. not as :GOT16, :GOTPLT16). Index: config/cris/sync.md === --- config/cris/sync.md (revision 189501) +++ config/cris/sync.md (working copy) @@ -73,17 +73,32 @@ (define_code_iterator atomic_op [plus mi (define_code_attr atomic_op_name [(plus "add") (minus "sub") (and "and") (ior "or") (xor "xor") (mult "nand")]) +;; The operator nonatomic-operand can be memory, constant or register +;; for all but xor. We can't use memory or addressing modes with +;; side-effects though, so just use registers and literal constants. +(define_code_attr atomic_op_op_cnstr + [(plus "ri") (minus "ri") (and "ri") (ior "ri") (xor "r") (mult "ri")]) + +(define_code_attr atomic_op_op_pred + [(plus "nonmemory_operand") (minus "nonmemory_operand") + (and "nonmemory_operand") (ior "nonmemory_operand") + (xor "register_operand") (mult "nonmemory_operand")]) + ;; Pairs of these are used to insert the "not" after the "and" for nand. -(define_code_attr atomic_op_mnem_pre ;; Upper-case only to sinplify testing. - [(plus "Add.d") (minus "Sub.d") (and "And.d") (ior "Or.d") (xor "Xor") - (mult "aNd.d")]) +(define_code_attr atomic_op_mnem_pre_op2 ;; Upper-case only to simplify testing. 
+ [(plus "%P2") (minus "Sub.d %2") (and "And%q2 %2") (ior "Or%q2 %2") (xor "Xor %2") + (mult "aNd%q2 %2")]) + (define_code_attr atomic_op_mnem_post_op3 [(plus "") (minus "") (and "") (ior "") (xor "") (mult "not %3\;")]) +;; For SImode, emit "q" for operands -31..31. +(define_mode_attr qm3 [(SI "%q3") (HI ".w") (QI ".b")]) + (define_expand "atomic_fetch_" [(match_operand:BWD 0 "register_operand") (match_operand:BWD 1 "memory_operand") - (match_operand:BWD 2 "register_operand") + (match_operand:BWD 2 "") (match_operand 3) (atomic_op:BWD (match_dup 0) (match_dup 1))] "" @@ -109,8 +124,9 @@ (define_insn "cris_atomic_fetch_" "")))
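The const_int branch of the new 'P' case can be tried out in isolation with the sketch below. It follows the range table in the comment of the patch; `format_add_const` is a hypothetical name, and writing into a caller-supplied buffer (instead of the FILE * that cris_print_operand uses) is a choice made only to keep the example easy to test.

```c
#include <stdio.h>

/* Hypothetical standalone version of the const_int branch of the 'P'
   operand modifier: write the additive mnemonic and the adjusted
   operand for VAL into BUF (capitalized as in the patch).  */
static void
format_add_const (char *buf, size_t size, int val)
{
  if (val < -65535 || val > 65535)
    snprintf (buf, size, "Add.d %d", val);    /* Full 32-bit operand.  */
  else if (val <= -64)
    snprintf (buf, size, "Subu.w %d", -val);  /* -65535..-64  */
  else if (val <= -1)
    snprintf (buf, size, "Subq %d", -val);    /* -63..-1  */
  else if (val <= 63)
    snprintf (buf, size, "Addq %d", val);     /* 0..63  */
  else
    snprintf (buf, size, "Addu.w %d", val);   /* 64..65535  */
}
```

For instance, a value of -5 yields "Subq 5" and a value of 1000 yields "Addu.w 1000", so small constants use the short quick forms and avoid tying up a register.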
CRIS atomics revisited 4/4: give up on alignment of atomic data, RFC for is_lock_free hook
Well, give up by default that is, and fix it up in a helper function in glibc to hold a global byte-sized atomic lock for the duration. (Sorry!)

Yes, this means that fold_builtin_atomic_always_lock_free is wrong. It knows about alignment in general but doesn't handle the case where the default alignment of the underlying type is too small for atomic accesses, and should probably be augmented by a target hook; alternatively, the allow_libcall argument in the call to can_compare_and_swap_p could be changed to false. I guess I should open a PR for this and add a test-case. Later.

Too many library API writers don't cater for the possibility that atomic (lockless) data may need to have certain properties that may not be matched by the basic underlying data type, specifically alignment, and fixing the failing instances by hand is...challenging. About half the cases have the atomic data defined in the proximity of the atomic operations, and are easily locally fixable. The other half are of increasing complexity; they may have the data defined elsewhere, where the need for atomicity is surprising and fixing it would be a kludge. (But with a proper API it could be easily handled, e.g. if a data-type defined specifically for the purpose was used; one different from the underlying type or other common derived type used in the library.)

So, we'll change things for cris*-linux. By default, call a helper function. Users can change the default at the caller site where atomic alignment is known good or where there is interest in fixing it up when failure is seen; executing a trap insn was the old default.

Regarding changing fold_builtin_atomic_always_lock_free or adding a hook, I posit that a better default is to assume atomic data has to be naturally aligned on *all* existing GCC targets to accomplish lockless operations, at least for those that don't just punt to a system call, so maybe fold_builtin_atomic_always_lock_free should be changed to check that, by default. Now, it just assumes that the default type-alignment is ok and that only a smaller alignment is not always atomic. People with counter-examples are asked to please explain how the counter-example handles data straddling a page boundary. Yes, it can be done, but how does the kernel-equivalent accomplish atomicity; are the pages locked while the instruction (assumed to cause an exception) is emulated, or is kernel re-entrance impossible or what?

The default remains the same for non-*-linux-* cris-* and crisv32-* subtargets, since the code compiled for those targets is expected to have a different focus, one where fixing non-aligned data definitions is feasible and desirable.

I deliberately make it optional and use weasel-wording whether the library functions are actually called or an atomic insn sequence emitted when suitable; when GCC knows the alignment of the data (for example, for local static data or through deliberate attributes) it should be allowed to emit the atomic instruction sequence even without alignment checks. Right now (or maybe it was just the 4.7 branch), GCC's handling of alignment is so poor that the emitted alignment checks (those that conditionally execute a trap insn) aren't eliminated for atomic data with explicit large-enough attribute declarations. From what I (with limited tree-foo) could see, IIRC basically everything about alignment for the specific data is discarded and the underlying default type alignment is reported. (Right, a PR is in order; I know I've entered a PR for the related __alignof__.)

gcc:
* config/cris/cris.c (cris_init_libfuncs): Handle initialization of library functions for basic atomic compare-and-swap.
* config/cris/cris.h (TARGET_ATOMICS_MAY_CALL_LIBFUNCS): New macro.
* config/cris/cris.opt (munaligned-atomic-may-use-library): New option.
* config/cris/sync.md ("atomic_fetch_") ("cris_atomic_fetch__1") ("atomic_compare_and_swap") ("cris_atomic_compare_and_swap_1"): Make conditional on TARGET_ATOMICS_MAY_CALL_LIBFUNCS for sizes larger than byte. gcc/testsuite: * gcc.target/cris/sync-2i.c, gcc.target/cris/sync-2s.c, gcc.target/cris/sync-3i.c, gcc.target/cris/sync-3s.c, gcc.target/cris/sync-4i.c, gcc.target/cris/sync-4s.c, gcc.target/cris/sync-1-v10.c, gcc.target/cris/sync-1-v32.c: For cris*-*-linux*, also pass -mno-unaligned-atomic-may-use-library. * gcc.target/cris/sync-xchg-1.c: New test. diff --git gcc/config/cris/cris.c gcc/config/cris/cris.c index 22b254f..e4c11fd 100644 --- gcc/config/cris/cris.c +++ gcc/config/cris/cris.c @@ -3130,6 +3176,16 @@ cris_init_libfuncs (void) set_optab_libfunc (udiv_optab, SImode, "__Udiv"); set_optab_libfunc (smod_optab, SImode, "__Mod"); set_optab_libfunc (umod_optab, SImode, "__Umod"); + + /* Atomic data being unaligned is unfortunately a reality. + Deal with it. */ + if (TARGET_ATOMICS_MAY_CALL_LIBFUNCS) +
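The fallback described in this message, a helper that holds a global byte-sized lock while it operates on possibly misaligned data, might look roughly like the sketch below. The actual glibc helper is not shown in this thread, so everything here is an assumption: the name `cas32_maybe_unaligned`, its signature, and the use of C11 `atomic_flag` as the lock are chosen only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* A single global lock; atomic_flag is the smallest lock-free type that
   C11 guarantees.  */
static atomic_flag global_lock = ATOMIC_FLAG_INIT;

/* Hypothetical helper: compare-and-swap on a 32-bit object that may be
   misaligned.  Stores DESIRED and returns true if the object equalled
   EXPECTED, otherwise leaves it alone and returns false.  */
static bool
cas32_maybe_unaligned (void *ptr, uint32_t expected, uint32_t desired)
{
  uint32_t cur;
  bool ok;

  /* Spin until the lock is acquired.  */
  while (atomic_flag_test_and_set_explicit (&global_lock,
                                            memory_order_acquire))
    ;

  /* memcpy avoids alignment assumptions about PTR.  */
  memcpy (&cur, ptr, sizeof cur);
  ok = (cur == expected);
  if (ok)
    memcpy (ptr, &desired, sizeof desired);

  atomic_flag_clear_explicit (&global_lock, memory_order_release);
  return ok;
}
```

Note that a scheme like this is only atomic with respect to other accesses that go through the same lock, which is presumably why the library call is reserved for sizes larger than a byte and why known-aligned data can still use the inline instruction sequence.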
Fixing gcc.c-torture/compile/pr44707.c for CRIS v32 1/2.
Buglet in cris_preferred_reload_class, incidental, apparently without effect at least regarding failing test-cases. A class disjunct from the input was returned as "preferred". It could arguably be gcc_asserted as a sanity-check by the caller that the returned class is a subset of the original class. ...and I guess I'll add such a gcc_assert *inside* cris_preferred_reload_class. Later. No regressions, cris-elf and crisv32-elf. Committed. gcc: * config/cris/cris.c (cris_preferred_reload_class): Don't return GENERAL_REGS as preferred to MOF_SRP_REGS. Index: gcc/config/cris/cris.c === --- gcc/config/cris/cris.c (revision 189470) +++ gcc/config/cris/cris.c (working copy) @@ -1503,6 +1550,7 @@ cris_preferred_reload_class (rtx x ATTRI { if (rclass != ACR_REGS && rclass != MOF_REGS + && rclass != MOF_SRP_REGS && rclass != SRP_REGS && rclass != CC0_REGS && rclass != SPECIAL_REGS) brgds, H-P
Fixing gcc.c-torture/compile/pr44707.c for CRIS v32 2/2: RFC: CONSTANT_ADDRESS_P and its default are evil!
I think CONSTANT_ADDRESS_P can and should be eliminated, replaced by something like CONSTANT_P (x) && targetm.legitimate_address_p (QImode, x, false) (or QImode replaced by the known used mode) in the code currently calling it. It should, because the default definition is redundant and evil; easy to miss for targets where (mem (const x)) is not valid for any arbitrary generic x (symbol_ref, label_ref or const_int, including offsetted ones; (plus x (const_int N)). This is the case for CRIS v32, for which only (mem reg) and (mem (post_inc reg)) are valid. Like ia64 it has no offsettable addressing mode. For example, the constraint in gcc.c-torture/compile/pr44707.c of "nro" can only match for the "r" part. If your target fails gcc.c-torture/compile/pr44707.c, this might be the reason. No regressions for cris-elf nor crisv32-elf; fixes gcc.c-torture/compile/pr44707.c for the latter. Committed. * config/cris/cris-protos.h (cris_legitimate_address_p): Declare. * config/cris/cris.h (CONSTANT_ADDRESS_P): Define in terms of CONSTANT_P and cris_legitimate_address_p. * config/cris/cris.c (cris_legitimate_address_p): Make non-static. Index: config/cris/cris.c === --- config/cris/cris.c (revision 189506) +++ config/cris/cris.c (working copy) @@ -127,8 +127,6 @@ static void cris_init_libfuncs (void); static reg_class_t cris_preferred_reload_class (rtx, reg_class_t); -static bool cris_legitimate_address_p (enum machine_mode, rtx, bool); - static int cris_register_move_cost (enum machine_mode, reg_class_t, reg_class_t); static int cris_memory_move_cost (enum machine_mode, reg_class_t, bool); static bool cris_rtx_costs (rtx, int, int, int, int *, bool); @@ -1414,7 +1412,7 @@ cris_biap_index_p (const_rtx x, bool str here (but is thankfully a general_operand in itself). A local PIC symbol is valid for the plain "symbol + offset" case. 
*/ -static bool +bool cris_legitimate_address_p (enum machine_mode mode, rtx x, bool strict) { const_rtx x1, x2; Index: config/cris/cris.h === --- config/cris/cris.h (revision 189504) +++ config/cris/cris.h (working copy) @@ -778,6 +778,9 @@ struct cum_args {int regs;}; #define HAVE_POST_INCREMENT 1 +#define CONSTANT_ADDRESS_P(X) \ + (CONSTANT_P (X) && cris_legitimate_address_p (QImode, X, false)) + /* Must be a compile-time constant, so we go with the highest value among all CRIS variants. */ #define MAX_REGS_PER_ADDRESS 2 Index: config/cris/cris-protos.h === --- config/cris/cris-protos.h (revision 189499) +++ config/cris/cris-protos.h (working copy) @@ -40,6 +40,7 @@ extern bool cris_base_p (const_rtx, bool extern bool cris_base_or_autoincr_p (const_rtx, bool); extern bool cris_bdap_index_p (const_rtx, bool); extern bool cris_biap_index_p (const_rtx, bool); +extern bool cris_legitimate_address_p (enum machine_mode, rtx, bool); extern bool cris_store_multiple_op_p (rtx); extern bool cris_movem_load_rest_p (rtx, int); extern void cris_asm_output_symbol_ref (FILE *, rtx); brgds, H-P
[Ping, ARM]PR53189: optimizations of 64bit logic operation with constant
Hi The following patches implemented the optimizations suggested by PR53189, optimizations of 64bit logic operation with constant. Could any maintainer help to review it? http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00087.html http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00169.html http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00226.html thanks Carrot
Re: CRIS atomics revisited 4/4: give up on alignment of atomic data
> From: Hans-Peter Nilsson > Date: Mon, 16 Jul 2012 05:49:00 +0200 > gcc: > * config/cris/sync.md ("atomic_fetch_") > ("cris_atomic_fetch__1") > ("atomic_compare_and_swap") > ("cris_atomic_compare_and_swap_1"): Make > conditional on TARGET_ATOMICS_MAY_CALL_LIBFUNCS for > sizes larger than byte. A sync goof (the VC kind): the committed and sent patch, but not the changelog, was missing the first hunk, now committed: Index: config/cris/sync.md === --- config/cris/sync.md (revision 189504) +++ config/cris/sync.md (working copy) @@ -101,7 +101,7 @@ (define_expand "atomic_fetch_") (match_operand 3) (atomic_op:BWD (match_dup 0) (match_dup 1))] - "" + "mode == QImode || !TARGET_ATOMICS_MAY_CALL_LIBFUNCS" { enum memmodel mmodel = (enum memmodel) INTVAL (operands[3]); brgds, H-P
Re: [PATCH][MIPS] NetLogic XLP scheduling
On 2012/7/16 12:28 AM, Richard Sandiford wrote: > Chung-Lin Tang writes: >> This patch adds scheduling support for the NetLogic XLP, including a new >> pipeline description, and associated changes. >> >> Asides from the new xlp.md description file, there are also some sync >> primitive attribute modifications, for better scheduling of sync loops >> (Maxim should be able to better explain this). > > Rather than add a "type" attribute to each sync loop, please just add: > > (not (eq_attr "sync_mem" "none")) > (symbol_ref "syncloop") > > to the default value of the "type" attribute. You'll probably need > to swap the order of the sync* attributes with the "type" attribute > in order for this to compile. > > The patch is effectively changing the type of the sync loops from > "unknown" to "syncloop". That's certainly OK, but you'll need to > add "syncloop" to the "unknown" reservations of all other schedulers > (except for generic.md, where what you've done instead is fine). > It might be easier if you split out the addition of syncloop > as a separate patch. I'll leave it to Maxim to respond to the sync parts. >> Other generic changes include a new "hilo" insn attribute, to mark which >> of HI/LO does a m[ft]hilo insn access. > > The way other schedulers handle this is with things like: > > (define_insn_reservation "ir_sb1_mfhi" 1 > (and (eq_attr "cpu" "sb1,sb1a") >(and (eq_attr "type" "mfhilo") > (not (match_operand 1 "lo_operand" > "sb1_ex1") > > which seems simpler. mfhilo and mthilo are required to read operand 1 > and write to operand 0 (respectively) in order to support this kind of > construct. > > That said, even the above is a hold-over from when we tried to allow > high registers to store independent values. These days we can be a bit > more precise, as with the patch below. 
(As the comment says: > >;; If a doubleword move uses these expensive instructions, >;; it is usually better to schedule them in the same way >;; as the singleword form, rather than as "multi". > > I'm continuing to assume that mflo and mtlo are the best type choices > for unsplit double-register moves. That path should be very rarely > outside of MIPS16 anyway -- just by sched1 if hi and lo are exposed > directly -- and no current scheduler tries to model a doubleword hi/lo > move separately from single-register ones. The information is available > via the dword_mode attribute if required.) I suppose this means that actual generation of moves as mfhi/mthi should almost never happen out of normal conditions? > Tested on mips64-elf, and by making sure that there were no changes in > -O2 output for a recent set of cc1 .ii files. Applied. > > I'm probably punishing you for being honest here, but the only other > thing is that you've listed NetLogic Microsystems Inc. as one of the > authors. I think that means they'll need to sign a copyright assignment. > Have they already done that? They have assigned the copyright to Mentor Graphics, so it should mean the code can be contributed by us. Thanks, Chung-Lin
Re: [PATCH][MIPS] NetLogic XLP scheduling
On 16/07/2012, at 6:37 PM, Chung-Lin Tang wrote:
> On 2012/7/16 12:28 AM, Richard Sandiford wrote:
>> Chung-Lin Tang writes:
>>> This patch adds scheduling support for the NetLogic XLP, including a new
>>> pipeline description, and associated changes.
>>>
>>> Asides from the new xlp.md description file, there are also some sync
>>> primitive attribute modifications, for better scheduling of sync loops
>>> (Maxim should be able to better explain this).
>>
>> Rather than add a "type" attribute to each sync loop, please just add:
>>
>>     (not (eq_attr "sync_mem" "none"))
>>     (symbol_ref "syncloop")
>>
>> to the default value of the "type" attribute.  You'll probably need
>> to swap the order of the sync* attributes with the "type" attribute
>> in order for this to compile.
>>
>> The patch is effectively changing the type of the sync loops from
>> "unknown" to "syncloop".  That's certainly OK, but you'll need to
>> add "syncloop" to the "unknown" reservations of all other schedulers
>> (except for generic.md, where what you've done instead is fine).
>> It might be easier if you split out the addition of syncloop
>> as a separate patch.
>
> I'll leave it to Maxim to respond to the sync parts.

Richard, that's indeed simpler, thanks.

Chung-Lin, I'll try to make a patch for the patch in the next couple of days
and will send it to you.  Let me know if you'd rather fix this yourself.

...

>> Tested on mips64-elf, and by making sure that there were no changes in
>> -O2 output for a recent set of cc1 .ii files.  Applied.
>>
>> I'm probably punishing you for being honest here, but the only other
>> thing is that you've listed NetLogic Microsystems Inc. as one of the
>> authors.  I think that means they'll need to sign a copyright assignment.
>> Have they already done that?
>
> They have assigned the copyright to Mentor Graphics, so it should mean
> the code can be contributed by us.

That is correct.  NetLogic developed the original xlp.md description,
which Chung-Lin essentially rewrote.
In any case, Mentor has copyright assignment for the original xlp.md
specifically so that we can contribute this upstream.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics