How to make an array aligned to 16 bytes
Can I make an array 16-byte aligned in an RTL pass? Thanks!
-- Jianzhang Peng
Re: identifying indirect references in a loop
On Fri, Dec 11, 2009 at 5:16 AM, Aravinda wrote:
> Hi,
>
> I'm trying to identify all indirect references in a loop so that, after
> this analysis, I have a list of tree_nodes of pointer_type that are
> dereferenced in a loop along with their step size, if any.
>
> E.g.
> while(i++ < n)
> {
>   *(p+i);
> }
>
> I want to get the pointer_type_node for 'p' and identify the step size
> as '1', since 'i' has a step size of 1.
>
> I am able to identify 'INDIRECT_REF' nodes in the loop. But since
> these are generally the expression_temporaries, I will not get the
> tree_node for 'p'. But I believe INDIRECT_REF is an expression whose
> arg0 is an SSA_NAME node from which I will be able to use the
> SSA_NAME_DEF_STMT to ultimately reach the tree_node for 'p'.
>
> But I don't know how to get the SSA_NAME node from the given
> INDIRECT_REF. Could someone please point out how to do this?
>
> Also, I find it very difficult to know how the tree_nodes and types
> are contained one within the other. Is there a general technique by
> which I can know when a tree node will be nested within another and
> how to retrieve them?

Look into the tree.def file. Operands can be retrieved with the
TREE_OPERAND macro (see tree.h). So if e is an INDIRECT_REF expression
tree node, you can get the variable or SSA_NAME that is dereferenced
using TREE_OPERAND (e, 0). The pointer type is then simply TREE_TYPE of
that operand.

Btw, I think you want to use the existing data dependence analysis,
which provides you with a list of data references in a loop. See
tree-data-ref.[ch].

Richard.

> Thanks,
> Aravinda
>
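The operand walk described in the reply can be illustrated outside of GCC with a toy model. The sketch below is not GCC code: `struct toy_node`, `TOY_CODE`, `TOY_TYPE`, `TOY_OPERAND` and `dereferenced_pointer` are hypothetical stand-ins invented for illustration, and the real definitions in tree.h and tree.def are considerably richer. It only shows the shape of the access pattern: check the node code, then take operand 0 and its type.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC tree codes; the real list lives in tree.def. */
enum toy_code { TOY_SSA_NAME, TOY_POINTER_TYPE, TOY_INDIRECT_REF };

/* Hypothetical, heavily simplified node; not GCC's struct layout. */
struct toy_node {
    enum toy_code code;
    struct toy_node *type;        /* plays the role of TREE_TYPE */
    struct toy_node *operands[2]; /* plays the role of TREE_OPERAND */
    const char *name;             /* label for illustration only */
};

/* Hypothetical analogues of TREE_CODE, TREE_TYPE and TREE_OPERAND. */
#define TOY_CODE(n)       ((n)->code)
#define TOY_TYPE(n)       ((n)->type)
#define TOY_OPERAND(n, i) ((n)->operands[(i)])

/* Given a dereference node, return the node that is dereferenced. */
static struct toy_node *dereferenced_pointer(struct toy_node *e)
{
    assert(e != NULL && TOY_CODE(e) == TOY_INDIRECT_REF);
    return TOY_OPERAND(e, 0);
}

/* Build the toy tree for "*p_1" and walk it.
   Returns 1 if the walk reaches the SSA name and its pointer type. */
static int toy_demo(void)
{
    struct toy_node ptr_type = { TOY_POINTER_TYPE, NULL, { NULL, NULL }, "int *" };
    struct toy_node p_1 = { TOY_SSA_NAME, &ptr_type, { NULL, NULL }, "p_1" };
    struct toy_node deref = { TOY_INDIRECT_REF, NULL, { &p_1, NULL }, "*p_1" };

    struct toy_node *op = dereferenced_pointer(&deref);
    return op == &p_1
           && TOY_CODE(op) == TOY_SSA_NAME
           && TOY_CODE(TOY_TYPE(op)) == TOY_POINTER_TYPE;
}
```

Inside GCC itself the same three steps would use the real macros named in the reply, and following SSA_NAME_DEF_STMT from the operand would be a further step that this toy does not model.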
[RFC] LTO and debug information
The following draft patch stops disabling debuginfo when -flto or
-fwhopr is used and fixes things up so that debugging C (mostly) works.

The main question I have is how to proceed further here (with the goal
that simple debugging should be possible in 4.5). If we apply this
patch, we expose ICEs when -flto is used in conjunction with -g, because
the patch doesn't fix all clashes between free-lang-data and dwarf2out.

I was thinking that, instead of ICEing, we could sorry () when we would
ICE with debuginfo enabled after free-lang-data has run. Or we could
keep -g non-operational for LTO and add a -glto or
-fi-really-want-to-debug option. Or, of course, hope that I can
reasonably fix the ICEs I run into and deal with the remaining cases as
bugs?

The patch has already proven useful for debugging miscompiles in its
current state.

Thanks,
Richard.

2009-12-11  Richard Guenther

        * tree.c (free_lang_data_in_binfo): Do not free BINFO_OFFSET
        and BINFO_VPTR_FIELD.
        (free_lang_data_in_decl): Do not free DECL_SIZE_UNIT,
        DECL_SIZE, DECL_FIELD_OFFSET and DECL_FCONTEXT.
        (free_lang_data): Do not disable debuginfo.
        * lto-streamer-out.c (write_symbol_vec): Deal with
        non-constant DECL_SIZE.
        * dwarf2out.c (add_pure_or_virtual_attribute): Check for
        DECL_CONTEXT.
        (gen_type_die_for_member): Test for TYPE_STUB_DECL.
        * opts.c (decode_options): Do not disable var-tracking for lto.

        lto/
        * lto.c (lto_fixup_field_decl): Fixup DECL_FIELD_OFFSET.
        (lto_post_options): Do not disable debuginfo.

Index: gcc/tree.c
===
*** gcc/tree.c (revision 155164)
--- gcc/tree.c (working copy)
*** free_lang_data_in_binfo (tree binfo)
*** 4152,4164
    gcc_assert (TREE_CODE (binfo) == TREE_BINFO);
-   BINFO_OFFSET (binfo) = NULL_TREE;
    BINFO_VTABLE (binfo) = NULL_TREE;
-   BINFO_VPTR_FIELD (binfo) = NULL_TREE;
    BINFO_BASE_ACCESSES (binfo) = NULL;
    BINFO_INHERITANCE_CHAIN (binfo) = NULL_TREE;
    BINFO_SUBVTT_INDEX (binfo) = NULL_TREE;
-   BINFO_VPTR_FIELD (binfo) = NULL_TREE;
    for (i = 0; VEC_iterate (tree, BINFO_BASE_BINFOS (binfo), i, t); i++)
      free_lang_data_in_binfo (t);
--- 4152,4161
*** free_lang_data_in_decl (tree decl)
*** 4376,4404
          }
      }
!   if (TREE_CODE (decl) == PARM_DECL
!       || TREE_CODE (decl) == FIELD_DECL
!       || TREE_CODE (decl) == RESULT_DECL)
!     {
!       tree unit_size = DECL_SIZE_UNIT (decl);
!       tree size = DECL_SIZE (decl);
!       if ((unit_size && TREE_CODE (unit_size) != INTEGER_CST)
!           || (size && TREE_CODE (size) != INTEGER_CST))
!         {
!           DECL_SIZE_UNIT (decl) = NULL_TREE;
!           DECL_SIZE (decl) = NULL_TREE;
!         }
!
!       if (TREE_CODE (decl) == FIELD_DECL
!           && DECL_FIELD_OFFSET (decl)
!           && TREE_CODE (DECL_FIELD_OFFSET (decl)) != INTEGER_CST)
!         DECL_FIELD_OFFSET (decl) = NULL_TREE;
!
!       /* DECL_FCONTEXT is only used for debug info generation. */
!       if (TREE_CODE (decl) == FIELD_DECL)
!         DECL_FCONTEXT (decl) = NULL_TREE;
!     }
!   else if (TREE_CODE (decl) == FUNCTION_DECL)
      {
        if (gimple_has_body_p (decl))
          {
--- 4373,4379
          }
      }
!   if (TREE_CODE (decl) == FUNCTION_DECL)
      {
        if (gimple_has_body_p (decl))
          {
*** free_lang_data (void)
*** 4973,4985
    diagnostic_finalizer (global_dc) = default_diagnostic_finalizer;
    diagnostic_format_decoder (global_dc) = default_tree_printer;
-   /* FIXME. We remove sufficient language data that the debug
-      info writer gets completely confused. Disable debug information
-      for now. */
-   debug_info_level = DINFO_LEVEL_NONE;
-   write_symbols = NO_DEBUG;
-   debug_hooks = &do_nothing_debug_hooks;
-
    return 0;
  }
--- 4948,4953
Index: gcc/lto-streamer-out.c
===
*** gcc/lto-streamer-out.c (revision 155164)
--- gcc/lto-streamer-out.c (working copy)
*** write_symbol_vec (struct lto_streamer_ca
*** 2350,2356
            break;
          }
!       if (kind == GCCPK_COMMON && DECL_SIZE (t))
          size = (((uint64_t) TREE_INT_CST_HIGH (DECL_SIZE (t))) << 32)
                 | TREE_INT_CST_LOW (DECL_SIZE (t));
        else
--- 2349,2357
            break;
          }
!       if (kind == GCCPK_COMMON
!           && DECL_SIZE (t)
!           && TREE_CODE (DECL_SIZE (t)) == INTEGER_CST)
          size = (((uint64_t) TREE_INT_CST_HIGH (DECL_SIZE (t))) << 32)
                 | TREE_INT_CST_LOW (DECL_SIZE (t));
        else
Index: gcc/dwarf2out.c
===
*** gcc/dwarf2out.c (revision 155164)
--- gcc/dwarf2out.c (working copy)
*** add_pure_or_virtual_attribute (dw_die_re
*** 16476,16482 0
Re: generate RTL sequence
Joern Rennecke writes:

> If you need more rigid scheduling, you can use CC0.

No, please don't. I accept that CC0 is necessary today for a few
processors, but I really don't think we should encourage any new uses of
it.

Ian
Bitfields problem
As I continue my work on the machine description file, I have gone back
to the bitfields to try to get good code generation for them. So far I
have followed what the ia64 port does for signed extractions:

(define_insn "extv"
  [(set (match_operand:DI 0 "gr_register_operand" "=r")
        (sign_extract:DI (match_operand:DI 1 "gr_register_operand" "r")
                         (match_operand:DI 2 "extr_len_operand" "n")
                         (match_operand:DI 3 "shift_count_operand" "M")))]
  ""
  "extr %0 = %1, %3, %2"
  [(set_attr "itanium_class" "ishf")])

This works for me, except for what I get with this code:

typedef struct sTest
{
    int64_t a:1;
    int64_t b:5;
    int64_t c:7;
    int64_t d:15;
}STest;

int64_t bar2 (STest a)
{
    int64_t res = a.d;
    return res;
}

Here is what I get at the final cleanup:

;; Function bar2 (bar2)

bar2 (a)
{
  short unsigned int SR.44;
  short unsigned int SR.43;
  short unsigned int SR.41;
  short unsigned int SR.40;
  short unsigned int SR.22;
  short unsigned int SR.3;

:
  SR.22 = (short unsigned int) () ((short unsigned int) a.d & 32767);
  SR.43 = SR.22 & 32767;
  SR.44 = SR.43 ^ 16384;
  SR.3 = (short unsigned int) () ((short unsigned int) () (SR.44 + 49152) & 32767);
  SR.40 = SR.3 & 32767;
  SR.41 = SR.40 ^ 16384;
  return (int64_t) () (SR.41 + 49152);
}

I don't understand why I get all these instructions. I know that it is
more complicated because the field is signed, but I would prefer to get
an unsigned extract and then a shift left/shift right, i.e. 3
instructions. Right now I get many more instructions, corresponding to
what I showed from the final cleanup.

Is there any reason for all these instructions, or any ideas on how to
get to my 3 instructions?

Thank you for your help and time,
Jean Christophe Beyler
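For reference, the two ways of sign-extending a bitfield that this message contrasts can be written out in plain C. This is only a sketch: the helper names are made up for illustration, the field position is passed in as a parameter rather than derived from the struct layout (which is ABI-dependent), and the shift-based variant relies on GCC's documented behaviour for converting to a signed type and for right-shifting negative values. The `^ 16384` followed by `+ 49152` pairs in the dump look like the xor-and-subtract idiom carried out in 16-bit arithmetic, since 49152 is 65536 - 16384; `dump_style_15` mirrors that reading.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend: take `len` bits of `word` starting at bit `pos` (bit 0 = LSB). */
static uint64_t extract_unsigned(uint64_t word, unsigned pos, unsigned len)
{
    assert(len >= 1 && len <= 63 && pos + len <= 64);
    return (word >> pos) & ((UINT64_C(1) << len) - 1);
}

/* Sign-extend a zero-extended `len`-bit field with xor and subtract. */
static int64_t sign_extend_xor(uint64_t field, unsigned len)
{
    assert(len >= 1 && len <= 63 && field < (UINT64_C(1) << len));
    uint64_t sign = UINT64_C(1) << (len - 1);
    return (int64_t)(field ^ sign) - (int64_t)sign;
}

/* Signed extract as "shift left, then arithmetic shift right".
   Assumes GCC semantics: unsigned-to-signed conversion wraps and
   >> on a negative value is an arithmetic shift. */
static int64_t extract_signed_shift(uint64_t word, unsigned pos, unsigned len)
{
    assert(len >= 1 && len <= 63 && pos + len <= 64);
    return (int64_t)(word << (64 - pos - len)) >> (64 - len);
}

/* The xor/add pair from the dump, in 16-bit arithmetic, for a 15-bit field. */
static uint16_t dump_style_15(uint16_t x)
{
    return (uint16_t)(((x & 32767u) ^ 16384u) + 49152u);
}
```

With a 15-bit field at bit 13 (one possible placement of `d`, assumed here purely for the example), `extract_signed_shift(word, 13, 15)` and `sign_extend_xor(extract_unsigned(word, 13, 15), 15)` give the same value.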
Dwarf announcements mailing list
The public comment draft of the DWARF Version 4 Standard should be
available some time next month. It will be on the DWARF website:
http://dwarfstd.org.

If you want to receive a notification when this is available, please
sign up on the DWARF announcements mailing list:
http://lists.dwarfstd.org/listinfo.cgi/dwarf-announce-dwarfstd.org

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Vectorizing 16bit signed integers
Hi,

I hope someone can help me. I've been trying to write some tight integer
loops in a way that could be auto-vectorized, saving me from writing
assembler or using specific vectorization extensions. Unfortunately I've
not yet managed to make gcc vectorize any of them.

I've simplified the case to just perform the very first operation in the
loop: converting from two's complement to sign-and-magnitude. I then
used -ftree-vectorizer-verbose to find out whether the loops were
vectorized and, if not, why not, but I am afraid I don't understand the
output.

The simplest version of the loop is here (it appears the branch is not a
problem, but I have another version without one).

inline uint16_t transsign(int16_t v) {
    if (v<0) {
        return 0x8000U | (1-v);
    } else {
        return v;
    }
}

It very simply converts in a fashion that maintains the full effective
bit-width.

The error from the vectorizer is:
vectorizesign.cpp:42: note: not vectorized: relevant stmt not supported: v.1_16 = (uint16_t) D.2157_11;

It appears the unsupported operation in vectorization is the typecast
from int16_t to uint16_t. Can this really be the case, or is the output
misleading? If it is the case, is there a good reason for it, or can I
fix it myself by adding additional vectorizable operations?

I've attached both the test case and the full output of
-ftree-vectorizer-verbose=9.

Best regards
Allan

#include
inline uint16_t transsign1(int16_t v) {
    // written with no control-flow to facilitate auto-vectorization
    uint16_t sv = v >> 15; // signed right-shift gives a classic sign selector -1 or 0
    sv = sv & 0x7FFFU; // never invert the sign-bit
    return v ^ sv; // conditional inversion by xor
}

inline uint16_t transsign2(int16_t v) {
    if (v<0) {
        return 0x8000U | ~v;
    } else {
        return v;
    }
}

inline uint16_t transsign3(int16_t v) {
    if (v<0) {
        return 0x8000U | (1-v);
    } else {
        return v;
    }
}

// candidate for vectorization
void convertts1(uint16_t* out, int16_t* in, uint32_t len) {
    for(unsigned int i=0;i

gcc: 2: No such file or directory
vectorizesign.cpp:28: note: = analyze_loop_nest =
vectorizesign.cpp:28: note: === vect_analyze_loop_form ===
vectorizesign.cpp:28: note: split exit edge.
vectorizesign.cpp:28: note: === get_loop_niters ===
vectorizesign.cpp:28: note: ==> get_loop_niters:len_3(D)
vectorizesign.cpp:28: note: Symbolic number of iterations is len_3(D)
vectorizesign.cpp:28: note: === vect_analyze_data_refs ===
vectorizesign.cpp:28: note: get vectype with 8 units of type short int
vectorizesign.cpp:28: note: vectype: vector short int
vectorizesign.cpp:28: note: get vectype with 8 units of type short unsigned int
vectorizesign.cpp:28: note: vectype: vector short unsigned int
vectorizesign.cpp:28: note: === vect_analyze_scalar_cycles ===
vectorizesign.cpp:28: note: Analyze phi: i_16 = PHI
vectorizesign.cpp:28: note: Access function of PHI: {0, +, 1}_1
vectorizesign.cpp:28: note: step: 1, init: 0
vectorizesign.cpp:28: note: Detected induction.
vectorizesign.cpp:28: note: Analyze phi: SMT.12_27 = PHI
vectorizesign.cpp:28: note: === vect_pattern_recog ===
vectorizesign.cpp:28: note: vect_is_simple_use: operand i_16
vectorizesign.cpp:28: note: def_stmt: i_16 = PHI
vectorizesign.cpp:28: note: type of def: 4.
vectorizesign.cpp:28: note: === vect_mark_stmts_to_be_vectorized ===
vectorizesign.cpp:28: note: init: phi relevant? i_16 = PHI
vectorizesign.cpp:28: note: init: phi relevant? SMT.12_27 = PHI
vectorizesign.cpp:28: note: init: stmt relevant? D.2120_5 = i_16 * 2;
vectorizesign.cpp:28: note: init: stmt relevant? D.2121_7 = out_6(D) + D.2120_5;
vectorizesign.cpp:28: note: init: stmt relevant? D.2122_10 = in_9(D) + D.2120_5;
vectorizesign.cpp:28: note: init: stmt relevant? D.2123_11 = *D.2122_10;
vectorizesign.cpp:28: note: init: stmt relevant? D.2124_12 = (int) D.2123_11;
vectorizesign.cpp:28: note: init: stmt relevant? D.2170_17 = D.2124_12 >> 15;
vectorizesign.cpp:28: note: init: stmt relevant? sv_18 = (uint16_t) D.2170_17;
vectorizesign.cpp:28: note: init: stmt relevant? sv_19 = sv_18 & 32767;
vectorizesign.cpp:28: note: init: stmt relevant? sv.0_20 = (short int) sv_19;
vectorizesign.cpp:28: note: init: stmt relevant? D.2167_21 = sv.0_20 ^ D.2123_11;
vectorizesign.cpp:28: note: init: stmt relevant? D.2166_22 = (uint16_t) D.2167_21;
vectorizesign.cpp:28: note: init: stmt relevant? *D.2121_7 = D.2166_22;
vectorizesign.cpp:28: note: vec_stmt_relevant_p: stmt has vdefs.
vectorizesign.cpp:28: note: mark relevant 4, live 0.
vectorizesign.cpp:28: note: init: stmt relevant? i_14 = i_16 + 1;
vectorizesign.cpp:28: note: init: stmt relevant? if (len_3(D) > i_14)
vectorizesign.cpp:28: note: worklist: examine stmt: *D.2121_7 = D.2166_22;
vectorizesign.cpp:28: note: vect_is_simple_use: operand D.2166_22
vectorizesign.cpp:28: note: def_stmt: D.2166_22 = (uint16_t) D.2167_21;
vectorizesign.cpp:28: note: type of def: 3.
vectorizes
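As a side check on the test case, the branch-free and branching conversions from the attachment can be compared exhaustively over all 16-bit inputs. The sketch below restates the two functions with explicit casts; the names `transsign_branchless`, `transsign_branching` and `check_equivalence` are chosen here for illustration, and the branch-free form assumes that `>>` on a negative value is an arithmetic shift, as it is with GCC. It says nothing about whether either form vectorizes.

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free form: flip the low 15 bits when the sign bit is set.
   Assumes arithmetic right shift of negative values (true for GCC). */
static uint16_t transsign_branchless(int16_t v)
{
    uint16_t sv = (uint16_t)(v >> 15); /* 0xFFFF if v < 0, else 0 */
    sv &= 0x7FFFU;                     /* never invert the sign bit */
    return (uint16_t)(v ^ sv);
}

/* Branching form, as in the second variant of the test case. */
static uint16_t transsign_branching(int16_t v)
{
    if (v < 0)
        return (uint16_t)(0x8000U | ~v);
    return (uint16_t)v;
}

/* Compare both forms on every int16_t value; returns how many were checked. */
static long check_equivalence(void)
{
    long checked = 0;
    for (long v = INT16_MIN; v <= INT16_MAX; v++) {
        assert(transsign_branchless((int16_t)v) == transsign_branching((int16_t)v));
        checked++;
    }
    return checked;
}
```

The variant that uses `(1-v)` is a different mapping and is deliberately left out of the comparison.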
Re: Bitfields problem
Interestingly enough, if I do this instead:

typedef struct sTest
{
    int a:12;
    int b:20;
    int c:7;
    int d:15;
}STest;

int64_t bar2 (STest *a)
{
    int64_t res = a->b;
    return res;
}

I get at the expand pass:

(insn 6 5 7 3 struct3.c:27 (set (reg:SI 75)
        (mem/s:SI (reg/v/f:DI 73 [ a ]) [0 S4 A32])) -1 (nil))
-> Actually get the data

(insn 7 6 8 3 struct3.c:27 (set (reg:DI 77)
        (zero_extract:DI (subreg:DI (reg:SI 75) 0)
            (const_int 20 [0x14])
            (const_int 12 [0xc]))) -1 (nil))
-> Extract the bits we want but this is zero_extracted

(insn 8 7 9 3 struct3.c:27 (set (reg:DI 78)
        (ashift:DI (reg:DI 77)
            (const_int 43 [0x2b]))) -1 (nil))
(insn 9 8 10 3 struct3.c:27 (set (subreg:DI (reg:SI 76) 0)
        (ashiftrt:DI (reg:DI 78)
            (const_int 43 [0x2b]))) -1 (nil))
-> These two instructions actually sign extend it

(insn 10 9 11 3 struct3.c:27 (set (reg:DI 79)
        (ashift:DI (reg:SI 76)
            (const_int 32 [0x20]))) -1 (nil))
(insn 11 10 12 3 struct3.c:27 (set (reg:DI 74)
        (ashiftrt:DI (reg:DI 79)
            (const_int 32 [0x20]))) -1 (expr_list:REG_EQUAL (sign_extend:DI (reg:SI 76))
        (nil)))
-> Because it's seen as a SI, these last two sign extend it again...

And I get later on in the passes (the instructions are removed by the
combine pass):

(insn 6 3 7 2 struct3.c:27 (set (reg:SI 75)
        (mem/s:SI (reg:DI 8 r8 [ a ]) [0 S4 A32])) 74 {movsi_internal2} (expr_list:REG_DEAD (reg:DI 8 r8 [ a ])
        (nil)))
(note 7 6 8 2 NOTE_INSN_DELETED)
(note 8 7 9 2 NOTE_INSN_DELETED)
(note 9 8 10 2 NOTE_INSN_DELETED)
(note 10 9 11 2 NOTE_INSN_DELETED)
(note 11 10 16 2 NOTE_INSN_DELETED)
(insn 16 11 22 2 struct3.c:30 (set (reg/i:DI 6 r6)
        (zero_extract:DI (subreg:DI (reg:SI 75) 0)
            (const_int 20 [0x14])
            (const_int 12 [0xc]))) 63 {extzvdi} (expr_list:REG_DEAD (reg:SI 75)
        (nil)))

So now I have two issues that I can't seem to figure out:

- Why can combine remove these 4 instructions ?
- Why do I have such a difference between a local variable that is not
  a pointer, a pointer and a global variable ?

I remember seeing different behavior depending on whether the variable
was a global variable or a parameter, and that seems to be the case here
too. This is worse, though, since it transforms my signed extract into a
simple zero_extract.

Thanks for your help,
Jc

PS: here is the combine pass debug information:

;; Function bar2 (bar2)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
insn_cost 2: 4
insn_cost 6: 4
insn_cost 7: 36
insn_cost 8: 4
insn_cost 9: 4
insn_cost 10: 4
insn_cost 11: 4
insn_cost 16: 4
insn_cost 22: 0
deferring deletion of insn with uid = 2.
modifying insn i3 6 r75:SI=[r8:DI]
      REG_DEAD: r8:DI
deferring rescan insn with uid = 6.
deferring deletion of insn with uid = 8.
modifying insn i3 9 r76:SI#0=r77:DI
      REG_DEAD: r77:DI
deferring rescan insn with uid = 9.
deferring deletion of insn with uid = 7.
modifying insn i3 9 r76:SI#0=zero_extract(r75:SI#0,0x14,0xc)
      REG_DEAD: r75:SI
deferring rescan insn with uid = 9.
deferring deletion of insn with uid = 10.
modifying insn i311 r74:DI=r76:SI#0&0xf
      REG_DEAD: r76:SI
deferring rescan insn with uid = 11.
deferring deletion of insn with uid = 9.
modifying insn i311 r74:DI=zero_extract(r75:SI#0,0x14,0xc)
      REG_DEAD: r75:SI
deferring rescan insn with uid = 11.
deferring deletion of insn with uid = 11.
modifying insn i316 r6:DI=zero_extract(r75:SI#0,0x14,0xc)
      REG_DEAD: r75:SI
deferring rescan insn with uid = 16.

(note 1 0 4 NOTE_INSN_DELETED)
(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(note 2 4 3 2 NOTE_INSN_DELETED)
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 struct3.c:27 (set (reg:SI 75)
        (mem/s:SI (reg:DI 8 r8 [ a ]) [0 S4 A32])) 74 {movsi_internal2} (expr_list:REG_DEAD (reg:DI 8 r8 [ a ])
        (nil)))
(note 7 6 8 2 NOTE_INSN_DELETED)
(note 8 7 9 2 NOTE_INSN_DELETED)
(note 9 8 10 2 NOTE_INSN_DELETED)
(note 10 9 11 2 NOTE_INSN_DELETED)
(note 11 10 16 2 NOTE_INSN_DELETED)
(insn 16 11 22 2 struct3.c:30 (set (reg/i:DI 6 r6)
        (zero_extract:DI (subreg:DI (reg:SI 75) 0)
            (const_int 20 [0x14])
            (const_int 12 [0xc]))) 63 {extzvdi} (expr_list:REG_DEAD (reg:SI 75)
        (nil)))
(insn 22 16 0 2 struct3.c:30 (use (reg/i:DI 6 r6)) -1 (nil))
starting the processing of deferred insns
deleting insn with uid = 2.
deleting insn with uid = 7.
deleting insn with uid = 8.
deleting insn with uid = 9.
deleting insn with uid = 10.
deleting insn with uid = 11.
rescanning insn with uid = 6.
deleting insn with uid = 6.
rescanning insn with uid = 16.
deleting insn with uid = 16.
ending the processing of deferred insns

;; Combiner totals: 16 attempts, 16 substitutions (2 requiring new space),
;; 6 successes.

On Fri, Dec 11, 2009 at 11:57 AM, Jean Christ
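One arithmetic observation about the expand dump in this message, offered as a sketch rather than a diagnosis of the port: a left shift by N followed by an arithmetic right shift by the same N on a 64-bit value sign-extends from the low 64 - N bits. With the count of 43 shown in insns 8 and 9, that is a sign extension from 21 bits, and a 20-bit zero_extract result always has bit 20 clear, so the shift pair cannot change the value. A count of 44 would sign-extend from 20 bits. The helper names below are invented for illustration, and the shift pair is modelled with xor and subtract to stay within portable C.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract x len pos): `len` bits from bit `pos`, zero-extended. */
static uint64_t zero_extract(uint64_t x, unsigned len, unsigned pos)
{
    assert(len >= 1 && len <= 63 && pos + len <= 64);
    return (x >> pos) & ((UINT64_C(1) << len) - 1);
}

/* Model of "ashift by count, then ashiftrt by count" on a 64-bit value:
   sign extension from the low (64 - count) bits. */
static int64_t shl_then_sar(uint64_t x, unsigned count)
{
    assert(count >= 1 && count <= 63);
    unsigned width = 64 - count;
    uint64_t low = x & ((UINT64_C(1) << width) - 1);
    uint64_t sign = UINT64_C(1) << (width - 1);
    return (int64_t)(low ^ sign) - (int64_t)sign;
}
```

For a 20-bit field holding all ones, `shl_then_sar(field, 43)` returns the field unchanged (0xFFFFF), while `shl_then_sar(field, 44)` returns -1. The same reasoning applies to the later shift pair by 32, whose input also has its bit 31 clear.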
Re: Bad mailing list index?
On 10/12/2009 7:43 a.m., H.J. Lu wrote:
> Hi,
>
> When I visit:
>
> http://gcc.gnu.org/ml/gcc-bugs/
> http://gcc.gnu.org/ml/gcc-cvs/
>
> at Wed Dec 9 10:41:43 PST 2009, I didn't see "December, 2009". It was
> there yesterday. Has anyone else seen it?

You may need to clear your browser cache first.

The page sends a Last-Modified time but no Expires header, so some
aggressive proxies, browsers and other caches could end up caching it
for a long time. Adding an Expires header set to the 1st of the next
month would be a good idea.

Cheers,
Nicholas Sherlock
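The suggested header value can be computed without any time-zone handling, since the first of a month at midnight GMT only needs the year, the month and the weekday. The following is a minimal sketch under that assumption; `day_of_week` and `format_expires` are names made up here, the weekday comes from Sakamoto's method, and how the header would actually be attached to the archive pages is outside its scope.

```c
#include <assert.h>
#include <stdio.h>

/* Day of week for a Gregorian date, 0 = Sunday (Sakamoto's method). */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
    assert(m >= 1 && m <= 12);
    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Write an Expires header for 00:00:00 GMT on the 1st of the month after
   (year, month), with month in 1..12. Returns the snprintf result. */
static int format_expires(int year, int month, char *buf, size_t size)
{
    static const char *const days[] =
        { "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat" };
    static const char *const months[] =
        { "Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
    assert(month >= 1 && month <= 12);
    int next_month = (month == 12) ? 1 : month + 1;
    int next_year = (month == 12) ? year + 1 : year;
    return snprintf(buf, size, "Expires: %s, 01 %s %04d 00:00:00 GMT",
                    days[day_of_week(next_year, next_month, 1)],
                    months[next_month - 1], next_year);
}
```

Called with a page generated in December 2009, `format_expires(2009, 12, buf, sizeof buf)` writes `Expires: Fri, 01 Jan 2010 00:00:00 GMT` into `buf`.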