On Wed, Jun 20, 2012 at 6:22 PM, William J. Schmidt
<[email protected]> wrote:
> On Wed, 2012-06-20 at 13:11 +0200, Richard Guenther wrote:
>> On Thu, Jun 14, 2012 at 3:21 PM, William J. Schmidt
>> <[email protected]> wrote:
>> > Pro forma ping. :)
>>
>> ;)
>>
>> I notice (with all of these functions)
>>
>> +unsigned
>> +negate_cost (enum machine_mode mode, bool speed)
>> +{
>> + static unsigned costs[NUM_MACHINE_MODES];
>> + rtx seq;
>> + unsigned cost;
>> +
>> + if (costs[mode])
>> + return costs[mode];
>> +
>> + start_sequence ();
>> + force_operand (gen_rtx_fmt_e (NEG, mode,
>> + gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1)),
>> + NULL_RTX);
>> + seq = get_insns ();
>> + end_sequence ();
>> +
>> + cost = seq_cost (seq, speed);
>> + if (!cost)
>> + cost = 1;
>>
>> that the cost[] array is independent on the speed argument. Thus whatever
>> comes first determines the cost. Odd, and probably not good. A fix
>> would be appreciated (even for the current code ...) - simply make the
>> array costs[NUM_MACHINE_MODES][2].
>>
>> As for the renaming - can you name the functions consistently? Thus
>> the above would be negate_reg_cost? And maybe rename the other
>> FIXME function, too?
>
> I agree with all this. I'll prepare all the cost model changes as a
> separate preliminaries patch.
>
>>
>> Index: gcc/tree-ssa-strength-reduction.c
>> ===================================================================
>> --- gcc/tree-ssa-strength-reduction.c (revision 0)
>> +++ gcc/tree-ssa-strength-reduction.c (revision 0)
>> @@ -0,0 +1,1611 @@
>> +/* Straight-line strength reduction.
>> + Copyright (C) 2012 Free Software Foundation, Inc.
>>
>> I know we have these 'tree-ssa-' names, but really this is gimple-ssa now ;)
>> So, please name it gimple-ssa-strength-reduction.c.
>
> Will do. Vive la revolution? ;)
>
>>
>> + /* Access to the statement for subsequent modification. Cached to
>> + save compile time. */
>> + gimple_stmt_iterator cand_gsi;
>>
>> this is a iterator for cand_stmt? Then caching it is no longer necessary
>> as the iterator is the stmt itself after recent infrastructure changes.
>
> Oh yeah, I remember seeing that go by. Nice. Will change.
>
>>
>> +/* Hash table embodying a mapping from statements to candidates. */
>> +static htab_t stmt_cand_map;
>> ...
>> +static hashval_t
>> +stmt_cand_hash (const void *p)
>> +{
>> + return htab_hash_pointer (((const_slsr_cand_t) p)->cand_stmt);
>> +}
>>
>> use a pointer-map instead.
>>
>> +/* Callback to produce a hash value for a candidate chain header. */
>> +
>> +static hashval_t
>> +base_cand_hash (const void *p)
>> +{
>> + tree ssa_name = ((const_cand_chain_t) p)->base_name;
>> +
>> + if (TREE_CODE (ssa_name) != SSA_NAME)
>> + return (hashval_t) 0;
>> +
>> + return (hashval_t) SSA_NAME_VERSION (ssa_name);
>> +}
>>
>> does it ever happen that ssa_name is not an SSA_NAME?
>
> Not in this patch, but when I introduce CAND_REF in a later patch it
> could happen since the base field of a CAND_REF is a MEM_REF. It's a
> safety valve in case of misuse. I'll think about this some more.
Hm. But then you produce a gigantic hash-collision for them ...
>> I'm not sure
>> the memory savings over simply using a fixed-size (num_ssa_names)
>> array indexed by SSA_NAME_VERSION pointing to the chain is worth
>> using a hashtable for this?
>
> That's reasonable. I'll do that.
>
>>
>> + node = (cand_chain_t) pool_alloc (chain_pool);
>> + node->base_name = c->base_name;
>>
>> If you never free pool entries it's more efficient to use an obstack.
>> alloc-pool
>> only pays off if you get freed item re-use.
>
> OK. I'll change both cand_pool and chain_pool to obstacks.
>
>>
>> + switch (gimple_assign_rhs_code (gs))
>> + {
>> + case MULT_EXPR:
>> + rhs2 = gimple_assign_rhs2 (gs);
>> +
>> + if (TREE_CODE (rhs2) == INTEGER_CST)
>> + return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>> +
>> + if (TREE_CODE (rhs1) == INTEGER_CST)
>> + return multiply_by_cost (TREE_INT_CST_LOW (rhs1), lhs_mode, speed);
>>
>> In theory all commutative statements should have constant operands only
>> at rhs2 ...
>
> I'm glad I'm not the only one who thought that was the theory. ;) I
> wasn't sure, and I've seen violations of this come up in practice.
> Should I assert when that happens instead, and track down the offending
> optimizations?
Yes please. Good enough if you file bugreports for the cases you hit
and in the end simply drop the assert (but do not optimize) in the patch.
>>
>> Also you do not verify that the constant fits in a host-wide-int - but maybe
>> you do not care? Thus, I'd do
>>
>> if (host_integerp (rhs2, 0))
>> return multiply_by_cost (TREE_INT_CST_LOW (rhs2), lhs_mode, speed);
>>
>> or make multiply_by[_const?]_cost take a double-int instead. Likewise below
>> for add.
>
> Ok. Name change looks good also, I'll include that in the cost model
> changes.
>
>>
>> + case MODIFY_EXPR:
>> + /* Be suspicious of assigning costs to copies that may well go away.
>> */
>> + return 0;
>>
>> MODIFY_EXPR is never a gimple_assign_rhs_code. Simple copies have
>> a code of SSA_NAME for example. But as you assert if you get to an
>> unhandled code I wonder why you needed the above ...
>
> I'll remove this, and document that we are deliberately not touching
> copies (which was my original intent).
>
>>
>> +static slsr_cand_t
>> +base_cand_from_table (tree base_in)
>> +{
>> + slsr_cand mapping_key;
>> +
>> + gimple def = SSA_NAME_DEF_STMT (base_in);
>> + if (!def)
>> + return (slsr_cand_t) NULL;
>> +
>> + mapping_key.cand_stmt = def;
>> + return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key);
>>
>> isn't that reachable via the base-name -> chain mapping for base_in?
>
> Maybe so. I need to think about it some more (it got evicted from my
> mental cache). It could be I created this before adding the chains
> stuff and never cleaned up.
Ok.
>>
>> + if (TREE_CODE (rhs2) == SSA_NAME && operand_equal_p (rhs1, rhs2, 0))
>> + return;
>>
>> SSA_NAMEs can be compared by pointer equality, thus the above is
>> equivalent to
>>
>> if (TREE_CODE (rhs2) == SSA_NAME && rhs1 == rhs2)
>>
>> or even just
>>
>> if (rhs1 == rhs2)
>>
>> applies elsewhere as well.
>>
>
> Ok. I promise I'll remember this eventually...
;)
>> +/* Return TRUE iff PRODUCT is an integral multiple of FACTOR, and return
>> + the multiple in *MULTIPLE. Otherwise return FALSE and leave *MULTIPLE
>> + unchanged. */
>> +/* ??? - Should this be moved to double-int.c? */
>>
>> I think so.
>
> Ok, I'll include this in the preliminaries patch.
>
>>
>> +static bool
>> +double_int_multiple_of (double_int product, double_int factor,
>> + bool unsigned_p, double_int *multiple)
>> +{
>> + double_int remainder;
>> + double_int quotient = double_int_divmod (product, factor, unsigned_p,
>> + TRUNC_DIV_EXPR, &remainder);
>> + if (double_int_zero_p (remainder))
>> + {
>> + *multiple = quotient;
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>>
>>
>> In general I find a lot of code of similar structure, factoring bits may make
>> it easier to read. Obvious candidates include:
>>
>> + /* Add the interpretation to the statement-candidate mapping. */
>> + slot = htab_find_slot (stmt_cand_map, c, INSERT);
>> + gcc_assert (!*slot);
>> + *slot = c;
>>
>> into a function add_cand_for_stmt ()
>>
>> + if (TREE_CODE (rhs1) != SSA_NAME || TREE_CODE (rhs2) != INTEGER_CST)
>> + return;
>>
>> doing some simple checks of this kind in the function walking all statements
>> and pass in operands and operation code.
>
> Sounds good. You should have seen it before I already did one round of
> factoring... :)
>
>>
>> +/* Return TRUE if GS is a statement that defines an SSA name from
>> + a NOP_EXPR and is legal for us to combine an add and multiply
>> + across. This is legitimate for casts from a signed type to
>> + a signed or unsigned type of the same or larger size. It is not
>> + legitimate to convert any unsigned type to a signed type, or
>> + to an unsigned type of a different size.
>> +
>> + The reasoning here is that signed integer overflow is undefined,
>> + so any program that was expecting overflow that no longer occurs
>> + is not correct. Unsigned integers, however, have wrap semantics,
>> + and it is reasonable for programs to assume an overflowing add
>> + will wrap.
>> +
>> + With -fwrapv, signed integers also have wrap semantics, so widening
>> + casts are not allowed then either. */
>> +
>> +static bool
>> +legal_cast_p (gimple gs)
>> +{
>> + tree lhs, rhs, lhs_type, rhs_type;
>> + unsigned lhs_size, rhs_size, lhs_uns, rhs_uns;
>> +
>> + if (!is_gimple_assign (gs)
>> + || TREE_CODE (gimple_assign_lhs (gs)) != SSA_NAME
>>
>> That's always true if the code is NOP_EXPR
>>
>> + || gimple_assign_rhs_code (gs) != NOP_EXPR)
>>
>> CONVERT_EXPR_CODE_P ()
>>
>> + return false;
>> +
>> + lhs = gimple_assign_lhs (gs);
>> + rhs = gimple_assign_rhs1 (gs);
>> +
>> + if (TREE_CODE (rhs) != SSA_NAME)
>> + return false;
>>
>> Likewise.
>>
>> + lhs_type = TREE_TYPE (SSA_NAME_VAR (lhs));
>> + rhs_type = TREE_TYPE (SSA_NAME_VAR (rhs));
>>
>> Get the type from the SSA_NAMEs themselves, no need to lookup
>> SSA_NAME_VAR. If it happens you ICE that way you are looking
>> at released SSA names ...
>>
>> + lhs_size = TYPE_PRECISION (lhs_type);
>> + rhs_size = TYPE_PRECISION (rhs_type);
>> + lhs_uns = TYPE_UNSIGNED (lhs_type);
>> + rhs_uns = TYPE_UNSIGNED (rhs_type);
>> +
>> + if (lhs_size < rhs_size
>> + || (rhs_uns && !lhs_uns)
>> + || (rhs_uns && lhs_uns && rhs_size != lhs_size)
>> + || (!rhs_uns && flag_wrapv && rhs_size != lhs_size))
>> + return false;
>>
>> So we basically check whether the lhs can represent all values of the rhs?
>> So ...
>>
>> if (lhs_size > rhs_size)
>> return true;
>>
>> is missing? Because you'd reject a widening of an unsigned int to an
>> unsigned long.
>
> That rejection is intentional. Assuming 4-byte int and 8-byte long, the
> unsigned int would wrap at 32 bits, so allowing the widening could
> change semantics. I've actually encountered problems like this all over
> libiberty and other GCC support libraries.
>
>>
>> As for your handling of flag_wrapv and the unsigned flag, please try
>> to use the TYPE_OVERFLOW_UNDEFINED/TYPE_OVERFLOW_WRAPS
>> type properties instead for the parts that care about overflow behavior
>> and not value-representation.
>>
>> + return true;
>> +}
>>
>
> OK, I'll look into those and let you know if I have questions.
>
>> I have a deja-vu for seeing similar code elsewhere (widening shift/multiply
>> support in the vector pattern recognizer maybe). While I may understand
>> what the textual description says including an example would explain
>> things better. For example
>>
>> "Return TRUE if GS is a statement that defines an SSA name from
>> a conversion and is legal for us to associate with an add or multiply.
>> Thus transform name = (T) name'; name * X; to name' * X"
>>
>> But I suppose I got the semantics wrong (and thus the example). Can you
>> clarify?
>
> OK, I'll try. This was definitely the hardest part to get right.
Of course ;) It's probably also important due to the all-present casts to
sizetype for address offset computations.
>>
>> Otherwise the pass looks quite good. It looks though that it will optimize
>> cross-basic-block strength-reduction opportunities, eventually causing
>> cross-BB register livetime increases? Or is this not an issue?
>
> This is true, and it's made somewhat worse by later optimizations. I
> attempted to reduce the scope where possible by choosing the most
> closely dominating candidate as a basis, thinking this would help limit
> lifetimes. However, later optimizations fold up the various adds so
> that all candidates in a chain end up directly depending on the
> "ultimate basis," resulting in one longer lifetime.
>
> It's something to keep an eye on. If it appears to be an issue, we may
> need to look into some sort of distance heuristic to limit which
> candidates can be used as a basis. It's not a big deal to keep a
> register live across a single conditional branch, but reusing a value
> from a control-equivalent block when there's an intervening loop is not
> so good. (On the other hand, it's just another opportunity for
> intelligent register spill design. :)
>
> As I indicated, though, I don't think this is a problem limited to this
> pass, but is a general issue many places in the middle end.
Of course. For this pass as you already have a 'cost' associated you
could simply pessimize candidates from a different basic-block slightly.
But I guess without a real testcase it would be just some magic ...
Richard.
>>
>> Thanks for working on this,
>> Richard.
>>
>
> And thanks very much for the review!
>
> Bill
>