On 6/3/06, Steven Bosscher <[EMAIL PROTECTED]> wrote:
> For floating point data, the latter is the only interesting case because
> float loads only access the L2. Thus using "lfetch" for floating point
> arrays will unnecessarily wipe out the contents of L1. (gcc 3.2.3 only
> seems to generate "lfetch", which is why I ask...)
You could experiment with this for ia64 by hacking issue_prefetch_ref
in tree-ssa-loop-prefetch.c to issue a prefetch to L2 for floating
point types.
E.g. something like the patch below, which is (needless to say)
untested, but it is something you could play with.
Gr.
Steven
Index: tree-ssa-loop-prefetch.c
===================================================================
--- tree-ssa-loop-prefetch.c (revision 114315)
+++ tree-ssa-loop-prefetch.c (working copy)
@@ -816,7 +816,7 @@ static void
issue_prefetch_ref (struct mem_ref *ref, unsigned unroll_factor, unsigned ahead)
{
HOST_WIDE_INT delta;
- tree addr, addr_base, prefetch, params, write_p;
+ tree addr, addr_base, prefetch, params, write_p, locality;
block_stmt_iterator bsi;
unsigned n_prefetches, ap;
@@ -838,11 +838,21 @@ issue_prefetch_ref (struct mem_ref *ref,
addr_base, build_int_cst (ptr_type_node, delta));
addr = force_gimple_operand_bsi (&bsi, unshare_expr (addr), true, NULL);
- /* Create the prefetch instruction. */
+ /* Create the prefetch instruction. Do this by building a call to
+ `void __builtin_prefetch (const void *ADDR, int RW, int LOCALITY)'.
+
+ ??? The `locality' parameter is a shameless, untested hack to
+ force lfetch.nt1 -- hopefully. */
write_p = ref->write_p ? integer_one_node : integer_zero_node;
- params = tree_cons (NULL_TREE, addr,
- tree_cons (NULL_TREE, write_p, NULL_TREE));
-
+ locality = FLOAT_TYPE_P (TREE_TYPE (ref->mem))
+ ? integer_one_node : build_int_cst (integer_type_node, 3);
+ params = tree_cons (NULL_TREE,
+ addr,
+ tree_cons (NULL_TREE,
+ write_p,
+ tree_cons (NULL_TREE,
+ locality,
+ NULL_TREE)));
prefetch = build_function_call_expr (built_in_decls[BUILT_IN_PREFETCH],
params);
bsi_insert_before (&bsi, prefetch, BSI_SAME_STMT);
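For reference, here is roughly what the call built by the patch looks like when written by hand in C. This is only a sketch for experimenting outside the compiler: the function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are made up for illustration, and which instruction the locality hint turns into is entirely up to the target back end (on ia64 the hope is `lfetch.nt1`, but that is as untested as the patch).

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; not a tuned value.  */
#define PREFETCH_AHEAD 16

/* Sum N doubles, prefetching a later element on each iteration with
   `void __builtin_prefetch (const void *ADDR, int RW, int LOCALITY)'.
   RW = 0 means the data is only read; LOCALITY = 1 asks for low
   temporal locality (the default when the argument is omitted is 3).  */
double
sum_with_prefetch (const double *a, size_t n)
{
  double sum = 0.0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* The prefetch itself never faults, but the address expression
         must still be valid, so stay inside the array.  */
      if (i + PREFETCH_AHEAD < n)
        __builtin_prefetch (&a[i + PREFETCH_AHEAD], 0, 1);
      sum += a[i];
    }
  return sum;
}
```

Compiling this with -S and looking for the lfetch variants in the output is probably the quickest way to see what a given locality value does on your target, before hacking the prefetch pass itself.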