------- Comment #9 from rsandifo at gcc dot gnu dot org  2006-08-03 21:06 -------
Created an attachment (id=12010)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12010&action=view)
A hackish fix

I agree with Kaz that a blockage would be a correct fix here.
I'm just worried about the performance impact.

A simple hack for 4.1 is to model tls_get_tp_<mode> as a division
instruction.  The middle-end will then treat it as potentially
trapping and won't execute it speculatively.  The advantages
are that:

  (1) it won't hinder speculation or scheduling of unrelated
      instructions, and

  (2) it still allows the optimisers to remove redundant rdhwrs,
      just as they would for redundant divisions of the same values.

Although this is indeed hackish, we're kind of relying on the same
infrastructure to prevent speculative execution of FPU code on targets
where kernel emulation is required.  A cleaner fix would be to get
may_trap_p (and perhaps other functions) to call a target hook
for UNSPECs.

As it happens, we're not really getting (2) anyway.  The use of register
$3 is exposed right at the outset, which stops most optimisers from
touching it.  E.g. something as simple as:

    extern __thread int x;
    void foo (void) { x++; }

will execute rdhwr twice.  It's simple to fix this by using a pseudo
instead of (reg $3).  This shouldn't cause problems, as we're already
relying on the "=v" constraint to force the use of the right register.

If we do use a pseudo, the question is what to do about uses in loops.
E.g. if we have:

    extern __thread int x;
    void foo (int n, int *ptr)
    {
      while (n-- > 0)
        if (*ptr++ == 1)
          x++;
    }

should we allow the rdhwr to be hoisted?  (It will be if we keep
a non-trapping representation of tls_get_tp_<mode>, but won't be
if we treat it as trapping.)  I think the answer is that, in the
absence of profiling information, we simply don't know.  There are
going to be some cases where hoisting is exactly the right thing to
do and others where it's exactly the wrong thing.
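
To make the trade-off concrete, here is a rough C-level sketch of the two
outcomes.  It is only an illustration: read_thread_pointer() is a
hypothetical stand-in for the rdhwr (it just counts calls), tls_block
stands in for the TLS storage holding x, and count_reads() is a helper
invented for the comparison.  None of these names exist in GCC.

```c
/* Hypothetical stand-in for rdhwr: counts how often the thread
   pointer is read.  Not a real GCC or libc interface.  */
static unsigned long tp_reads;
static int tls_block[1];        /* pretend TLS storage holding x */

static int *read_thread_pointer(void)
{
    tp_reads++;
    return tls_block;
}

/* Not hoisted (trapping representation): the read happens only on
   iterations that actually touch x, but it is repeated each time.  */
static void foo_in_loop(int n, const int *ptr)
{
    while (n-- > 0)
        if (*ptr++ == 1) {
            int *tp = read_thread_pointer();
            tp[0]++;
        }
}

/* Hoisted (non-trapping representation): exactly one read, paid
   even when the loop never touches x.  */
static void foo_hoisted(int n, const int *ptr)
{
    int *tp = read_thread_pointer();
    while (n-- > 0)
        if (*ptr++ == 1)
            tp[0]++;
}

/* Illustrative helper: reset the counter, run FN, report the reads.  */
static unsigned long count_reads(void (*fn)(int, const int *),
                                 int n, const int *ptr)
{
    tp_reads = 0;
    fn(n, ptr);
    return tp_reads;
}
```

If every element matches, the in-loop form reads the thread pointer once
per iteration and hoisting clearly wins.  If no element matches, the
in-loop form reads it zero times and the hoisted form has paid for one
read it never needed.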

This makes me wonder if we should compromise, and get the base pointer
lazily.  When we first find that a function needs the base pointer,
we can allocate a function-wide pseudo for it, and make sure that
the pseudo is zeroed at the beginning of the function.  We can then
emit:

        (set (pc) (if_then_else (ne (reg bp) 0) (label_ref foo) (pc)))
        (set (reg bp) ...UNSPEC_TLS_GET_TP...)
    foo:

every time.  This in itself is easy to do, but we'd need some way
of telling the optimisers that the result of ...UNSPEC_TLS_GET_TP...
is nonzero, and that subsequent (ne (reg bp) 0) branches will
always be taken.
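
At the source level, the lazy scheme would behave roughly like the sketch
below.  Again this is only an illustration under assumptions:
read_thread_pointer() is a hypothetical counting stand-in for the rdhwr,
tls_block stands in for the TLS storage holding x, and the assert merely
spells out the nonzero fact that the optimisers would need to be told.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for rdhwr: counts how often the thread
   pointer is read.  Not a real GCC or libc interface.  */
static unsigned long tp_reads;
static int tls_block[1];        /* pretend TLS storage holding x */

static int *read_thread_pointer(void)
{
    tp_reads++;
    return tls_block;
}

static void foo_lazy(int n, const int *ptr)
{
    int *bp = NULL;             /* function-wide "pseudo", zeroed on entry */

    while (n-- > 0)
        if (*ptr++ == 1) {
            if (bp == NULL) {   /* skipped once (ne (reg bp) 0) holds */
                bp = read_thread_pointer();
                assert(bp != NULL);     /* result is known to be nonzero */
            }
            bp[0]++;
        }
}

/* Illustrative helper: reset the counter, run foo_lazy, report reads.  */
static unsigned long count_lazy_reads(int n, const int *ptr)
{
    tp_reads = 0;
    foo_lazy(n, ptr);
    return tp_reads;
}
```

With this shape the thread pointer is read at most once per call, and not
at all if the loop never touches x.  The price is a compare-and-branch
before each use unless the optimisers can prove it redundant.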

Random musing, sorry.

Richard


-- 

rsandifo at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rsandifo at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126
