------- Comment #9 from rsandifo at gcc dot gnu dot org  2006-08-03 21:06 -------
Created an attachment (id=12010)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=12010&action=view)
A hackish fix

I agree with Kaz that a blockage would be a correct fix here. I'm just worried about the performance impact.

A simple hack for 4.1 is to model tls_tp_<mode> as a division instruction. The middle-end will then treat it as potentially trapping and won't execute it speculatively. The advantages are that:

  (1) it won't hinder speculation or scheduling of unrelated instructions, and

  (2) it still allows the optimisers to remove redundant rdhwrs, just as they would for redundant divisions of the same values.

Although this is indeed hackish, we're kind-of relying on the same infrastructure to prevent speculative execution of FPU code on targets where kernel emulation is required. A cleaner fix would be to get may_trap_p (and perhaps other functions) to call a target hook for UNSPECs.

As it happens, we're not really getting (2) anyway. The use of register $3 is exposed right at the outset, which stops most optimisers from touching it. E.g. something as simple as:

    extern __thread int x;
    void foo (void) { x++; }

will execute rdhwr twice. It's simple to fix this by using a pseudo instead of (reg $3). This shouldn't cause problems, as we're already relying on the "=v" constraint to force the use of the right register.

If we do use a pseudo, the question is what to do about uses in loops. E.g. if we have:

    extern __thread int x;
    void foo (int n, int *ptr)
    {
      while (n-- > 0)
        if (*ptr++ == 1)
          x++;
    }

should we allow the rdhwr to be hoisted? (It will be if we keep a non-trapping representation of tls_get_tp_<mode>, but won't be if we treat it as trapping.)

I think the answer is that, in the absence of profiling information, we simply don't know. There are going to be some cases where hoisting is exactly the right thing to do and others where it's exactly the wrong thing.

This makes me wonder if we should compromise, and get the base pointer lazily.

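
To make the trade-off concrete, here is a rough source-level sketch of the three placements for the loop example. It is only a model, not compiler output: get_tp(), rdhwr_count and tls_block are hypothetical stand-ins invented for illustration (get_tp() plays the role of the rdhwr read, and the counter records how often it would execute). It also assumes the redundant second read per x++ has already been removed.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not GCC or libc APIs: get_tp () models the
   rdhwr read and rdhwr_count records how many times it would run.  */
static int rdhwr_count;
static int tls_block[1];        /* pretend TLS area; slot 0 is "x" */

static int *get_tp (void)
{
  rdhwr_count++;
  return tls_block;
}

/* Trapping representation: the read stays under the condition, so it
   runs once per increment and never when no element matches.  */
void foo_in_loop (int n, const int *ptr)
{
  while (n-- > 0)
    if (*ptr++ == 1)
      {
        int *tp = get_tp ();
        tp[0]++;
      }
}

/* Non-trapping representation: the read is hoisted, so it runs exactly
   once, even when no element matches.  */
void foo_hoisted (int n, const int *ptr)
{
  int *tp = get_tp ();
  while (n-- > 0)
    if (*ptr++ == 1)
      tp[0]++;
}

/* Lazy compromise: a function-wide pointer starts out null and is
   filled in on first use, at the cost of a test before every use.  */
void foo_lazy (int n, const int *ptr)
{
  int *tp = NULL;
  while (n-- > 0)
    if (*ptr++ == 1)
      {
        if (tp == NULL)
          tp = get_tp ();
        tp[0]++;
      }
}
```

With an input in which three elements equal 1, this model reads the thread pointer three times in foo_in_loop and once in each of the other two. With no matching elements, foo_hoisted still pays for one read while foo_lazy pays for none.
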
When we first find that a function needs the base pointer, we can allocate a function-wide pseudo for it, and make sure that the pseudo is zeroed at the beginning of the function. We can then emit the following every time the base pointer is needed:

    (set (pc) (if_then_else (ne (reg bp) 0) (label_ref foo) (pc)))
    (set (reg bp) ...UNSPEC_TLS_GET_TP...)
  foo:

This in itself is easy to do, but we'd need some way of telling the optimisers that the result of ...UNSPEC_TLS_GET_TP... is nonzero, and that subsequent (ne (reg bp) 0) branches will always be taken.

Random musing, sorry.

Richard

-- 

rsandifo at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rsandifo at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28126