Re: RFC: Petition Intel/AMD to add POPF_IF insn

Christian König Wed, 17 Aug 2016 13:08:51 -0700

Well first of all you actually CCed AMDs graphics developers. So Alexand I can't say much about CPU optimizations.


Additional to that a comment below.


Am 17.08.2016 um 19:20 schrieb Denys Vlasenko:

Last year, a proposal was floated to avoid costly POPF.
In particular, each spin_unlock_irqrestore() does it,
a rather common operation.

https://lkml.org/lkml/2015/4/21/290
    [RFC PATCH] x86/asm/irq: Don't use POPF but STI

* Andy Lutomirski <[email protected]> wrote:

> Another different approach would be to formally state that
> pv_irq_ops.save_fl() needs to return all the flags, which would
> make local_irq_save() safe to use in this circumstance, but that
> makes a hotpath longer for the sake of a single boot time check.

...which reminds me:

Why does native_restore_fl restore anything other than IF?  A branch
and sti should be considerably faster than popf.



Ingo agreed:
====
Yes, this has come up in the past, something like the patch below?

Totally untested and not signed off yet: because we'd first have to
make sure (via irq flags debugging) that it's not used in reverse, to
re-disable interrupts:
    local_irq_save(flags);
    local_irq_enable();
    ...
    local_irq_restore(flags); /* effective local_irq_disable() */
I don't think we have many (any?) such patterns left, but it has to be
checked first. If we have such cases then we'll have to use different
primitives there.
====

Linus replied:
=====
"popf" is fast for the "no changes to IF" case, and is a smaller
instruction anyway.
====


This basically shot down the proposal.

But in my measurements POPF is not fast even in the case where restored
flags are not changes at all:

                mov     $200*1000*1000, %eax
                pushf
                pop     %rbx
                .balign 64
loop:           push    %rbx
                popf
                dec     %eax
                jnz     loop

# perf stat -r20 ./popf_1g

4,929,012,093 cycles # 3.412GHz ( +- 0.02% )835,721,371 instructions # 0.17 insn percycle ( +- 0.02% )1.446185359 seconds timeelapsed ( +- 0.46% )


If I replace POPF with a pop into an unused register, I get this:


You are comparing apples and bananas here.

The microcode path in the CPUs will probably through away the read ifyou don't actually use the register value.


Regards,
Christian.

loop:           push    %rbx
                pop     %rcx
                dec     %eax
                jnz     loop
209,247,645 cycles # 3.209GHz ( +- 0.11% )801,898,016 instructions # 3.83 insn percycle ( +- 0.00% )0.066210725 seconds timeelapsed ( +- 0.59% )
IOW, POPF takes at least 6 cycles.
Linus does have a point that a "test+branch+STI" may end up not aclear win because of the branch.
But the need to restore IF flag exists, it is necessary not only forLinux, but for any OS
running on x86: they all have some sort of spinlock.
The addition of a POPF instruction variant which looks only at IF bitand changes
only that bit in EFLAGS may be a good idea, for all OSes.

I propose that we ask Intel / AMD to do that.
Maybe by the "ignored prefix" trick which was used when LZCNT insn wasintroduced
as REPZ-prefixed BSR?
Currently, REPZ POPF (f3 9d) insn does execute. Redefine this opcode as
POPF_IF. Then the same kernel will work on old and new CPUs.

CC'ing some @intel and @amd emails...

Re: RFC: Petition Intel/AMD to add POPF_IF insn

Reply via email to