On Mon, 27 Mar 2017 07:15:18 -0700
Matthew Wilcox <[email protected]> wrote:
> On Mon, Mar 27, 2017 at 02:39:47PM +0200, Jesper Dangaard Brouer wrote:
> >
> > +static __always_inline int in_irq_or_nmi(void)
> > +{
> > + return in_irq() || in_nmi();
> > +// XXX: hoping compiler will optimize this (todo verify) into:
> > +// #define in_irq_or_nmi() (preempt_count() & (HARDIRQ_MASK | NMI_MASK))
> > +
> > + /* compiler was smart enough to only read __preempt_count once
> > + * but added two branches
> > +asm code:
> > + │ mov __preempt_count,%eax
> > + │ test $0xf0000,%eax // HARDIRQ_MASK: 0x000f0000
> > + │ ┌──jne 2a
> > + │ │ test $0x100000,%eax // NMI_MASK: 0x00100000
> > + │ │↓ je 3f
> > + │ 2a:└─→mov %rbx,%rdi
> > +
> > + */
> > +}
>
> To be fair, you told the compiler to do that with your use of fancy-pants ||
> instead of optimisable |. Try this instead:
Thank you -- good point! :-)
> static __always_inline int in_irq_or_nmi(void)
> {
> return in_irq() | in_nmi();
> }
>
> 0000000000001770 <test_fn>:
> 1770: 65 8b 05 00 00 00 00 mov %gs:0x0(%rip),%eax # 1777 <test_fn+0x7>
> 1773: R_X86_64_PC32 __preempt_count-0x4
> #define in_nmi() (preempt_count() & NMI_MASK)
> #define in_task() (!(preempt_count() & \
> (NMI_MASK | HARDIRQ_MASK |
> SOFTIRQ_OFFSET)))
> static __always_inline int in_irq_or_nmi(void)
> {
> return in_irq() | in_nmi();
> 1777: 25 00 00 1f 00 and $0x1f0000,%eax
> }
> 177c: c3 retq
> 177d: 0f 1f 00 nopl (%rax)
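A minimal user-space sketch (assuming a plain global as a stand-in for
the per-cpu __preempt_count; only the two mask values below are taken
from the kernel, everything else is illustrative) that can be built
with e.g. gcc -O2 -S to inspect the generated code for both variants:

#include <stdio.h>

#define HARDIRQ_MASK	0x000f0000
#define NMI_MASK	0x00100000

static unsigned int fake_preempt_count;

/* Logical ||: in the kernel build quoted above this compiled to two
 * tests plus a branch (codegen may vary by compiler and flags). */
static int in_irq_or_nmi_logical(void)
{
	return (fake_preempt_count & HARDIRQ_MASK) ||
	       (fake_preempt_count & NMI_MASK);
}

/* Bitwise |: the two masks fold into one constant, giving a single
 * "and $0x1f0000" as in the objdump above. */
static int in_irq_or_nmi_bitwise(void)
{
	return (fake_preempt_count & HARDIRQ_MASK) |
	       (fake_preempt_count & NMI_MASK);
}

int main(void)
{
	fake_preempt_count = NMI_MASK;	/* pretend we are in NMI context */
	printf("||: %d  |: %d\n",
	       in_irq_or_nmi_logical() != 0,
	       in_irq_or_nmi_bitwise() != 0);
	return 0;
}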
And I also verified it worked:
0.63 │ mov __preempt_count,%eax
│ free_hot_cold_page():
1.25 │ test $0x1f0000,%eax
│ ↓ jne 1e4
And this simplification also made the compiler change this into an
unlikely branch, which is a micro-optimization (that I will leave up to
the compiler).
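If one ever wanted to pin that hint down explicitly instead of leaving
it to the compiler, the kernel's unlikely() annotation is the usual
tool. A hypothetical caller sketch (the helper names are purely
illustrative, not from the patch; unlikely() is spelled out here as a
user-space stand-in):

#define unlikely(x)	__builtin_expect(!!(x), 0)

extern int in_irq_or_nmi(void);			/* as defined above */
extern void free_slow_path(void *page);		/* hypothetical: rare path */
extern void free_fast_path(void *page);		/* hypothetical: common path */

void free_page_example(void *page)
{
	if (unlikely(in_irq_or_nmi()))
		free_slow_path(page);	/* hardirq or NMI context */
	else
		free_fast_path(page);	/* process or softirq context */
}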
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer