Modern compilers at -O2 make good inlining decisions for small static inline functions; forced inlining via __rte_always_inline should be reserved for cases where it is required for correctness or for documented measured performance reasons.
Document the policy in the coding style guide and add a checkpatches.sh entry that flags when new uses of the attribute are introduced. Checkpatches is not an absolute blocker to acceptance, only an indicator that more review is needed. Add additional comments about use of __rte_always_inline, __rte_noinline, __rte_hot, and __rte_cold to the rte_common.h to aid developers. Signed-off-by: Stephen Hemminger <[email protected]> --- devtools/checkpatches.sh | 8 +++++ doc/guides/contributing/coding_style.rst | 26 ++++++++++++++++- lib/eal/include/rte_common.h | 37 ++++++++++++++++++++++-- 3 files changed, 68 insertions(+), 3 deletions(-) diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh index f5dd77443f..2a3d364178 100755 --- a/devtools/checkpatches.sh +++ b/devtools/checkpatches.sh @@ -137,6 +137,14 @@ check_forbidden_additions() { # <patch> -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \ "$1" || res=1 + # forbid new use of __rte_always_inline + awk -v FOLDERS="lib drivers app examples" \ + -v EXPRESSIONS='\\<__rte_always_inline\\>' \ + -v RET_ON_FAIL=1 \ + -v MESSAGE='Adding __rte_always_inline; prefer plain inline' \ + -f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \ + "$1" || res=1 + # refrain from using compiler __rte_atomic_thread_fence() # It should be avoided on x86 for SMP case. awk -v FOLDERS="lib drivers app examples" \ diff --git a/doc/guides/contributing/coding_style.rst b/doc/guides/contributing/coding_style.rst index 243a3c2959..8f9e77d1c1 100644 --- a/doc/guides/contributing/coding_style.rst +++ b/doc/guides/contributing/coding_style.rst @@ -747,11 +747,35 @@ Static Variables and Functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * All functions and variables that are local to a file must be declared as ``static`` because it can often help the compiler to do some optimizations (such as, inlining the code). -* Functions that should be inlined should to be declared as ``static inline`` and can be defined in a .c or a .h file. +* Functions that should be inlined should be declared as ``static inline`` and can be defined in a .c or a .h file. .. note:: Static functions defined in a header file must be declared as ``static inline`` in order to prevent compiler warnings about the function being unused. +Use of ``__rte_always_inline`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The ``__rte_always_inline`` attribute forces the compiler to inline a function regardless of its size or call-graph heuristics. +Prefer plain ``inline`` (or no annotation at all for static functions) and let the compiler decide. +Modern compilers at ``-O2`` make good inlining decisions for small ``static inline`` functions in headers, +and forced inlining can hurt performance by inflating function bodies, increasing register pressure, and overriding profile-guided optimization. + +``__rte_always_inline`` should only be used when one of the following applies: + +* The function contains ``__rte_constant()`` checks that gate a constant-folded fast path, + and the optimization is lost if the function is not inlined into the caller. + Examples include byte-order helpers and length-dispatched copy/compare routines. + +* The function wraps inline assembly or a compiler intrinsic whose correctness depends on being inlined into the caller's register context (for example, intrinsics requiring a compile-time constant argument). + +* Measurement on a representative workload shows that the annotation is required to retain performance, and the reason is documented in the commit message that introduces it. + +Each use must be justified at the point it is introduced. Adding ``__rte_always_inline`` because nearby code uses it is not a justification; +if the constant or intrinsic that requires inlining is several call levels up the call chain, +restructure the code rather than annotating the entire chain. + +The complementary attribute ``__rte_noinline`` is useful for explicitly marking cold paths (error handling, initialization, slow-path fallbacks) where outlining the function can reduce instruction-cache pressure on the hot path. + Const Attribute ~~~~~~~~~~~~~~~ diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h index 71415346cf..e358be7fcf 100644 --- a/lib/eal/include/rte_common.h +++ b/lib/eal/include/rte_common.h @@ -482,7 +482,22 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) #endif /** - * Force a function to be inlined + * Force a function to be inlined, regardless of the compiler's size and + * call-graph heuristics. + * + * Prefer plain @c inline (or no annotation on a static function) and let the compiler decide. + * Modern compilers at -O2 inline small static functions well, + * and forcing it can hurt by inflating call sites, raising register pressure, + * and overriding profile-guided optimization. + * + * Reserve this attribute for cases where inlining is required for + * correctness, or for a documented and measured performance reason, e.g. + * - a constant-folded fast path gated by @c __rte_constant() that is lost + * unless the function is inlined into the caller; + * - a wrapper around inline asm or an intrinsic that needs a + * compile-time-constant argument from the caller's context. + * + * See the DPDK coding style guide for the full policy. */ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_always_inline __forceinline @@ -491,7 +506,11 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) #endif /** - * Force a function to be noinlined + * Force a function not to be inlined. + * + * Useful for explicitly outlining cold paths such as error handling, + * initialization, slow-path fallbacks, so they do not bloat the hot path + * or add to its instruction-cache footprint. */ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_noinline __declspec(noinline) @@ -501,6 +520,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) /** * Hint function in the hot path + * + * The compiler may optimize the function more aggressively, treat calls + * to it as likely for branch prediction, and group it with other hot + * functions to improve instruction-cache locality. This affects code + * placement and prediction, not inlining; combine with an inline + * annotation if both are wanted. */ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_hot @@ -510,6 +535,14 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void) /** * Hint function in the cold path + * + * The compiler optimizes the function for size rather than speed, + * marks branches that reach it as unlikely, and may move it to a separate + * section to keep it off the hot path and reduce instruction-cache + * pressure there. + * + * Suitable for error handling, logging, and setup/teardown code. + * Functions marked @c __rte_noreturn are already treated as cold. */ #ifdef RTE_TOOLCHAIN_MSVC #define __rte_cold -- 2.53.0

