https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103755

            Bug ID: 103755
           Summary: {has,use}_facet() and iostream constructor performance
           Product: gcc
           Version: 11.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dprokoptsev at gmail dot com
  Target Milestone: ---

Created attachment 52021
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52021&action=edit
Proposed implementation

The current implementation of basic_ios::init(), through its call to
_M_cache_locale(), calls has_facet() and use_facet() on three facet types
(ctype, num_get, num_put). As each of these calls does a dynamic_cast under the
hood, merely constructing an iostream results in six (6) dynamic_casts, which
hurts performance when iostreams are short-lived.

However, note that these {has,use}_facet() calls only reference facets which
are present in the classic() locale and are therefore guaranteed to be present
(with the matching type) in any locale, so they don't require any bounds checks
or dynamic_casts in the locale impl object. So we can do some TMP to establish
the subset of facets which are always present in any locale, and instruct
has_facet() and use_facet() to bypass any checks for those.
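
A rough user-level sketch of the TMP idea, not the attached patch: the names
is_always_present, fast_has_facet and sample_facet are made up for
illustration. The real change would live inside libstdc++ and would also let
use_facet() skip its dynamic_cast, which code outside the library cannot do.

```cpp
#include <locale>
#include <type_traits>

// Hypothetical trait: true only for facets assumed to exist in every locale
// because the classic() locale provides them.
template <typename Facet>
struct is_always_present : std::false_type {};

template <> struct is_always_present<std::ctype<char>> : std::true_type {};
template <> struct is_always_present<std::num_get<char>> : std::true_type {};
template <> struct is_always_present<std::num_put<char>> : std::true_type {};

// Hypothetical wrapper: answers "true" from the trait alone for the facets
// listed above, and falls back to the ordinary runtime check otherwise.
template <typename Facet>
bool fast_has_facet(const std::locale& loc)
{
    if (is_always_present<Facet>::value)
        return true;                      // no lookup, no dynamic_cast
    return std::has_facet<Facet>(loc);    // regular checked path
}

// Illustrative user-defined facet that is not part of the classic() locale.
struct sample_facet : std::locale::facet {
    static std::locale::id id;
};
std::locale::id sample_facet::id;
```

With this, fast_has_facet<std::ctype<char>>(loc) never touches the locale
object, while fast_has_facet<sample_facet>(loc) still goes through
std::has_facet().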

(It may also be worth adding a __try_use_facet() function, providing a more
efficient version of (has_facet<T>(loc) ? &use_facet<T>(loc) : 0).)
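
To make the intended semantics concrete, here is a minimal sketch written
against the public API only. The names try_use_facet and demo_facet are
hypothetical. This version still pays for both checks; the point of a
library-internal __try_use_facet() would be to get the same result from a
single lookup.

```cpp
#include <locale>

// Hypothetical helper with the semantics described above: a pointer to the
// facet if the locale has one of that type, a null pointer otherwise.
template <typename Facet>
const Facet* try_use_facet(const std::locale& loc)
{
    return std::has_facet<Facet>(loc) ? &std::use_facet<Facet>(loc) : nullptr;
}

// Illustrative user-defined facet, absent from classic() unless installed.
struct demo_facet : std::locale::facet {
    static std::locale::id id;
};
std::locale::id demo_facet::id;
```

A caller then tests the returned pointer instead of calling has_facet() and
use_facet() separately; the pointer is only valid while a locale holding the
facet is alive.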

I've drafted a patch implementing the changes described above. On my hardware
(i9-9900k @ 5 GHz) it reduces iostream construction time from 100 ns down to
36 ns.
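
The report does not say how the timing was taken. One simple way to get a
comparable per-construction figure is sketched below, assuming a shared
stringbuf and a plain loop; the function name average_construction_ns is made
up, and the numbers it gives will depend on compiler flags and hardware.

```cpp
#include <cassert>
#include <chrono>
#include <istream>
#include <sstream>

// Hypothetical micro-benchmark: average time in nanoseconds to construct
// (and destroy) a std::iostream on top of an existing stream buffer.
double average_construction_ns(int iterations)
{
    assert(iterations > 0);
    std::stringbuf buf;  // created once, so only the iostream itself is timed

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::iostream stream(&buf);  // constructor runs basic_ios::init()
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / iterations;
}
```

Calling average_construction_ns(1000000) before and after applying a patch
gives a rough comparison.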
