https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103755
Bug ID: 103755
Summary: {has,use}_facet() and iostream constructor performance
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: dprokoptsev at gmail dot com
Target Milestone: ---

Created attachment 52021
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52021&action=edit
Proposed implementation

The current implementation of basic_ios::init(), through its call to _M_cache_locale(), calls has_facet() and use_facet() on three facet types (ctype, num_get, num_put). Each of these calls does a dynamic_cast under the hood, so merely constructing an iostream costs six (6) dynamic_casts. This results in poor performance for short-lived iostreams.

However, these {has,use}_facet() calls only reference facets that are present in the classic() locale. Such facets are guaranteed to be present (with the matching type) in any locale, and therefore don't require any bounds checks or dynamic_casts in the locale impl object.

So we can do some TMP to establish the subset of facets which are always present in any locale, and instruct has_facet() and use_facet() to bypass any checks for those.

(It may also be worth adding a __try_use_facet() function, providing a more efficient version of (has_facet<T>(loc) ? &use_facet<T>(loc) : 0).)

I've drafted a patch implementing the changes described above. On my hardware (i9-9900k @ 5 GHz) it reduces iostream construction time from 100 ns down to 36 ns.
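
To illustrate the idea, here is a minimal user-level sketch, not the attached patch. It marks a few facets as "always present" with a trait and dispatches on that trait at compile time. The names always_present, try_use_facet, try_use_facet_impl, custom_facet and usage_example are hypothetical and exist only for this example. The sketch can only skip the has_facet() call for the marked facets. Removing the dynamic_cast inside use_facet() itself needs access to libstdc++ internals, which portable code does not have.

```cpp
#include <cassert>
#include <cstddef>
#include <locale>
#include <type_traits>

// Hypothetical trait: true for facets assumed to be present in every locale.
// Only three char specializations are listed here, for brevity.
template <class Facet> struct always_present : std::false_type {};
template <> struct always_present<std::ctype<char>> : std::true_type {};
template <> struct always_present<std::num_get<char>> : std::true_type {};
template <> struct always_present<std::num_put<char>> : std::true_type {};

// Always-present facet: skip the has_facet() presence check.
template <class Facet>
const Facet* try_use_facet_impl(const std::locale& loc, std::true_type) {
    return &std::use_facet<Facet>(loc);
}

// Any other facet: the idiom from the report,
// (has_facet<T>(loc) ? &use_facet<T>(loc) : 0).
template <class Facet>
const Facet* try_use_facet_impl(const std::locale& loc, std::false_type) {
    return std::has_facet<Facet>(loc) ? &std::use_facet<Facet>(loc) : nullptr;
}

// Hypothetical user-level counterpart of the proposed __try_use_facet().
template <class Facet>
const Facet* try_use_facet(const std::locale& loc) {
    return try_use_facet_impl<Facet>(loc, typename always_present<Facet>::type());
}

// A user-defined facet that is NOT part of the classic() locale.
struct custom_facet : std::locale::facet {
    static std::locale::id id;
    explicit custom_facet(std::size_t refs = 0) : std::locale::facet(refs) {}
};
std::locale::id custom_facet::id;

// Exercises both dispatch paths.
void usage_example() {
    const std::locale& classic = std::locale::classic();

    // Standard facet: found without a separate presence check.
    assert(try_use_facet<std::ctype<char>>(classic) != nullptr);

    // Custom facet: absent from classic(), present once installed.
    assert(try_use_facet<custom_facet>(classic) == nullptr);
    std::locale extended(classic, new custom_facet);
    assert(try_use_facet<custom_facet>(extended) != nullptr);
}
```

Tag dispatch on the trait keeps the choice at compile time, so the always-present path contains no runtime branch. The patch proposed here would go further by also bypassing the checks inside has_facet() and use_facet() for those facets.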