Pádraig Brady wrote:
> system wcwidth is not implicated here.
> The slow down was attributed to locale_charset().
> At least this should be improved in the next coreutils release with:
> https://git.sv.gnu.org/gitweb/?p=gnulib.git;a=commitdiff;h=214bf85

I was now able to get a profile on macOS (with valgrind [note: the latest
release does not work yet, nor does the latest MacPorts build, but the
current valgrind git repo works more or less]).

The profiler's output is:

===============================================================================
--------------------------------------------------------------------------------
Profile data file 'callgrind.out.55986' (creator: callgrind-3.14.0.GIT)
--------------------------------------------------------------------------------
I1 cache:
D1 cache:
LL cache:
Timerange: Basic block 0 - 367572452
Trigger: Program termination
Profiled target:  src/wc -m (PID 55986, part 1)
Events recorded:  Ir
Events shown:     Ir
Event sort order: Ir
Thresholds:       99
Include dirs:
User annotated:
Auto-annotation:  off

--------------------------------------------------------------------------------
           Ir
--------------------------------------------------------------------------------
1,517,818,871  PROGRAM TOTALS

--------------------------------------------------------------------------------
         Ir  file:function
--------------------------------------------------------------------------------
264,150,390  ???:_platform_strcmp [/usr/lib/system/libsystem_platform.dylib]
173,606,200  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
132,210,556  ../src/wc.c:wc [src/wc]
124,000,000  ???:nl_langinfo_l [/usr/lib/system/libsystem_c.dylib]
106,000,000  ???:querylocale [/usr/lib/system/libsystem_c.dylib]
 88,000,000  ???:__maskrune [/usr/lib/system/libsystem_c.dylib]
 78,000,000  ../lib/localcharset.c:locale_charset [src/wc]
 71,403,400  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
 66,000,000  ../lib/uniwidth/width.c:uc_width [src/wc]
 50,003,752  ???:_platform_strchr$VARIANT$Base [/usr/lib/system/libsystem_platform.dylib]
 46,200,000  ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
 38,000,038  ???:uselocale [/usr/lib/system/libsystem_c.dylib]
 30,000,000  ???:nl_langinfo [/usr/lib/system/libsystem_c.dylib]
 26,000,000  ../lib/wcwidth.c:rpl_wcwidth [src/wc]
 24,400,862  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
 24,000,000  ???:___mb_cur_max_l [/usr/lib/system/libsystem_c.dylib]
 22,000,000  ../lib/streq.h:rpl_wcwidth
 21,000,000  ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
 20,000,000  /usr/include/_ctype.h:wc
 18,212,211  ???:__vsnprintf_chk [/usr/lib/system/libsystem_c.dylib]
 18,000,000  ../lib/nl_langinfo.c:rpl_nl_langinfo [src/wc]
 16,200,593  ???:rpl_wcwidth [src/wc]
 12,600,000  ../lib/mbchar.h:wc
 12,001,176  ???:os_unfair_lock_unlock [/usr/lib/dyld]
 10,000,055  ???:os_unfair_lock_lock [/usr/lib/dyld]
  8,000,000  ../lib/streq.h:uc_width
  4,596,013  ???:ImageLoader::trieWalk(unsigned char const*, unsigned char const*, char const*) [/usr/lib/dyld]
===============================================================================

This does not make perfect sense (no iswprint or iswspace calls are visible,
and ../lib/streq.h does not contain functions). But it still allows
dissecting the time:

mbrtowc:
  173,606,200  ???:_UTF8_mbrtowc [/usr/lib/system/libsystem_c.dylib]
   88,000,000  ???:__maskrune [/usr/lib/system/libsystem_c.dylib]
   71,403,400  ???:mbrtowc [/usr/lib/system/libsystem_c.dylib]
   46,200,000  ???:mbsinit [/usr/lib/system/libsystem_c.dylib]
   21,000,000  ???:_UTF8_mbsinit [/usr/lib/system/libsystem_c.dylib]
  -----------
  400,209,600  = 26%

rpl_wcwidth:
   26,000,000  ../lib/wcwidth.c:rpl_wcwidth [src/wc]

locale_charset:
  124,000,000  ???:nl_langinfo_l [/usr/lib/system/libsystem_c.dylib]
  106,000,000  ???:querylocale [/usr/lib/system/libsystem_c.dylib]
   78,000,000  ../lib/localcharset.c:locale_charset [src/wc]
   38,000,038  ???:uselocale [/usr/lib/system/libsystem_c.dylib]
   30,000,000  ???:nl_langinfo [/usr/lib/system/libsystem_c.dylib]
   24,400,862  ???:pthread_getspecific [/usr/lib/system/libsystem_pthread.dylib]
   24,000,000  ???:___mb_cur_max_l [/usr/lib/system/libsystem_c.dylib]
   18,000,000  ../lib/nl_langinfo.c:rpl_nl_langinfo [src/wc]
  -----------
  442,400,900  = 29%

uc_width:
   66,000,000  ../lib/uniwidth/width.c:uc_width [src/wc]
  -----------
   66,000,000  = 4%

So it is spending 26% in mbrtowc calls (unlike > 50% with glibc).

And it is spending at least 29% in locale_charset, mostly due to
nl_langinfo_l and its associates. I'm saying "at least" because I don't
know where to count the many _platform_strcmp calls.

And this is _after_ all the recent locale_charset optimizations.

Bruno
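
For anyone who wants to re-check the arithmetic in the breakdown above, here is a minimal sketch that sums the Ir counts per group and divides by the program total. The numbers are copied from the callgrind output; the grouping is the one used in the message. The names `groups`, `summarize` and `strcmp_upper_bound` are made up for this illustration. The last figure assumes that every `_platform_strcmp` call originates from `locale_charset`, which the profile does not establish, so it is only an upper bound.

```python
# Instruction counts (Ir) copied from the callgrind output in the message.
TOTAL_IR = 1_517_818_871

# Grouping as in the message; the dict itself is only an illustration.
groups = {
    "mbrtowc": [173_606_200, 88_000_000, 71_403_400, 46_200_000, 21_000_000],
    "rpl_wcwidth": [26_000_000],
    "locale_charset": [124_000_000, 106_000_000, 78_000_000, 38_000_038,
                       30_000_000, 24_400_862, 24_000_000, 18_000_000],
    "uc_width": [66_000_000],
}

# _platform_strcmp is not attributed to any group in the message.
PLATFORM_STRCMP_IR = 264_150_390


def summarize(groups, total):
    """Return {name: (summed Ir, share of total in percent)}."""
    return {name: (sum(counts), 100.0 * sum(counts) / total)
            for name, counts in groups.items()}


summary = summarize(groups, TOTAL_IR)
for name, (ir, pct) in summary.items():
    print(f"{name:15s} {ir:>13,d}  {pct:5.1f}%")

# ASSUMPTION: all strcmp calls come from locale_charset (not shown by the
# profile); this gives an upper bound on its share, not a measurement.
strcmp_upper_bound = 100.0 * (summary["locale_charset"][0]
                              + PLATFORM_STRCMP_IR) / TOTAL_IR
print(f"locale_charset upper bound incl. strcmp: {strcmp_upper_bound:.1f}%")
```

The sums match the 400,209,600 and 442,400,900 totals given in the message. How much of the `_platform_strcmp` cost really belongs to `locale_charset` would need the call-graph view (for example callgrind's inclusive costs) rather than this flat listing.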