On 21/05/18 11:00, Bruno Haible wrote: > Hi Pádraig, > >> $ yes áááááááááááááááááááá | head -n100000 > mbc.txt >> $ yes 12345678901234567890 | head -n100000 > num.txt >> >> ===== Before ==== >> >> $ time src/wc -m < mbc.txt >> 2100000 >> real 0m0.186s >> >> $ time src/wc -m < num.txt >> 2100000 >> real 0m0.056s > > Here's my take on improving this. I'm attaching draft patches that have > this effect on the timings: > > * On glibc: > > num mbc > Before 0.056 0.152 > After 0.057 0.089 > ------- > Speedup 1.0 1.7 > factor > > * On macOS 10.13: > > num mbc > Before 0.153 0.229 > After 0.042 0.112 > ------- > Speedup 3.6 2.0 > factor > > Basically, the two problems that the profiling found were: > > * It is pointless to call locale_charset repeatedly, because the > locale won't change while 'wc' is running. > > * glibc has a slow mbrtowc() implementation for UTF-8 locales. > > Both problems can be addressed with the "abstract factory" design patterns. > Namely, instead of using the generic 'wcwidth'/'mbrtowc' function each > time, let the program produce an optimized 'wcwidth'/'mbrtowc' function > [pointer] once, and then call this optimized function pointer repeatedly > for each character. > > While at it, let me also do the same for the initialization of an mbstate_t, > because on macOS the mbstate_t is 128 bytes long but only the first 12 bytes > actually matter. > > This factory of function pointers side-steps the portability problems of > 'locale_t'. > > Notes: > - When you use these new gnulib modules, you are programming against an API > that is very similar to POSIX, but not exactly POSIX. > - The platform-specific #ifs have to be adjusted, by the help of configure > tests. > - mbrtowc-factory needs a unit test (for which I have a draft).
Wow thanks for doing all that. It's also worth noting that using wcwidth-factory etc. will always use the replacement routines on utf8, which would be a disadvantage on code size and divergence. Another disadvantage is the change in API which is a bit awkward, and would need tweaking elsewhere in coreutils to take advantage of the speedup. The first locale_charset() issue at least could be dealt with internally to gnulib with a simple gnulib level config. The attached is a proposed solution to the charset issue, that would just require depending on the wchar-single gnulib module to indicate locales don't change across calls. Now if utf8_mbrtowc() is about 4.5 and 2.3 times faster than mbrtowc() on glibc and macOS respectively, it's probably worthwhile to replace unconditionally. Though it would be cool to do that behind the wchar-single setting which could setup the appropriate calls/pointers internally to rpl_mbrtowc()? As for the reduced mbstate settings, it might not be worth complicating the interface for this perf gain? cheers, Pádraig.
From 3081860737968ae9d461c1dc5ab2e31f94918d35 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com> Date: Sun, 20 May 2018 22:11:12 -0700 Subject: [PATCH] wchar-single: a new module to enable optimizations in wchar replacements * lib/mbrtowc.c (mbrtowc): Only check locale_charset() once if GNULIB_WCHAR_SINGLE is enabled. * lib/wcwidth.c (wcwidth): Likewise. --- lib/mbrtowc.c | 14 ++++++++++++-- lib/wcwidth.c | 15 +++++++++++++-- modules/wchar-single | 21 +++++++++++++++++++++ 3 files changed, 46 insertions(+), 4 deletions(-) create mode 100644 modules/wchar-single diff --git a/lib/mbrtowc.c b/lib/mbrtowc.c index 22ac2c4..dc3597f 100644 --- a/lib/mbrtowc.c +++ b/lib/mbrtowc.c @@ -138,9 +138,19 @@ mbrtowc (wchar_t *pwc, const char *s, size_t n, mbstate_t *ps) goto invalid; /* Here MB_CUR_MAX > 1 and 0 < m < 4. */ { - const char *encoding = locale_charset (); + static int utf8_charset = -1; + static const char *encoding; - if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0, 0)) +# if GNULIB_WCHAR_SINGLE + if (utf8_charset == -1) +# endif + { + encoding = locale_charset (); + utf8_charset = STREQ_OPT (encoding, + "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0); + } + + if (utf8_charset) { /* Cf. unistr/u8-mblen.c. */ unsigned char c = (unsigned char) p[0]; diff --git a/lib/wcwidth.c b/lib/wcwidth.c index 5fb090e..d03a50f 100644 --- a/lib/wcwidth.c +++ b/lib/wcwidth.c @@ -30,9 +30,20 @@ int wcwidth (wchar_t wc) #undef wcwidth { + static int utf8_charset = -1; + static const char *encoding; + +#if GNULIB_WCHAR_SINGLE + if (utf8_charset == -1) +#endif + { + encoding = locale_charset (); + utf8_charset = STREQ_OPT (encoding, + "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0); + } + /* In UTF-8 locales, use a Unicode aware width function. */ - const char *encoding = locale_charset (); - if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0)) + if (utf8_charset) { /* We assume that in a UTF-8 locale, a wide character is the same as a Unicode character. */ diff --git a/modules/wchar-single b/modules/wchar-single new file mode 100644 index 0000000..e047de0 --- /dev/null +++ b/modules/wchar-single @@ -0,0 +1,21 @@ +Description: +Enable more efficient wchar replacements, where we know +the locale charset will not change between calls. + +Files: + +Depends-on: +wchar + +configure.ac: +gl_MODULE_INDICATOR([wchar-single]) + +Makefile.am: + +Include: + +License: +LGPLv2+ + +Maintainer: +all -- 2.9.3