Oh well. If something is supposedly simple... I got three OKs on this code, including one senior developer calling it simple and saying it shows how something like this should be done.
Fortunately, Patrick Keshishian privately mailed me that he suspected a regression. Even though the regression isn't in the loop he pointed at, exactly what he said is broken in the *other* loop. Here is how, with the first patch i sent:

   $ echo "1 t\n2\tt" | ./obj/uniq -f 1
   1 t
   $

The second loop advances *str even for the first blank after the last field that is to be skipped, so if the first significant blank differs, output is wrong.

(The first loop also skips the first non-blank in each field, but that's not a problem because the second loop is intended to do that anyway. Besides, the first loop called mbtowc(3) on the null string, but that wasn't a problem because it returns a length of 0.)

So, here is a version that is both correcter and moar short.

Coding is hard, let's go shopping.
  Ingo


Index: uniq.1
===================================================================
RCS file: /cvs/src/usr.bin/uniq/uniq.1,v
retrieving revision 1.17
diff -u -p -r1.17 uniq.1
--- uniq.1 3 Sep 2010 11:09:29 -0000 1.17
+++ uniq.1 11 Dec 2015 11:10:47 -0000
@@ -114,6 +114,14 @@ A file name of
 .Ql -
 denotes the standard input or the standard output
 .Pq depending on its position on the command line .
+.Sh ENVIRONMENT
+.Bl -tag -width LC_CTYPE
+.It Ev LC_CTYPE
+The character set
+.Xr locale 1 .
+Determines which groups of bytes are treated as characters
+and which characters are considered blank.
+.El
 .Sh EXIT STATUS
 .Ex -std uniq
 .Sh SEE ALSO
Index: uniq.c
===================================================================
RCS file: /cvs/src/usr.bin/uniq/uniq.c,v
retrieving revision 1.23
diff -u -p -r1.23 uniq.c
--- uniq.c 2 Nov 2015 20:25:42 -0000 1.23
+++ uniq.c 11 Dec 2015 11:10:47 -0000
@@ -37,10 +37,13 @@
 #include <err.h>
 #include <errno.h>
 #include <limits.h>
+#include <locale.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
+#include <wchar.h>
+#include <wctype.h>
 
 #define MAXLINELEN (8 * 1024)
 
@@ -61,6 +64,8 @@ main(int argc, char *argv[])
 	int ch;
 	char *prevline, *thisline;
 
+	setlocale(LC_CTYPE, "");
+
 	if (pledge("stdio rpath wpath cpath", NULL) == -1)
 		err(1, "pledge");
 
@@ -176,16 +181,32 @@ show(FILE *ofp, char *str)
 char *
 skip(char *str)
 {
+	wchar_t wc;
 	int nchars, nfields;
+	int len;
+	int field_started;
 
 	for (nfields = numfields; nfields && *str; nfields--) {
-		while (isblank((unsigned char)*str))
-			str++;
-		while (*str && !isblank((unsigned char)*str))
-			str++;
+		/* Skip one field, including preceding blanks. */
+		for (field_started = 0; *str != '\0'; str += len) {
+			if ((len = mbtowc(&wc, str, MB_CUR_MAX)) == -1) {
+				(void)mbtowc(NULL, NULL, MB_CUR_MAX);
+				wc = L'?';
+				len = 1;
+			}
+			if (iswblank(wc)) {
+				if (field_started)
+					break;
+			} else
+				field_started = 1;
+		}
 	}
-	for (nchars = numchars; nchars-- && *str && *str != '\n'; ++str)
-		;
+
+	/* Skip some additional characters. */
+	for (nchars = numchars; nchars-- && *str != '\0'; str += len)
+		if ((len = mblen(str, MB_CUR_MAX)) == -1)
+			len = 1;
+
 	return (str);
 }
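

For anyone who wants to poke at the field-skipping logic outside of uniq(1), here is a rough standalone sketch of the same loop structure. It is not part of the patch: the name skip_sketch() is made up, and numfields/numchars are passed as arguments instead of being the globals that uniq.c uses. It should stop *before* the first blank that follows the last skipped field, which is exactly the point where the first patch went wrong.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical standalone variant of skip() for experiments.
 * Assumption: numfields and numchars are parameters here,
 * whereas uniq.c keeps them in global variables.
 */
const char *
skip_sketch(const char *str, int numfields, int numchars)
{
	wchar_t wc;
	int len, field_started;

	for (; numfields > 0 && *str != '\0'; numfields--) {
		/* Skip one field, including preceding blanks. */
		for (field_started = 0; *str != '\0'; str += len) {
			len = mbtowc(&wc, str, MB_CUR_MAX);
			if (len == -1) {
				/* Invalid byte: reset state, treat as one char. */
				(void)mbtowc(NULL, NULL, 0);
				wc = L'?';
				len = 1;
			}
			if (iswblank(wc)) {
				/* A blank after the field ends it; do not consume it. */
				if (field_started)
					break;
			} else
				field_started = 1;
		}
	}

	/* Skip some additional characters. */
	for (; numchars > 0 && *str != '\0'; numchars--, str += len)
		if ((len = mblen(str, MB_CUR_MAX)) == -1)
			len = 1;

	return str;
}
```

Calling it with one field to skip on "1 t\n" and on "2\tt\n" should leave pointers to " t\n" and "\tt\n" respectively, so the two lines from the example above compare as different.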