On Sun, Jul 17, 2011 at 07:28:11PM +0100, Jason McIntyre wrote:
> On Sun, Jul 17, 2011 at 11:43:03AM -0400, Ted Unangst wrote:
> > I recently learned that our grep does not support the \<\> syntax for
> > word boundaries, only the somewhat more difficult to use [[:<:]] format.
> > It's fairly easy to convert one to the other however.
> >
>
> if you do this, we will need to think carefully about how to document
> it. grep(1) itself does not discuss REs, and instead points to
> re_format(7). but you are proposing an extension to grep only.
>
> can i ask why you want to support this? it is a gnu grep thing or
> something?

It is quite a common usage for word boundaries in regular expressions.
As a data point, nvi (/usr/bin/vi on OpenBSD) does support this
shorthand, although its manpage only refers to re_format(7). A quick dig
in the source shows it has a conversion function, re_conv() in
ex_subst.c, that converts ``vi'' REs to POSIX ones. Because grep doesn't
support the shorthand, I find myself botching grep regexes regularly.

(So that's support for the idea from me, but I won't bitch too loudly if
it is shot down.)

-0-

>
> jmc
>
> > Index: grep.c
> > ===================================================================
> > RCS file: /home/tedu/cvs/src/usr.bin/grep/grep.c,v
> > retrieving revision 1.44
> > diff -u -p -r1.44 grep.c
> > --- grep.c	8 Jul 2011 01:20:24 -0000	1.44
> > +++ grep.c	17 Jul 2011 15:38:58 -0000
> > @@ -163,6 +163,54 @@ struct option long_options[] =
> >  	{NULL, no_argument, NULL, 0}
> >  };
> >
> > +#ifndef SMALL
> > +char *
> > +fix_word_boundaries(char *pat)
> > +{
> > +	size_t newlen;
> > +	int bs, repl;
> > +	char c, *newpat, *p, *r;
> > +
> > +	repl = 0;
> > +	p = pat;
> > +	while ((p = strstr(p, "\\<"))) {
> > +		p += 2;
> > +		repl++;
> > +	}
> > +	p = pat;
> > +	while ((p = strstr(p, "\\>"))) {
> > +		p += 2;
> > +		repl++;
> > +	}
> > +	if (!repl)
> > +		return pat;
> > +	newlen = strlen(pat) + 1 + repl * 5;
> > +	newpat = grep_malloc(newlen);
> > +	p = pat;
> > +	r = newpat;
> > +	bs = 0;
> > +	while ((c = *p++)) {
> > +		if (bs && (c == '<' || c == '>')) {
> > +			/* overwrite previous backspace */
> > +			snprintf(r-1, 8, "[[:%c:]]", c);
> > +			r += 6;
> > +			bs = 0;
> > +			continue;
> > +		} else if (!bs && c == '\\') {
> > +			bs = 1;
> > +		} else {
> > +			bs = 0;
> > +		}
> > +		*r++ = c;
> > +	}
> > +	*r = 0;
> > +	if (newlen <= strlen(newpat))
> > +		abort();
> > +	free(pat);
> > +	return newpat;
> > +
> > +}
> > +#endif
> >
> >  static void
> >  add_pattern(char *pat, size_t len)
> > @@ -198,6 +246,12 @@ add_pattern(char *pat, size_t len)
> >  		pattern[patterns] = grep_malloc(len + 1);
> >  		memcpy(pattern[patterns], pat, len);
> >  		pattern[patterns][len] = '\0';
> > +#ifndef SMALL
> > +		if (!Fflag) {
> > +			pattern[patterns] = fix_word_boundaries(pattern[patterns]);
> > +		}
> > +#endif
> > +
> >  	}
> >  	++patterns;
> >  }
>

-- 
HELP! MY TYPEWRITER IS BROKEN!
	-- E. E. CUMMINGS
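
For reference, the \< and \> to [[:<:]] and [[:>:]] conversion discussed
in this thread can be sketched on its own. This is a minimal standalone
version, not the code from the quoted diff: the function name
convert_word_boundaries() is made up for illustration, it uses plain
malloc() rather than grep's grep_malloc(), and it returns a fresh copy
instead of freeing its argument. It assumes a regex library that
understands the [[:<:]] and [[:>:]] forms described in re_format(7).

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (illustration only): return a newly allocated
 * copy of pat with \< and \> rewritten to [[:<:]] and [[:>:]].
 * Other escapes, including an escaped backslash, are copied through
 * as a pair, so the '<' in "\\<" is left alone.
 * The caller frees the result. Returns NULL if malloc() fails.
 */
char *
convert_word_boundaries(const char *pat)
{
	size_t len = strlen(pat);
	const char *p = pat;
	char *out, *r;

	/*
	 * Worst case: every 2-byte shorthand grows to 7 bytes, i.e.
	 * 5 extra bytes for each of at most len / 2 pairs, plus the NUL.
	 */
	out = malloc(len + 5 * (len / 2) + 1);
	if (out == NULL)
		return NULL;
	r = out;

	while (*p != '\0') {
		if (p[0] == '\\' && (p[1] == '<' || p[1] == '>')) {
			/* word-boundary shorthand: emit the bracket form */
			memcpy(r, "[[:", 3);
			r[3] = p[1];
			memcpy(r + 4, ":]]", 3);
			r += 7;
			p += 2;
		} else if (p[0] == '\\' && p[1] != '\0') {
			/* any other escape: copy both bytes untouched */
			*r++ = *p++;
			*r++ = *p++;
		} else {
			*r++ = *p++;
		}
	}
	*r = '\0';
	return out;
}
```

With this sketch, the pattern \<foo\> comes back as [[:<:]]foo[[:>:]],
while a pattern such as a\\<b (an escaped backslash followed by a
literal <) is returned unchanged.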