-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=311378 complains:

$ m4 > samp1 <<EOF
> changequote(«,»)dnl
> define(a,b)dnl
> «a»
> EOF
«b»

And indeed, on platforms where char is signed, we had some sign-extension
bugs, since we were comparing getc()'s unsigned chars against the
(possibly negative) bytes of a char *. With this patch, m4 should now be
8-bit clean; I took the approach of always using unsigned char in the
parser.

Unfortunately, I don't know any good way to put an example of 8-bit
characters in the documentation. Info will faithfully reproduce literal
characters (but they may render horribly depending on your locale), while
TeX ignores 8-bit characters and needs a command for each glyph. So for
now, I left the examples in an @ignore block, so at least the testsuite
will ensure we don't regress.

2006-07-31  Eric Blake  <[EMAIL PROTECTED]>

	* src/input.c (peek_input, next_char, match_input): Be eight-bit
	clean; fixes debian bug 311378.
	* doc/m4.texinfo (Syntax): Describe eight-bit handling.
	(Changequote, Changecom): Add examples to test this.
	* NEWS: Document this fix.
	* THANKS: Update. Reported by Steven Augart.

- -- 
Life is short - so eat dessert first!

Eric Blake             [EMAIL PROTECTED]
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.1 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFEzsIb84KuGfSFAYARAjV6AKC4F7Y2rpNKr8LzY8Murz2fnAy01gCfY4pv
adcorwShehrSo21KhbyPdvg=
=dSGW
-----END PGP SIGNATURE-----
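The sign-extension problem can be seen in isolation with a small sketch. The patch calls a helper named `to_uchar`, but its definition is not part of this diff, so the version below is a stand-in written for illustration. The names `byte_matches_buggy` and `byte_matches_fixed` are hypothetical. `ch` plays the role of a byte read with getc(), which always lies in 0..255.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the to_uchar helper the patch calls: return the byte
   value of CH as an unsigned char, so that it compares equal to what
   getc() returns for the same byte.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Pre-patch style: *S is promoted straight to int.  Where char is
   signed, a byte such as 0xAB becomes -85 and can never equal CH.  */
static bool
byte_matches_buggy (int ch, const char *s)
{
  assert (ch >= 0 && ch <= UCHAR_MAX);  /* CH is a byte, not EOF.  */
  return ch == *s;
}

/* Post-patch style: convert through unsigned char before comparing.  */
static bool
byte_matches_fixed (int ch, const char *s)
{
  assert (ch >= 0 && ch <= UCHAR_MAX);  /* CH is a byte, not EOF.  */
  return ch == to_uchar (*s);
}
```

On a platform where char is signed (for example, GCC on x86), `byte_matches_buggy (0xab, "\xab")` returns false, while `byte_matches_fixed (0xab, "\xab")` returns true. Where char is unsigned, both return true. That is why the bug only showed up on some platforms.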
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.45
diff -u -p -r1.1.1.1.2.45 NEWS
--- NEWS	30 Jul 2006 03:18:12 -0000	1.1.1.1.2.45
+++ NEWS	1 Aug 2006 02:44:33 -0000
@@ -20,6 +20,7 @@ Version 1.4.6 - ?? 2006, by ?? (CVS ver
 * The __file__ macro, and the -s/--synclines option, now show what
   directory a file was found in when the -I/--include option or M4PATH
   variable had an effect.
+* The changequote and changecom macros now work with 8-bit characters.
 
 Version 1.4.5 - 15 July 2006, by Eric Blake (CVS version 1.4.4c)
 
Index: src/input.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/input.c,v
retrieving revision 1.1.1.1.2.13
diff -u -p -r1.1.1.1.2.13 input.c
--- src/input.c	30 Jul 2006 23:46:51 -0000	1.1.1.1.2.13
+++ src/input.c	1 Aug 2006 02:44:33 -0000
@@ -397,7 +397,7 @@ init_macro_token (token_data *td)
 int
 peek_input (void)
 {
-  register int ch;
+  int ch;
 
   while (1)
     {
@@ -407,7 +407,7 @@ peek_input (void)
       switch (isp->type)
         {
         case INPUT_STRING:
-          ch = isp->u.u_s.string[0];
+          ch = to_uchar (isp->u.u_s.string[0]);
           if (ch != '\0')
             return ch;
           break;
@@ -446,13 +446,13 @@ peek_input (void)
 
 #define next_char() \
   (isp && isp->type == INPUT_STRING && isp->u.u_s.string[0] \
-   ? *isp->u.u_s.string++ \
+   ? to_uchar (*isp->u.u_s.string++) \
    : next_char_1 ())
 
 static int
 next_char_1 (void)
 {
-  register int ch;
+  int ch;
 
   if (start_of_input_line)
     {
@@ -468,7 +468,7 @@ next_char_1 (void)
       switch (isp->type)
         {
         case INPUT_STRING:
-          ch = *isp->u.u_s.string++;
+          ch = to_uchar (*isp->u.u_s.string++);
           if (ch != '\0')
             return ch;
           break;
@@ -531,14 +531,14 @@ match_input (const char *s)
   const char *t;
 
   ch = peek_input ();
-  if (ch != *s)
+  if (ch != to_uchar (*s))
     return 0;                   /* fail */
   (void) next_char ();
 
   if (s[1] == '\0')
     return 1;                   /* short match */
 
-  for (n = 1, t = s++; (ch = peek_input ()) == *s++; n++)
+  for (n = 1, t = s++; (ch = peek_input ()) == to_uchar (*s++); n++)
     {
       (void) next_char ();
       if (*s == '\0')           /* long match */
@@ -564,9 +564,9 @@ match_input (const char *s)
 `------------------------------------------------------------------------*/
 
 #define MATCH(ch, s) \
-  ((s)[0] == (ch) \
-   && (ch) != '\0' \
-   && ((s)[1] == '\0' \
+  (to_uchar ((s)[0]) == (ch) \
+   && (ch) != '\0' \
+   && ((s)[1] == '\0' \
        || (match_input ((s) + 1) ? (ch) = peek_input (), 1 : 0)))
 
 
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.57
diff -u -p -r1.1.1.1.2.57 m4.texinfo
--- doc/m4.texinfo	31 Jul 2006 20:28:12 -0000	1.1.1.1.2.57
+++ doc/m4.texinfo	1 Aug 2006 02:44:34 -0000
@@ -698,8 +698,12 @@ primitive is spelled within @code{m4}.
 As @code{m4} reads its input, it separates it into @dfn{tokens}. A
 token is either a name, a quoted string, or any single character, that
 is not a part of either a name or a string. Input to @code{m4} can also
-contain comments. @acronym{GNU} @code{m4} does not yet understand locales; all
-operations are byte-oriented rather than character-oriented.
+contain comments. @acronym{GNU} @code{m4} does not yet understand
+locales; all operations are byte-oriented rather than
+character-oriented. However, @code{m4} is eight-bit clean, so you can
+use non-ASCII characters in quoted strings (@pxref{Changequote}),
+comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
+exception of the NUL character (the zero byte).
 
 @menu
 * Names:: Macro names
@@ -2344,6 +2348,23 @@ foo
 @result{}Macro foo.
 @end example
 
+The quotation strings can safely contain eight-bit characters.
[EMAIL PROTECTED]
+Yuck. I know of no clean way to render an 8-bit character in both info
+and dvi. This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
[EMAIL PROTECTED]
+define(`a', `b')
[EMAIL PROTECTED]
+«a»
[EMAIL PROTECTED]
+changequote(`«', `»')
[EMAIL PROTECTED]
+«a»
[EMAIL PROTECTED]
[EMAIL PROTECTED] example
[EMAIL PROTECTED] ignore
 If no single character is appropriate, @var{start} and @var{end} can be
 of any length.
 
@@ -2380,10 +2401,10 @@ calls of @code{changequote} must be made
 and one for the new quotes.
 
 Macros are recognized in preference to the begin-quote string, so if a
-prefix of @var{start} can be recognized as a macro name, the quoting
-mechanism is effectively disabled. Unless you use @code{changeword}
-(@pxref{Changeword}), this means that @var{start} should not begin with
-a letter or @samp{_} (underscore).
+prefix of @var{start} can be recognized as a potential macro name, the
+quoting mechanism is effectively disabled. Unless you use
[EMAIL PROTECTED] (@pxref{Changeword}), this means that @var{start}
+should not begin with a letter or @samp{_} (underscore).
 
 @example
 define(`hi', `HI')
@@ -2490,11 +2511,29 @@ changecom(`#')
 @result{}# comment again
 @end example
 
+The comment strings can safely contain eight-bit characters.
[EMAIL PROTECTED]
+Yuck. I know of no clean way to render an 8-bit character in both info
+and dvi. This example uses the `open-guillemot' and `close-guillemot'
+characters of the Latin-1 character set.
+
[EMAIL PROTECTED]
+define(`a', `b')
[EMAIL PROTECTED]
+«a»
[EMAIL PROTECTED]
+changecom(`«', `»')
[EMAIL PROTECTED]
+«a»
[EMAIL PROTECTED]
[EMAIL PROTECTED] example
[EMAIL PROTECTED] ignore
+
 Comments are recognized in preference to macros. However, this is not
 compatible with other implementations, where macros take precedence over
 comments, so it may change in a future release. For portability, this
-means that @var{start} should not have a prefix that begins with a
-letter or @samp{_} (underscore).
+means that @var{start} should not begin with a letter or @samp{_}
+(underscore).
 
 @example
 define(`hi', `HI')
@@ -4646,6 +4685,7 @@ the first time.
 @bye
 
 @c Local Variables:
[EMAIL PROTECTED] coding: ISO-8859-1
 @c fill-column: 72
 @c ispell-local-dictionary: "american"
 @c indent-tabs-mode: nil
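The patch routes every byte that the input code compares through to_uchar. NUL stays excluded because string input is NUL-terminated: peek_input and next_char_1 treat `'\0'` as the end of a string rather than as data. The sketch below models that peek-then-match pattern for a single string source. It is a simplification under stated assumptions: `struct string_input`, `peek_byte` and `match_delim` are hypothetical names, and `to_uchar` is again a stand-in. Unlike the real match_input, which consumes characters with next_char () as it matches (as the diff context shows), `match_delim` reads ahead on a copy and leaves the position untouched when the delimiter does not match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the to_uchar helper called throughout the patch.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Hypothetical single-string input source; m4's real input stack also
   handles files, macro expansions and pushed-back text.  */
struct string_input
{
  const char *pos;
};

/* Return the next byte (1..255) without consuming it, or -1 at the end
   of the string.  A '\0' byte always ends the string, which is why NUL
   cannot be used as data.  */
static int
peek_byte (const struct string_input *in)
{
  assert (in != NULL && in->pos != NULL);
  int ch = to_uchar (in->pos[0]);
  return ch != '\0' ? ch : -1;
}

/* If the upcoming bytes spell delimiter S, consume them and return true;
   otherwise leave IN untouched and return false.  Each byte of S is
   widened with to_uchar so it is comparable with peek_byte's result.  */
static bool
match_delim (struct string_input *in, const char *s)
{
  struct string_input probe = *in;

  for (; *s != '\0'; s++)
    {
      if (peek_byte (&probe) != to_uchar (*s))
        return false;
      probe.pos++;
    }
  *in = probe;
  return true;
}
```

With input `"\xab" "a\xbb rest"` (split into two literals so the hex escape does not swallow the following `a`), the 8-bit delimiter `"\xab"` matches and is consumed whether or not char is signed.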