On Sun, Jul 18, 2021 at 1:13 PM Ian Lance Taylor via Gcc <gcc@gcc.gnu.org>
wrote:

> On Sun, Jul 18, 2021 at 6:23 AM Mark Wielaard <m...@klomp.org> wrote:
> >
> > For the gcc rust frontend I was thinking of importing a couple of
> > gnulib modules to help with UTF-8 processing, conversion to/from
> > unicode codepoints and determining various properties of those
> > codepoints. But it seems gcc doesn't yet have any gnulib modules
> > imported, and maybe other frontends already have helpers to this that
> > the gcc rust frontend could reuse.
> >
> > Rust only accepts valid UTF-8 encoded source files, which may or may
> > not start with UTF-8 BOM character. Whitespace is any codepoint with
> > the Pattern_White_Space property. Identifiers can start with any
> > codepoint with the XID_start property plus zero or one codepoints with
> > XID_continue property. It isn't required, but highly desirable to
> > detect confusable identifiers according to tr39/Confusable_Detection.
> >
> > Other names might be constraint to Alphabetic and/or Number categories
> > (Nd, Nl, No), textual types can only contain Unicode Scalar Values
> > (any Unicode codepoint except high-surrogate and low-surrogates),
> > strings in source code can contain unicode escapes (24 bit, up to 6
> > digits codepoints) but are internally stored as UTF-8 (and must not
> > encode any surrogates).
> >
> > Do other gcc frontends handle any of the above already in a way that
> > might be reusable for other frontends?
>
> I don't know that this is particularly helpful, but the Go frontend
> has this kind of code in gcc/go/gofrontend/lex.cc.  E.g.,
> Lex::fetch_char, Lex::advance_one_utf8_char, unicode_space,
> unicode_digits, unicode_letters, Lex::is_unicode_space, etc.  But you
> probably won't be able to use the code directly, and the code in the
> gofrontend directory is also shared with GoLLVM so it can't trivially
> be moved.
>

I believe the UTF-8 handling for the C family front ends is all in libcpp;
I don't think it's factored in a way to be useful to other front ends.

Jason

Reply via email to