Later on, he summarizes some of the existing implementations, including comments about the Plan 9 implementation and his own RE2, both of which efficiently handle international text (which seems to be a major concern of Gabor's).

I believe Gabor is considering TRE as a good replacement regex library.
Yes. Oniguruma is slow; Google RE2 only supports Perl and fgrep syntax, not standard regex; and the Plan 9 implementation, IIRC, only supports fgrep syntax and Unicode, not wchar_t in general.

The key comment in Mike's GNU grep notes is the one about not breaking the input into lines. Splitting into lines first simply means scanning the input twice; instead, run the matcher over blocks of text and, when it finds a match, work backwards from the match to find the appropriate line beginning. This is efficient because most lines don't match.

I do like the idea.
So do I.
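
For illustration, here is a rough sketch of that block-scanning approach in Python. It is not the GNU grep or bsdgrep code; the function name grep_block is made up for this example, and it assumes the pattern can never match a newline, so a match always lies within a single line.

```python
import re

def grep_block(pattern, block):
    """Return the lines of `block` (bytes) that contain a match.

    Hypothetical helper for illustration only. Assumes `pattern`
    never matches a newline character.
    """
    regex = re.compile(pattern)
    matched = []
    pos = 0
    while pos < len(block):
        # Run the matcher over the raw block, not over pre-split lines.
        m = regex.search(block, pos)
        if m is None:
            break
        # Work backwards from the match to the beginning of its line.
        start = block.rfind(b"\n", 0, m.start()) + 1
        # Work forwards from the match to the end of its line.
        end = block.find(b"\n", m.start())
        if end == -1:
            end = len(block)
        matched.append(block[start:end])
        # Resume after this line so each line is reported once.
        pos = end + 1
    return matched
```

Line boundaries are only looked up around actual matches, so non-matching lines are never split out individually. A real implementation would also have to handle lines that straddle two blocks, which this sketch ignores.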

BTW, the fastgrep portion of bsdgrep is my fault/contribution to do a faster search bypassing the regex library. :) It certainly was not written with any encodings in mind; it was purely ASCII. As I have not kept up with it, I do not know if anyone improved it or not.

It has been made wchar-compliant.

Gabor
_______________________________________________
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
