Hello,

Currently pcre2 is configured with "--enable-newline-is-any".  With
that option, the library treats 0x85 (NEL) as a newline char.  But in
UTF-8, the byte 0x85 occurs in the encoding of at least some common
Kanji chars.  So pcre2 cannot properly handle text which includes
such chars.

Since --enable-newline-is-any conflicts with using UTF-8, I think we
should change it to --enable-newline-is-anycrlf to avoid the
conflict.

https://github.com/PCRE2Project/pcre2/blob/pcre2-10.37/src/pcre2_internal.h#L663
    657 /* In ASCII/Unicode, linefeed is '\n' and we equate this to NL for
    658 compatibility. NEL is the Unicode newline character; make sure it is
    659 a positive value. */
    660 
    661 #define CHAR_LF                     '\n'
    662 #define CHAR_NL                     CHAR_LF
->  663 #define CHAR_NEL                    ((unsigned char)'\x85')
    664 #define CHAR_ESC                    '\033'
    665 #define CHAR_DEL                    '\177'
    666 #define CHAR_NBSP                   ((unsigned char)'\xa0')

U+8005 is "\xe8\x80\x85" in UTF-8, which includes "\x85".
https://glyphwiki.org/wiki/u8005
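
To double-check the byte sequence, here is a minimal C sketch of the
standard three-byte UTF-8 encoding.  The helper names utf8_encode3 and
contains_nel_byte are hypothetical, made up for this illustration; they
are not part of pcre2.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode a code point in the range U+0800..U+FFFF as three UTF-8 bytes.
   Surrogates are not rejected; this is only an illustration. */
static void utf8_encode3(uint32_t cp, unsigned char out[3])
{
    out[0] = (unsigned char)(0xE0 | (cp >> 12));          /* 1110xxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));  /* 10xxxxxx */
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));         /* 10xxxxxx */
}

/* Return 1 if any byte in buf equals 0x85, the value of CHAR_NEL. */
static int contains_nel_byte(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == 0x85)
            return 1;
    return 0;
}
```

Calling utf8_encode3(0x8005, buf) fills buf with three bytes whose last
one is 0x85, so contains_nel_byte(buf, 3) returns 1.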

Test code in PHP:

  <?php
    $test = "\u{8005} hogehoge";
    if (preg_match("/^(.+)$/m", $test, $match)) {
        print("result: " . str_ends_with($match[1], "hoge") .
          " (should be 1)\n");
    }
   ?>
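
The same effect can be modelled at the byte level.  The C sketch below
is a simplified model, not pcre2's actual code: it assumes 8-bit,
non-UTF matching, where "any" recognizes LF, VT, FF, CR and NEL, while
"anycrlf" recognizes only CR and LF.  The function names are
hypothetical.

```c
#include <stddef.h>

/* Simplified model of the "any" convention in 8-bit, non-UTF mode:
   LF, VT, FF, CR and NEL (0x85) each count as a newline byte. */
static int is_newline_any(unsigned char c)
{
    return c == 0x0A || c == 0x0B || c == 0x0C || c == 0x0D || c == 0x85;
}

/* Simplified model of the "anycrlf" convention: only CR and LF. */
static int is_newline_anycrlf(unsigned char c)
{
    return c == 0x0A || c == 0x0D;
}

/* Offset of the first newline byte in s, or -1 if there is none. */
static long first_newline(const unsigned char *s, size_t len,
                          int (*is_newline)(unsigned char))
{
    for (size_t i = 0; i < len; i++)
        if (is_newline(s[i]))
            return (long)i;
    return -1;
}
```

For the UTF-8 bytes of the test string above, first_newline with
is_newline_any returns 2 (the third byte of U+8005), so the first
"line" ends before " hogehoge".  With is_newline_anycrlf it returns -1
and the whole string stays on one line.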

ok?


Specify --enable-newline-is-anycrlf instead of --enable-newline-is-any
which doesn't work properly with UTF-8 text.  The latter option treats
0x85, which is used for some kanji in UTF-8, as a newline char.

Index: devel/pcre2/Makefile
===================================================================
RCS file: /cvs/ports/devel/pcre2/Makefile,v
retrieving revision 1.16
diff -u -p -r1.16 Makefile
--- devel/pcre2/Makefile        11 Mar 2022 18:52:29 -0000      1.16
+++ devel/pcre2/Makefile        2 Nov 2022 14:02:31 -0000
@@ -9,6 +9,8 @@ SHARED_LIBS +=  pcre2-posix             
 
 CATEGORIES =   devel
 
+REVISION =     0
+
 MASTER_SITES = https://ftp.pcre.org/pub/pcre/ \
                ${MASTER_SITE_SOURCEFORGE:=pcre/} \
                http://ftp.csx.cam.ac.uk/pub/software/programming/pcre/ \
@@ -27,7 +29,7 @@ LIB_DEPENDS =         archivers/bzip2
 CONFIGURE_STYLE =      gnu
 CONFIGURE_ARGS =       --enable-pcre2-16 \
                        --enable-pcre2-32 \
-                       --enable-newline-is-any \
+                       --enable-newline-is-anycrlf \
                        --enable-pcre2grep-libz \
                        --enable-pcre2grep-libbz2 \
                        --enable-pcre2test-libreadline
