Package: wml
Version: 2.12.2~ds1-2
Severity: normal
Tags: upstream

Dear Maintainer,

I found a problem when processing a wml file that contains the letter 'ą'
followed by a newline character. I narrowed it down as far as I could, but I
cannot get past the point described below.

I created a file containing the single letter 'ą' followed by a newline
character. In hex, the file looks like this:

    mirek@dom:~/0$ hexdump -C test.wml
    00000000  c4 85 0a                                          |...|
    00000003

Next, I ran this command:

    /usr/lib/wml/exec/wml_p8_htmlstrip /home/mirek/0/test.wml -o test.out

In hex, the file test.out looks like this:

    mirek@dom:~/0$ hexdump -C test.out
    00000000  c4 0a                                             |..|
    00000002

As you can see, the byte 85, which should have been kept, was deleted.

I tried to find where the error is and arrived at the following code, from
the function _strip_plain_text in the file
wml/TheWML/Backends/HtmlStrip/Main.pm:

    # Level 2
    if ( $self->opt_O >= 2 ) {
        # strip multiple whitespaces to single one
        $buf =~ s|(\S+)[ \t]{2,}|$1 |sg;

        # strip trailing whitespaces
        $buf =~ s|\s+\n|\n|sg;
    }

I checked other letters (ĄćĆęĘóÓśŚłŁżŻźŹćĆńŃ), each followed by a newline
character, and they are processed properly. Only the letter 'ą' causes the
problem. I also checked a fresh install of Debian buster, and the result is
the same.

I also created a script, test.pm:

    #!/usr/bin/perl

    my $text = "ą\cJ";

    sub _strip_plain_text_test {
        my ( $self, $buf ) = @_;
        $buf = $text;
        if ( 2 >= 1 ) {
            # strip empty lines
            $buf =~ s|\n\s*\n|\n|sg;
        }

        # Level 2
        if ( 2 >= 2 ) {
            # strip multiple whitespaces to single one
            $buf =~ s|(\S+)[ \t]{2,}|$1 |sg;

            # strip trailing whitespaces
            $buf =~ s|\s+\n|\n|sg;
        }
        return $buf;
    }

    print "extracted _strip_plain_text function\n";
    print _strip_plain_text_test();

    print "function _strip_plain_text from TheWML::Backends::HtmlStrip::Main\n";
    use strict;
    use warnings;
    use lib '/usr/share/wml';
    use lib '/usr/lib/wml';

    use TheWML::Backends::HtmlStrip::Main ();
    my $object1 = TheWML::Backends::HtmlStrip::Main->new( argv => \@ARGV );
    $object1->opt_O(2);
    print $object1->_strip_plain_text($text);

The extracted function _strip_plain_text_test works fine, but
$object1->_strip_plain_text($text) fails:

    mirek@dom:~/0$ perl test.pm
    extracted _strip_plain_text function
    ą
    function _strip_plain_text from TheWML::Backends::HtmlStrip::Main
    �

Could you check whether you see the same problem with this character?

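A possible explanation, not verified against the module source: 0x85 is the
second byte of 'ą' in UTF-8, and it is also the code point of U+0085 (NEXT
LINE), which Perl's \s matches whenever Unicode rules apply to the pattern.
The sketch below forces those rules with the /u modifier purely for
illustration. Whether Main.pm actually ends up under Unicode rules (for
example through a "use v5.12" or later declaration, which enables the
unicode_strings feature) is an assumption, and the helper names
strip_default and strip_unicode are hypothetical, not part of wml.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not part of wml): the same "strip trailing
# whitespaces" substitution as in _strip_plain_text, applied under two
# different regex semantics.

# Default rules: on a plain byte string, \s matches only ASCII whitespace.
sub strip_default {
    my ($buf) = @_;
    $buf =~ s|\s+\n|\n|sg;
    return $buf;
}

# Unicode rules forced with /u: byte 0x85 is treated as U+0085 (NEL),
# which \s matches, so it is stripped together with nothing else.
sub strip_unicode {
    my ($buf) = @_;
    $buf =~ s|\s+\n|\n|sgu;
    return $buf;
}

# Raw UTF-8 bytes of "ą" followed by a newline, as in test.wml.
my $bytes = "\xC4\x85\n";

printf "default rules: %s\n", unpack( 'H*', strip_default($bytes) );   # c4850a
printf "unicode rules: %s\n", unpack( 'H*', strip_unicode($bytes) );   # c40a
```

The second result matches the bytes seen in test.out. If this is indeed the
cause, other characters whose UTF-8 encoding ends in 0x85 or 0xa0 (for
example 'à', c3 a0) would presumably be affected as well.
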
-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-4-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=pl_PL.utf8, LC_CTYPE=pl_PL.utf8 (charmap=UTF-8), LANGUAGE=pl_PL.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages wml depends on:
ii  eperl                     2.2.14-23+b1
ii  libbit-vector-perl        7.4-1+b5
ii  libcarp-always-perl       0.16-1
ii  libclass-xsaccessor-perl  1.19-3+b2
ii  libfile-which-perl        1.23-1
ii  libimage-size-perl        3.300-1
ii  libio-all-perl            0.87-1
ii  liblist-moreutils-perl    0.416-1+b4
ii  libpath-tiny-perl         0.108-1
ii  libterm-readkey-perl      2.38-1
ii  m4                        1.4.18-2
ii  mp4h                      1.3.1-17
ii  perl                      5.28.1-6
ii  slice                     1.3.8-14

Versions of packages wml recommends:
ii  iselect             1.4.0-3+b1
ii  libgd-perl          2.71-2
ii  libhtml-clean-perl  0.8-12
ii  linklint            2.3.5-5.1
ii  tidy                2:5.6.0-10
ii  txt2html            2.5201-1

Versions of packages wml suggests:
ii  freetable                        2.3-4.2
ii  imagemagick                      8:6.9.10.23+dfsg-2.1
ii  imagemagick-6.q16 [imagemagick]  8:6.9.10.23+dfsg-2.1
ii  libwww-perl                      6.36-1
ii  perl-tk                          1:804.033-2+b3
pn  shtool                           <none>
pn  tardy                            <none>
pn  w3-recs                          <none>
ii  weblint-perl                     2.32+dfsg-1