Package: wml
Version: 2.12.2~ds1-2
Severity: normal
Tags: upstream

Dear Maintainer,

I found a problem with processing a wml file that contains the letter 'ą'
followed by a newline character.
After some research, I have reached a point that I cannot get past.

I created a file containing only the letter 'ą' followed by a newline
character. In hex, the file looks like this:

mirek@dom:~/0$ hexdump -C test.wml
00000000  c4 85 0a                                          |...|
00000003

Next, I ran the command:
/usr/lib/wml/exec/wml_p8_htmlstrip /home/mirek/0/test.wml -o test.out

The file test.out looks like this in hex:

mirek@dom:~/0$ hexdump -C test.out
00000000  c4 0a                                             |..|
00000002

As you can see, the byte 85, which should have been kept, was deleted.

I tried to find where the error is and arrived at the following code:
    #   Level 2
    if ( $self->opt_O >= 2 )
    {
        #   strip multiple whitespaces to single one
        $buf =~ s|(\S+)[ \t]{2,}|$1 |sg;
        #   strip trailing whitespaces
        $buf =~ s|\s+\n|\n|sg;
    }
from the function _strip_plain_text in the file
wml/TheWML/Backends/HtmlStrip/Main.pm.

I checked other letters (ĄćĆęĘóÓśŚłŁżŻźŹćĆńŃ), each followed by a newline
character, and they are processed properly. Only the letter 'ą' causes the
problem. I also checked a fresh install of Debian buster and the result is
the same.

I also created a script, test.pm:
#!/usr/bin/perl
my $text = "ą\cJ";
sub _strip_plain_text_test
{
    my ( $self, $buf ) = @_;
    $buf = $text;
    if ( 2 >= 1 )
    {
        #   strip empty lines
        $buf =~ s|\n\s*\n|\n|sg;
    }
    #   Level 2
    if ( 2 >= 2 )
    {
        #   strip multiple whitespaces to single one
        $buf =~ s|(\S+)[ \t]{2,}|$1 |sg;
        #   strip trailing whitespaces
        $buf =~ s|\s+\n|\n|sg;
    }
    return $buf;
}

print "extracted _strip_plain_text function\n";
print _strip_plain_text_test();

print "function _strip_plain_text from TheWML::Backends::HtmlStrip::Main\n";
use strict;
use warnings;
use lib '/usr/share/wml';
use lib '/usr/lib/wml';
use TheWML::Backends::HtmlStrip::Main ();
my $object1 = TheWML::Backends::HtmlStrip::Main->new( argv => \@ARGV );
$object1->opt_O(2);
print $object1->_strip_plain_text($text);

The function _strip_plain_text_test works fine, but
$object1->_strip_plain_text($text) fails:

mirek@dom:~/0$ perl test.pm
extracted _strip_plain_text function
ą
function _strip_plain_text from TheWML::Backends::HtmlStrip::Main
�
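
A possible explanation, offered as a guess and not checked against the
module source: the second byte of 'ą' in UTF-8 is 0x85, which is also the
code point of U+0085 NEXT LINE. Perl's \s matches that character only when
Unicode rules are in effect, so the same substitution can behave differently
depending on the pragmas or the string encoding at the call site. The sketch
below applies the "strip trailing whitespaces" substitution to the raw bytes
under three regex modifiers. The variable names are made up for this
illustration, and whether the module actually runs under Unicode rules is an
assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Raw UTF-8 bytes of "ą" followed by a newline: c4 85 0a.
my $bytes = "\xC4\x85\x0A";

# Default (/d) rules on a plain byte string: 0x85 is not whitespace.
( my $default = $bytes ) =~ s|\s+\n|\n|sg;

# Unicode (/u) rules: 0x85 is treated as U+0085 NEXT LINE, which \s matches.
# Assumption: something similar is in effect inside the module.
( my $unicode = $bytes ) =~ s|\s+\n|\n|sgu;

# ASCII-restricted (/a) rules: \s matches only ASCII whitespace.
( my $ascii = $bytes ) =~ s|\s+\n|\n|sga;

printf "%-8s %s\n", 'default:', unpack( 'H*', $default );    # c4850a
printf "%-8s %s\n", 'unicode:', unpack( 'H*', $unicode );    # c40a
printf "%-8s %s\n", 'ascii:',   unpack( 'H*', $ascii );      # c4850a
```

If this is what happens, it would also fit the observation that the other
letters survive: none of their UTF-8 encodings ends in the byte 0x85.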

Could you check whether you see the same problem with this character?



-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.19.0-4-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=pl_PL.utf8, LC_CTYPE=pl_PL.utf8 (charmap=UTF-8),
LANGUAGE=pl_PL.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages wml depends on:
ii  eperl                     2.2.14-23+b1
ii  libbit-vector-perl        7.4-1+b5
ii  libcarp-always-perl       0.16-1
ii  libclass-xsaccessor-perl  1.19-3+b2
ii  libfile-which-perl        1.23-1
ii  libimage-size-perl        3.300-1
ii  libio-all-perl            0.87-1
ii  liblist-moreutils-perl    0.416-1+b4
ii  libpath-tiny-perl         0.108-1
ii  libterm-readkey-perl      2.38-1
ii  m4                        1.4.18-2
ii  mp4h                      1.3.1-17
ii  perl                      5.28.1-6
ii  slice                     1.3.8-14

Versions of packages wml recommends:
ii  iselect             1.4.0-3+b1
ii  libgd-perl          2.71-2
ii  libhtml-clean-perl  0.8-12
ii  linklint            2.3.5-5.1
ii  tidy                2:5.6.0-10
ii  txt2html            2.5201-1

Versions of packages wml suggests:
ii  freetable                        2.3-4.2
ii  imagemagick                      8:6.9.10.23+dfsg-2.1
ii  imagemagick-6.q16 [imagemagick]  8:6.9.10.23+dfsg-2.1
ii  libwww-perl                      6.36-1
ii  perl-tk                          1:804.033-2+b3
pn  shtool                           <none>
pn  tardy                            <none>
pn  w3-recs                          <none>
ii  weblint-perl                     2.32+dfsg-1
