Package: perl
Version: 5.8.8-6
Severity: normal

Please excuse the bug title. After working on this for something like 5
hours, I cannot think clearly enough to write a short title describing
this very weird bug. Let the code speak for me. I have attached a testcase;
untar it and run the "repro" program.

[EMAIL PROTECTED]:~/tmp/repor/testcase>./repro
a
b
Wide character in subroutine entry at /usr/bin/markdown line 360.
zsh: exit 255   ./repro

Now, edit the repro file. There are 4 comments suggesting changes; if you make
any one of the changes, the wide character failure disappears.

Notice that several of the changes should not be able to affect anything,
but do. For example, uncommenting the s/// line should be a null change because
$mommy is otherwise utterly unused. But uncommenting that line "fixes"
the problem. This smells deeply of a perl bug to me. I boiled this test case
down from several thousand lines of code, dealing with many changes like this
that inexplicably hid the problem.

I should probably do a similar reduction on markdown and possibly
HTML::Scrubber, but it's getting late. Their versions here are listed below.

Here's some analysis of what's going on inside markdown when it fails:

<paravoid> watch this:
<paravoid>         print 'text is utf: ', utf8::is_utf8($text) ? 'yes' : 'no', "\n";
<paravoid>         $text =~ s{
<paravoid>                 (                       # save in $1
<paravoid>                         ^               # start of line  (with /m)
<paravoid>                         <($block_tags_a)        # start tag = $2
<paravoid>                         \b              # word break
<paravoid>                         (.*\n)*?        # any number of lines, minimally matching
<paravoid>                         </\2>           # the matching end tag
<paravoid>                         [ \t]*          # trailing spaces/tabs
<paravoid>                 )
<paravoid>         }{
<paravoid>                 print '$1 is utf: ', utf8::is_utf8($1) ? 'yes' : 'no', "\n";
<paravoid>                 my $key = md5_hex($1);
<paravoid>                 $g_html_blocks{$key} = $1;
<paravoid>                 "\n\n" . $key . "\n\n";
<paravoid>         }egmx;
<paravoid> I added the two 'prints'
<paravoid> text is utf: no
<paravoid> $1 is utf: yes
<paravoid> that's freaking weird
<paravoid> the utf8 flag gets enabled after the regexp is run
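
For anyone wondering why the flag matters: as far as I can tell, the
"Wide character in subroutine entry" message is what Digest::MD5's md5_hex
produces when it is handed a character string containing code points above
0xFF. The sketch below is not the attached test case and does not reproduce
the bug itself (the flag mysteriously appearing on $1). It only shows the
failure mode once a string like that reaches md5_hex, assuming that line 360
of markdown is the md5_hex call in the substitution above. The sample string
and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical sample data: a string holding a character above 0xFF,
# so it carries the utf8 flag and cannot be downgraded to bytes.
my $chars = "<div>\x{263a}</div>\n";
print 'chars is utf: ', (utf8::is_utf8($chars) ? 'yes' : 'no'), "\n";

# Digest::MD5 works on bytes, so this call dies; eval traps the error.
my $key = eval { md5_hex($chars) };
print "md5_hex on characters failed: $@" unless defined $key;

# Encoding to UTF-8 bytes first gives md5_hex something it can digest.
my $bytes = encode_utf8($chars);
print 'bytes is utf: ', (utf8::is_utf8($bytes) ? 'yes' : 'no'), "\n";
print 'key: ', md5_hex($bytes), "\n";
```

Encoding before hashing would only paper over the symptom in markdown; it
says nothing about why $1 ends up flagged when $text is not.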

Also note that paravoid had a version (much larger; a small modification to
ikiwiki) that reproduced the bug w/o HTML::Scrubber being loaded. As far as
I can guess, the HTML::Scrubber stuff doesn't really have any bearing on the bug
and is just one more mysterious thing that hides the bug if it's removed.

-- System Information:
Debian Release: testing/unstable
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: i386 (i686)
Shell:  /bin/sh linked to /bin/bash
Kernel: Linux 2.6.17-1-686
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)

Versions of packages perl depends on:
ii  libc6                         2.3.6-15   GNU C Library: Shared libraries
ii  libdb4.4                      4.4.20-6   Berkeley v4.4 Database Libraries [
ii  libgdbm3                      1.8.3-3    GNU dbm database routines (runtime
ii  perl-base                     5.8.8-6    The Pathologically Eclectic Rubbis
ii  perl-modules                  5.8.8-6    Core Perl modules

Versions of packages perl recommends:
ii  perl-doc                      5.8.8-6    Perl documentation

Other software:
ii  markdown       1.0.1-3        Text-to-HTML conversion tool
ii  libhtml-scrubb 0.08-2         Perl extension for scrubbing/sanitizing html


paravoid reproduced it using a similar test case on a system running sarge with:
<paravoid> ii  perl           5.8.4-8sarge4  Larry Wall's Practical Extraction and Report
<paravoid> ii  markdown       1.0.1-2        Text-to-HTML conversion tool
<paravoid> ii  libhtml-scrubb 0.08-1         Perl extension for scrubbing/sanitizing html

-- 
see shy jo

