Peter,
> we are running postfix + amavisd-new 2.7.0 on RHEL3 (perl v5.8.0).
>
> From time to time mails get stuck in the postfix queue
> because of the following error
>
> 451 4.5.0 Error in processing, id=02676-02, check-banned FAILED:
> Malformed UTF-8 character (unexpected continuation byte 0x81, with no
> preceding start byte) in substitution iterator at /usr/sbin/amavisd
> line 2874. (in reply to end of DATA command))
>
> It looks like the substitution command in "sub sanitize_str"
> does not handle all cases of input.
>
> As a workaround we put an eval around the command, but this
> seems very ugly because it treats the symptoms without
> removing the root cause. :-)
>
> eval {
> $str =~ s/([^\040-\133\135-\176])/ # and \240-\376 ?
> exists($quote_controls_map{$1}) ? $quote_controls_map{$1} :
> sprintf(ord($1)>255 ? '\\x{%04x}' : '\\%03o', ord($1))/egs;
> }
>
> Is there any other known solution besides upgrading the OS (+perl +...)
> to get rid of the problem?
>
> Best regards,
> Peter
>
> PS: We already set a non-UTF-8 locale (LANG = LC_ALL = C) in the
> init scripts of postfix & amavisd.

This is a manifestation of Perl bug #32687:

  Encode::is_utf8 on tainted UTF8 string returns false
  https://rt.perl.org/rt3/Public/Bug/Display.html?id=32687

If you had perl 5.8.1 you could use utf8::is_utf8 instead,
but that is not yet available in 5.8.0.

I forgot that we had already seen and worked around this bug
back in 2004. From the amavisd-new-2.2.1 release notes:

- avoid the use of Encode::is_utf8 due to a Perl bug (still present in
  5.8.8, Encode::is_utf8 on tainted utf8 character string produces false);
  Perl bug tracking: #32687: Encode::is_utf8 on tainted UTF8 string
  returns false;

Please try the attached patch. It avoids testing Encode::is_utf8
and just calls safe_encode() unconditionally.
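
For illustration only, the idea of encoding unconditionally before the
substitution can be sketched roughly as below. This is not the actual
amavisd code: the name sanitize_str_sketch and the contents of
%quote_controls_map are made up for the example, and Encode::encode is
used as a stand-in for amavisd's safe_encode(), whose exact behaviour is
assumed here rather than reproduced.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical map for this example; the real %quote_controls_map
# in amavisd may contain different entries.
my %quote_controls_map =
  ("\r" => '\\r', "\n" => '\\n', "\t" => '\\t', "\\" => '\\\\');

# Hypothetical stand-in for sanitize_str(), not the amavisd original.
sub sanitize_str_sketch {
  my($str) = @_;
  # Encode unconditionally instead of testing Encode::is_utf8($str)
  # first, so the result is always a string of octets.
  # Assumption: Encode::encode stands in for amavisd's safe_encode().
  $str = Encode::encode('UTF-8', $str);
  # Every character is now an octet (0..255); quote anything that is
  # not printable ASCII, and the backslash itself.
  $str =~ s{([^\040-\133\135-\176])}{
            exists($quote_controls_map{$1}) ? $quote_controls_map{$1}
                                            : sprintf('\\%03o', ord($1))
           }egs;
  return $str;
}

print sanitize_str_sketch("caf\x{e9} \x{263A}\n"), "\n";
```

Because the substitution then only ever sees octets, it no longer depends
on what Encode::is_utf8 reports for a tainted string. One consequence of
this sketch is that a plain byte string containing high-bit octets is
treated as Latin-1 and encoded as well.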

Mark

--- amavisd~ 2011-07-01 18:21:07.000000000 +0200
+++ amavisd 2012-04-03 20:07:10.170938138 +0200
@@ -2481,3 +2481,4 @@
my($str) = @_; local($1);
- $str = safe_encode('UTF-8',$str) if Encode::is_utf8($str);
+ # avoid Encode::is_utf8 check, always false on tainted, Perl bug #32687
+ $str = safe_encode('UTF-8',$str); # if Encode::is_utf8($str);
$str =~ s/([^\041-\052\054-\074\076-\176])/sprintf('+%02X',ord($1))/egs;
@@ -2866,3 +2867,4 @@
my($str, $keep_eol) = @_;
- $str = safe_encode('UTF-8',$str) if Encode::is_utf8($str);
+ # avoid Encode::is_utf8 check, always false on tainted, Perl bug #32687
+ $str = safe_encode('UTF-8',$str); # if Encode::is_utf8($str);
local($1);
@@ -8689,3 +8691,3 @@
if (defined $val && $val ne '') {
- # $val = safe_encode('UTF-8',$val) if Encode::is_utf8($val);
+ # $val = safe_encode('UTF-8',$val);
$part->report_type($val);
@@ -19334,3 +19336,3 @@
do_log(-1,"WARN: Unicode string passed to datasend")
- if Encode::is_utf8($buff);
+ if Encode::is_utf8($buff); # always false on tainted, Perl bug #32687
# do_log(5,"smtp print %d bytes>", length($buff));