Bug#724115: hunspell: FTBFS: POD error

Agustin Martin Wed, 13 Nov 2013 03:45:59 -0800

On Tue, Nov 12, 2013 at 09:43:59PM +0100, Rene Engelhard wrote:
> On Tue, Nov 12, 2013 at 07:54:04PM +0100, Agustin Martin wrote:
> > I will have a look at this (I once wrote ispellaff2myspell). Now I think the
> > best is to change script to UTF8, but keep strings in code as escaped 
> > octal. 
> > Or rewrite that part.
> > 
> > Let me think about this. Hope to find time tomorrow.
> 
> Oops, too late. Just added the patch as I saw the patch and did it before
> starting to read mail. My bad.
> 
> Feel free to come up with a patch based on -5 and I'll happily add it, though.


Hi, Rene and Gregor

The patch is attached in two forms: a simple one, just to show the changes
I made, and the good one, which also trims all trailing whitespace in
ispellaff2myspell. Minimally tested with the Faroese dictionary.

I also looked at myspell-tools. If I find time I will prepare a patch for
it as well, including the changes by Gregor. I see that ispellaff2myspell
is included there through a dpatch patch. Do you think it would be
interesting to change the handling to something closer to what is used for
hunspell-tools (a plain file under debian/)?

Regards,

-- 
Agustin

diff --git a/debian/changelog b/debian/changelog
index 2ca1fbe..0572e6c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+hunspell (1.3.2-6) unstable; urgency=low
+
+  * debian/ispellaff2myspell: New upstream version.
+    - Incorporate changes by Gregor Herrmann (UTF-8 and typo fixes).
+    - Use octal codes for unibyte strings to make them coexist
+      with new UTF-8 encoding.
+    - Other minor changes.
+
+ --
+
 hunspell (1.3.2-5) unstable; urgency=low
 
   * apply patch from Gregor Hermann, thanks
diff --git a/debian/ispellaff2myspell b/debian/ispellaff2myspell
index 692571c..940d82b 100644
--- a/debian/ispellaff2myspell
+++ b/debian/ispellaff2myspell
@@ -1,8 +1,7 @@
 #!/usr/bin/perl -w
-# -*- coding: iso-8859-1 -*-
-# 	$Id: ispellaff2myspell,v 1.29 2005/07/04 12:21:55 agmartin Exp $
+# -*- coding: utf-8 -*-
 # 
-#   (C) 2002-2005 Agustin Martin Domingo <agustin.mar...@hispalinux.es> 
+#   (C) 2002-2013 Agustin Martin Domingo <agustin.mar...@hispalinux.es> 
 # 
 #    This program is free software; you can redistribute it and/or modify
 #    it under the terms of the GNU General Public License as published by
@@ -21,7 +20,7 @@
 
 sub usage {
     print "ispellaff2myspell: A program to convert ispell affix tables to myspell format
-(C) 2002-2005 Agustin Martin Domingo <agustin.martin\@hispalinux.es>         License: GPL
+(C) 2002-2013 Agustin Martin Domingo <agustin.martin\@hispalinux.es>         License: GPL2+
 
 Usage:
 	ispellaff2myspell [options] <affixfile>
@@ -98,17 +97,17 @@ sub mylc{
 	}
     } else {
 	if ( $charset eq "latin0" ){
-	    $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ½¨¸';
-	    $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ¼¦´';
+	    $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376\275\250\270';
+	    $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336\274\246\264';
 	} elsif ( $charset eq "latin1" ){
-	    $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
-	    $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+	    $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
 	} elsif ( $charset eq "latin2" ){
-	    $lowercase='a-z±³µ¶¹º»¼¾¿àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
-	    $uppercase='A-Z¡£¥¦©ª«¬®¯ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+	    $lowercase='a-z\261\263\265\266\271\272\273\274\276\277\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\241\243\245\246\251\252\253\254\256\257\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
 	} elsif ( $charset eq "latin3" ){
-	    $lowercase='a-z±¶¹º»¼¿àáâäåæçèéêëìíîïñòóôõö÷øùúûüýþ';
-	    $uppercase='A-Z¡¦©ª«¬¯ÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖ×ØÙÚÛÜÝÞ';
+	    $lowercase='a-z\261\266\271\272\273\274\277\340\341\342\344\345\346\347\350\351\352\353\354\355\356\357\361\362\363\364\365\366\367\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\241\246\251\252\253\254\257\300\301\302\304\305\306\307\310\311\312\313\314\315\316\317\321\322\323\324\325\326\327\330\331\332\333\334\335\336';
 #	} elsif ( $charset eq "other_charset" ){
 #	    die "latin2 still unimplemented";
 	} else {
@@ -440,13 +439,19 @@ requires B<--lowercase> having exactly that string but lowercase.
 
 =back
 
-If your encoding is currently unsupported you can send me a file with 
-the two strings of lower and uppercase chars. Note that they must match 
-exactly but case changed. It will look something like
+If your encoding is currently unsupported you can send me a separate file
+with the two strings of lower and uppercase chars. Note that they must
+match exactly but case changed. It will look something like
 
   $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
   $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
 
+A safer alternative against accidental recoding is to use octal codes for
+non 7bit chars. Above strings would then look like
+
+  $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+  $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
+
 =head1 SEE ALSO
 
 The OpenOffice.org Lingucomponent Project home page

diff --git a/debian/changelog b/debian/changelog
index 2ca1fbe..0572e6c 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,13 @@
+hunspell (1.3.2-6) unstable; urgency=low
+
+  * debian/ispellaff2myspell: New upstream version.
+    - Incorporate changes by Gregor Herrmann (UTF-8 and typo fixes).
+    - Use octal codes for unibyte strings to make them coexist
+      with new UTF-8 encoding.
+    - Other minor changes.
+
+ --
+
 hunspell (1.3.2-5) unstable; urgency=low
 
   * apply patch from Gregor Hermann, thanks
diff --git a/debian/ispellaff2myspell b/debian/ispellaff2myspell
index 692571c..216ec75 100644
--- a/debian/ispellaff2myspell
+++ b/debian/ispellaff2myspell
@@ -1,9 +1,8 @@
 #!/usr/bin/perl -w
-# -*- coding: iso-8859-1 -*-
-# 	$Id: ispellaff2myspell,v 1.29 2005/07/04 12:21:55 agmartin Exp $
-# 
-#   (C) 2002-2005 Agustin Martin Domingo <agustin.mar...@hispalinux.es> 
-# 
+# -*- coding: utf-8 -*-
+#
+#   (C) 2002-2013 Agustin Martin Domingo <agustin.mar...@hispalinux.es>
+#
 #    This program is free software; you can redistribute it and/or modify
 #    it under the terms of the GNU General Public License as published by
 #    the Free Software Foundation; either version 2 of the License, or
@@ -21,23 +20,23 @@
 
 sub usage {
     print "ispellaff2myspell: A program to convert ispell affix tables to myspell format
-(C) 2002-2005 Agustin Martin Domingo <agustin.martin\@hispalinux.es>         License: GPL
+(C) 2002-2013 Agustin Martin Domingo <agustin.martin\@hispalinux.es>         License: GPL2+
 
 Usage:
 	ispellaff2myspell [options] <affixfile>
 
       Options:
 	--affixfile=s      Affix file
-	--bylocale         Use current locale setup for upper/lowercase 
+	--bylocale         Use current locale setup for upper/lowercase
                            conversion
-	--charset=s        Use specified charset for upper/lowercase 
+	--charset=s        Use specified charset for upper/lowercase
                            conversion (defaults to latin1)
  	--debug            Print debugging info
  	--extraflags       Allow some non alphabetic flags
 	--lowercase=s      Lowercase string
         --myheader=s       Header file
-	--printcomments    Print commented lines in output 
-        --replacements=s   Replacements file 
+	--printcomments    Print commented lines in output
+        --replacements=s   Replacements file
         --split=i          Split flags with more that i entries
 	--uppercase=s      Uppercase string
 	--wordlist=s       Still unused
@@ -62,7 +61,7 @@ sub debugprint {
 
 sub shipoutflag{
     my $flag_entries=scalar @flag_array;
-	
+
     if ( $flag_entries != 0 ){
 	if ( $split ){
 	    while ( @flag_array ){
@@ -92,23 +91,23 @@ sub mylc{
     my $outputstring;
 
     if ( $bylocale ){
-	{ 
+	{
 	    use locale;
 	    $outputstring =  lc $inputstring;
 	}
     } else {
 	if ( $charset eq "latin0" ){
-	    $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ½¨¸';
-	    $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ¼¦´';
+	    $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376\275\250\270';
+	    $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336\274\246\264';
 	} elsif ( $charset eq "latin1" ){
-	    $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
-	    $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+	    $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
 	} elsif ( $charset eq "latin2" ){
-	    $lowercase='a-z±³µ¶¹º»¼¾¿àáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
-	    $uppercase='A-Z¡£¥¦©ª«¬®¯ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
+	    $lowercase='a-z\261\263\265\266\271\272\273\274\276\277\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\241\243\245\246\251\252\253\254\256\257\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
 	} elsif ( $charset eq "latin3" ){
-	    $lowercase='a-z±¶¹º»¼¿àáâäåæçèéêëìíîïñòóôõö÷øùúûüýþ';
-	    $uppercase='A-Z¡¦©ª«¬¯ÀÁÂÄÅÆÇÈÉÊËÌÍÎÏÑÒÓÔÕÖ×ØÙÚÛÜÝÞ';
+	    $lowercase='a-z\261\266\271\272\273\274\277\340\341\342\344\345\346\347\350\351\352\353\354\355\356\357\361\362\363\364\365\366\367\370\371\372\373\374\375\376';
+	    $uppercase='A-Z\241\246\251\252\253\254\257\300\301\302\304\305\306\307\310\311\312\313\314\315\316\317\321\322\323\324\325\326\327\330\331\332\333\334\335\336';
 #	} elsif ( $charset eq "other_charset" ){
 #	    die "latin2 still unimplemented";
 	} else {
@@ -116,7 +115,7 @@ sub mylc{
 		die "Unsupported charset [$charset]
 
 use explicitly --lowercase=string and --uppercase=string
-options. Remember that both string must match exactly, but 
+options. Remember that both string must match exactly, but
 case changed.
 ";
 	    }
@@ -136,17 +135,17 @@ sub validate_flag (){
 	    if ($flag =~ m/^$_/){
 		$flag =~ s/^$_//;
 		return $flag;
-	    } 
+	    }
 	}
-    } 
+    }
     return '';
 }
 
 sub process_replacements{
     my $file = shift;
     my @replaces = ();
-    
-    open (REPLACE,"< $file") || 
+
+    open (REPLACE,"< $file") ||
 	die "Error: Could not open replacements file: $file\n";
     while (<REPLACE>){
 	next unless m/^REP[\s\t]*\D.*/;
@@ -178,7 +177,7 @@ $debug         = '';
 $lowercase     = '';
 $myheader      = '';
 $printcomments = '';
-$replacements  = ''; 
+$replacements  = '';
 $split         = '';
 $uppercase     = '';
 $wordlist      = '';
@@ -218,7 +217,7 @@ if ( not $affixfile ){
 
 if ( $charset and ( $lowercase or $uppercase )){
     die "Error: charset and lowercase/uppercase options
-are incompatible. Use either charset or lowercase/uppercase options to 
+are incompatible. Use either charset or lowercase/uppercase options to
 specify the patterns
 "
 } elsif ( not $lowercase and not $uppercase and not $charset ){
@@ -231,7 +230,7 @@ if ( scalar(keys %theextraflags) == 0 && $hasextraflags ){
 
 debugprint "$affixfile $charset";
 
-open (AFFIXFILE,"< $affixfile") || 
+open (AFFIXFILE,"< $affixfile") ||
     die "Error: Could not open affix file: $affixfile";
 
 if ( $myheader ){
@@ -259,7 +258,7 @@ while (<AFFIXFILE>){
 	s/^[\s\t]*flag[\s\t]*//;
 	s/[\s\t]*:.*$//;
 	debugprint "Found flag $_ in line $.\n";
-	
+
 	if (/\*/){
 	    s/[\*\s]//g;
 	    $flagcombine="Y";
@@ -267,7 +266,7 @@ while (<AFFIXFILE>){
 	} else {
 	    $flagcombine="N";
 	}
-	
+
 	if ( $flagname = &validate_flag($_) ){
 	    $myaffix  = $affix;
 	} else {
@@ -278,11 +277,11 @@ while (<AFFIXFILE>){
     } elsif ( $affix and $inflags ) {
 	($rootname,@comments)   =  split('#',$_);
 	$comment                =  '# ' . join('#',@comments);
-	
+
 	$rootname               =~ s/\s*//g;
 	$rootname               =  mylc $rootname;
 	($rootname,$addtoroot)  =  split('>',$rootname);
-	
+
 	if ( $addtoroot =~ s/^\-//g ){
 	    ($rootremove,$addtoroot)  = split(',',$addtoroot);
 	    $addtoroot                = "0" unless $addtoroot;
@@ -295,15 +294,15 @@ while (<AFFIXFILE>){
 	if ( $rootname eq '.' && $rootremove ne "0" ){
 	    $rootname = $rootremove;
 	}
-	
+
 	debugprint "$rootname, $addtoroot, $rootremove\n";
 	if ( $printcomments ){
 	    $affix_line=sprintf("%s %s   %-5s %-11s %-24s %s",
-				$myaffix, $flagname, $rootremove, 
+				$myaffix, $flagname, $rootremove,
 				$addtoroot, $rootname, $comment);
 	} else {
 	    $affix_line=sprintf("%s %s   %-5s %-11s %s",
-				$myaffix, $flagname, $rootremove, 
+				$myaffix, $flagname, $rootremove,
 				$addtoroot, $rootname);
 	}
 	$rootremove = "0";
@@ -340,23 +339,23 @@ B<ispellaff2myspell> - A program to convert ispell affix tables to myspell forma
    Options:
 
     --affixfile=s      Affix file
-    --bylocale         Use current locale setup for upper/lowercase 
+    --bylocale         Use current locale setup for upper/lowercase
                        conversion
-    --charset=s        Use specified charset for upper/lowercase 
+    --charset=s        Use specified charset for upper/lowercase
                        conversion (defaults to latin1)
     --debug            Print debugging info
     --extraflags=s     Allow some non alphabetic flags
     --lowercase=s      Lowercase string
-    --myheader=s       Header file 
-    --printcomments    Print commented lines in output 
-    --replacements=s   Replacements file 
+    --myheader=s       Header file
+    --printcomments    Print commented lines in output
+    --replacements=s   Replacements file
     --split=i          Split flags with more that i entries
     --uppercase=s      Uppercase string
 
 =head1 DESCRIPTION
 
-B<ispellaff2myspell> is a script that will convert ispell affix tables 
-to myspell format in a more or less successful way. 
+B<ispellaff2myspell> is a script that will convert ispell affix tables
+to myspell format in a more or less successful way.
 
 This script does not create the dict file. Something like
 
@@ -368,85 +367,91 @@ should do the work, with mydict.words+ being the munched wordlist
 
 =over 8
 
-=item B<--affixfile=s>  
+=item B<--affixfile=s>
 
 Affix file. You can put it directly in the command line.
 
-=item B<--bylocale> 
+=item B<--bylocale>
 
-Use current locale setup for upper/lowercase conversion. Make sure 
-that the selected locale match the dictionary one, or you might get 
+Use current locale setup for upper/lowercase conversion. Make sure
+that the selected locale match the dictionary one, or you might get
 into trouble.
 
-=item B<--charset=s>        
+=item B<--charset=s>
 
-Use specified charset for upper/lowercase conversion (defaults to latin1). 
+Use specified charset for upper/lowercase conversion (defaults to latin1).
 Currently allowed values for charset are: latin0, latin1, latin2, latin3.
 
-=item B<--debug>            
+=item B<--debug>
 
 Print some debugging info.
 
-=item B<--extraflags:s>       
+=item B<--extraflags:s>
 
-Allows some non alphabetic flags. 
+Allows some non alphabetic flags.
 
-When invoked with no value the supported flags are currently those 
-corresponding to chars represented with the escape char B<\> as 
+When invoked with no value the supported flags are currently those
+corresponding to chars represented with the escape char B<\> as
 first char. B<\> will be stripped.
 
-When given with the flag prefix will allow that flag and strip the 
-given prefix. Be careful when giving the prefix to properly escape chars, 
-e.g. you will need B<-e "\\\\"> or B<-e '\\'> for flags like B<\[> to be stripped to 
-B<[>. Otherwise you might even get errors. Use B<-e "^"> to allow all 
+When given with the flag prefix will allow that flag and strip the
+given prefix. Be careful when giving the prefix to properly escape chars,
+e.g. you will need B<-e "\\\\"> or B<-e '\\'> for flags like B<\[> to be stripped to
+B<[>. Otherwise you might even get errors. Use B<-e "^"> to allow all
 flags and pass them unmodified.
 
-You will need a call to -e for each flag type, e.g., 
-B<-e "\\\\" -e "~\\\\"> (or B<-e '\\' -e '~\\'>). 
+You will need a call to -e for each flag type, e.g.,
+B<-e "\\\\" -e "~\\\\"> (or B<-e '\\' -e '~\\'>).
 
-When a prefix is explicitly set, the default value (anything starting by B<\>) 
+When a prefix is explicitly set, the default value (anything starting by B<\>)
 is disabled and you need to enable it explicitly as in previous example.
 
-=item B<--lowercase=s>      
+=item B<--lowercase=s>
 
-Lowercase string. Manually set the string of lowercase chars. This 
+Lowercase string. Manually set the string of lowercase chars. This
 requires B<--uppercase> having exactly that string but uppercase.
- 
-=item B<--myheader=s>       
 
-Header file. The myspell aff header. You need to write it 
+=item B<--myheader=s>
+
+Header file. The myspell aff header. You need to write it
 manually. This can contain everything you want to be before the affix table
 
-=item B<--printcomments>    
+=item B<--printcomments>
 
 Print commented lines in output.
 
-=item B<--replacements=file>      
+=item B<--replacements=file>
 
 Add a pre-defined replacements table taken from 'file' to the .aff file.
 Will skip lines not beginning with REP, and set the replacements number
 appropriately.
 
-=item B<--split=i>          
+=item B<--split=i>
 
-Split flags with more that i entries. This can be of interest for flags 
-having a lot of entries. Will split the flag in chunks containing B<i> 
+Split flags with more that i entries. This can be of interest for flags
+having a lot of entries. Will split the flag in chunks containing B<i>
 entries.
 
-=item B<--uppercase=s>      
+=item B<--uppercase=s>
 
-Uppercase string. Manually set the sring of uppercase chars. This 
+Uppercase string. Manually set the sring of uppercase chars. This
 requires B<--lowercase> having exactly that string but lowercase.
 
 =back
 
-If your encoding is currently unsupported you can send me a file with 
-the two strings of lower and uppercase chars. Note that they must match 
-exactly but case changed. It will look something like
+If your encoding is currently unsupported you can send me a separate file
+with the two strings of lower and uppercase chars. Note that they must
+match exactly but case changed. It will look something like
 
   $lowercase='a-zàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþ';
   $uppercase='A-ZÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞ';
 
+A safer alternative against accidental recoding is to use octal codes for
+non 7bit chars. Above strings would then look like
+
+  $lowercase='a-z\340\341\342\343\344\345\346\347\350\351\352\353\354\355\356\357\360\361\362\363\364\365\366\370\371\372\373\374\375\376';
+  $uppercase='A-Z\300\301\302\303\304\305\306\307\310\311\312\313\314\315\316\317\320\321\322\323\324\325\326\330\331\332\333\334\335\336';
+
 =head1 SEE ALSO
 
 The OpenOffice.org Lingucomponent Project home page
@@ -459,7 +464,7 @@ L<http://lingucomponent.openoffice.org/affix.readme>
 
 that provides information about the basics of the myspell affix file format.
 
-You can also take a look at 
+You can also take a look at
 
  /usr/share/doc/libmyspell-dev/affix.readme.gz
  /usr/share/doc/libmyspell-dev/README.compoundwords
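
The octal-escape approach used in the patch can be illustrated with a small
standalone Perl sketch. It is not taken from ispellaff2myspell: the helper
name lc_latin1 is hypothetical, and the patch does not show how the script
consumes $lowercase and $uppercase, so the sketch assumes they end up in a
tr/// statement built through a string eval. It also uses ranges instead of
the full character lists from the patch, for brevity.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of ispellaff2myspell): lowercase a
# Latin-1 byte string with tr/// lists written as 7-bit ASCII.
sub lc_latin1 {
    my ($input) = @_;

    # Single quotes keep the backslashes literal, so this file holds
    # only 7-bit characters whatever encoding it is saved in.
    # \327 and \367 (multiplication and division signs) are skipped.
    my $lowercase = 'a-z\340-\366\370-\376';
    my $uppercase = 'A-Z\300-\326\330-\336';

    # tr/// does not interpolate variables, so the statement is built
    # as a string and compiled with eval. tr/// then interprets each
    # octal escape as a single byte.
    my $output = $input;
    eval "\$output =~ tr/$uppercase/$lowercase/; 1"
        or die "tr failed: $@";
    return $output;
}

# "\311COLE" is the Latin-1 byte sequence for "ECOLE" with an acute
# accent on the first letter.
my $result = lc_latin1("\311COLE");

# Print the resulting byte values, separated by dots.
printf "%vd\n", $result;
```

Because the strings contain no bytes above 127, re-saving the script as
UTF-8 (or any other ASCII-compatible encoding) does not alter the
character lists, which appears to be the motivation given in the changelog
entry.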
