Control: tag -1 + patch

On Sun, Dec 07, 2014 at 09:33:26AM +0100, Jean-Francois Dockes wrote:
> Agustin Martin writes:
>  > 2014-12-06 16:46 GMT+01:00 Jean-Francois Dockes <j...@dockes.org>:
>  > > Package: aspell
>  > > Version: 0.60.7~20110707-1.3
>  > > Severity: important
>  > >
>  > > Example run:
>  > >
>  > > vj64$ /usr/bin/aspell --lang=en --encoding=utf-8 create master 
> /home/dockes/.recoll/aspdict.en.rws
>  > > Error: The language "en" is not known. This is probably because: the 
> file "/usr/share/aspell/en.dat" can not be opened for reading.
>  > >
>  > > Indeed /usr/share/aspell/en.dat does not exist, but 
> /usr/lib/aspell/en.dat
>  > > does.
>  > >
>  > > Specifying --data-dir does not work either:
>  > >
>  > > vj64$ /usr/bin/aspell --data-dir=/usr/lib/aspell --lang=en 
> --encoding=utf-8 create master /home/dockes/.recoll/aspdict.en.rws
>  > > Error: The file "/usr/lib/aspell/iso-8859-1.cset" can not be opened for 
> reading.
>  > 
>  > Does  --local-data-dir work? aspell-autobuildhash creates hashes with
>  > something like
>  > 
>  > zcat /usr/share/aspell/es.cwl.gz | precat | aspell
>  > --per-conf=/dev/null  --local-data-dir=/usr/lib/aspell --lang=es
>  > create master /var/lib/aspell/es.rws
>  > 
>  > and works smoothly.
> 
> Yes, this works, but it was not necessary on Wheezy (or any other system on
> which Recoll runs, including non-Linux ones).
> 
> If some reason lead to changing the location of certain aspell data files,
> it would seem reasonable that a corresponding change of the compiled-in
> defaults would let the command run without special arguments as it did in
> the past.

This change was related to semi-multiarch implementation. A new option was
added to aspell to allow a dictionaries location different from libdir. At
the same time global aspell data files were moved to /usr/share/aspell,
mostly for cosmetic reasons. 

This did not affect usual aspell dict packages use nor creation by
aspell-autobuildhash (I wonder why I added back in 2006 the --local-data-dir
option to dictionaries-common aspell-autobuildhash, may be only a single
location was looked for at that time). Seems that unfortunately this
particular case did not show up when doing the multiarch tests.

I have looked at aspell code and noticed that it looks for datafiles in two
locations,

a) local-data-dir or master-path. When checking, master-path is the location
   for multi or .rws files, thus in Debian data files are looked for in
   dict-dir (/usr/lib/aspell) unless explicit local-data-dir or local dict
   is used. This makes spellchecking work with the changed structure. 

   In your case, master-path is current dir, thus data files are not found
   there.

b) data-dir. The usual data-dir.

I see two approaches to this problem.

1) Make sure data files are also searched for under dict-dir. This requires
   a number of changes in aspell code to handle 3 search dirs instead of 2.
   I have been doing some changes for this, but this is probably more for
   upstream.

2) Partially revert the changes, keep independent multiarch lib-dir, but
   join data-dir and dict-dir into /usr/lib/aspell. This is a far simpler
   change, but will also need testing.

   This has another advantage. Noticed that aspell dicts building directly
   from aspell dict sources will install dat files under data-dir and dicts
   stuff under dict-dir, currently different locations. 

   I think is better to keep all (but autogenerated) stuff for a given dict
   together.

Brian, what do you think? 

I think the second option is better for Debian, but needs testing, a lot of
testing to make sure corner cases work. The problem is that we are in
freeze, the usual sid-> testing mechanism is disabled and only fixes for RC
bugs will reach jessie. 

https://release.debian.org/jessie/freeze_policy.html

And we should not upload to unstable, something RC might need care and would
interfere. Although nasty, I do not think this problem is RC, debian aspell
dicts build and work and there is the temporary --local-data-dir hack (or
using symlinks to xx.dat from current dict build dir). I think we should go
with 2) once jessie is released.

I am attaching a couple of patches with a first cut for both approaches,

1) 0001-quilt-patch_experimental-3-dirs-data-search.diff: diff for a quilt
   patch trying to implemnent 1)
2) 0001-debian-rules-Revert-dict-data-split-while-keeping-mu.patch. Changes
   needed for 2)

They are minimaly tested, but 1) is probably incomplete.

What do you think, Brian?

Regards,

-- 
Agustin
diff --git a/debian/patches/1000_extra-data-searchpath.diff b/debian/patches/1000_extra-data-searchpath.diff
new file mode 100644
index 0000000..1e87800
--- /dev/null
+++ b/debian/patches/1000_extra-data-searchpath.diff
@@ -0,0 +1,229 @@
+--- a/modules/speller/default/language.cpp
++++ b/modules/speller/default/language.cpp
+@@ -84,10 +84,10 @@
+     // get_lang_info
+     //
+ 
+-    String dir1,dir2,path;
++    String dir1,dir2,dir3,path;
+ 
+-    fill_data_dir(config, dir1, dir2);
+-    dir_ = find_file(path,dir1,dir2,lang,".dat");
++    fill_data_dir(config, dir1, dir2, dir3);
++    dir_ = find_file(path,dir1,dir2,dir3,lang,".dat");
+ 
+     lang_config_ = 
+       new Config("speller-lang",
+@@ -123,7 +123,7 @@
+   
+     FStream char_data;
+     String char_data_name;
+-    find_file(char_data_name,dir1,dir2,charset_,".cset");
++    find_file(char_data_name,dir1,dir2,dir3,charset_,".cset");
+     RET_ON_ERR(char_data.open(char_data_name, "r"));
+     
+     String temp;
+@@ -209,7 +209,8 @@
+ 
+     if (charmap_ != charset_) {
+       if (file_exists(dir1 + charset_ + ".cmap") || 
+-          file_exists(dir2 + charset_ + ".cmap"))
++          file_exists(dir2 + charset_ + ".cmap") ||
++          file_exists(dir3 + charset_ + ".cmap"))
+       {
+         charmap_ = charset_;
+       } else if (data_encoding_ == charset_) {
+@@ -288,7 +289,7 @@
+ 
+       String repl_file;
+       FStream REPL;
+-      find_file(repl_file, dir1, dir2, repl, "_repl", ".dat");
++      find_file(repl_file, dir1, dir2, dir3,repl, "_repl", ".dat");
+       RET_ON_ERR(REPL.open(repl_file, "r"));
+       
+       size_t num_repl = 0;
+@@ -596,9 +597,9 @@
+   {
+     String lang = c.retrieve("lang");
+ 
+-    String dir1,dir2,path;
+-    fill_data_dir(&c, dir1, dir2);
+-    String dir = find_file(path,dir1,dir2,lang,".dat");
++    String dir1,dir2,dir3,path;
++    fill_data_dir(&c, dir1, dir2,dir3);
++    String dir = find_file(path,dir1,dir2,dir3,lang,".dat");
+ 
+     String file;
+     file += dir;
+@@ -616,13 +617,13 @@
+     String l_data = c.retrieve("lang");
+     char * l = l_data.mstr();
+ 
+-    String dir1,dir2,path;
+-    fill_data_dir(&c, dir1, dir2);
++    String dir1,dir2,dir3,path;
++    fill_data_dir(&c, dir1, dir2,dir3);
+ 
+     char * s = l + strlen(l);
+ 
+     while (s > l) {
+-      find_file(path,dir1,dir2,l,".dat");
++      find_file(path,dir1,dir2,dir3,l,".dat");
+       if (file_exists(path)) {
+         c.replace_internal("actual-lang", l);
+         return true;
+--- a/modules/speller/default/typo_editdist.cpp
++++ b/modules/speller/default/typo_editdist.cpp
+@@ -96,9 +96,9 @@
+   TypoEditDistanceInfo::get_new(const char * kb, const Config * cfg, const Language * l)
+   {
+     FStream in;
+-    String file, dir1, dir2;
+-    fill_data_dir(cfg, dir1, dir2);
+-    find_file(file, dir1, dir2, kb, ".kbd");
++    String file, dir1, dir2,dir3;
++    fill_data_dir(cfg, dir1, dir2, dir3);
++    find_file(file, dir1, dir2, dir3, kb, ".kbd");
+     RET_ON_ERR(in.open(file.c_str(), "r"));
+ 
+     ConvEC iconv;
+--- a/common/file_data_util.cpp
++++ b/common/file_data_util.cpp
+@@ -11,7 +11,8 @@
+   // FIXME: The case when there is no "/" in the master-path should not
+   //   happen since it is an internal option.  Unofficially, it can still
+   //   be set by the user.  This needs to eventually be fixed.
+-  void fill_data_dir(const Config * config, String & dir1, String & dir2) {
++  void fill_data_dir(const Config * config,
++		     String & dir1, String & dir2, String & dir3) {
+     if (config->have("local-data-dir")) {
+       dir1 = config->retrieve("local-data-dir");
+       if (dir1[dir1.size()-1] != '/') dir1 += '/';
+@@ -23,22 +24,26 @@
+       else
+         dir1 = "./";
+     }
+-    dir2 = config->retrieve("data-dir");
++    dir2 = config->retrieve("dict-dir");
+     if (dir2[dir2.size()-1] != '/') dir2 += '/';
++    dir3 = config->retrieve("data-dir");
++    if (dir3[dir3.size()-1] != '/') dir3 += '/';
+   }
+   
+   const String & find_file(String & file,
+-                           const String & dir1, const String & dir2, 
++                           const String & dir1, const String & dir2, const String & dir3,
+                            const String & name, const char * extension)
+   {
+     file = dir1 + name + extension;
+     if (file_exists(file)) return dir1;
+     file = dir2 + name + extension;
+-    return dir2;
++    if (file_exists(file)) return dir2;
++    file = dir3 + name + extension;
++    return dir3;
+   }
+ 
+   bool find_file(String & file,
+-                 const String & dir1, const String & dir2, 
++                 const String & dir1, const String & dir2, const String & dir3,
+                  const String & name, 
+                  ParmString preext, ParmString ext)
+   {
+@@ -52,18 +57,25 @@
+       if (file_exists(file)) return true;
+       file = dir2 + n;
+       if (file_exists(file)) return true;
++      file = dir3 + n;
++      if (file_exists(file)) return true;
+ 
+       n = name; n += ext;
+       file = dir1 + n;
+       if (file_exists(file)) return true;
+       file = dir2 + n;
+       if (file_exists(file)) return true;
++      file = dir3 + n;
++      if (file_exists(file)) return true;
+     }
+ 
+     file = dir1 + name;
+     if (file_exists(file)) return true;
+     file = dir2 + name;
+     if (file_exists(file)) return true;
++    file = dir3 + name;
++    if (file_exists(file)) return true;
++
+ 
+     if (try_name_only) {file = dir1 + name;}
+     else               {file = dir1 + name; file += preext; file += ext;}
+--- a/common/file_data_util.hpp
++++ b/common/file_data_util.hpp
+@@ -7,12 +7,14 @@
+ 
+ namespace acommon {
+ 
+-  void fill_data_dir(const Config *, String & dir1, String & dir2);
++  void fill_data_dir(const Config *,
++		     String & dir1, String & dir2, String & dir3);
+   const String & find_file(String & path,
+-                           const String & dir1, const String & dir2, 
++                           const String & dir1,
++			   const String & dir2, const String & dir3,
+                            const String & name, const char * extension);
+   bool find_file(String & file,
+-                 const String & dir1, const String & dir2, 
++                 const String & dir1, const String & dir2, const String & dir3,
+                  const String & name, 
+                  ParmString preext, ParmString ext);
+ }
+--- a/common/convert.cpp
++++ b/common/convert.cpp
+@@ -292,9 +292,9 @@
+     to.reset();
+     from.reset();
+     
+-    String dir1,dir2,file_name;
+-    fill_data_dir(&config, dir1, dir2);
+-    find_file(file_name,dir1,dir2,encoding,".cset");
++    String dir1,dir2,dir3,file_name;
++    fill_data_dir(&config, dir1, dir2, dir3);
++    find_file(file_name,dir1,dir2,dir3,encoding,".cset");
+ 
+     FStream data;
+     PosibErrBase err = data.open(file_name, "r");
+@@ -486,9 +486,9 @@
+   PosibErr<NormTables *> NormTables::get_new(const String & encoding, 
+                                              const Config * config)
+   {
+-    String dir1,dir2,file_name;
+-    fill_data_dir(config, dir1, dir2);
+-    find_file(file_name,dir1,dir2,encoding,".cmap");
++    String dir1,dir2,dir3,file_name;
++    fill_data_dir(config, dir1, dir2,dir3);
++    find_file(file_name,dir1,dir2,dir3,encoding,".cmap");
+     
+     FStream in;
+     PosibErrBase err = in.open(file_name, "r");
+@@ -938,13 +938,18 @@
+     if (strcmp(enc, "utf-8") == 0 
+         || strcmp(enc, "ucs-2") == 0 
+         || strcmp(enc, "ucs-4") == 0) return false;
+-    String dir1,dir2,file_name;
+-    fill_data_dir(&c, dir1, dir2);
++    String dir1,dir2,dir3,file_name;
++    fill_data_dir(&c, dir1, dir2,dir3);
+     file_name << dir1 << enc << ".cset";
+     if (file_exists(file_name)) return false;
+-    if (dir1 == dir2) return true;
++    if (dir1 != dir2){
++      file_name.clear();
++      file_name << dir2 << enc << ".cset";
++      if (file_exists(file_name)) return false;
++    }
++    if (dir1 == dir3 || dir2 == dir3) return true;
+     file_name.clear();
+-    file_name << dir2 << enc << ".cset";
++    file_name << dir3 << enc << ".cset";
+     return !file_exists(file_name);
+   }
+ 
diff --git a/debian/patches/series b/debian/patches/series
index f2678df..a36d714 100644
--- a/debian/patches/series
+++ b/debian/patches/series
@@ -7,3 +7,4 @@
 #10_autotools.diff
 08_filters-info-installdir.diff
 09_debian-dictdir.diff
+1000_extra-data-searchpath.diff
>From 015e9fb8df3bfea5a6e90e7fd063b7aa1f549c2e Mon Sep 17 00:00:00 2001
From: Agustin Martin Domingo <agmar...@debian.org>
Date: Tue, 9 Dec 2014 12:43:27 +0100
Subject: [PATCH] debian/rules: Revert dict/data split while keeping multiarch
 pkglibdir (#772415)

---
 debian/libaspell15.install | 2 +-
 debian/rules               | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/debian/libaspell15.install b/debian/libaspell15.install
index cd49c5f..e8b8bc4 100644
--- a/debian/libaspell15.install
+++ b/debian/libaspell15.install
@@ -1,4 +1,4 @@
 usr/lib/*/libaspell.so.*
 usr/lib/*/libpspell.so.*
 usr/lib/aspell
-usr/share/aspell
+
diff --git a/debian/rules b/debian/rules
index 6b0f9b2..d8d1a7b 100755
--- a/debian/rules
+++ b/debian/rules
@@ -14,7 +14,7 @@ DEB_DH_MAKESHLIBS_ARGS_ALL := -V'libaspell15 (>= 0.60.7~20110707)' -Xusr/lib/asp
 
 DEB_DH_INSTALL_SOURCEDIR   := debian/tmp
 DEB_INSTALL_CHANGELOGS_ALL := ChangeLog.html
-DEB_CONFIGURE_EXTRA_FLAGS  := --enable-pkgdatadir=/usr/share/aspell \
+DEB_CONFIGURE_EXTRA_FLAGS  := --enable-pkgdatadir=/usr/lib/aspell \
 	                      --enable-pkglibdir=/usr/lib/aspell/$(DEB_HOST_MULTIARCH) \
 			      --enable-debian-dict-dir=/usr/lib/aspell \
 			      --libdir=\$${prefix}/lib/$(DEB_HOST_MULTIARCH) \
-- 
2.1.3

Reply via email to