Re: Bug#935128: aspell: potentially unbounded buffer over-read in GNU Aspell 0.60.*

2019-08-28 Thread Agustin Martin
On Mon, Aug 19, 2019 at 04:33:40PM -0400, Kevin Atkinson wrote:
> On Mon, 19 Aug 2019, Salvatore Bonaccorso wrote:
>
> > See https://lists.gnu.org/archive/html/aspell-announce/2019-08/msg0.html
>
> > Within Debian the "pumpa" will need an update. Others might be
> > required as well. Kevin Atkinson might be up for help if needed.
> Also see http://aspell.net/buffer-overread-ucs.txt for a slightly improved
> version of the announcement that I edited for clarity.

Hi all,

This message is sent to all packages that depend in some way on
libaspell15 (pdo addresses bcc'ed)

A potentially unbounded buffer over-read has been found in GNU
Aspell 0.60.*. Package aspell 0.60.7-1 has been uploaded to Debian
experimental, including the upstream patch to deal with this problem.

Unfortunately this fix may break applications that use null-terminated
UCS-2 or UCS-4 strings with the C API.  These applications will need
to be fixed to make use of the new more secure API in order to
continue to have a functional spell checker.

Most applications use UTF-8 strings and thus do not need to be fixed.

Please read http://aspell.net/buffer-overread-ucs.txt (and the
original announcement at
https://lists.gnu.org/archive/html/aspell-announce/2019-08/msg0.html)
for details and check whether your package is affected. That file and
the new aspell manual explain what to do if it is.
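
As a rough first pass when checking a package, one can search its
unpacked source for the UCS encoding names. This is only a sketch: the
function name find_ucs_mentions and the list of file extensions are
made up for illustration, and it assumes that an affected program
mentions "ucs-2" or "ucs-4" somewhere in its C or C++ sources. A hit
only means the code deserves a closer look; no hit does not prove the
package is safe.

```sh
# Heuristic only: list C/C++ source files that mention a UCS-2 or UCS-4
# encoding name (case-insensitive, with or without the hyphen).
# find_ucs_mentions is a hypothetical helper, not part of any package.
find_ucs_mentions() {
    # $1: top of an unpacked source tree (defaults to the current directory)
    find "${1:-.}" -type f \
        \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' -o -name '*.h' \) \
        -exec grep -liE 'ucs-?[24]' {} +
}
```

Called as "find_ucs_mentions path/to/source", it prints one file name
per line. Bindings written in other languages would need their own
extensions added to the list.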

I would like to leave the aspell package in experimental for one week
to allow possibly affected packages to be checked and fixed if
appropriate. Since there is no longer a dict-common-dev mailing list,
please use this bug report to tell us if your package is affected and
if you need more time before the new aspell with that fix is uploaded
to sid. If you need additional help, please contact the aspell-devel
mailing list (https://lists.gnu.org/mailman/listinfo/aspell-devel).

Regards,



Re: Bug#935128: aspell: potentially unbounded buffer over-read in GNU Aspell 0.60.*

2019-08-29 Thread Agustin Martin
On Wed, Aug 28, 2019 at 07:32:35PM -0400, Kevin Atkinson wrote:
> On Thu, 29 Aug 2019, Agustin Martin wrote:
> 
> > This message is sent to all packages that depend in some way on
> > libaspell15 (pdo addresses bcc'ed)
> > 
> > A potentially unbounded buffer over-read has been found in GNU
> > Aspell 0.60.*. Package aspell 0.60.7-1 has been uploaded to Debian
> > experimental, including upstream patch to deal with this problem.
> 
> It looks like you just applied the patches from Git.  This will not work
> with a release as Aspell uses a lot of generated source files which are not
> checked into git.  You need to run 'maintainer/autogen' to update them after
> applying the patch. Assuming the normal Debian build process rebuilds the
> automake/conf related bits then you can likely get away with just doing a:
> 
>   cd auto/
>   perl -I ./ mk-src.pl
>   perl -I ./ mk-doc.pl
>   touch auto
>   cd ..

Thanks a lot for the info,

aspell 0.60.7-2 has just been uploaded to Debian experimental. Builds for
the different arches should start soon.

> There are some tests in test/.  They are not very expensive and will make
> sure that Aspell is correctly patched with the new interface intended for
> working with wide characters.  You should be able to run the tests by doing a
> 
>   make -C test

Unfortunately, this seems to need more than just the two git patches to work
with plain 0.60.7 (only part of test/ is created): an updated test dir,
the aspell filter command and some new filters. I will try to extract the
relevant patches and try again.

Regards,

-- 
Agustin



Re: Bug#935128: Packages potentially affected by unbounded buffer over-read in GNU Aspell 0.60.*

2019-08-30 Thread Agustin Martin
On Thu, Aug 29, 2019 at 12:20:28AM +0200, Agustin Martin wrote:
> On Mon, Aug 19, 2019 at 04:33:40PM -0400, Kevin Atkinson wrote:
> > On Mon, 19 Aug 2019, Salvatore Bonaccorso wrote:
> >
> > > See 
> > > https://lists.gnu.org/archive/html/aspell-announce/2019-08/msg0.html
> >
> > > Within Debian the "pumpa" will need an update. Others might be
> > > required as well. Kevin Atkinson might be up for help if needed.
> > Also see http://aspell.net/buffer-overread-ucs.txt for a slightly improved
> > version of the announcement that I edited for clarity.
> 
> Hi all,
> 
> This message is sent to all packages that depend in some way on
> libaspell15 (pdo addresses bcc'ed)
> 
> A potentially unbounded buffer over-read has been found in GNU
> Aspell 0.60.*. Package aspell 0.60.7-1 has been uploaded to Debian
> experimental, including upstream patch to deal with this problem.
> 
> Unfortunately this fix may break applications that use null-terminated
> UCS-2 or UCS-4 strings with the C API.  These applications will need
> to be fixed to make use of the new more secure API in order to
> continue to have a functional spell checker.

This is the list of non-aspell packages depending on libaspell15 that
are possibly affected (maintainers bcc'ed):

 eiskaltdcpp-qt
 enchant
 gnustep-gui-runtime
 inkscape
 kdelibs5-plugins
 libenchant1c2a
 libenchant2
 libenchant-voikko
 librcc0
 libtext-aspell-perl
 mcabber
 php7.3-pspell
 pumpa
 raspell
 sonnet-plugins
 tea
 weechat-plugins
 xmlcopyeditor
 yagf

-- 
Agustin



Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-10-28 Thread Agustin Martin
On Tue, 25 Oct 2022 at 20:43, Soren Stoutner wrote:
>
> While we wait for answers as to whether these dictionaries can be used by the
> Chromium package and how they might possibly be integrated with upstream
> Hunspell, I would recommend that we move forward with packaging them in /usr/
> share/hunspell-bdic.  This location provides flexibility for whatever ends up
> happening with upstream Hunspell and Chromium.
>
> The question at this point is if they should be generated at package creation
> or if they should be generated during install.  It appears that the majority
> leans towards generating them at package creation.  Is there anyone who feels
> strongly the other way?

Hi all,

I am not particularly happy about this (see details below), but it seems
we will have to package all these .bdic files because qtwebengine and
chromium use them. Since some .bdic files may fail to build, I would
prefer them to be generated during package creation, where it is
easier to leave them out if required. If done during package install,
I think everything should be handled from the qtwebengine package. In
this case some fine tuning can be done to improve efficiency (handling
symlinks better, regenerating only when a new version of a dict package
is installed or incompatibilities in qtwebengine hunspell appear, ...)
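
The "regenerate only when needed" idea can be sketched with a simple
timestamp comparison. This is a minimal illustration, not what any
Debian package does: bdic_is_stale is a made-up name, and it assumes
that file modification times are a good enough signal that the
dictionary sources changed.

```sh
# Succeed (exit status 0) when the .bdic is missing or older than either
# of its hunspell sources, i.e. when it should be regenerated.
# bdic_is_stale is a hypothetical helper; all paths come from the caller.
bdic_is_stale() {
    aff=$1; dic=$2; bdic=$3
    [ ! -e "$bdic" ] && return 0      # never generated
    [ "$aff" -nt "$bdic" ] && return 0 # .aff changed since last build
    [ "$dic" -nt "$bdic" ] && return 0 # .dic changed since last build
    return 1                           # up to date
}
```

A maintainer script could then wrap the conversion in
"if bdic_is_stale ...; then ...; fi". Detecting incompatibilities in the
embedded hunspell would need a different check, such as recording the
converter version next to the generated file.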

Why am I not that happy about these .bdic files? Looking at
https://chromium.googlesource.com/chromium/deps/hunspell_dictionaries/+/refs/heads/main/README.chromium
and
https://sites.google.com/a/chromium.org/dev/developers/how-tos/editing-the-spell-checking-dictionaries
the only reasons for this format seem to be support for delta files, where
"The .dic_delta files are used to add words which are not there in the
.dic files", and having everything in UTF-8. Correct me if I am wrong.

Packaging all possible hunspell dicts in .bdic format will in practice
not be useful to support delta files as originally intended, since
the original hunspell dicts will be used. A Debian maintainer could use
a delta file for Debian changes to poorly maintained dicts, but I think
that in this case they had better patch the original .dic file to make
the fix available to all hunspell users.

Another thing I do not like is having three copies of hunspell flying
around: the original hunspell lib and those embedded in chromium and
qtwebengine. This makes it harder to keep everything synced.

I agree that the best option would be to extend hunspell, but to support
.dic_delta files instead of changing it to use the bdic format. Part of
the code may even be reusable to support something like aspell .multi
files.

Regards,

-- 
Agustin








Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-11-13 Thread Agustin Martin
On Thu, 3 Nov 2022 at 23:33, Soren Stoutner wrote:
>
> On Friday, October 28, 2022 4:09:45 AM MST Agustin Martin wrote:
> > I am not particularly happy about this (see details below), but seems
> > we will have to package all these .bdic files because qtwebengine and
> > chromium use them. Since some .bdic may fail to build I would rather
> > prefer them to be generated during package creation, where it is
> > easier not to create them if required. If done during package install
> > I think everything should be handled from qtwebengine package. In this
> > case some fine tuning can be done to improve efficiency (handling
> > symlinks better, regenerate only when a new version of dict package is
> > installed or incompatibilities in qtwebengine hunspell appear, ...)
>
> I agree with you.  I am also unhappy that Chromium and QtWebEngine want to use
> a specialized file format instead of just using the standard Hunspell files.
> However, as much as I don’t like it, I also agree with you that the best thing
> Debian can do in the short term is to move forward with the packaging of these
> .bdic files while we wait to see if we can make any changes upstream.
>
> Given that nobody else responded to this question, I think there is a
> consensus that it is best to create the .bdic files during package creation.
>
> The next question that needs to be answered is if we should create new binary
> packages for the .bdic files or if we should ship them as part of the existing
> Hunspell language binary packages.  The opinions that have been expressed so
> far have run the gamut on both sides, but my sense is they lean a little
> towards shipping them in the existing Hunspell packages so as to not add 80+
> new packages to Debian that only contain a few files each.
>
> Is there anyone who feels strongly that they should not be shipped in the
> existing files?

Hi,

I am for the approach that causes as little annoyance as possible to
the Debian archive, and I think that is using current packages. This
way we do not bother ftpmasters with all these new packages that might
be temporary.

I would personally expect this to be temporary until someone with the
appropriate skills provides a patch to make qtwebengine use the system
hunspell in Debian (as has already been done for other libs in Debian
qtwebengine). I looked at the embedded hunspell code, but I am far
from having those skills, so I got no result.

Also note that https://github.com/sheremetyev/hunspell seems to be
based on a 10-year-old fork of hunspell. I hope the hunspell code in
chromium and qtwebengine is not 10 years old and that hunspell upstream
has been tracked for updates (at least for security updates). I have
done a quick comparison and they are not exactly the same, and not only
cosmetically, but I did not go further.

It is worth noting that even that 10-year-old code apparently has support
for the IGNORE flag, which is unsupported by the .bdic dicts. Fortunately,
it seems that not many dicts use that flag in
libreoffice-dictionaries:

libreoffice-dictionaries-7.4.2$ grep -r IGNORE *
dictionaries/bo/bo.aff:IGNORE ༵༷
dictionaries/ar/ar.aff:IGNORE ًٌٍَُِّْـٰ
dictionaries/uk_UA/uk_UA.aff:IGNORE ́
dictionaries/ckb/dictionaries/ckb.aff:IGNORE ًٌٍَُِّْـٰ١٢٣٤۴٥۵٦۶٧٨٩٠
dictionaries/hu_HU/hu_HU.aff:IGNORE ()]
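
A slightly stricter variant of the search above only reports .aff files
where IGNORE starts a line, so comments or other text that merely
mention the word are skipped. This is a sketch under that assumption;
list_ignore_affs is a made-up name.

```sh
# Print the .aff files under a directory that contain an IGNORE directive
# at the start of a line. list_ignore_affs is a hypothetical helper.
list_ignore_affs() {
    # $1: directory to scan (defaults to the current directory)
    find "${1:-.}" -type f -name '*.aff' \
        -exec grep -lE '^IGNORE[[:space:]]' {} + | sort
}
```

Run from the top of an unpacked libreoffice-dictionaries tree, it gives
the list of dictionaries that the .bdic conversion cannot handle because
of this flag.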



Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-12-06 Thread Agustin Martin
On Sun, 4 Dec 2022 at 4:54, Soren Stoutner wrote:
>
> I created an MR:
>
> https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/5
>
> Please review and make sure I haven’t missed anything or misrepresented the 
> consensus.

Merged.

Will wait some days for possible new comments.



Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-12-09 Thread Agustin Martin
On Tue, 6 Dec 2022 at 23:34, Agustin Martin wrote:
>
> On Sun, 4 Dec 2022 at 4:54, Soren Stoutner wrote:
> >
> > I created an MR:
> >
> > https://salsa.debian.org/debian/dictionaries-common/-/merge_requests/5
> >
> > Please review and make sure I haven’t missed anything or misrepresented the 
> > consensus.
>
> Merged.
>
> Will wait some days for possible new comments.

By the way, I have been playing with an old helper
(installdeb-myspell) shipped with dictionaries-common-dev to help with
these bdic files. First cut committed to salsa. Currently
installdeb-myspell will fail if no conversion tool is found.



Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-12-13 Thread Agustin Martin
On Tue, 13 Dec 2022 at 18:43, Soren Stoutner wrote:
>
> Can one of the Debian Qt/KDE maintainers weigh in on the feasibility of 
> either creating a meta package that depends on the most recent package that 
> includes qwebengine_convert_dict or creating an unversioned package that 
> installs qwebengine_convert_dict?  Also, either having 
> qwebengine_convert_dict being installed in an unversioned location or having 
> a symlink that is unversioned?  That would make it easier for Hunspell 
> language packages to build-depend on qwebengine_convert_dict and wouldn’t 
> require reworking all of those packages’ build scripts every time the version 
> of Qt in Debian changes.

I modified installdeb-myspell to look for both, with the qt6 version
preferred. In the policy document I mention that the qt5 version
exists, but discourage its use as it will disappear sooner. In
theory it could be useful for stable backports, but since the .bdic
files built in sid should be usable unchanged in stable there is no
real use for it.
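
The "look for both, prefer qt6" logic can be sketched as a search over an
ordered list of candidates. This is not the installdeb-myspell code:
first_executable is a made-up name, and the qt5 path in the example
call is an assumption (only the qt6 path appears in this thread).

```sh
# Print the first argument that is an executable file and succeed;
# fail if none is. The order of the arguments expresses the preference.
# first_executable is a hypothetical helper.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example call, kept as a comment so nothing runs when this is sourced.
# The qt5 location is an assumption and may differ on a real system:
#   first_executable /usr/lib/qt6/libexec/qwebengine_convert_dict \
#                    /usr/lib/qt5/bin/qwebengine_convert_dict
```

Failing with a non-zero status when nothing is found matches the current
behaviour described earlier, where the helper fails if no conversion
tool is available.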

> Regarding qwebengine_convert_dict expecting the .dic as a file entry, I am 
> not certain I understand what you are referring to.  This is how it builds on 
> my Debian testing system.  The .dic file must be in the same directory as the 
> .aff, but it isn’t specified (or at least doesn’t need to be specified) as a 
> file entry.

$ /usr/lib/qt6/libexec/qwebengine_convert_dict
Usage: qwebengine_convert_dict  

I just wrote what the usage note and its associated example show, since
that is supposed to be more "stable". I noticed that
qwebengine_convert_dict seems to accept either of the two files (and
looks for the other). In theory, a dic file may have no associated aff
file (and be a plain wordlist), but I just checked that even that
requires an empty aff file.
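
For the plain wordlist case, a build script could create the empty aff
file before calling the converter. A minimal sketch, assuming the aff
file is expected next to the dic file with the same base name;
ensure_aff is a made-up name.

```sh
# Make sure an .aff file exists next to the given .dic file, creating an
# empty one if needed, and print its path. An existing .aff is left
# untouched. ensure_aff is a hypothetical helper.
ensure_aff() {
    dic=$1
    aff="${dic%.dic}.aff"
    [ -e "$aff" ] || : > "$aff"
    printf '%s\n' "$aff"
}
```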

-- 
Agustin



Re: Bug#1020387: dictionaries-common: Consensus regarding the packaging of the Qt WebEngine hunspell binary dictionaries

2022-10-05 Thread Agustin Martin
On Thu, 22 Sep 2022 at 21:30, Soren Stoutner wrote:
>
> On Thursday, September 22, 2022 9:20:46 AM MST Agustin Martin wrote:
>
> > First of all, I am curious about the reasons behind this new format,
> > the problems it deals with and its advantages. I assume they are valid
> > enough, but they imply yet another spellchecking engine/format. We
> > currently have good old ispell, aspell and hunspell. vim has its own
> > spellchecker engine using its own format, with dicts that can be
> > created from old myspell2 dicts. We did not add vim format dicts (from
> > aspell dicts sources) since there seems to be some work to make vim
> > use hunspell directly. And now these bdict dicts.
>
> The .bdic format is specified by the upstream Chromium project, and is 
> required by anything that is based off of Chromium's code, like Qt WebEngine. 
>  I do not know why they went with a proprietary binary format, but I would 
> assume that if they went to so much trouble to not use the standard Hunspell 
> format there must have been something to make it worthwhile, like some 
> performance improvement.  Perhaps I am giving Google too much credit for 
> having logical reasons instead of making arbitrary decisions.

Hi, Soren

It is a pity not to have more info about the reasons for this new
format. Even if using it is more efficient in terms of plain
performance, I do not think that is noticeable in stuff like chromium.

Another question is whether the bdic format is expected to change or
whether that is very unlikely.

Thinking about this, I have done some tests with these bdic files
being generated at postinst, like emacs byte-compiled files (although
my tests were cruder), delegating everything to the qtwebengine
packages. Generation of .bdic files is not very slow, but IMHO it is
not fast enough to go this way (which would require moving
qwebengine_convert_dict to the Qt WebEngine runtime package and
controlling everything from it).

One noticeable thing is that bdic generation failed for some hunspell
dicts I have installed:

++ processing an_ES.aff
[1003/125813.760330:FATAL:aff_reader.cc(305)] Did not find a space in 'yi'.
Trace/breakpoint trap
++ processing ar.aff
[1003/125813.796753:FATAL:aff_reader.cc(123)] We don't support the
IGNORE command yet. This would change how we would insert things in
our lookup table.
++ processing gl_ES.aff
gl_ES.dic_delta not found.
Reading gl_ES.aff
Reading gl_ES.dic
Serializing...
Verifying...
Word does not match!
  Index:2126
  Expected: Abū po:antropónimo
is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battānī
  Actual:   Abū po:antropónimo
is:ngrama_Abū_ʿAbdullāh_Muḥammad_ibn_Jābir_ibn_Sinān_ar_Raqqī_al_Ḥarrani_aṣ_Ṣabiʾ_al_Battā
ERROR converting, the dictionary does not check out OK.
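
Since some dictionaries fail like this, a batch conversion has to
tolerate individual failures instead of aborting. The loop below is a
sketch, not the script used for the run above: convert_all is a made-up
name, and it assumes the converter is called with an input file and an
output file and returns a non-zero status on failure.

```sh
# Convert every .aff in a source directory, skipping the ones that fail.
# Prints the space-separated names of the dictionaries that failed.
# convert_all is a hypothetical helper; the "input output" calling
# convention of the converter is an assumption.
convert_all() {
    converter=$1; srcdir=$2; outdir=$3
    failed=""
    for aff in "$srcdir"/*.aff; do
        [ -e "$aff" ] || continue          # no .aff files at all
        name=$(basename "$aff" .aff)
        if "$converter" "$aff" "$outdir/$name.bdic" >/dev/null 2>&1; then
            :                              # converted fine
        else
            rm -f "$outdir/$name.bdic"     # drop any partial output
            failed="$failed $name"
        fi
    done
    printf '%s\n' "${failed# }"
}
```

Removing the partial output matters because a truncated .bdic left in
place could otherwise end up being shipped or loaded.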

Regards,

-- 
Agustin