Bug#785799: [festvox-ca-ona-hts] Since las festival upgrade, catalan voice is not correctly recognized and does not work

Sergio Oller Tue, 06 Sep 2016 00:55:21 -0700

Hi,

On 6 Sep 2016 at 3:46 a.m., "Guillem Jover" <guil...@debian.org> wrote:
>
> On Tue, 2016-09-06 at 00:54:38 +0200, Sergio Oller wrote:
> > 2016-09-05 23:32 GMT+02:00 Guillem Jover <guil...@debian.org>:
> > >  * The copyright years and holders might need an update?
> >
> > I have updated the copyright years and holders for the files that have
> > changed among versions.
>
> Sorry for being nit-picky, but I think you mentioned that you modified
> those files, although as holder UPC and/or Antonio Bonafonte are
> mentioned. Shouldn't you add yourself there?
>
> Once this is clarified, I'll be doing the upload.
>


Even though I may be the author or coauthor of some of the files in the
package, that does not imply that I own the copyright. A typical case where
this happens is when the author is an employee (the copyright on work done
during working hours usually belongs to the employer). I do have moral
authorship rights, and that is acknowledged in the documentation. So I don't
need to, and should not, add myself as a copyright holder, as I do not own it.

With respect to the copyright dates, I am not sure they need to be modified
for minor changes. Under the Berne Convention, it seems that the most
important date is the first one (in case there is a copyright dispute). The
last date affects when the work enters the public domain (around 70 years
after the death of the copyright holder), and that is not a concern for the
copyright owners of the package, since it is far beyond their lifetime.

> > >  * I have no clue how these voices work internally so you'll have
> > >    to give me a hand here. What's the source for upc_ca_ona.htsvoice?
> > >    How do you generate that? Do we have the tools to do so in Debian?
> > >    Otherwise we might need to put this in contrib. :/
> >
> > I would understand moving the package to contrib if that is where it
> > belongs.
> >
> > There is a summarised description of the process of generating the
> > htsvoice file in the README.source file. Basically, the htsvoice file
> > contains the voice model (so no "code", just "data").
>
> Ah right, sorry I think I had seen it before when I checked the source
> some weeks ago, but had forgotten. At first sight, it unfortunately looks
> like a contrib candidate to me, indeed.
>

> In Debian we consider code/data just software, so the distinction is
> not usually significant.
>
> I think the situation here is complicated by several factors:
>
>  1) The LGPL on the output means the source should be shipped
>     together. And the DFSG does require that anyway.

The LGPL is a very poor choice of a license for data, as it was designed
for code libraries. The intention was that if someone improves the model
they need to release it with the same terms. Probably a CC-BY-SA would be
more appropriate. In the future I will discuss with the copyright owners a
possible license change.

>  2) The source is very huge, and might not be suitable for the
>     archive anyway.

By source, do you mean the raw data? The tools needed for training the
model? Or both? I ask just to be on the same page.

>  3) Generating the output is very time and memory consuming, which
>     means it is unfeasible to build as any normal package, but
>     given 2), we cannot easily fall back to using the pre-generated
>     files and just ship the sources in case someone wants to change
>     something.

That is the main reason the raw data is not provided in the package
directly.

>  4) And to generate the source we need a non-free tool as well.
>
> Perhaps other voices can easily get by because they do not suffer (as
> much) from 1, 2 and 4, which means that 3 ends up being acceptable.
>

I will check other voices if you want. I don't think all of them provide
the raw data (it was not considered part of the source).

> > Other packages (such as festlex-poslex) do provide trained models
> > without the training data needed to build them from scratch, and are
> > in main. I found a thread
> > https://lists.debian.org/debian-legal/2009/05/msg00028.html where a
> > similar case is discussed, but I did not see a final consensus on the
> > issue. Also, the ftp masters authorised the previous upload of this
> > package.
>
> Right, I think this has the potential to affect pretty much all the
> festival voices in Debian, so this probably deserves a wider discussion
> at least within the TTS team. So I'm fine with uploading a package fixing
> the current bug, and then dealing with any decision on the location of
> the sources in the archive.

The problem is wider than that:
- do 2D images rendered from 3D models require the 3D models as source?
- do audio tracks, effects or mixtures require the original audio tracks?
- what about data collected from telescopes, for instance star positions
that have been found by analyzing and filtering raw telescope/satellite
images? (I do not know if that exists in a Debian package or not, but it is
likely the astronomy people in Debian have similar datasets)

And then in voices there is also the impact this will have on Debian
accessibility features.

Having a consistent policy for data and data models in Debian would be
awesome, but it is beyond my abilities and expertise.

>
> Having the raw data and the output voices is already great! Even if we
> end up concluding that this needs to go into contrib.
>

Wherever it needs to go, I hope it can be uploaded soon :-)

Best,
Sergio
