On Tue, 2016-09-06 at 00:54:38 +0200, Sergio Oller wrote:
> 2016-09-05 23:32 GMT+02:00 Guillem Jover <guil...@debian.org>:
> >  * The copyright years and holders might need an update?
> 
> I have updated the copyright years and holders for the files that have
> changed between versions.

Sorry for being nit-picky, but I think you mentioned that you modified
those files, although only UPC and/or Antonio Bonafonte are listed as
holders. Shouldn't you add yourself there?

Once this is clarified, I'll be doing the upload.

> >  * I have no clue how these voices work internally so you'll have
> >    to give me a hand here. What's the source for upc_ca_ona.htsvoice?
> >    How do you generate that? Do we have the tools to do so in Debian?
> >    Otherwise we might need to put this in contrib. :/
> 
> I would understand moving the package to contrib if that is where it
> belongs.
> 
> There is a summarised description of the process of generating the htsvoice
> file in the README.source file. Basically, the htsvoice file contains the
> voice model (so no "code", just "data").

Ah right, sorry I think I had seen it before when I checked the source
some weeks ago, but had forgotten. At first sight, it unfortunately looks
like a contrib candidate to me, indeed.

In Debian we consider both code and data to be just software, so the
distinction is not usually significant.

I think the situation here is complicated by several factors:

 1) The LGPL on the output means the source should be shipped along
    with it. The DFSG requires that anyway.
 2) The source is huge, and might not be suitable for the archive
    anyway.
 3) Generating the output is very time and memory consuming, which
    means it is infeasible to build as any normal package. But given
    2), we cannot easily fall back to using the pre-generated files
    and just shipping the sources in case someone wants to change
    something.
 4) And to generate the output from the source we need a non-free
    tool as well.

Perhaps other voices can easily get by because they do not suffer (as
much) from 1, 2 and 4, which means that 3 ends up being acceptable.

> Other packages (such as festlex-poslex) do provide trained models without
> the data needed to train them from scratch, and are in main. I found a thread
> https://lists.debian.org/debian-legal/2009/05/msg00028.html where a similar
> case is discussed, but I did not see a final consensus on the issue. Also,
> the ftp masters authorised the previous upload of this package.

Right, I think this has the potential to affect pretty much all the
festival voices in Debian, so this probably deserves a wider discussion
at least within the TTS team. So I'm fine with uploading a package fixing
the current bug, and then dealing with any decision on the location of
the sources in the archive.

> I (as part of upstream) cannot improve the situation for the htsvoice
> models as we don't have the resources to develop a fully free alternative
> to HTK. Providing the raw data should allow anyone to train better models
> and it is already more than what other packages have done.

Having the raw data and the output voices is already great! Even if we
end up concluding that this needs to go into contrib.

Thanks,
Guillem
