On Tue, 2016-09-06 at 00:54:38 +0200, Sergio Oller wrote:
> 2016-09-05 23:32 GMT+02:00 Guillem Jover <guil...@debian.org>:
> > * The copyright years and holders might need an update?
>
> I have updated the copyright years and holders for the files that have
> changed among versions.

Sorry for being nit-picky, but I think you mentioned that you modified
those files, although UPC and/or Antonio Bonafonte are the holders
mentioned. Shouldn't you add yourself there? Once this is clarified,
I'll do the upload.

> > * I have no clue how these voices work internally so you'll have
> >   to give me a hand here. What's the source for upc_ca_ona.htsvoice?
> >   How do you generate that? Do we have the tools to do so in Debian?
> >   Otherwise we might need to put this in contrib. :/
>
> I would understand moving the package to contrib if that is where it
> belongs.
>
> There is a summarised description of the process of generating the htsvoice
> file in the README.source file. Basically, the htsvoice file contains the
> voice model (so no "code", just "data").

Ah right, sorry, I think I had seen it before when I checked the source
some weeks ago, but had forgotten. At first sight, it unfortunately
looks like a contrib candidate to me, indeed.

In Debian we consider code/data just software, so the distinction is
not usually significant. I think the situation here is complicated by
several factors:

 1) The LGPL on the output means the source should be shipped together.
    And the DFSG does require that anyway.
 2) The source is huge, and might not be suitable for the archive
    anyway.
 3) Generating the output is very time and memory consuming, which
    means it is unfeasible to build as a normal package, but given 2),
    we cannot easily fall back to using the pre-generated files and
    just shipping the sources in case someone wants to change something.
 4) And to generate the source we need a non-free tool as well.

Perhaps other voices can easily get by because they do not suffer (as
much) from 1, 2 and 4, which means that 3 ends up being acceptable.

> Other packages (such as festlex-poslex) do provide trained models without
> the training the data from scratch and are in main.
> I found a thread
> https://lists.debian.org/debian-legal/2009/05/msg00028.html where a similar
> case is discussed, but I did not see a final consensus on the issue. Also,
> the ftp masters authorised the previous upload of this package.

Right, I think this has the potential to affect pretty much all the
festival voices in Debian, so this probably deserves a wider discussion,
at least within the TTS team. So I'm fine with uploading a package
fixing the current bug, and then dealing with any decision on the
location of the sources in the archive.

> I (as part of upstream) cannot improve the situation for the htsvoice
> models as we don't have the resources to develop a fully free alternative
> to HTK. Providing the raw data should allow anyone to train better models
> and it is already more than what other packages have done.

Having the raw data and the output voices is already great! Even if we
end up concluding that this needs to go into contrib.

Thanks,
Guillem