[I sent this a few days ago but the list was down]

Hi Fran,


On 28/02/18 at 12:10, Francis Tyers wrote:
> On 2018-02-23 18:59, Antonio Toral wrote:
>> Dear apertiumers,
>>
>> I've an idea for GSOC about subtitles. Before putting it on the wiki,
>> I wanted to check with _yous_, as it seems that there has been
>> previous related work [1].
>>
>> In a nutshell, it is about translating subtitles from OpenSubtitles
>> with Apertium, for closely-related pairs of languages A and B such
>> that (i) there are mature A-->B systems in Apertium and (ii) on
>> OpenSubtitles there are many subtitles for A and very few for B. An
>> example is A=ES, B=CA. There could be 2 tasks:
>>
>> 1. Development of a tool to translate subtitles. Given a translation
>> direction A-->B:
>>
>>     1.1. Use OpenSubtitles' API to find subtitles S in A not
>> translated yet into B
>>
>>     1.2. Translate S from A to B using Apertium's API.
>>
>>     1.3. Upload the translated subtitles to OpenSubtitles. These
>> subtitles could have a preamble such as "Warning: this subtitle is
>> machine translated! Powered by Apertium.org"
>>
>> 2. Evaluation of Apertium's quality for subtitles (for 1 or 2
>> translation directions) and improvements/modifications in Apertium's
>> systems for those directions based on that evaluation.
>>
>> I'd be happy to hear opinions, experiences from related previous work,
>> criticism, etc :)
>
> My main question would be about licensing, as far as I'm aware most of
> the stuff in OpenSubtitles is not available under a free licence.
Good point! I found the following [1], from which I quote:

"Subtitle files are almost never free to redistribute, as they almost
always are a non-authorized derivative work of the original movie.
Creating a text form of the audio of an audiovisual work creates a
derivative work, and so does creating a translation. You need a license
to allow you to do so. For non-open movies you don't have that license."

So the subtitles are indeed problematic licence-wise: not in themselves,
but because they are derivative works of non-open movies. Despite this
issue, they are available online, as is a parallel corpus derived from
them.


>
> Other comments:
>
> * The type of text in OpenSubtitles is typically of a very different
> domain to what Apertium is usually used for, so might result in a lot
> of weirdness in expressions (unless extra work is done on the MT
> systems for coping with dialogue type texts)
This raises two interesting questions:
1. To what extent do dialogue-type texts _between closely-related
languages_ transfer straightforwardly with MT systems that have not been
adapted to that type of text?
2. How much adaptation does an RBMT system need to cope "properly" with
dialogue text?
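
One rough way to put a number on question 1 might be a word error rate
between Apertium's output and an existing human subtitle in language B.
The sketch below is only an illustration under that assumption: the
function name and the Catalan example sentences are made up, and
whitespace tokenisation is a simplification.

```python
def word_error_rate(hypothesis, reference):
    """Word-level edit distance between MT output and a reference,
    divided by the number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                       # reference word missing from output
                curr[j - 1] + 1,                   # extra word in output
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    if not ref:
        # Empty reference: any output at all counts as fully wrong.
        return float(bool(hyp))
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical MT output vs. hypothetical human reference.
    print(word_error_rate("bon dia a tots", "bon dia a tothom"))
```

Comparing this figure on subtitles against the same figure on the kind
of text Apertium is usually used for would give a first idea of how
much the dialogue domain hurts.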

>
> * It would be a nice way to advertise Apertium.
:)
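
Going back to task 1 of the proposal, steps 1.2 and 1.3 could be
sketched roughly as below. This is a minimal sketch, not a finished
tool: it assumes the subtitles are in SRT format, all function names
are hypothetical, and `translate` stands in for whatever call to
Apertium ends up being used (it is passed in as a plain function, so
nothing here depends on a particular API). The timing chosen for the
preamble cue is also an assumption and could overlap with an early
first cue.

```python
import re

PREAMBLE = "Warning: this subtitle is machine translated! Powered by Apertium.org"


def split_cues(srt_text):
    """Split SRT text into (timing line, list of text lines) pairs.

    Assumes well-formed cues: a number line, a timing line, then text.
    """
    normalised = srt_text.replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        cues.append((lines[1], lines[2:]))
    return cues


def translate_srt(srt_text, translate):
    """Translate only the text lines, keep timings, and prepend the warning.

    `translate` is any function str -> str (placeholder for Apertium).
    """
    # Assumed 3-second preamble cue at the very start of the file.
    cues = [("00:00:00,000 --> 00:00:03,000", [PREAMBLE])]
    for timing, text in split_cues(srt_text):
        cues.append((timing, [translate(line) for line in text]))
    # Renumber from 1 so the preamble becomes the first cue.
    blocks = [
        f"{number}\n{timing}\n" + "\n".join(text)
        for number, (timing, text) in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sample = "1\n00:00:05,000 --> 00:00:07,000\nHola\n"
    # str.upper is a stand-in for a real ES->CA translation call.
    print(translate_srt(sample, str.upper))
```

Translating line by line keeps the sketch short, but a real tool would
probably want to join the lines of a cue before translating, since a
sentence is often split across them.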

[1]
https://opensource.stackexchange.com/questions/1663/when-is-making-a-subtitle-file-for-a-commercial-movie-legal

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
