It is not just about minimizing the loss of sentiment; it is about using that information for a better translation. A trivial example: in some situations a sentence can project a strong sentiment, and a literal translation may not yield the best result. However, if we can use knowledge of the sentiment to choose the words, we might get a better result.
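The selection step I have in mind could be sketched like this. It is only a minimal sketch: the lexicon, its scores, and the Hindi candidates are made-up placeholders for illustration, not real data, and in practice the dictionary would be built from a corpus (e.g. with NLTK) and shipped as a plain lookup table with no neural-network dependency:

```python
# Hypothetical sentiment lexicon: word -> polarity score in [-1.0, 1.0].
# All entries and scores below are placeholders for illustration.
SENTIMENT = {
    "thrilled": 0.9,   # English source word (assumed score)
    "खुश": 0.5,        # candidate generation: "happy" (assumed score)
    "रोमांचित": 0.9,   # candidate generation: "thrilled" (assumed score)
    "संतुष्ट": 0.3,     # candidate generation: "satisfied" (assumed score)
}

def pick_by_sentiment(source_word, candidates, lexicon=SENTIMENT, default=0.0):
    """Choose the target-language candidate whose sentiment score is
    closest to the source word's score -- the proposed selection rule
    for the sentiment-based TL morphological generator."""
    src = lexicon.get(source_word, default)
    return min(candidates, key=lambda w: abs(lexicon.get(w, default) - src))

# Among the candidates, "रोमांचित" has the score closest to "thrilled".
print(pick_by_sentiment("thrilled", ["खुश", "रोमांचित", "संतुष्ट"]))
```

Words missing from the lexicon fall back to a neutral score of 0.0, so unknown source words simply prefer the most neutral candidate.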
As far as the code is concerned, I need to study the source code, or detailed documentation, before proposing a feasible solution.
Best,
Rajarshi

On Thu, Feb 27, 2020, 23:21 Tino Didriksen <[email protected]> wrote:

> My first question would be: is this actually a problem for rule-based
> machine translation? I am not a linguist, but given how RBMT works I can't
> really see where sentiment would be lost in the process, especially
> because Apertium is designed for related languages, where sentiment is
> mostly the same. But even for less related languages, it would come down
> to the quality of the source-language analysis.
>
> Beyond that, please learn how Apertium specifically works, not just RBMT
> in general. http://wiki.apertium.org/wiki/Documentation is a good start,
> but our IRC channel is the best place to ask technical questions.
>
> One major issue specific to Apertium is that the source information is no
> longer available in the target generation step.
>
> E.g., since you mention English-Hindi, you could install apertium-eng-hin
> and see how each part of the pipe works. We have precompiled binaries for
> common platforms. Again, see the wiki and IRC.
>
> -- Tino Didriksen
>
> On Thu, 27 Feb 2020 at 08:16, Rajarshi Roychoudhury <[email protected]> wrote:
>
>> Formally, I present my idea in this form.
>> From my understanding of RBMT, an RBMT system contains:
>>
>> - an *SL morphological analyser* - analyses a source-language word and
>>   provides its morphological information;
>> - an *SL parser* - a syntax analyser which analyses source-language
>>   sentences;
>> - a *translator* - translates a source-language word into the target
>>   language;
>> - a *TL morphological generator* - generates appropriate target-language
>>   words for the given grammatical information;
>> - a *TL parser* - composes suitable target-language sentences.
>>
>> I propose a sixth component for the RBMT system: a *sentiment-based TL
>> morphological generator*.
>>
>> I propose that we do word-level sentiment analysis of the source and
>> target languages. For the time being I want to work on English-Hindi
>> translation. We do not need neural-network-based translation; to get the
>> sentiment associated with each word, we might use NLTK, or develop a
>> character-level embedding that finds the sentiment associated with each
>> word, and build a dictionary from it. I have written a paper on this and
>> obtained good results. So, during the final application development, we
>> will have only the dictionary, with no neural-network dependencies. This
>> can easily be done in Python. I just need a good corpus of English and
>> Hindi words (the sentiment datasets are available online).
>>
>> The *sentiment-based TL morphological generator* will generate the list
>> of possible words, and we will take the word whose sentiment is closest
>> to that of the source-language word.
>> This is a novel method that has probably not been applied before, and it
>> might produce better results.
>>
>> Please provide your valuable feedback and suggest any necessary changes.
>> Best,
>> Rajarshi
>
> _______________________________________________
> Apertium-stuff mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
