Re: [Apertium-stuff] Guidance for hin-pan language pair development

Priyank Modi Wed, 11 Mar 2020 16:45:15 -0700

Hi Hector,
Thank you so much for the reply. The proposals were really helpful. I've
completed the coding challenge for a small set of 10 sentences (for now),
which I believe Francis has added to the repo as a test set. I'll include
the same in the proposal. For now, I'm working on building the dictionaries
using the wiki dumps as mentioned in the documentation, adding the most
frequent words systematically.
Looking through your proposal, I noticed that you included metrics like WER
and coverage to determine progress. I just wanted to confirm whether these
are computed against the dumps one downloads for the respective languages
(which seems to be the case, judging by how you describe them in your own
proposal), or whether there is some separate benchmark. Knowing this will
help, as I can then report the current state of the dictionaries in a more
statistical manner.
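
To make the two metrics concrete, here is a minimal sketch of how they are commonly computed. It is an illustration under stated assumptions, not the script used in any of the proposals: coverage is taken as the share of tokens that the morphological analyser recognises, read from text already in the Apertium stream format (where an unknown word appears as ^word/*word$), and WER is the word-level edit distance between a reference translation and the system output, divided by the reference length. The function names are hypothetical.

```python
import re

# Apertium stream format: a lexical unit looks like ^surface/analysis1/analysis2$.
# An unknown word is emitted by the analyser as ^surface/*surface$.
LEXICAL_UNIT = re.compile(r"\^([^/^$]+)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Fraction of lexical units that received at least one analysis.

    Assumes analysed_text is analyser output in the stream format above.
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1] / len(ref)
```

With this sketch, coverage would be computed over the analysed dump, and WER over a post-edited or reference translation of a held-out text; which corpus is the right benchmark is exactly the open question above.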


Finally, is there something else I can do to make my proposal better? Or is
it advisable to start working on my proposal/some other non-entry level
project?

Thank you for sharing the proposals and the guidance once again.
Have a great day!

Warm regards,
PM

-- 
Priyank Modi      ●  Undergrad Research Student
IIIT-Hyderabad        ●  Language Technologies Research Center
Mobile:  +91 83281 45692
Website <https://priyankmodipm.github.io/>    ●    Linkedin
<https://www.linkedin.com/in/priyank-modi-81584b175/>

On Sat, Mar 7, 2020 at 11:43 AM Hèctor Alòs i Font <[email protected]>
wrote:

> Hi Priyank,
>
> Hindi-Punjabi seems to me a very nice pair for Apertium. Closely related
> pairs usually do not give very satisfactory results with Google, because
> most of the time there is an intermediate translation into English. In
> any case, if you can give some data about the quality of the
> Google translator (as I did in my 2019 GSoC application
> <http://wiki.apertium.org/wiki/Hectoralos/GSOC_2019_proposal:_Catalan-Italian_and_Catalan-Portuguese#Current_situation_of_the_language_pairs>),
> it may be useful, I think.
>
> In order to present an application for language-pair development, it is
> required to pass the so-called "coding challenge"
> <http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Adopt_a_language_pair#Coding_challenge>.
> Basically, this will show that you understand the basics of the
> architecture and know how to add new words to the dictionaries.
>
> For the project itself, you'll need to add many words to the Punjabi and
> Punjabi-Hindi dictionaries, transfer rules and lexical selection rules. If
> you intend to translate from Punjabi, you'll need to work on morphological
> disambiguation, which needs at least a couple of weeks of work. This is
> essential, since plenty of errors in Indo-European languages (and, I
> guess, not only those) come from bad morphological disambiguation. Usually, closed
> categories are added first in the dictionaries and afterwards words are
> mostly added using frequency lists. If there are free resources you may
> use, this would be great, but it is absolutely necessary not to
> automatically copy from copyrighted materials. For my own application this
> year, I'm asking people to free their resources in order to be able to use
> them.
>
> You may be interested in previous applications for developing language
> pairs, for instance this one
> <http://wiki.apertium.org/wiki/Grfro3d/proposal_apertium_cat-srd_and_ita-srd>,
> in addition to mine last year.
>
> Best wishes,
> Hèctor
>
>
> On Fri, 6 Mar 2020 at 23:49, Priyank Modi <[email protected]> wrote:
>
>> Hi,
>> I am trying to work towards developing the Hindi-Punjabi pair and needed
>> some guidance on how to go about it. I ran the test files and noticed
>> that the dictionary file for Punjabi needs work (even a lot of function
>> words could not be found by the translator). Should I start with that? Are
>> there some tests each stage needs to pass? Also, what sort of work is
>> expected to make a decent GSoC proposal? Of course, I'll be interested in
>> developing this pair regardless, since even Google Translate doesn't seem
>> to work well for this pair (for the test set specifically, the Apertium
>> translator worked significantly better).
>> Any help would be appreciated.
>>
>> Thanks.
>>
>> Warm regards,
>> PM
>>
>> _______________________________________________
>> Apertium-stuff mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/apertium-stuff
>>
