Hi Tommi,

Hope this mail finds you well.
Thanks for your feedback.

I have thought about the baseline model you suggested, and I think it
will be beneficial for common words.
We can try it and assess the results; I believe the implementation won't take
much time.

I will look into ideas similar to "counting the arc visitations on
state visitations in the analysis traversals."
Let me do my homework first; I would be grateful if I could discuss my
findings with you later.

I know I should have handled the out-of-vocabulary words, but I needed to dig
deeper into the documentation of the tools to learn how to do it.

My current priority is to read a lot of papers, but this takes time because
I need to understand and digest the concepts discussed there.
I am enjoying the experience of reading publications, and it is a skill
that I am eager to develop.


Finally, I will update the work plan according to your suggestions.

Best Regards,
Amr Mohamed


From: Tommi A Pirinen <[email protected]>
Sent: Friday, April 5, 2019 4:15 PM
To: [email protected]
Subject: Re: [Apertium-stuff] GSoC 2019 project discussion - Unsupervised 
weighting of automata

Hi Amr,

a solid proposal and coding challenge, some comments inline:

On Thu, Mar 28, 2019 at 09:08:25PM +0000, Amr Mohamed Hosny Anwar wrote:
> Dear all,
>
> Kindly find a draft of my proposal for the "Unsupervised weighting of 
> automata".
> http://wiki.apertium.org/wiki/User:AMR-KELEG

A few points on the schedule:

* a week here and there for research is ok, but we want to be able to
  track progress, so experimenting and documenting would be a part of
  those weeks
* for the final part, it is important to allocate enough time for the
  integration of the project into the Apertium system; ideally, a
  successful project ends with a tool that all Apertium language
  developers can integrate into their languages without significant effort


> I believe that I will need to target a set of published papers to implement 
> throughout the project.
> However, I am having trouble finding a useful set of publications for the task.
> I'd be grateful if you could help me by recommending some publications or 
> even keywords to look for.
> I am currently exploring papers related to spectral learning but I don't know 
> whether this topic is related to the task or not.

This is an important point, and I agree that the workflow should be to
follow some reference implementations and documentation. I saw the
spectral FST one, but I have not tried it, so I have no idea of its
complexity or suitability yet. I hope someone with more experience in
unsupervised methods can comment as well. I think one thing we can start
from as a baseline is a model that just counts things in the ambiguous
results: symbols, tags and lemmas. I think there's also some
implementation, and maybe a paper, about counting the arc visitations on
state visitations in the analysis traversals.
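
A minimal sketch of the counting baseline mentioned above, in Python. It
assumes each token's ambiguous analyses are already available as strings
shaped like lemma<tag1><tag2>. The names count_tags and tag_weights, the
even split of each token's count across its candidate analyses, and the
conversion to negative log probabilities are illustrative assumptions,
not an existing Apertium API.

```python
import math
import re
from collections import Counter

# Assumption: tags appear in angle brackets after the lemma, e.g. "walk<n><pl>".
TAG_RE = re.compile(r"<([^<>]+)>")


def count_tags(ambiguous_analyses):
    """Count tags over a corpus of ambiguous analyses.

    Each token contributes a total count of 1, split evenly over its
    candidate analyses, since we do not know which one is correct.
    """
    counts = Counter()
    for candidates in ambiguous_analyses:
        if not candidates:
            continue  # token without any analysis: nothing to count
        share = 1.0 / len(candidates)
        for analysis in candidates:
            for tag in TAG_RE.findall(analysis):
                counts[tag] += share
    return counts


def tag_weights(counts):
    """Turn counts into negative log probabilities (lower = more likely)."""
    total = sum(counts.values())
    return {tag: -math.log(c / total) for tag, c in counts.items()}
```

For example, with the two tokens [["walk<n><pl>", "walk<vblex><pri>"],
["dog<n><sg>"]], the tag "n" collects 0.5 from the ambiguous token and 1.0
from the unambiguous one, so it ends up with a lower weight than "pl".
The same counting could be applied to lemmas or whole analyses.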

A few more thoughts:

* what happens to unseen stuff? It needs to be very unlikely but still
  possible in the final re-weighted model
* on that point, most languages have infinite vocabularies with
  compounding and such, e.g. you can write manbearpig and it might not
  be in the corpus, but we think it's more likely than
  zirconiumkumqvattaxi; this is not necessary for the project but can
  be kept in sight
* I think we should measure some baselines from other methods, e.g.
  Apertium's current statistical analysers, and keep track of progress
  against those throughout the summer
* I don't have any good pointers for the background; maybe check through
  what other FST folk have done:

  http://www.opengrm.org/twiki/bin/view/GRM/WebHome
  https://aclweb.org/aclwiki/SIGFSM
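
The first point in the list above (unseen items should be very unlikely
but still possible) can be sketched with additive smoothing. This is one
common technique used here for illustration, not something the thread
prescribes; smoothed_weight, vocab_size and alpha are hypothetical names.

```python
import math
from collections import Counter


def smoothed_weight(item, counts, vocab_size, alpha=1.0):
    """Negative log probability of `item` under additive smoothing.

    Every possible item gets `alpha` pseudo-counts, so an item that was
    never observed still receives a finite (but large) weight.
    `vocab_size` is the assumed number of distinct possible items.
    """
    total = sum(counts.values())
    prob = (counts.get(item, 0) + alpha) / (total + alpha * vocab_size)
    return -math.log(prob)
```

With counts of 3 for "n" and 1 for "vblex" and an assumed inventory of
three tags, an unseen tag such as "adj" gets a higher weight than either
seen tag, yet remains reachable in the re-weighted model.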


--
Doktor Tommi A Pirinen, Computational Linguist,
< https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora < http://hzsk.de>. CLARIN-D
Entwickler.  President of ACL SIGUR SIG for Uralic languages
< http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
