On Thu, Mar 26, 2020 at 11:45:41PM +0800, 杨伟哲 wrote:
> Hi Francis and Flammie,
> 
> I have finished the draft of my proposal about "Robust tokenization in
> lttoolbox".
> Could you please review it for me? I would very much appreciate your
> feedback and suggestions.
> 
> Google Docs link:
> https://docs.google.com/document/d/1nHSR67u1HOO7ZhE5ulEn18ib3GKT31t958xp83Lbdqk/edit?usp=sharing

Hi Weizhe,

Sorry I haven't answered earlier. The coding challenge looks OK. I
updated the idea page about a week ago; did you check the new version?
Also, I think we were talking earlier about Chinese languages in
apertium. If you have experience with this, I would be happy to tie a
strategy for CJK or similar tokenisation into this project. That might
also involve some tweaks to the planned tokenisation. As for the plan,
it seems realistic.

Do you feel that you know which parts of the apertium pipeline need to
be modified for the project? As I don't have such in-depth knowledge of
the apertium codebase, it would be very important to get feedback from
someone, or a co-mentor, with that knowledge.

-- 
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>, Universität
Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>. CLARIN-D
Entwickler.  President of ACL SIGUR SIG for Uralic languages
<http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.

_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff