On Thu, Mar 26, 2020 at 11:45:41PM +0800, 杨伟哲 wrote:
> Hi Francis and Flammie,
>
> I have finished the draft of my proposal about "Robust tokenization in
> lttoolbox".
> Could you please review it for me? I need your feedback suggestions and I
> will be pretty much appreciated.
>
> Google Docs link:
> https://docs.google.com/document/d/1nHSR67u1HOO7ZhE5ulEn18ib3GKT31t958xp83Lbdqk/edit?usp=sharing

Hi Weizhe,

sorry I haven't answered earlier; the coding challenge looks ok. I updated the idea page about a week ago, did you check the new version?

Also, I think we were talking earlier about Chinese languages in apertium? If you have experience with this, I would be happy to tie a strategy for CJK or similar tokenisations into this project. That might also involve some tweaks to the planned tokenisation.

As for the plan, it seems realistic. Do you have the feeling that you know which parts of the apertium pipeline to modify for the project? As I don't have such in-depth knowledge of the apertium codebase, it would be very important to get feedback from, or a co-mentor with, that knowledge.

-- 
Doktor Tommi A Pirinen, Computational Linguist,
<https://flammie.github.io/purplemonkeydishwasher/>,
Universität Hamburg, Hamburger Zentrum für Sprachkorpora <http://hzsk.de>.
CLARIN-D Entwickler.
President of ACL SIGUR SIG for Uralic languages <http://gtweb.uit.no/sigur/>.
I tend to follow inline-posting style in desktop e-mail messages.
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
