Thanks! I don't see the document; Xinxiong probably replied and sent it directly to you, Chelsy?
On a related topic: I started experimenting with the Java implementation of jieba (see the last comment in https://phabricator.wikimedia.org/T151743). The zero result rate dropped to 17.5% instead of 10%, which sounds a bit more consistent with our average. Sadly, I can't really judge the quality. This was also a very quick experiment in which I had to drop some other features we use, like Unicode normalization.

Thank you!

On Mon, Dec 5, 2016 at 6:49 AM, Chelsy Xie <[email protected]> wrote:

> Thank you!!!
>
> On Sun, Dec 4, 2016 at 3:59 AM, 陈新雄 <[email protected]> wrote:
>
>> Hi all:
>>
>> Please find attached the THULAC documentation.
>>
>> If you have any questions, contact me ASAP.
>>
>> Regards,
>> Xinxiong
>>
>> 2016-12-02 9:50 GMT+08:00 陈新雄 <[email protected]>:
>>
>>> Hi all,
>>>
>>> I'm Xinxiong Chen, a PhD student from Tsinghua University. I
>>> graduate from THU NLP&CSS lab (Tsinghua University Natural Language
>>> Processing and Computational Social Science Lab) this July. I'm the main
>>> developer of THULAC, and I will translate THULAC's documentation into
>>> English and offer technical support.
>>>
>>> I know from Chelsy that most of you are using Python, so I will
>>> translate the documentation for the Python version.
>>>
>>> Chelsy and I were classmates in high school and have been friends since
>>> then. So don't hesitate to contact me if you have any questions.
>>>
>>> Regards,
>>> Xinxiong
>>>
>>> 2016-12-02 9:16 GMT+08:00 Chelsy Xie <[email protected]>:
>>>
>>>> Hello everyone,
>>>>
>>>> I'm very happy to introduce you to Xinxiong Chen <[email protected]>,
>>>> the main developer of THULAC <https://github.com/thunlp/THULAC-Python>
>>>> and a CS PhD student at Tsinghua University. :)
>>>>
>>>> The Discovery team is looking for a new Chinese tokenizer, and THULAC
>>>> <https://github.com/thunlp/THULAC-Python> may be helpful. Here
>>>> <https://github.com/thunlp/THULAC-Python#代表分词软件的性能对比> is a comparison
>>>> between THULAC and other Chinese tokenizers (jieba, LTP-3.2.0 and ICTCLAS).
>>>> It's very kind of Xinxiong to help us translate THULAC's documentation
>>>> into English and offer technical support.
>>>>
>>>> Thank you very much, Xinxiong!
>>>>
>>>> Cheers,
>>>> Chelsy
>>>
>>> --
>>> 陈 新雄(Chen Xinxiong)
>>>
>>> Department of Computer Science and Technology
>>> Tsinghua University
>>>
>>> Beijing 100084, China
>>
>> --
>> 陈 新雄(Chen Xinxiong)
>>
>> Department of Computer Science and Technology
>> Tsinghua University
>>
>> Beijing 100084, China
>
> _______________________________________________
> discovery mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/discovery
