Hi Charlie,

Thanks for your reply. It seems that quite a number of the Chinese
tokenizers are not really compatible with the newer versions of Solr.

I'm also looking at HMMChineseTokenizer and JiebaTokenizer to see whether
they are suitable for Solr 5.x as well.
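
Based on what I've read, StandardTokenizerFactory falls back to emitting
one token per Han character, so a proper word segmenter like
HMMChineseTokenizer should give more precise matching on Chinese text.
Below is the field type I'm planning to test with. This is just a minimal
sketch: it assumes the lucene-analyzers-smartcn jar from
contrib/analysis-extras is loaded via a <lib> directive in solrconfig.xml,
and the "text_zh" name is my own choice.

  <fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- HMM-based word segmentation for Simplified Chinese -->
      <tokenizer class="solr.HMMChineseTokenizerFactory"/>
      <!-- stopword list bundled with smartcn (mostly punctuation) -->
      <filter class="solr.StopFilterFactory"
              words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
      <!-- light stemming for the English tokens in mixed text -->
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldType>

As I understand it, the smartcn tokenizer keeps runs of Latin characters
together as single tokens, so the same field type should also cope with
the English portions of bilingual documents.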

Regards,
Edwin


On 30 September 2015 at 16:20, Charlie Hull <char...@flax.co.uk> wrote:

> On 30/09/2015 04:09, Zheng Lin Edwin Yeo wrote:
>
>> Hi Charlie,
>>
>
> Hi,
>
>>
>> I've checked that Paoding's code is written for Solr 3 and Solr 4.
>> It does not support Solr 5, so I was unable to use it with my
>> Solr 5.x installation.
>>
>
> I'm pretty sure we had to recompile it for v4.6 as well... it has been
> a little painful.
>
>>
>> Have you tried to use HMMChineseTokenizer and JiebaTokenizer as well?
>>
>
> I don't think so.
>
>
> Charlie
>
>>
>> Regards,
>> Edwin
>>
>>
>> On 25 September 2015 at 18:46, Charlie Hull <char...@flax.co.uk> wrote:
>>
>>> On 25/09/2015 11:43, Zheng Lin Edwin Yeo wrote:
>>>
>>>> Hi Charlie,
>>>>
>>>> Thanks for your comment. I faced compatibility issues with Paoding
>>>> when I tried it in Solr 5.1.0 and Solr 5.2.1, and I found that the
>>>> code was optimised for Solr 3.6.
>>>>
>>>> Which version of Solr were you using when you tried Paoding?
>>>>
>>>>
>>> Solr v4.6 I believe.
>>>
>>> Charlie
>>>
>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>>
>>>> On 25 September 2015 at 16:43, Charlie Hull <char...@flax.co.uk> wrote:
>>>>
>>>>> On 23/09/2015 16:23, Alexandre Rafalovitch wrote:
>>>>
>>>>>
>>>>>> You may find the following articles interesting:
>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://discovery-grindstone.blogspot.ca/2014/01/searching-in-solr-analyzing-results-and.html
>>>>>> (a whole epic journey)
>>>>>> https://dzone.com/articles/indexing-chinese-solr
>>>>>>
>>>>>>
>>>>> The latter article is great, and we drew on it when helping a recent
>>>>> client with Chinese indexing. However, if you do use Paoding, bear in
>>>>> mind that it has few, if any, tests and all the comments are in
>>>>> Chinese. We found a problem with it recently (it breaks the Lucene
>>>>> highlighters) and have submitted a patch:
>>>>> http://git.oschina.net/zhzhenqin/paoding-analysis/issues/1
>>>>>
>>>>> Cheers
>>>>>
>>>>> Charlie
>>>>>
>>>>>
>>>>>> Regards,
>>>>>
>>>>>>       Alex.
>>>>>> ----
>>>>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>>>>> http://www.solr-start.com/
>>>>>>
>>>>>>
>>>>>> On 23 September 2015 at 10:41, Zheng Lin Edwin Yeo <
>>>>>> edwinye...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>
>>>>>>>
>>>>>>> Would like to check: will StandardTokenizerFactory work well for
>>>>>>> indexing both English and Chinese (bilingual) documents, or do we
>>>>>>> need tokenizers that are customised for Chinese (e.g.
>>>>>>> HMMChineseTokenizerFactory)?
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Edwin
>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Charlie Hull
>>>>> Flax - Open Source Enterprise Search
>>>>>
>>>>> tel/fax: +44 (0)8700 118334
>>>>> mobile:  +44 (0)7767 825828
>>>>> web: www.flax.co.uk
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> Charlie Hull
>>> Flax - Open Source Enterprise Search
>>>
>>> tel/fax: +44 (0)8700 118334
>>> mobile:  +44 (0)7767 825828
>>> web: www.flax.co.uk
>>>
>>>
>>
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
