RE: Tokenizer and Filter Factory to index Chinese characters

2015-07-07, Markus Jelsma
Yes, but it is a small change :)
M.
-Original message-
> From: Zheng Lin Edwin Yeo
> Sent: Tuesday 7th July 2015 4:50
> To: solr-user@lucene.apache.org
> Subject: Re: Tokenizer and Filter Factory to index Chinese characters
>
> So we have to recompile the analyser …

Re: Tokenizer and Filter Factory to index Chinese characters

2015-07-06, Zheng Lin Edwin Yeo
…heng Lin Edwin Yeo
> > Sent: Monday 6th July 2015 12:31
> > To: solr-user@lucene.apache.org
> > Subject: Re: Tokenizer and Filter Factory to index Chinese characters
> >
> > Yes, I tried that also, but I faced some compatibility issues with Solr
> > 5.2.1, as the …

RE: Tokenizer and Filter Factory to index Chinese characters

2015-07-06, Markus Jelsma
Yes, analyzers changed slightly in 5.x.
https://issues.apache.org/jira/browse/LUCENE-5388
-Original message-
> From: Zheng Lin Edwin Yeo
> Sent: Monday 6th July 2015 12:31
> To: solr-user@lucene.apache.org
> Subject: Re: Tokenizer and Filter Factory to index Chines…

Re: Tokenizer and Filter Factory to index Chinese characters

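2015-07-06, Zheng Lin Edwin Yeo
… "chinese4":{
"text":["户只要订购《联合晚报》任一种配套,就可选择下列其中一项赠品带回家。 \n 签订两年配套的读者可获得一台价值199元的Lenovo TAB 2 A7-10七寸平板电脑,或者一架价值249元的Philips Viva"]},
"chinese5":{ …
[Translation: "…subscribers who order any Lianhe Wanbao package can choose one of the following gifts to take home. \n Readers who sign up for a two-year package can get a Lenovo TAB 2 A7-10 seven-inch tablet worth $199, or a $249 Philips Viva…"]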

Re: Tokenizer and Filter Factory to index Chinese characters

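2015-07-06, davidphilip cherian
… 第三枚金牌。队友陈诗桦(Jazreel)、梁蕙芬和陈诗静以3707总瓶分获得亚军,季军归菲律宾女队。(联合早报记者:郭嘉惠) \n "],
"author":["Edwin"]},
"chinese4":{
"id":["chinese4"],
"content":[" …
[Translation: "…third gold medal. Teammates Chen Shihua (Jazreel), Liang Huifen and Chen Shijing took second place with a total pinfall of 3707, and third place went to the Philippine women's team. (Lianhe Zaobao reporter: Guo Jiahui)"]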

Re: Tokenizer and Filter Factory to index Chinese characters

2015-07-05, Zheng Lin Edwin Yeo
… "chinese5":{
"text":["Zheng Lin Yeo"]}}}
Why is this so?
Regards,
Edwin
2015-06-25 18:54 GMT+08:00 Markus Jelsma:
> You may also want to try Paoding if you have enough time to spend:
> https://github.com/cslinmiso/paoding-analysis
>
> -Origi…

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25, Markus Jelsma
> … ”幸运抽奖"],
> "author":["Edwin"]}}}
>
> Regards,
> Edwin
>
> 2015-06-25 17:28 GMT+08:00 Markus Jelsma:
>
> > Hi - we are actually using some other filters for Chinese, although they
> > are not specialized for Chinese: …
[Translation of the Chinese fragment: "…” lucky draw"]

Re: Tokenizer and Filter Factory to index Chinese characters

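2015-06-25, Zheng Lin Edwin Yeo
… 此外,一年一度的晚报保健美容展,将在本月23日和24日,在新达新加坡会展中心401、402展厅举行。 \n 现场将开设《联合晚报》订阅展摊,读者当场订阅晚报,除了可获得丰厚的赠品,还有机会参与“必胜”幸运抽奖"],
"author":["Edwin"]}}}
Regards,
Edwin
2015-06-25 17:28 GMT+08:00 Markus Jelsma:
> Hi - we are actually using some other filters for Chinese, although they
> …
[Translation of the Chinese text: "In addition, the annual Wanbao health and beauty fair will be held on the 23rd and 24th of this month in Halls 401 and 402 of the Suntec Singapore Convention Centre. \n A Lianhe Wanbao subscription booth will be set up on site; readers who subscribe on the spot will receive generous gifts and also get a chance to take part in the 'sure-win' lucky draw"]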

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25, Markus Jelsma
> Subject: Re: Tokenizer and Filter Factory to index Chinese characters
>
> Thank you.
>
> I've tried that, but when I do a search, it's returning many more highlighted results than it is supposed to.
>
> For example, if I enter the following query:
> http://localhos…

Re: Tokenizer and Filter Factory to index Chinese characters

2015-06-25, Zheng Lin Edwin Yeo
Thank you.
I've tried that, but when I do a search, it's returning many more highlighted results than it is supposed to.
For example, if I enter the following query:
http://localhost:8983/solr/chinese1/highlight?q=我国
(我国 = "our country")
I get the following results:
"highlighting":{
"chinese1":{
"id":[" …

RE: Tokenizer and Filter Factory to index Chinese characters

2015-06-25, Markus Jelsma
Hello - you can use HMMChineseTokenizerFactory instead.
http://lucene.apache.org/core/5_2_0/analyzers-smartcn/org/apache/lucene/analysis/cn/smart/HMMChineseTokenizerFactory.html
-Original message-
> From: Zheng Lin Edwin Yeo
> Sent: Thursday 25th June 2015 11:02
> To: solr-user@lucene.apac…
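The following is a minimal sketch of how the HMMChineseTokenizerFactory suggestion might be wired into a Solr 5.x schema. It assumes the lucene-analyzers-smartcn jar (typically shipped under Solr's contrib/analysis-extras) has been added to the core's classpath. The field type name `text_zh` and the field name `content` are hypothetical. The filter choices are illustrative only; they are not the configuration anyone in this thread reported using.

```xml
<!-- Hypothetical field type name; assumes the smartcn analysis jar is on Solr's classpath -->
<fieldType name="text_zh" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: the same chain is used at index time and query time -->
  <analyzer>
    <!-- Splits text into sentences, then segments each sentence into words
         using smartcn's probabilistic (HMM) model for Simplified Chinese -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Folds full-width ASCII variants to basic Latin and half-width Katakana to standard kana -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lowercases Latin-script tokens such as product names in mixed text -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field using the type above -->
<field name="content" type="text_zh" indexed="true" stored="true"/>
```

With a single `<analyzer>` element, a query such as q=我国 is segmented the same way the documents were. Documents indexed before the field type changed keep their old terms until they are reindexed.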