Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Erick Erickson
Yes, there is one and only one tokenizer allowed.

Best,
Erick

On Wed, Mar 16, 2016 at 7:51 PM, Zheng Lin Edwin Yeo wrote:
> Thanks Shawn for your reply.
>
> Yes, I'm looking to see if we can implement a combination of tokenizers and
> filters.
>
> However, I found earlier that we can only implemen

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
I found that WordDelimiterFilterFactory has a parameter called splitOnNumerics, which has the same effect as what HMMChineseTokenizer did.

- *splitOnNumerics="1"* causes alphabet => number transitions to generate a new part [Solr 1.3]:
  - "j2se" => "j" "2" "se"
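To make the splitting behaviour concrete, here is a minimal Python sketch of a letter/digit transition split. It is only an illustration of the effect described above, not Solr's or Lucene's actual implementation; the function name split_on_numerics is made up for this example, and it simply treats every character as either a digit or a non-digit.

```python
def split_on_numerics(token):
    """Split a token at every digit <-> non-digit transition.

    Hypothetical helper for illustration only; it mimics the effect of
    splitOnNumerics="1" on plain alphanumeric input.
    """
    parts = []
    current = ""
    for ch in token:
        # Start a new part when the character class flips
        # between digit and non-digit.
        if current and ch.isdigit() != current[-1].isdigit():
            parts.append(current)
            current = ch
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


print(split_on_numerics("j2se"))      # ['j', '2', 'se']
print(split_on_numerics("1a2b3c4d"))  # ['1', 'a', '2', 'b', '3', 'c', '4', 'd']
```

With a code like "1a2b3c4d", every character sits on a transition, so the result is one part per character, which matches the split reported for HMMChineseTokenizer in this thread.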

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-19 Thread Zheng Lin Edwin Yeo
Thanks Shawn for your reply. Yes, I'm looking to see if we can implement a combination of tokenizers and filters. However, I found earlier that we can only implement one tokenizer for each fieldType. So is it true that I can only stick to one tokenizer, and the rest of the implementation has to be

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-18 Thread Shawn Heisey
On 3/16/2016 4:33 AM, Zheng Lin Edwin Yeo wrote:
> I found that HMMChineseTokenizer will split a string that consists of
> numbers and characters (alphanumeric). For example, if I have a code that
> looks like "1a2b3c4d", it will be split into 1 | a | 2 | b | 3 | c | 4 | d
> This has caused the search

Re: HMMChineseTokenizer splits up alphanumeric characters

2016-03-16 Thread Zheng Lin Edwin Yeo
Sorry, the correct pipeline which I'm using should be this:

Regards,
Edwin

On 16 March 2016 at 18:33, Zheng Lin Edwin Yeo wrote:
> Hi,
>
> I'm using Solr 5.4.0, with the HMMChineseTokenizer in my Solr, and below
> is my pipeline.
>
> positionIncrementGap=

HMMChineseTokenizer splits up alphanumeric characters

2016-03-16 Thread Zheng Lin Edwin Yeo
Hi, I'm using Solr 5.4.0 with the HMMChineseTokenizer, and below is my pipeline. I found that HMMChineseTokenizer will split a string that consists of numbers and characters (alphanumeric). For example, if I have a code that looks like "1a2b3c4d", it