Or have the indexing client split the data at these delimiters and
just use the CJKAnalyzer.
Erik
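Erik's client-side approach can be sketched in plain Java (the method name here is illustrative): the indexing client pre-splits the raw text on the ";" delimiter and on whitespace, then sends each piece to Solr, where the CJKAnalyzer alone handles the Japanese segmentation:

```java
import java.util.Arrays;
import java.util.List;

public class ClientSideSplit {
    // Split raw input on ';' and runs of whitespace before indexing,
    // so the Solr side only needs the CJKAnalyzer.
    static List<String> preSplit(String raw) {
        return Arrays.asList(raw.trim().split("[;\\s]+"));
    }

    public static void main(String[] args) {
        // Each resulting chunk would be indexed as a separate field value.
        System.out.println(preSplit("東京;大阪 京都")); // [東京, 大阪, 京都]
    }
}
```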
On Apr 10, 2009, at 7:30 AM, Grant Ingersoll wrote:
The only thing that comes to mind as a short-term solution is writing two
TokenFilter implementations that wrap the second and third tokenizers.
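The wrapping idea can be illustrated with a simplified stand-in (this is not the actual Lucene TokenFilter API, just the pattern): the upstream token stream is modeled as an iterator, and a "filter" wraps it and re-splits every token it emits:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

// Simplified stand-in for a Lucene TokenFilter: wraps an upstream token
// stream (modeled as an Iterator) and re-splits each token on a pattern.
public class SplittingFilter implements Iterator<String> {
    private final Iterator<String> upstream;
    private final Pattern pattern;
    private final Deque<String> pending = new ArrayDeque<>();

    public SplittingFilter(Iterator<String> upstream, String regex) {
        this.upstream = upstream;
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public boolean hasNext() {
        // Refill the buffer by splitting the next upstream token.
        while (pending.isEmpty() && upstream.hasNext()) {
            for (String part : pattern.split(upstream.next())) {
                if (!part.isEmpty()) pending.add(part);
            }
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return pending.poll();
    }
}
```

Chaining two such filters behind the first tokenizer, one splitting on ";" and one on whitespace, would mimic the second and third tokenizers; in real Lucene code the same logic would live in a TokenFilter's token-production method.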
On Apr 9, 2009, at 11:00 PM, Ashish P wrote:
I want to analyze text by splitting it on the pattern ";" and on
whitespace, and since it is Japanese text I also want to use the
CJKAnalyzer + tokenizer.
In short I want to do:
<analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
  <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
</analyzer>
Can anyone please tell me how to achieve this? The syntax above is not
possible.
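For reference, Solr permits only one <tokenizer> element per analyzer; the supported shape is a single tokenizer followed by <filter> elements, along these lines (class availability depends on your Solr version, and note that PatternReplaceFilterFactory normalizes token text rather than splitting tokens, so actually splitting inside a filter needs the custom TokenFilter described above):

```
<analyzer>
  <tokenizer class="solr.CJKTokenizerFactory"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern=";" replacement=" " replace="all"/>
</analyzer>
```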
--
View this message in context:
http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search