Or have the indexing client split the data at these delimiters and just use the CJKAnalyzer.
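Erik's client-side suggestion could be sketched roughly like this: split the raw text at the ";" delimiter before sending it to Solr, so each piece can go into a multiValued field analyzed with the CJKAnalyzer alone (a minimal sketch, not from the thread; the class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PreSplitter {

    // Split the raw input at ";" and trim surrounding whitespace.
    // Each resulting piece would be indexed as one value of a
    // multiValued field whose analyzer is just the CJKAnalyzer.
    public static List<String> splitAtDelimiter(String raw) {
        List<String> pieces = new ArrayList<>();
        for (String piece : raw.split(";")) {
            String trimmed = piece.trim();
            if (!trimmed.isEmpty()) {
                pieces.add(trimmed);
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(splitAtDelimiter("東京 ; 大阪 ;; 京都"));
    }
}
```

With this approach the schema needs no custom analysis code at all; the delimiter handling lives entirely in the indexing client.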

        Erik

On Apr 10, 2009, at 7:30 AM, Grant Ingersoll wrote:

The only thing that comes to mind in the short term is writing two TokenFilter implementations that wrap the second and third tokenizers.
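If such wrapping TokenFilters existed, the schema could then use a single real tokenizer followed by filters. A hypothetical sketch of what that chain might look like (the two filter factory class names below are invented — Solr allows only one <tokenizer> per analyzer chain, so the wrappers would have to be written and registered first):

```xml
<fieldType name="text_ja_split" class="solr.TextField">
  <analyzer>
    <!-- the single real tokenizer: split on ";" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
    <!-- hypothetical TokenFilters wrapping WhitespaceTokenizer and CJKTokenizer -->
    <filter class="com.example.WhitespaceSplittingFilterFactory"/>
    <filter class="com.example.CJKWrappingFilterFactory"/>
  </analyzer>
</fieldType>
```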

On Apr 9, 2009, at 11:00 PM, Ashish P wrote:


I want to analyze text by splitting it on the pattern ";" and also on whitespace, and since it is Japanese text I want to use the CJKAnalyzer and its tokenizer as well.
In short, I want to do:
        <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
                <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
                <tokenizer class="solr.WhitespaceTokenizerFactory" />
                <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
        </analyzer>
Can anyone please tell me how to achieve this? The above syntax is not possible at all.
--
View this message in context: 
http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
