Hi Koji,

You're right, the current code overwrites the custom tokenizer, though it shouldn't. LuceneCarrot2TokenizerFactory is set there to avoid circular dependencies (Carrot2's default tokenizer depends on Lucene), but that shouldn't be an issue with custom tokenizers.
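
To illustrate the behaviour I'd expect (a rough sketch only, not the actual patch): the Lucene-based default should be applied only when the init attributes don't already name a tokenizer factory. The sketch uses a plain Map as a stand-in for the Carrot2 attribute builder; the class TokenizerDefaultSketch and the helper withDefaultTokenizer are hypothetical names made up for illustration, and only the attribute key comes from the solrconfig.xml quoted below.

```java
import java.util.HashMap;
import java.util.Map;

class TokenizerDefaultSketch {
    // Attribute key as it appears in the solrconfig.xml from the quoted message.
    static final String TOKENIZER_KEY = "PreprocessingPipeline.tokenizerFactory";

    // Hypothetical helper: returns a copy of the init attributes in which the
    // default factory is set only if the user has not configured one.
    static Map<String, Object> withDefaultTokenizer(
            Map<String, Object> initAttributes, String defaultFactory) {
        Map<String, Object> result = new HashMap<>(initAttributes);
        result.putIfAbsent(TOKENIZER_KEY, defaultFactory);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> configured = new HashMap<>();
        configured.put(TOKENIZER_KEY, "my.own.TokenizerFactory");

        // A user-supplied factory is kept as-is.
        System.out.println(
            withDefaultTokenizer(configured, "LuceneCarrot2TokenizerFactory")
                .get(TOKENIZER_KEY));

        // With nothing configured, the default is filled in.
        System.out.println(
            withDefaultTokenizer(new HashMap<>(), "LuceneCarrot2TokenizerFactory")
                .get(TOKENIZER_KEY));
    }
}
```

The point is just "default unless configured", rather than setting the tokenizer factory unconditionally as the current init() does.
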

I'll try to commit a fix later today. Meanwhile, if you have a chance to recompile the code, a temporary solution would be to hardcode your tokenizer class into the fragment you pasted:

    BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
        .stemmerFactory(LuceneCarrot2StemmerFactory.class)
        .tokenizerFactory(YourCustomTokenizer.class)
        .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);

Staszek

On Sun, May 20, 2012 at 9:40 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:

> Hello,
>
> As I'd like to use custom ITokenizerFactory, I set the following Carrot2 key
> in solrconfig.xml:
>
> <searchComponent name="clustering"
>     enable="${solr.clustering.enabled:true}"
>     class="solr.clustering.ClusteringComponent" >
>   <lst name="engine">
>     <str name="name">default</str>
>     :
>     <str name="PreprocessingPipeline.tokenizerFactory">my.own.TokenizerFactory</str>
>   </lst>
> </searchComponent>
>
> But seems that CarrotClusteringEngine overwrites it with
> LuceneCarrot2TokenizerFactory in init() method:
>
> BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
>     .stemmerFactory(LuceneCarrot2StemmerFactory.class)
>     .tokenizerFactory(LuceneCarrot2TokenizerFactory.class)
>     .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);
>
> Am I missing something?
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/