Re: using Carrot2 custom ITokenizerFactory

Stanislaw Osinski Sun, 20 May 2012 02:19:20 -0700

Hi Koji,

You're right: the current code overwrites the custom tokenizer factory even
though it shouldn't. LuceneCarrot2TokenizerFactory exists to avoid a circular
dependency (Carrot2's default tokenizer depends on Lucene), but that concern
doesn't apply to custom tokenizers.
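A minimal sketch of the kind of fix described above. It assumes the init attributes behave like a plain key/value map, and it only fills in the Lucene-based default when no tokenizer factory has been configured. The attribute key comes from Koji's solrconfig.xml snippet and LuceneCarrot2TokenizerFactory from the quoted init() code. TokenizerFactoryDefaults and both apply* methods are hypothetical. Class names are shown as strings, not as the Class objects passed to the attribute builder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual CarrotClusteringEngine code.
public class TokenizerFactoryDefaults {
    // Attribute key as used in the solrconfig.xml snippet in this thread.
    public static final String TOKENIZER_FACTORY_KEY =
            "PreprocessingPipeline.tokenizerFactory";

    // Stand-in (a string, not a Class) for the Lucene-based default factory.
    public static final String DEFAULT_TOKENIZER_FACTORY =
            "LuceneCarrot2TokenizerFactory";

    // Behavior reported in the thread: the default is written unconditionally,
    // so a value configured in solrconfig.xml is lost.
    public static void applyDefaultsOverwriting(Map<String, Object> initAttributes) {
        initAttributes.put(TOKENIZER_FACTORY_KEY, DEFAULT_TOKENIZER_FACTORY);
    }

    // Sketch of the fix: keep a user-configured factory, and fall back to the
    // default only when the key is absent (or mapped to null).
    public static void applyDefaultsPreservingCustom(Map<String, Object> initAttributes) {
        initAttributes.putIfAbsent(TOKENIZER_FACTORY_KEY, DEFAULT_TOKENIZER_FACTORY);
    }

    public static void main(String[] args) {
        Map<String, Object> configured = new HashMap<>();
        configured.put(TOKENIZER_FACTORY_KEY, "my.own.TokenizerFactory");

        Map<String, Object> overwritten = new HashMap<>(configured);
        applyDefaultsOverwriting(overwritten);
        System.out.println("overwriting: " + overwritten.get(TOKENIZER_FACTORY_KEY));

        Map<String, Object> preserved = new HashMap<>(configured);
        applyDefaultsPreservingCustom(preserved);
        System.out.println("preserving:  " + preserved.get(TOKENIZER_FACTORY_KEY));

        Map<String, Object> unconfigured = new HashMap<>();
        applyDefaultsPreservingCustom(unconfigured);
        System.out.println("no config:   " + unconfigured.get(TOKENIZER_FACTORY_KEY));
    }
}
```

The first map ends up with the default and the second keeps my.own.TokenizerFactory. The empty one still gets the default. Map.putIfAbsent needs Java 8 or later. On older JVMs an explicit containsKey check before the put does the same job. The committed fix may be implemented differently.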


I'll try to commit a fix later today. Meanwhile, if you can recompile the
code, a temporary workaround is to hardcode your tokenizer factory class into
the fragment you pasted:

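   // Temporary workaround: pass your own ITokenizerFactory implementation
   // instead of LuceneCarrot2TokenizerFactory.
   BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
       .stemmerFactory(LuceneCarrot2StemmerFactory.class)
       .tokenizerFactory(my.own.TokenizerFactory.class)
       .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);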

Staszek

On Sun, May 20, 2012 at 9:40 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:

> Hello,
>
> As I'd like to use a custom ITokenizerFactory, I set the following Carrot2
> key in solrconfig.xml:
>
>  <searchComponent name="clustering"
>                   enable="${solr.clustering.enabled:true}"
>                   class="solr.clustering.ClusteringComponent" >
>    <lst name="engine">
>      <str name="name">default</str>
>         :
>      <str name="PreprocessingPipeline.tokenizerFactory">my.own.TokenizerFactory</str>
>    </lst>
>  </searchComponent>
>
> But it seems that CarrotClusteringEngine overwrites it with
> LuceneCarrot2TokenizerFactory in its init() method:
>
>    BasicPreprocessingPipelineDescriptor.attributeBuilder(initAttributes)
>        .stemmerFactory(LuceneCarrot2StemmerFactory.class)
>        .tokenizerFactory(LuceneCarrot2TokenizerFactory.class)
>        .lexicalDataFactory(SolrStopwordsCarrot2LexicalDataFactory.class);
>
> Am I missing something?
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/
>
