On 10/11/2010 6:41 PM, Koji Sekiguchi wrote:
(10/10/12 5:57), Michael Sokolov wrote:
I would like to inject my CharStream (or possibly a CharFilter; this is all in flux at the moment) into the analysis chain for a field. Can I do this in Solr using the Analyzer configuration syntax in schema.xml, or would I need to define my own Analyzer? The Solr wiki describes adding Tokenizers, but doesn't say anything about CharStreams or CharFilters.

Thanks for any pointers

-Mike

Hi Mike,

You can write your own CharFilterFactory that creates your own
CharStream. Please refer to the existing CharFilterFactories in Solr
to see how you can implement it.

Koji

Koji - thanks for your response. I think I can see my way clear to making a factory class for my stream. My question was really about how to configure the factory. I see a number of examples of tokenizers and analyzers configured in the example schema.xml, but no readers. For example:

<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

configures a specific tokenizer. If I want to configure my CharStream, is there an element for that? E.g.:

<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <reader class="com.mycompany.solr.FancyCharReader" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I am guessing that I need to create my own analyzer and hard-code the reader/tokenizer filter chain in there, but it would be nice if there were a syntax like the one I inferred above.
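
For comparison, the analyzer syntax in schema.xml does seem to have an element for this stage: <charFilter>, placed ahead of the <tokenizer>. It names a CharFilterFactory, not the reader class itself. Below is a sketch of how that might look, assuming a factory class called com.mycompany.solr.FancyCharFilterFactory. That class name is hypothetical and stands in for whatever factory wraps the custom CharStream.

```xml
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <!-- Hypothetical factory class; it would create the custom CharStream -->
    <charFilter class="com.mycompany.solr.FancyCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The stock solr.MappingCharFilterFactory is configured the same way, with an extra mapping attribute that points at its mapping file, so it may serve as a model for both the factory code and the configuration.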

-Mike
