On 10/11/2010 6:41 PM, Koji Sekiguchi wrote:
(10/10/12 5:57), Michael Sokolov wrote:
I would like to inject my CharStream (or possibly it could be a
CharFilter; this is all in flux at the moment) into the analysis chain
for a field. Can I do this in Solr using the Analyzer configuration
syntax in schema.xml, or would I need to define my own Analyzer? The
Solr wiki describes adding Tokenizers, but doesn't say anything about
CharReaders/Filters.
Thanks for any pointers
-Mike
Hi Mike,
You can write your own CharFilterFactory that creates your own
CharStream. Please refer to the existing CharFilterFactories in Solr
to see how to implement one.
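
As a rough illustration of the shape only, here is a minimal sketch
written against plain java.io so that it compiles without Solr on the
classpath. It is not Solr's API: in Solr the factory would wrap a
CharStream rather than a Reader, and the class names below
(UnderscoreToSpaceReader, UnderscoreToSpaceReaderFactory,
CharFilterSketch) are hypothetical. The filter maps one character to
one character, so it does not have to deal with offset correction.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical character-level filter: replaces '_' with ' ' before the
// text reaches a tokenizer. One char in, one char out.
class UnderscoreToSpaceReader extends FilterReader {
    UnderscoreToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == '_' ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop does not run.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == '_') {
                buf[i] = ' ';
            }
        }
        return n;
    }
}

// Hypothetical factory: stands in for the object the analysis chain
// would ask to wrap the incoming character stream.
class UnderscoreToSpaceReaderFactory {
    Reader create(Reader input) {
        return new UnderscoreToSpaceReader(input);
    }
}

class CharFilterSketch {
    // Runs a string through the factory-created filter and collects the result.
    static String filter(String text) {
        Reader r = new UnderscoreToSpaceReaderFactory().create(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[64];
        try {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("foo_bar baz"));
    }
}
```

A filter that inserts or removes characters would also need to keep
token offsets consistent with the original text, which this sketch
does not attempt.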
Koji
Koji - thanks for your response. I think I can see my way clear to
making a factory class for my stream. My question was really about how
to configure the factory. I see a number of examples of tokenizers and
analyzers configured in the example schema.xml, but no readers. For
example:
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
configures a specific tokenizer. If I want to configure my CharStream,
is there an element for that? For example:
<fieldType name="text_ws" class="solr.TextField">
  <analyzer>
    <reader class="com.mycompany.solr.FancyCharReader" />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
I am guessing that I need to create my own analyzer and hard-code the
reader/tokenizer filter chain in there, but it would be nice if there
were a syntax like the one I inferred above.
-Mike