The exception is expected if you use a CharStream-aware Tokenizer without
a CharFilter.
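The failing cast can be sketched in plain Java. Note that `CharStream` below is only a stand-in for org.apache.solr.analysis.CharStream, and `CastDemo`/`castFails` are illustrative names, not Solr code: a tokenizer that expects a CharStream cannot be handed a plain Reader.

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in for org.apache.solr.analysis.CharStream (sketch only).
abstract class CharStream extends Reader { }

public class CastDemo {
    // Returns true if casting the Reader to CharStream throws,
    // reproducing the reported ClassCastException.
    static boolean castFails(Reader r) {
        try {
            CharStream cs = (CharStream) r; // legal at compile time
            return false;
        } catch (ClassCastException e) {
            return true; // a plain StringReader is not a CharStream
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new StringReader("text")));
    }
}
```

A CharFilter in the analyzer chain supplies a CharStream to the tokenizer, which is why the configuration below avoids the exception.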
Please see example/solr/conf/schema.xml for how to configure a CharFilter
with a CharStreamAware*Tokenizer:
<!-- charFilter + "CharStream aware" WhitespaceTokenizer -->
<!--
<fieldType name="textCharNorm" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
-->
Thank you,
Koji
Ashish P wrote:
Koji san,
Using CharStreamAwareCJKTokenizerFactory is giving me the following error:
SEVERE: java.lang.ClassCastException: java.io.StringReader cannot be cast to
org.apache.solr.analysis.CharStream
Maybe you are typecasting the Reader to a subclass.
Thanks,
Ashish
Koji Sekiguchi-2 wrote:
If you use a CharFilter, you should use a "CharStream aware" Tokenizer to
get correct term offsets.
There are two CharStreamAware*Tokenizers in trunk/Solr 1.4.
Probably you want to use CharStreamAwareCJKTokenizer(Factory).
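A fieldType along the lines of the whitespace example in example/solr/conf/schema.xml might look like this (the field name "textCJKNorm" and the mapping file name are illustrative, not a shipped configuration):

<fieldType name="textCJKNorm" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-japanese.txt"/>
    <tokenizer class="solr.CharStreamAwareCJKTokenizerFactory"/>
  </analyzer>
</fieldType>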
Koji
Ashish P wrote:
After this, should I keep using the same CJKAnalyzer, or use a CharFilter?
Thanks,
Ashish
Koji Sekiguchi-2 wrote:
Ashish P wrote:
I want to convert half-width katakana to full-width katakana. I tried
using the CJK analyzer, but it is not working.
Does CJKAnalyzer do it, or is there any other way?
CharFilter, which comes with trunk/Solr 1.4, covers exactly this type of
problem.
If you are using Solr 1.3, try the patch attached below:
https://issues.apache.org/jira/browse/SOLR-822
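For reference, MappingCharFilterFactory reads a rules file of one quoted
pair per line ("#" starts a comment). Entries normalizing half-width
katakana might look like this (the file name and exact entry set are
illustrative; you would list every character you need):

# half-width katakana => full-width katakana
"ｱ" => "ア"
"ｶ" => "カ"
"ｷ" => "キ"
# voiced forms: base character plus half-width voiced sound mark
"ｶﾞ" => "ガ"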
Koji