Hi Erick,

I am using Solr 1.4.0. Below is the text field type from my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

With this configuration j.r.r. is split into three tokens (j, r, r). A search for j r r tolkien works fine, but a search for jrr tolkien does not.

I will read about PatternReplaceCharFilterFactory and try it. Please let me
know if I need to do anything differently.
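
In case it helps, this is roughly what I plan to try. It is an untested sketch that assumes my Solr release actually ships solr.PatternReplaceCharFilterFactory (I have not confirmed that for 1.4.0). The field type name text_nodots is just a placeholder I made up, and everything after the char filter is copied from my text type above. The idea is to delete periods before the tokenizer runs, in both analyzers, so that j.r.r. and jrr both end up as the single token jrr.

```xml
<!-- Hypothetical field type: same chain as "text", plus a char filter that
     deletes every period before tokenizing. Assumes the Solr release in use
     provides solr.PatternReplaceCharFilterFactory. -->
<fieldType name="text_nodots" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- char filters run before the tokenizer: "j.r.r." becomes "jrr" -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\." replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <!-- same char filter at query time, so a query for "j.r.r." or "jrr"
         is reduced to the same single token as the indexed text -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="\." replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="0" catenateNumbers="0"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>
```

If I understand your point correctly, this only helps because both sides end up with the single token jrr; a query typed as j r r tolkien (with spaces) would still produce three tokens and would no longer match. The pattern also removes every period, including those in numbers such as 1.4.0, so a narrower pattern may be better, and I would have to reindex after changing the index analyzer.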

Thanks,
Solr User



On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> What version of Solr are you using? You can think about
> PatternReplaceCharFilterFactory if you're using the right
> version of Solr.
>
> But... you have other problems than that. Suppose you
> get the periods removed. Do you end up with one token or
> three, i.e. jrr or j r r? In the latter case your search still won't
> match.
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 7:45 AM, Solr User <solr...@gmail.com> wrote:
>
> > Hi,
> >
> > I am searching for j.r.r. tolkien and getting results back, but if I
> > search for jrr I am not getting any results. I also get no results when
> > searching for jrr tolkien. I am using AND as the default operator.
> >
> > Searches should work for both j.r.r. tolkien and jrr tolkien.
> >
> > What configuration changes do I need to make so that special characters
> > like hyphen (-) and period (.) are ignored while indexing? Or are there any
> > other suggestions?
> >
> > Thanks,
> > Solr User
> >
>
