Re: Special Characters

Erick Erickson Mon, 22 Nov 2010 06:40:55 -0800

As I remember, PatternReplace... isn't in 1.4, so you'd have to move to 3.x
or trunk.


You could always write a custom class that does what you want; it's
actually pretty easy.
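
For reference, here is a minimal sketch of what the charFilter approach might look like once you are on 3.x or trunk. It assumes PatternReplaceCharFilterFactory is available in your version and that stripping every period is acceptable (it would also turn 3.14 into 314). The field type name text_nodots is made up for illustration, and the rest of your filter chain (stop words, WordDelimiterFilter, stemming) is left out for brevity.

```xml
<!-- Sketch only: needs a Solr version that ships PatternReplaceCharFilterFactory (not 1.4). -->
<!-- "text_nodots" is a hypothetical name used for illustration. -->
<fieldType name="text_nodots" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- charFilters run on the raw text before the tokenizer, so j.r.r. becomes jrr -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\." replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single analyzer used at both index and query time, j.r.r. tolkien and jrr tolkien should both come out as the tokens jrr and tolkien. A reindex would be needed after changing the schema.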

Best
Erick

On Mon, Nov 22, 2010 at 8:37 AM, Solr User <solr...@gmail.com> wrote:

> Hi Erick,
>
> I use solr version 1.4.0 and below is my schema.xml
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
> <analyzer type="index">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <!-- in this example, we will only use synonyms at query time
> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> ignoreCase="true" expand="false"/>
> -->
> <!-- Case insensitive stop word removal.
> add enablePositionIncrements=true in both the index and query
> analyzers to leave a 'gap' for more accurate phrase queries.
> -->
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
> </analyzer>
> <analyzer type="query">
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> <filter class="solr.StopFilterFactory"
> ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> <filter class="solr.LowerCaseFilterFactory"/>
> <filter class="solr.SnowballPorterFilterFactory" language="English"
> protected="protwords.txt"/>
> </analyzer>
> </fieldType>
>
> It creates 3 tokens (j r r), so searching for j r r tolkien works fine, but
> searching for jrr tolkien does not.
>
> I will read about PatternReplaceCharFilterFactory and try it. Please let me
> know if I need to do anything differently.
>
> Thanks,
> Solr User
>
>
>
> On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > What version of Solr are you using? You can think about
> > PatternReplaceCharFilterFactory if you're using the right
> > version of Solr.
> >
> > But.... you have other problems than that. Let's claim you
> > get the periods removed. Do you tokenize three tokens or
> > one? I.e. jrr or j r r? In the latter case your search still won't
> > match.
> >
> > Best
> > Erick
> >
> > On Mon, Nov 22, 2010 at 7:45 AM, Solr User <solr...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am searching for j.r.r. tolkien and getting results back but if I search
> > > for jrr I am not getting any results. Also not getting any results if I am
> > > searching for jrr tolkien. I am using AND as the default operator.
> > >
> > > The search results should work for both j.r.r. tolkien and jrr tolkien.
> > >
> > > What configuration changes do I need to make so that special characters like
> > > hyphen (-), period (.) are ignored while indexing? Or any other suggestions?
> > >
> > > Thanks,
> > > Solr User
> > >
> >
>
