Hi Jochen

You could try this:
****************
<analyzer>
   <charFilter class="solr.MappingCharFilterFactory"
               mapping="mapping-ISOLatin1Accent.txt" />
   <tokenizer class="solr.KeywordTokenizerFactory" />
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.PatternReplaceFilterFactory"
           pattern="frei"
           replacement="blubb"
           replace="all"
   />
   <filter class="solr.PatternReplaceFilterFactory"
           pattern="[\s]+"
           replacement=""
           replace="all"
   />
   <filter class="solr.TrimFilterFactory" />
   <filter class="solr.LengthFilterFactory" min="2" max="100" />
</analyzer>
****************
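
A possible variant, as an untested sketch: remove the whitespace before the tokenizer with a char filter instead of a token filter. It uses solr.PatternReplaceCharFilterFactory, the class from your own chain, with a whitespace pattern. Note that a charFilter element needs a char filter class; solr.PatternReplaceFilterFactory is a token filter and belongs in a filter element. The sketch assumes that the whole call number reaches the analyzer as one string (for example as a quoted phrase), because the query parser may otherwise split the input on whitespace before the analyzer sees it.
****************
<analyzer type="query">
   <!-- char filters run on the raw input, before the tokenizer -->
   <charFilter class="solr.PatternReplaceCharFilterFactory"
               pattern="\s+"
               replacement="" />
   <charFilter class="solr.MappingCharFilterFactory"
               mapping="mapping-ISOLatin1Accent.txt" />
   <!-- keep the whole input as one token -->
   <tokenizer class="solr.KeywordTokenizerFactory" />
   <filter class="solr.LowerCaseFilterFactory" />
</analyzer>
****************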

Remarks:
* I am not sure whether your sequence of filters is correct. I guess you should use charFilter at the beginning of the chain only, and patternReplace after the tokenizer.
* If you use ICUFoldingFilter you won't need LowerCaseFilter; it would be redundant. LowerCaseFilter alone might do the job.
* TrimFilter is redundant in that setting, I guess.
* A LengthFilterFactory can be helpful against odd terms of only one character.
* Your analyzer element has the attribute type="query". Do the index and query chains correspond, or could you do with one analyzer for both index and query?
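
A minimal sketch of the last remark, assuming one chain can serve both sides: an analyzer element without a type attribute is applied at index time and at query time. The field type name "callnumber_nows" and the field name "callnumber" are made up for illustration, and the test replacement of "frei" is left out.
****************
<!-- hypothetical field type name, not from your schema -->
<fieldType name="callnumber_nows" class="solr.TextField">
   <!-- no type attribute: the same chain is used for indexing and querying -->
   <analyzer>
      <charFilter class="solr.MappingCharFilterFactory"
                  mapping="mapping-ISOLatin1Accent.txt" />
      <tokenizer class="solr.KeywordTokenizerFactory" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="[\s]+"
              replacement=""
              replace="all"
      />
   </analyzer>
</fieldType>
<!-- hypothetical field using the type above -->
<field name="callnumber" type="callnumber_nows" indexed="true" stored="true" />
****************
With a shared chain, values that still contain whitespace at index time would be normalized the same way as the query.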

Regards
Oliver


-------- Original Message --------
Subject: Re: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 11:04

Hello Jilal and Oliver,

Hmm ... I don't see how two fields can help.

The problem seems to be that Solr does not recognize the whitespace.

We are using the following analyzer:
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="Frei"
replacement="blubb" replace="all"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>

In the query Frei 91 \: 9984 it replaces Frei with blubb, so that part seems to work perfectly.
But when we try to replace whitespace using \s, nothing happens.

@Oliver: we don't want to replace the : in the query; it is part of our call numbers.

Greetings

Jochen

Oliver Schihin wrote:
Hello Jochen

What are your tokenizers? I guess it should be 'KeywordTokenizerFactory'. To fully understand the problem, it would help if you sent the whole analyzer chain.

But there might be a simple mistake in your pattern: character classes are enclosed by square brackets. We replace all non-alphanumeric characters like this:
**********************************
<filter class="solr.PatternReplaceFilterFactory"
        pattern="[^\w]+"
        replacement=""
        replace="all"
/>
**********************************

If that helps.
Regards from Basel
Oliver

-------- Original Message --------
Subject: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 10:33

Hello,

we have indexed a field from which we removed the whitespace before indexing.

For example:

50A91
Frei91\:9984

Now we want to allow users to search for:

50 A 91
Frei 91 \: 9984

Our idea was to add a PatternReplaceFilterFactory in the query analyzer to remove the whitespace:
<charFilter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all"/>

But it does not work.

For normal queries (we are using VuFind as frontend) we can remove the whitespace in the YAML part, but if the user searches with wildcards, the YAML does not work, so we hope to find a solution in Solr.

We are using solr 3.6.

Thanks for ideas and hints.

Greetings from Germany

Jochen



