Hi Jochen
You could try this:
****************
<analyzer>
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.PatternReplaceFilterFactory"
pattern="frei"
replacement="blubb"
replace="all"
/>
<filter class="solr.PatternReplaceFilterFactory"
pattern="[\s]+"
replacement=""
replace="all"
/>
<filter class="solr.TrimFilterFactory" />
<filter class="solr.LengthFilterFactory" min="2" max="100" />
</analyzer>
****************
Remarks:
* I am not sure whether your sequence of filters is correct. I guess charFilter elements
belong at the beginning of the chain only (char filters are applied to the raw text before
the tokenizer), and PatternReplaceFilter goes after the tokenizer.
* If you use ICUFoldingFilter you won't need LowerCaseFilter, since ICU folding already
includes case folding. In this chain, LowerCaseFilter on its own might do the job.
* TrimFilter is redundant in that setting, I guess.
* A LengthFilterFactory can be helpful against odd terms of only one character.
* You have type="query" on your analyzer element. Do the index and query chains
correspond, or could you do with a single analyzer for both index and query?
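To illustrate the last remark, here is a minimal sketch of a field type with a single
analyzer (no type attribute), so that the same chain runs at index and at query time.
The name "callnumber_nows" is made up for this example, and the chain is untested
against your schema:
****************
<fieldType name="callnumber_nows" class="solr.TextField">
<!-- no type attribute: this analyzer is used for both index and query -->
<analyzer>
<!-- char filter first: it sees the raw text before the tokenizer -->
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<!-- keep the whole input as one token -->
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- remove all whitespace inside the single token -->
<filter class="solr.PatternReplaceFilterFactory"
pattern="[\s]+"
replacement=""
replace="all"
/>
</analyzer>
</fieldType>
****************
With one shared chain, the whitespace removal cannot differ between index and query.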
Regards
Oliver
-------- Original Message --------
Subject: Re: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 11:04
Hello Jilal and Oliver,
hmmm ... I don't see how two fields can help.
The problem seems to be that Solr does not recognize the whitespace.
We are using the following analyzer:
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="Frei"
replacement="blubb" replace="all"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>
In the query Frei 91 \: 9984 it replaces Frei with blubb, so that part seems to work
perfectly.
But when we try to replace whitespace using \s, nothing happens.
@Oliver: we don't want to replace the : in the query; it is part of our call numbers.
Greetings
Jochen
Oliver Schihin wrote:
Hello Jochen
What are your tokenizers? I guess it should be 'KeywordTokenizerFactory'. To fully
understand, it would help if you sent the whole analyzer chain.
But there might be a simple mistake in your pattern: character classes are enclosed by
square brackets. We replace all non-word characters (everything except letters, digits
and the underscore) like this:
**********************************
<filter class="solr.PatternReplaceFilterFactory"
pattern="[^\w]+"
replacement=""
replace="all"
/>
**********************************
If that helps.
Regards from Basel
Oliver
-------- Original Message --------
Subject: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 10:33
Hello,
we have indexed a field from which we removed all whitespace before indexing.
For example:
50A91
Frei91\:9984
Now we want to allow the users to search for:
50 A 91
Frei 91 \: 9984
Our idea was to add a PatternReplaceFilterFactory in the query analyzer to remove the
whitespace:
<charFilter class="solr.PatternReplaceFilterFactory" pattern="(\s+)"
replacement=""
replace="all"/>
But it does not work.
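One detail in the snippet above: the element is a charFilter, but the class named is a
token filter factory. A sketch of the two consistent variants follows, assuming the rest
of the chain stays as it is. Only one of the two would be needed, and neither has been
tested against this setup:
****************
<!-- Variant 1: a char filter, applied to the raw text before the tokenizer -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\s+"
replacement=""
/>
<!-- Variant 2: a token filter, placed after the tokenizer -->
<filter class="solr.PatternReplaceFilterFactory"
pattern="\s+"
replacement=""
replace="all"
/>
****************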
For normal queries (we are using VuFind as the frontend) we can remove the whitespace in
the YAML part, but if the user searches with wildcards, the YAML does not work, so we
hope to find a solution in Solr.
We are using Solr 3.6.
Thanks for ideas and hints.
Greetings from Germany
Jochen