Hi Oliver.

Thanks for the answer.
We tried pattern="[\s]+", but it doesn't work.
I can replace anything except the whitespace...

Here is our schema:

<fieldtype name="sigField" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <filter class="solr.ICUFoldingFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
   </analyzer>

   <analyzer type="query">
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement="" replace="all"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
</fieldtype>

The Solr admin interface shows this in debug mode:

<lst name="debug">
<str name="rawquerystring">si:(Frei 91\:)</str>
<str name="querystring">si:(Frei 91\:)</str>
<str name="parsedquery">+si:frei +si:91:</str>
<str name="parsedquery_toString">+si:frei +si:91:</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
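The parsedquery above already contains two separate clauses (+si:frei +si:91:). That suggests the input is split on unescaped whitespace before the field's query analyzer runs, so the whitespace pattern never gets to see a space. Below is a minimal Python sketch of that ordering, assuming a much simplified model of the query parser; split_clauses, analyze and parse are hypothetical helpers for illustration, not Solr or Lucene APIs.

```python
import re

def split_clauses(query):
    """Simplified stand-in for the query parser: split on unescaped whitespace."""
    # (?<!\\) keeps a backslash-escaped space ("\ ") inside one clause.
    return [c for c in re.split(r"(?<!\\)\s+", query.strip()) if c]

def analyze(clause):
    """Simplified stand-in for the query analyzer chain from the schema."""
    text = clause.replace("\\", "")      # assumption: the parser drops escape backslashes
    text = re.sub(r"[\s]+", "", text)    # pattern replace step: strip whitespace
    return text.lower().strip()          # lowercase and trim steps

def parse(query):
    # Each clause is analyzed on its own, after the split has already happened.
    return [analyze(c) for c in split_clauses(query)]

print(parse(r"Frei 91\:"))    # two clauses, as in the debug output
print(parse(r"Frei\ 91\:"))   # escaped space: one clause reaches the analyzer
```

In this model, escaping the space (or otherwise handing the call number to the analyzer as one piece) is what lets the whitespace pattern take effect. Whether that fits the frontend setup is not something this thread confirms.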


Regards

Hannah

On 07.03.2013 14:51, Oliver Schihin wrote:
Hi Jochen

You could try this:
****************
<analyzer>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
   <tokenizer class="solr.KeywordTokenizerFactory" />
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.PatternReplaceFilterFactory"
           pattern="frei"
           replacement="blubb"
           replace="all"
   />
   <filter class="solr.PatternReplaceFilterFactory"
           pattern="[\s]+"
           replacement=""
           replace="all"
   />
   <filter class="solr.TrimFilterFactory" />
   <filter class="solr.LengthFilterFactory" min="2" max="100" />
</analyzer>
****************

Remarks:
* I am not sure whether the order of your filters is correct. I guess you should use charFilter only at the beginning of the chain, and patternReplace after the tokenizer.
* If you use ICUFoldingFilter you won't need LowerCaseFilter; it would be redundant. LowerCaseFilter alone might do the job.
* TrimFilter is redundant in that setting, I guess.
* A LengthFilterFactory can be helpful against odd terms of only one character.
* You have type="query" on your analyzer element. Do the two chains correspond, or could you do with one analyzer for both index and query?
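Read as plain string operations, the chain suggested above amounts to roughly the following. This is only a Python sketch, assuming the whole input reaches the analyzer as a single token (which is what a keyword tokenizer does with whatever it is given). The accent mapping and the "frei" test replacement are left out, and analyze_keyword is a hypothetical name.

```python
import re

def analyze_keyword(value, min_len=2, max_len=100):
    """Mimic the suggested filter order on one keyword token.

    Returns the normalized token, or None if the length filter drops it.
    """
    token = value.lower()                 # LowerCaseFilterFactory
    token = re.sub(r"[\s]+", "", token)   # PatternReplaceFilterFactory, replace="all"
    token = token.strip()                 # TrimFilterFactory (nothing left to trim here)
    if not (min_len <= len(token) <= max_len):
        return None                       # LengthFilterFactory min="2" max="100"
    return token

for raw in ["50 A 91", "Frei 91 : 9984", " A "]:
    print(repr(raw), "->", analyze_keyword(raw))
```

The sketch also illustrates two of the remarks: once all whitespace is removed, the trim step has nothing left to do, and a one-character term such as " A " is dropped by the length check.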

Regards
Oliver


-------- Original Message --------
Subject: Re: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 11:04

Hello Jilal and Oliver,

Hmm ... I don't see how two fields can help.

The problem seems to be that Solr does not recognize the whitespace.

We are using the following analyzer:
<analyzer type="query">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="Frei"
replacement="blubb" replace="all"/>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.TrimFilterFactory"/>
</analyzer>

In the query Frei 91 \: 9984 it replaces Frei with blubb, so that part seems to work
perfectly.
But when we try to replace whitespace using \s, nothing happens.

@Oliver: we don't want to replace the : in the query; it is part of our call numbers.

Greetings

Jochen

Oliver Schihin wrote:
Hello Jochen

What are your tokenizers? I guess it should be 'KeywordTokenizerFactory'. To fully
understand, you might send the whole analyzer chain.

But there might be a simple mistake in your pattern: character classes are enclosed in square brackets. We replace all non-alphanumeric characters like this:
**********************************
<filter class="solr.PatternReplaceFilterFactory"
        pattern="[^\w]+"
        replacement=""
        replace="all"
/>
**********************************
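For comparison, here is how the pattern above and the two whitespace patterns from this thread behave on one of the call numbers. The sketch uses Python's re module as a stand-in; Solr uses Java regular expressions, which treat \s, \w and bracketed character classes the same way for this ASCII input.

```python
import re

call_number = "Frei 91 : 9984"

# Pattern from the message above: removes everything that is not a word character.
stripped_all = re.sub(r"[^\w]+", "", call_number)
print(stripped_all)   # the colon is removed along with the spaces

# Whitespace-only patterns, with and without square brackets.
bracketed = re.sub(r"[\s]+", "", call_number)
grouped = re.sub(r"(\s+)", "", call_number)
print(bracketed, grouped, bracketed == grouped)
```

Both whitespace patterns give the same result here, so the square brackets alone should not make a difference for \s, while [^\w]+ would also strip the colon.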

If that helps.
Regards from Basel
Oliver

-------- Original Message --------
Subject: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 10:33

Hello,

We have indexed a field from which we removed the whitespace before indexing.

For example:

50A91
Frei91\:9984

Now we want to allow users to search for:

50 A 91
Frei 91 \: 9984

Our idea was to add a PatternReplaceFilterFactory in the query analyzer to remove the
whitespaces:
<charFilter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement=""
replace="all"/>

But it does not work.

For normal queries (we are using VuFind as the frontend) we can remove the whitespace in
the YAML part, but if the user searches with wildcards, the YAML does not work, so we
hope to find a solution in Solr.

We are using Solr 3.6.

Thanks for ideas and hints.

Greetings from Germany

Jochen






--
Hannah Ullrich

Universitaetsbibliothek Freiburg
IT Dezernat
Rempartstr. 10-16
79098 Freiburg
Tel: +49-761 / 203-3877

