Hi Oliver, thanks for the answer. We tried pattern="[\s]+" but it doesn't work. I can replace anything except the whitespace...
Here is our schema:

<fieldtype name="sigField" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement="" replace="all"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
  </analyzer>
</fieldtype>

solr-admin shows me this in debug mode:

<lst name="debug">
  <str name="rawquerystring">si:(Frei 91\:)</str>
  <str name="querystring">si:(Frei 91\:)</str>
  <str name="parsedquery">+si:frei +si:91:</str>
  <str name="parsedquery_toString">+si:frei +si:91:</str>
  <lst name="explain"/>
  <str name="QParser">LuceneQParser</str>

Regards
Hannah

On 07.03.2013 14:51, Oliver Schihin wrote:
Hi Jochen

You could try this:

****************
<analyzer>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <tokenizer class="solr.KeywordTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.PatternReplaceFilterFactory" pattern="frei" replacement="blubb" replace="all" />
  <filter class="solr.PatternReplaceFilterFactory" pattern="[\s]+" replacement="" replace="all" />
  <filter class="solr.TrimFilterFactory" />
  <filter class="solr.LengthFilterFactory" min="2" max="100" />
</analyzer>
****************

Remarks:
* I am not sure whether your sequence of filters is correct. I guess you should use charFilter at the beginning of the chain only, and patternReplace after the tokenizer.
* If you use ICUFoldingFilter you won't need LowerCaseFilter, it would be redundant. LowerCase might do the job.
* TrimFilter is redundant in that setting, I guess.
* A LengthFilterFactory can be helpful against odd terms of only one character.
* You do have a type="query" attribute in your analyzer element. Do the two chains correspond, or could you do with one analyzer for both index and query?

Regards
Oliver

-------- Original Message --------
Subject: Re: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 11:04

Hello Jilal and Oliver,

hmmm ... I don't know how two fields can help. The problem seems to be that solr does not recognize the whitespace.
We are using the following analyzer:

<analyzer type="query">
  <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="Frei" replacement="blubb" replace="all"/>
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <filter class="solr.ICUFoldingFilterFactory"/>
  <filter class="solr.TrimFilterFactory"/>
</analyzer>

In the query

  Frei 91 \: 9984

it replaces Frei with blubb ... so it seems to work perfectly. But when we try to replace whitespace using \s, nothing happens.

@Oliver: we don't want to replace the : in the query ... it is part of our call numbers.

Greetings
Jochen

Oliver Schihin wrote:

Hello Jochen

What are your tokenizers? I guess it should be 'KeywordTokenizerFactory'. To fully understand, you might send the whole analyzer chain.

But there might be a simple mistake in your pattern: character classes are enclosed by square brackets. We do a replace of all non-alphanumeric characters like this:

**********************************
<filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all" />
**********************************

If that helps.

Regards from Basel
Oliver

-------- Original Message --------
Subject: removing whitespaces in query
From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
To: solr-user@lucene.apache.org
Date: 07.03.2013 10:33

Hello,

we have indexed a field where we removed the whitespace before indexing.

For example:

  50A91 Frei91\:9984

Now we want to allow the users to search for:

  50 A 91 Frei 91 \: 9984

Our idea was to add a PatternReplaceFilterFactory in the query analyzer to remove the whitespace:

<charFilter class="solr.PatternReplaceFilterFactory" pattern="(\s+)" replacement="" replace="all"/>

But it does not work.

For normal queries - we are using vufind as frontend - we can remove the whitespace in the yaml part, but if the user searches with wildcards ... the yaml does not work ... so we hope to find a solution in solr.

We are using solr 3.6.

Thanks for ideas and hints.

Greetings from Germany
Jochen
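
A note on the three patterns quoted in this thread, (\s+), [\s]+ and [^\w]+: the small sketch below applies each of them to one of the call numbers from the original mail. It uses Python's re module only as a stand-in for the Java regex engine behind Solr's pattern-replace factories (an assumption; for this plain ASCII input the two should agree), and the helper name strip_with is made up for illustration. It suggests that the two whitespace patterns are interchangeable, so the brackets are probably not the issue, and that [^\w]+ would also drop the colon that has to stay.

```python
import re

# Patterns quoted in the thread. Python's re stands in for Java's regex engine.
PATTERNS = {
    "whitespace, grouped": r"(\s+)",
    "whitespace, class": r"[\s]+",
    "non-word characters": r"[^\w]+",
}


def strip_with(pattern, text):
    """Remove every match of pattern, like replacement="" with replace="all"."""
    return re.sub(pattern, "", text)


# Call number from the thread, without the query-syntax backslash.
query = "Frei 91 : 9984"

for label, pattern in PATTERNS.items():
    print(f"{label:22} {pattern:8} -> {strip_with(pattern, query)}")
```

Both whitespace patterns give Frei91:9984, while [^\w]+ gives Frei919984, which no longer matches a call number indexed with its colon.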
-- Hannah Ullrich Universitaetsbibliothek Freiburg IT Dezernat Rempartstr. 10-16 79098 Freiburg Tel: +49-761 / 203-3877
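
One possible reading of the debug output above (parsedquery +si:frei +si:91:) is that the query parser has already split the input at the whitespace before the field's query analyzer runs, so the analyzer never sees a blank it could remove. The sketch below is a toy model of that idea in Python, not Solr code: split_clauses and analyze are hypothetical helpers, and the splitting rule (split on whitespace unless it is escaped or inside double quotes) is an assumption about how the Lucene query parser behaves here.

```python
import re

WHITESPACE = re.compile(r"[\s]+")  # same pattern as in the schema


def analyze(text):
    """Toy query analyzer: keep the input as one token, lowercase it,
    remove whitespace, then trim."""
    return WHITESPACE.sub("", text.lower()).strip()


def split_clauses(query):
    """Toy query parser (assumption): split on whitespace unless it is
    backslash-escaped or inside double quotes."""
    clauses, current, in_quotes, i = [], [], False, 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            current.append(query[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        elif ch.isspace() and not in_quotes:
            if current:
                clauses.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        clauses.append("".join(current))
    return clauses


def parse(query):
    """Split first, then analyze each clause on its own."""
    return [analyze(clause) for clause in split_clauses(query)]


print(parse(r'Frei 91\:'))    # ['frei', '91:']
print(parse(r'"Frei 91\:"'))  # ['frei91:']
print(parse(r'Frei\ 91\:'))   # ['frei91:']
```

In this model the unquoted query yields the same two terms as the debug output, while quoting the value or escaping the blank lets the whole string reach the analyzer, where the pattern can then remove the whitespace. Whether this holds for the real setup would have to be checked against the debug output of such a query.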