Re: removing whitespaces in query

Upayavira Thu, 07 Mar 2013 06:41:58 -0800

Your issue, I would say, is that the whitespace is interpreted by the
query parser before the text ever reaches the analyzer.


A query of 'q=foo bar' would be converted to 'text:foo text:bar' (with
text as the default field), so each word reaches the analyzer separately.
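To make the splitting concrete, here is a toy Python simulation. It is not Solr's parser: `toy_split_query` and `strip_whitespace` are hypothetical helpers that only imitate (a) the classic parser breaking the query on whitespace before analysis and (b) a `[\s]+` pattern-replace applied afterwards to each piece.

```python
import re
from typing import List


def toy_split_query(q: str, default_field: str = "text") -> List[str]:
    # Hypothetical stand-in for the classic Lucene query parser: it breaks
    # the query on whitespace *before* any analyzer sees the text, and
    # prefixes each piece with the default field.
    return ["%s:%s" % (default_field, word) for word in q.split()]


def strip_whitespace(token: str) -> str:
    # Same idea as pattern="[\s]+" replacement="" replace="all".
    return re.sub(r"[\s]+", "", token)


clauses = toy_split_query("foo bar")
print(clauses)  # ['text:foo', 'text:bar']

# Each clause is analyzed on its own, so the pattern never sees a space:
print([strip_whitespace(c.split(":", 1)[1]) for c in clauses])  # ['foo', 'bar']

# Only when the whole value reaches the analyzer as one piece does it help:
print(strip_whitespace("foo bar"))  # foobar
```

By the time any analyzer runs, the space is already gone, so a whitespace-removing filter has nothing to act on.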

You can achieve what you want, but it takes some rather wacky syntax.
To search for the term 'energy' in the body field, and the term 'foo
bar' in the 'from' field, you could use:

http://localhost:8983/solr/collection1/select?q=body:energy AND _query_:"{!term f=from v=$term}"&term=foo bar

This says:

 * use the default lucene query parser to convert body:energy into a
 simple term query
 * use the term query parser to create a term query using the 'from'
 field and the value found in the 'term' request parameter, i.e. 'foo
 bar'.
 * combine these with an AND
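When that request is sent from code or curl rather than typed into a browser, the spaces, quotes and braces have to be URL-encoded. Below is a minimal Python sketch using the standard library's `urllib.parse.urlencode`. The host, core and field names are simply the ones from the example URL, and `build_select_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://localhost:8983/solr/collection1/select"  # from the example URL


def build_select_url(body_term: str, from_value: str) -> str:
    # Hypothetical helper: the lucene parser handles q, while the nested
    # {!term} parser takes its value from the separate "term" parameter
    # (v=$term), so the lucene parser never splits "foo bar".
    params = {
        "q": 'body:%s AND _query_:"{!term f=from v=$term}"' % body_term,
        "term": from_value,
    }
    # urlencode percent-encodes quotes, braces, '!' and '$', and turns spaces into '+'.
    return BASE + "?" + urlencode(params)


url = build_select_url("energy", "foo bar")
print(url)

# Decoding the query string gives back the original parameters:
print(parse_qs(urlsplit(url).query)["term"])  # ['foo bar']
```

Keeping the multi-word value in its own parameter means it only has to be URL-encoded, never escaped for the lucene query syntax.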

It appears to work for me.

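This uses the term query parser to keep the lucene query parser from
splitting your query into multiple terms. Bear in mind that {!term} does
no analysis at all, so the value has to match the indexed form exactly
(lowercased, spaces removed); {!field f=from v=$term} would instead run
the field's query analyzer over the whole value.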
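A toy Python illustration of why analysis matters for the call-number case in this thread. In Solr, {!term} uses the value verbatim, while {!field} runs the field's query analyzer over the whole value. `toy_query_analyzer` is a hypothetical approximation of a keyword-tokenized chain (lowercasing plus whitespace removal; ICU folding and the accent mapping are ignored). The sketch also assumes the indexed call number is `Frei91:9984` without a literal backslash.

```python
import re


def toy_query_analyzer(value: str) -> str:
    # Hypothetical approximation of a keyword-tokenized query chain:
    # KeywordTokenizer keeps the whole input as one token, then we lowercase
    # it and drop whitespace (ICU folding / accent mapping are ignored here).
    token = value.lower()
    return re.sub(r"[\s]+", "", token)


indexed = toy_query_analyzer("Frei91:9984")  # assumed indexed form: 'frei91:9984'
user_input = "Frei 91 : 9984"  # passed via term=...; no \: escape needed with v=$term

term_parser_value = user_input                       # {!term}: no analysis
field_parser_value = toy_query_analyzer(user_input)  # {!field}: whole value analyzed

print(term_parser_value == indexed)   # False
print(field_parser_value == indexed)  # True
```

Under these assumptions only the analyzed value lines up with the indexed term, which is why the choice of nested parser matters here.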

Hope that helps.

Upayavira

On Thu, Mar 7, 2013, at 02:08 PM, Hannah Ullrich wrote:
> Hi Oliver.
> 
> thanks for the answer.
> We tried pattern="[\s]+" but it doesn't work.
> I can replace anything but not the whitespace...
> 
> Here is our schema:
> 
> <fieldtype name="sigField" class="solr.TextField" 
> positionIncrementGap="100">
>        <analyzer type="index">
>           <tokenizer class="solr.KeywordTokenizerFactory"/>
>           <filter class="solr.LowerCaseFilterFactory"/>
>           <charFilter class="solr.MappingCharFilterFactory" 
> mapping="mapping-ISOLatin1Accent.txt"/>
>           <filter class="solr.ICUFoldingFilterFactory"/>
>           <filter class="solr.TrimFilterFactory"/>
>        </analyzer>
> 
> <analyzer type="query">
>           <charFilter class="solr.MappingCharFilterFactory" 
> mapping="mapping-ISOLatin1Accent.txt"/>
>           <tokenizer class="solr.KeywordTokenizerFactory"/>
>           <filter class="solr.LowerCaseFilterFactory"/>
>           <charFilter class="solr.PatternReplaceCharFilterFactory" 
> pattern="[\s]+" replacement="" replace="all"/>
>           <filter class="solr.TrimFilterFactory"/>
>          <filter class="solr.LengthFilterFactory" min="2" max="100" />
>   </analyzer>
>       </fieldtype>
> 
> The Solr admin shows this in debug mode:
> 
> <lst name="debug">
> <str name="rawquerystring">si:(Frei 91\:)</str>
> <str name="querystring">si:(Frei 91\:)</str>
> <str name="parsedquery">+si:frei +si:91:</str>
> <str name="parsedquery_toString">+si:frei +si:91:</str>
> <lst name="explain"/>
> <str name="QParser">LuceneQParser</str>
> 
> 
> regards
> 
> Hannah
> 
> On 07.03.2013 14:51, Oliver Schihin wrote:
> > Hi Jochen
> >
> > You could try this:
> > ****************
> > <analyzer>
> >    <charFilter class="solr.MappingCharFilterFactory" 
> > mapping="mapping-ISOLatin1Accent.txt"/>
> >    <tokenizer class="solr.KeywordTokenizerFactory" />
> >    <filter class="solr.LowerCaseFilterFactory" />
> >    <filter class="solr.PatternReplaceFilterFactory"
> >            pattern="frei"
> >            replacement="blubb"
> >            replace="all"
> >    />
> >    <filter class="solr.PatternReplaceFilterFactory"
> >            pattern="[\s]+"
> >            replacement=""
> >            replace="all"
> >    />
> >    <filter class="solr.TrimFilterFactory" />
> >    <filter class="solr.LengthFilterFactory" min="2" max="100" />
> > </analyzer>
> > ****************
> >
> > Remarks:
> > * I am not sure whether your sequence of filters is correct. I guess
> > you should use charFilters only at the beginning of the chain, and
> > PatternReplaceFilter after the tokenizer.
> > * If you use ICUFoldingFilter you won't need LowerCaseFilter; it would
> > be redundant. LowerCaseFilter alone might also do the job.
> > * TrimFilter is redundant in that setting, I guess.
> > * A LengthFilterFactory can help against odd terms of only one
> > character.
> > * You have a type="query" attribute on your analyzer element. Do
> > the two chains correspond, or could you do with a single analyzer for
> > both index and query?
> >
> > Regards
> > Oliver
> >
> >
> > -------- Original Message --------
> > Subject: Re: removing whitespaces in query
> > From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
> > To: solr-user@lucene.apache.org
> > Date: 07.03.2013 11:04
> >
> >> Hello Jilal and Oliver,
> >>
> >> hmmm ... I don't know how two fields can help.
> >>
> >> The problem seems to be that Solr does not recognize the whitespace.
> >>
> >> We are using the following analyzer:
> >> <analyzer type="query">
> >> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="Frei"
> >> replacement="blubb" replace="all"/>
> >> <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> <filter class="solr.LowerCaseFilterFactory"/>
> >> <charFilter class="solr.MappingCharFilterFactory" 
> >> mapping="mapping-ISOLatin1Accent.txt"/>
> >> <filter class="solr.ICUFoldingFilterFactory"/>
> >> <filter class="solr.TrimFilterFactory"/>
> >> </analyzer>
> >>
> >> In the query Frei 91 \: 9984 it replaces Frei with blubb ... so it
> >> seems to work perfectly.
> >> But when we try to replace whitespace using \s nothing happens.
> >>
> >> @Oliver: we don't want to replace the : in the query ... it is part
> >> of our call numbers.
> >>
> >> Greetings
> >>
> >> Jochen
> >>
> >> Oliver Schihin wrote:
> >>> Hello Jochen
> >>>
> >>> What is your tokenizer? I guess it should be
> >>> 'KeywordTokenizerFactory'. Could you send the whole analyzer chain
> >>> so we can understand it fully?
> >>>
> >>> But there might be a simple mistake in your pattern: character
> >>> classes are enclosed in square brackets. We replace all
> >>> non-alphanumeric characters like this:
> >>> **********************************
> >>> <filter class="solr.PatternReplaceFilterFactory"
> >>>         pattern="[^\w]+"
> >>>         replacement=""
> >>>         replace="all"
> >>> />
> >>> **********************************
> >>>
> >>> Hope that helps.
> >>> Regards from Basel
> >>> Oliver
> >>>
> >>> -------- Original Message --------
> >>> Subject: removing whitespaces in query
> >>> From: Jochen Lienhard <lienh...@ub.uni-freiburg.de>
> >>> To: solr-user@lucene.apache.org
> >>> Date: 07.03.2013 10:33
> >>>
> >>>> Hello,
> >>>>
> >>>> we have indexed a field from which we removed the whitespace
> >>>> before indexing.
> >>>>
> >>>> For example:
> >>>>
> >>>> 50A91
> >>>> Frei91\:9984
> >>>>
> >>>> Now we want to allow users to search for:
> >>>>
> >>>> 50 A 91
> >>>> Frei 91 \: 9984
> >>>>
> >>>> Our idea was to add a PatternReplaceFilterFactory to the query
> >>>> analyzer to remove the whitespace:
> >>>> <charFilter class="solr.PatternReplaceFilterFactory" 
> >>>> pattern="(\s+)" replacement=""
> >>>> replace="all"/>
> >>>>
> >>>> But it does not work.
> >>>>
> >>>> For normal queries (we are using VuFind as frontend) we can
> >>>> remove the whitespace in the YAML part, but if the user searches
> >>>> with wildcards ... the YAML does not work ... so we hope to find a
> >>>> solution in Solr.
> >>>>
> >>>> We are using solr 3.6.
> >>>>
> >>>> Thanks for ideas and hints.
> >>>>
> >>>> Greetings from Germany
> >>>>
> >>>> Jochen
> >>>>
> >>>
> >>
> >>
> 
> 
> -- 
> Hannah Ullrich
> 
> Universitaetsbibliothek Freiburg
> IT Dezernat
> Rempartstr. 10-16
> 79098 Freiburg
> Tel: +49-761 / 203-3877
> 
> 
