Re: Tokenizer question

rswart Mon, 11 Jan 2010 12:22:40 -0800

We are using the standard query parser (so no dismax).
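A simplified Python model of why "39-43" and "39 43" parse differently. It assumes the behaviour of the classic Lucene QueryParser behind Solr's standard parser at the time. That parser splits the query string on whitespace before analysis. When the analyzer returns several tokens at different positions for one clause, it builds a PhraseQuery. The `analyze`, `field_query` and `parse` functions are hypothetical stand-ins, not Lucene or Solr APIs, and the returned strings only imitate the debug output format.

```python
import re
from typing import List, Tuple


def analyze(text: str) -> List[Tuple[str, int]]:
    # Hypothetical stand-in for the query analyzer: split on whitespace or "-"
    # and lowercase. Each token gets its own position (no stacked synonyms).
    parts = [p for p in re.split(r"(?:\s+|-)", text) if p]
    return [(p.lower(), i) for i, p in enumerate(parts)]


def field_query(field: str, clause: str) -> str:
    """Simplified model of how the classic QueryParser turns ONE
    whitespace-delimited clause into a query after analysis."""
    tokens = analyze(clause)
    if not tokens:
        return ""
    if len(tokens) == 1:
        return f"{field}:{tokens[0][0]}"  # a single TermQuery
    positions = {pos for _, pos in tokens}
    if len(positions) == 1:
        # All tokens at one position (e.g. synonyms): OR them together.
        return "(" + " ".join(f"{field}:{t}" for t, _ in tokens) + ")"
    # Tokens at several positions: the parser builds a PhraseQuery.
    return f'{field}:"' + " ".join(t for t, _ in tokens) + '"'


def parse(field: str, text: str) -> str:
    # Whitespace splitting happens BEFORE analysis; each chunk is one clause.
    clauses = [q for q in (field_query(field, c) for c in text.split()) if q]
    return clauses[0] if len(clauses) == 1 else "(" + " ".join(clauses) + ")"


print(parse("HouseNumber", "39-43"))  # HouseNumber:"39 43"
print(parse("HouseNumber", "39 43"))  # (HouseNumber:39 HouseNumber:43)
```

In this model the hyphenated input reaches the analyzer as a single clause. Its two output tokens therefore become a phrase. The space-separated input reaches it as two clauses that are OR-ed.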

Fieldtype is solr.TextField with the following query analyzer:


<analyzer type="query">
  <tokenizer class="solr.PatternTokenizerFactory" pattern="(\s+|-)" />
  <filter class="solr.StopFilterFactory"
          words="../../../synonyms/nl_stopwords.txt" ignoreCase="true"/>
  <filter class="solr.SynonymFilterFactory"
          synonyms="../../../synonyms/nl_synonyms.txt" ignoreCase="true"
          expand="true" />
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="-" replacement=" " replace="all" />
  <filter class="com.foo.IgnoreListWordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="0" catenateAll="0" preserveOriginal="0"
          splitOnCaseChange="0" ignoreList="@&amp;"/>
  <filter class="solr.PatternReplaceFilterFactory"
          pattern="^0+(.)" replacement="$1" replace="all" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
</analyzer>
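A rough Python sketch of three steps of this query analyzer: the tokenizer, the leading-zero PatternReplaceFilterFactory and the LowerCaseFilterFactory. It assumes PatternTokenizerFactory uses the pattern as a separator, which is its default behaviour. The stop, synonym, hyphen-replace, custom word-delimiter and duplicate-removal filters are left out, and the function names are hypothetical. Under these assumptions the hyphen split itself works: "39-43" comes out as two separate tokens.

```python
import re
from typing import Iterable, List


def pattern_tokenize(text: str) -> List[str]:
    # Mimics pattern="(\s+|-)" used as a separator. Python's re.split also
    # returns text matched by capturing groups, so a non-capturing group is
    # used here; empty strings (e.g. from leading separators) are dropped.
    return [t for t in re.split(r"(?:\s+|-)", text) if t]


def strip_leading_zeros(tokens: Iterable[str]) -> List[str]:
    # Mirrors pattern="^0+(.)" replacement="$1": leading zeros are removed,
    # but at least one character is kept, so "0" and "00" both become "0".
    return [re.sub(r"^0+(.)", r"\1", t) for t in tokens]


def query_analyze(text: str) -> List[str]:
    # Same order as the config: tokenize, strip zeros, then lowercase.
    tokens = pattern_tokenize(text)
    tokens = strip_leading_zeros(tokens)
    return [t.lower() for t in tokens]


print(query_analyze("039-043"))  # ['39', '43']
```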




Grant Ingersoll-6 wrote:
> 
> And also, what query parser are you using? 
> On Jan 11, 2010, at 2:46 PM, Grant Ingersoll wrote:
> 
>> What do your FieldTypes look like for the fields in question?
>> 
>> On Jan 10, 2010, at 10:05 AM, rswart wrote:
>> 
>>> 
>>> Hi,
>>> 
>>> This is probably an easy question. 
>>> 
>>> I am doing a simple query on postcode and house number. If the house
>>> number contains a minus sign, like:
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39-43)
>>> 
>>> the resulting parsed query contains a phrase query:
>>> 
>>> +(PostCode:1078 PostCode:pw) +PhraseQuery(HouseNumber:"39 43")
>>> 
>>> This never matches.
>>> 
>>> What I want Solr to do is generate the following parsed query
>>> (essentially an OR for both house numbers):
>>> 
>>> +(PostCode:1078 PostCode:pw) +(HouseNumber:39 HouseNumber:43)
>>> 
>>> Solr generates this from the following query (a space instead of a
>>> minus sign):
>>> 
>>> q=PostCode:(1078 pw)+AND+HouseNumber:(39 43)
>>> 
>>> 
>>> I tried two things to have Solr generate the desired parsed query:
>>> 
>>> 1. WordDelimiterFilterFactory with generateNumberParts=1, but this
>>> results in a phrase query.
>>> 2. PatternTokenizerFactory that splits on (\s+|-).
>>> 
>>> Neither option works.
>>> 
>>> Any suggestions on how to get rid of the phrase query?
>>> 
>>> Thanks,
>>> 
>>> Richard
>>> -- 
>>> View this message in context:
>>> http://old.nabble.com/Tokenizer-question-tp27099119p27099119.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>> 
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>> 
>> Search the Lucene ecosystem using Solr/Lucene:
>> http://www.lucidimagination.com/search
>> 
