Re: When searching for !@#$%^&*() all documents are matched incorrectly

Yonik Seeley Mon, 01 Jun 2009 07:55:44 -0700

On Mon, Jun 1, 2009 at 10:50 AM, Sam Michaels <mas...@yahoo.com> wrote:
>
> So the fix for this problem would be
>
> 1. Stop using WordDelimiterFilter for queries (what is the alternative?) OR
> 2. Not allow any search strings that contain no alphanumeric characters.


Short term workaround for you, yes.
I would classify this surprising behavior as a bug we should
eventually fix though.  Could you open a JIRA issue for it?
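Workaround 2 can be applied on the client, before the request ever reaches Solr. The following is a minimal Python sketch. It assumes, as Yonik's analysis further down suggests, that the title analyzer discards every character that is not a letter or digit. `is_searchable` is a hypothetical helper name, and Python's `str.isalnum()` only approximates the filter's idea of a letter or digit.

```python
def is_searchable(user_input: str) -> bool:
    """Return True if the input contains at least one letter or digit.

    Assumption: the title field's query analyzer (WordDelimiterFilter) drops
    every other character, so input without a letter or digit analyzes to
    zero tokens and its clause silently disappears from the parsed query.
    """
    return any(ch.isalnum() for ch in user_input)


for text in ["!@#$%^&*()", "@", "Bathing", "Wi-Fi"]:
    # Reject (or short-circuit to an empty result) anything that fails the check.
    print(repr(text), "->", "search" if is_searchable(text) else "reject")
```

This check only hides the symptom. The surprising match-everything behavior itself is still the bug described above.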

-Yonik
http://www.lucidimagination.com

> SM.
>
>
> Yonik Seeley-2 wrote:
>>
>> OK, here's the deal:
>>
>> <str name="rawquerystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
>> <str name="querystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
>> <str name="parsedquery">-features:foo</str>
>> <str name="parsedquery_toString">-features:foo</str>
>>
>> The text analysis is throwing away non-alphanumeric chars (probably
>> the WordDelimiterFilter).  The Lucene (and Solr) query parser throws
>> away term queries when the token is zero length (after analysis).
>> Solr then interprets the leftover "-features:foo" as "all documents
>> not containing foo in the features field", so you get a bunch of
>> matches.
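The mechanism described above can be reproduced with a toy model in Python. This is a sketch, not Solr's code. `analyze`, `build_query` and `matches` are hypothetical names. The analyzer is a crude stand-in that keeps only runs of letters and digits, and the documents are made up. Running it shows both Yonik's test query and the original query losing their symbol-only clause, which leaves a query that matches far more than intended.

```python
import re

def analyze(text):
    # Hypothetical stand-in for the field analyzer: keep runs of letters and
    # digits, lowercased; everything else is thrown away (like WDF here).
    return [t.lower() for t in re.findall(r"[^\W_]+", text)]

def build_query(clauses):
    """clauses: (occur, field, raw_text) with occur '+', '-' or '' (optional).
    A clause whose text analyzes to zero tokens is dropped, mirroring the
    query-parser behavior described above."""
    query = []
    for occur, field, raw in clauses:
        tokens = analyze(raw)
        if tokens:
            query.append((occur, field, tokens))
    return query

def matches(doc, query):
    """Toy boolean evaluation over a dict of field -> text."""
    def hit(field, tokens):
        return any(t in analyze(doc.get(field, "")) for t in tokens)

    required = [(f, t) for o, f, t in query if o == "+"]
    prohibited = [(f, t) for o, f, t in query if o == "-"]
    optional = [(f, t) for o, f, t in query if o == ""]
    if not query:
        return False                      # empty query: nothing matches
    if any(hit(f, t) for f, t in prohibited):
        return False
    if required:
        return all(hit(f, t) for f, t in required)
    if optional:
        return any(hit(f, t) for f, t in optional)
    return True                           # pure negative: all docs minus exclusions

docs = [
    {"activity_type": "NAME", "title": "Bathing", "features": "bar"},
    {"activity_type": "NAME", "title": "Dressing", "features": "foo"},
    {"activity_type": "OTHER", "title": "Walking", "features": "baz"},
]

# Yonik's test query: -features:foo features:(!@#$%^&*())
yonik_q = build_query([("-", "features", "foo"), ("", "features", "!@#$%^&*()")])
# The original query: (activity_type:NAME) AND title:(!@#$%^&*())
sam_q = build_query([("+", "activity_type", "NAME"), ("+", "title", "!@#$%^&*()")])

print(yonik_q, [i for i, d in enumerate(docs) if matches(d, yonik_q)])
print(sam_q, [i for i, d in enumerate(docs) if matches(d, sam_q)])
```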
>>
>> -Yonik
>>
>>
>> On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <mas...@yahoo.com> wrote:
>>>
>>> Walter,
>>>
>>> The analysis link does not produce any matches for either @ or !...@#$%^&*()
>>> strings when I try to match against Bathing. I'm worried that this might be
>>> the symptom of another problem (which has not revealed itself yet), and I
>>> want to get to the bottom of this...
>>>
>>> Thank you.
>>> sm
>>>
>>>
>>> Walter Underwood wrote:
>>>>
>>>> Use the [analysis] link on the Solr admin UI to get more info on
>>>> how this is being interpreted.
>>>>
>>>> However, I am curious about why this is important. Do users enter
>>>> this query often? If not, maybe it is not something to spend time on.
>>>>
>>>> wunder
>>>>
>>>> On 5/31/09 2:56 PM, "Sam Michaels" <mas...@yahoo.com> wrote:
>>>>
>>>>>
>>>>> Here is the output from the debug query when I'm trying to match the
>>>>> string @ against Bathing (it should not match):
>>>>>
>>>>> <str name="GLOM-1">
>>>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
>>>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>     0.30591258 = queryNorm
>>>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
>>>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>     1.0 = fieldNorm(field=activity_type, doc=0)
>>>>> </str>
>>>>>
>>>>> Looks like the AND clause in the search string is ignored...
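You can check the explain output by hand, and the arithmetic supports the observation that only one clause took part in scoring. The following is a minimal Python sketch. It assumes Lucene's DefaultSimilarity formulas: idf = 1 + ln(numDocs / (docFreq + 1)), tf = sqrt(termFreq), and queryNorm = 1 / sqrt(sum of squared query-term weights). With only the activity_type:NAME term in the query, queryNorm is exactly 1/idf. That is why queryWeight comes out at about 1.

```python
import math

# Numbers taken from the explain output above.
num_docs, doc_freq, term_freq = 1489, 153, 1
boost, field_norm = 1.0, 1.0

# Assumption: Lucene 2.x DefaultSimilarity formulas.
idf = 1.0 + math.log(num_docs / (doc_freq + 1))   # reported: 3.2689075
tf = math.sqrt(term_freq)                        # reported: 1.0

# queryNorm = 1 / sqrt(sum of squared weights of all query terms).
# Only the activity_type:NAME term is present, so the sum has one entry.
sum_of_squared_weights = (idf * boost) ** 2
query_norm = 1.0 / math.sqrt(sum_of_squared_weights)  # reported: 0.30591258

query_weight = idf * boost * query_norm          # reported: 0.99999994
field_weight = tf * idf * field_norm             # reported: 3.2689075
score = query_weight * field_weight              # reported: 3.2689073

print(f"idf={idf:.7f} queryNorm={query_norm:.8f} score={score:.7f}")
```

Suppose the title clause were still part of the query. Its squared weight would be added to the sum, queryNorm would shrink, and the queryWeight for activity_type:NAME would fall below 1.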
>>>>>
>>>>> SM.
>>>>>
>>>>>
>>>>> ryantxu wrote:
>>>>>>
>>>>>> Two key things to try (for anyone ever wondering why a query matches
>>>>>> documents):
>>>>>>
>>>>>> 1.  add &debugQuery=true and look at the explain text below --
>>>>>> anything that contributed to the score is listed there
>>>>>> 2.  check /admin/analysis.jsp -- this will let you see how analyzers
>>>>>> break text up into tokens.
>>>>>>
>>>>>> Not sure offhand, but I'm guessing the WordDelimiterFilterFactory has
>>>>>> something to do with it...
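Suggestion 1 amounts to adding one request parameter. The following is a minimal Python sketch. It assumes a Solr instance at the hypothetical address http://localhost:8983/solr/select, and `debug_url` is a hypothetical helper. The query string escapes every symbol with a backslash, which the query syntax permits.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical Solr location; adjust host, port and core path as needed.
SOLR_SELECT = "http://localhost:8983/solr/select"

def debug_url(query: str) -> str:
    """Return a select URL with debugQuery=true, so the response also carries
    the parsed query and the per-document explain text."""
    return SOLR_SELECT + "?" + urlencode({"q": query, "debugQuery": "true"})

q = r"(activity_type:NAME) AND title:(\!\@\#\$\%\^\&\*\(\))"
url = debug_url(q)
print(url)
# Round trip: the backslash escapes survive URL encoding unchanged.
print(parse_qs(urlparse(url).query)["q"][0])
```

In the response, compare parsedquery with querystring. If a clause in the query string is missing from parsedquery, analysis removed it. Yonik's reply above shows exactly this: parsedquery is just -features:foo.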
>>>>>>
>>>>>>
>>>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm running Solr 1.3/Java 1.6.
>>>>>>>
>>>>>>> When I run a query like (activity_type:NAME) AND
>>>>>>> title:(\...@#$%\^&\*\(\))
>>>>>>> all the documents are returned, even though not a single one should
>>>>>>> match: there is no title that matches the (escaped) string.
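Escaping stops the query parser from reading characters such as ( ) ^ * as syntax. The following is a minimal Python sketch of a backslash-escaping helper. The name `escape_query_chars` is hypothetical. The character set is an assumption modeled on Lucene's `QueryParser.escape()` in the 2.x line; later versions may escape more characters.

```python
# Assumption: the query-syntax characters escaped by Lucene 2.x QueryParser.escape().
SPECIAL = set('\\+-!():^[]"{}~*?|&')

def escape_query_chars(text: str) -> str:
    """Backslash-escape query-syntax characters (hypothetical helper)."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)

print(escape_query_chars("!@#$%^&*()"))  # \!@#$%\^\&\*\(\)
```

Escaping does not stop the field analyzer from discarding these characters afterwards. The escaped clause still analyzes to zero tokens, so escaping alone cannot prevent the clause from being dropped.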
>>>>>>>
>>>>>>> My document structure is as follows:
>>>>>>>
>>>>>>> <doc>
>>>>>>> <str name="activity_type">NAME</str>
>>>>>>> <str name="title">Bathing</str>
>>>>>>> ....
>>>>>>> </doc>
>>>>>>>
>>>>>>>
>>>>>>> The title field is of type text_title which is described below.
>>>>>>>
>>>>>>> <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
>>>>>>>   <analyzer type="index">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <!-- in this example, we will only use synonyms at query time
>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>>>>>>>             ignoreCase="true" expand="false"/>
>>>>>>>     -->
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>   </analyzer>
>>>>>>>   <analyzer type="query">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>>>>>>>             ignoreCase="true" expand="true"/>
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>   </analyzer>
>>>>>>> </fieldType>
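To see why the title clause disappears, you can approximate the query-time chain above in a few lines of Python. This is a rough, ASCII-only sketch under stated assumptions, not Solr's WordDelimiterFilter. It skips the synonym filter and RemoveDuplicatesTokenFilter, and `word_delimiter` and `analyze_title` are hypothetical names. A token made only of delimiters yields no parts at all, which is the zero-length case Yonik describes. The admin analysis page remains the authoritative check.

```python
import re

def word_delimiter(token):
    """Rough ASCII-only approximation of WordDelimiterFilter with
    generateWordParts=1, generateNumberParts=1, splitOnCaseChange=1 and
    catenateAll=1. The real filter has more rules; this is illustrative only."""
    # Runs of non-alphanumeric characters are delimiters and are discarded.
    chunks = [c for c in re.split(r"[^0-9A-Za-z]+", token) if c]
    parts = []
    for chunk in chunks:
        # Split further on lower->upper case changes and letter/digit boundaries.
        parts.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|[0-9]+", chunk))
    if len(parts) > 1:
        parts.append("".join(parts))  # catenateAll: glue all parts together
    return parts

def analyze_title(text):
    """Query-time chain minus synonyms and duplicate removal:
    whitespace tokenizer -> word delimiter -> lowercase."""
    return [p.lower() for tok in text.split() for p in word_delimiter(tok)]

for s in ["Bathing", "Wi-Fi", "PowerShot", "@", "!@#$%^&*()"]:
    print(repr(s), analyze_title(s))
```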
>>>>>>>
>>>>>>> When I run the query against Luke, no results are returned. Any
>>>>>>> suggestions are appreciated.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>
>
