On Mon, Jun 1, 2009 at 10:50 AM, Sam Michaels <mas...@yahoo.com> wrote:
>
> So the fix for this problem would be
>
> 1. Stop using WordDelimiterFilter for queries (what is the alternative) OR
> 2. Not allow any search strings without any alphanumeric characters..

Short term workaround for you, yes.
I would classify this surprising behavior as a bug we should eventually fix though.
Could you open a JIRA issue for it?

-Yonik
http://www.lucidimagination.com

> SM.
>
>
> Yonik Seeley-2 wrote:
>>
>> OK, here's the deal:
>>
>> <str name="rawquerystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
>> <str name="querystring">-features:foo features:(\...@#$%\^&\*\(\))</str>
>> <str name="parsedquery">-features:foo</str>
>> <str name="parsedquery_toString">-features:foo</str>
>>
>> The text analysis is throwing away non alphanumeric chars (probably
>> the WordDelimiterFilter). The Lucene (and Solr) query parser throws
>> away term queries when the token is zero length (after analysis).
>> Solr then interprets the left over "-features:foo" as "all documents
>> not containing foo in the features field", so you get a bunch of
>> matches.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Mon, Jun 1, 2009 at 10:15 AM, Sam Michaels <mas...@yahoo.com> wrote:
>>>
>>> Walter,
>>>
>>> The analysis link does not produce any matches for either @ or !...@#$%^&*()
>>> strings when I try to match against bathing. I'm worried that this might be
>>> the symptom of another problem (which has not revealed itself yet) and want
>>> to get to the bottom of this...
>>>
>>> Thank you.
>>> sm
>>>
>>>
>>> Walter Underwood wrote:
>>>>
>>>> Use the [analysis] link on the Solr admin UI to get more info on
>>>> how this is being interpreted.
>>>>
>>>> However, I am curious about why this is important. Do users enter
>>>> this query often? If not, maybe it is not something to spend time on.
>>>>
>>>> wunder
>>>>
>>>> On 5/31/09 2:56 PM, "Sam Michaels" <mas...@yahoo.com> wrote:
>>>>
>>>>>
>>>>> Here is the output from the debug query when I'm trying to match the
>>>>> String @ against Bathing (should not match)
>>>>>
>>>>> <str name="GLOM-1">
>>>>> 3.2689073 = (MATCH) weight(activity_type:NAME in 0), product of:
>>>>>   0.99999994 = queryWeight(activity_type:NAME), product of:
>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>     0.30591258 = queryNorm
>>>>>   3.2689075 = (MATCH) fieldWeight(activity_type:NAME in 0), product of:
>>>>>     1.0 = tf(termFreq(activity_type:NAME)=1)
>>>>>     3.2689075 = idf(docFreq=153, numDocs=1489)
>>>>>     1.0 = fieldNorm(field=activity_type, doc=0)
>>>>> </str>
>>>>>
>>>>> Looks like the AND clause in the search string is ignored...
>>>>>
>>>>> SM.
>>>>>
>>>>>
>>>>> ryantxu wrote:
>>>>>>
>>>>>> two key things to try (for anyone ever wondering why a query matches
>>>>>> documents)
>>>>>>
>>>>>> 1. add &debugQuery=true and look at the explain text below --
>>>>>>    anything that contributed to the score is listed there
>>>>>> 2. check /admin/analysis.jsp -- this will let you see how analyzers
>>>>>>    break text up into tokens.
>>>>>>
>>>>>> Not sure off hand, but I'm guessing the WordDelimiterFilterFactory has
>>>>>> something to do with it...
>>>>>>
>>>>>>
>>>>>> On Sat, May 30, 2009 at 5:59 PM, Sam Michaels <mas...@yahoo.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm running Solr 1.3/Java 1.6.
>>>>>>>
>>>>>>> When I run a query like - (activity_type:NAME) AND title:(\...@#$%\^&\*\(\))
>>>>>>> all the documents are returned even though there is not a single match.
>>>>>>> There is no title that matches the string (which has been escaped).
>>>>>>>
>>>>>>> My document structure is as follows
>>>>>>>
>>>>>>> <doc>
>>>>>>>   <str name="activity_type">NAME</str>
>>>>>>>   <str name="title">Bathing</str>
>>>>>>>   ....
>>>>>>> </doc>
>>>>>>>
>>>>>>>
>>>>>>> The title field is of type text_title which is described below.
>>>>>>>
>>>>>>> <fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
>>>>>>>   <analyzer type="index">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <!-- in this example, we will only use synonyms at query time
>>>>>>>     <filter class="solr.SynonymFilterFactory"
>>>>>>>             synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
>>>>>>>     -->
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>   </analyzer>
>>>>>>>   <analyzer type="query">
>>>>>>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>>>>>>     <filter class="solr.SynonymFilterFactory"
>>>>>>>             synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>>>>>     <filter class="solr.WordDelimiterFilterFactory"
>>>>>>>             generateWordParts="1" generateNumberParts="1" catenateWords="1"
>>>>>>>             catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>>>>>>>     <filter class="solr.LowerCaseFilterFactory"/>
>>>>>>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>>>>>   </analyzer>
>>>>>>> </fieldType>
>>>>>>>
>>>>>>> When I run the query against Luke, no results are returned. Any
>>>>>>> suggestions are appreciated.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://www.nabble.com/When-searching-for-%21%40-%24-%5E-*%28%29-all-documents-are-matched-incorrectly-tp23797731p23797731.html
>>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
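
For anyone landing on this thread later, here is a minimal sketch of workaround 2 (rejecting search strings that contain no alphanumeric characters), applied on the client side before the query is sent to Solr. The helper name `has_searchable_text` is hypothetical and not part of Solr. The sketch assumes, as described above, that the query analyzer reduces input with no letters or digits to zero tokens; it does not try to reproduce the exact rules of WordDelimiterFilter.

```python
import re

# Matches a single letter or digit (Unicode-aware).
# Underscore and punctuation do not count as searchable text here.
_ALNUM = re.compile(r"[^\W_]")


def has_searchable_text(user_input: str) -> bool:
    """Return True if the input contains at least one letter or digit.

    Assumption: input with no letters or digits is reduced to zero tokens
    by the field's query analyzer, so its clause would be silently dropped
    from the parsed query.
    """
    return _ALNUM.search(user_input) is not None


if __name__ == "__main__":
    # Show which sample inputs would be allowed through to Solr.
    for text in ["Bathing", "@", "!@#$%^&*()", "C++", "___"]:
        print(repr(text), has_searchable_text(text))
```

When the check fails, the application can return an empty result set (or a validation message) instead of sending a query whose title clause would disappear after analysis.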