Re: Help in resolving the below retrieval issue

Prathik Puthran Thu, 12 Sep 2013 04:01:59 -0700

Hi,

I am also seeing this issue when the search query is something like "how
are you?" (Quotes for clarity).
The query parser splits it into the tokens below:
+text:whats +text:your +text:raashee?


However when I remove the "?" from the search query "how are you" I get the
results.
Is "?" a special character? Should it be escaped as well?
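
For reference: in the classic Lucene query syntax "?" is the single-character wildcard (and "*" the multi-character one), so it is a special character. A minimal sketch of client-side escaping follows, assuming the query string is assembled in Python before being sent to Solr. The function name and the character list are illustrative assumptions, not taken from this thread; the list follows the documented Lucene special characters, and "/" only became special in Lucene 4.0.

```python
# Characters with special meaning in the classic Lucene/Solr query syntax.
# ASSUMPTION: this list mirrors the documented special characters; "/" is
# only special from Lucene 4.0 on, but escaping it earlier should be harmless.
SPECIAL_CHARS = set('+-!(){}[]^"~*?:\\/&|')


def escape_query_chars(text):
    """Prefix every query-syntax character in `text` with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    # Print the escaped form of the strings discussed in this thread.
    for raw in ["raashee?", "Rahul - kumar", "CD-ROM"]:
        print(raw, "->", escape_query_chars(raw))
```

Note that this escapes every hyphen, including embedded ones, and leaves whitespace alone, so the query parser still splits the input on spaces.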


On Wed, Sep 11, 2013 at 1:50 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or
> escaping them with a backslash looks like your best bets. There's no query
> parser option to disable the hyphen as an exclusion operator, although an
> upgrade to a "modern" Solr should fix the problem.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Prathik Puthran
> Sent: Tuesday, September 10, 2013 4:13 PM
> To: solr-user@lucene.apache.org
>
> Subject: Re: Help in resolving the below retrieval issue
>
> I'm using Solr 3.4.
>
>
> Is this bug causing the '-' before the 2nd term, i.e. "kumar", to be
> treated as an exclusion operator?
> Is it possible to configure the query parser to not treat the '-' as an
> exclusion operator?
> If not, is the only way to remove the '-' from the query string?
>
> Thanks,
> Prathik
>
>
> On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> What release of Solr are you using?
>>
>> It appears that the hyphen is being treated as an exclusion operator even
>> though it is followed by a space. Solr 4.4 doesn't appear to do that, but
>> maybe earlier releases had a problem.
>>
>> In any case, be careful with a leading hyphen in queries, since it means
>> "exclude documents that contain the following term".
>>
>> Or, just escape any leading hyphen with a backslash.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Prathik Puthran
>> Sent: Tuesday, September 10, 2013 11:47 AM
>> To: d...@lucene.apache.org ; solr-user@lucene.apache.org
>> Subject: Re: Help in resolving the below retrieval issue
>>
>>
>> Thanks Erick for the response.
>> I tried to debug the query. Below is the response in the debug node
>>
>> <str name="rawquerystring">Rahul - kumar</str>
>> <str name="querystring">Rahul - kumar</str>
>> <str name="parsedquery">+text:Rahul -text:kumar</str>
>> <str name="parsedquery_toString">+text:Rahul -text:kumar</str>
>> <lst name="explain"/>
>> <str name="QParser">LuceneQParser</str>
>> <arr name="filter_queries"><str>Rahul - kumar</str></arr>
>> <arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>
>>
>>
>>
>> Does this mean the query parser has parsed it into the tokens "Rahul -" and
>> "kumar"?
>> Even if that were the case, Solr should be able to retrieve the documents,
>> because I have indexed all the documents based on n-grams as well.
>>
>> Thanks,
>> Prathik
>>
>>
>> On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>
>>> Try adding &debug=query to the url. What I think you'll find is that
>>> you're running into
>>> a common issue, the difference between query parsing and analysis.
>>>
>>> When you submit anything with whitespace in it, the query parser will
>>> break it up _before_ it gets to the analysis part. You should see
>>> something in the debug portion of the query like
>>> field:rahul field:kumar and possibly even field:-
>>>
>>> These are searched as separate tokens. By specifying KeywordTokenizer, at
>>> index time you'll have exactly one token, rahul-kumar, in the index, which
>>> will not match any of the separated tokens.
>>>
>>> Try escaping the spaces with a backslash. You could also try quoting the
>>> input, although that has some phrase implications.
>>>
>>> Do you really want this search to fail on just searching "rahul" though?
>>> Perhaps KeywordTokenizer isn't best here; it depends upon your use case...
>>>
>>> Best,
>>> Erick
>>>
>>>
>>> On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <
>>> prathik.puthra...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing the below issue, wherein Solr does not retrieve the indexed
>>>> word in some cases.
>>>>
>>>> This happens whenever the indexed word has the string " - " (quotes for
>>>> clarity) as a substring, i.e. a word prefix followed by a space, then
>>>> a '-', then another space, and then the rest of the word.
>>>> When I search with the exact string as the search query, Solr returns no
>>>> results.
>>>>
>>>> Example:
>>>> Indexed word --> "Rahul - kumar"  (quotes for clarity)
>>>> If I search with the search query as below Solr gives no results
>>>> Search query --> "Rahul - kumar"  (quotes for clarity)
>>>>
>>>> However the below search query returns the results
>>>> Search query --> "Rahul kumar"
>>>>
>>>> Can you please let me know what I am doing wrong here and what should I
>>>> do to ensure the first query i.e. "Rahul - kumar" returns the documents
>>>> indexed using it.
>>>>
>>>> Below are the analyzers I am using:
>>>> Index time analyzer components:
>>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>>    generateWordParts="1" preserveOriginal="1"/>
>>>> 5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>>    maxGramSize="50" side="front"/>
>>>> 6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>>    maxGramSize="50" side="back"/>
>>>>
>>>> Query time analyzer components:
>>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>>    generateWordParts="1" preserveOriginal="1"/>
>>>>
>>>>
>>>> Can you please let me know how I can fix this?
>>>>
>>>> Thanks,
>>>> Prathik
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
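
Jack's suggestion in the quoted thread above (remove stray hyphens, but leave embedded ones such as "CD-ROM" alone) could be sketched client-side roughly as follows. This is only an illustration under assumptions: the function name is hypothetical, and "stray" is taken to mean any hyphen that is not sitting between two word characters.

```python
import re

# ASSUMPTION: a hyphen is "stray" when it is not flanked by word characters
# on both sides, e.g. " - ", a leading "-term", or a trailing "term-".
_STRAY_HYPHEN = re.compile(r"(?<!\w)-|-(?!\w)")


def strip_stray_hyphens(query):
    """Replace stray hyphens with a space and collapse repeated whitespace."""
    cleaned = _STRAY_HYPHEN.sub(" ", query)
    return " ".join(cleaned.split())


if __name__ == "__main__":
    # Print the cleaned form of the examples mentioned in this thread.
    for raw in ["Rahul - kumar", "CD-ROM"]:
        print(repr(raw), "->", repr(strip_stray_hyphens(raw)))
```

Be aware that this also drops a deliberate leading hyphen (as in "-kumar"), so it only fits applications where users are never meant to use the exclusion operator.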
