Re: Help in resolving the below retrieval issue

Jack Krupansky Tue, 10 Sep 2013 13:22:00 -0700

Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or escaping them with a backslash looks like your best bet. There's no query parser option to disable the hyphen as an exclusion operator, although an upgrade to a "modern" Solr should fix the problem.
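
The two workarounds above could be applied on the client side before the query string is sent to Solr. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, nothing here is part of Solr, and it handles only the hyphen, not the other query parser operators.

```python
import re

# A hyphen at the start of the string or right after whitespace is the one
# the query parser may read as an exclusion operator. An embedded hyphen,
# as in "CD-ROM", has a non-space character before it and is left alone.
LEADING_HYPHEN = re.compile(r"(?<!\S)-")

# A hyphen with whitespace (or a string boundary) on both sides.
STRAY_HYPHEN = re.compile(r"(?<!\S)-(?!\S)")


def escape_leading_hyphens(query: str) -> str:
    """Hypothetical helper: put a backslash in front of each leading hyphen."""
    return LEADING_HYPHEN.sub(lambda match: "\\-", query)


def remove_stray_hyphens(query: str) -> str:
    """Hypothetical helper: drop free-standing hyphens, collapse whitespace."""
    return " ".join(STRAY_HYPHEN.sub(" ", query).split())


if __name__ == "__main__":
    print(escape_leading_hyphens("Rahul - kumar"))  # Rahul \- kumar
    print(remove_stray_hyphens("Rahul - kumar"))    # Rahul kumar
    print(escape_leading_hyphens("CD-ROM"))         # CD-ROM
```

Whether to escape or remove depends on whether the hyphen carries any meaning for the search; in the "Rahul - kumar" case the index-time char filter strips it anyway.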


-- Jack Krupansky

-----Original Message-----
From: Prathik Puthran
Sent: Tuesday, September 10, 2013 4:13 PM
To: [email protected]
Subject: Re: Help in resolving the below retrieval issue

I'm using Solr 3.4.

So this bug is causing the second term, "kumar", to be treated as an
excluded term?
Is it possible to configure the query parser not to treat the '-' as an
exclusion operator?
If not, is the only way to remove the '-' from the query string?

Thanks,
Prathik

On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky <[email protected]> wrote:

What release of Solr are you using?

It appears that the hyphen is being treated as an exclusion operator even
though it is followed by a space. Solr 4.4 doesn't appear to do that, but
maybe earlier releases had a problem.

In any case, be careful with a leading hyphen in queries, since it does
mean "exclude documents that contain the following term".

Or, just escape any leading hyphen with a backslash.

-- Jack Krupansky

-----Original Message-----
From: Prathik Puthran
Sent: Tuesday, September 10, 2013 11:47 AM
To: [email protected] ; [email protected]
Subject: Re: Help in resolving the below retrieval issue


Thanks Erick for the response.
I tried to debug the query. Below is the response in the debug node:

<str name="rawquerystring">Rahul - kumar</str>
<str name="querystring">Rahul - kumar</str>
<str name="parsedquery">+text:Rahul -text:kumar</str>
<str name="parsedquery_toString">+text:Rahul -text:kumar</str>
<lst name="explain"/>
<str name="QParser">LuceneQParser</str>
<arr name="filter_queries"><str>Rahul - kumar</str></arr>
<arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>


Does it mean the query parser has parsed it to tokens "Rahul -" and
"kumar"?
Even if this was the case solr should be able to retrieve the documents
because I have indexed all the documents based on n-grams as well.

Thanks,
Prathik


On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson <[email protected]> wrote:

Try adding &debug=query to the URL. What I think you'll find is that
you're running into a common issue: the difference between query parsing
and analysis.

When you submit anything with whitespace in it, the query parser will
break it up _before_ it gets to the analysis part. You should see
something in the debug portion of the query like
field:rahul field:kumar and possibly even field:-

These are searched as separate tokens. By specifying KeywordTokenizer, at
index time you'll have exactly one token, rahul-kumar, in the index, which
will not match any of the separated tokens.

Try escaping the spaces with a backslash. You could also try quoting the
input, although that has some phrase implications.
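
To make the two suggestions above concrete, here is a rough client-side sketch. The whitespace split is only a simplified stand-in for what the query parser does before analysis, and all three function names are hypothetical; none of this is Solr code.

```python
def split_on_whitespace(query: str) -> list[str]:
    """Simplified stand-in for the parser breaking input up before analysis."""
    return query.split()


def escape_spaces(query: str) -> str:
    """Hypothetical helper: backslash-escape each space so the input stays whole."""
    return query.replace(" ", "\\ ")


def quote_as_phrase(query: str) -> str:
    """Hypothetical helper: wrap the input in double quotes (a phrase query)."""
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


if __name__ == "__main__":
    raw = "Rahul - kumar"
    print(split_on_whitespace(raw))  # ['Rahul', '-', 'kumar']
    print(escape_spaces(raw))        # Rahul\ -\ kumar
    print(quote_as_phrase(raw))      # "Rahul - kumar"
```

Either form keeps the input together as it passes through the parser; how it then matches still depends on the field's query-time analyzer.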

Do you really want this search to fail on just searching "rahul" though?
Perhaps KeywordTokenizer isn't best here; it depends upon your use case...

Best,
Erick


On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran <[email protected]> wrote:

Hi,

I am facing the issue below, where Solr does not retrieve the indexed
word in some cases.

This happens whenever the indexed word contains the string " - " (quotes
for clarity) as a substring, i.e. a word prefix followed by a space, then
'-', then another space, and then the rest of the word.
When I search with the exact string as the search query, Solr returns no
results.

Example:
Indexed word --> "Rahul - kumar"  (quotes for clarity)
If I search with the search query as below Solr gives no results
Search query --> "Rahul - kumar"  (quotes for clarity)

However the below search query returns the results
Search query --> "Rahul kumar"

Can you please let me know what I am doing wrong here and what should I
do to ensure the first query i.e. "Rahul - kumar" returns the documents
indexed using it.

Below are the analyzers I am using:
Index time analyzer components:
1) <charFilter class="solr.PatternReplaceCharFilterFactory"
   pattern="([^A-Za-z0-9 ])" replacement=""/>
2) <tokenizer class="solr.KeywordTokenizerFactory"/>
3) <filter class="solr.LowerCaseFilterFactory"/>
4) <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="1" preserveOriginal="1"/>
5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
   maxGramSize="50" side="front"/>
6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
   maxGramSize="50" side="back"/>

Query time analyzer components:
1) <charFilter class="solr.PatternReplaceCharFilterFactory"
   pattern="([^A-Za-z0-9 ])" replacement=""/>
2) <tokenizer class="solr.KeywordTokenizerFactory"/>
3) <filter class="solr.LowerCaseFilterFactory"/>
4) <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="1" preserveOriginal="1"/>
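
As a rough illustration of what the first three stages of both chains do to the example values, the sketch below applies the same regular expression and lowercasing in plain Python. It is an approximation only: the function name is hypothetical, and WordDelimiterFilterFactory and EdgeNGramFilterFactory are not modelled at all.

```python
import re

# Same character class as the PatternReplaceCharFilterFactory pattern above:
# anything that is not an ASCII letter, digit, or space is removed.
NON_ALNUM_OR_SPACE = re.compile(r"[^A-Za-z0-9 ]")


def char_filter_and_lowercase(text: str) -> str:
    """Approximate the char filter, keyword tokenizer, and lowercase filter.

    The keyword tokenizer emits the whole input as one token, so the result
    is a single string rather than a list of tokens.
    """
    return NON_ALNUM_OR_SPACE.sub("", text).lower()


if __name__ == "__main__":
    # repr() makes the doubled space left behind by the removed hyphen visible.
    print(repr(char_filter_and_lowercase("Rahul - kumar")))  # 'rahul  kumar'
    print(repr(char_filter_and_lowercase("Rahul kumar")))    # 'rahul kumar'
```

Since the char filter removes the hyphen, the analyzer never sees it. An exclusion applied to the query "Rahul - kumar" therefore has to come from an earlier step than analysis.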


Can you please let me know how I can fix this?

Thanks,
Prathik
