Hi, I am also seeing this issue when the search query is something like "how are you?" (quotes for clarity). The query parser splits it into the following tokens: +text:whats +text:your +text:raashee?
However, when I remove the "?" from the search query ("how are you"), I get the results. Is "?" a special character? Should it be escaped as well?

On Wed, Sep 11, 2013 at 1:50 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Removing stray hyphens (embedded hyphens, like "CD-ROM", are okay) or
> escaping them with backslash looks like your best bets. There's no query
> parser option to disable the hyphen as an exclusion operator, although an
> upgrade to a "modern" Solr should fix the problem.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Prathik Puthran
> Sent: Tuesday, September 10, 2013 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Help in resolving the below retrieval issue
>
> I'm using Solr 3.4.
>
> This bug is causing the 2nd term i.e. "kumar" to be treated as an exclusion
> operator?
> Is it possible to configure the query parser to not treat the '-' as
> exclusion operator ?
> If not the only way is to remove the '-' from the query string?
>
> Thanks,
> Prathik
>
> On Tue, Sep 10, 2013 at 10:36 PM, Jack Krupansky <j...@basetechnology.com>
> wrote:
>
>> What release of Solr are you using?
>>
>> It appears that the hyphen is being treated as an exclusion operator even
>> though it is followed by a space. Solr 4.4 doesn't appear to do that, but
>> maybe earlier releases had a problem.
>>
>> In any case, be careful with leading hyphen in queries since it does mean
>> exclude documents that contain the following term.
>>
>> Or, just escape any leading hyphen with a backslash.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Prathik Puthran
>> Sent: Tuesday, September 10, 2013 11:47 AM
>> To: d...@lucene.apache.org ; solr-user@lucene.apache.org
>> Subject: Re: Help in resolving the below retrieval issue
>>
>> Thanks Erick for the response.
>> I tried to debug the query.
>> Below is the response in the debug node
>>
>> <str name="rawquerystring">Rahul - kumar</str>
>> <str name="querystring">Rahul - kumar</str>
>> <str name="parsedquery">+text:Rahul -text:kumar</str>
>> <str name="parsedquery_toString">+text:Rahul -text:kumar</str>
>> <lst name="explain"/>
>> <str name="QParser">LuceneQParser</str>
>> <arr name="filter_queries"><str>Rahul - kumar</str></arr>
>> <arr name="parsed_filter_queries"><str>+text:rahul -text:kumar</str></arr>
>>
>> Does it mean the query parser has parsed it to tokens "Rahul -" and
>> "kumar"?
>> Even if this was the case solr should be able to retrieve the documents
>> because I have indexed all the documents based on n-grams as well.
>>
>> Thanks,
>> Prathik
>>
>> On Tue, Sep 10, 2013 at 7:09 PM, Erick Erickson <erickerick...@gmail.com>
>> wrote:
>>
>>> Try adding &debug=query to the url. What I think you'll find is that
>>> you're running into a common issue, the difference between query
>>> parsing and analysis.
>>>
>>> When you submit anything with whitespace in it, the query parser will
>>> break it up _before_ it gets to the analysis part. You should see
>>> something in the debug portion of the query like
>>> field:rahul field:kumar and possibly even field:-
>>>
>>> These are searched as separate tokens. By specifying KeywordTokenizer, at
>>> index time you'll have exactly one token, rahul-kumar, in the index, which
>>> will not match any of the separated tokens.
>>>
>>> Try escaping the spaces with backslash. You could also try quoting the
>>> input although that has some phrase implications.
>>>
>>> Do you really want this search to fail on just searching "rahul" though?
>>> Perhaps keywordTokenizer isn't best here, it depends upon your use-case...
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Sep 10, 2013 at 8:10 AM, Prathik Puthran
>>> <prathik.puthra...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am facing the below issue where in Solr is not retrieving the indexed
>>>> word for some cases.
>>>>
>>>> This happens whenever the indexed word has string " - " (quotes for
>>>> clarity) as substring i.e word prefix followed by a space which is
>>>> followed by '-' again followed by a space and followed by the rest of
>>>> the word suffix.
>>>> When I search with search query being the exact string Solr returns no
>>>> results.
>>>>
>>>> Example:
>>>> Indexed word --> "Rahul - kumar" (quotes for clarity)
>>>> If I search with the search query as below Solr gives no results
>>>> Search query --> "Rahul - kumar" (quotes for clarity)
>>>>
>>>> However the below search query returns the results
>>>> Search query --> "Rahul kumar"
>>>>
>>>> Can you please let me know what I am doing wrong here and what should I
>>>> do to ensure the first query i.e. "Rahul - kumar" returns the documents
>>>> indexed using it.
>>>>
>>>> Below are the analyzers I am using:
>>>> Index time analyzer components:
>>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>>    generateWordParts="1"
>>>>    preserveOriginal="1"/>
>>>> 5) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>>    maxGramSize="50" side="front"/>
>>>> 6) <filter class="solr.EdgeNGramFilterFactory" minGramSize="2"
>>>>    maxGramSize="50" side="back"/>
>>>>
>>>> Query time analyzer components:
>>>> 1) <charFilter class="solr.PatternReplaceCharFilterFactory"
>>>>    pattern="([^A-Za-z0-9 ])" replacement=""/>
>>>> 2) <tokenizer class="solr.KeywordTokenizerFactory"/>
>>>> 3) <filter class="solr.LowerCaseFilterFactory"/>
>>>> 4) <filter class="solr.WordDelimiterFilterFactory"
>>>>    generateWordParts="1"
>>>>    preserveOriginal="1"/>
>>>>
>>>> Can you please let me know how I can fix this?
>>>>
>>>> Thanks,
>>>> Prathik
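
On the escaping question raised in this thread: in the Lucene query syntax "?" is the single-character wildcard and a leading "-" is the exclusion operator, so both are candidates for backslash-escaping before the string reaches the query parser. Below is a minimal client-side sketch of that idea in Python. The helper names (escape_query_term, escape_query) and the exact character set are assumptions made for illustration, not something taken from the Solr 3.4 configuration discussed above; SolrJ users may prefer the library's own ClientUtils.escapeQueryChars.

```python
# Characters the classic Lucene/Solr query parser treats as syntax.
# This set is an assumption for illustration; check it against the
# query parser documentation for the Solr release you run.
LUCENE_SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_term(term: str) -> str:
    """Backslash-escape every query-syntax character in a single term."""
    return "".join(
        "\\" + ch if ch in LUCENE_SPECIAL_CHARS else ch
        for ch in term
    )


def escape_query(text: str) -> str:
    """Escape each whitespace-separated term and rejoin with single spaces.

    Whitespace itself is NOT escaped here, so the query parser will still
    split the input into separate terms.
    """
    return " ".join(escape_query_term(t) for t in text.split())


if __name__ == "__main__":
    print(escape_query("how are you?"))
    print(escape_query("Rahul - kumar"))
```

This escapes every hyphen, including embedded ones such as in "CD-ROM", which should be harmless because the parser accepts a backslash before any character. It deliberately leaves spaces alone, so it does not by itself address the KeywordTokenizer mismatch Erick describes; escaping the spaces or quoting the input would be a separate step.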