You could pre-process your queries to convert hyphen and other special characters to spaces.
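
A minimal sketch of that pre-processing step, assuming it runs on the raw user text before the text is wrapped in any Solr query syntax such as name:(...). The function name normalize_user_terms is hypothetical, and treating every character that is neither a word character nor whitespace as "special" is an assumption; adjust the pattern to your own data.

```python
import re

def normalize_user_terms(text):
    """Replace hyphens and other special characters with spaces.

    Hypothetical helper: "special" here means anything that is neither
    a word character nor whitespace, which is an assumption.
    """
    spaced = re.sub(r"[^\w\s]+", " ", text)
    # Collapse runs of whitespace left behind by the substitution.
    return " ".join(spaced.split())

print(normalize_user_terms("gb-mb"))         # gb mb
print(normalize_user_terms("CD-ROM drive"))  # CD ROM drive
```

With the hyphen replaced, the query parser sees two separate terms, so the default operator applies between them.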

-- Jack Krupansky

-----Original Message----- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

OK, so how can I prevent this behavior?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky <j...@basetechnology.com> wrote:

There is one other detail that should clarify the situation. At query
time, the query parser itself is breaking your query into space-delimited
terms, and only calling the analyzer for each of those terms, each of which
will be treated as if it were a quoted phrase. So it doesn't matter whether it is
the standard analyzer or word delimiter filter or other filter that is
breaking up the compound term.

And the default "query operator" only applies to the "terms" as the query
parser parsed them, not for the sub-terms of a compound term like CD-ROM or
gb-mb.
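
The behavior described above can be illustrated with a toy model. This is only a sketch of the description, not Solr's actual parser: toy_parse is a hypothetical name, and the regular expression is a stand-in for whatever analyzer is configured on the field.

```python
import re

def toy_parse(query, default_op="AND"):
    """Toy model of the described behavior; NOT Solr's implementation."""
    clauses = []
    # Step 1: the parser splits the query on whitespace only.
    for term in query.split():
        # Step 2: each term is analyzed on its own (stand-in analyzer).
        sub_terms = [t.lower() for t in re.split(r"[^0-9A-Za-z]+", term) if t]
        if len(sub_terms) > 1:
            # A compound term is handled as if it were a quoted phrase.
            clauses.append('"' + " ".join(sub_terms) + '"')
        elif sub_terms:
            clauses.append(sub_terms[0])
    # Step 3: the default operator joins only the parser-level terms.
    return f" {default_op} ".join(clauses)

print(toy_parse("gb-mb"))         # "gb mb"
print(toy_parse("gb mb"))         # gb AND mb
print(toy_parse("CD-ROM drive"))  # "cd rom" AND drive
```

In this model the operator never appears inside a compound term, which is the point made above about CD-ROM and gb-mb.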


-- Jack Krupansky

-----Original Message----- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky <j...@basetechnology.com> wrote:

 Terms with embedded special characters are treated as phrases with spaces
in place of the special characters. So, "gb-mb" is treated as if you had
enclosed the term in quotes.

-- Jack Krupansky
-----Original Message----- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens


Hi,

Does anybody know why a hyphen '-' combined with q.op=AND causes such a big
difference between the two queries? I thought hyphens are removed by
StandardTokenizer, which means that theoretically the two queries should be
the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi <alireza.sal...@gmail.com> wrote:

 Hi,


I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND

results in  "parsedquery":"+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes",

While searching
http://localhost:8984/solr/select/?q=name:(gbmb)&version=2.2&start=0&rows=10&indent=on&debugQuery=on&indent=on&wt=json&q.op=AND

results in "parsedquery":"+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes)",

If you look at the first query (the one with the hyphen), you can see that
the parsed result is totally different. I know that hyphens are special
characters in Solr, but there's no way the first query can return any entry,
because it's asking for ALL synonyms.
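
The two requests above can also be rebuilt programmatically, which makes it easier to compare their parsedquery values side by side. This is a minimal sketch using only Python's standard library; build_debug_url is a hypothetical helper, the base URLs and parameters are copied from the requests above, and the duplicate indent=on is dropped. Note that urlencode percent-encodes the colon and parentheses.

```python
from urllib.parse import urlencode

def build_debug_url(base, user_query, field="name"):
    """Build a select URL with debugQuery=on (hypothetical helper)."""
    params = [
        ("q", f"{field}:({user_query})"),
        ("version", "2.2"),
        ("start", "0"),
        ("rows", "10"),
        ("indent", "on"),
        ("debugQuery", "on"),
        ("wt", "json"),
        ("q.op", "AND"),
    ]
    return base + "?" + urlencode(params)

# Ports as they appear in the two requests above.
with_hyphen = build_debug_url("http://localhost:8983/solr/select/", "gb-mb")
without_hyphen = build_debug_url("http://localhost:8984/solr/select/", "gbmb")
print(with_hyphen)
print(without_hyphen)
```

Fetching both URLs and diffing the "parsedquery" field of the JSON responses shows the difference described here.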

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer
