I think I am officially tired of having to explain why Solr doesn't do what users expect for this query. I mean, I can accept that low-level Lucene should work strictly on the decomposed terms of test test-or*, but it is very reasonable for users (even EXPERT users) to expect that the Solr query parser will generate what the complex phrase query parser generates.

See:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Having to use a separate query parser for this obvious, common case is... absurd.
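
For readers who have not used that parser, here is a minimal Python sketch of how a client might build a complex phrase query for this case. The `{!complexphrase inOrder=true}` local-params syntax follows the page linked above; the helper name `complex_phrase_query` is hypothetical, and the field name and terms are borrowed from Sven's message further down the thread. Whether it matches depends on your Solr version shipping that parser and on how the field is analyzed.

```python
def complex_phrase_query(field, phrase, in_order=True):
    """Build a query string for Solr's complex phrase query parser.

    `field` and `phrase` are illustrative; the phrase may contain
    wildcard terms such as 'or*'.
    """
    order = "true" if in_order else "false"
    # Escape embedded double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace('"', r'\"')
    return f'{{!complexphrase inOrder={order}}}{field}:"{escaped}"'


print(complex_phrase_query("searchField_t", "test or*"))
# prints: {!complexphrase inOrder=true}searchField_t:"test or*"
```

The resulting string would be sent as the `q` parameter; the hyphen is replaced by a space because the indexed tokens are "test" and "or123".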

(What does Elasticsearch do for this case??)

-- Jack Krupansky

-----Original Message-----
From: Erick Erickson
Sent: Tuesday, June 24, 2014 11:38 AM
To: solr-user@lucene.apache.org ; Ahmet Arslan
Subject: Re: No results for a wildcard query for text_general field in solr 4.1

Wildcards are a tough thing to get your head around. I
think my first post on the users list was titled
"I just don't get wildcards at all" or something like that...

Right, wildcards aren't tokenized. So by getting your term
through the query parsing as a single token, including the
hyphen, when the analyzer sees that it's a wildcard
it doesn't break on the hyphen. So it's looking for a single
token. And since there is no single term like
test-or123, you get no matches.

I'm afraid this is just how it works. You can do something like
replace the hyphen at the app layer. But I don't think there's
a way to do what you want OOB.

Best,
Erick
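
One possible reading of "replace the hyphen at the app layer", as a minimal Python sketch: split the user's term on hyphens before it reaches Solr, so that each piece lines up with a token the StandardTokenizer produced at index time. The function name `split_wildcard_term` is hypothetical and the field name is taken from Sven's debug output. Note the trade-off: the rewritten query requires both pieces but no longer requires them to be adjacent or in order.

```python
import re


def split_wildcard_term(field, term):
    """Rewrite 'test-or*' as '+field:test +field:or*'.

    Splits on hyphens and whitespace, which is an assumption about how
    the field's tokenizer breaks up the indexed value.
    """
    parts = [p for p in re.split(r"[-\s]+", term) if p]
    return " ".join(f"+{field}:{p}" for p in parts)


print(split_wildcard_term("searchField_t", "test-or*"))
# prints: +searchField_t:test +searchField_t:or*
```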

On Tue, Jun 24, 2014 at 1:55 AM, Ahmet Arslan <iori...@yahoo.com.invalid> wrote:
Hi Sven,

StandardTokenizerFactory splits it into two pieces. You can confirm this on the analysis page.
If this is something you don't want, let us know.
We can help you to create an analysis chain that suits your needs.

Ahmet


On Tuesday, June 24, 2014 10:39 AM, Sven Schönfeldt <schoenfe...@subshell.com> wrote:
Hi Erick,

That is what I did: I tried that input on the analysis page.

The index analyzer splits the value into two words: "test" and "or123".
Checking the query on the analysis page, the word is split there as well, into "test" and "or123".

When I run the query and look at the debug output, I see that the word is not split. That's what I expect…

<str name="rawquerystring">searchField_t:test\-or123*</str>
<str name="querystring">searchField_t:test\-or123*</str>
<str name="parsedquery">searchField_t:test-or123*</str>
<str name="parsedquery_toString">searchField_t:test-or123*</str>

Without the wildcard, the word is also split into two parts:

<str name="rawquerystring">searchField_t:test\-or123</str>
<str name="querystring">searchField_t:test\-or123</str>
<str name="parsedquery">searchField_t:test searchField_t:or123</str>
<str name="parsedquery_toString">searchField_t:test searchField_t:or123</str>

Any idea which configuration has the responsibility for that behavior?

Thanks!





On 23.06.2014 at 22:55, Erick Erickson <erickerick...@gmail.com> wrote:

Well, you can do more than guess by looking at the admin/analysis page
and trying your input on the field in question. That'll show you what
actual transformations are performed.

You're probably right though. Try adding &debug=query to your URL to
see what the actual parsed query looks like and compare with the
admin/analysis page....

But yeah, it's a matter of getting all the parts (query parser and
analysis chains) to "do the right thing".

Best,
Erick
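
A minimal Python sketch of the `&debug=query` suggestion above, for anyone building the request from code rather than by hand. The base URL and core name (`http://localhost:8983/solr/collection1`) and the helper name `debug_query_url` are assumptions for illustration; only the `debug=query` parameter comes from the message.

```python
from urllib.parse import urlencode


def debug_query_url(base_url, query):
    """Return a /select URL that asks Solr to include the parsed query."""
    params = {"q": query, "debug": "query", "wt": "xml"}
    # urlencode percent-escapes the colon, backslash and asterisk in q.
    return f"{base_url}/select?{urlencode(params)}"


url = debug_query_url("http://localhost:8983/solr/collection1",
                      r"searchField_t:test\-or*")
print(url)
```

The `parsedquery` entry in the response can then be compared with what the admin/analysis page shows for the same input.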

On Mon, Jun 23, 2014 at 7:30 AM, Sven Schönfeldt
<schoenfe...@subshell.com> wrote:
Hi Solr-Users,

I am trying to do a wildcard query on a dynamic text field (_t), but I don't get the right result. The field type is "text_general", with the default configuration:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


The input for the text field is "test-or123" and my query looks like "test\-or*".

It seems that the input is already split into two words, "test" and "or123", but that's just a guess.

Can anyone help me understand why I don't find the document, and what to do to make the query work?

Regards!
