Thank you for your input. Here's how the query looks with debugQuery=true:

"rawquerystring": "name:industrie-anhänger",
"querystring": "name:industrie-anhänger",
"parsedquery": "MultiPhraseQuery(name:"(industrie-anhang industri) (anhang industrieanhang)")",
"parsedquery_toString": "name:"(industrie-anhang industri) (anhang industrieanhang)"",

It looks like some rules are being applied, expressed by the parentheses. What's the correct interpretation of that? The default operator is OR, yet it looks as if the terms inside the parentheses are grouped using AND.

On 11.06.2015 at 12:40, Upayavira wrote:
> The next thing to do is add debugQuery=true to your URL (or enable it in
> the query pane of the admin UI). Then look for the parsed query info.
>
> On the standard text_en field, which includes an English stop word
> filter, I ran a query on "Jack and Jill's House", which showed
> this output:
>
> "rawquerystring": "text_en:(Jack and Jill's House)",
> "querystring": "text_en:(Jack and Jill's House)",
> "parsedquery": "text_en:jack text_en:jill text_en:hous",
> "parsedquery_toString": "text_en:jack text_en:jill text_en:hous",
>
> You can see that the parsed query is formed *after* analysis, so you can
> see exactly what is being queried for.
>
> Also, as a corollary to this, you can use the schema browser (or
> faceting for that matter) to view what terms are being indexed, to see
> if they should match.
>
> HTH
>
> Upayavira
>
>> On 11.06.2015 at 12:00, Upayavira wrote:
> Have you used the analysis tab in the admin UI? You can type in sentences for both index and query time and see how they would be analysed by various fields/field types. Once you have got index time and query time to result in the same tokens at the end of the analysis chain, you should start seeing matches in your queries.
Upayavira

On Thu, Jun 11, 2015, at 10:26 AM, Thomas Michael Engelke wrote:
> Hey,
>
> in German, you can string most nouns together by using hyphens, like
> this:
>
> Industrie = industry
> Anhänger = trailer
> Industrie-Anhänger = trailer for industrial use
>
> Here [1], you can see me querying "Industrieanhänger" from the "name"
> field (name:Industrieanhänger), to make sure the index actually
> contains the word. Our data is structured so that products are listed
> without the hyphen.
>
> Now, customers can come around and use the hyphenated version as a
> search term (e.g. "industrie-anhänger"), and of course we want them to
> find what they are looking for. I've set it up so that the
> WordDelimiterFilterFactory uses catenateWords="1", so that these words
> are catenated. An analysis of "Industrieanhänger" as index and
> "industrie-anhänger" as query can be seen here [2]. You can see that
> both word parts are found. However, querying for "industrie-anhänger"
> does not yield results; results appear only when the hyphen is
> removed, as you can see here [3].
>
> I'm not sure how to proceed from here, as the results of the analysis
> have so far always lined up with what I could see when querying.
Here's the schema definition for "text", the field type for the "name" field:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3"
            maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"
            enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3"
            maxSubwordSize="30" onlyLongestMatch="false"/>
    -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"
            enablePositionIncrements="true" format="snowball"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I've also thought it might be a problem with URL encoding not encoding
the hyphen, but replacing it with %2D didn't change the outcome (and was probably wrong anyway).

Any help is greatly appreciated.

Links:
------
[1] http://imgur.com/2oEC5vz
[2] http://i.imgur.com/H0AhEsF.png
[3] http://imgur.com/dzmMe7t
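
For reference, here is a minimal sketch of two points raised in this thread, assuming Python 3 and only the standard library. The helper names build_debug_query and position_groups are hypothetical and exist only for illustration. The first helper shows that standard URL encoding leaves a hyphen untouched (and that %2D decodes to the same character), so encoding it by hand should not change what Solr receives. The second splits the parsedquery_toString value into its parenthesized groups. As far as I understand Lucene's MultiPhraseQuery output, each group lists alternative terms at one position, and the surrounding quotes make the whole thing a phrase across successive positions, so this is a reading aid rather than an authoritative parser.

```python
import re
from urllib.parse import urlencode, unquote


def build_debug_query(field, term):
    """Return a URL query string for a fielded search with debugQuery enabled.

    Hypothetical helper: only the q and debugQuery parameters from the
    thread are used; host, core and handler are left out on purpose.
    """
    return urlencode({"q": f"{field}:{term}", "debugQuery": "true"})


def position_groups(parsed):
    """Split a string like name:"(a b) (c d)" into one term list per group.

    Assumption: every position is printed as a parenthesized group, as in
    the debug output quoted in this thread.
    """
    phrase = parsed.split('"')[1]  # text between the double quotes
    return [group.split() for group in re.findall(r"\(([^)]*)\)", phrase)]


if __name__ == "__main__":
    # The hyphen stays as-is; only the colon and the umlaut are escaped.
    print(build_debug_query("name", "industrie-anhänger"))

    # A hand-encoded %2D decodes back to a plain hyphen.
    print(unquote("industrie%2Danh%C3%A4nger"))

    # Two groups: alternatives at the first and at the second position.
    parsed = 'name:"(industrie-anhang industri) (anhang industrieanhang)"'
    for position, terms in enumerate(position_groups(parsed), start=1):
        print(position, terms)
```

Read this way, the query asks for one of the first group's terms immediately followed by one of the second group's terms, which would be worth comparing against the token positions shown for the indexed value in the analysis tab.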