Hi Chantal, Please see https://issues.apache.org/jira/browse/LUCENE-7148
ahmet

On Saturday, July 16, 2016 3:48 PM, CA <c...@it-agenten.com> wrote:

Hello all,

our index contains product offers from online shops. The fields we are indexing all have rather short values: the name of the product, the brand, the price, the category, and some fields containing identifiers like ASIN, GTIN etc. if available. We do not index the description texts.

The regular user search uses "edismax" and queries the above-mentioned fields, which works fine for short inputs like "iphone 6s".

Now we have to support a different kind of query, which won't be user input but will use complete product names like those we store ourselves, though not necessarily names that are actually part of our data set. This means that the input query can be relatively long. The output of the query is planned to consist of a More Like This list. So, in effect, the query should have at least one hit that is hopefully close enough, and the actual result will be a More Like This list sourced from that one hit.

I have tried to get this to work based on the "edismax" setup for the regular user search, but this does not work well when the input is longer than what we have stored as a similar product. Here is an example:

## Step 1:

Input (not stored in our index):
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)

(a) This input does not produce any results with our current edismax config (details at the end of the e-mail).
(b) When I relax the "mm" parameter to "2<-1 5<-30% 8<10%", I get one hit with the following name:
=> "Braun Series Clean&Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc"

## Step 2:

When I reduce the input manually to the following:
"Braun Series 9 9095CC Men's Electric Shaver"

The shortened input returns a very good hit with the name:
=> "Braun 9095cc Series 9 Electric Shaver"

My question:

Is it possible, and if so, how, to have the query input
"Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger" (input to edismax without quotes)
return (also or only) the hit with the name
=> "Braun 9095cc Series 9 Electric Shaver"
and maybe even give it a high score?

I have tried to use "explainOther" (see the output at the end of this e-mail), but I have a really hard time reading it. In some cases, I'm not even able to tell where one clause ends and the next one starts (is it possible to have it returned in several lines?). Maybe someone can give me a hint on how to use that output, or knows of some documentation on the internet that explains how to make good use of it?

Looking at the input string, I was wondering:

(A) Is relaxing the "mm" parameter really the way to go?

(B) Should I create another name field in schema.xml that basically has a different query chain, discarding the last words of a query input if it is too long? Or maybe it's possible to make tokens in the first part of the input more "important" (though I'm not sure this is generally the case)? Should I remove some of the filters from the query chain (like the ShingleFilter)?

(C) Can I configure something else, or should I not use edismax for this?

Thank you for reading this, any insight is highly appreciated!
Chantal

***

Following are the field configuration for the name field, the configuration of the edismax handler, and the output of "explainOther" for the above example.
SCHEMA.XML - "name" field:

<field name="name" type="name_split" indexed="true" stored="true" required="true" multiValued="false"/>

<fieldType name="name_split" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="255"/>
    <filter class="org.apache.lucene.analysis.icu.ICUFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

SOLRCONFIG.XML - MLT/EDISMAX

<requestHandler name="/mlt" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="fl">id,brand,name,price,score,popularity</str>
    <str name="tie">0.1</str>
    <str name="qf">brand_split^6 name</str>
    <str name="pf">brand_split^10 name^10</str>
    <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
    <int name="qs">10</int>
    <int name="ps">20</int>
    <str name="wt">xml</str>
    <str name="mlt">false</str>
    <str name="mlt.qf">brand_split^6 name price</str>
    <str name="mlt.fl">brand_split name price</str>
    <str name="mlt.interestingTerms">details</str>
  </lst>
</requestHandler>

DEBUG - EXPLAIN OTHER

The "other" document with id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b has the title "Braun 9095cc Series 9 Electric Shaver"

<response>
  <lst name="responseHeader">
    <lst name="params"><!-- shortened for better overview -->
      <str name="defType">edismax</str>
      <str name="qf">brand_split^6 name</str>
      <str name="pf">brand_split^10 name^10</str>
      <str name="mm">2&lt;-1 5&lt;-30% 8&lt;10%</str>
      <str name="qs">10</str>
      <str name="ps">20</str>
      <str name="tie">0.1</str>
      <str name="q"> Braun Series 9 9095CC Men's
Electric Shaver Wet/Dry with Clean and Renew Charger </str>
      <str name="explainOther">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="97.122955">
    <doc>
      <str name="name"> Braun Series Clean&amp;Renew CCR2 Cleansing Dock Cartridges Lemonfresh Formula Cartrige (Compatible with Series 7,5,3) 2 pc </str>
      <str name="id">773d4bdb341c4dc438c481ac80de5abde08d85bf</str>
      <str name="brand">Braun</str>
      <float name="score">97.122955</float>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring"> Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger </str>
    <str name="querystring"> Braun Series 9 9095CC Men's Electric Shaver Wet/Dry with Clean and Renew Charger </str>
    <str name="parsedquery">
(+(DisjunctionMaxQuery((name:braun | (brand_split:braun)^6.0)~0.1)
DisjunctionMaxQuery((name:series | (brand_split:series)^6.0)~0.1)
DisjunctionMaxQuery((name:"(9095cc 9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1)
DisjunctionMaxQuery((Synonym(name:men name:men's) | (Synonym(brand_split:men brand_split:men's))^6.0)~0.1)
DisjunctionMaxQuery((name:electric | (brand_split:electric)^6.0)~0.1)
DisjunctionMaxQuery((name:shaver | (brand_split:shaver)^6.0)~0.1)
DisjunctionMaxQuery((name:"(wet/dry wet wetdry) dry"~10 | (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1)
DisjunctionMaxQuery((name:with | (brand_split:with)^6.0)~0.1)
+DisjunctionMaxQuery((name:clean | (brand_split:clean)^6.0)~0.1)
+DisjunctionMaxQuery((name:renew | (brand_split:renew)^6.0)~0.1)
DisjunctionMaxQuery((name:charger | (brand_split:charger)^6.0)~0.1))
DisjunctionMaxQuery(((brand_split:"(braun braun series braunseries) series (series series 9 series9) ?
(9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0 | (name:"(braun braun series braunseries) series (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0)~0.1))/no_coord </str> <str name="parsedquery_toString"> +((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc 9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) | (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1 (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 | (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1 +(name:clean | (brand_split:clean)^6.0)~0.1 +(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger | (brand_split:charger)^6.0)~0.1) ((brand_split:"(braun braun series braunseries) series (series series 9 series9) ? 
(9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0 |
(name:"(braun braun series braunseries) series (series series 9 series9) ? (9 9095cc 99095 99095cc) 9095 cc (9095cc 9095) (cc 9095cc men's 9095 9095ccmen) (cc ccmen) men (men's men men's electric menelectric) electric (electric electric shaver electricshaver) shaver (shaver shaver wet/dry shaverwetdry) wet dry (wet/dry wet wetdry) (dry wet/dry with wet wetdrywith) dry with (with with clean withclean) clean (clean clean and cleanand) and (and and renew andrenew) renew (renew renew charger renewcharger) charger charger"~20)^10.0)~0.1
    </str>
    <lst name="explain">
      <str name="773d4bdb341c4dc438c481ac80de5abde08d85bf">
97.122955 = sum of:
  97.122955 = sum of:
    61.102264 = max plus 0.1 times others of:
      6.80276 = weight(name:braun in 477314) [], result of:
        6.80276 = score(doc=477314,freq=1.0 = termFreq=1.0 ), product of:
          8.171213 = idf(docFreq=324, docCount=1147961)
          0.8325276 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            27.458092 = avgFieldLength
            40.96 = fieldLength
      60.42199 = weight(brand_split:braun in 477314) [], result of:
        60.42199 = score(doc=477314,freq=1.0 = termFreq=1.0 ), product of:
          6.0 = boost
          8.11682 = idf(docFreq=305, docCount=1023531)
          1.2406745 = tfNorm, computed from:
            1.0 = termFreq=1.0
            1.2 = parameter k1
            0.75 = parameter b
            1.9018271 = avgFieldLength
            1.0 = fieldLength
    8.663414 = max plus 0.1 times others of:
      8.663414 = weight(name:series in 477314) [], result of:
        8.663414 = score(doc=477314,freq=4.0 = termFreq=4.0 ), product of:
          5.5549765 = idf(docFreq=4440, docCount=1147961)
          1.5595771 = tfNorm, computed from:
            4.0 = termFreq=4.0
            1.2 = parameter k1
            0.75 = parameter b
            27.458092 = avgFieldLength
            40.96 = fieldLength
    4.0527744 = max plus 0.1 times others of:
      4.0527744 = weight(name:with in 477314) [], result of:
        4.0527744 = score(doc=477314,freq=2.0 = termFreq=2.0 ), product of:
          3.355103 = idf(docFreq=40070, docCount=1147961)
          1.2079433 = tfNorm, computed from:
            2.0 = termFreq=2.0
            1.2 = parameter k1
            0.75 = parameter b
            27.458092 = avgFieldLength
            40.96 = fieldLength
    8.542337 = max plus 0.1 times others of:
      8.542337 = weight(name:clean in 477314) [], result of:
        8.542337 = score(doc=477314,freq=3.0 = termFreq=3.0 ), product of:
          6.008829 = idf(docFreq=2820, docCount=1147961)
          1.421631 = tfNorm, computed from:
            3.0 = termFreq=3.0
            1.2 = parameter k1
            0.75 = parameter b
            27.458092 = avgFieldLength
            40.96 = fieldLength
    14.762168 = max plus 0.1 times others of:
      14.762168 = weight(name:renew in 477314) [], result of:
        14.762168 = score(doc=477314,freq=3.0 = termFreq=3.0 ), product of:
          10.383966 = idf(docFreq=35, docCount=1147961)
          1.421631 = tfNorm, computed from:
            3.0 = termFreq=3.0
            1.2 = parameter k1
            0.75 = parameter b
            27.458092 = avgFieldLength
            40.96 = fieldLength
      </str>
    </lst>
    <str name="otherQuery">id:2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b</str>
    <lst name="explainOther">
      <str name="2d617cee76f5ed8598cf7db1b44a40de6f3c8c9b">
0.0 = Failure to meet condition(s) of required/prohibited clause(s)
  0.0 = no match on required clause ((name:braun | (brand_split:braun)^6.0)~0.1 (name:series | (brand_split:series)^6.0)~0.1 (name:"(9095cc 9095) cc"~10 | (brand_split:"(9095cc 9095) cc"~10)^6.0)~0.1 (Synonym(name:men name:men's) | (Synonym(brand_split:men brand_split:men's))^6.0)~0.1 (name:electric | (brand_split:electric)^6.0)~0.1 (name:shaver | (brand_split:shaver)^6.0)~0.1 (name:"(wet/dry wet wetdry) dry"~10 | (brand_split:"(wet/dry wet wetdry) dry"~10)^6.0)~0.1 (name:with | (brand_split:with)^6.0)~0.1 +(name:clean | (brand_split:clean)^6.0)~0.1
+(name:renew | (brand_split:renew)^6.0)~0.1 (name:charger | (brand_split:charger)^6.0)~0.1)
    0.0 = Failure to meet condition(s) of required/prohibited clause(s)
      61.40732 = max plus 0.1 times others of:
        9.853278 = weight(name:braun in 113560) [], result of:
          9.853278 = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of:
            8.171213 = idf(docFreq=324, docCount=1147961)
            1.2058525 = tfNorm, computed from:
              1.0 = termFreq=1.0
              1.2 = parameter k1
              0.75 = parameter b
              27.458092 = avgFieldLength
              16.0 = fieldLength
        60.42199 = weight(brand_split:braun in 113560) [], result of:
          60.42199 = score(doc=113560,freq=1.0 = termFreq=1.0 ), product of:
            6.0 = boost
            8.11682 = idf(docFreq=305, docCount=1023531)
            1.2406745 = tfNorm, computed from:
              1.0 = termFreq=1.0
              1.2 = parameter k1
              0.75 = parameter b
              1.9018271 = avgFieldLength
              1.0 = fieldLength
      8.6537285 = max plus 0.1 times others of:
        8.6537285 = weight(name:series in 113560) [], result of:
          8.6537285 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
            5.5549765 = idf(docFreq=4440, docCount=1147961)
            1.5578334 = tfNorm, computed from:
              2.0 = termFreq=2.0
              1.2 = parameter k1
              0.75 = parameter b
              27.458092 = avgFieldLength
              16.0 = fieldLength
      52.67099 = max plus 0.1 times others of:
        52.67099 = weight(name:"(9095cc 9095) cc"~10 in 113560) [], result of:
          52.67099 = score(doc=113560,freq=3.0 = phraseFreq=3.0 ), product of:
            30.520727 = idf(), sum of:
              13.037208 = idf(docFreq=2, docCount=1147961)
              10.796498 = idf(docFreq=23, docCount=1147961)
              6.687021 = idf(docFreq=1431, docCount=1147961)
            1.725745 = tfNorm, computed from:
              3.0 = phraseFreq=3.0
              1.2 = parameter k1
              0.75 = parameter b
              27.458092 = avgFieldLength
              16.0 = fieldLength
      8.592838 = max plus 0.1 times others of:
        8.592838 = weight(name:electric in 113560) [], result of:
          8.592838 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
            5.51589 = idf(docFreq=4617, docCount=1147961)
            1.5578334 = tfNorm, computed from:
              2.0 = termFreq=2.0
              1.2 = parameter k1
              0.75 = parameter b
              27.458092 = avgFieldLength
              16.0 = fieldLength
      13.669254 = max plus 0.1 times others of:
        13.669254 = weight(name:shaver in 113560) [], result of:
          13.669254 = score(doc=113560,freq=2.0 = termFreq=2.0 ), product of:
            8.7745285 = idf(docFreq=177, docCount=1147961)
            1.5578334 = tfNorm, computed from:
              2.0 = termFreq=2.0
              1.2 = parameter k1
              0.75 = parameter b
              27.458092 = avgFieldLength
              16.0 = fieldLength
      0.0 = no match on required clause ((name:clean | (brand_split:clean)^6.0)~0.1)
        0.0 = No matching clause
      0.0 = no match on required clause ((name:renew | (brand_split:renew)^6.0)~0.1)
        0.0 = No matching clause
      </str>
    </lst>
  </lst>
</response>
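
As a reading aid for the explain output above: each leaf score appears to be a BM25 product of boost, idf and tfNorm, and each "max plus 0.1 times others" line combines the per-field scores using the tie value of 0.1. The following is only a rough Python sketch, assuming the standard BM25 formulas; the function names are made up for illustration and are not part of Solr or Lucene. It plugs in the numbers reported for the term "braun" in doc 477314. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """idf as printed in the explain output: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    """tfNorm: term frequency saturated by k1 and normalised by field length via b."""
    length_norm = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_norm)


def bm25_score(freq, doc_freq, doc_count, field_length, avg_field_length, boost=1.0):
    """Leaf score: boost * idf * tfNorm."""
    return (boost
            * bm25_idf(doc_freq, doc_count)
            * bm25_tf_norm(freq, field_length, avg_field_length))


def dismax(scores, tie):
    """'max plus <tie> times others': best field score plus tie times the rest."""
    best = max(scores)
    return best + tie * (sum(scores) - best)


# Values copied from the explain output for doc 477314, term "braun".
name_braun = bm25_score(freq=1.0, doc_freq=324, doc_count=1147961,
                        field_length=40.96, avg_field_length=27.458092)
brand_braun = bm25_score(freq=1.0, doc_freq=305, doc_count=1023531,
                         field_length=1.0, avg_field_length=1.9018271,
                         boost=6.0)  # boost from qf=brand_split^6
braun_clause = dismax([name_braun, brand_braun], tie=0.1)

# The explain output reports 6.80276, 60.42199 and 61.102264 for these three.
print(round(name_braun, 3), round(brand_braun, 3), round(braun_clause, 3))
```

Read this way, the explainOther section says that the wanted document scores well on braun, series, 9095cc, electric and shaver, but ends at 0.0 because the clauses for "clean" and "renew" are marked as required (+) in the parsed query and the document contains neither term.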