Maybe ReRankQParserPlugin?

On Aug 12, 2016 11:54, "John Bickerstaff" <j...@johnbickerstaff.com> wrote:
> @Hossman -- thanks again.
>
> I've made the following change and so far things look good. I couldn't see
> debug or find results for what I put in for $func, so I just removed it,
> but making modifications as you suggested appears to be working.
>
> Including the actual line from my endpoint XML in case this thread helps
> someone else...
>
> <str name="q">{!boost defType=synonym_edismax qf='title' synonyms='true'
> synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> v=$q}</str>
>
> On Fri, Aug 12, 2016 at 12:09 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
>
> > Thanks! I'll check it out.
> >
> > On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> >
> >> Not exactly sure what you are looking for from chaining the results, but
> >> similar functionality is available in Streaming Expressions, where the
> >> results of inner expressions are passed to outer expressions and so on:
> >> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
> >>
> >> HTH
> >> Susheel
> >>
> >> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >>
> >> > Hossman - many thanks again for your comprehensive and very helpful
> >> > answer!
> >> >
> >> > All,
> >> >
> >> > I am (possibly mis-remembering) reading something about being able to
> >> > pass the results of one query to another query... Essentially "chaining"
> >> > result sets.
> >> >
> >> > I have looked in docs and can't find anything on a quick search -- I may
> >> > have been reading about the Re-Ranking feature, which doesn't help me (I
> >> > know because I just tried and it seems to return all results anyway,
> >> > just re-ranking the number specified in the reRankDocs flag...)
> >> >
> >> > Is there a way to (cleanly) send the results of one query to another
> >> > query for further processing? Essentially, pass ONLY the results
> >> > (including an empty set of results) to another query for processing?
> >> >
> >> > thanks...
> >> >
> >> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >> >
> >> > > Thanks!
> >> > >
> >> > > To answer your questions, while I digest the rest of that
> >> > > information...
> >> > >
> >> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> >> > > https://github.com/healthonnet/hon-lucene-synonyms
> >> > >
> >> > > The config looks like this - and IIRC, is simply a copy from the
> >> > > recommended config on the site mentioned above.
> >> > >
> >> > > <queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> >> > >   <!-- You can define more than one synonym analyzer in the following list.
> >> > >        For example, you might have one set of synonyms for English, one for French,
> >> > >        one for Spanish, etc.
> >> > >     -->
> >> > >   <lst name="synonymAnalyzers">
> >> > >     <!-- Name your analyzer something useful, e.g. "analyzer_en", "analyzer_fr", "analyzer_es", etc.
> >> > >          If you only have one, the name doesn't matter (hence "myCoolAnalyzer").
> >> > >       -->
> >> > >     <lst name="myCoolAnalyzer">
> >> > >       <!-- We recommend a PatternTokenizerFactory that tokenizes based on whitespace and quotes.
> >> > >            This seems to work best with most people's synonym files.
> >> > >            For details, read the discussion here:
> >> > >            http://github.com/healthonnet/hon-lucene-synonyms/issues/26
> >> > >         -->
> >> > >       <lst name="tokenizer">
> >> > >         <str name="class">solr.PatternTokenizerFactory</str>
> >> > >         <str name="pattern"><![CDATA[(?:\s|\")+]]></str>
> >> > >       </lst>
> >> > >       <!-- The ShingleFilterFactory outputs synonyms of multiple token lengths
> >> > >            (e.g. unigrams, bigrams, trigrams, etc.).
> >> > >            The default here is to assume you don't have any synonyms longer than 4 tokens.
> >> > >            You can tweak this depending on what your synonyms look like.
> >> > >            E.g. if you only have unigrams, you can remove it entirely, and if your
> >> > >            synonyms are up to 7 tokens in length, you should set the maxShingleSize to 7.
> >> > >         -->
> >> > >       <lst name="filter">
> >> > >         <str name="class">solr.ShingleFilterFactory</str>
> >> > >         <str name="outputUnigramsIfNoShingles">true</str>
> >> > >         <str name="outputUnigrams">true</str>
> >> > >         <str name="minShingleSize">2</str>
> >> > >         <str name="maxShingleSize">4</str>
> >> > >       </lst>
> >> > >       <!-- This is where you set your synonym file. For the unit tests and
> >> > >            "Getting Started" examples, we use example_synonym_file.txt.
> >> > >            This plugin will work best if you keep expand set to true and have all
> >> > >            your synonyms comma-separated (rather than =>-separated).
> >> > >         -->
> >> > >       <lst name="filter">
> >> > >         <str name="class">solr.SynonymFilterFactory</str>
> >> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> >> > >         <str name="synonyms">example_synonym_file.txt</str>
> >> > >         <str name="expand">true</str>
> >> > >         <str name="ignoreCase">true</str>
> >> > >       </lst>
> >> > >     </lst>
> >> > >   </lst>
> >> > > </queryParser>
> >> > >
> >> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> >> > >
> >> > >> : First let me say that this is very possibly the "x - y problem" so let me
> >> > >> : state up front what my ultimate need is -- then I'll ask about the thing I
> >> > >> : imagine might help... which, of course, is heavily biased in the direction
> >> > >> : of my experience coding Java and writing SQL...
> >> > >>
> >> > >> Thank you so much for asking your question this way!
> >> > >>
> >> > >> Right off the bat, the background you've provided seems suspicious...
> >> > >>
> >> > >> : I have a piece of a query that calculates a score based on a "weighting"
> >> > >>       ...
> >> > >> : The specific line is this:
> >> > >> : <str name="bf">product(field(category_weight),20)</str>
> >> > >> :
> >> > >> : What I just realized is that when I query Solr for a string that has NO
> >> > >> : matches in the entire corpus, I still get a slew of results because EVERY
> >> > >> : doc has the weighting value in the category_weight field - and therefore
> >> > >> : every doc gets some score.
> >> > >>
> >> > >> ...that is *NOT* how dismax and edismax normally work.
> >> > >>
> >> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
> >> > >> implementation of that "additive boost" comes from adding new optional
> >> > >> clauses to the top level BooleanQuery that is executed, that only happens
> >> > >> after the "main" query (from your "q" param) is added to that top level
> >> > >> BooleanQuery as a "mandatory" clause.
> >> > >>
> >> > >> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
> >> > >> but with the techproducts configs/data these requests still don't match
> >> > >> anything...
> >> > >>
> >> > >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> >> > >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
> >> > >>
> >> > >> ...and if you look at the debug output, the parsed queries show that the
> >> > >> "bogus" part of the query is mandatory...
> >> > >>
> >> > >> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) FunctionQuery(const(true))
> >> > >>
> >> > >> (i didn't use "pf" in that example, but the effect is the same, the "pf"
> >> > >> based clauses are optional, while the "qf" based clauses are mandatory)
> >> > >>
> >> > >> If you compare that example to your debug output, you'll notice a
> >> > >> difference in structure -- it's a bit hard to see in your example, but if
> >> > >> you simplify your qf, pf, and q fields it should be more obvious, but
> >> > >> AFAICT the "main" parts of your query are getting wrapped in an extra
> >> > >> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
> >> > >> the top level query ... i don't see *any* mandatory clauses in your top
> >> > >> level BooleanQuery, which is why any match on a bf or bq function is
> >> > >> enough to cause a document to match.
> >> > >>
> >> > >> I suspect the reason your parsed query structure is so diff has to do with
> >> > >> this...
> >> > >>
> >> > >> : <str name="defType">synonym_edismax</str>>
> >> > >>
> >> > >> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
> >> > >> 2) what QParserPlugin are you using to implement that?
> >> > >>
> >> > >> I suspect whatever QParserPlugin you are using has a bug in it :)
> >> > >>
> >> > >> If you can't fix the bug, one possible workaround would be to abandon the
> >> > >> bf and bq params completely, and instead wrap the query it produces in a
> >> > >> {!boost} parser with whatever function you want (using functions like
> >> > >> sum() or product() to combine multiple functions, and query() to
> >> > >> incorporate your current bq param). Doing this will require changing how
> >> > >> you specify your input (example below) and it will result in
> >> > >> *multiplicative* boosts -- so your scores will be much diff, and you will
> >> > >> likely have to adjust your constants, but: 1) multiplicative boosts are
> >> > >> almost always what people *really* want anyway; 2) it will ensure the
> >> > >> boosts are only applied for things matching your main query, no matter
> >> > >> how that query parser works or what bugs it has.
> >> > >>
> >> > >> Example of using {!boost} to wrap an arbitrary other parser...
> >> > >>
> >> > >> instead of...
> >> > >>    defType=foofoo
> >> > >>    q=barbarbar
> >> > >>
> >> > >> use...
> >> > >>    q={!boost b=$func defType=foofoo v=$qq}
> >> > >>    qq=barbarbar
> >> > >>    func=sum(something,somethingelse)
> >> > >>
> >> > >> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
> >> > >> https://cwiki.apache.org/confluence/display/solr/Function+Queries
> >> > >>
> >> > >> :
> >> > >> : What I would like is to return zero results if there is no match for the
> >> > >> : querystring. My collection is small enough that I don't care if the actual
> >> > >> : calculation runs on each doc (although that's wasteful) -- I just don't
> >> > >> : want to see results come back for zero matches to the querystring
> >> > >> :
> >> > >> : (The /select endpoint does this of course, but my custom endpoint includes
> >> > >> : this "weighting" piece and therefore returns every doc in the corpus
> >> > >> : because they all have the weighting.)
> >> > >> :
> >> > >> : ====================
> >> > >> : Enter my imagined solution... The potential X-Y problem...
> >> > >> : ====================
> >> > >> :
> >> > >> : So - given that I come from a programming background, I immediately start
> >> > >> : thinking of an if statement ...
> >> > >> :
> >> > >> : if(some_score_for_the_primary_search_string) {
> >> > >> :     run_the_category_weight_calculation;
> >> > >> : } else {
> >> > >> :     do_NOT_run_category_weight_calc;
> >> > >> : }
> >> > >> :
> >> > >> : Another way of thinking of it would be something like the "WHERE" clause in
> >> > >> : SQL...
> >> > >> :
> >> > >> :     run_category_weight_calculation WHERE "searchstring" is found in the
> >> > >> :     document, not otherwise.
> >> > >> :
> >> > >> : I'm aware that things could be handled in the client-side of my web app,
> >> > >> : but if possible, I'd like the interface to SOLR to be as clean as possible,
> >> > >> : and massage incoming SOLR data as little as possible.
> >> > >> :
> >> > >> : In other words, do NOT return any docs if the querystring (and any
> >> > >> : synonyms) match zero docs.
> >> > >> :
> >> > >> : Here is the endpoint XML for the query. I've highlighted the specific line
> >> > >> : that is causing the unintended results...
> >> > >> :
> >> > >> : <requestHandler name="/foo" class="solr.SearchHandler">
> >> > >> :   <!-- default values for query parameters can be specified, these
> >> > >> :        will be overridden by parameters in the request
> >> > >> :     -->
> >> > >> :   <lst name="defaults">
> >> > >> :     <str name="echoParams">all</str>
> >> > >> :     <int name="rows">20</int>
> >> > >> :     <!-- Query settings -->
> >> > >> :     <str name="df">text</str>
> >> > >> :     <!-- <str name="df">title</str> -->
> >> > >> :     <str name="defType">synonym_edismax</str>>
> >> > >> :     <str name="synonyms">true</str>
> >> > >> :     <!-- The line below balances out the weighting of exact matches to the
> >> > >> :          synonym phrase entered by the user
> >> > >> :          with the category_weight calculation and the titleQuery calc.
> >> > >> :          These numbers exist in a balance and
> >> > >> :          if one is raised or lowered, the others (probably) need to change
> >> > >> :          as well. It may be better to go with decimals
> >> > >> :          for all of them... .4 instead of 4 and 2 instead of 20 and 2.5
> >> > >> :          instead of 25.
> >> > >> :          In the end, I'm not sure it really matters, but don't change one
> >> > >> :          without changing the others
> >> > >> :          unless you've tested and are sure you want the results -->
> >> > >> :     <float name="synonyms.originalBoost">1.5</float>
> >> > >> :     <float name="synonyms.synonymBoost">1.1</float>
> >> > >> :     <str name="mm">75%</str>
> >> > >> :     <str name="q.alt">*:*</str>
> >> > >> :     <str name="rows">20</str>
> >> > >> :     <str name="fq">meta_doc_type:chapterDoc</str>
> >> > >> :     <str name="bq">{!synonym_edismax qf='title' synonyms='true'
> >> > >> : synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq=''
> >> > >> : v=$q}</str>
> >> > >> :     <str name="fl">id category_weight title category_ss score
> >> > >> : contentType</str>
> >> > >> :     <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
> >> > >> : =====================================================
> >> > >> :     *<str name="bf">product(field(category_weight),20)</str>*
> >> > >> : =====================================================
> >> > >> :     <str name="bf">product(query($titleQuery),4)</str>
> >> > >> :     <str name="qf">text contentType^1000</str>
> >> > >> :     <str name="wt">python</str>
> >> > >> :     <str name="debug">true</str>
> >> > >> :     <str name="debug.explain.structured">true</str>
> >> > >> :     <str name="indent">true</str>
> >> > >> :     <str name="echoParams">all</str>
> >> > >> :   </lst>
> >> > >> : </requestHandler>
> >> > >> :
> >> > >> : And here is the debug output for a query. (This was a test for synonyms,
> >> > >> : which you'll see in the output.) The original query string was, of
> >> > >> : course, "μ-heavy chain disease"
> >> > >> :
> >> > >> : You'll note that although there is no score in the first doc explain for
> >> > >> : the actual querystring, the highlighted section does get a score for
> >> > >> : product(double(category_weight)=1.5,const(20))
> >> > >> :
> >> > >> : ... which is the thing that is currently causing all the docs in the
> >> > >> : collection to "match" even though the querystring is not in any of them.
> >> > >> :
> >> > >> : "debug":{
> >> > >> : "rawquerystring":"\"μ-heavy chain disease\"",
> >> > >> : "querystring":"\"μ-heavy chain disease\"",
> >> > >> : "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain disease\" |
> >> > >> : (contentType:\"μ heavy chain disease\")^1000.0))^1.5
> >> > >> : ((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" |
> >> > >> : (contentType:\"mu heavy chain disease\")^1000.0)))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((text:\"μ hcd\" |
> >> > >> : (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((text:\"μ heavy chain disease\" |
> >> > >> : (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((text:\"μ hcd\" |
> >> > >> : (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1))
> >> > >> : ((DisjunctionMaxQuery((title:\"μ heavy chain disease\"))^2.5
> >> > >> : ((+DisjunctionMaxQuery((title:\"mu heavy chain disease\")))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((title:\"μ heavy chain disease\")))/no_coord^1.1)
> >> > >> : ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)))
> >> > >> : FunctionQuery(product(double(category_weight),const(20)))
> >> > >> : FunctionQuery(product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4)))",
> >> > >> : "parsedquery_toString":"(((text:\"μ heavy chain disease\" |
> >> > >> : (contentType:\"μ heavy chain disease\")^1000.0))^1.5
> >> > >> : ((+(text:\"mu heavy chain disease\" |
> >> > >> : (contentType:\"mu heavy chain disease\")^1000.0))^1.1)
> >> > >> : ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1)
> >> > >> : ((+(text:\"μ heavy chain disease\" |
> >> > >> : (contentType:\"μ heavy chain disease\")^1000.0))^1.1)
> >> > >> : ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1))
> >> > >> : ((((title:\"μ heavy chain disease\"))^2.5
> >> > >> : ((+(title:\"mu heavy chain disease\"))^1.1)
> >> > >> : ((+(title:\"μ hcd\"))^1.1)
> >> > >> : ((+(title:\"μ heavy chain disease\"))^1.1)
> >> > >> : ((+(title:\"μ hcd\"))^1.1)))
> >> > >> : product(double(category_weight),const(20))
> >> > >> : product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4))",
> >> > >> : "explain":{
> >> > >> : "33d808fe-6ccf-4305-a643-48e94de34d18":{
> >> > >> : "match":true, "value":30.0, "description":"sum of:",
> >> > >> : "details":[{ "match":true, "value":30.0,
> >> > >> : "description":"FunctionQuery(product(double(category_weight),const(20))), product of:",
> >> > >> : =====================================================
> >> > >> : *"details":**[{ "match":true, "value":30.0,
> >> > >> : "description":"product(double(category_weight)=1.5,const(20))"}, {*
> >> > >> : =====================================================
> >> > >> :
> >> > >> : "match":true, "value":1.0, "description":"boost"}, { "match":true, "value":1.0,
> >> > >> : "description":"queryNorm"}]}, {
> >> > >> :
> >> > >>
> >> > >> -Hoss
> >> > >> http://www.lucidworks.com/
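
For anyone finding this thread later: the {!boost} wrapping described above can be sketched as a small client-side helper. This is only an illustration, not the configuration anyone in the thread actually ran. The helper name boost_wrapped_params is hypothetical, the /select path and the "bogus" query are borrowed from the examples above, and the default boost function simply reuses the category_weight expression from the original handler (its constant would probably need re-tuning, since the boost becomes multiplicative).

```python
from urllib.parse import urlencode, parse_qs


def boost_wrapped_params(user_query,
                         inner_parser="synonym_edismax",
                         boost_func="product(field(category_weight),20)"):
    """Build request params that wrap an inner query parser in {!boost}.

    Hypothetical helper: the parameter layout follows the
    q / qq / func pattern shown earlier in this thread.
    """
    return {
        # The main query delegates to the inner parser via v=$qq and
        # multiplies its score by the function referenced as $func.
        "q": "{!boost b=$func defType=" + inner_parser + " v=$qq}",
        # The raw user input, dereferenced by v=$qq above.
        "qq": user_query,
        # The boost function, dereferenced by b=$func above.
        "func": boost_func,
        # Ask Solr to return the parsed query so the structure can be checked.
        "debug": "query",
    }


params = boost_wrapped_params("bogus")
query_string = urlencode(params)  # percent-encodes {, !, $, spaces, etc.
print("/select?" + query_string)
```

Because the boost is applied to whatever the inner parser matches, a query string with no matches should return no documents, which is the behaviour asked for at the start of the thread.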