Could you please provide the original request (the HTTP request)? I am a little confused about what "query_score" refers to. As far as I can see, it isn't a magic value.

Kind regards,
Em

On 20.02.2012 14:05, Carlos Gonzalez-Cadenas wrote:
> Yeah Em, it helped a lot :)
>
> Here it is (for the user query "hoteles"):
>
> +(stopword_shortened_phrase:hoteles | stopword_phrase:hoteles |
> wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles)
>
> product(pow(query((stopword_shortened_phrase:hoteles |
> stopword_phrase:hoteles | wildcard_stopword_shortened_phrase:hoteles |
> wildcard_stopword_phrase:hoteles),def=0.0),const(0.5)),float(query_score))
>
> Thanks a lot for your help.
>
> Carlos
>
> Carlos Gonzalez-Cadenas
> CEO, ExperienceOn - New generation search
> http://www.experienceon.com
>
> Mobile: +34 652 911 201
> Skype: carlosgonzalezcadenas
> LinkedIn: http://www.linkedin.com/in/carlosgonzalezcadenas
>
> On Mon, Feb 20, 2012 at 1:50 PM, Em <mailformailingli...@yahoo.de> wrote:
>
>> Carlos,
>>
>> nice to hear that the approach helped you!
>>
>> Could you show us what your query request looks like after reworking?
>>
>> Regards,
>> Em
>>
>> On 20.02.2012 13:30, Carlos Gonzalez-Cadenas wrote:
>>> Hello all:
>>>
>>> We've done some tests with Em's approach of putting a BooleanQuery in front
>>> of our user query, that means:
>>>
>>> BooleanQuery
>>>   must (DismaxQuery)
>>>   should (FunctionQuery)
>>>
>>> The FunctionQuery obtains the SOLR IR score by means of a QueryValueSource,
>>> then takes the square root of this value, and then multiplies it by our custom
>>> "query_score" float, pulling it by means of a FieldCacheSource.
>>>
>>> In particular, we've proceeded in the following way:
>>>
>>> - we've loaded the whole index in the page cache of the OS to make sure
>>>   we don't have disk IO problems that might affect the benchmarks (our
>>>   machine has enough memory to load all the index in RAM)
>>> - we've executed an out-of-benchmark query 10-20 times to make sure that
>>>   everything is jitted and that Lucene's FieldCache is properly populated.
>>> - we've disabled all the caches (filter query cache, document cache,
>>>   query cache)
>>> - we've executed 8 different user queries with and without
>>>   FunctionQueries, with early termination in both cases (our collector stops
>>>   after collecting 50 documents per shard)
>>>
>>> Em was correct, the query is much faster with the BooleanQuery in front,
>>> but it's still 30-40% slower than the query without FunctionQueries.
>>>
>>> Although one may think that it's reasonable that the query response time
>>> increases because of the extra computations, we believe that the increase
>>> is too big, given that we're collecting just 500-600 documents due to the
>>> early query termination techniques we currently use.
>>>
>>> Any ideas on how to make it faster?
>>>
>>> Thanks a lot,
>>> Carlos
>>>
>>> On Fri, Feb 17, 2012 at 11:07 AM, Carlos Gonzalez-Cadenas <c...@experienceon.com> wrote:
>>>
>>>> Thanks Em, Robert, Chris for your time and valuable advice. We'll make
>>>> some tests and will let you know soon.
>>>>
>>>> On Thu, Feb 16, 2012 at 11:43 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>>
>>>>> Hello Carlos,
>>>>>
>>>>> I think we misunderstood each other.
>>>>>
>>>>> As an example:
>>>>> BooleanQuery (
>>>>>   clauses: (
>>>>>     MustMatch(
>>>>>       DisjunctionMaxQuery(
>>>>>         TermQuery("stopword_field", "barcelona"),
>>>>>         TermQuery("stopword_field", "hoteles")
>>>>>       )
>>>>>     ),
>>>>>     ShouldMatch(
>>>>>       FunctionQuery(
>>>>>         *please insert your function here*
>>>>>       )
>>>>>     )
>>>>>   )
>>>>> )
>>>>>
>>>>> Explanation:
>>>>> You construct an artificial BooleanQuery which wraps your user's query
>>>>> as well as your function query.
>>>>> Your user's query - in that case - is just a DisjunctionMaxQuery
>>>>> consisting of two TermQueries.
>>>>> In the real world you might construct another BooleanQuery around your
>>>>> DisjunctionMaxQuery in order to have more flexibility.
>>>>> However, the interesting part of the given example is that we specify
>>>>> the user's query as a MustMatch condition of the BooleanQuery and the
>>>>> FunctionQuery just as a ShouldMatch.
>>>>> Constructed that way, I expect the FunctionQuery to score only those
>>>>> documents which fit the MustMatch condition.
>>>>>
>>>>> I conclude that from the fact that the FunctionQuery class also has a
>>>>> skipTo method, and I would expect that the scorer will use it to score
>>>>> only matching documents (however, I did not search where and how it might
>>>>> get called).
>>>>>
>>>>> If my conclusion is wrong, then hopefully Robert Muir (as far as I can
>>>>> see the author of that class) can tell us what the intention was in
>>>>> constructing an every-time-match-all function query.
>>>>>
>>>>> Can you validate whether your QueryParser constructs a query in the form
>>>>> I drew above?
>>>>>
>>>>> Regards,
>>>>> Em
>>>>>
>>>>> On 16.02.2012 20:29, Carlos Gonzalez-Cadenas wrote:
>>>>>> Hello Em:
>>>>>>
>>>>>> 1) Here's a printout of an example DisMax query (as you can see, mostly MUST
>>>>>> terms except for some SHOULD terms used for boosting scores for stopwords):
>>>>>>
>>>>>> ((+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
>>>>>> | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en)
>>>>>> | (+stopword_shortened_phrase:hoteles +stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
>>>>>> | (+stopword_phrase:hoteles +stopword_phrase:barcelona stopword_phrase:en)
>>>>>> | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
>>>>>> | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en)
>>>>>> | (+stopword_shortened_phrase:hoteles +wildcard_stopword_shortened_phrase:barcelona stopword_shortened_phrase:en)
>>>>>> | (+stopword_phrase:hoteles +wildcard_stopword_phrase:barcelona stopword_phrase:en))
>>>>>>
>>>>>> 2) The collector is inserted in the SolrIndexSearcher (replacing the
>>>>>> TimeLimitingCollector). We trigger it through the SOLR interface by passing
>>>>>> the timeAllowed parameter. We know this is a hack but AFAIK there's no
>>>>>> out-of-the-box way to specify custom collectors by now
>>>>>> (https://issues.apache.org/jira/browse/SOLR-1680). In any case the collector
>>>>>> part works perfectly as of now, so clearly this is not the problem.
>>>>>>
>>>>>> 3) Re: your sentence:
>>>>>>
>>>>>> *I* would expect that with a shrinking set of matching documents to
>>>>>> the overall-query, the function query only checks those documents that are
>>>>>> guaranteed to be within the result set.
>>>>>>
>>>>>> Yes, I agree with this, but this snippet of code in FunctionQuery.java
>>>>>> seems to say otherwise:
>>>>>>
>>>>>> // instead of matching all docs, we could also embed a query.
>>>>>> // the score could either ignore the subscore, or boost it.
>>>>>> // Containment: floatline(foo:myTerm, "myFloatField", 1.0, 0.0f)
>>>>>> // Boost: foo:myTerm^floatline("myFloatField",1.0,0.0f)
>>>>>> @Override
>>>>>> public int nextDoc() throws IOException {
>>>>>>   for(;;) {
>>>>>>     ++doc;
>>>>>>     if (doc>=maxDoc) {
>>>>>>       return doc=NO_MORE_DOCS;
>>>>>>     }
>>>>>>     if (acceptDocs != null && !acceptDocs.get(doc)) continue;
>>>>>>     return doc;
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> It seems that the author also thought of maybe embedding a query in order
>>>>>> to restrict matches, but this doesn't seem to be in place as of now (or
>>>>>> maybe I'm not understanding how the whole thing works :) ).
>>>>>>
>>>>>> Thanks
>>>>>> Carlos
>>>>>>
>>>>>> On Thu, Feb 16, 2012 at 8:09 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>>>>
>>>>>>> Hello Carlos,
>>>>>>>
>>>>>>>> We have some more tests on that matter: now we're moving from issuing this
>>>>>>>> large query through the SOLR interface to creating our own QueryParser.
>>>>>>>> The initial tests we've done in our QParser (that internally creates multiple
>>>>>>>> queries and inserts them inside a DisjunctionMaxQuery) are very good;
>>>>>>>> we're getting very good response times and high quality answers. But when we've
>>>>>>>> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
>>>>>>>> QueryValueSource that wraps the DisMaxQuery), then the times move from
>>>>>>>> 10-20 msec to 200-300 msec.
>>>>>>>
>>>>>>> I reviewed the source code and yes, the FunctionQuery iterates over the
>>>>>>> whole index, however... let's see!
>>>>>>>
>>>>>>> In relation to the DisMaxQuery you create within your parser: What kind
>>>>>>> of clause is the FunctionQuery and what kind of clause are your other
>>>>>>> queries (MUST, SHOULD, MUST_NOT...)?
>>>>>>>
>>>>>>> *I* would expect that with a shrinking set of matching documents to the
>>>>>>> overall-query, the function query only checks those documents that are
>>>>>>> guaranteed to be within the result set.
>>>>>>>
>>>>>>>> Note that we're using early termination of queries (via a custom
>>>>>>>> collector), and therefore (as shown by the numbers I included above) even
>>>>>>>> if the query is very complex, we're getting very fast answers. The only
>>>>>>>> situation where the response time explodes is when we include a
>>>>>>>> FunctionQuery.
>>>>>>>
>>>>>>> Could you give us some details about how and where you plugged in the
>>>>>>> Collector, please?
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Em
>>>>>>>
>>>>>>> On 16.02.2012 19:41, Carlos Gonzalez-Cadenas wrote:
>>>>>>>> Hello Em:
>>>>>>>>
>>>>>>>> Thanks for your answer.
>>>>>>>>
>>>>>>>> Yes, we initially also thought that the excessive increase in response time
>>>>>>>> was caused by the several queries being executed, and we did another test.
>>>>>>>> We executed one of the subqueries that I've shown to you directly in the
>>>>>>>> "q" parameter and then we tested this same subquery (only this one, without
>>>>>>>> the others) with the function query "query($q1)" in the "q" parameter.
>>>>>>>>
>>>>>>>> Theoretically the times for these two queries should be more or less the
>>>>>>>> same, but the second one is several times slower than the first one. After
>>>>>>>> this observation we learned more about function queries and we learned from
>>>>>>>> the code and from some comments in the forums [1] that the FunctionQueries
>>>>>>>> are expected to match all documents.
>>>>>>>>
>>>>>>>> We have some more tests on that matter: now we're moving from issuing this
>>>>>>>> large query through the SOLR interface to creating our own QueryParser. The
>>>>>>>> initial tests we've done in our QParser (that internally creates multiple
>>>>>>>> queries and inserts them inside a DisjunctionMaxQuery) are very good,
>>>>>>>> we're getting very good response times and high quality answers. But when we've
>>>>>>>> tried to wrap the DisjunctionMaxQuery within a FunctionQuery (i.e. with a
>>>>>>>> QueryValueSource that wraps the DisMaxQuery), then the times move from
>>>>>>>> 10-20 msec to 200-300msec.
>>>>>>>>
>>>>>>>> Note that we're using early termination of queries (via a custom
>>>>>>>> collector), and therefore (as shown by the numbers I included above) even
>>>>>>>> if the query is very complex, we're getting very fast answers. The only
>>>>>>>> situation where the response time explodes is when we include a
>>>>>>>> FunctionQuery.
>>>>>>>>
>>>>>>>> Re: your question of what we're trying to achieve ...
>>>>>>>> We're implementing a powerful query autocomplete system, and we use several
>>>>>>>> fields to a) improve performance on wildcard queries and b) have very
>>>>>>>> precise control over the score.
>>>>>>>>
>>>>>>>> Thanks a lot for your help,
>>>>>>>> Carlos
>>>>>>>>
>>>>>>>> [1]: http://grokbase.com/p/lucene/solr-user/11bjw87bt5/functionquery-score-0
>>>>>>>>
>>>>>>>> On Thu, Feb 16, 2012 at 7:09 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>>>>>>
>>>>>>>>> Hello Carlos,
>>>>>>>>>
>>>>>>>>> well, you must take into account that you are executing up to 8 queries
>>>>>>>>> per request instead of one query per request.
>>>>>>>>>
>>>>>>>>> I am not totally sure about the details of the implementation of the
>>>>>>>>> max-function-query, but I guess it first iterates over the results of
>>>>>>>>> the first max-query, afterwards over the results of the second max-query
>>>>>>>>> and so on. This is a much higher complexity than in the case of a normal
>>>>>>>>> query.
>>>>>>>>>
>>>>>>>>> I would suggest that you optimize your request. I don't think that this
>>>>>>>>> particular function query is matching *all* docs. Instead I think it
>>>>>>>>> just matches those docs specified by your inner-query (although I might
>>>>>>>>> be wrong about that).
>>>>>>>>>
>>>>>>>>> What are you trying to achieve by your request?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Em
>>>>>>>>>
>>>>>>>>> On 16.02.2012 16:24, Carlos Gonzalez-Cadenas wrote:
>>>>>>>>>> Hello Em:
>>>>>>>>>>
>>>>>>>>>> The URL is quite large (w/ shards, ...), maybe it's best if I paste the
>>>>>>>>>> relevant parts.
>>>>>>>>>>
>>>>>>>>>> Our "q" parameter is:
>>>>>>>>>>
>>>>>>>>>> "q":"_val_:\"product(query_score,max(query($q8),max(query($q7),max(query($q4),query($q3)))))\"",
>>>>>>>>>>
>>>>>>>>>> The subqueries q8, q7, q4 and q3 are regular queries, for example:
>>>>>>>>>>
>>>>>>>>>> "q7":"stopword_phrase:colomba~1 AND stopword_phrase:santa AND
>>>>>>>>>> wildcard_stopword_phrase:car^0.7 AND stopword_phrase:hoteles OR
>>>>>>>>>> (stopword_phrase:las AND stopword_phrase:de)"
>>>>>>>>>>
>>>>>>>>>> We've executed the subqueries q3-q8 independently and they're very fast,
>>>>>>>>>> but when we introduce the function queries as described below, it all goes
>>>>>>>>>> 10X slower.
>>>>>>>>>>
>>>>>>>>>> Let me know if you need anything else.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>> Carlos
>>>>>>>>>>
>>>>>>>>>> On Thu, Feb 16, 2012 at 4:02 PM, Em <mailformailingli...@yahoo.de> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hello carlos,
>>>>>>>>>>>
>>>>>>>>>>> could you show us what your Solr call looks like?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Em
>>>>>>>>>>>
>>>>>>>>>>> On 16.02.2012 14:34, Carlos Gonzalez-Cadenas wrote:
>>>>>>>>>>>> Hello all:
>>>>>>>>>>>>
>>>>>>>>>>>> We'd like to score the matching documents using a combination of SOLR's IR
>>>>>>>>>>>> score with another application-specific score that we store within the
>>>>>>>>>>>> documents themselves (i.e. a float field containing the app-specific
>>>>>>>>>>>> score). In particular, we'd like to calculate the final score doing some
>>>>>>>>>>>> operations with both numbers (e.g. product, sqrt, ...)
>>>>>>>>>>>>
>>>>>>>>>>>> According to what we know, there are two ways to do this in SOLR:
>>>>>>>>>>>>
>>>>>>>>>>>> A) Sort by function [1]: We've tested an expression like
>>>>>>>>>>>> "sort=product(score, query_score)" in the SOLR query, where score is the
>>>>>>>>>>>> common SOLR IR score and query_score is our own precalculated score, but it
>>>>>>>>>>>> seems that SOLR can only do this with stored/indexed fields (and obviously
>>>>>>>>>>>> "score" is not stored/indexed).
>>>>>>>>>>>>
>>>>>>>>>>>> B) Function queries: We've used _val_ and function queries like max, sqrt
>>>>>>>>>>>> and query, and we've obtained the desired results from a functional point
>>>>>>>>>>>> of view. However, our index is quite large (400M documents) and the
>>>>>>>>>>>> performance degrades heavily, given that function queries are AFAIK
>>>>>>>>>>>> matching all the documents.
>>>>>>>>>>>>
>>>>>>>>>>>> I have two questions:
>>>>>>>>>>>>
>>>>>>>>>>>> 1) Apart from the two options I mentioned, is there any other (simple) way
>>>>>>>>>>>> to achieve this that we're not aware of?
>>>>>>>>>>>>
>>>>>>>>>>>> 2) If we have to choose the function queries path, would it be very
>>>>>>>>>>>> difficult to modify the actual implementation so that it doesn't match all
>>>>>>>>>>>> the documents, that is, to pass a query so that it only operates over the
>>>>>>>>>>>> documents matching the query? Looking at the FunctionQuery.java source
>>>>>>>>>>>> code, there's a comment that says "// instead of matching all docs, we
>>>>>>>>>>>> could also embed a query. the score could either ignore the subscore, or
>>>>>>>>>>>> boost it", which is giving us some hope that maybe it's possible and even
>>>>>>>>>>>> desirable to go in this direction.
>>>>>>>>>>>> If you can give us some directions about
>>>>>>>>>>>> how to go about this, we may be able to do the actual implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> BTW, we're using Lucene/SOLR trunk.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot for your help.
>>>>>>>>>>>> Carlos
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
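
For readers following the thread, the idea under discussion (let the MUST clause decide which documents are visited, and apply the function only to those) can be sketched in plain Java. This is only an illustrative model under stated assumptions: it does not use any Lucene or Solr classes, the class and method names (BoostSketch, combine, visitAll, scoreMatching) are hypothetical, and the sample scores are made up. The combined score follows the description given in the thread: the square root of the IR score multiplied by the per-document "query_score" value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative model only: plain Java, no Lucene or Solr classes.
 * All names are hypothetical and chosen for this sketch.
 */
public class BoostSketch {

    /** Combined score as described in the thread: sqrt(IR score) * query_score. */
    static double combine(double irScore, double queryScore) {
        return Math.sqrt(irScore) * queryScore;
    }

    /**
     * Match-all style: step through every document id below maxDoc,
     * similar in spirit to the quoted nextDoc() loop, and count the visits.
     */
    static int visitAll(int maxDoc) {
        int visited = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            visited++;
        }
        return visited;
    }

    /**
     * MUST-driven style: only documents matched by the user query
     * (docId -> IR score) are handed to the function for scoring.
     */
    static Map<Integer, Double> scoreMatching(Map<Integer, Double> irScores,
                                              float[] queryScore) {
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> e : irScores.entrySet()) {
            int doc = e.getKey();
            result.put(doc, combine(e.getValue(), queryScore[doc]));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical per-document query_score values for a 5-document index.
        float[] queryScore = {0.5f, 2.0f, 1.0f, 4.0f, 0.25f};

        // Hypothetical matches of the user query: docId -> IR score.
        Map<Integer, Double> irScores = new LinkedHashMap<>();
        irScores.put(1, 4.0);
        irScores.put(3, 9.0);

        System.out.println("match-all visits: " + visitAll(queryScore.length));
        System.out.println("must-driven scores: " + scoreMatching(irScores, queryScore));
    }
}
```

In this model the match-all loop touches every document regardless of the user query, while the MUST-driven variant does work proportional to the number of matches. Whether a real Lucene scorer behaves this way depends on how the BooleanQuery advances its sub-scorers, which is exactly the open question in the thread.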