Yes, your solution is much simpler, providing the result through a single query. I didn't understand it the first time I read it.

I guess you would need to run it in the reverse direction as well to really evaluate the relevance, i.e. first

  q=<query1>&facet=on&facet.query=<query2>

then

  q=<query2>&facet=on&facet.query=<query1>

Query 1 may return 100,000 hits with 500 overlapping with query 2. This would indicate no relevance. Query 2 may return 1,000 documents with 500 overlapping with query 1. This would indicate relevance.

I will test it out over the next few days and let you know how it works for us.

Regards,
Gert.

________________________________

From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Fri 4/23/2010 11:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries

Gert,

In your second query example you used "qf=...". Did you mean "fq=..."? If so, the answer is no: filter queries don't affect the score.

I haven't tried your approach, but intuitively feel that looking at the % overlap may work better.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message ----
> From: "Villemos, Gert" <gert.ville...@logica.com>
> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 5:08:04 PM
> Subject: RE: Comparing two queries
>
> I was thinking along these lines:
>
> 1. Retrieve the top result for one query.
> 2. Take the resulting document and evaluate the score that it would get in another query.
> 3. If the scores are similar, then the queries most likely overlap.
>
> I guess that if I had two simple query strings, "archive crash" and "archiving failure", then I could:
>
> 1. Use the query ?q="archive crash"&rows=1, which will return one result (if any).
> 2. Read the score of the returned document.
> 3. Read the unique identifier field value; let's say it has field name 'URI' and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
> 4. Use the query ?q="archiving failure"&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
> 5. Read the score of the returned document (the document will be the same as the one returned under 1; the score will be different, evaluated based on the second query).
> 6. Evaluate how similar the scores are.
>
> My question about this approach is: is the score calculated in 4 affected by the subquery, whose role is solely to select a specific result?
>
> I'm using dismax, by the way. Should I use the standard handler instead? Would it make a difference?
>
> Thanks,
> Gert.
>
> ________________________________
>
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Fri 4/23/2010 8:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Comparing two queries
>
> Or, use facet.query to get the overlap:
>
>   ?q=<query1>&facet=on&facet.query=<query2>
>
> You'll get the hit count from query #1 in the results, and the count of hits overlapping with query #2 in the facet query response.
>
> Erik - http://www.lucidimagination.com
>
> On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:
>
> > Hello Gert,
> >
> > I think you'd have to apply custom heuristics that involve looking at the top N hits for each query and looking at the % overlap.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> >> From: "Villemos, Gert" <gert.ville...@logica.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Fri, April 23, 2010 10:20:54 AM
> >> Subject: Comparing two queries
> >>
> >> We want to let a user register interest in information, based on a query he has defined himself. For example, he types in a query, presses a save button, provides his email address, and the system will now email him a daily digest.
> >>
> >> As part of this, it would be nice to be able to tell the user that the same or a similar query is already being monitored by another user, as the users will likely have the same interests. I would therefore like to evaluate whether two queries will return (almost) the same set of results.
> >>
> >> But how can I compare two queries to determine if they will return (almost) the same set of results?
> >>
> >> Thanks,
> >> Gert.
> >>
> >> Please help Logica to respect the environment by not printing this email / Pour contribuer comme Logica au respect de l'environnement, merci de ne pas imprimer ce mail / Bitte drucken Sie diese Nachricht nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. / Por favor ajude a Logica a respeitar o ambiente nao imprimindo este correio electronico.
> >>
> >> This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.
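
For anyone finding this thread in the archive: the two-directional facet.query check discussed in the thread could be sketched roughly as below. This is only an illustration, not something anyone in the thread has run. It assumes Solr's JSON response format (wt=json), where the hit count sits under response.numFound and the facet query counts under facet_counts.facet_queries. The URL and all function names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a local Solr instance; point this at your own select handler.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def overlap_params(query, other_query):
    """Parameters for q=<query>&facet=on&facet.query=<other_query>."""
    return {
        "q": query,
        "rows": 0,  # only the counts are needed, not the documents
        "facet": "on",
        "facet.query": other_query,
        "wt": "json",
    }


def overlap_fraction(solr_response, other_query):
    """Fraction of the main query's hits that also match other_query."""
    hits = solr_response["response"]["numFound"]
    shared = solr_response["facet_counts"]["facet_queries"][other_query]
    return shared / hits if hits else 0.0


def fetch(query, other_query, base_url=SOLR_SELECT_URL):
    """Run one direction of the comparison against a live Solr instance."""
    url = base_url + "?" + urlencode(overlap_params(query, other_query))
    with urlopen(url) as reply:
        return json.load(reply)


def compare(response_1, response_2, query_1, query_2):
    """Overlap as seen from each side, given the two Solr responses."""
    return (
        overlap_fraction(response_1, query_2),
        overlap_fraction(response_2, query_1),
    )
```

With the numbers used in the thread, compare() would give 500 / 100,000 = 0.005 as seen from query 1 and 500 / 1,000 = 0.5 as seen from query 2. Where to put the threshold for "similar enough" is left open here; that part would still need the kind of custom heuristics Otis mentioned.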