Yes, your solution is much simpler, providing the result through a single 
query. I didn't understand it the first time I read it.
 
I guess you would need to run it backwards as well to really evaluate the 
relevance, i.e. 
 
First 
    q=<query1>&facet=on&facet.query=<query2>

Then 
    q=<query2>&facet=on&facet.query=<query1>
 
Query 1 may return 100,000 hits with 500 overlapping with query 2. This would 
indicate no relevance.
Query 2 may return 1,000 documents with 500 overlapping with query 1. This would 
indicate relevance.
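
To make the two-way comparison concrete, here is a rough Python sketch. It assumes the usual Solr JSON response layout (`response.numFound` for the hit count and `facet_counts.facet_queries` keyed by the facet.query string); the function name `overlap_fraction` and the sample dictionaries are made up for illustration, using the numbers from the example.

```python
def overlap_fraction(solr_response, facet_query):
    """Fraction of the main query's hits that also match facet_query.

    Assumes a parsed Solr JSON response (wt=json) for a request of the form
    q=<main query>&facet=on&facet.query=<facet_query>.
    """
    hits = solr_response["response"]["numFound"]
    overlap = solr_response["facet_counts"]["facet_queries"][facet_query]
    # Guard against an empty result set to avoid dividing by zero.
    return overlap / hits if hits else 0.0


# Hypothetical responses mirroring the numbers in the example.
forward = {
    "response": {"numFound": 100000},
    "facet_counts": {"facet_queries": {"query2": 500}},
}
backward = {
    "response": {"numFound": 1000},
    "facet_counts": {"facet_queries": {"query1": 500}},
}

print(overlap_fraction(forward, "query2"))   # 0.005
print(overlap_fraction(backward, "query1"))  # 0.5
```

Looking at both fractions together shows whether one query is roughly a subset of the other (one high, one low) or whether they really cover the same documents (both high).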
 
I will test it out over the next few days and let you know how it works for us.
 
Regards,
Gert.
 
 
 

________________________________

From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Fri 4/23/2010 11:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries



Gert,

In your second query example you used "qf=...".  Did you mean "fq=...." ?  If 
so, the answer is no - filter queries don't affect the score.
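
For illustration only, step 4 written with fq could be built along these lines in Python. The helper name `pinned_score_params` is made up, quoting the ID value is my assumption, and `fl=*,score` is only there so the score comes back with the document. This is a sketch, not something I have run against your index.

```python
from urllib.parse import urlencode


def pinned_score_params(query, id_field, doc_id):
    """Build a query string that scores `query` against one known document.

    The fq clause only restricts the result set to the given document;
    the score is computed from q alone.
    """
    return urlencode({
        "q": query,
        "fq": f'{id_field}:"{doc_id}"',  # filter only, no effect on score
        "fl": "*,score",                 # return stored fields plus the score
        "rows": 1,
    })


params = pinned_score_params('"archiving failure"', "URI",
                             "50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55")
print(params)
```

The resulting string would be appended to the select URL of whichever handler is in use.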


I haven't tried your approach, but intuitively feel that looking at % overlap 
may work better.
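
A rough Python sketch of what I mean by % overlap, assuming you have already fetched the unique IDs of the top N hits for each query. The function name and the choice to normalise by the smaller list are my own assumptions, not anything built into Solr.

```python
def percent_overlap(ids_a, ids_b):
    """Percentage of the smaller top-N ID list that also appears in the other."""
    set_a, set_b = set(ids_a), set(ids_b)
    smaller = min(len(set_a), len(set_b))
    if smaller == 0:
        return 0.0
    return 100.0 * len(set_a & set_b) / smaller


# Hypothetical top-4 ID lists for two queries.
top_a = ["doc1", "doc2", "doc3", "doc4"]
top_b = ["doc3", "doc4", "doc5", "doc6"]
print(percent_overlap(top_a, top_b))  # 50.0
```

Where to put the threshold for "similar enough" is something you would have to tune on your own data.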
Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



----- Original Message ----
> From: "Villemos, Gert" <gert.ville...@logica.com>
> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 5:08:04 PM
> Subject: RE: Comparing two queries
>
> I was thinking along these lines:
>
> 1. Retrieve the top result for one query.
> 2. Take the resulting document and evaluate the score that it would get in
>    another query.
> 3. If the scores are similar, then the queries most likely overlap.
>
> I guess that if I had two simple query strings "archive crash" and
> "archiving failure" then I could:
>
> 1. Use the query ?q="archive crash"&rows=1 which will return me one result
>    (if any).
> 2. Read the score of the returned document.
> 3. Read the unique identifier field value, let's say it has field name 'URI'
>    and value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
> 4. Use the query
>    ?q="archiving failure"&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
> 5. Read the score of the returned document (the document will be the same as
>    returned under 1, the score will be different, evaluated based on the
>    second query).
> 6. Evaluate how similar the scores are.
>
> My question about this approach is: is the score calculated in 4 affected by
> the subquery, whose role is solely to select a specific result?
>
> I'm using the dismax by the way. Should I use the standard handler instead?
> Would it make a difference?
>
> Thanks,
> Gert.
>
> ________________________________
>
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Fri 4/23/2010 8:08 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Comparing two queries
>
> Or, use facet.query to get the overlap.  Here's how:
>
>     ?q=<query1>&facet=on&facet.query=<query2>
>
> You'll get the hit count from query #1 in the results, and the overlapping
> count to query #2 in the facet query response.
>
>         Erik - http://www.lucidimagination.com
>
> On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:
>
> > Hello Gert,
> >
> > I think you'd have to apply custom heuristics that involve looking
> > at top N hits for each query and looking at the % overlap.
> >
> > Otis
> > ----
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> >
> > ----- Original Message ----
> >> From: "Villemos, Gert" <gert.ville...@logica.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Fri, April 23, 2010 10:20:54 AM
> >> Subject: Comparing two queries
> >>
> >> We want to let a user register interest in information, based on a
> >> query he has defined himself. For example, he types in a query, presses
> >> a save button, provides his email, and the system will from then on
> >> email him a daily digest.
> >>
> >> As part of this, it would be nice to be able to tell the user that the
> >> same / a similar query is already being monitored by another user, as
> >> the users will likely have the same interests. I would therefore like
> >> to evaluate whether two queries will return (almost) the same set of
> >> results.
> >>
> >> But how can I compare two queries to determine if they will return
> >> (almost) the same set of results?
> >>
> >> Thanks,
> >> Gert.
> >>
> >> Please help Logica to respect the environment by not printing this
> >> email  / Pour contribuer comme Logica au respect de l'environnement,
> >> merci de ne pas imprimer ce mail /  Bitte drucken Sie diese Nachricht
> >> nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. /
> >> Por favor ajude a Logica a respeitar o ambiente nao imprimindo este
> >> correio electronico.
> >>
> >> This e-mail and any attachment is for authorised use by the intended
> >> recipient(s) only. It may contain proprietary material, confidential
> >> information and/or be subject to legal privilege. It should not be
> >> copied, disclosed to, retained or used by, any other party. If you are
> >> not an intended recipient then please promptly delete this e-mail and
> >> any attachment and all copies and inform the sender. Thank you.











