Query results vs. facets results

2012-07-14 Thread tudor
Hello,

I am new to Solr and I am running some tests with our data in Solr. We are
using version 3.6 and the data is imported from a DB2 database using Solr's
DIH. We have defined a single entity in the db-data-config.xml, which is an
equivalent of the following query:



This might lead to some names appearing multiple times in the result set.
This is OK.

For the unique ID in the schema, we are using a solr.UUIDField:


http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=100&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.ngroups=true&group.truncate=true

yields 

<int name="ngroups">134</int>

as a result, which is exactly what we expect. 

On the other hand, running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

yields 


   
 
  
<int name="MILTON">103</int>

I would expect to have the same number (134) in this facet result as well.
Could you please let me know why these two results are different?

Thank you,
Tudor



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3994988.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query results vs. facets results

2012-07-15 Thread tudor
Hello,

I am new to Solr and I am running some tests with our data in Solr. I am using
version 3.6 and the data is imported from a DB2 database using Solr's DIH.
We have defined a single entity in the db-data-config.xml, which is an
equivalent of the following query:



The ID in NAME_CONNECTIONS is not unique, so it might appear multiple times.

For the unique ID in the schema, we are using a solr.UUIDField:


http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true

yields 

<int name="ngroups">134</int>

as a result, which is exactly what we expect.

On the other hand, running

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

yields


   
 
  
<int name="MILTON">103</int>

I would expect to have the same number (134) in this facet result as the
previous filter result. Could you please let me know why these two results
are different?
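
One possible explanation, offered as an assumption rather than a confirmed diagnosis: with group.truncate=true, Solr bases facet counts on only the most relevant document of each group, whereas ngroups for q=CITY:MILTON counts every group that contains at least one MILTON document. If the same ID has rows in several cities, the two numbers can differ. The sketch below simulates this on made-up rows; the IDs, cities, and scores are hypothetical and are not the data from this thread.

```python
from collections import defaultdict

# Hypothetical rows: (ID, CITY, score). Purely illustrative data.
ROWS = [
    ("a", "MILTON", 1.0),
    ("a", "BOSTON", 0.5),
    ("b", "BOSTON", 1.0),
    ("b", "MILTON", 0.5),
    ("c", "MILTON", 1.0),
]


def ngroups_for_city(rows, city):
    """Count distinct IDs among rows matching the city.

    This mimics group.ngroups for q=CITY:<city> with group.field=ID.
    """
    return len({doc_id for doc_id, doc_city, _ in rows if doc_city == city})


def truncated_facet_counts(rows):
    """Facet on CITY using only the top-scoring row of each ID group.

    This mimics faceting over all rows with group.truncate=true.
    Ties keep the first row seen.
    """
    head = {}
    for doc_id, city, score in rows:
        if doc_id not in head or score > head[doc_id][1]:
            head[doc_id] = (city, score)
    counts = defaultdict(int)
    for city, _ in head.values():
        counts[city] += 1
    return dict(counts)


print(ngroups_for_city(ROWS, "MILTON"))  # 3
print(truncated_facet_counts(ROWS))      # {'MILTON': 2, 'BOSTON': 1}
```

In this toy data, group "b" contains a MILTON row, so it counts toward ngroups, but its top row is in BOSTON, so it does not count toward the MILTON facet.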

Thank you,
Tudor 




Re: Query results vs. facets results

2012-07-15 Thread tudor
Hi Erick,

Thanks for the reply.

The query:
 
http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

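<str name="rawquerystring">CITY:MILTON</str>
<str name="querystring">CITY:MILTON</str>
<str name="parsedquery">CITY:MILTON</str>
<str name="parsedquery_toString">CITY:MILTON</str>
<str name="QParser">LuceneQParser</str>

There is no information about grouping.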

Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:


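  <str name="rawquerystring">*</str>
  <str name="querystring">*</str>
  <str name="parsedquery">ID:*</str>
  <str name="parsedquery_toString">ID:*</str>
  <str name="QParser">LuceneQParser</str>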

To be honest, these do not tell me too much. I would like to see some
information about the grouping, since I believe this is where I am missing
something.

In the meantime, I have combined the two queries above, hoping to make some
sense of the results. The following query filters for entries with the city
name "MILTON" and groups together the ones with the same ID. It also facets
the entries on city, again grouping the ones with the same ID, so the result
counts refer to numbers of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:


  
  <int name="matches">284</int>
  <int name="ngroups">134</int>

(i.e.: fq says: 134 groups with CITY:MILTON)
...


  
  
   ...
  <int name="MILTON">103</int>

(i.e.: faceted search says: 103 groups with CITY:MILTON)
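
For readability, the combined request can be broken into its individual parameters. The sketch below rebuilds it with Python's standard library; the helper name combined_query_url is made up for illustration, and the base URL is the one used in this thread. Note that {!tag=dt} labels the filter and {!ex=dt} tells the CITY facet to ignore filters carrying that label, so the facet is computed as if the fq were not applied.

```python
from urllib.parse import urlencode, parse_qs

BASE_URL = "http://localhost:8983/solr/db/select"  # taken from the thread


def combined_query_url(q="*", city="MILTON", group_field="ID"):
    """Build the combined filter + facet + grouping request URL."""
    params = [
        ("q", q),
        ("fq", "{!tag=dt}CITY:" + city),   # tagged filter query
        ("rows", "10"),
        ("group", "true"),
        ("group.field", group_field),
        ("group.truncate", "true"),
        ("group.ngroups", "true"),
        ("facet", "true"),
        ("facet.field", "{!ex=dt}CITY"),   # facet excludes the tagged filter
        ("facet.missing", "true"),
        ("debugQuery", "on"),
    ]
    # urlencode percent-escapes the braces, "!", "=", ":" and "*"
    return BASE_URL + "?" + urlencode(params)


url = combined_query_url()
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["fq"], decoded["facet.field"])
```

Decoding the query string again, as in the last lines, is a quick way to confirm that the local-params syntax survived the escaping.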

I really believe that these different results have something to do with the
grouping that Solr makes, but I do not know how to dig into this.

Thank you and best regards,
Tudor



Re: Query results vs. facets results

2012-07-16 Thread tudor

Erick Erickson wrote
> 
> Ahhh, you need to look down another few lines. When you specify fq, there
> should be a section of the debug output like
> 
>   .
>   .
>   .
> 
> 
> where the array is the parsed form of the filter queries. I was thinking
> about comparing that with the parsed form of the "q" parameter in the
> non-filter case to see what insight one could gain from that.
> 
> 

There is no "filter_queries" section because I do not use an fq in the first
two queries. I use one in the combined query, for which you can see the
output further below.


Erick Erickson wrote
> 
> 
> But there's already one difference: when you use *, you get
>  ID:*
> 
> Is it possible that you have some documents that do NOT have an ID field?
> Try *:* rather than just *. I'm guessing that your default search field is
> ID and you have some documents without an ID field. Not a good guess if ID
> is your <uniqueKey> though..
> 
> Try q=*:* -ID:* and see if you get 31 docs.
> 
> 

All the entries have an ID, so q=*:* -ID:* yielded 0 results.
An ID can appear multiple times, which is the reason for grouping the
results. ID is indeed the default search field.
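
The check suggested above can be scripted. This is a minimal sketch that only builds the request URL; the helper names are hypothetical, and rows=0 is an assumption made here because only the total hit count matters for this check.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/db/select"  # taken from the thread


def missing_field_query(field):
    """Return a query matching documents that have no value in `field`.

    "*:*" matches all documents; "-field:*" subtracts those with a value.
    """
    return "*:* -{}:*".format(field)


def count_url(base_url, query):
    """Build a URL that asks only for the hit count (no documents)."""
    return base_url + "?" + urlencode({"q": query, "rows": 0})


print(count_url(BASE_URL, missing_field_query("ID")))
```

A bare * is parsed against the default search field (ID:* in the debug output shown earlier), while *:* matches every document regardless of field.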


Erick Erickson wrote
> 
> 
> Also note that if you _have_ specified ID as your <uniqueKey> _but_ you
> didn't re-index afterwards (actually, I'd blow away the entire
> /data directory and restart) you may have stale data in there that allowed
> documents to exist that do not have uniqueKey fields.
> 
> 

For Solr's unique id I use a UUID field (which, of course, has a different
name than the default search field, ID), so it should not be a problem.

I have re-indexed the data, and I get a somewhat different result. This is
the query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*:*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=STR_ENTERPRISE_ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

And the results as well as the debug information:


  
  <int name="matches">284</int>
  <int name="ngroups">134</int>
  
   ...


  


  ...
89
  ...


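  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries">
    <str>{!tag=dt}CITY:MILTON</str>
  </arr>
  <arr name="parsed_filter_queries">
    <str>CITY:MILTON</str>
  </arr>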
  
  


So now fq says: 134 groups with CITY:MILTON and faceted search says: 83
groups with CITY:MILTON. 

How can I see some information about the grouping in Solr?
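
One way to inspect the grouping, sketched here under assumptions, is to request the grouped result as JSON (wt=json) with group.limit raised above its default of 1 and rows large enough to cover all groups, then tally the groups offline. The sample response below is hand-written in the shape of Solr's grouped JSON output, its values are invented, and the function name summarize_groups is hypothetical. It also assumes CITY is a single-valued field.

```python
from collections import Counter

# Hand-written sample shaped like a grouped Solr JSON response (invented data).
SAMPLE = {
    "grouped": {
        "STR_ENTERPRISE_ID": {
            "matches": 5,
            "ngroups": 3,
            "groups": [
                {"groupValue": "a", "doclist": {"numFound": 2, "start": 0,
                 "docs": [{"CITY": "MILTON"}, {"CITY": "BOSTON"}]}},
                {"groupValue": "b", "doclist": {"numFound": 2, "start": 0,
                 "docs": [{"CITY": "BOSTON"}, {"CITY": "MILTON"}]}},
                {"groupValue": "c", "doclist": {"numFound": 1, "start": 0,
                 "docs": [{"CITY": "MILTON"}]}},
            ],
        }
    }
}


def summarize_groups(response, group_field, facet_field, value):
    """Compare two ways of counting groups for one facet value.

    Returns (head_counts, groups_with_value):
      head_counts       - facet tally using only the first doc of each group
      groups_with_value - groups in which any returned doc has the value
    """
    groups = response["grouped"][group_field]["groups"]
    head_counts = Counter()
    groups_with_value = 0
    for group in groups:
        docs = group["doclist"]["docs"]
        if not docs:
            continue
        head_counts[docs[0].get(facet_field)] += 1
        if any(doc.get(facet_field) == value for doc in docs):
            groups_with_value += 1
    return head_counts, groups_with_value


heads, with_value = summarize_groups(SAMPLE, "STR_ENTERPRISE_ID", "CITY", "MILTON")
print(heads["MILTON"], with_value)  # 2 3
```

If the two numbers differ on real data, the groups that appear in the second count but not the first are the ones whose top document belongs to another city.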

Thanks Erick!

Regards,
Tudor

