Yes, the AnalyticsQuery is being called twice in the logs, which is not a
good thing. Originally I believe this was not the case, but changes to the
QueryComponent in later releases have caused this to happen. The test cases
aren't broken by this, so it didn't get caught.

The actual merge of the results from the AnalyticsQuery, which is done in
the MergeStrategy, will only happen in the first stage. In the second stage
the results from the AnalyticsQuery should be ignored. As a workaround
for the double call to the AnalyticsQuery, you can look for the "ids" param
in your AnalyticsQuery and skip gathering the analytics if it's present.
The ids param is sent in the second phase of a distributed search.
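
A minimal sketch of that check, assuming the request parameters are
available as a simple map. The class name SecondPhaseCheck and the
Map-based signature are hypothetical and only for illustration; in an
actual plugin the parameters would come from the Solr request rather than
a plain Map.

```java
import java.util.Map;

// Hypothetical helper, not part of Solr: decides whether a shard request
// belongs to the second (document-fetch) phase of a distributed search.
final class SecondPhaseCheck {

    // Name of the parameter sent to shards in the second phase.
    static final String IDS_PARAM = "ids";

    // Returns true when a non-empty "ids" param is present, meaning the
    // analytics were already gathered in the first phase and can be skipped.
    static boolean isSecondPhase(Map<String, String> params) {
        String ids = params.get(IDS_PARAM);
        return ids != null && !ids.isEmpty();
    }

    public static void main(String[] args) {
        // First-phase style request: no ids, so analytics should be gathered.
        Map<String, String> first = Map.of("q", "*:*", "fl", "id");
        // Second-phase style request: ids present, so analytics can be skipped.
        Map<String, String> second = Map.of("q", "*:*", "ids", "100713,940122");

        System.out.println("first phase skip?  " + isSecondPhase(first));
        System.out.println("second phase skip? " + isSecondPhase(second));
    }
}
```

When the check returns true, the custom AnalyticsQuery would skip its
aggregation work, for example by passing documents straight through to the
delegate collector.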

What you're running into here is that the MergeStrategy is not really used
in combination with the AnalyticsQuery. There are users who use the
MergeStrategy to handle custom merging of documents to produce custom
rankings. But as far as I'm aware, the AnalyticsQuery hasn't been used much
with the MergeStrategy, so this has not been reported before.

I have moved away from using the MergeStrategy for merging custom
analytics. I'll give you a little context for how this has evolved.

The MergeStrategy was originally introduced for an e-commerce customer that
wanted to produce custom rankings. As part of that work the AnalyticsQuery
was added to support custom analytics. And the MergeStrategy supported that
as well.

Later, Streaming Expressions were added, which took control of the merge in
a much more elegant way than the MergeStrategy. So now there are features
in Solr that nicely combine an AnalyticsQuery with a merge done through the
Streaming Expression framework. The FeatureSelectionStream and the
TextLogitStream use this approach. These two streams are in master and
branch_6x if you want to see how they operate.

Joel Bernstein
http://joelsolr.blogspot.com/

On Thu, Aug 11, 2016 at 10:29 AM, tedsolr <tsm...@sciquest.com> wrote:

> OK, some more info ... it's not aggregating because the doc values it's
> using
> for grouping are the unique ID field's. There are some big differences in
> the whole flow between searches against a single shard collection, and
> searches against a multi-shard collection. In a single shard collection the
> AnalyticsQuery is called one time, and there's only one pass through the
> delegating collector. If someone could explain what's going on in a
> multi-sharded search that would help a lot I think. My test collection has
> two shards each one has a replica.
>
> For this search
> .../aggr?q=*:*&fl=VENDOR_NAME&sort=VENDOR_NAME+asc
> The user has selected just one field to view, so I make VENDOR_NAME the
> group by field.
>
> This is what I see while debugging:
> 1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats]
> 2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
> [AggregationStats]
> 4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
> 5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
> 6. custom DelegatingCollector finish() is called
> 7. custom DelegatingCollector finish() is called
> 8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
> 9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME
> +
> [AggregationStats] + id +  [AggregationStats]
>
> And from the log:
>
> INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12096 status=0 QTime=64734
>
> INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=id&shards.purpose=4&
> start=0&fsv=true&sort=VENDOR_NAME+asc&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/&rows=10&version=2&q=*:*&NOW=
> 1470925120206&isShard=true&wt=javabin&_=1470925120222}
> hits=12062 status=0 QTime=66365
>
> INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
> ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&
> shards.purpose=64&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/&version=2&q=*:*&NOW=
> 1470925120206&ids=100713,940122,44812,210965,584851&
> isShard=true&wt=javabin&_=1470925120222}
> status=0 QTime=2070
>
> INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false&qt=/aggr&fl=VENDOR_NAME&fl=[AggregationStats]&fl=id&
> shards.purpose=64&fq={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}&shard.url=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/&version=2&q=*:*&NOW=
> 1470925120206&ids=533737,44864,100672,940123,96752&
> isShard=true&wt=javabin&_=1470925120222}
> status=0 QTime=4293
>
> INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={q=*:*&indent=true&fl=VENDOR_NAME&sort=VENDOR_NAME+
> asc&wt=json&_=1470925120222}
> hits=24158 status=0 QTime=72972
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-
> tp4289274p4291301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
