On 5/16/2016 10:28 PM, Jeff Wartes wrote:
> One thing that still feels a bit odd though is that the health check query 
> was referencing a collection that no longer existed in the cluster. So it 
> seems like it was downloading the state for ALL non-hosted collections, not a 
> requested one.
>
> This touches a bit on a sore point with me. I dislike that those 
> collection-not-here proxy requests aren’t logged on the server doing the 
> proxy, because you end up with traffic visible at the http interface but not 
> the solr level. Honestly, I dislike that transparent proxy approach in 
> general, because it means I lose the ability to dedicate entire nodes to the 
> fan-out and shard-aggregation process like I could pre-solrcloud.

Recently I was informed that CloudSolrClient operates on cached state
information, and doesn't make requests to zookeeper very often.  I don't
know if internal cloud queries are handled the same, but they probably
are.  I think you've found an exception to that behavior -- collections
that don't exist.

On a small cloud, this is likely not a performance bottleneck at all ...
but a cloud with dozens of servers and hundreds of collections would be
a different story.

I just tried a query against a small cloud install (running 4.2.1) for a
nonexistent collection, and yes indeed, there's nothing logged *at
all*.  I would have expected a log entry of SOME kind.

So, I think you've found two problems that each need an issue in Jira. 
I hesitate slightly at calling them bugs, because it's probably working
as designed ... but if so, I think the design is incorrect.

1) There is no log entry for queries to a nonexistent collection.  The
full SolrCore log entry showing all the parameters would be nice, but
even a short message noting a query to a nonexistent collection would be
enough.

2) When a collection doesn't exist, that fact should be cached in the
same way that a good clusterstate is cached, to reduce traffic to zookeeper.


Additional detail for 2) above:

I'm envisioning a separate in-memory cache structure, distinct from the
clusterstate(s), which should probably be kept fairly small.  Denial of
service attacks on publicly-accessible servers are the only time that a
cloud is *likely* to receive requests for many collections that don't
exist.  Misconfigurations are more likely to request a few nonexistent
collections repeatedly.
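To make the idea concrete, here's a minimal sketch of the kind of small,
bounded negative cache I have in mind -- illustrative Python only, not
actual Solr code, and all names, sizes, and the TTL are made up.  A size
cap limits damage from denial-of-service floods of bogus collection
names, while a TTL ensures a collection created later is eventually
noticed:

```python
# Illustrative sketch, NOT Solr code: a tiny bounded "negative cache"
# of collection names known not to exist, so repeated requests for the
# same missing collection don't hit zookeeper every time.
import time
from collections import OrderedDict

class NegativeCollectionCache:
    def __init__(self, max_size=100, ttl_seconds=60):
        self.max_size = max_size
        self.ttl = ttl_seconds
        # collection name -> time it was recorded as missing
        self._entries = OrderedDict()

    def mark_missing(self, name):
        self._entries[name] = time.monotonic()
        self._entries.move_to_end(name)
        # Keep the cache small: evict the oldest entries first.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def is_known_missing(self, name):
        recorded = self._entries.get(name)
        if recorded is None:
            return False
        if time.monotonic() - recorded > self.ttl:
            # Entry is stale; drop it so the next lookup re-checks
            # zookeeper (the collection may have been created).
            del self._entries[name]
            return False
        return True

cache = NegativeCollectionCache(max_size=2, ttl_seconds=60)
cache.mark_missing("foo")
cache.mark_missing("bar")
cache.mark_missing("baz")  # evicts "foo", the oldest entry
print(cache.is_known_missing("foo"))  # False -- evicted by size cap
print(cache.is_known_missing("bar"))  # True
```

On a cache miss, the server would consult zookeeper as it does today
and, if the collection still doesn't exist, record that fact here.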

For older 4.x servers, nonexistent collections might actually be cached,
because in the older versions, the entire clusterstate for all
collections is contained in a single file.  If that file is cached, then
so is the fact that a given collection doesn't exist.

Thanks,
Shawn
