On 5/16/2016 10:28 PM, Jeff Wartes wrote:
> One thing that still feels a bit odd though is that the health check query
> was referencing a collection that no longer existed in the cluster. So it
> seems like it was downloading the state for ALL non-hosted collections, not a
> requested one.
>
> This touches a bit on a sore point with me. I dislike that those
> collection-not-here proxy requests aren’t logged on the server doing the
> proxy, because you end up with traffic visible at the http interface but not
> the solr level. Honestly, I dislike that transparent proxy approach in
> general, because it means I lose the ability to dedicate entire nodes to the
> fan-out and shard-aggregation process like I could pre-solrcloud.
Recently I was informed that CloudSolrClient operates on cached state information, and doesn't make requests to zookeeper very often. I don't know if internal cloud queries are handled the same way, but they probably are. I think you've found an exception to that behavior -- collections that don't exist. On a small cloud, this is likely not a performance bottleneck at all ... but a cloud with dozens of servers and hundreds of collections would be a different story.

I just tried a query against a small cloud install (running 4.2.1) for a nonexistent collection, and yes indeed, there's nothing logged *at all*. I would have expected a log entry of SOME kind.

So, I think you've found two problems that each need an issue in Jira. I hesitate slightly at calling them bugs, because it's probably working as designed ... but if so, I think the design is incorrect.

1) There is no log entry for queries to a nonexistent collection. The full SolrCore log entry showing all the parameters would be nice, but even something short about a query to a nonexistent collection would be enough.

2) When a collection doesn't exist, that fact should be cached in the same way that a good clusterstate is cached, to reduce traffic to zookeeper.

Additional detail for 2) above: I'm envisioning a separate cache memory structure from the clusterstate(s), which should probably be kept fairly small. Denial of service attacks on publicly-accessible servers are the only time that a cloud is *likely* to receive requests for many collections that don't exist. Misconfigurations are more likely to request a few nonexistent collections repeatedly.

For older 4.x servers, nonexistent collections might actually be cached, because in the older versions, the entire clusterstate for all collections is contained in a single file. If that file is cached, then so is the fact that a given collection doesn't exist.

Thanks,
Shawn
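P.S. To illustrate point 2) above, here's a rough sketch of the kind of small, bounded "negative cache" structure I'm imagining. Everything here is made up for illustration -- the class name, the size limit, and the TTL are not actual Solr code, just one way such a cache could work:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch only: a small, bounded cache of collection names
 * that were recently confirmed NOT to exist, so repeated requests for
 * the same bad name don't each hit zookeeper.
 */
public class NonexistentCollectionCache {
    private static final int MAX_ENTRIES = 100;      // keep it small, per the DoS concern
    private static final long TTL_MILLIS = 60_000L;  // re-check zookeeper after a minute

    // Access-ordered map: the least-recently-used entry is evicted first
    // once the size limit is exceeded.
    private final Map<String, Long> cache =
        new LinkedHashMap<String, Long>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    /** Record that a zookeeper lookup found no such collection. */
    public synchronized void markNonexistent(String collection) {
        cache.put(collection, System.currentTimeMillis());
    }

    /** True if we recently confirmed this collection does not exist. */
    public synchronized boolean isKnownNonexistent(String collection) {
        Long when = cache.get(collection);
        if (when == null) {
            return false;
        }
        if (System.currentTimeMillis() - when > TTL_MILLIS) {
            cache.remove(collection);  // stale entry: let the next request hit zookeeper
            return false;
        }
        return true;
    }
}
```

The bounded, access-ordered design means a DoS flood of random bad names just churns a 100-entry map, while a misconfigured client hammering the same few nonexistent names stays cached and never reaches zookeeper until the TTL expires.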