There are too many ways to submit an intra-cluster request within Solr. It’s been gnawing at me so I did an audit and made recommendations:
There are too many ways to submit an intra-cluster request within Solr. It’s been gnawing at me so I did an audit and made recommendations: A. Creating new Http SolrClient (with or without Cloud) from org.apache.solr.update.UpdateShardHandler#getDefaultHttpClient — very popular, 14 usages. Recommendation: This is a poor place for such a client; will be moved shortly in SOLR-16503 to CoreContainer and return an Http2SolrClient instead of HttpClient. B. org.apache.solr.client.solrj.cloud.SolrCloudManager#request (default impl is essentially a CloudLegacySolrClient). 3 usages. I’m suspicious we need this; either embrace or abandon IMO. Recommendation: embrace but migrate to getCloudSolrClient and add getHttpSolrClient too, the latter sourced from CoreContainer. Then allows requestAsync usage and hopefully will also have a URL-specific method for SOLR-17256 right next to it, there’s an httpRequest method that is now unused. Recommendation: remove C. org.apache.solr.update.UpdateShardHandler#getUpdateOnlyHttpClient for “updates” only. 2 usages D. org.apache.solr.update.UpdateShardHandler#getRecoveryOnlyHttpClient for “recovery” only. 4 usages. E. org.apache.solr.client.solrj.io.SolrClientCache wow, 86 usages!! 68 are in solrj-streaming since this is where this thing originated from, thus leaving 18 usages elsewhere but I’m over-counting. The unique classes using (excluding streaming, excluding a benchmark that tests streaming too) are 6 usages (classes). * Some usages are generic to a configurable SolrCloud cluster (solrj-streaming definitely): ReindexCollection and SolrReporter (metrics). This is the primary purpose/value of SolrClientCache; should continue. * Some of the other cases just need a cluster-local CloudSolrClient: SchemaDesigner (2 classes), CollectionsRepairEventListener. Recommendation: Use SolrCloudManager. * The other 4 usages are to get an Http SolrClient for a URL. Recommendation: Use CoreContainer.getDefaultHttpSolrClient moving from UpdateShardHandler (or similar recommended in SolrCloudManager). Requires solution to SOLR-17256 like an Http2SolrClient specific request method specifying an additional URL parameter. F. org.apache.solr.handler.component.HttpShardHandler / ShardHandler. Originally built for distributed-search, which requires supporting multiple replicas that are equivalent to serve a request for a shard. Later a number of Overseer/cluster commands started using/abusing it. There are some API tech-debt / issues I see here as a result of this. Now that Http2SolrClient has async support (it didn’t when HttpShardHandler was first created), I wonder if some Overseer usages should use that instead. I think it could be simpler. Recommendation: Add javadocs warning users to avoid using it outside of distributed search. Point out Http2SolrClient.requestAsync. As an aside, I want to mention that EmbeddedSolrServer is pretty cool; I could imagine expanding its use within Solr in some cases to avoid an HTTP layer and disconnected caller chain. Like to issue an admin call, even a SolrCloud one (e.g. to move a shard. But it doesn't do collection routing. It could be neat if the node-local CloudSolrClient proposed above could somehow detect such an admin call and route through locally to the node. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley