Returning 500K rows is, as Shawn says, not Solr's sweet spot. My guess: all the work you're doing trying to return that many rows, particularly in SolrCloud mode, is simply overloading your system to the point that the ZK connection times out. Don't do that. If you need that many rows, either Shawn's cursorMark option or the export/streaming aggregation route are much better choices.
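To make the cursorMark approach concrete, here is a minimal sketch of the pagination loop in Python. The `query_page` callable is a hypothetical stand-in for an HTTP request to Solr's /select handler (with q, a sort ending on the uniqueKey field, rows, and cursorMark parameters); only the cursor protocol itself -- start at "*", stop when Solr echoes back the same cursorMark -- matches Solr's documented behavior, everything else is illustrative:

```python
def fetch_all(query_page, rows=500):
    """Page through an entire result set with Solr-style cursor marks.

    query_page(cursor_mark, rows) stands in for an HTTP call to /select;
    it must return (docs, next_cursor_mark).
    """
    docs = []
    cursor = "*"  # Solr's initial cursorMark value
    while True:
        page, next_cursor = query_page(cursor, rows)
        docs.extend(page)
        # Solr signals exhaustion by echoing back the same cursorMark
        if next_cursor == cursor:
            break
        cursor = next_cursor
    return docs


def fake_solr(data):
    """A toy stand-in for Solr that serves `data` in cursor-marked pages."""
    def query_page(cursor, rows):
        start = 0 if cursor == "*" else int(cursor)
        page = data[start:start + rows]
        # Echo the same cursor back when there is nothing left to return
        next_cursor = cursor if not page else str(start + len(page))
        return page, next_cursor
    return query_page
```

Note that each request only ever carries `rows` documents, so no single response has to assemble the whole 500K result set -- that is the whole point of the feature.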
Consider what happens on a sharded request:

- The initial node sends a sub-request to a replica for each shard.
- Each replica returns its candidate top N (doc ID and sort criteria).
- The initial node sorts these lists (1M from each replica in your example) to get the true top N.
- The initial node requests the docs from each replica that made it into the true top N.
- Each replica goes to disk, decompresses the docs and pulls out the fields.
- Each replica sends its portion of the top N to the initial node.
- An enormous packet containing all 1M final docs is assembled and returned to the client.
- This sucks up bandwidth and resources.

That's bad enough, but especially if your ZK nodes are on the same box as your Solr nodes they're even more likely to have a timeout issue.

Best,
Erick

On Fri, Nov 18, 2016 at 8:45 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> On 11/18/2016 6:50 PM, Chetas Joshi wrote:
>> The numFound is millions but I was also trying with rows=1 million. I will
>> reduce it to 500K.
>>
>> I am sorry. It is state.json. I am using Solr 5.5.0.
>>
>> One of the things I am not able to understand is why my ingestion job is
>> complaining about "Cannot talk to ZooKeeper - Updates are disabled."
>>
>> I have a Spark streaming job that continuously ingests into Solr. My shards
>> are always up and running. The moment I start a query on SolrCloud it starts
>> running into this exception. However, as you said, ZK will only update the
>> state of the cluster when the shards go down. Then why is my job trying to
>> contact ZK when the cluster is up, and why is the exception about updating ZK?
>
> SolrCloud and SolrJ (CloudSolrClient) both maintain constant connections
> to all the zookeeper servers they are configured to use. If zookeeper
> quorum is lost, SolrCloud will go read-only -- no updating is possible.
> That is what is meant by "updates are disabled."
>
> Solr and Lucene are optimized for very low rowcounts, typically two or
> three digits.
> Asking for hundreds of thousands of rows is problematic.
> The cursorMark feature is designed for efficient queries when paging
> deeply into results, but it assumes your rows value is relatively small,
> and that you will be making many queries to get a large number of
> results, each of which will be fast and won't overload the server.
>
> Since it appears you are having a performance issue, here are a few things
> I have written on the topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems
>
> Thanks,
> Shawn
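To see why the rows value dominates the cost, the shard fan-out Erick describes above can be modeled in a few lines of Python. This is a toy simulation, not Solr code -- the shard lists and scores are made up -- but it shows the key point: every shard must ship up to `rows` candidates to the coordinating node before a single document is fetched:

```python
import heapq

def sharded_top_n(shards, n):
    """Toy model of a distributed top-N query.

    Each shard holds its own list of (score, doc_id) pairs, sorted by
    descending score. Returns the true top N plus the number of candidate
    entries transferred to the coordinating node.
    """
    # Phase 1: every shard ships its candidate top N (ids + sort values
    # only), since any of its docs could make the final cut
    candidates = [shard[:n] for shard in shards]
    transferred = sum(len(c) for c in candidates)
    # Phase 2: the initial node merges the sorted lists into the true top N
    merged = heapq.merge(*candidates, key=lambda pair: -pair[0])
    top = list(merged)[:n]
    # Phase 3 (not modeled): fetch stored fields for `top` from each shard
    # and assemble the final response packet
    return top, transferred
```

The candidate traffic in phase 1 is O(num_shards * rows): with rows=1M and, say, 10 shards, roughly 10M candidate entries converge on one node before any stored fields are even read from disk, which is why huge rows values hurt far more in SolrCloud than on a single core.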