Hi everyone,

We are running Solr 7.4 in a cloud setup with 3 external ZooKeeper nodes and 7 Solr nodes. We are using a streaming expression to pull 12 million records from a different SolrCloud cluster with the expression below.
http://solrTarget:8983/solr/collection1/stream?expr=commit(collection1,batchSize=10000,update(collection1,batchSize=10000,search(collection1,zkHost="zkhost_source:9983",sort="timestamp_tdt asc, id asc", rows=12114606, q=" aggr_type_s:click@doc_id,filters* AND timestamp_tdt:[2020-03-25T18:58:33.337Z TO 2020-03-26T18:58:33.336Z]", fl="id,timestamp_tdt,*",TZ="CST")))

collection1 in solrTarget has 2 shards and 2 replicas. collection1 in solrSource has 1 shard and 2 replicas. The streaming expression reads documents from collection1 at zkhost_source:9983 and indexes them into collection1 in the solrTarget environment.

A similar streaming expression with fewer documents (less than 5 million) works without any failures. As the number of documents grows, this expression no longer succeeds: it returns a failed response with different kinds of errors. A few of the error messages are below.

1. Error trying to proxy request for url: http://solrTarget:8983/solr/collection1/stream, metadata=[error-class, org.apache.solr.common.SolrException, root-error-class, java.net.SocketTimeoutException], trace=org.apache.solr.common.SolrException: Error trying to proxy request for url: http://solrTarget:8983/solr/collection1/stream

2. {result-set={docs=[{EXCEPTION=java.util.concurrent.ExecutionException: java.io.IOException: params sort=timestamp_tdt+asc,+id+asc&rows=12114606&q=aggr_type_s:click@doc_id,filters*+AND+timestamp_tdt:[2020-03-25T18:58:33.337Z+TO+2020-03-26T18:58:33.336Z]&fl=id,timestamp_tdt,*&TZ=CST&distrib=false, RESPONSE_TIME=121125, EOF=true}]}}

3. {result-set={docs=[{EXCEPTION=org.apache.solr.common.SolrException: Could not load collection from ZK: collection10, RESPONSE_TIME=139300, EOF=true}]}}

Is this a known issue with streaming expressions when bulk indexing with the update and commit expressions? Is there any workaround?
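Since the smaller runs succeed, one possible workaround is to split the time range into smaller windows and run one expression per window. Below is a minimal Python sketch of how the per-window expressions could be generated. The 6-hour window size and the helper names (fmt, windows, build_expr) are assumptions for illustration only, rows is kept at the original total purely as an upper bound, and the sketch only builds the expression strings; it does not send them to Solr.

```python
from datetime import datetime, timedelta


def fmt(ts):
    # Format a datetime as a Solr date with millisecond precision.
    return ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"


def windows(start, end, step):
    # Yield consecutive (lo, hi) windows that together cover [start, end).
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        yield lo, hi
        lo = hi


def build_expr(lo, hi, rows):
    # Half-open range [lo TO hi} so consecutive windows do not overlap.
    q = (
        "aggr_type_s:click@doc_id,filters* AND "
        f"timestamp_tdt:[{fmt(lo)} TO {fmt(hi)}}}"
    )
    return (
        "commit(collection1,batchSize=10000,"
        "update(collection1,batchSize=10000,"
        'search(collection1,zkHost="zkhost_source:9983",'
        'sort="timestamp_tdt asc, id asc",'
        f'rows={rows},q="{q}",'
        'fl="id,timestamp_tdt,*",TZ="CST")))'
    )


# Original range is inclusive up to ...33.336Z; with millisecond precision
# that is the same set of documents as an exclusive end of ...33.337Z.
start = datetime(2020, 3, 25, 18, 58, 33, 337000)
end = datetime(2020, 3, 26, 18, 58, 33, 337000)

# Assumed window size of 6 hours; tune so each window stays well under
# the size that is known to work (less than 5 million documents).
exprs = [
    build_expr(lo, hi, rows=12114606)
    for lo, hi in windows(start, end, timedelta(hours=6))
]

for expr in exprs:
    print(expr)
```

Each generated string would be passed as the expr parameter to the same /stream endpoint on solrTarget, one window at a time.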
Is there a better option available in Solr to index 12 million records (with only 12 fields per document) at a faster speed?

Thanks,
Mohamed