Hi Everyone,

We are using Solr 7.4 in a SolrCloud setup with 3 external ZooKeeper nodes and 
7 Solr nodes. We are using a streaming expression to pull 12 million records 
from a different SolrCloud cluster, using the expression below:

    http://solrTarget:8983/solr/collection1/stream?expr=
    commit(collection1,
      batchSize=10000,
      update(collection1,
        batchSize=10000,
        search(collection1,
          zkHost="zkhost_source:9983",
          sort="timestamp_tdt asc, id asc",
          rows=12114606,
          q="aggr_type_s:click@doc_id,filters* AND timestamp_tdt:[2020-03-25T18:58:33.337Z TO 2020-03-26T18:58:33.336Z]",
          fl="id,timestamp_tdt,*",
          TZ="CST")))
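For reference, the same expression can be assembled programmatically and sent 
form-encoded in a POST body to the /stream handler instead of being pasted into 
the URL, which avoids hand-encoding the quotes, spaces and brackets. This is 
only a minimal sketch using Python's standard library; the helper names 
build_expression and build_request are hypothetical, the host, ZooKeeper and 
collection values are copied from the expression above, and the request is 
built but not sent. We have not verified that this changes the failures 
described below.

```python
import urllib.parse
import urllib.request

# Values copied from the expression above (hypothetical constants; adjust as needed).
TARGET_STREAM_URL = "http://solrTarget:8983/solr/collection1/stream"
SOURCE_ZK = "zkhost_source:9983"


def build_expression(rows: int) -> str:
    """Assemble the commit(update(search(...))) expression as one string."""
    query = ("aggr_type_s:click@doc_id,filters* AND "
             "timestamp_tdt:[2020-03-25T18:58:33.337Z TO 2020-03-26T18:58:33.336Z]")
    # Inner source: read sorted documents from the source cluster.
    search = (
        f'search(collection1,zkHost="{SOURCE_ZK}",'
        f'sort="timestamp_tdt asc, id asc",rows={rows},'
        f'q="{query}",fl="id,timestamp_tdt,*",TZ="CST")'
    )
    # Decorators: index into the target collection, then commit.
    update = f"update(collection1,batchSize=10000,{search})"
    return f"commit(collection1,batchSize=10000,{update})"


def build_request(expr: str) -> urllib.request.Request:
    """Form-encode expr into a POST body; the request is not sent here."""
    body = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    return urllib.request.Request(
        TARGET_STREAM_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request(build_expression(12114606))
    print(req.get_method(), req.full_url)
    # Passing req to urllib.request.urlopen(req, timeout=...) would send it.
```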

collection1 on solrTarget has 2 shards and 2 replicas; collection1 on the 
source cluster has 1 shard and 2 replicas.

The streaming expression reads documents from collection1 on zkhost_source:9983 
and indexes them into collection1 in the solrTarget environment.
Similar streaming expressions with fewer documents (under 5 million) work 
without any failures. As the volume grows, however, this expression no longer 
succeeds: it fails with several different kinds of errors.

A few of the error messages are below:


  1.  Error trying to proxy request for url: http://solrTarget:8983/solr/collection1/stream, metadata=[error-class, org.apache.solr.common.SolrException, root-error-class, java.net.SocketTimeoutException], trace=org.apache.solr.common.SolrException: Error trying to proxy request for url: http://solrTarget:8983/solr/collection1/stream
  2.  {result-set={docs=[{EXCEPTION=java.util.concurrent.ExecutionException: java.io.IOException: params sort=timestamp_tdt+asc,+id+asc&rows=12114606&q=aggr_type_s:click@doc_id,filters*+AND+timestamp_tdt:[2020-03-25T18:58:33.337Z+TO+2020-03-26T18:58:33.336Z]&fl=id,timestamp_tdt,*&TZ=CST&distrib=false, RESPONSE_TIME=121125, EOF=true}]}}
  3.  {result-set={docs=[{EXCEPTION=org.apache.solr.common.SolrException: Could not load collection from ZK: collection10, RESPONSE_TIME=139300, EOF=true}]}}


Is this a known issue with streaming expressions when bulk indexing with the 
update and commit expressions? Is there a workaround?

Is there a better option in Solr for indexing 12 million records (with only 
12 fields per document) faster?

Thanks,
Mohamed
