I have personally not used streaming expressions to commit data to a collection (I have used them a lot for querying), and would not recommend them for bulk indexing unless Joel recommends it :)
On the other hand, we have had decent success indexing at scale, and 12 million is not a big number. You would need a decently sized cluster with a commensurate number of shards. Indexing speed correlates positively with the number of shards and inversely with the number of replicas and maxShardsPerNode. You can use the traditional SolrJ APIs to index and commit in parallel from multiple threads.

> On Mar 26, 2020, at 2:59 PM, Mohamed Sirajudeen Mayitti Ahamed Pillai
> <mohamedsirajudeen.mayittiahamedpil...@footlocker.com> wrote:
>
> Hi Everyone,
>
> We are using Solr 7.4 with 3 external ZKs and 7 Solr node in a cloud setup.
> We are using Streaming expression to pull 12million records from a different
> Solr Cloud using below expression.
>
> http://solrTarget:8983/solr/collection1/stream?expr=commit(collection1,batchSize=10000,update(collection1,batchSize=10000,search(collection1,zkHost="zkhost_source:9983",sort="timestamp_tdt
> asc, id asc", rows=12114606, q=" aggr_type_s:click@doc_id,filters* AND
> timestamp_tdt:[2020-03-25T18:58:33.337Z TO 2020-03-26T18:58:33.336Z]",
> fl="id,timestamp_tdt,*",TZ="CST"))).
>
> Collection 1 in SolrTarget has 2 shards and 2 replicas. Collection 1 in
> solrSource has 1 shard and 2 replicas
>
> The streaming expression reads documents from collection1 in
> zkhost_source:9983 and indexes into collection1 in solrTarget environment.
> Similar streaming expression with less number of documents (less than
> 5million) working without any failures.
> This streaming expression is not been successful as it grow bigger and
> bigger, as we have been noticing that streaming expression is getting failed
> response with different kind of errors.
>
> Few error messages are below,
>
> 1. Error trying to proxy request for url:
>    http://solrTarget:8983/solr/collection1/stream, metadata=[error-class,
>    org.apache.solr.common.SolrException, root-error-class,
>    java.net.SocketTimeoutException], trace=org.apache.solr.common.SolrException:
>    Error trying to proxy request for url:
>    http://solrTarget:8983/solr/collection1/stream
> 2. {result-set={docs=[{EXCEPTION=java.util.concurrent.ExecutionException:
>    java.io.IOException: params
>    sort=timestamp_tdt+asc,+id+asc&rows=12114606&q=aggr_type_s:click@doc_id,filters*+AND+timestamp_tdt:[2020-03-25T18:58:33.337Z+TO+2020-03-26T18:58:33.336Z]&fl=id,timestamp_tdt,*&TZ=CST&distrib=false,
>    RESPONSE_TIME=121125, EOF=true}]}}
> 3. {result-set={docs=[{EXCEPTION=org.apache.solr.common.SolrException:
>    Could not load collection from ZK: collection10, RESPONSE_TIME=139300,
>    EOF=true}]}}
>
> Is it a known issue with Streaming expression when it comes to bulk indexing
> using update and commit expression? Is there any work-around to this issue?
>
> Is there a better option available in Solr to index 12million records (with
> only 12 fields per document) at a faster speed?
>
> Thanks,
> Mohamed
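
P.S. To make the multi-threaded suggestion in my reply a bit more concrete, here is a rough sketch of the batching and threading pattern only. It is not taken from our indexing code. The class ParallelBatchIndexer and the BatchSink callback are hypothetical names invented for this example, and it uses only the JDK so it can run without a Solr cluster. With SolrJ, the sink would presumably wrap a call such as SolrClient.add(collection, batch), with a commit issued once at the end rather than per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: splits documents into batches and sends them from several threads. */
final class ParallelBatchIndexer {

    /** Hypothetical callback; with SolrJ it would wrap something like client.add(collection, batch). */
    interface BatchSink<T> {
        void send(List<T> batch) throws Exception;
    }

    /** Splits docs into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    /** Sends every batch through the sink on a fixed thread pool; returns the number of documents sent. */
    static <T> int indexAll(List<T> docs, int batchSize, int threads, BatchSink<T> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<T> batch : partition(docs, batchSize)) {
                // Each batch becomes one task; the pool runs up to `threads` of them at once.
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    return batch.size();
                }));
            }
            int sent = 0;
            for (Future<Integer> f : pending) {
                try {
                    sent += f.get(); // blocks until this batch has been sent
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("batch failed", e.getCause());
                }
            }
            return sent;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call would look like ParallelBatchIndexer.indexAll(docs, 10000, 8, batch -> { /* send batch to Solr */ }). The sketch takes the whole document list in memory for simplicity. For 12 million records you would more likely read from the source in pages and feed batches to the pool as they arrive. The batch size and thread count are assumptions to tune against your cluster.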