Ricky,

The DistributeLoad processor is simply used to route to one of many 
relationships. So if you have, for instance, 5 different servers that you can 
FTP files to, you can use DistributeLoad to round robin the files between them, 
so that you end up pushing 20% to each of 5 PutFTP processors.
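To make the round-robin idea concrete, here's a minimal sketch of that behavior in Python (the server names and helper function are made up for illustration, not NiFi's API):

```python
from itertools import cycle

def distribute(files, targets):
    """Assign each file to a target in round-robin order,
    so each target ends up with roughly 1/N of the files."""
    assignments = {target: [] for target in targets}
    rotation = cycle(targets)
    for f in files:
        assignments[next(rotation)].append(f)
    return assignments

# 5 hypothetical FTP targets, like 5 PutFTP processors
servers = [f"ftp-server-{i}" for i in range(1, 6)]
files = [f"file-{n}.csv" for n in range(100)]

result = distribute(files, servers)
# With 100 files and 5 targets, each server receives 20 files (20%).
```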


What you’re wanting to do, it sounds like, is to distribute the FlowFiles to 
different nodes in the cluster. The Remote Process Group is how you would need 
to do that at this time. We have discussed having the ability to mark a 
Connection as “Auto-Distributed” (or maybe some better name 😊) and have that 
automatically distribute the data between nodes in the cluster, but that 
feature hasn’t yet been implemented. 


Does that answer your question?


Thanks

-Mark


From: Ricky Saltzer
Sent: Friday, February 6, 2015 2:56 PM
To: [email protected]

Hi -

I have a question regarding load distribution in a clustered NiFi
environment. I have a really simple example, I'm using the GenerateFlowFile
processor to generate some random data, then I MD5 hash the file and print
out the resulting hash.
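(For context, the flow is conceptually equivalent to this Python sketch — this just mirrors what GenerateFlowFile + HashContent do, it is not NiFi's API:)

```python
import hashlib
import os

content = os.urandom(1024)                  # random payload, like GenerateFlowFile
digest = hashlib.md5(content).hexdigest()   # like HashContent with MD5
print(digest)                               # a 32-character hex hash
```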

I want only the primary node to generate the data, but I want both nodes in
the cluster to share the hashing workload. It appears that if I set the
scheduling strategy to "On primary node" for the GenerateFlowFile
processor, then the FlowFiles are only accepted and processed by a single
node at the next processor (HashContent).

I've put a DistributeLoad processor between GenerateFlowFile and
HashContent, but this requires me to use a remote process group to
distribute the load, which doesn't seem intuitive when I'm already
clustered.

I guess my question is, is it possible for the DistributeLoad processor to
understand that NiFi is in a clustered environment, and have an ability to
distribute the next processor (HashContent) amongst all nodes in the
cluster?

Cheers,
-- 
Ricky Saltzer
http://www.cloudera.com
