Hi -

I have a question regarding load distribution in a clustered NiFi
environment. I have a really simple example: I'm using the GenerateFlowFile
processor to generate some random data, then I MD5-hash the file and print
out the resulting hash.

I want only the primary node to generate the data, but I want both nodes in
the cluster to share the hashing workload. It appears that if I set the
scheduling strategy to "On primary node" for the GenerateFlowFile
processor, then the flow files reaching the next processor (HashContent)
are only accepted and processed by a single node.

I've put a DistributeLoad processor between GenerateFlowFile and
HashContent, but this requires me to use a remote process group to
distribute the load, which doesn't seem intuitive when I'm already
clustered.

I guess my question is: is it possible for the DistributeLoad processor to
understand that NiFi is in a clustered environment, and to distribute the
work for the next processor (HashContent) among all nodes in the cluster?
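
To make the intent concrete, here is a rough Python sketch of the behaviour
I'm after. It is not NiFi code: the function names and node names are made up
for illustration, and round-robin is only one assumed way the load could be
spread.

```python
import hashlib
import os
from itertools import cycle


def generate_flowfiles(count, size=32):
    """Stand-in for GenerateFlowFile running on the primary node only."""
    return [os.urandom(size) for _ in range(count)]


def distribute_round_robin(flowfiles, nodes):
    """Assign each flow file to the next node in turn (assumed strategy)."""
    assignments = {node: [] for node in nodes}
    for node, data in zip(cycle(nodes), flowfiles):
        assignments[node].append(data)
    return assignments


def hash_content(data):
    """Stand-in for HashContent configured for MD5."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    nodes = ["node-1", "node-2"]  # hypothetical cluster node names
    work = distribute_round_robin(generate_flowfiles(6), nodes)
    for node, items in work.items():
        for data in items:
            print(node, hash_content(data))
```

In other words, one node produces the data and every node in the cluster
gets a share of the hashing.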

Cheers,
-- 
Ricky Saltzer
http://www.cloudera.com
