Mark - Thanks for the fast reply, much appreciated. This is what I figured, but since I was already in clustered mode, I wanted to make sure there wasn't an easier way than adding each node as a remote process group.
Is there already a JIRA to track the ability to auto distribute in clustered mode, or would you like me to open it up?

Thanks again,
Ricky

On Fri, Feb 6, 2015 at 2:58 PM, Mark Payne <[email protected]> wrote:

> Ricky,
>
> The DistributeLoad processor is simply used to route to one of many
> relationships. So if you have, for instance, 5 different servers that you
> can FTP files to, you can use DistributeLoad to round robin the files
> between them, so that you end up pushing 20% to each of 5 PutFTP processors.
>
> What you’re wanting to do, it sounds like, is to distribute the FlowFiles
> to different nodes in the cluster. The Remote Process Group is how you
> would need to do that at this time. We have discussed having the ability to
> mark a Connection as “Auto-Distributed” (or maybe some better name 😊) and
> have that automatically distribute the data between nodes in the cluster,
> but that feature hasn’t yet been implemented.
>
> Does that answer your question?
>
> Thanks
>
> -Mark
>
> From: Ricky Saltzer
> Sent: Friday, February 6, 2015 2:56 PM
> To: [email protected]
>
> Hi -
>
> I have a question regarding load distribution in a clustered NiFi
> environment. I have a really simple example, I'm using the GenerateFlowFile
> processor to generate some random data, then I MD5 hash the file and print
> out the resulting hash.
>
> I want only the primary node to generate the data, but I want both nodes in
> the cluster to share the hashing workload. It appears if I set the
> scheduling strategy to "On primary node" for the GenerateFlowFile
> processor, then the next processor (HashContent) is only being accepted and
> processed by a single node.
>
> I've put DistributeLoad processor in-between the HashContent and
> GenerateFlowFile, but this requires me to use the remote process group to
> distribute the load, which doesn't seem intuitive when I'm already
> clustered.
>
> I guess my question is, is it possible for the DistributeLoad processor to
> understand that NiFi is in a clustered environment, and have an ability to
> distribute the next processor (HashContent) amongst all nodes in the
> cluster?
>
> Cheers,
> --
> Ricky Saltzer
> http://www.cloudera.com
>

--
Ricky Saltzer
http://www.cloudera.com
