Agreed, making the site to site feature as easy to configure as a regular processor would eliminate a lot of the confusion. Depending on how long it would take to get this in as a first-class citizen, it might be worthwhile to write up a "How to enable site to site" page that the remote process group can link to. As Mark described, this was easy, and things started to work as soon as I restarted the services.
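
For reference, the change Mark described comes down to two lines in nifi.properties. A minimal sketch, assuming port 10000 (an arbitrary, illustrative choice, not a default) and the non-secure option rather than a keystore/truststore setup:

```properties
# Site-to-site settings in nifi.properties.
# The port is empty by default; 10000 is only an example value.
nifi.remote.input.port=10000
# Set to false for non-secure site-to-site; the alternative is to leave it
# secure and configure the keystore and truststore properties.
nifi.remote.input.secure=false
```

Per Mark's note, this goes on every node and on the NCM, and takes effect after a restart.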
Ricky

On Sun, Feb 8, 2015 at 1:24 PM, Joe Witt <[email protected]> wrote:

> I think the change you made more recently was totally appropriate.
> The right answer here in my opinion is to provide a way for users to
> manage/view/understand this at runtime.
>
> Site to site is a pretty great feature and we just need to give it a
> more first-class treatment:
> - Great documentation for users in the app
> - Great documentation for the protocol itself and examples of clients
>   (we should likely even help seed the development of a few for
>   popular languages)
> - Good user experience at runtime to modify and understand what is
>   happening, etc.
>
> Also sorry for my unbelievably unreadable e-mail on this thread
> earlier. I really should never send e-mails from my phone.
>
> Thanks
> Joe
>
> On Sun, Feb 8, 2015 at 1:17 PM, Mark Payne <[email protected]> wrote:
> > Originally, I set the default in the properties file so that
> > site-to-site is configured to be secure but not enabled. I did this
> > because I didn’t want to enable it as non-secure by default because
> > I was afraid that this would be dangerous… so I required that users
> > explicitly go in and set it up. But thinking back to this, maybe
> > that was a mistake. We set the default UI port to be 8080 and
> > non-secure, so maybe we should just set the default so that
> > site-to-site is enabled non-secure, as well. That would probably
> > just make this a lot easier.
> >
> > From: Ricky Saltzer
> > Sent: Sunday, February 8, 2015 1:16 PM
> > To: [email protected]
> >
> > Thanks for the tip, Mark! Allowing the user to enable the site to
> > site feature during runtime would be a good step in the right
> > direction. Documentation on how it works and why it's different
> > from having your nodes in a cluster would also make things easier
> > to understand.
> >
> > On Sun, Feb 8, 2015 at 9:11 AM, Joe Witt <[email protected]> wrote:
> >
> >> Site to site is a powerhouse feature but has caused a good bit of
> >> confusion. Perhaps we should plan for its inclusion in the things
> >> that can be tuned/set at runtime.
> >>
> >> It would be good to include with that information about bounded
> >> interfaces. Information about what messages will get sent, etc.
> >> Folks in proxy type situations have a hard time reasoning over
> >> what is happening. There is little sense of "is this thing on".
> >>
> >> What do you all think?
> >> On Feb 8, 2015 6:51 AM, "Mark Payne" <[email protected]> wrote:
> >>
> >> > Ricky,
> >> >
> >> > In the nifi.properties file, there's a property named
> >> > "nifi.remote.input.port". By default it's empty. Set that to
> >> > whatever port you want to use for site-to-site. Additionally,
> >> > you'll need to either set "nifi.remote.input.secure" to false or
> >> > configure keystore and truststore properties. Configure this for
> >> > nodes and NCM. After that you should be good to go (after
> >> > restart)!
> >> >
> >> > If you run into any issues let us know.
> >> >
> >> > Thanks
> >> > -Mark
> >> >
> >> > Sent from my iPhone
> >> >
> >> > > On Feb 8, 2015, at 5:54 AM, Ricky Saltzer <[email protected]> wrote:
> >> > >
> >> > > Hey Joe -
> >> > >
> >> > > This makes sense, and I'm in the process of trying it out now.
> >> > > I'm running into a small issue where the remote process group
> >> > > is saying neither of the nodes are configured for Site-to-Site
> >> > > communication.
> >> > >
> >> > > Although not super intuitive, sending to the remote process
> >> > > group pointing to the cluster should be fine as long as it
> >> > > works (which I'm sure it does).
> >> > >
> >> > > Ricky
> >> > >
> >> > >> On Fri, Feb 6, 2015 at 3:24 PM, Joe Witt <[email protected]> wrote:
> >> > >>
> >> > >> Ricky,
> >> > >>
> >> > >> So the use case you're coming from here is a good and common
> >> > >> one which is:
> >> > >>
> >> > >> If I have a datasource which does not offer scalability (it
> >> > >> can only send to a single node for instance) but I have a
> >> > >> scalable distribution cluster what are my options?
> >> > >>
> >> > >> So today you can accept the data on a single node then
> >> > >> immediately do as Mark describes and fire it to a "Remote
> >> > >> Process Group" addressing the cluster itself. That way NiFi
> >> > >> will automatically figure out all the nodes in the cluster
> >> > >> and spread the data around factoring in load/etc. But we do
> >> > >> want to establish an even more automatic mechanism on a
> >> > >> connection itself where the user can indicate the data should
> >> > >> be auto-balanced.
> >> > >>
> >> > >> The reverse is really true as well where you can have a
> >> > >> consumer which only wants to accept from a single host. So
> >> > >> there too we need a mechanism to descale the approach.
> >> > >>
> >> > >> I realize the flow you're working with now is just a sort of
> >> > >> familiarization thing. But do you think this is something we
> >> > >> should tackle soon (based on real scenarios you face)?
> >> > >>
> >> > >> Thanks
> >> > >> Joe
> >> > >>
> >> > >>> On Fri, Feb 6, 2015 at 3:07 PM, Mark Payne <[email protected]> wrote:
> >> > >>> Ricky,
> >> > >>>
> >> > >>> I don’t think there’s a JIRA ticket currently. Feel free to
> >> > >>> create one.
> >> > >>>
> >> > >>> I think we may need to do a better job documenting how the
> >> > >>> Remote Process Groups work. If you have a cluster setup, you
> >> > >>> would add a Remote Process Group that points to the Cluster
> >> > >>> Manager. (I.e., the URL that you connect to in order to see
> >> > >>> the graph).
> >> > >>>
> >> > >>> Then, anything that you send to the Remote Process Group
> >> > >>> will automatically get load-balanced across all of the nodes
> >> > >>> in the cluster. So you could set up a flow that looks
> >> > >>> something like:
> >> > >>>
> >> > >>> GenerateFlowFile -> RemoteProcessGroup
> >> > >>>
> >> > >>> Input Port -> HashContent
> >> > >>>
> >> > >>> So these 2 flows are disjoint. The first part generates data
> >> > >>> and then distributes it to the cluster (when you connect to
> >> > >>> the Remote Process Group, you choose which Input Port to
> >> > >>> send to).
> >> > >>>
> >> > >>> But what we’d like to do in the future is something like:
> >> > >>>
> >> > >>> GenerateFlowFile -> HashContent
> >> > >>>
> >> > >>> And then on the connection in the middle choose to
> >> > >>> auto-distribute the data. Right now you have to put the
> >> > >>> Remote Process Group in there to distribute to the cluster,
> >> > >>> and add the Input Port to receive the data. But there should
> >> > >>> only be a single RemoteProcessGroup that points to the
> >> > >>> entire cluster, not one per node.
> >> > >>>
> >> > >>> Thanks
> >> > >>>
> >> > >>> -Mark
> >> > >>>
> >> > >>> From: Ricky Saltzer
> >> > >>> Sent: Friday, February 6, 2015 3:06 PM
> >> > >>> To: [email protected]
> >> > >>>
> >> > >>> Mark -
> >> > >>>
> >> > >>> Thanks for the fast reply, much appreciated. This is what I
> >> > >>> figured, but since I was already in clustered mode, I wanted
> >> > >>> to make sure there wasn't an easier way than adding each
> >> > >>> node as a remote process group.
> >> > >>>
> >> > >>> Is there already a JIRA to track the ability to auto
> >> > >>> distribute in clustered mode, or would you like me to open
> >> > >>> it up?
> >> > >>>
> >> > >>> Thanks again,
> >> > >>> Ricky
> >> > >>>
> >> > >>>> On Fri, Feb 6, 2015 at 2:58 PM, Mark Payne <[email protected]> wrote:
> >> > >>>>
> >> > >>>> Ricky,
> >> > >>>>
> >> > >>>> The DistributeLoad processor is simply used to route to one
> >> > >>>> of many relationships. So if you have, for instance, 5
> >> > >>>> different servers that you can FTP files to, you can use
> >> > >>>> DistributeLoad to round robin the files between them, so
> >> > >>>> that you end up pushing 20% to each of 5 PutFTP processors.
> >> > >>>>
> >> > >>>> What you’re wanting to do, it sounds like, is to distribute
> >> > >>>> the FlowFiles to different nodes in the cluster. The Remote
> >> > >>>> Process Group is how you would need to do that at this
> >> > >>>> time. We have discussed having the ability to mark a
> >> > >>>> Connection as “Auto-Distributed” (or maybe some better name
> >> > >>>> 😊) and have that automatically distribute the data between
> >> > >>>> nodes in the cluster, but that feature hasn’t yet been
> >> > >>>> implemented.
> >> > >>>>
> >> > >>>> Does that answer your question?
> >> > >>>>
> >> > >>>> Thanks
> >> > >>>>
> >> > >>>> -Mark
> >> > >>>>
> >> > >>>> From: Ricky Saltzer
> >> > >>>> Sent: Friday, February 6, 2015 2:56 PM
> >> > >>>> To: [email protected]
> >> > >>>>
> >> > >>>> Hi -
> >> > >>>>
> >> > >>>> I have a question regarding load distribution in a
> >> > >>>> clustered NiFi environment. I have a really simple example:
> >> > >>>> I'm using the GenerateFlowFile processor to generate some
> >> > >>>> random data, then I MD5 hash the file and print out the
> >> > >>>> resulting hash.
> >> > >>>>
> >> > >>>> I want only the primary node to generate the data, but I
> >> > >>>> want both nodes in the cluster to share the hashing
> >> > >>>> workload. It appears if I set the scheduling strategy to
> >> > >>>> "On primary node" for the GenerateFlowFile processor, then
> >> > >>>> the next processor (HashContent) is only being accepted and
> >> > >>>> processed by a single node.
> >> > >>>>
> >> > >>>> I've put a DistributeLoad processor in-between the
> >> > >>>> HashContent and GenerateFlowFile, but this requires me to
> >> > >>>> use the remote process group to distribute the load, which
> >> > >>>> doesn't seem intuitive when I'm already clustered.
> >> > >>>>
> >> > >>>> I guess my question is, is it possible for the
> >> > >>>> DistributeLoad processor to understand that NiFi is in a
> >> > >>>> clustered environment, and have an ability to distribute
> >> > >>>> the next processor (HashContent) amongst all nodes in the
> >> > >>>> cluster?
> >> > >>>>
> >> > >>>> Cheers,
> >> > >>>> --
> >> > >>>> Ricky Saltzer
> >> > >>>> http://www.cloudera.com
> >> > >>>
> >> > >>> --
> >> > >>> Ricky Saltzer
> >> > >>> http://www.cloudera.com
> >> > >
> >> > > --
> >> > > Ricky Saltzer
> >> > > http://www.cloudera.com
> >> >
> >
> > --
> > Ricky Saltzer
> > http://www.cloudera.com
>

--
Ricky Saltzer
http://www.cloudera.com
