Kartik,

OK, yes, so the use case in your reply is definitely in the NiFi wheelhouse.
For your original case, where you want to copy the data but retain the original object, there are a few ways to do it. One is to pull the data from its original location, send a copy to your analytic system, and also give a copy back to the original system.

If you truly must keep the original where it was, then there are really only 'ok' options. NiFi would then need to act as an idempotent receiver. That means it keeps state about what it has already grabbed a copy of and avoids sending the same object through more than once. That sounds like no big deal, but it means maintaining a database, constantly re-checking the same things, and added tension when clustering. In many ways it isn't conducive to healthy dataflow. It can be done, but it isn't fun.

So before walking that path: is putting a copy of the data back in the original system, but not in a directory you are polling, an option?

Please feel free to subscribe to the mailing list so your notes will get through without delay.

Thanks
Joe

On Apr 7, 2015 11:36 PM, "Kartik Veerepalli" wrote:
> Corey,
>
> My apologies for not making myself clear. But the points you listed are
> exactly what I meant.
>
> Joe: I did check out rsync, but we are planning to establish a continuous
> data flow pipeline from a wide range of servers, message buses, etc. to HDFS.
> We think Apache NiFi can be integrated/used as a data flow system with the
> Analytics as a Service platform that we are building. Thanks for the help.
>
> Kartik
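For illustration, the idempotent-receiver option Joe describes can be sketched roughly as follows. This is a hypothetical Python example, not how NiFi implements it. The class name `IdempotentReceiver`, the SQLite state table, and the choice to key each file by path, size, and modification time are all assumptions. What it is meant to show is the cost Joe points out: every poll has to consult a state store, because the originals are never moved out of the polled directory.

```python
import os
import shutil
import sqlite3
from typing import List


class IdempotentReceiver:
    """Hypothetical sketch: copies files out of a polled directory once,
    leaving the originals in place, by remembering what was already sent."""

    def __init__(self, state_db: str = ":memory:") -> None:
        # Assumption: a local SQLite table is the state store. A clustered
        # deployment would need shared, durable state, which is exactly
        # where the clustering tension comes from.
        self.db = sqlite3.connect(state_db)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " path TEXT, size INTEGER, mtime_ns INTEGER,"
            " PRIMARY KEY (path, size, mtime_ns))"
        )

    def _already_sent(self, key: tuple) -> bool:
        row = self.db.execute(
            "SELECT 1 FROM seen WHERE path = ? AND size = ? AND mtime_ns = ?", key
        ).fetchone()
        return row is not None

    def _record(self, key: tuple) -> None:
        self.db.execute("INSERT OR IGNORE INTO seen VALUES (?, ?, ?)", key)
        self.db.commit()

    def poll(self, source_dir: str, dest_dir: str) -> List[str]:
        """Copy every file not seen before; return the names copied this pass."""
        copied = []
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            if not os.path.isfile(path):
                continue
            st = os.stat(path)
            # Assumption: a change in size or mtime means a new version.
            key = (path, st.st_size, st.st_mtime_ns)
            if self._already_sent(key):
                continue  # the same check runs again on every poll
            shutil.copy2(path, os.path.join(dest_dir, name))
            self._record(key)  # record only after the copy succeeds
            copied.append(name)
        return copied
```

Recording the key only after the copy succeeds gives at-least-once behavior: a crash between the copy and the insert can re-send a file, but it never silently drops one. The table also grows with every object ever seen, which is part of why putting a copy back outside the polled directory is usually the simpler option.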
