We did something similar to this, but kept a simple flat file recording where we
left off (basically the date or a sequence number) along with a custom flow
processor. We also had the system holding the data send it to a directory that
NiFi monitored with the GetFile processor. That approach does require something
on the sending system to keep track of what it has already sent.
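For illustration, here is a minimal standalone sketch of the "flat file of where we left off" idea, assuming each record carries an increasing integer sequence number. It is plain Python, not NiFi's processor API; the file name `last_position.json` and the `seq` field are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical location of the flat "where we left off" file.
STATE_FILE = Path("last_position.json")


def load_watermark(state_file: Path) -> int:
    """Return the last sequence number processed, or 0 if no state exists yet."""
    if not state_file.exists():
        return 0
    return json.loads(state_file.read_text())["last_seq"]


def save_watermark(state_file: Path, seq: int) -> None:
    """Write the watermark via a temp file + rename so a crash never leaves a half-written file."""
    fd, tmp = tempfile.mkstemp(dir=state_file.parent)
    with os.fdopen(fd, "w") as f:
        json.dump({"last_seq": seq}, f)
    os.replace(tmp, state_file)


def pull_new_records(records, state_file: Path) -> list:
    """Return only records newer than the stored watermark, then advance the watermark."""
    last = load_watermark(state_file)
    new = sorted((r for r in records if r["seq"] > last), key=lambda r: r["seq"])
    if new:
        save_watermark(state_file, new[-1]["seq"])
    return new
```

Only a single number is stored, so the state stays tiny; the trade-off is that it relies on the source assigning strictly increasing dates or sequence numbers.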

Ralph Spangler
Chief Engineer
L-3 NSS Data Tactics
7901 Jones Branch Drive, Suite 700
McLean, VA  22102
Office: (571) 257-0491
Cell: (321) 212-9552
Fax: (703) 506-6703
[email protected]

-----Original Message-----
From: Joe Witt [mailto:[email protected]] 
Sent: Wednesday, April 08, 2015 12:35 AM
To: [email protected]
Subject: Re: Conflict Resolution Strategy

Kartik

OK, yes, what you describe in your reply is definitely in the NiFi wheelhouse.

For your original case, where you want to copy the data but retain the original
object, there are a few ways to do it. One is to actually pull the data from its
original location, send a copy to your analytic system, and also give a copy
back to the original system.

If you truly must keep the original where it was, then there are really only
'ok' options. NiFi then needs to act as an idempotent receiver, which means it
keeps state about what it has already grabbed a copy of and avoids sending
anything through more than once. That sounds like no big deal, but it means some
database, constantly re-checking the same things, and tension with clustering.
In many ways it isn't conducive to healthy dataflow. It can be done, but it
isn't fun.
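To make the cost concrete, here is a minimal sketch of an idempotent receiver, assuming an object's identity is its name plus a SHA-256 hash of its content and using SQLite as the "some database". This is not how NiFi implements anything; the class and method names are hypothetical.

```python
import hashlib
import sqlite3


class IdempotentReceiver:
    """Remembers which objects it has already copied so each one is emitted only once."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS seen (key TEXT PRIMARY KEY)")

    @staticmethod
    def key_for(name: str, content: bytes) -> str:
        # Identity = name + content hash, so a modified file counts as a new object.
        digest = hashlib.sha256(content).hexdigest()
        return f"{name}:{digest}"

    def accept(self, name: str, content: bytes) -> bool:
        """Return True the first time an object is seen, False on every later poll."""
        key = self.key_for(name, content)
        with self.conn:  # commits the insert as one transaction
            cur = self.conn.execute(
                "INSERT OR IGNORE INTO seen (key) VALUES (?)", (key,)
            )
        return cur.rowcount == 1
```

Note that every poll still has to read and hash every object that is left in place, and in a cluster all nodes would need to share this table rather than each keeping its own, which is where the clustering tension comes from.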

So before walking that path: is putting a copy of the data back on the original
system, but not in a directory you are polling, an option?
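As a rough illustration of that alternative, the sketch below consumes each file from the polled directory, hands one copy downstream, and puts another copy back on the source system in a directory that is not polled. Because the inbox is emptied, no "what have I seen" state is needed. It is plain Python standing in for the dataflow; the directory names `inbox`, `returned`, and `outbound` are hypothetical.

```python
import shutil
from pathlib import Path


def pull_and_return(inbox: Path, returned: Path, outbound: Path) -> list:
    """Move each file out of the polled inbox, put a copy back on the source
    system outside the polled directory, and hand the file downstream."""
    returned.mkdir(parents=True, exist_ok=True)
    outbound.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(p for p in inbox.iterdir() if p.is_file()):
        # Copy kept on the original system, in a directory nobody polls.
        shutil.copy2(src, returned / src.name)
        # The file leaves the inbox, so the next poll cannot pick it up again.
        shutil.move(str(src), str(outbound / src.name))
        moved.append(src.name)
    return moved
```

The design choice is to make "already processed" a property of where the file lives rather than something NiFi must remember.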

Please feel free to subscribe to the mailing list so your notes will get 
through without delay.

Thanks
Joe
On Apr 7, 2015 11:36 PM, "Kartik Veerepalli" <[email protected]>
wrote:

> Corey,
>
>
> My apologies for not making myself clear, but the points you listed
> are exactly what I meant.
>
>
> Joe: I did check out rsync, but we are planning to establish a
> continuous data flow pipeline from a wide range of servers, message buses,
> etc. to HDFS.
> We think Apache NiFi can be integrated with, and used as the data flow system
> for, the Analytics as a Service platform we are building. Thanks for the help.
>
>
> Kartik
>
