Kartik,
Thanks for your interest in NiFi!
I know you've gotten a few responses to this already, but you're right:
this is something we should address. I think the basic idea is that many
people just pick files up from a temp directory and push them on to a
permanent directory.
But if that doesn't work for you, we could update the processor to do
something a bit smarter. One idea that might make sense is to pick up
the oldest files first. Then the processor can keep track of the "last
modified date" of the last file it has picked up. This way, we keep
minimal state about what has been pulled in but still pull in only new
data and avoid deleting the source files.
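
To make the idea concrete, here is a rough sketch in Python. It is only
an illustration, not NiFi code: the function name pick_up_new_files and
its parameters are made up for this example, and it assumes that file
modification times are reliable and that nothing else is tracked.

```python
import os
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def pick_up_new_files(
    directory: Path, last_seen_mtime: Optional[float]
) -> Tuple[List[Path], Optional[float]]:
    """Return files modified after last_seen_mtime, oldest first.

    Hypothetical helper: the only state kept between calls is one
    timestamp, the modification time of the newest file picked up so far.
    Source files are left in place; nothing is deleted.
    """
    candidates = []
    for entry in directory.iterdir():
        if not entry.is_file():
            continue
        mtime = entry.stat().st_mtime
        # Keep only files strictly newer than the stored timestamp.
        if last_seen_mtime is None or mtime > last_seen_mtime:
            candidates.append((mtime, entry))

    # Oldest first; the file name breaks ties so the order is stable.
    candidates.sort(key=lambda pair: (pair[0], pair[1].name))

    # The new state is the mtime of the last (newest) file picked up.
    new_state = candidates[-1][0] if candidates else last_seen_mtime
    return [path for _, path in candidates], new_state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, mtime in [("old.log", 1000), ("new.log", 2000)]:
            (root / name).write_text("data")
            os.utime(root / name, (mtime, mtime))  # set explicit mtimes

        files, state = pick_up_new_files(root, None)
        print([f.name for f in files], state)

        # A second pass with the saved state finds nothing new.
        files, state = pick_up_new_files(root, state)
        print([f.name for f in files], state)
```

One known gap in this sketch: a file that arrives later with exactly
the same modification time as the stored timestamp would be skipped, so
a real implementation would probably need to handle ties in some way.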
Do you think this solution would help you?
Thanks
-Mark
------ Original Message ------
From: "Kartik Veerepalli" <[email protected]>
To: "[email protected]" <[email protected]>
Sent: 4/7/2015 10:46:11 PM
Subject: Re: Conflict Resolution Strategy
Corey,
My apologies for not making myself clear, but the points you listed
are exactly what I meant.
Joe: I did check out rsync, but we are planning to establish a continuous
data flow pipeline from a wide range of servers, message buses, etc. to
HDFS. We think Apache NiFi can be integrated/used as a data flow system
with the Analytics as a Service Platform that we are building. Thanks
for the help.
Kartik