Thanks! What about using Kafka together with Flume? I should also mention that the daily data intake is in the millions, and we can't afford to lose even a single piece of data, so high availability is a requirement.

Warm Regards

Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn: www.linkedin.com/in/sidharthkumar2792

On 30-Jun-2017 10:04 AM, "JP gupta" <[email protected]> wrote:

> The ideal sequence should be:
>
> 1. Ingress using Kafka -> Validation and processing using Spark ->
> Write into any NoSQL DB or Hive.
>
> From my recent experience, writing directly to HDFS can be slow depending
> on the data format.
>
> Thanks
>
> JP
>
> *From:* Sudeep Singh Thakur [mailto:[email protected]]
> *Sent:* 30 June 2017 09:26
> *To:* Sidharth Kumar
> *Cc:* Maggy; [email protected]
> *Subject:* Re: Kafka or Flume
>
> In your use case, Kafka would be better because you want some
> transformations and validations.
>
> Kind regards,
> Sudeep Singh Thakur
>
> On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <[email protected]>
> wrote:
>
> Hi,
>
> I have a requirement to ingest all transactional data into Hadoop in
> real time and, before storing the data in Hadoop, process it to validate
> it. If the data fails the validation process, it will not be stored in
> Hadoop. The validation process also makes use of historical data that is
> stored in Hadoop. So my question is: which ingestion tool would be best
> for this, Kafka or Flume?
>
> Any suggestions would be a great help.
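
For what it's worth, the validate-before-store step discussed in this thread (keep a record only if it passes checks that consult historical data) could be sketched roughly as below. This is plain Python with no Kafka, Flume, or Spark calls. The `Transaction` fields, the `known_accounts` and `seen_txn_ids` lookups, and the three rules are all assumptions made for illustration; the thread does not describe the real schema or validation logic.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record shape, not taken from the thread.
    txn_id: str
    account_id: str
    amount: float


def is_valid(txn: Transaction, known_accounts: Set[str], seen_txn_ids: Set[str]) -> bool:
    """Apply illustrative checks, two of which depend on historical data."""
    if txn.amount <= 0:
        return False  # assumed rule: amounts must be positive
    if txn.account_id not in known_accounts:
        return False  # assumed rule: account must already exist in history
    if txn.txn_id in seen_txn_ids:
        return False  # assumed rule: reject duplicates of stored transactions
    return True


def split_batch(
    batch: Iterable[Transaction],
    known_accounts: Set[str],
    seen_txn_ids: Set[str],
) -> Tuple[List[Transaction], List[Transaction]]:
    """Split a batch into records to store and records to reject."""
    accepted: List[Transaction] = []
    rejected: List[Transaction] = []
    seen = set(seen_txn_ids)  # copy so the caller's historical set is not mutated
    for txn in batch:
        if is_valid(txn, known_accounts, seen):
            accepted.append(txn)
            seen.add(txn.txn_id)  # also catches duplicates within the same batch
        else:
            rejected.append(txn)
    return accepted, rejected
```

In a real pipeline the batch would come from the ingestion layer and the historical lookups from data already stored in Hadoop. Returning the rejected records rather than dropping them silently is a design choice here, so that nothing disappears without a trace.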
