Kafka can sustain very high throughput (our deployment handles over 10 million events per second) and scales horizontally by adding Kafka broker servers.
You can try out these steps:

1. Create a topic in Kafka to receive all your data. Use a Kafka producer to ingest data into this topic.
2. If you are going to write your own HDFS client to put data into HDFS, read data from the topic in step 1, validate it, and store it into HDFS.
3. If you want to use an open-source tool (Gobblin or the Confluent Kafka HDFS connector) to put data into HDFS, write a tool that reads data from the topic, validates it, and stores it in another topic for the connector to pick up.

We are using a combination of these steps to process over 10 million events/second. I hope it helps.

Thanks
Mallan

On Jun 30, 2017 10:31 AM, "Sidharth Kumar" <[email protected]> wrote:

> Thanks! What about Kafka with Flume? I would also like to mention that the
> everyday data intake is in the millions, and we can't afford to lose even a
> single piece of data, which creates a need for high availability.
>
> Warm Regards
>
> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn:
> www.linkedin.com/in/sidharthkumar2792
>
> On 30-Jun-2017 10:04 AM, "JP gupta" <[email protected]> wrote:
>
>> The ideal sequence should be:
>>
>> 1. Ingress using Kafka -> validation and processing using Spark ->
>> write into any NoSQL DB or Hive.
>>
>> From my recent experience, writing directly to HDFS can be slow depending
>> on the data format.
>>
>> Thanks
>>
>> JP
>>
>> *From:* Sudeep Singh Thakur [mailto:[email protected]]
>> *Sent:* 30 June 2017 09:26
>> *To:* Sidharth Kumar
>> *Cc:* Maggy; [email protected]
>> *Subject:* Re: Kafka or Flume
>>
>> In your use case, Kafka would be better because you want some
>> transformations and validations.
>>
>> Kind regards,
>> Sudeep Singh Thakur
>>
>> On Jun 30, 2017 8:57 AM, "Sidharth Kumar" <[email protected]>
>> wrote:
>>
>> Hi,
>>
>> I have a requirement where all transactional data is ingested into
>> Hadoop in real time, and before the data is stored into Hadoop it must be
>> processed to validate it. If the data fails the validation process, it will
>> not be stored into Hadoop. The validation process also makes use of
>> historical data which is stored in Hadoop. So, my question is: which
>> ingestion tool will be best for this, Kafka or Flume?
>>
>> Any suggestions will be a great help for me.
>>
>> Warm Regards
>>
>> Sidharth Kumar | Mob: +91 8197 555 599/7892 192 367 | LinkedIn:
>> www.linkedin.com/in/sidharthkumar2792
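The validate-then-route pattern from Mallan's step 3 (read from an ingest topic, validate each record, publish valid records to a second topic that an HDFS connector drains) can be sketched as below. This is a minimal illustration, not a production implementation: the topic names, record schema, and `is_valid()` rule are assumptions, and the in-memory dict stands in for a real Kafka client such as kafka-python or confluent-kafka.

```python
# Sketch of step 3: consume records from an input topic, validate them,
# and route valid ones to a topic an HDFS sink connector would drain,
# invalid ones to a dead-letter topic. All names here are illustrative;
# swap the in-memory `topics` dict for a real Kafka producer/consumer.
import json

topics = {"raw-events": [], "validated-events": [], "invalid-events": []}

def is_valid(record):
    """Hypothetical rule: every event needs an id and a non-negative amount."""
    return "id" in record and record.get("amount", -1) >= 0

def produce(topic, record):
    # Stand-in for KafkaProducer.send(topic, value=...)
    topics[topic].append(json.dumps(record))

def validate_and_route(src, good, bad):
    # Stand-in for a consumer loop over `src`; nothing is dropped silently:
    # every record lands in either the good topic or the dead-letter topic.
    for msg in topics[src]:
        record = json.loads(msg)
        produce(good if is_valid(record) else bad, record)

produce("raw-events", {"id": 1, "amount": 250.0})
produce("raw-events", {"amount": -5})  # fails validation
validate_and_route("raw-events", "validated-events", "invalid-events")
print(len(topics["validated-events"]), len(topics["invalid-events"]))  # 1 1
```

Routing rejects to a dead-letter topic (rather than discarding them) matches the "can't afford to lose a single piece of data" requirement in the thread: invalid records stay replayable after the validation rules are fixed.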
