If you absolutely want to use Kafka after trying other mechanisms, I would suggest Kafka Connect. Jeremy Custenborder has a good Kafka Connect sink connector for SOLR. You can define your own Avro schemas on the Kafka topic that adhere to your SOLR schema, giving you that degree of control over what gets indexed (an example schema follows below).
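To illustrate the schema point: the Avro value schema on the topic can mirror the target SOLR schema field-for-field, so only documents that fit your collection ever reach the sink. This is only a sketch; the record and field names are hypothetical, and the "_s"/"_f" suffixes assume SOLR dynamic-field naming conventions.

{
  "type": "record",
  "name": "Product",
  "doc": "Hypothetical topic value schema mirroring the SOLR collection schema",
  "fields": [
    {"name": "id",      "type": "string"},
    {"name": "title_s", "type": ["null", "string"], "default": null},
    {"name": "price_f", "type": ["null", "float"],  "default": null}
  ]
}

If you also run a schema registry, the Avro serializer refuses records that do not conform to the registered schema, so malformed documents are stopped on the producer side rather than at SOLR.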
We have used the Lucidworks Spark Connector to index 500 million documents, with around 70 fields per document, into SOLR within 4 hours. It is a very good choice if you want to sync data from a DB to SOLR. Add an interim step with an ETL tool such as Ab Initio to perform the basic joins on your tables and extract the data as CSV for the Spark Connector (a sketch of the indexing step follows the quoted message below). All the hard work of opening and managing the connections to SOLR is done by the connector. Please note that this connector indexes data into a live SOLR cluster, unlike offline indexing with MapReduce.

Thanks
Nishant

On Thu, Jan 31, 2019, 5:15 AM Srinivas Kashyap <srini...@bamboorose.com> wrote:

> Hello,
>
> As we all know, DIH is single threaded and has its own issues while
> indexing.
>
> I got to know that we can write our own APIs to pull data from the DB
> and push it into Solr. One such option I heard of was Apache Kafka being
> used for this purpose.
>
> Can any of you send me links and guides on using Apache Kafka to pull
> data from a DB and push it into Solr?
>
> If there are any other alternatives, please suggest them.
>
> Thanks and Regards,
> Srinivas Kashyap
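For reference, writing the CSV extract into a live SolrCloud collection with the Lucidworks spark-solr connector typically looks like the sketch below. The file path, ZooKeeper hosts, and collection name are placeholders; the "zkhost", "collection", and "commit_within" options follow the spark-solr README, but please verify them against the connector version you use.

import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvToSolr {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("csv-to-solr").getOrCreate()

    // CSV extract produced by the upstream ETL step (joins already done there)
    val df = spark.read
      .option("header", "true")
      .csv("/data/export/products.csv")

    df.write
      .format("solr")
      .option("zkhost", "zk1:2181,zk2:2181/solr") // ZooKeeper ensemble of the live SolrCloud cluster
      .option("collection", "products")           // target collection
      .option("commit_within", "10000")           // ask Solr to commit within 10 seconds
      .mode(SaveMode.Overwrite)
      .save()

    spark.stop()
  }
}

The connector handles batching and connection management against the live cluster, which is what makes the throughput above possible without any custom SolrJ plumbing.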