If you absolutely want to use Kafka after trying other mechanisms, I would suggest Kafka Connect. Jeremy Custenborder has a good Kafka Connect sink connector for SOLR. You can define your own Avro schemas on the Kafka topic that adhere to your SOLR schema, giving you that degree of control over what gets indexed (an example schema follows below).
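To illustrate the schema point: the Avro value schema on the topic can mirror the target SOLR schema field-for-field, so only documents that fit your collection ever reach the sink. This is only a sketch; the record and field names are hypothetical, and the "_s"/"_f" suffixes assume SOLR dynamic-field naming conventions.

{
  "type": "record",
  "name": "Product",
  "doc": "Hypothetical topic value schema mirroring the SOLR collection schema",
  "fields": [
    {"name": "id",      "type": "string"},
    {"name": "title_s", "type": ["null", "string"], "default": null},
    {"name": "price_f", "type": ["null", "float"],  "default": null}
  ]
}

If you also run a schema registry, the Avro serializer refuses records that do not conform to the registered schema, so malformed documents are stopped on the producer side rather than at SOLR.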
We have used the Lucidworks Spark Connector to index 500 million documents, with around 70 fields per document, into SOLR within 4 hours. It is a very good choice if you want to sync data from a DB to SOLR. Add an interim step with an ETL tool such as Ab Initio to perform the basic joins on your tables and extract the data as CSV for the Spark Connector (a sketch of the indexing step follows the quoted message below). All the hard work of opening and managing the connections to SOLR is done by the connector. Please note that this connector indexes data into a live SOLR cluster, unlike offline indexing with MapReduce.

Thanks
Nishant

On Thu, Jan 31, 2019, 5:15 AM Srinivas Kashyap <srini...@bamboorose.com> wrote:

> Hello,
>
> As we all know, DIH is single threaded and has its own issues while
> indexing.
>
> I got to know that we can write our own APIs to pull data from the DB
> and push it into Solr. One such option I heard of was Apache Kafka being
> used for this purpose.
>
> Can any of you send me links and guides on using Apache Kafka to pull
> data from a DB and push it into Solr?
>
> If there are any other alternatives, please suggest them.
>
> Thanks and Regards,
> Srinivas Kashyap
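For reference, writing the CSV extract into a live SolrCloud collection with the Lucidworks spark-solr connector typically looks like the sketch below. The file path, ZooKeeper hosts, and collection name are placeholders; the "zkhost", "collection", and "commit_within" options follow the spark-solr README, but please verify them against the connector version you use.

import org.apache.spark.sql.{SaveMode, SparkSession}

object CsvToSolr {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("csv-to-solr").getOrCreate()

    // CSV extract produced by the upstream ETL step (joins already done there)
    val df = spark.read
      .option("header", "true")
      .csv("/data/export/products.csv")

    df.write
      .format("solr")
      .option("zkhost", "zk1:2181,zk2:2181/solr") // ZooKeeper ensemble of the live SolrCloud cluster
      .option("collection", "products")           // target collection
      .option("commit_within", "10000")           // ask Solr to commit within 10 seconds
      .mode(SaveMode.Overwrite)
      .save()

    spark.stop()
  }
}

The connector handles batching and connection management against the live cluster, which is what makes the throughput above possible without any custom SolrJ plumbing.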