Re: Alternative for DIH

2019-02-02 Thread Nish Karve
If you absolutely want to use Kafka after trying other mechanisms, I would suggest Kafka Connect. Jeremy Custenborder has a good Kafka Connect sink connector for Solr. You can define your own Avro schemas on the Kafka topic that adhere to your Solr schema, which gives you that degree of control. We have us

Re: Alternative for DIH

2019-02-02 Thread Erick Erickson
Depending on how complicated you need this to be, you can just write your own indexer in SolrJ, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/ You haven't said a lot about the characteristics of your situation. Are you talking 1B rows from the DB? 1M? What is the pain point? Because until on
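As a rough illustration of the "write your own" approach, here is a minimal sketch of a batching indexer. It is not SolrJ: it swaps in Python's standard library and Solr's JSON update handler over HTTP. The URL, the collection name `mycollection`, and the function names are assumptions made for the example, and commits are assumed to be handled by autoCommit or a separate commit request.

```python
import json
import urllib.request
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, object]

# Hypothetical endpoint: adjust host, port, and collection to your setup.
DEFAULT_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def batched(rows: Iterable[Doc], size: int) -> Iterator[List[Doc]]:
    """Yield lists of at most `size` documents from any iterable of rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def post_to_solr(batch: List[Doc], update_url: str = DEFAULT_UPDATE_URL) -> None:
    """POST one batch as a JSON array of documents to the update handler."""
    req = urllib.request.Request(
        update_url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()  # urlopen raises on HTTP error statuses


def index_rows(
    rows: Iterable[Doc],
    send: Callable[[List[Doc]], None] = post_to_solr,
    batch_size: int = 1000,
) -> int:
    """Send rows in batches and return the number of documents sent."""
    total = 0
    for batch in batched(rows, batch_size):
        send(batch)
        total += len(batch)
    return total
```

The `rows` argument can be any iterator, for example a generator over a database cursor, so the full result set never has to sit in memory. Passing a different `send` callable makes the batching testable without a running Solr.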

Re: Alternative for DIH

2019-01-31 Thread Alexandre Rafalovitch
Apache NiFi may also be something of interest: https://nifi.apache.org/ Regards, Alex.

Re: Alternative for DIH

2019-01-31 Thread Mikhail Khludnev
Hello, I did this deck some time ago. It might be useful for choosing one. https://docs.google.com/presentation/d/e/2PACX-1vQzi3QOZAwLh_t3zs1gH9EGCB2HKUgiN3WJRGHpULyA-GleCrQ41dIOINa18h_XG64BX5D_ZG6jKmXL/pub?start=false&loop=false&delayms=3000 Note, as far as I understand Lucidworks' answer to this

Re: Alternative for DIH

2019-01-31 Thread Jörn Franke
I recommend looking at the underlying problem that you are trying to solve. Writing your own loader requires thorough technical design (e.g. recoverability in case of errors, stopping when the user requests it, proper multithreading without overloading the cluster, etc.) - I have not seen many that were well
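To make the design concerns listed above concrete, here is a small sketch of how a hand-written loader might address them: retries with backoff, a stop flag, and a bounded number of in-flight batches. All names (`send_with_retry`, `run_loader`, the `send` callable) are hypothetical, and the retry counts and limits are placeholders rather than recommendations.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def send_with_retry(send, batch, retries=3, backoff_s=1.0, sleep=time.sleep) -> bool:
    """Try to send one batch, backing off exponentially between attempts."""
    for attempt in range(retries + 1):
        try:
            send(batch)
            return True
        except Exception:
            if attempt == retries:
                return False
            sleep(backoff_s * 2 ** attempt)
    return False


def run_loader(
    batches: Iterable[list],
    send: Callable[[list], None],
    stop: threading.Event,
    workers: int = 2,
    **retry_kwargs,
) -> List[list]:
    """Send batches concurrently; return the batches that failed for replay."""
    failed: List[list] = []
    lock = threading.Lock()
    # Cap submitted-but-unfinished batches so the cluster is not flooded
    # and the source is not read faster than it can be indexed.
    slots = threading.BoundedSemaphore(workers * 2)

    def work(batch: list) -> None:
        try:
            if not send_with_retry(send, batch, **retry_kwargs):
                with lock:
                    failed.append(batch)
        finally:
            slots.release()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches:
            if stop.is_set():  # user-requested stop: submit nothing further
                break
            slots.acquire()
            pool.submit(work, batch)
    # Leaving the `with` block waits for batches already submitted.
    return failed
```

Returning the failed batches (or persisting them) is one simple way to get recoverability: a later run can replay only what did not go through. Setting `stop` from a signal handler or an admin endpoint lets in-flight batches finish while preventing new ones from starting.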