There are ways to load data directly from Spark to Solr, but I didn't find
any of them satisfactory, so I just create enough Spark partitions with
repartition() (increases the partition count) / coalesce() (decreases the
partition count) that I get as many Parquet files as I want, and then I use
a bash script to iterate over the files and load each one via the post
command.
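
In case it's useful, here's roughly what that looks like in Scala. This is
just a sketch: the paths, the file count, and the collection name are
placeholders for whatever your setup uses.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("parquet-for-solr").getOrCreate()
    val df = spark.read.parquet("/data/input")  // whatever you want to index

    // repartition(n) shuffles and can raise or lower the partition count;
    // coalesce(n) skips the shuffle but can only lower it. Either way, each
    // partition becomes one Parquet part file in the output directory.
    df.repartition(20).write.mode("overwrite").parquet("/data/for-solr")

    // Then, from the shell, something like:
    //   for f in /data/for-solr/part-*.parquet; do
    //     bin/post -c mycollection "$f"
    //   done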

Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com


On Fri, Aug 7, 2020 at 9:48 AM Jörn Franke <jornfra...@gmail.com> wrote:

> DIH is deprecated and will be removed from Solr. You may still be able to
> install it as a plug-in, but AFAIK nobody maintains it. Do not use it
> anymore.
>
> You can write a custom Spark data source that writes to Solr, or do it in
> a Spark map step using SolrJ [see the sketch after the quoted thread].
> In both cases, do not create hundreds of executors, to avoid overloading
> Solr.
>
>
> > Am 07.08.2020 um 18:39 schrieb Kevin Van Lieshout <
> kevin.vanl...@gmail.com>:
> >
> > Hi,
> >
> > Is there any assistance around writing Parquet files from Spark to Solr
> > shards, or is it possible to customize a DIH to import a Parquet file
> > into a Solr shard? Let me know if this is possible, or what the best
> > workaround for it is. Much appreciated, thanks.
> >
> >
> > Kevin VL
>
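
For what it's worth, a minimal sketch of the SolrJ-in-a-map-step approach
Jörn describes, reusing df from the sketch above. It assumes SolrJ is on
the executors' classpath; the Solr URL, collection name, field names, and
batch size are all placeholders.

    import org.apache.solr.client.solrj.impl.HttpSolrClient
    import org.apache.solr.common.SolrInputDocument
    import scala.collection.JavaConverters._

    // Keep the partition count small so only a few writers hit Solr at once
    // (per Jörn's advice about not creating hundreds of executors).
    df.repartition(8).foreachPartition { rows: Iterator[org.apache.spark.sql.Row] =>
      // One client per partition, reused for every batch in that partition.
      val client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()
      try {
        // Send documents in batches rather than one add() per row.
        rows.grouped(1000).foreach { batch =>
          val docs = batch.map { row =>
            val doc = new SolrInputDocument()
            doc.addField("id", row.getAs[String]("id"))
            doc.addField("text_txt", row.getAs[String]("text"))
            doc
          }
          client.add(docs.asJava)
        }
        client.commit()
      } finally {
        client.close()
      }
    }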
