Re: Solr + Parquets

Aroop Ganguly Mon, 10 Aug 2020 19:57:21 -0700


> script to iterate and load the files via the post command.
You mean you load Parquet files over post? That sounds unbelievable …
Do you mean you created a Solr doc for each Parquet record in a partition and
used SolrJ or some other Java lib to post the docs to Solr?


df.mapPartitions(p => { // batch the Parquet records, convert each batch to a
  // Solr doc batch, then send it to Solr via a Solr request
})
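
To make the batching step concrete, here is a minimal sketch in plain Scala, written so it runs without Spark or SolrJ on the classpath. Record, Doc, toDoc, indexPartition and the send callback are hypothetical names used only for illustration; they are not part of any Spark or Solr API, and a real job would use Spark rows and Solr input documents instead.

```scala
// Hypothetical stand-in for one Parquet record; a real job would use Spark's Row.
final case class Record(id: String, title: String)

object PartitionIndexer {
  // Hypothetical document shape (field name -> value), standing in for a Solr doc.
  type Doc = Map[String, Any]

  // Convert one record into a document.
  def toDoc(r: Record): Doc = Map("id" -> r.id, "title" -> r.title)

  // Walk one partition lazily, hand the docs to `send` in fixed-size batches,
  // and return how many docs were sent. `send` is an assumed callback that
  // would wrap the actual request to Solr.
  def indexPartition(records: Iterator[Record], batchSize: Int)(send: Seq[Doc] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    records.map(toDoc).grouped(batchSize).foldLeft(0) { (sent, batch) =>
      send(batch)
      sent + batch.size
    }
  }
}
```

In a Spark job, something like this could be called once per partition (for example from foreachPartition, since the work is a side effect), with send wrapping a SolrJ client call. That wiring is an assumption and is not shown here.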


If you are sending raw parquet to Solr I would love to learn more :) !

> On Aug 10, 2020, at 7:50 PM, Russell Jurney <[email protected]> wrote:
> 
> There are ways to load data directly from Spark to Solr but I didn't find
> any of them satisfactory so I just create enough Spark partitions with
> repartition() (increase partition count)/coalesce() (decrease partition
> count) that I get as many Parquet files as I want and then I use a bash
> script to iterate and load the files via the post command.
> 
> Thanks,
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> [email protected] LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
> 
> 
> On Fri, Aug 7, 2020 at 9:48 AM Jörn Franke <[email protected]> wrote:
> 
>> DIH is deprecated and will be removed from Solr. You may still be able to
>> install it as a plug-in. However, AFAIK nobody maintains it. Do not use it
>> anymore.
>> 
>> You can write a custom Spark data source that writes to Solr, or do it in
>> a Spark map step using SolrJ.
>> In both cases, do not create 100s of executors, to avoid overloading.
>> 
>> 
>>> On 07.08.2020 at 18:39, Kevin Van Lieshout <
>> [email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> Is there any support for writing Parquet files from Spark to Solr shards,
>>> or is it possible to customize a DIH to import a Parquet file into a Solr
>>> shard? Let me know if this is possible, or what the best workaround is.
>>> Much appreciated, thanks.
>>> 
>>> 
>>> Kevin VL
>>
