Re: Solr + Parquets

Russell Jurney Mon, 10 Aug 2020 21:06:21 -0700

Sorry, I'm a goofball. I use Parquet, but for the last hop I use
bzip2-compressed JSON.
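
For illustration only (this is not the actual script from this thread): a
minimal Python sketch of that last hop, assuming Spark has already written
bzip2-compressed JSON Lines part files (for example via
df.write.option("compression", "bzip2").json(path)). The Solr URL, the
collection name "mycollection", the directory, and the batch size are all
hypothetical placeholders. It posts over HTTP to Solr's JSON update handler
rather than calling the post command.

```python
import bz2
import json
import urllib.request
from pathlib import Path
from typing import Iterable, Iterator, List

# Hypothetical endpoint: adjust host, port, and collection name to your setup.
# commitWithin asks Solr to commit the documents within 10 seconds.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update?commitWithin=10000"


def read_docs(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line of a bzip2-compressed JSON Lines file."""
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def batched(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group documents into lists of at most `size` items."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def post_batch(batch: List[dict], url: str = SOLR_UPDATE_URL) -> int:
    """POST one batch as a JSON array to the update handler; return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def load_directory(directory: str, batch_size: int = 1000, send=post_batch) -> int:
    """Load every part file in `directory` one after another; return the doc count."""
    total = 0
    for path in sorted(Path(directory).glob("part-*.json.bz2")):
        for batch in batched(read_docs(path), batch_size):
            send(batch)
            total += len(batch)
    return total
```

Calling load_directory("/data/solr_export") (a hypothetical path) would walk
the part files in name order. Loading them sequentially from one process
keeps the number of concurrent writers to Solr small.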


Thanks,
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com


On Mon, Aug 10, 2020 at 7:56 PM Aroop Ganguly
<aroopgang...@icloud.com.invalid> wrote:

>
> > script to iterate and load the files via the post command.
> You mean load parquet files over post? That sounds unbelievable …
> Do you mean you created a Solr doc for each parquet record in a partition and
> used SolrJ or some other Java lib to post the docs to Solr?
>
> df.mapPartitions(p => { // batch the parquet records, convert batch to a
> solr-doc-batch, then send to Solr via Solr request })
>
>
> If you are sending raw parquet to Solr I would love to learn more :) !
>
> > On Aug 10, 2020, at 7:50 PM, Russell Jurney <russell.jur...@gmail.com>
> > wrote:
> >
> > There are ways to load data directly from Spark to Solr but I didn't find
> > any of them satisfactory so I just create enough Spark partitions with
> > > repartition() (increase partition count)/coalesce() (decrease partition
> > count) that I get as many Parquet files as I want and then I use a bash
> > script to iterate and load the files via the post command.
> >
> > Thanks,
> > Russell Jurney @rjurney <http://twitter.com/rjurney>
> > russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> > <http://facebook.com/jurney> datasyndrome.com
> >
> >
> > On Fri, Aug 7, 2020 at 9:48 AM Jörn Franke <jornfra...@gmail.com> wrote:
> >
> > >> DIH is deprecated and it will be removed from Solr. You may though still
> > >> be able to install it as a plug-in. However, AFAIK nobody maintains it.
> > >> Do not use it anymore.
> >>
> > >> You can write a custom Spark data source that writes to Solr, or do it
> > >> in a Spark map step using SolrJ.
> > >> In both cases, do not create hundreds of executors, to avoid overloading
> > >> Solr.
> >>
> >>
> > >>> On 07.08.2020 at 18:39, Kevin Van Lieshout <kevin.vanl...@gmail.com>
> > >>> wrote:
> >>>
> >>> Hi,
> >>>
> > >>> Is there any assistance around writing parquets from Spark to Solr
> > >>> shards, or is it possible to customize a DIH to import a parquet to a
> > >>> Solr shard?
> >>> Let me know if this is possible, or the best work around for this. Much
> >>> appreciated, thanks
> >>>
> >>>
> >>> Kevin VL
> >>
>
>
