That's a good point — we load data from Pig into Solr every day.

1. What we do:
Our Pig jobs create a CSV dump, which we scp over to a Solr node, where
the UpdateCSV request handler loads the data into Solr. A complete index
rebuild for about 50M documents (20GB) takes roughly 20 minutes,
including both the Pig job (which pulls and processes the data from
Cassandra) and the UpdateCSV load.
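For reference, here is a minimal sketch of the CSV shape the UpdateCSV handler expects (a header row of Solr field names, then one row per document). The field names, core name, and host below are hypothetical placeholders, not from our actual setup; the Pig side is omitted.

```python
# Build a CSV dump suitable for Solr's UpdateCSV handler.
# Field names "id", "name", "price" and the core "items" are
# illustrative only -- substitute your own schema.
import csv
import io

docs = [
    {"id": "1", "name": "widget", "price": "9.99"},
    {"id": "2", "name": "gadget", "price": "19.99"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "price"])
writer.writeheader()   # header row tells Solr which fields the columns map to
writer.writerows(docs)
dump = buf.getvalue()

# The dump file is then POSTed to the handler on the Solr node, e.g.:
#   curl 'http://localhost:8983/solr/items/update/csv?commit=true' \
#        --data-binary @dump.csv -H 'Content-type: text/csv'
```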

2. Alternate way:
Another approach I explored was writing a Pig UDF that POSTs to Solr,
but batched HTTP POSTs were slower than a CSV load for a full index
rebuild (which was an important use case for us).
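The actual UDF was Pig/Java, but the batching pattern it used can be sketched like this (batch size and document shape are made up for illustration). Each batch becomes one HTTP POST to Solr's update handler, which is where the per-request overhead comes from compared to a single streaming CSV upload:

```python
# Sketch of the batched-POST pattern: group documents into fixed-size
# batches, each of which would be sent as one POST to Solr.
def chunk(docs, size):
    """Yield successive batches of at most `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

docs = [{"id": str(n)} for n in range(10)]
batches = list(chunk(docs, 4))
# 10 docs in batches of 4 means 3 separate POSTs (4 + 4 + 2),
# versus a single streaming upload for the CSV approach.
```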

These might not be best practices; I'd like to know how others are
handling this problem.

Thanks,
-Utkarsh



On Wed, Aug 21, 2013 at 11:29 AM, geeky2 <[email protected]> wrote:

> Hello All,
>
> Is anyone loading Solr from a Pig script / process?
>
> I was talking to another group in our company and they have standardized on
> MongoDB instead of Solr - apparently there is very good support between
> MongoDB and Pig - allowing users to "stream" data directly from a Pig
> process in to MongoDB.
>
> Does Solr have anything like this as well?
>
> thx
> mark
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/loading-solr-from-Pig-tp4085933.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Thanks,
-Utkarsh
