That's a good point; we load data from Pig to Solr every day.

1. What we do: the Pig job creates a CSV dump, we scp it over to a Solr node, and the UpdateCSV request handler loads the data into Solr. A complete index rebuild of about 50M documents (20GB) takes 20 minutes (the Pig job that pulls and processes data from Cassandra, plus the UpdateCSV load).
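The CSV load step described above can be sketched roughly like this (a minimal sketch, not the poster's actual script; the hostnames, file paths, and port are placeholders, and it assumes Solr's stock CSV update handler is enabled at `/update/csv`):

```shell
# Copy the CSV dump produced by the Pig job to a Solr node
# (pig-dump.csv, solr-node, and the paths are hypothetical).
scp /data/pig-out/pig-dump.csv solr-node:/tmp/pig-dump.csv

# On the Solr node, stream the file into the CSV update handler
# and commit once the load finishes.
curl 'http://localhost:8983/solr/update/csv?commit=true' \
  -H 'Content-type: text/csv; charset=utf-8' \
  --data-binary @/tmp/pig-dump.csv
```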
2. Alternate way: another approach I explored was writing a Pig UDF that POSTs to Solr, but batched HTTP posts were slower than a CSV load for a full index rebuild (and that was an important use case for us).

These might not be the best practices; I'd like to know how others are handling this problem.

Thanks,
-Utkarsh

On Wed, Aug 21, 2013 at 11:29 AM, geeky2 <[email protected]> wrote:
> Hello All,
>
> Is anyone loading Solr from a Pig script / process?
>
> I was talking to another group in our company and they have standardized on
> MongoDB instead of Solr - apparently there is very good support between
> MongoDB and Pig - allowing users to "stream" data directly from a Pig
> process in to MongoDB.
>
> Does solr have anything like this as well?
>
> thx
> mark
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/loading-solr-from-Pig-tp4085933.html
> Sent from the Solr - User mailing list archive at Nabble.com.

--
Thanks,
-Utkarsh
