Hi,

Currently we import data sets from various sources (CSV, XML, JSON, etc.), pre-process them into a consistent format, apply some other transformations, and POST the result to Solr.

We currently dump the documents out to a JSON file in batches of 1,000 and POST that file to Solr.
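For context, the batching step is roughly along these lines (a simplified Python sketch; the Solr URL, collection name, and commitWithin value are just placeholders):

import json
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"  # placeholder host/collection
BATCH_SIZE = 1000

def post_batch(docs):
    # POST one batch of documents to Solr's JSON update endpoint.
    resp = requests.post(
        SOLR_UPDATE_URL,
        params={"commitWithin": 60000},  # let Solr commit within 60s rather than per batch
        headers={"Content-Type": "application/json"},
        data=json.dumps(docs),
    )
    resp.raise_for_status()

def index_all(doc_iter):
    # Accumulate documents into batches of 1,000 and send each full batch.
    batch = []
    for doc in doc_iter:
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            post_batch(batch)
            batch = []
    if batch:
        post_batch(batch)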

Roughly 50m documents come in throughout the day and are fully re-indexed. After the update calls, we then delete any documents related to that run whose last_seen datetime is earlier than the most recent run, so stale documents are removed.
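That clean-up step amounts to a delete-by-query on the last_seen field, something like this (again a sketch; the URL and the timestamp handling are placeholders):

import json
import requests

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"  # placeholder host/collection

def delete_stale(run_started_at_iso):
    # Remove documents whose last_seen predates the current run
    # (exclusive upper bound on the range query).
    body = {"delete": {"query": "last_seen:[* TO %s}" % run_started_at_iso}}
    resp = requests.post(
        SOLR_UPDATE_URL,
        params={"commit": "true"},
        headers={"Content-Type": "application/json"},
        data=json.dumps(body),
    )
    resp.raise_for_status()

# e.g. delete_stale("2024-05-01T00:00:00Z")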

I'm now importing our raw data into MongoDB first, in its raw format. The data will then be translated and stored in another Mongo collection. These two steps are for business reasons.

That final Mongo collection then needs to be sent to Solr.

My question is whether sending batches of 1,000 documents to Solr is still beneficial (thinking about docs that may not change), or whether, given the volume of incoming data we see, I should look at the MongoDB connector for Solr.

Would the connector still see every doc as updated if I re-insert them blindly, and thus still send all 50m documents back to Solr every day anyway?
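To make the question concrete, the alternative to blind re-inserts that I can see would be skipping writes for unchanged documents, roughly like this (pymongo sketch; the connection string, db/collection names, and the content_hash field are placeholders, not something we have today):

import hashlib
import json
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder connection string
coll = client["etl"]["translated"]                 # placeholder db/collection names

def content_hash(doc):
    # Stable hash of the document body, ignoring _id and bookkeeping fields.
    body = {k: v for k, v in doc.items() if k not in ("_id", "content_hash", "last_seen")}
    return hashlib.sha256(json.dumps(body, sort_keys=True, default=str).encode()).hexdigest()

def upsert_if_changed(doc):
    # Write the doc only when its content differs from what is already stored.
    # Unchanged documents produce no write, so an oplog/change-stream based
    # connector should not see them as updates.
    h = content_hash(doc)
    existing = coll.find_one({"_id": doc["_id"]}, {"content_hash": 1})
    if existing and existing.get("content_hash") == h:
        return False  # nothing changed; skip the write
    doc["content_hash"] = h
    coll.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    return True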

Is my setup quite typical for the MongoDB connector?

Thanks,
Rob


