Hi Jonatan,

Ideally you'd use a Solr API client that allows batched updates, so
you'd be sending documents 100 at a time, say. Alternatively, if
you're comfortable with Java, you could build the index with the
EmbeddedSolrServer class, running in the same process as the code you
use to parse the documents. But if your Solr API client is already
batching and using multiple connections, I'm not sure the tradeoff is
worth it.
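For example, a rough sketch with SolrJ (this assumes the Solr 4.x
client API, a core at http://localhost:8983/solr, and made-up
'id'/'text' field names; your parser loop goes where the for loop is):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
  public static void main(String[] args) throws Exception {
    // Buffer up to 10,000 docs internally and flush with 4 background threads.
    SolrServer server =
        new ConcurrentUpdateSolrServer("http://localhost:8983/solr", 10000, 4);

    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000000; i++) {            // stand-in for your parser loop
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));     // hypothetical field names
      doc.addField("text", "parsed document body " + i);
      batch.add(doc);

      if (batch.size() == 100) {                   // send 100 docs per request
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      server.add(batch);                           // flush the last partial batch
    }
    server.commit();   // one commit at the end rather than per document
    server.shutdown();
  }
}

The same add()/commit() pattern applies if you go the
EmbeddedSolrServer route; you'd just point it at a local core instead
of an HTTP URL.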

Also, there are various efforts out there to build indexes in Hadoop,
but I don't believe any of them are 100% production ready (I'd be
happy to be proven wrong).

Michael Della Bitta

------------------------------------------------
Appinions, Inc. -- Where Influence Isn’t a Game.
http://www.appinions.com


On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier
<jonatan.fourn...@gmail.com> wrote:
> Hello,
>
> I was wondering if there are other ways to import data into Solr than
> posting xml/json/csv to the server URL (e.g. building the index
> locally). Is the DataImporter only for databases?
>
> My data is in an enormous text file that I parse in Python; I can get
> clean json/xml out of it if I want. The problem is that it breaks down
> into about 300 million "documents", so I don't want to execute 300
> million HTTP POSTs in a for loop; even with relaxed soft commits etc.
> it would take weeks or months to populate the index.
>
> I need to do this only once on an offline server and never add data
> to the index again (i.e. it becomes a read-only instance).
>
> Is there a temporary index configuration I could use to populate the
> server with optimal add speed, and then switch back to settings
> optimized for a read-only instance?
>
> Thanks!
>
> --
> jonatan
