First, turn off all your soft commit stuff; it won't help in your situation. If you do leave autocommit on, set it to a really high number (say, 1,000,000 to start).
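If you end up driving the load from your Python parser, a minimal sketch of the client side could look like the one below (not from this thread; it assumes the requests library and a stock /solr/update/json endpoint, so adjust the URL/core name to your install): post every batch with commit=false and issue a single hard commit at the very end.

import json
import requests

# Assumed endpoint -- Solr 3.x single-core installs expose /solr/update/json,
# multi-core setups use /solr/<core>/update/json. Adjust as needed.
SOLR_JSON_UPDATE = "http://localhost:8983/solr/update/json"

def post_batch(docs):
    """Add one batch of documents without triggering any commit."""
    resp = requests.post(
        SOLR_JSON_UPDATE,
        params={"commit": "false"},                 # let batches accumulate
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()

def final_commit():
    """Issue one explicit hard commit once the whole load is finished."""
    resp = requests.post(
        SOLR_JSON_UPDATE,
        data='{"commit": {}}',
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()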
You won't have to make 300M calls; you can batch, say, 1,000 docs into each request.

DIH supports a bunch of different data sources, take a look at:
http://wiki.apache.org/solr/DataImportHandler (the EntityProcessor, DataSource and the like).

There is also the CSV update handler, see:
http://wiki.apache.org/solr/UpdateCSV. It might be better to, say, break your massive file up into N CSV chunks and import those (rough sketch after the quoted mail below).

Best
Erick

On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier
<jonatan.fourn...@gmail.com> wrote:
> Hello,
>
> I was wondering if there are other ways to import data into Solr than
> posting xml/json/csv to the server URL (e.g. building the index
> locally). Is the DataImporter only for databases?
>
> My data is in an enormous text file that is parsed in Python; I can
> get clean JSON/XML out of it if I want, but the thing is that it
> drills down to about 300 million "documents", so I don't want to
> execute 300 million HTTP POSTs in a for loop. Even with relaxed soft
> commits etc., it would take weeks, maybe months, to populate the
> index.
>
> I only need to do this once, on an offline server, and will never add
> data back to the index (i.e. it becomes a read-only instance).
>
> Is there a temporary index configuration I could use to populate the
> server with optimal add speed, and then switch the settings back to
> ones optimized for a read-only instance?
>
> Thanks!
>
> --
> jonatan
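For the CSV route, here's a rough sketch of the "break the file into chunks" idea, again in Python with requests. It streams fixed-size chunks straight to Solr instead of writing N separate files, which is the same idea with less disk churn. The /update/csv location, chunk size, and file name are all assumptions; check where the CSV handler is mounted in your solrconfig.xml.

import itertools
import requests

CSV_UPDATE_URL = "http://localhost:8983/solr/update/csv"   # assumed location
ROWS_PER_CHUNK = 100000                                     # tune for your setup

def post_csv_in_chunks(path):
    """Send a huge CSV to Solr in chunks, repeating the header each time."""
    with open(path, "rb") as f:
        header = f.readline()
        while True:
            rows = list(itertools.islice(f, ROWS_PER_CHUNK))
            if not rows:
                break
            resp = requests.post(
                CSV_UPDATE_URL,
                params={"commit": "false"},          # commit once at the end
                data=header + b"".join(rows),
                headers={"Content-Type": "text/csv; charset=utf-8"},
            )
            resp.raise_for_status()

if __name__ == "__main__":
    post_csv_in_chunks("massive_export.csv")         # hypothetical file name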