> My data is in an enormous text file that is parsed in python,

You mean it is in Python s-expressions? I don't think there is a parser in DIH for that.
On Thu, Jul 19, 2012 at 9:27 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> First, turn off all your soft commit stuff; that won't help in your situation.
> If you do leave autocommit on, make it a really high number
> (let's say 1,000,000 to start).
>
> You won't have to make 300M calls; you can batch, say, 1,000 docs
> into each request.
>
> DIH supports a bunch of different data sources, take a
> look at http://wiki.apache.org/solr/DataImportHandler, the
> EntityProcessor, DataSource and the like.
>
> There is also the CSV update processor, see
> http://wiki.apache.org/solr/UpdateCSV. It might be better to, say,
> break up your massive file into N CSV files and import those.
>
> Best,
> Erick
>
> On Thu, Jul 19, 2012 at 12:04 PM, Jonatan Fournier
> <jonatan.fourn...@gmail.com> wrote:
>> Hello,
>>
>> I was wondering if there are other ways to import data into Solr than
>> posting xml/json/csv to the server URL (e.g. locally building the
>> index). Is the DataImporter only for databases?
>>
>> My data is in an enormous text file that is parsed in python; I can get
>> clean json/xml out of it if I want, but the thing is that it drills
>> down to about 300 million "documents", so I don't want to execute 300
>> million HTTP POSTs in a for loop. Even with relaxed soft commits, etc.,
>> it would take weeks or months to populate the index.
>>
>> I need to do this only once on an offline server and never add data
>> to the index again (i.e. it becomes a read-only instance).
>>
>> Is there a temporary index configuration I could use to populate the
>> server with optimal add speed, then switch back to settings optimized
>> for a read-only instance?
>>
>> Thanks!
>>
>> --
>> jonatan

--
Lance Norskog
goks...@gmail.com
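
For what it's worth, Erick's "batch, say, 1,000 docs into each request" suggestion can be driven straight from the existing Python parser. Below is a minimal sketch, assuming a local Solr core reachable at http://localhost:8983/solr/collection1 with the JSON update handler on /update (older 3.x installs expose it at /update/json), and a parser that yields plain dicts; the core name, URL and batch size are illustrative, not from this thread.

import json
import requests

# Illustrative values, not from the thread: adjust the core URL and batch size.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"
BATCH_SIZE = 1000

def post_batch(docs):
    # One HTTP POST carries a whole JSON array of documents.
    resp = requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps(docs),
        headers={"Content-Type": "application/json"},
        params={"commit": "false"},  # defer committing; commit once at the end
    )
    resp.raise_for_status()

def index_in_batches(doc_iter):
    # doc_iter is whatever the existing Python parser yields (one dict per document).
    batch = []
    for doc in doc_iter:
        batch.append(doc)
        if len(batch) >= BATCH_SIZE:
            post_batch(batch)
            batch = []
    if batch:  # flush the final partial batch
        post_batch(batch)
    # Single explicit commit at the very end, since autocommit is set very high or off.
    requests.get(SOLR_UPDATE_URL, params={"commit": "true"}).raise_for_status()

The CSV route Erick mentions works the same way: have the parser write out N CSV files and post each one to the CSV update handler, rather than sending one request per document.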