Are you loading data from multiple tables? How many levels deep? After some experimenting, I gave up on the DIH: it generated very chatty SQL against my schema (one row at a time), I ran into concurrency bugs unless multithreading was turned off, and I wasn't confident in its incremental mode against a complex schema.
Here is what worked for us (with Oracle):

- Create materialized views, and make sure you include a 'lastUpdateTime' field in the main table. This step may be unnecessary if your source data needs no pre-processing, cleaning or reorganizing.
- Write a stored procedure that exports the data in Solr's XML format. Parameterize it with a range of primary keys of your main table so that you can partition the export into manageable subsets. The XML format is very simple; there is no need for complex in-database XML functions to generate it.
- Use the database scheduler to run that procedure as a set of jobs, with a few of them running in parallel.
- Use curl, wget or similar to feed the XML files into the index as soon as they are available.
- Compress and archive the XML files. They will come in handy when you need to provision another index instance, and will save you a lot of export time.
- Make sure your stored procedure can work in incremental mode, e.g. export all records updated after a given timestamp, then push the resulting XML into Solr.

Alain

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir <shishir.awas...@baml.com> wrote:
> Hi,
>
> I recently started working on SOLR and loaded approximately 4 million
> records to the solr using DataImportHandler. It took 5 days to complete
> this process.
>
> Can you please suggest how this can be improved? I would like this to be
> done in less than 6 hrs.
>
> Thanks,
>
> Shishir
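The export side of the steps above can be sketched in Python. This is only an illustration under stated assumptions: the rows of the main table (or materialized view) are taken to be dicts with an integer "id" primary key and a naive-UTC "lastUpdateTime" datetime, and the function names and batch file names are hypothetical. In the setup described above this logic lives in an Oracle stored procedure, where the key-range and timestamp filters would be WHERE clauses, and a thread pool here merely stands in for the database scheduler jobs.

```python
import gzip
import os
import shutil
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def key_ranges(min_id, max_id, chunk_size):
    """Split [min_id, max_id] into inclusive (lo, hi) primary-key ranges."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def to_solr_xml(rows, lo, hi, since=None):
    """Build a Solr <add> message for rows with lo <= id <= hi.

    With 'since' set, only rows whose lastUpdateTime is later than it are
    kept (incremental mode). In the database these would be WHERE clauses.
    """
    add = ET.Element("add")
    for row in rows:
        if not lo <= row["id"] <= hi:
            continue
        if since is not None and row["lastUpdateTime"] <= since:
            continue
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            if isinstance(value, datetime):
                # Solr date fields expect ISO 8601 in UTC; assumes naive UTC values.
                value = value.strftime("%Y-%m-%dT%H:%M:%SZ")
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)  # ElementTree escapes &, < and >
    return ET.tostring(add, encoding="unicode")


def write_batch(rows, lo, hi, path, since=None):
    """Write one key-range batch as a plain XML file ready to post to Solr."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_solr_xml(rows, lo, hi, since))
    return path


def export_all(rows, chunk_size, out_dir, since=None, workers=4):
    """Export every key range in parallel (a stand-in for scheduler jobs)."""
    if not rows:
        return []
    ids = [row["id"] for row in rows]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_batch, rows, lo, hi,
                        os.path.join(out_dir, f"batch_{lo}_{hi}.xml"), since)
            for lo, hi in key_ranges(min(ids), max(ids), chunk_size)
        ]
        return [f.result() for f in futures]


def archive(path):
    """Gzip a finished batch so it can be replayed into a new index later."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path
```

Each finished batch file could then be posted as soon as it appears, for example with curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary @batch_1_4.xml, followed by a final <commit/> message; the host, port and update path there are assumptions about a default single-core install.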