Are you loading data from multiple tables? How many levels deep? After some
experimenting, I gave up on the DIH (DataImportHandler): I found that it
generated very chatty SQL against my schema (one row at a time), I ran into
concurrency bugs unless multithreading was disabled, and I wasn't confident
in its incremental mode against a complex schema.

Here is what worked for us (with Oracle):

- create materialized views; make sure that you include a 'lastUpdateTime'
field in the main table. This step may be unnecessary if your source data
does not need any pre-processing / cleaning / reorganizing.
- write a stored procedure that exports the data in Solr's XML format;
parameterize it with a range of primary keys of your main table so that you
can partition the export into manageable subsets. The XML format is very
simple; there is no need for complex in-database XML functions to generate it.
- use the database scheduler to run that procedure as a set of jobs; run a
few of them in parallel.
- use curl, wget or a similar tool to feed the XML files into the index as
soon as they are available.
- compress and archive the XML files; they will come in handy when you need
to provision another index instance and will save you a lot of export time.
- make sure your stored procedure can work in incremental mode: e.g. export
all records updated after a certain timestamp; then just push the resulting
XML into Solr.
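The stored-procedure export described above runs inside Oracle. As a hedged illustration of the same logic, here is a minimal client-side Python sketch that uses an in-memory SQLite table as a stand-in for the main table or materialized view. The table and column names (`items`, `id`, `title`, `last_update_time`) and the helper functions are hypothetical. It shows three ideas: splitting the primary-key space into ranges, emitting Solr's `<add><doc><field name="...">` XML, and an optional timestamp filter for incremental mode.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr

# Hypothetical schema standing in for the Oracle main table / materialized view.
SCHEMA = ("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, "
          "last_update_time TEXT)")


def create_demo_db():
    """Build an in-memory SQLite table with a few sample rows (illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?)",
        [
            (1, "alpha", "2011-10-01T00:00:00Z"),
            (2, "beta & gamma", "2011-10-20T00:00:00Z"),
            (3, "delta", "2011-10-24T00:00:00Z"),
        ],
    )
    return conn


def key_ranges(min_id, max_id, chunk_size):
    """Split [min_id, max_id] into inclusive (lo, hi) ranges of at most chunk_size keys."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        yield lo, hi
        lo = hi + 1


def export_chunk(conn, lo, hi, since=None):
    """Return a Solr <add> message for rows whose id is in [lo, hi].

    If `since` is given (an ISO-8601 string), only rows updated after it
    are exported; this is the incremental mode.
    """
    sql = "SELECT id, title, last_update_time FROM items WHERE id BETWEEN ? AND ?"
    params = [lo, hi]
    if since is not None:
        sql += " AND last_update_time > ?"
        params.append(since)
    sql += " ORDER BY id"

    parts = ["<add>"]
    for row_id, title, updated in conn.execute(sql, params):
        parts.append("  <doc>")
        for name, value in (("id", row_id), ("title", title), ("lastUpdateTime", updated)):
            # escape() turns &, < and > into entities so the XML stays well-formed.
            parts.append(f"    <field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("  </doc>")
    parts.append("</add>")
    return "\n".join(parts)


if __name__ == "__main__":
    conn = create_demo_db()
    # Each (lo, hi) range could run as a separate job, a few in parallel.
    for lo, hi in key_ranges(1, 3, 2):
        print(export_chunk(conn, lo, hi))
    # Incremental run: only rows changed after the previous export.
    print(export_chunk(conn, 1, 3, since="2011-10-15T00:00:00Z"))
```

Gaps in the key sequence simply yield smaller chunks, so fixed-width key ranges are a reasonable way to partition the work. Storing timestamps as ISO-8601 strings keeps the `>` comparison correct and matches the date format Solr expects.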
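The feeding and archiving steps can also be scripted. Below is a hedged Python sketch that stands in for curl or wget: it POSTs each exported XML file to Solr's XML update handler, gzips it into an archive directory, and sends a single `<commit/>` at the end. The URL `http://localhost:8983/solr/update` is an assumption (the older single-core layout; multi-core setups use `/solr/<core>/update`), and all function names are hypothetical.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumption: single-core Solr on localhost; adjust host, port and core path.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_bytes, url=SOLR_UPDATE_URL):
    """Build (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(path, url=SOLR_UPDATE_URL, timeout=300):
    """POST one exported XML file to Solr and return the HTTP status.

    urlopen raises urllib.error.HTTPError if Solr answers with 4xx/5xx.
    """
    req = build_update_request(Path(path).read_bytes(), url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def commit(url=SOLR_UPDATE_URL, timeout=300):
    """Ask Solr to make the newly added documents visible."""
    with urllib.request.urlopen(build_update_request(b"<commit/>", url), timeout=timeout) as resp:
        return resp.status


def archive_file(path, archive_dir):
    """Gzip-compress an exported XML file into archive_dir and return the new path."""
    src = Path(path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / (src.name + ".gz")
    with src.open("rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest


def feed_and_archive(paths, archive_dir, url=SOLR_UPDATE_URL):
    """Feed each file as it becomes available, archive it, then commit once."""
    for p in paths:
        post_file(p, url)
        archive_file(p, archive_dir)
    commit(url)
```

Committing once after a batch of files, rather than after every file, is usually cheaper because each Solr commit has a noticeable cost. The archived `.gz` files can later be decompressed and replayed through `post_file` to build another index instance without re-running the export.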

Alain

On Tue, Oct 25, 2011 at 9:56 PM, Awasthi, Shishir
<shishir.awas...@baml.com> wrote:

> Hi,
>
> I recently started working on Solr and loaded approximately 4 million
> records into Solr using DataImportHandler. It took 5 days to complete
> this process.
>
> Can you please suggest how this can be improved? I would like this to be
> done in less than 6 hours.
>
> Thanks,
>
> Shishir