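Pranav,

If possible, consider moving a job this large out of DataImportHandler and into a standalone program. DIH's SQL processing is limited by the N+1 subselect problem: nested entities run their sub-queries once per parent row, so the number of round trips to the database grows with the document count.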
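To make the N+1 point concrete, here is a rough sketch rather than a drop-in importer. It uses Python's built-in sqlite3 as a stand-in for your real database, and the `docs` and `tags` tables and every function name are made up for illustration. The first function fetches child rows with one query per document, which is the access pattern that hurts at ~9m documents with ~15 queries each (on the order of 135 million queries). The second pulls the same data with a single ordered JOIN and groups the rows in the client.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def make_db():
    """Build a tiny in-memory database standing in for the real source.

    The schema (docs, tags) is hypothetical and only for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (doc_id INTEGER, tag TEXT);
        INSERT INTO docs VALUES (1, 'first'), (2, 'second'), (3, 'third');
        INSERT INTO tags VALUES (1, 'a'), (1, 'b'), (2, 'c');
    """)
    return conn


def fetch_n_plus_one(conn):
    """One query for the parent rows, then one more query per parent row."""
    queries = 1
    parents = conn.execute("SELECT id, title FROM docs ORDER BY id").fetchall()
    docs = []
    for doc_id, title in parents:
        rows = conn.execute(
            "SELECT tag FROM tags WHERE doc_id = ? ORDER BY tag", (doc_id,)
        ).fetchall()
        queries += 1
        docs.append({"id": doc_id, "title": title, "tags": [r[0] for r in rows]})
    return docs, queries


def fetch_joined(conn):
    """A single ordered JOIN; child rows are grouped per document in the client."""
    rows = conn.execute("""
        SELECT d.id, d.title, t.tag
        FROM docs d LEFT JOIN tags t ON t.doc_id = d.id
        ORDER BY d.id, t.tag
    """)
    docs = []
    # Rows arrive sorted by document id, so groupby sees each document once.
    for (doc_id, title), group in groupby(rows, key=itemgetter(0, 1)):
        # LEFT JOIN yields a NULL tag for documents that have no tags.
        tags = [tag for _, _, tag in group if tag is not None]
        docs.append({"id": doc_id, "title": title, "tags": tags})
    return docs, 1


if __name__ == "__main__":
    conn = make_db()
    slow_docs, slow_queries = fetch_n_plus_one(conn)
    fast_docs, fast_queries = fetch_joined(conn)
    print("same documents:", slow_docs == fast_docs)
    print("queries issued:", slow_queries, "vs", fast_queries)
```

In a real importer you would stream the joined result in batches and hand each batch of documents to whatever Solr client you prefer. With ~15 child queries you would probably use a handful of joins or pre-loaded lookup maps rather than one giant join, but the idea is the same: keep the number of database round trips independent of the number of documents.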
Michael Della Bitta

------------------------------------------------
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game

On Wed, Aug 8, 2012 at 1:16 AM, Pranav Prakash <pra...@gmail.com> wrote:
> Folks,
>
> My full data import takes ~80hrs. It has around ~9m documents and ~15 SQL
> queries for each document. The database servers are different from Solr
> Servers. Each document has an update processor chain which (a) calculates
> signature of the document using SignatureUpdateProcessorFactory and (b)
> Finds out terms which have term frequency > 2; using a custom processor.
> The index size is ~ 480GiB
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"