Pranav,

If possible, consider moving a job this large out of DataImportHandler
and into a standalone program. DIH's SQL processing is limited by the
N+1 subselects problem: by default, each nested entity runs its own
query for every parent row.
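
To make the N+1 point concrete, here is a rough sketch of the difference.
It is not DIH or SolrJ code. It uses Python's built-in sqlite3 module, and
the table names, column names, and function names (docs, tags,
fetch_n_plus_one, fetch_bulk) are made up for illustration. With ~9m
documents and ~15 queries each, the first pattern means on the order of
135 million round trips to the database; the second keeps the query count
fixed no matter how many documents there are.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE tags (doc_id INTEGER, tag TEXT);
        INSERT INTO docs VALUES (1, 'first'), (2, 'second'), (3, 'third');
        INSERT INTO tags VALUES (1, 'a'), (1, 'b'), (2, 'c');
    """)
    return conn


def fetch_n_plus_one(conn):
    """One query for the parent rows, then one more query per parent row."""
    queries = 1
    docs = []
    parents = conn.execute("SELECT id, title FROM docs ORDER BY id").fetchall()
    for doc_id, title in parents:
        rows = conn.execute(
            "SELECT tag FROM tags WHERE doc_id = ? ORDER BY tag", (doc_id,)
        ).fetchall()
        queries += 1
        docs.append({"id": doc_id, "title": title, "tags": [r[0] for r in rows]})
    return docs, queries


def fetch_bulk(conn):
    """Two queries in total: load all child rows once, then join in memory."""
    queries = 2
    tags_by_doc = defaultdict(list)
    for doc_id, tag in conn.execute(
        "SELECT doc_id, tag FROM tags ORDER BY doc_id, tag"
    ):
        tags_by_doc[doc_id].append(tag)
    docs = [
        {"id": doc_id, "title": title, "tags": tags_by_doc.get(doc_id, [])}
        for doc_id, title in conn.execute("SELECT id, title FROM docs ORDER BY id")
    ]
    return docs, queries


if __name__ == "__main__":
    conn = build_demo_db()
    slow_docs, slow_queries = fetch_n_plus_one(conn)
    fast_docs, fast_queries = fetch_bulk(conn)
    print("same documents:", slow_docs == fast_docs)
    print("queries issued:", slow_queries, "vs", fast_queries)
```

In a real standalone importer you would page through the child tables
instead of loading them whole, and send the assembled documents to Solr in
batches. That part is left out here so the sketch stays runnable on its own.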

Michael Della Bitta

------------------------------------------------
Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
www.appinions.com
Where Influence Isn’t a Game


On Wed, Aug 8, 2012 at 1:16 AM, Pranav Prakash <pra...@gmail.com> wrote:
> Folks,
>
> My full data import takes ~80 hrs. It has ~9m documents and ~15 SQL
> queries for each document. The database servers are different from Solr
> Servers. Each document has an update processor chain which (a) calculates
> signature of the document using SignatureUpdateProcessorFactory and (b)
> finds terms which have term frequency > 2, using a custom processor.
> The index size is ~480 GiB.
>
> I want to know if the amount of time taken is too large compared to the
> document count? How do I benchmark the stats and what are some of the ways
> I can improve this? I believe there are some optimizations that I could do
> at Update Processor Factory level as well. What would be a good way to get
> dirty on this?
>
> *Pranav Prakash*
>
> "temet nosce"
