Just to pile on....

_Very_ frequently, in my experience, the problem
is not Solr at all, but acquiring the data in the
first place, i.e. often executing the DB query.

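A very simple test (in the SolrJ world) is to just comment
out the server.add(doclist) call and run the indexer again.
If the run takes about as long as before, the time is going
into data acquisition, not Solr.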
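
If you'd rather not edit the indexer by hand, here is a
minimal sketch of the same test in plain Java. Everything
in it is hypothetical: AcquisitionTimer, timeRun and the
sink parameter are illustrative names, not SolrJ API, and
the sink stands in for whatever wraps your add call.

```java
import java.util.Iterator;
import java.util.function.Consumer;

// Hypothetical harness (not part of SolrJ): times one pass over the data source.
final class AcquisitionTimer {

    /**
     * Pulls every row from source, hands it to sink, and returns the
     * elapsed wall-clock time in milliseconds.
     */
    static <T> long timeRun(Iterator<T> source, Consumer<T> sink) {
        long start = System.nanoTime();
        while (source.hasNext()) {
            sink.accept(source.next());
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

Run it once with a no-op sink (doc -> { }), which is the
equivalent of commenting out the add, and once with the
real one. If the two numbers are close, Solr is not where
the time goes.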

Assuming you're using SolrJ, you _are_ indexing in
batches, right? And you are _not_ committing from
the program, right? And.... As Hossman often says,
details matter.
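
By "batches" I mean something along these lines. This is
only a sketch under assumptions: BatchSender and
sendInBatches are made-up names, and the sender callback
is a stand-in for a wrapper around server.add(batch). Note
that nothing here commits; the assumption is that commits
are left to the autoCommit settings in solrconfig.xml.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of SolrJ): groups documents into batches.
final class BatchSender {

    /**
     * Groups docs into lists of at most batchSize and hands each list to
     * sender. Returns the number of batches sent. Never issues a commit.
     */
    static <T> int sendInBatches(Iterator<T> docs, int batchSize,
                                 Consumer<List<T>> sender) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sender.accept(batch);
                batches++;
                // Fresh list, so the sender may safely hold on to the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) { // send the final, partially filled batch
            sender.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

The right batch size depends on your documents; something
like 1000 is a common starting guess, not a recommendation
from measurements.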

Also, take a look at your Solr server's CPU utilization. It
gives you a crude idea of how much work Solr is doing:
unless it's running at close to 100%, your bottleneck is
probably on the acquisition side.

For a benchmark (admittedly not directly comparable),
I can index 11M Wikipedia docs on my laptop in < 1
hour without tuning anything. They're in XML format
so data acquisition is very fast...

Best,
Erick

On Mon, Dec 22, 2014 at 7:21 AM, Mikhail Khludnev
<mkhlud...@griddynamics.com> wrote:
> What is your indexer built on? Do you use SolrJ, just REST, or
> DataImportHandler? What is your DB schema, briefly?
> Frankly speaking, there are a few approaches to handling indexing
> concurrently; which one fits depends on the details mentioned above.
>
> On Mon, Dec 22, 2014 at 5:54 PM, Peri Subrahmanya <
> peri.subrahma...@htcinc.com> wrote:
>>
>> Hi,
>>
>> We have millions of records in our db that we do a complete re-index of
>> every fortnight or so. It takes around 11 hours or so and I was wondering
>> if there was a way to fetch the records in batches in parallel and issue the
>> solr http command with the solr docs in parallel. Please let me know.
>>
>> Thanks
>> -Peri.S
>> http://www.kuali.org/ole
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>