Re: Parallel Indexing

Peri Subrahmanya Mon, 22 Dec 2014 07:59:35 -0800

Thanks guys for the quick responses. I need to take the suggestions,
incorporate them, figure out how we are doing the fetching, etc., and
reply back on this post. The suggestions have been very helpful in taking this
forward for us here.


Thanks
-Peri.S

> On Dec 22, 2014, at 10:32 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Just to pile on....
> 
> _very_ frequently in my experience the problem
> is not Solr at all, but acquiring the data in the
> first place, i.e. often executing the DB query.
> 
> A very simple test (in the SolrJ world) is to just comment
> out the server.add(doclist) call. If the run still takes
> about as long, the time is going into acquiring the data.
> 
> Assuming you're using SolrJ, you _are_ indexing in
> batches, right? And you are _not_ committing from
> the program, right? And.... As Hossman often says,
> details matter.
> 
> Also, take a look at your Solr server CPU utilization. You
> can get a crude idea of how much work it's doing;
> unless you have it running at 100%, your bottleneck is
> on the acquisition side.
> 
> For a benchmark (admittedly not directly comparable),
> I can index 11M Wikipedia docs on my laptop in < 1
> hour without tuning anything. They're in XML format
> so data acquisition is very fast...
> 
> Best,
> Erick
> 
> On Mon, Dec 22, 2014 at 7:21 AM, Mikhail Khludnev
> <mkhlud...@griddynamics.com> wrote:
>> What is your indexer built on? Do you use SolrJ, plain REST, or
>> DataImportHandler? What is your DB schema, briefly?
>> Frankly speaking, there are a few approaches to handling indexing
>> concurrently; which one fits depends on the details mentioned above.
>> 
>> On Mon, Dec 22, 2014 at 5:54 PM, Peri Subrahmanya <
>> peri.subrahma...@htcinc.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We have millions of records in our db that we do a complete re-index of
>>> every fortnight or so. It takes around 11 hours or so and I was wondering
>>> if there was a way to fetch the records in batches in parallel and issue
>>> the Solr HTTP commands with the Solr docs in parallel. Please let me know.
>>> 
>>> Thanks
>>> -Peri.S
>>> http://www.kuali.org/ole
>>> 
>>> 
>>> 
>>> *** DISCLAIMER *** This is a PRIVATE message. If you are not the intended
>>> recipient, please delete without copying and kindly advise us by e-mail of
>>> the mistake in delivery.
>>> NOTE: Regardless of content, this e-mail shall not operate to bind HTC
>>> Global Services to any order or other contract unless pursuant to explicit
>>> written agreement or government initiative expressly permitting the use of
>>> e-mail for such purpose.
>>> 
>> 
>> 
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Principal Engineer,
>> Grid Dynamics
>> 
>> <http://www.griddynamics.com>
>> <mkhlud...@griddynamics.com>
> 
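For anyone reading this thread later, the suggestions above (fetch in batches, send batches in parallel, do not commit from the program, and time a run with the add step skipped) can be sketched roughly as follows. This is only an illustration under assumptions, not code from anyone in the thread: RecordSource and DocSink are hypothetical interfaces standing in for the DB query and for whatever client call sends documents to Solr (in SolrJ that would be the server.add(doclist) call mentioned above), and the id-range slicing assumes the table has a numeric key that can be partitioned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: RecordSource and DocSink are hypothetical stand-ins for the
// database query and the Solr client; neither is a real Solr API.
class ParallelIndexSketch {

    /** Hypothetical: returns the rows whose ids fall in [fromId, toId). */
    interface RecordSource {
        List<Map<String, Object>> fetch(long fromId, long toId) throws Exception;
    }

    /** Hypothetical: sends one batch to the index. It must not commit. */
    interface DocSink {
        void addBatch(List<Map<String, Object>> docs) throws Exception;
    }

    /**
     * Splits [minId, maxId) into slices of batchSize ids and processes the
     * slices on a fixed pool of worker threads. When sendToIndex is false the
     * add step is skipped, so the elapsed time reflects data acquisition alone.
     * No commit is issued here. Returns the number of records fetched.
     */
    static long run(RecordSource source, DocSink sink, long minId, long maxId,
                    int batchSize, int threads, boolean sendToIndex) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong fetched = new AtomicLong();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (long start = minId; start < maxId; start += batchSize) {
                final long from = start;
                final long to = Math.min(start + batchSize, maxId);
                pending.add(pool.submit(() -> {
                    List<Map<String, Object>> docs = source.fetch(from, to);
                    fetched.addAndGet(docs.size());
                    if (sendToIndex && !docs.isEmpty()) {
                        sink.addBatch(docs); // one request per batch, not per document
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for the slice and rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
        return fetched.get();
    }
}
```

Timing one call with sendToIndex set to false and one with it set to true gives a rough split between acquisition time and indexing time. If the two runs take about the same time, adding indexing threads is unlikely to help, and the DB side is the place to look.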



