Re: Slow indexing speed when collection size is large

2017-05-07 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Are the two types of indexing (ERH with OCR, and indexing from a DB) happening on the same Solr server?
A) Yes, they are happening on the same Solr server, but currently, only the indexing from a DB is running.

Is Solr in a virtual machine?
A) No

Is the 384GB at the hypervisor level, o

Re: Slow indexing speed when collection size is large

2017-05-07 Thread Shawn Heisey
On 5/6/2017 6:49 PM, Zheng Lin Edwin Yeo wrote:
> For my rich documentation handling, I'm using Extracting Request Handler, and
> it requires OCR.
>
> However, currently, for the slow indexing speed which I'm experiencing, the
> indexing is done directly from the Sybase database. I will fetch abo

Re: Slow indexing speed when collection size is large

2017-05-06 Thread Zheng Lin Edwin Yeo
Hi Shawn, For my rich documentation handling, I'm using Extracting Request Handler, and it requires OCR. However, currently, for the slow indexing speed which I'm experiencing, the indexing is done directly from the Sybase database. I will fetch about 1000 records at a time from Sybase, and store

Re: Slow indexing speed when collection size is large

2017-05-06 Thread Shawn Heisey
On 5/1/2017 10:17 AM, Zheng Lin Edwin Yeo wrote:
> I'm using Solrj for the indexing, not using curl. Normally I bundle
> about 1000 documents for each POST. There's more than 300GB of RAM for
> that server, and I do not use any sharding at the moment.

Looking over your email history on the list, I

Re: Slow indexing speed when collection size is large

2017-05-01 Thread Zheng Lin Edwin Yeo
Hi Rick,

I'm using Solrj for the indexing, not using curl. Normally I bundle about 1000 documents for each POST. There's more than 300GB of RAM for that server, and I do not use any sharding at the moment.

Regards,
Edwin

On 1 May 2017 at 19:08, Rick Leir wrote:
> Zheng,
> Are you POSTing using

Re: Slow indexing speed when collection size is large

2017-05-01 Thread Rick Leir
Zheng,

Are you POSTing using curl? Get several processes working in parallel to get a small boost. Solrj should speed you up a bit too (numbers anyone?). How many documents do you bundle in a POST? Do you have lots of RAM? Sharding?

Cheers -- Rick

On April 30, 2017 10:39:29 PM EDT, Zheng Lin E
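
The two suggestions in this thread, bundling about 1000 documents per POST and sending several bundles in parallel, can be sketched roughly as follows. This is only an illustration, not the poster's Solrj code: it uses Python threads rather than separate processes, and `batched`, `send_batch`, and `index_all` are hypothetical names. `send_batch` is a stand-in that just counts documents; a real version would POST the batch to Solr.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def send_batch(batch):
    """Hypothetical stand-in for one POST to Solr; returns the batch size."""
    return len(batch)


def index_all(records, batch_size=1000, workers=4, send=send_batch):
    """Send batches from a thread pool and return the total documents sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map submits every batch up front, so all batches are held
        # in memory at once; fine for a sketch, not for a huge table.
        return sum(pool.map(send, batched(records, batch_size)))


if __name__ == "__main__":
    docs = [{"id": i} for i in range(2500)]
    print(index_all(docs))
```

Whether parallel senders help depends on where the bottleneck is; if the slowdown comes from the server side as the collection grows, more client threads may not change much.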