Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread kshitij tyagi
I am posting JSON using curl. On Wed, Aug 17, 2016 at 4:41 AM, Alexandre Rafalovitch wrote: > What format are those documents? Solr XML? Custom JSON? > > Or are you sending PDF/binary documents to Solr's extract handler and > asking it to do the extraction of the useful stuff? If the latter, you > co

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Alexandre Rafalovitch
What format are those documents? Solr XML? Custom JSON? Or are you sending PDF/binary documents to Solr's extract handler and asking it to do the extraction of the useful stuff? If the latter, you could take that step out of Solr with a custom client using Tika (what Solr has under the hood) and only s

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
That is quite a big document! You need to monitor Solr to see whether you are feeding documents fast enough or saturating it with a large number of large requests. Play with batch size and number of threads to find the sweet spot. Maybe try extremes first (one doc/one thread, one doc many threads
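The batch-size and thread-count experiment suggested here can be sketched roughly as follows. This is a minimal illustration, not anything posted in the thread: `batches`, `index_all`, and the `send` callback are hypothetical names, and `send` stands in for whatever actually posts a JSON payload to Solr (for example an HTTP client or a curl wrapper).

```python
import json
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List


def batches(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most batch_size documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


def index_all(
    docs: Iterable[dict],
    send: Callable[[str], None],  # hypothetical: posts one JSON body to Solr
    batch_size: int = 100,
    threads: int = 4,
) -> int:
    """Send every batch through `send` on a thread pool; return docs sent."""

    def send_batch(batch: List[dict]) -> int:
        send(json.dumps(batch))  # one request body per batch
        return len(batch)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(send_batch, batches(docs, batch_size)))
```

Trying the extremes then amounts to calling `index_all` with different `batch_size` and `threads` values (say 1 and 1, then 1 and many) while watching Solr's heap, CPU, and IO.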

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread kshitij tyagi
400KB is the size of a single document, and I am sending 100 documents per request. Solr heap size is 16GB, and it is running multithreaded. On Tue, Aug 16, 2016 at 5:10 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi, > > 400KB/doc * 100doc = 40MB. If you are running it single threaded, Solr

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
Hi, 400KB/doc * 100doc = 40MB. If you are running it single threaded, Solr will be idle while accepting a relatively large request. Or is 400KB the size of the 100-doc bulk that you are sending? What is Solr's heap size? I would try increasing number of threads and monitor Solr's heap/CPU/IO to see where is t
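The request-size arithmetic in this message can be written out as a small helper. This is only a back-of-the-envelope sketch: the function names are made up for illustration, decimal units (1MB = 1000KB) are used to match the 40MB figure, and the request count assumes the 8GB mentioned in the original post is raw payload made up of uniformly sized documents.

```python
import math


def request_size_mb(doc_size_kb: float, docs_per_request: int) -> float:
    """Approximate size of one update request in MB (decimal units)."""
    return doc_size_kb * docs_per_request / 1000


def requests_needed(total_gb: float, request_mb: float) -> int:
    """Number of requests needed to send total_gb of payload."""
    return math.ceil(total_gb * 1000 / request_mb)


size = request_size_mb(400, 100)   # 400KB/doc * 100 docs
count = requests_needed(8, size)   # assumed 8GB of raw payload
print(f"{size} MB per request, {count} requests")
```

With a single sender thread, each of those requests has to be fully received before Solr can start working on it, which is why adding sender threads is the first thing to try.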

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread kshitij tyagi
Hi, we are sending about 100 documents per request for indexing. We have autocommit set to false and commit only when 1 documents are present. Solr and the machine sending requests are in the same pool. On Tue, Aug 16, 2016 at 4:51 PM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi,

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Emir Arnautovic
Hi, Do you send one doc per request? How frequently do you commit? Where is Solr running? What is the network connection between your machine and Solr? What are the JVM settings? Is 10-30s for the entire indexing run or for a single doc? Regards, Emir On 16.08.2016 11:34, kshitij tyagi wrote: Hi alexandre, 1 do

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread kshitij tyagi
Hi Alexandre, one document of 400KB is taking approx. 10-30 sec, and this varies. I am posting documents using curl. On Tue, Aug 16, 2016 at 2:11 PM, Alexandre Rafalovitch wrote: > How many records is that and what is 'slow'? Also is this standalone or > cluster setup? > > On 16 Aug 2016 6:3

Re: Indexing (posting document) taking a lot of time

2016-08-16 Thread Alexandre Rafalovitch
How many records is that, and what is 'slow'? Also, is this a standalone or cluster setup? On 16 Aug 2016 6:33 PM, "kshitij tyagi" wrote: > Hi, > > I am indexing a lot of data about 8GB, but it is taking a lot of time. I > have read about maxBufferedDocs, ramBufferSizeMB, merge policy ,etc in > solr

Indexing (posting document) taking a lot of time

2016-08-16 Thread kshitij tyagi
Hi, I am indexing a lot of data, about 8GB, but it is taking a lot of time. I have read about maxBufferedDocs, ramBufferSizeMB, merge policy, etc. in the solrconfig file. It would be helpful if someone could help me tune the settings for faster indexing speeds. *I have read the docs but not able t