Re: Scaling Issues

2014-07-29 Thread Ameya Aware
PM > To: solr-user@lucene.apache.org > Subject: Re: Scaling Issues > > > I am using Apache ManifoldCF framework which connects to my local system > and passes all the documents in C drive to Solr. > > I am not doing any searches while indexing. > > There is total 362GB of

Re: Scaling Issues

2014-07-29 Thread Jack Krupansky
file caching to hold the entire Solr index. Do you have Solr auto-commit enabled? -- Jack Krupansky -Original Message- From: Ameya Aware Sent: Tuesday, July 29, 2014 3:01 PM To: solr-user@lucene.apache.org Subject: Re: Scaling Issues I am using Apache ManifoldCF framework which connects

Re: Scaling Issues

2014-07-29 Thread Ameya Aware
Yeah, I tried that. With the null output connector, all the files get crawled in just one hour. On Tue, Jul 29, 2014 at 4:00 PM, Toke Eskildsen wrote: > Ameya Aware [ameya.aw...@gmail.com] wrote: > > I am using Apache ManifoldCF framework which connects to my local system > > and passes all th

RE: Scaling Issues

2014-07-29 Thread Toke Eskildsen
Ameya Aware [ameya.aw...@gmail.com] wrote: > I am using Apache ManifoldCF framework which connects to my local system > and passes all the documents in C drive to Solr. > There is total 362GB of data needs to be indexed. I am not performing any > complex analysis. If you are indexing "random" fil

Re: Scaling Issues

2014-07-29 Thread Ameya Aware
I am using the Apache ManifoldCF framework, which connects to my local system and passes all the documents on the C drive to Solr. I am not doing any searches while indexing. There is a total of 362GB of data that needs to be indexed. I am not performing any complex analysis. Thanks, Ameya On Tue, Jul 29, 2014

RE: Scaling Issues

2014-07-29 Thread Toke Eskildsen
Ameya Aware [ameya.aw...@gmail.com] wrote: [Solr -Xmx5120m] > I need to index around 30 documents but with above parameters > performance is coming very poor around 15000-2 documents per hour. 4-5 documents/second is a lot less than the numbers people normally cite, but we need to know

Re: Scaling Issues

2014-07-29 Thread Erick Erickson
95+% of the time, problems like this are not Solr but the data acquisition, i.e. querying the DB, traversing the file system, etc. We need to have an idea of what the indexing pipeline is all about before saying anything coherent. If you're using ExtractingRequestHandler for Word, PDFs, etc, you

Re: Scaling Issues

2014-07-29 Thread Timothy Potter
Hi Ameya, Tough to say without more information about what's slow. In general, when I've seen Solr index that slow, it's usually related to some complex text analysis, for instance, are you doing any phonetic analysis? Best thing to do is attach a Java profiler (e.g. JConsole or VisualVM) using rm

RE: Scaling Issues

2014-07-29 Thread Boogie Shafer
When you say performance is very poor, what is happening at the system level? E.g., are the CPUs pegged? Is there a lot of IO wait? Is the storage busy? Is the network busy? Some easy tools to watch this stuff live if you aren't sure and don't have full-on system monitoring agents installed