PM
> To: solr-user@lucene.apache.org
> Subject: Re: Scaling Issues
>
>
> I am using Apache ManifoldCF framework which connects to my local system
> and passes all the documents in C drive to Solr.
>
> I am not doing any searches while indexing.
>
> There is total 362GB of
file caching to hold the entire Solr index.
Do you have Solr auto-commit enabled?
-- Jack Krupansky
-Original Message-
From: Ameya Aware
Sent: Tuesday, July 29, 2014 3:01 PM
To: solr-user@lucene.apache.org
Subject: Re: Scaling Issues
I am using Apache ManifoldCF framework which connects
Yeah, I tried that. With the null output connector all the files get crawled
in just one hour.
On Tue, Jul 29, 2014 at 4:00 PM, Toke Eskildsen
wrote:
> Ameya Aware [ameya.aw...@gmail.com] wrote:
> > I am using Apache ManifoldCF framework which connects to my local system
> > and passes all th
Ameya Aware [ameya.aw...@gmail.com] wrote:
> I am using Apache ManifoldCF framework which connects to my local system
> and passes all the documents in C drive to Solr.
> There is a total of 362GB of data that needs to be indexed. I am not
> performing any complex analysis.
If you are indexing "random" fil
I am using Apache ManifoldCF framework which connects to my local system
and passes all the documents in C drive to Solr.
I am not doing any searches while indexing.
There is a total of 362GB of data that needs to be indexed. I am not
performing any complex analysis.
Thanks,
Ameya
On Tue, Jul 29, 2014
Ameya Aware [ameya.aw...@gmail.com] wrote:
[Solr -Xmx5120m]
> I need to index around 30 documents but with above parameters
> performance is coming very poor around 15000-2 documents per hour.
4-5 documents/second is a lot less than the numbers people normally cite, but
we need to know
95+% of the time, problems like this are not Solr but the
data acquisition, i.e. querying the DB, traversing the file system,
etc.
We need to have an idea of what the indexing pipeline is all about
before saying anything coherent.
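
One rough way to see how much of the time goes to data acquisition is to
time the file-system traversal on its own, with nothing sent to Solr. The
sketch below is only an illustration of that idea, not part of ManifoldCF or
Solr; the function names time_traversal and docs_per_hour are hypothetical,
and it assumes the documents are plain files under a single root directory.

```python
import os
import time


def time_traversal(root):
    """Walk `root` and read every file without sending anything to Solr.

    Returns (file_count, total_bytes, elapsed_seconds).
    Hypothetical helper for illustration only.
    """
    start = time.perf_counter()
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Read in 1 MiB chunks so large files do not fill memory.
                    while True:
                        chunk = handle.read(1024 * 1024)
                        if not chunk:
                            break
                        total_bytes += len(chunk)
                file_count += 1
            except OSError:
                # Files that cannot be opened or read are not counted.
                continue
    elapsed = time.perf_counter() - start
    return file_count, total_bytes, elapsed


def docs_per_hour(file_count, elapsed_seconds):
    """Convert a file count and an elapsed time into documents per hour."""
    if elapsed_seconds <= 0:
        return float("inf")
    return file_count * 3600.0 / elapsed_seconds
```

If the rate from a run like this is far above the indexing rate you see,
acquisition is probably not the bottleneck and the time is going somewhere
later in the pipeline (extraction, analysis, commits).
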
If you're using ExtractingRequestHandler for Word, PDFs, etc,
you
Hi Ameya,
Tough to say without more information about what's slow. In general,
when I've seen Solr index that slowly, it's usually related to some
complex text analysis. For instance, are you doing any phonetic
analysis? Best thing to do is attach a Java profiler (e.g. JConsole or
VisualVM) using rm
when you say performance is very poor, what is happening at the system level?
e.g.
are the CPUs pegged out?
is there a lot of IO wait?
is the storage busy?
is the network busy?
some easy tools to watch this stuff live if you aren't sure and don't have
full-on system monitoring agents installed