Hi Otis,
Thanks for the reply. I am using a pretty "vanilla" approach right now,
and it's taking about 30 hours to build an index of about 5.5GB.
Could you tell me about some of the changes you made to optimize
the indexing process?
Thanks
Brendan
On Nov 21, 2007, at 2:27 AM, Otis Gospodnetic wrote:
Just tried a search for "web" on this index - 1.1 seconds. It matches
about 1MM of about 20MM docs. Redo the search and it's 1 ms (cached).
Clearly, this is without any load or serious benchmarking.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
----- Original Message ----
From: Eswar K <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, November 21, 2007 2:11:07 AM
Subject: Re: Any tips for indexing large amounts of data?
Hi Otis,
I understand this is a slightly off-track question, but I am just
curious about the performance of search on a 20 GB index. What has
been your observation?
Regards,
Eswar
On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Mike is right about the occasional slow-down, which appears as a pause
and is due to large Lucene index segment merges. This should go away
with newer versions of Lucene, where merging happens in the background.
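
If you want to experiment with this yourself, here is a minimal sketch
of the relevant Lucene knobs. It assumes a Lucene 2.3-style IndexWriter,
where ConcurrentMergeScheduler runs merges in background threads; the
index path, merge factor, and buffer size are placeholder values, not
settings from this thread:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.ConcurrentMergeScheduler;
    import org.apache.lucene.index.IndexWriter;

    public class MergeTuning {
        public static void main(String[] args) throws Exception {
            // Open (or create) an index; path and analyzer are placeholders.
            IndexWriter writer =
                new IndexWriter("/tmp/test-index", new StandardAnalyzer(), true);
            // Run segment merges in background threads so adds don't pause.
            writer.setMergeScheduler(new ConcurrentMergeScheduler());
            // Fewer, larger merges: higher values favor indexing speed over
            // search speed until the index is optimized.
            writer.setMergeFactor(20);
            // Flush by RAM usage rather than by buffered document count.
            writer.setRAMBufferSizeMB(64.0);
            writer.close();
        }
    }

In Solr itself the equivalent knobs live in solrconfig.xml (e.g.
<mergeFactor> under <indexDefaults>) rather than in code.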
That said, we just indexed about 20MM documents on a single 8-core
machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole
process took a little less than 10 hours - that's over 550 docs/second.
The vanilla approach, before some of our changes, apparently required
several days to index the same amount of data.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
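
P.S. I won't list every change we made here, but purely as an
illustration of one common throughput lever (not necessarily what we
did), it often helps to feed Solr's XML update handler from several
threads at once. A rough sketch; the URL, thread count, and field
names are all made up:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ParallelFeeder {
        // Placeholder URL; point this at your own Solr instance.
        static final String UPDATE_URL = "http://localhost:8983/solr/update";

        static void post(String xml) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(UPDATE_URL).openConnection();
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            OutputStream out = conn.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.close();
            conn.getResponseCode(); // read the status so the request completes
        }

        public static void main(String[] args) throws Exception {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 1000; i++) {
                final int id = i;
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            post("<add><doc><field name=\"id\">" + id
                                    + "</field></doc></add>");
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }

Batching many documents into each <add> message cuts the per-request
HTTP overhead further.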
----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 19, 2007 5:50:19 PM
Subject: Re: Any tips for indexing large amounts of data?
There should be some slowdown in larger indices, as large segment merge
operations must occasionally occur. However, this shouldn't really
affect overall speed too much.
You haven't really given us enough data to tell you anything useful.
I would recommend trying to do the indexing via a webapp to eliminate
all your code as a possible factor. Then, look for signs of what is
happening when indexing slows. For instance, is Solr's CPU usage high?
Is the machine thrashing?
-Mike
On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
Hi,
Thanks for answering this question a while back. I have made some of
the changes you suggested, i.e. not committing until I've finished
indexing. What I am seeing, though, is that as the index gets larger
(around 1GB), indexing takes a lot longer. In fact, it slows to a
crawl. Have you got any pointers as to what I might be doing wrong?
Also, I was looking at using MultiCore Solr. Could this help in
some way?
Thank you
Brendan
On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
: I would think you would see better performance by allowing autoCommit
: to handle the commit size instead of reopening the connection all the
: time.
If your goal is "fast" indexing, don't use autoCommit at all ... just
index everything, and don't commit until you are completely done.
autoCommitting will slow your indexing down (the benefit being that
more results will be visible to searchers as you proceed).
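
To make that concrete, here is a bare-bones sketch of the commit-once
pattern, posting XML to Solr's update handler over plain HTTP; the URL,
document count, and field name are placeholders:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class CommitOnce {
        static void post(String xml) throws Exception {
            HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:8983/solr/update").openConnection();
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "text/xml; charset=utf-8");
            OutputStream out = conn.getOutputStream();
            out.write(xml.getBytes("UTF-8"));
            out.close();
            conn.getResponseCode(); // read the status so the request completes
        }

        public static void main(String[] args) throws Exception {
            // Add every document with no intervening <commit/> ...
            for (int id = 0; id < 100; id++) {
                post("<add><doc><field name=\"id\">" + id
                        + "</field></doc></add>");
            }
            // ... then commit exactly once, after everything is indexed.
            post("<commit/>");
        }
    }

You can also post an <optimize/> after the final commit if you want the
segments merged down for faster searching.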
-Hoss