That's great.

At what size of the index do you think we should look at partitioning the
index file?

Eswar

On Nov 21, 2007 12:57 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
wrote:

> Just tried a search for "web" on this index: 1.1 seconds. This matches
> about 1MM of about 20MM docs. Repeat the search and it's 1 ms (cached).
> This is without any load or serious benchmarking, clearly.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Eswar K <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, November 21, 2007 2:11:07 AM
> Subject: Re: Any tips for indexing large amounts of data?
>
> Hi Otis,
>
> I understand this is a slightly off-track question, but I am curious
> to know the performance of search on a 20 GB index. What have you
> observed?
>
> Regards,
> Eswar
>
> On Nov 21, 2007 12:33 PM, Otis Gospodnetic <[EMAIL PROTECTED]>
> wrote:
>
> > Mike is right about the occasional slowdown, which appears as a pause
> > and is due to large Lucene index segment merging. This should go away
> > with newer versions of Lucene, where merging happens in the background.
> >
> > That said, we just indexed about 20MM documents on a single 8-core
> > machine with 8 GB of RAM, resulting in a nearly 20 GB index. The whole
> > process took a little less than 10 hours, which is over 550 docs/second.
> > The vanilla approach, before some of our changes, apparently required
> > several days to index the same amount of data.
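The throughput figure follows from simple arithmetic. A minimal Python check, using only the numbers quoted above (about 20 million documents, with 10 hours taken as an upper bound on the elapsed time):

```python
def docs_per_second(num_docs: int, hours: float) -> float:
    """Average indexing throughput, in documents per second."""
    return num_docs / (hours * 3600)


# ~20 million documents in (at most) 10 hours, per the message above.
rate = docs_per_second(20_000_000, 10)
print(f"{rate:.0f} docs/second")  # prints "556 docs/second"
```

Because the run took a little under 10 hours, the true average is somewhat above the ~556 docs/second this gives.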
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> > From: Mike Klaas <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Monday, November 19, 2007 5:50:19 PM
> > Subject: Re: Any tips for indexing large amounts of data?
> >
> > There should be some slowdown in larger indices, as large segment
> > merges must occasionally occur. However, this shouldn't affect overall
> > speed much.
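Why merges show up as occasional pauses rather than a steady slowdown can be seen with a toy model. The sketch below is a deliberately simplified, hypothetical simulation of log-style merging: every flush writes a small segment, and whenever `merge_factor` equal-sized segments pile up they are merged into one segment `merge_factor` times larger. It is loosely analogous to Solr's `mergeFactor` setting but is not Lucene's actual merge policy, and `simulate_log_merges` is an illustrative name, not a Lucene or Solr API.

```python
from typing import List, Tuple


def simulate_log_merges(num_flushes: int, merge_factor: int = 10) -> Tuple[List[int], List[int]]:
    """Toy model of log-style segment merging (not Lucene's real policy).

    Each flush writes a segment of size 1 (one RAM buffer's worth of docs).
    Whenever `merge_factor` segments of the same size accumulate, they are
    merged into a single segment `merge_factor` times that size.
    Returns (sizes of every merge performed, final segment sizes).
    """
    assert merge_factor >= 2, "a merge factor below 2 would never shrink the segment count"
    segments: List[int] = []  # ordered largest (oldest) to smallest (newest)
    merges: List[int] = []
    for _ in range(num_flushes):
        segments.append(1)
        # A merge can complete a full set one level up, so cascade upward.
        while len(segments) >= merge_factor and len(set(segments[-merge_factor:])) == 1:
            merged_size = segments[-1] * merge_factor
            del segments[-merge_factor:]
            segments.append(merged_size)
            merges.append(merged_size)
    return merges, segments


merges, segments = simulate_log_merges(1000)
print(max(merges), len(merges))  # prints "1000 111"
```

With 1,000 flushes and a merge factor of 10, most of the 111 merges are small, but one merge rewrites everything flushed so far; in this model, that single large merge is the kind of operation that appears as a pause.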
> >
> > You haven't really given us enough data to tell you anything useful.
> > I would recommend trying to do the indexing via a webapp to eliminate
> > all your code as a possible factor. Then, look for signs of what is
> > happening when indexing slows. For instance, is Solr using a lot of
> > CPU, is the computer thrashing, etc.?
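One way to act on this advice is to time each batch you send, so you know exactly when throughput starts to drop and can correlate that moment with CPU, memory, or swap activity on the Solr machine. A minimal sketch, assuming a hypothetical `send` callable that posts one batch of documents to Solr (it is a placeholder for your own client code, not a Solr API); the 50% threshold is an arbitrary illustrative choice:

```python
import time
from typing import Callable, Iterable, List


def timed_batches(batches: Iterable[list], send: Callable[[list], None],
                  slowdown_ratio: float = 0.5) -> List[float]:
    """Send each batch via `send`, record its throughput in docs/second,
    and print a warning when a batch runs slower than `slowdown_ratio`
    times the first batch's rate."""
    rates: List[float] = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        send(batch)  # hypothetical: whatever actually posts the batch to Solr
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        rate = len(batch) / elapsed
        rates.append(rate)
        if rate < slowdown_ratio * rates[0]:
            print(f"batch {i}: {rate:.0f} docs/s, below {slowdown_ratio:.0%} of the initial rate")
    return rates
```

When a warning appears, that is the moment to check whether Solr is CPU-bound or the machine is swapping.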
> >
> > -Mike
> >
> > On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
> >
> > > Hi,
> > >
> > > Thanks for answering this question a while back. I have made some
> > > of the changes you suggested, i.e., not committing until I've
> > > finished indexing. What I am seeing, though, is that as the index gets
> > > larger (around 1 GB), indexing takes a lot longer. In fact, it
> > > slows to a crawl. Do you have any pointers as to what I might
> > > be doing wrong?
> > >
> > > Also, I was looking at using MultiCore Solr. Could this help in
> > > some way?
> > >
> > > Thank you
> > > Brendan
> > >
> > > On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
> > >
> > >>
> > >> : I would think you would see better performance by allowing auto
> > >> : commit to handle the commit size instead of reopening the
> > >> : connection all the time.
> > >>
> > >> if your goal is "fast" indexing, don't use autoCommit at all ... just
> > >> index everything, and don't commit until you are completely done.
> > >>
> > >> autoCommitting will slow your indexing down (the benefit being
> > >> that more
> > >> results will be visible to searchers as you proceed)
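Following this advice with Solr's XML update format means sending `<add>` messages in batches and a single `<commit/>` only at the very end. The sketch below only builds the message bodies; it assumes autoCommit is disabled in solrconfig.xml and that you POST each string to your Solr update handler (the URL depends on your installation). The function names and the batch size of 1000 are illustrative choices, not Solr defaults.

```python
from typing import Dict, Iterable, Iterator, List
from xml.sax.saxutils import escape, quoteattr


def add_message(docs: List[Dict[str, str]]) -> str:
    """Build one Solr XML <add> message containing several documents."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, <, >
            parts.append(f"<field name={quoteattr(name)}>{escape(value)}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def update_messages(docs: Iterable[Dict[str, str]], batch_size: int = 1000) -> Iterator[str]:
    """Yield batched <add> messages, then exactly one <commit/> at the end."""
    batch: List[Dict[str, str]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield add_message(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield add_message(batch)
    yield "<commit/>"  # the only commit, sent once everything is indexed
```

Batching keeps the number of HTTP requests manageable, and because the only `<commit/>` comes last, Solr is not asked to open a new searcher while indexing is still in progress.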
> > >>
> > >> -Hoss
> > >>
> > >
> >
>
