Hi Otis,

I think that my questions were not very well formulated.

We have dedicated machines for parsing, 2 machines (active/passive) for
indexing, the index allocated on a SAN filesystem, and dedicated machines
for searching.
All of my questions come down to this: if I have an index of 300 GB, I
don't know how much RAM I will need to search that index. I can't find any
documents about memory use in Solr, and I'm a bit worried because I don't
know how much memory each search will need. Concurrent searches are not a
big problem because I can add machines to the cluster.
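As a rough sketch of the two heap consumers I am trying to reason about (the numbers and byte sizes below are hypothetical assumptions, not measurements; it assumes score sorting keeps only the top N hits in a priority queue, and that sorting by a field instead would load one value per document into Lucene's FieldCache):

```python
# Back-of-envelope heap estimates; all constants are hypothetical.

def priority_queue_bytes(rows, bytes_per_entry=28):
    """Approximate heap for the top-N hit queue used when sorting by score:
    only `rows` entries are kept, regardless of index size."""
    return rows * bytes_per_entry

def field_cache_bytes(num_docs, bytes_per_value=4):
    """Approximate FieldCache heap for one sortable int field:
    one value per document, index-wide."""
    return num_docs * bytes_per_value

# Example: 100 million documents, top 10 results per query.
print(priority_queue_bytes(10))        # a few hundred bytes per in-flight query
print(field_cache_bytes(100_000_000))  # hundreds of MB, per sortable field
```

If that sketch is right, a score-only sort should need very little heap per query, and the big RAM consumer would be the OS page cache for the index files rather than the JVM.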

I read about the filterCache, queryResultCache and documentCache, but if I
don't use those caches (set them to 0) I don't know how much memory Solr
will need (if any) to store the docSets, sort them, etc., and serve a
search.
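For reference, disabling those three caches in solrconfig.xml looks roughly like this (a sketch based on the standard cache elements; a size of 0 effectively turns a cache off, and the elements can also be commented out entirely):

```xml
<!-- solrconfig.xml (sketch): size 0 effectively disables a cache -->
<query>
  <filterCache      class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache    class="solr.LRUCache" size="0" initialSize="0"/>
</query>
```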

If some document explains this, it would be very useful to me.



2008/6/15 Otis Gospodnetic <[EMAIL PROTECTED]>:

> Roberto,
>
> All I was trying to say is that it *might* be cheaper to buy:
>
> 10 smaller servers with 4 GB RAM each, for a total of 40 GB RAM
> than
> 1 big server with 40 GB RAM and the CPU matching the CPU power of 10
> smaller servers
>
> Of course, there are other things to consider, too - power usage, hosting
> space, management, etc.
> There is no single answer; you'll have to evaluate the pros and cons yourself.
>  I simply wanted to point out various factors that you and your IT team will
> need to consider.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Roberto Nieto <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Sunday, June 15, 2008 8:38:15 AM
> > Subject: Re: doubt with an index of 300gb
> >
> > Hi Otis,
> >
> > Thanks a lot for your interest.
> >
> > The main thing I can't understand very well is this: if I have 8
> > machines that will be searchers, for example, why will they have a
> > higher hardware cost if I have one big index? If I have 10 smaller
> > indexes I will need to search over all of them, so won't that require
> > the same hardware? I understand that if I could search a subset of the
> > index it would be better to split it, but what if I must search the
> > entire index?
> >
> > I can add new searcher machines, so I think my hardware problem is the
> > RAM. Is that right?
> >
> > Probably I'm missing something; sorry if my question has an obvious
> > answer.
> >
> >
> >
> >
> > 2008/6/15 Otis Gospodnetic :
> >
> > > Hi Roberto,
> > >
> > > SAN is a fine choice, if that's what you were worried about.  There
> > > is no way to tell exactly how fast your searches will be, as that
> > > depends on a lot of factors -- benchmarking with your own data and
> > > hardware and queries is the best way to go.
> > >
> > > As for the cost of multiple smaller machines vs. one large one (if
> > > that's what's needed): I *think* the price of hw goes up
> > > significantly when you start working with high-end hw, and that cost
> > > may be higher than the cost of N smaller servers combined.  That's
> > > the cost difference I was trying to point out.  That's for your IT
> > > people to figure out after you tell them what type of hw you need and
> > > what the options are.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > > ----- Original Message ----
> > > > From: Roberto Nieto
> > > > To: solr-user@lucene.apache.org
> > > > Sent: Saturday, June 14, 2008 5:05:54 PM
> > > > Subject: Re: doubt with an index of 300gb
> > > >
> > > > Hi Otis,
> > > >
> > > > Thanks for your fast answer.
> > > >
> > > > I understand your points perfectly. I will explain my limitations...
> > > >
> > > > --With multiple smaller indices you can split them across several
> > > > servers, but you can't do that with a monolithic index.
> > > > The index will be stored on a SAN that is not my choice. I can
> > > > decide to split the index or use a monolithic one, but not where it
> > > > is stored.
> > > >
> > > > --With multiple smaller indices you can choose to search only a
> > > > subset of them, should that make sense for your app.
> > > > --How much does it cost to have one server with the large amount of
> > > > RAM that serving this index will need?  Maybe it's cheaper to have
> > > > multiple smaller machines.
> > > > This index will be public and I will always need to search the
> > > > entire index. I understand the RAM problem, but if I use multiple
> > > > indexes and then search all of them, will I use less RAM? The index
> > > > will have 10 fields; all of them except the content will be small,
> > > > and I will only sort by score. If someone has any experience of how
> > > > much RAM I will need, or of the response times with this kind of
> > > > index, it would be very useful to me.
> > > >
> > > > --How long does it take you to rebuild one big index, should it get
> > > > corrupted, vs. rebuilding only a subset of your data?
> > > > This is a very important aspect, but my primary objective must be
> > > > response time. I thought about using different indexes with
> > > > different Solr instances, but the problem is merging the results and
> > > > how to sort them... so I think (but am not sure) that using only one
> > > > index will be faster, given that I will always need to search the
> > > > entire index.
> > > >
> > > >
> > > > Any help or suggestion would be very useful.
> > > >
> > > > Thank you very much for your attention.
> > > >
> > > >
> > > > 2008/6/14 Otis Gospodnetic :
> > > >
> > > > > Roberto,
> > > > >
> > > > > Here is some food for thought...
> > > > >
> > > > > With multiple smaller indices, you can split them across several
> > > > > servers, but you can't do that with a monolithic index.
> > > > >
> > > > > With multiple smaller indices you can choose to search only a
> > > > > subset of them, should that make sense for your app.
> > > > > How much does it cost to have one server with the large amount of
> > > > > RAM that serving this index will need?  Maybe it's cheaper to have
> > > > > multiple smaller machines.
> > > > >
> > > > > How long does it take you to rebuild one big index, should it get
> > > > > corrupted, vs. rebuilding only a subset of your data?
> > > > > How long does it take you to copy the index around the network
> > > > > after you optimize it vs. copying only a subset, or multiple
> > > > > subsets in parallel?
> > > > >
> > > > > etc.
> > > > >
> > > > > Otis
> > > > > --
> > > > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > > > >
> > > > >
> > > > > ----- Original Message ----
> > > > > > From: Roberto Nieto
> > > > > > To: solr-user@lucene.apache.org
> > > > > > Sent: Saturday, June 14, 2008 7:31:28 AM
> > > > > > Subject: doubt with an index of 300gb
> > > > > >
> > > > > > Hi users,
> > > > > >
> > > > > > I'm going to create a big index of 300 GB on a SAN where I have
> > > > > > 4 TB. I have read many entries on this mailing list about using
> > > > > > multiple indexes with multicore. I would like to know what kind
> > > > > > of benefit I can get from using multiple indexes instead of one
> > > > > > big index if I don't have problems with the disk. I know that
> > > > > > optimizes and commits would be faster with smaller indexes, but
> > > > > > what about search? Would RAM use be the same with 10 indexes of
> > > > > > 30 GB as with 1 index of 300 GB? Any suggestion or experience
> > > > > > would be very useful to me.
> > > > > >
> > > > > > Thanks in advance.
> > > > > >
> > > > > > Rober.
> > > > >
> > > > >
> > >
> > >
>
>