Disk space of 2x the index size is required for optimizing. Things that increase with index size: indexing time, query time, and on-disk index size. My 500GB index at a previous job worked. Indexing was a little slow; queries were much slower. What finally made us split it up was that a single 500GB binary blob was too much to manage: backups, optimizes, etc. It was the IT side that made it impossible. Lucene & Solr worked fine.
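As a rough illustration of the 2x rule, here is a minimal Python sketch that compares an index directory's size with the capacity of the disk it sits on. The function names and the default factor of 2.0 are my own assumptions for illustration; none of this is part of Solr or Lucene.

```python
import os
import shutil


def index_size_bytes(index_dir):
    """Sum the sizes of all files under index_dir (0 if the directory is missing)."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def has_optimize_headroom(index_bytes, disk_total_bytes, factor=2.0):
    """True if the disk is at least `factor` times the index size.

    factor=2.0 is the rule of thumb from this thread, not a Solr setting.
    """
    return disk_total_bytes >= factor * index_bytes


def check_index_dir(index_dir, factor=2.0):
    """Return (index size, disk capacity, headroom ok?) for one index directory."""
    size = index_size_bytes(index_dir)
    total = shutil.disk_usage(index_dir).total
    return size, total, has_optimize_headroom(size, total, factor)
```

Calling `check_index_dir` on the index's data directory (wherever yours lives) gives a quick yes/no before kicking off an optimize. It only looks at total disk capacity, so other data on the same disk still has to be accounted for by hand.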
On Mon, Dec 20, 2010 at 4:53 AM, Tri Nguyen <tringuye...@yahoo.com> wrote:
> Thought about it some more and did some reading. I suppose the answer
> depends on what kind of response time is expected to be good enough.
>
> I can do some stress testing and see if disk i/o is the bottleneck as the
> index grows. I can also look into optimizing/configuring solr parameters to
> help performance. One thing I've read is my disk should be at least 2 times
> the index.
>
> --- On Mon, 12/20/10, Tri Nguyen <tringuye...@yahoo.com> wrote:
>
> From: Tri Nguyen <tringuye...@yahoo.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Monday, December 20, 2010, 4:04 AM
>
> Hi Erick,
>
> Thanks for the explanation.
>
> At what point does the index get so big that it affects performance and
> sharding becomes appropriate?
>
> Tri
>
> --- On Sun, 12/19/10, Erick Erickson <erickerick...@gmail.com> wrote:
>
> From: Erick Erickson <erickerick...@gmail.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Sunday, December 19, 2010, 7:36 AM
>
> Well, they can be different beasts. First of all, different cores can have
> different schemas, which is not true of shards. Also, shards are almost
> assumed to be running on different machines as a scaling technique,
> whereas multiple cores are run on a single Solr instance.
>
> So using multiple cores is very similar to running multiple "virtual" Solr
> servers on a single machine, each independent of the other. This can make
> sense if, for instance, you wanted to have a bunch of small indexes all
> on one machine. You could use multiple cores rather than multiple
> instances of Solr. These indexes may or may not have anything to do with
> each other.
>
> Sharding, on the other hand, is almost always used to split a single logical
> index up amongst multiple machines in order to improve performance. The
> assumption usually is that the index is too big to give satisfactory
> performance on a single machine, so you'll split it into parts. That
> assumption really implies that it makes no sense to put multiple shards on
> the #same# machine.
>
> So really, the answer to your question is that you choose the right
> technique for the problem you're trying to solve. They aren't really
> different solutions to the same problem...
>
> Hope this helps.
> Erick
>
> On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen <tringuye...@yahoo.com> wrote:
>
>> Hi,
>>
>> Was wondering about the pros and cons of using sharding versus cores.
>>
>> An index can be split up into multiple cores or multiple shards.
>>
>> So why one over the other?
>>
>> Thanks,
>>
>> tri
>

--
Lance Norskog
goks...@gmail.com
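To make the core/shard distinction in the quoted thread concrete, here is a small Python sketch that only builds query URLs. It assumes the usual `/solr/<core>/select` path for a core and Solr's `shards` request parameter for distributed search; the host names, core name, and function names are made up for illustration, and nothing here talks to a server.

```python
from urllib.parse import urlencode


def core_query_url(host, core, query):
    """Query one independent core on a single Solr instance."""
    return "http://%s/solr/%s/select?%s" % (host, core, urlencode({"q": query}))


def sharded_query_url(coordinator, shard_hosts, query):
    """Query one logical index whose pieces live on several machines.

    The coordinator fans the request out to every entry in `shards`
    and merges the results.
    """
    shards = ",".join("%s/solr" % host for host in shard_hosts)
    params = urlencode({"q": query, "shards": shards}, safe=":/,")
    return "http://%s/solr/select?%s" % (coordinator, params)


# Hypothetical hosts and core name, for illustration only.
print(core_query_url("localhost:8983", "core0", "solr"))
print(sharded_query_url("host1:8983", ["host1:8983", "host2:8983"], "solr"))
```

The first URL hits a single core with its own schema; the second asks one node to search the same logical index across two machines.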