Optimizing requires disk space of 2x the index size.
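A minimal sketch of that rule as a pre-optimize check, in Python. The function names and the `factor` parameter are hypothetical, not part of Solr. The 2x figure is the rule of thumb from this thread; the usual explanation is that the old segments stay on disk until the merged copy is committed. Treat it as a floor, not a guarantee.

```python
import shutil

GB = 10**9


def disk_needed_for_optimize(index_bytes: int, factor: float = 2.0) -> int:
    """Total disk to budget for an optimize under the 2x rule of thumb."""
    return int(index_bytes * factor)


def can_optimize(index_bytes: int, disk_total_bytes: int, factor: float = 2.0) -> bool:
    """True if a disk of this size can hold the index through an optimize."""
    return disk_total_bytes >= disk_needed_for_optimize(index_bytes, factor)


def free_space_ok(index_bytes: int, path: str = ".") -> bool:
    """Live check, assuming the index already lives on the filesystem at `path`:
    the optimize needs roughly one extra copy of the index in free space."""
    return shutil.disk_usage(path).free >= index_bytes


if __name__ == "__main__":
    print(disk_needed_for_optimize(500 * GB) // GB)  # 1000
    print(can_optimize(500 * GB, 900 * GB))          # False
```

For the 500GB index mentioned below, this budgets 1TB of disk.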

Things that grow with index size: indexing time, query time and the
index's size on disk. My 500GB index at a previous job worked. Indexing
was a little slow; queries were much slower. What finally made us split
it up was that a single 500GB binary blob was too much to manage (backups,
optimizing, etc.). It was the IT side that made it impossible; Lucene and
Solr themselves worked fine.

On Mon, Dec 20, 2010 at 4:53 AM, Tri Nguyen <tringuye...@yahoo.com> wrote:
> I've thought about it some more and done some reading. I suppose the answer
> depends on what response time is considered good enough.
>
> I can do some stress testing and see whether disk I/O becomes the bottleneck
> as the index grows. I can also look into optimizing/configuring Solr
> parameters to help performance. One thing I've read is that my disk should
> be at least 2 times the size of the index.
>
> --- On Mon, 12/20/10, Tri Nguyen <tringuye...@yahoo.com> wrote:
>
>
> From: Tri Nguyen <tringuye...@yahoo.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Monday, December 20, 2010, 4:04 AM
>
>
> Hi Erick,
>
> Thanks for the explanation.
>
> At what point does the index get big enough to hurt performance, so that
> sharding becomes appropriate?
>
> Tri
>
> --- On Sun, 12/19/10, Erick Erickson <erickerick...@gmail.com> wrote:
>
>
> From: Erick Erickson <erickerick...@gmail.com>
> Subject: Re: shard versus core
> To: solr-user@lucene.apache.org
> Date: Sunday, December 19, 2010, 7:36 AM
>
>
> Well, they can be different beasts. First of all, different cores can have
> different schemas, which is not true of shards. Also, shards are almost
> always assumed to be running on different machines as a scaling technique,
> whereas multiple cores run on a single Solr instance.
>
> So using multiple cores is very similar to running multiple "virtual" Solr
> servers on a single machine, each independent of the others. This can make
> sense if, for instance, you wanted to have a bunch of small indexes all
> on one machine. You could use multiple cores rather than multiple
> instances of Solr. These indexes may or may not have anything to do with
> each other.
>
> Sharding, on the other hand, is almost always used to split a single logical
> index up amongst multiple machines in order to improve performance. The
> assumption usually is that the index is too big to give satisfactory
> performance on a single machine, so you'll split it into parts. That
> assumption really implies that it makes no sense to put multiple shards on
> the *same* machine.
>
> So really, the answer to your question is that you choose the right
> technique for the problem you're trying to solve. They aren't really
> different solutions to the same problem...
>
> Hope this helps.
> Erick
>
> On Sun, Dec 19, 2010 at 4:07 AM, Tri Nguyen <tringuye...@yahoo.com> wrote:
>
>> Hi,
>>
>> Was wondering about the pros and cons of using sharding versus cores.
>>
>> An index can be split up into multiple cores or multiple shards.
>>
>> So why one over the other?
>>
>> Thanks,
>>
>>
>> tri
>
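To make Erick's distinction concrete, here is a minimal Python sketch that only builds request URLs (it sends nothing). It assumes the multicore URL layout `http://host:8983/solr/<core>/select` and the distributed-search `shards` parameter, which takes a comma-separated list of `host:port/solr` entries. The host names, `core0`, and the CRC32-based routing in the hypothetical `shard_for` helper are illustrative assumptions. Solr's distributed search of this era does not decide which shard a document goes to, so the indexing client has to do that itself; hashing the unique key is one common choice, not the only one.

```python
import zlib
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical shard locations, in the host:port/solr form the shards parameter expects.
SHARDS = ["search1:8983/solr", "search2:8983/solr", "search3:8983/solr"]


def core_select_url(solr_base: str, core: str) -> str:
    """Multiple cores: each core is a separate index (with its own schema)
    served by one Solr instance and addressed by its own path."""
    return f"{solr_base.rstrip('/')}/{core}/select"


def shard_for(doc_id: str, num_shards: int) -> int:
    """Sharding: choose which shard a document is indexed into.
    crc32 is used instead of hash() so the result is stable across runs."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def distributed_query_url(entry_shard: str, query: str, shards: Sequence[str]) -> str:
    """Sharding: query one logical index spread over several machines.
    The request goes to any one shard; 'shards' lists all of them."""
    params = urlencode({"q": query, "shards": ",".join(shards)})
    return f"http://{entry_shard}/select?{params}"


if __name__ == "__main__":
    print(core_select_url("http://localhost:8983/solr", "core0"))
    print(shard_for("doc-42", len(SHARDS)))
    print(distributed_query_url(SHARDS[0], "title:solr", SHARDS))
```

Whatever routing scheme is used, the same one has to be applied every time a document is re-indexed or deleted; otherwise updates land on the wrong shard.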



-- 
Lance Norskog
goks...@gmail.com
