Hi Daniel,
----- Original Message -----
> From: Daniel Bruegge <daniel.brue...@googlemail.com>
> To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
> Cc:
> Sent: Thursday, January 19, 2012 5:49 AM
> Subject: Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?
>
> On Thu, Jan 19, 2012 at 4:51 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>>
>> Huge is relative. ;)
>> Huge Solr clusters also often have huge hardware. Servers with 16 cores
>> and 32 GB RAM are becoming very common, for example.
>> Another thing to keep in mind is that while lots of organizations have
>> huge indices, only some portions of them may be hot at any one time. We've
>> had a number of clients who index social media or news data, and while all
>> of them have giant indices, typically only the most recent data is really
>> actively searched.
>
> So let's say, if I have for example an index of 100 GB with millions of
> documents, but 99% of the queries only hit the latest 200,000 documents in
> the index, I can easily handle this on a machine which is not so powerful?
> So with 'hot' you mean a subset of the whole index. You don't mean that
> there is e.g. one huge archive-index and an active-index in separate Solr
> instances?

That's correct, I'm not referring to one huge archive index and one smaller active index.

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

>> > Because I also read often that the index size of one shard
>> > should fit into RAM.
>>
>> Nah. Don't take this as "the whole index needs to fit in RAM". Just "the
>> hot parts of the index should fit in RAM". This is related to what I wrote
>> above.
>
> Ah, ok. Good to know. I always tried to split the index over multiple
> shards, because I noticed a big performance loss when I tried to put it
> on one machine. But maybe this is also connected to the 'hot' and 'not hot'
> parts. Thanks.
>
>> > Or at least the heap size should be as big as the
>> > index size. So I see a lot of limitations hardware-wise. Or am I on the
>> > totally wrong track?
>>
>> Regarding heap - nah, that's not correct. The heap is usually much
>> smaller than the index, and RAM is given to the OS to use for data caching.
>
> Oh, ok. Thanks for this information. Maybe I can tweak the settings a bit
> then. But I got several GC errors etc., so I am always trying to adjust all
> these heap/GC settings. But I haven't found the perfect settings up to now.
>
> Thanks.
>
> Daniel
>
>> Otis
>> ----
>> Performance Monitoring SaaS for Solr -
>> http://sematext.com/spm/solr-performance-monitoring/index.html
>>
>> > On Thu, Jan 19, 2012 at 12:14 AM, Mark Miller <markrmil...@gmail.com> wrote:
>> >
>> >> You can raise the limit to a point.
>> >>
>> >> On Jan 18, 2012, at 5:59 PM, Daniel Bruegge wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am just wondering how I can 'grow' a distributed Solr setup to an index
>> >> > size of a couple of terabytes, when one of the distributed Solr limitations
>> >> > is a maximum of 4000 characters in the URI. See:
>> >> >
>> >> >> *The number of shards is limited by number of characters allowed for GET
>> >> >> method's URI; most Web servers generally support at least 4000 characters,
>> >> >> but many servers limit URI length to reduce their vulnerability to Denial
>> >> >> of Service (DoS) attacks.*
>> >> >
>> >> >> *(via
>> >> >> http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding
>> >> >> )*
>> >> >
>> >> > Is the only way then to make multiple distributed Solr clusters, query
>> >> > them independently and merge the results in application code?
>> >> >
>> >> > Thanks. Daniel
>> >>
>> >> - Mark Miller
>> >> lucidimagination.com
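P.S. On the original 4000-character question: besides raising the servlet container's URI/header size limit as Mark mentioned, the request that carries the long shards list can simply be sent as a POST, which sidesteps the GET URI length limit on the client side. Here is a rough, untested SolrJ sketch; the host names are made up, and I'm assuming a SolrJ version that has HttpSolrServer (older releases call it CommonsHttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQueryViaPost {
    public static void main(String[] args) throws Exception {
        // Any node in the cluster can act as the aggregator for a distributed search.
        SolrServer solr = new HttpSolrServer("http://shard1.example.com:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // The long shards list is what overflows a ~4000-character GET URI,
        // so keep it in the request parameters and send the request via POST.
        q.set("shards",
              "shard1.example.com:8983/solr,"
            + "shard2.example.com:8983/solr,"
            + "shard3.example.com:8983/solr");

        QueryResponse rsp = solr.query(q, SolrRequest.METHOD.POST);
        System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
}

This only changes how the client talks to the aggregating node, so treat it as a starting point rather than a complete answer to the scaling question.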
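And on the "query multiple clusters independently and merge in application code" idea, here is a minimal sketch of what that could look like, again with made-up endpoints (a "hot" cluster holding recent data and an "archive" cluster holding the rest). A real implementation would also have to handle paging and the fact that scores from separate clusters are not directly comparable:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class MergeTwoClusters {
    public static void main(String[] args) throws Exception {
        // Two independent Solr clusters, queried separately.
        SolrServer hot = new HttpSolrServer("http://hot.example.com:8983/solr");
        SolrServer archive = new HttpSolrServer("http://archive.example.com:8983/solr");

        SolrQuery q = new SolrQuery("text:solr");
        q.setRows(10);

        // Naive merge: just concatenate the two result lists in application code.
        List<SolrDocument> merged = new ArrayList<SolrDocument>();
        merged.addAll(hot.query(q).getResults());
        merged.addAll(archive.query(q).getResults());

        System.out.println("merged " + merged.size() + " documents from both clusters");
    }
}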