Hi Daniel,
----- Original Message -----
> From: Daniel Bruegge <daniel.brue...@googlemail.com>
> To: solr-user@lucene.apache.org; Otis Gospodnetic <otis_gospodne...@yahoo.com>
> Cc:
> Sent: Thursday, January 19, 2012 5:49 AM
> Subject: Re: How can a distributed Solr setup scale to TB-data, if URL limitations are 4000 for distributed shard search?
>
> On Thu, Jan 19, 2012 at 4:51 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>>
>> Huge is relative. ;)
>> Huge Solr clusters also often have huge hardware. Servers with 16 cores
>> and 32 GB RAM are becoming very common, for example.
>> Another thing to keep in mind is that while lots of organizations have
>> huge indices, only some portions of them may be hot at any one time. We've
>> had a number of clients who index social media or news data, and while all
>> of them have giant indices, typically only the most recent data is really
>> actively searched.
>
> So let's say, if I have for example an index of 100 GB with millions of
> documents, but 99% of the queries only hit the latest 200,000 documents in
> the index, I can easily handle this on a machine which is not so powerful?
> So with 'hot' you mean a subset of the whole index. You don't mean that
> there is e.g. one huge archive-index and an active-index in separate Solr
> instances?

That's correct, I'm not referring to one huge archive index and one smaller active index.

Otis
----
Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html

>> > Because I also read often that the index size of one shard
>> > should fit into RAM.
>>
>> Nah. Don't take this as "the whole index needs to fit in RAM". Just "the
>> hot parts of the index should fit in RAM". This is related to what I wrote
>> above.
>
> Ah, ok. Good to know. I always tried to split the index over multiple
> shards, because I noticed a big performance loss when I tried to put it
> on one machine. But maybe this is also connected to the 'hot' and 'not hot'
> parts. Thanks.
>
>> > Or at least the heap size should be as big as the
>> > index size. So I see a lot of limitations hardware-wise. Or am I on the
>> > totally wrong track?
>>
>> Regarding heap - nah, that's not correct. The heap is usually much
>> smaller than the index, and RAM is given to the OS to use for data caching.
>
> Oh, ok. Thanks for this information. Maybe I can tweak the settings a bit
> then. But I got several GC errors etc., so I am always trying to adjust all
> these heap/GC settings. But I haven't found the perfect settings up to now.
>
> Thanks.
>
> Daniel
>
>> Otis
>> ----
>> Performance Monitoring SaaS for Solr -
>> http://sematext.com/spm/solr-performance-monitoring/index.html
>>
>> > On Thu, Jan 19, 2012 at 12:14 AM, Mark Miller <markrmil...@gmail.com> wrote:
>> >
>> >> You can raise the limit to a point.
>> >>
>> >> On Jan 18, 2012, at 5:59 PM, Daniel Bruegge wrote:
>> >>
>> >> > Hi,
>> >> >
>> >> > I am just wondering how I can 'grow' a distributed Solr setup to an index
>> >> > size of a couple of terabytes, when one of the distributed Solr limitations
>> >> > is a maximum of 4000 characters in the URI. See:
>> >> >
>> >> >> *The number of shards is limited by number of characters allowed for GET
>> >> >> method's URI; most Web servers generally support at least 4000 characters,
>> >> >> but many servers limit URI length to reduce their vulnerability to Denial
>> >> >> of Service (DoS) attacks.*
>> >> >
>> >> >> *(via
>> >> >> http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding
>> >> >> )*
>> >> >
>> >> > Is the only way then to make multiple distributed Solr clusters, query
>> >> > them independently and merge the results in application code?
>> >> >
>> >> > Thanks. Daniel
>> >>
>> >> - Mark Miller
>> >> lucidimagination.com
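P.S. On the original 4000-character question: besides raising the servlet container's URI/header size limit as Mark mentioned, the request that carries the long shards list can simply be sent as a POST, which sidesteps the GET URI length limit on the client side. Here is a rough, untested SolrJ sketch; the host names are made up, and I'm assuming a SolrJ version that has HttpSolrServer (older releases call it CommonsHttpSolrServer):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardedQueryViaPost {
    public static void main(String[] args) throws Exception {
        // Any node in the cluster can act as the aggregator for a distributed search.
        SolrServer solr = new HttpSolrServer("http://shard1.example.com:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        // The long shards list is what overflows a ~4000-character GET URI,
        // so keep it in the request parameters and send the request via POST.
        q.set("shards",
              "shard1.example.com:8983/solr,"
            + "shard2.example.com:8983/solr,"
            + "shard3.example.com:8983/solr");

        QueryResponse rsp = solr.query(q, SolrRequest.METHOD.POST);
        System.out.println("numFound: " + rsp.getResults().getNumFound());
    }
}

This only changes how the client talks to the aggregating node, so treat it as a starting point rather than a complete answer to the scaling question.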
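And on the "query multiple clusters independently and merge in application code" idea, here is a minimal sketch of what that could look like, again with made-up endpoints (a "hot" cluster holding recent data and an "archive" cluster holding the rest). A real implementation would also have to handle paging and the fact that scores from separate clusters are not directly comparable:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrDocument;

public class MergeTwoClusters {
    public static void main(String[] args) throws Exception {
        // Two independent Solr clusters, queried separately.
        SolrServer hot = new HttpSolrServer("http://hot.example.com:8983/solr");
        SolrServer archive = new HttpSolrServer("http://archive.example.com:8983/solr");

        SolrQuery q = new SolrQuery("text:solr");
        q.setRows(10);

        // Naive merge: just concatenate the two result lists in application code.
        List<SolrDocument> merged = new ArrayList<SolrDocument>();
        merged.addAll(hot.query(q).getResults());
        merged.addAll(archive.query(q).getResults());

        System.out.println("merged " + merged.size() + " documents from both clusters");
    }
}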