On Thu, Jan 19, 2012 at 4:51 AM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

> Huge is relative. ;)
> Huge Solr clusters also often have huge hardware. Servers with 16 cores
> and 32 GB RAM are becoming very common, for example.
> Another thing to keep in mind is that while lots of organizations have
> huge indices, only some portions of them may be hot at any one time. We've
> had a number of clients who index social media or news data and while all
> of them have giant indices, typically only the most recent data is really
> actively searched.

So let's say I have an index of 100 GB with millions of documents, but 99% of the queries only hit the latest 200,000 documents in the index. Can I then easily handle this on a machine that is not so powerful? So by 'hot' you mean a subset of the whole index? You don't mean that there is, e.g., one huge archive index and one active index in separate Solr instances?

> > Because I also read often, that the index size of one shard
> > should fit into RAM.
>
> Nah. Don't take this as "the whole index needs to fit in RAM". Just "the
> hot parts of the index should fit in RAM". This is related to what I wrote
> above.

Ah, ok. Good to know. I always tried to split the index over multiple shards, because I noticed a big performance loss when I tried to put it on one machine. But maybe this is also connected to the 'hot' and 'not hot' parts. Thanks.

> > Or at least the heap size should be as big as the
> > index size. So I see a lot of limitations hardware-wise. Or am I on the
> > totally wrong track?
>
> Regarding heap - nah, that's not correct. The heap is usually much
> smaller than the index and RAM is given to the OS to use for data caching.

Oh, ok. Thanks for this information. Maybe I can tweak the settings a bit then. But I got several GC errors etc., so I am always trying to modify all these heap/GC settings. I haven't found the perfect settings up to now.

Thanks.

Daniel

> Otis
> ----
> Performance Monitoring SaaS for Solr -
> http://sematext.com/spm/solr-performance-monitoring/index.html
>
> > On Thu, Jan 19, 2012 at 12:14 AM, Mark Miller <markrmil...@gmail.com> wrote:
> >
> >> You can raise the limit to a point.
> >>
> >> On Jan 18, 2012, at 5:59 PM, Daniel Bruegge wrote:
> >>
> >> > Hi,
> >> >
> >> > I am just wondering how I can 'grow' a distributed Solr setup to an index
> >> > size of a couple of terabytes, when one of the distributed Solr limitations
> >> > is max. 4000 characters in URI limitation.
> >> > See:
> >> >
> >> >> *The number of shards is limited by number of characters allowed for GET
> >> >> method's URI; most Web servers generally support at least 4000 characters,
> >> >> but many servers limit URI length to reduce their vulnerability to Denial
> >> >> of Service (DoS) attacks.*
> >> >>
> >> >> *(via
> >> >> http://lucidworks.lucidimagination.com/display/solr/Distributed+Search+with+Index+Sharding)*
> >> >
> >> > Is the only way then to make multiple distributed solr clusters and query
> >> > them independently and merge them in application code?
> >> >
> >> > Thanks. Daniel
> >>
> >> - Mark Miller
> >> lucidimagination.com
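
To get a feel for how far the 4000-character figure quoted above actually reaches, here is a rough Python sketch that builds a distributed query URL and counts how many shard addresses fit before the limit is hit. The host names (solrNNN.example.com), the port, the base URL and the helper function names are all made up for illustration; only the `q` and `shards` parameters and the comma-separated host:port/path format come from Solr's distributed search. The sketch percent-encodes everything, so it is a pessimistic estimate.

```python
from urllib.parse import urlencode


def query_url_length(base_url, shard_hosts, query="*:*"):
    """Length of a GET URL for a distributed query over the given shards.

    `shards` is a comma-separated list of host:port/path entries.
    urlencode percent-encodes ':', '/' and ',', so this is a worst case.
    """
    params = {"q": query, "shards": ",".join(shard_hosts)}
    return len(base_url + "?" + urlencode(params))


def max_shards_within(limit, base_url, host_template, query="*:*"):
    """Largest shard count whose query URL still fits within `limit` characters.

    `host_template` is a hypothetical naming scheme with one `{}` slot
    for the shard number.
    """
    n = 0
    while True:
        hosts = [host_template.format(i) for i in range(1, n + 2)]
        if query_url_length(base_url, hosts, query) > limit:
            return n
        n += 1


if __name__ == "__main__":
    base = "http://solr001.example.com:8983/solr/select"  # hypothetical front node
    template = "solr{:03d}.example.com:8983/solr"          # hypothetical shard naming
    n = max_shards_within(4000, base, template)
    print("shards that fit in a 4000-character URI:", n)
```

The count depends heavily on how long the host names are and on whether the client percent-encodes the separators, so treat the result as an order of magnitude only. As far as I know, Solr also accepts the same parameters in the body of a POST request, which sidesteps the URI length limit rather than raising it.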