Hi, can someone please guide me to the right way to partition the Solr index?
On Mon, May 7, 2012 at 11:41 AM, Yuval Dotan <yuvaldo...@gmail.com> wrote:
> Hi All
> Jan, thanks for the reply - answers to your questions are below.
> Please update me if you have ideas that can solve my problems.
>
> First, some corrections to my previous mail:
>
> > Hi All
> > We have an index of ~2,000,000,000 documents and the query and facet times are too slow for us - our index will in fact be much larger.
> > Most of our queries will be limited by time, hence we want to partition the data by date/time - even when unlimited, which is what will mostly happen, we have results in the recent records and querying the whole dataset is redundant.
> > We want to partition the data because the index size is too big and doesn't fit into memory (80 GB) - our data actually grows continuously over time; it will never fit into memory, but it has to be available for queries in case results are found in older records or a full facet is required.
> >
> > 1. Is multi-core the best way to implement my requirement?
> > 2. I noticed there are LOAD / UNLOAD actions on a core - should I use these actions when managing my cores? If so, how can I LOAD a core that I have unloaded?
> > For example: I have 7 partitions / cores - one for each day of the week - we might have 2000 per day.
> > In most cases I will search for documents only on the last day's core.
> > Once every 10000 queries I need documents from all cores.
> > Question: Do I need to unload all of the old cores and then load them on demand (when I see I need data from these cores)?
> > 3. If the answer to the last question is no, how do I ensure that only the cores I want are loaded into memory?
> >
> > Thanks
> > Yuval
>
> Answers to Jan:
>
> Hi,
>
> First you need to investigate WHY faceting and querying is too slow.
> What exactly do you mean by slow? Can you please tell us more about your setup?
>
> * How large documents and how many fields?
> Small records, ~200 bytes, ~20 fields on average, most of them not stored - schema and config file attached.
>
> * What kind of queries? How many hits? How many facets? Have you studied &debugQuery=true output?
> The problem is not that queries are slow per se, it is with getting 50 matches out of billions of matching docs.
>
> * Do you use filter queries (fq) extensively?
> User-generated queries; fq would not reduce the dataset for some of our use cases.
>
> * What data do you facet on? Many unique values per field? Text or ranges? What facet.method?
> The problem is not just faceting, it's with queries - let's start there.
>
> * What kind of hardware? RAM/CPU?
> HP DL180 G6, 2x E5645 (12 cores), 48 GB RAM
>
> * How have you configured your JVM? How much memory? GC?
> java -Xms512M -Xmx40960M -jar start.jar
>
> As you see, you will have to provide a lot more information on your use case and setup in order for us to judge the correct action to take. You might need to adjust your config, optimize your queries or caches, slim your schema, buy more RAM, or an SSD :)
>
> Normally, going multi-core on one box will not necessarily help in itself, as there is overhead in sharding across multiple cores as well. However, it COULD be a solution since you say that most of the time you only need to consider 1/7 of your data. I would perhaps consider one "hot" core for the last 24h, and one "archive" core for older data. You could then tune these differently regarding caches etc.
>
> Can you get back with some more details?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
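
Regarding the LOAD / UNLOAD question above, here is a rough SolrJ sketch of managing day cores, not a recipe: the core names and URLs are made up, and the HttpSolrServer class is the newer SolrJ client (older releases call it CommonsHttpSolrServer). The point is that UNLOAD only removes the core from the running server, and one way to bring it back later is to CREATE it again against the existing instanceDir.

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class DayCoreAdmin {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests go against the root Solr URL, not a specific core.
        HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");

        // UNLOAD drops the old day core from the running server;
        // its index files stay on disk in the core's instanceDir.
        CoreAdminRequest.unloadCore("core_20120430", admin);

        // To bring an unloaded core back, CREATE it again pointing at the
        // existing instanceDir (hypothetical path, relative to solr home).
        CoreAdminRequest.createCore("core_20120430", "core_20120430", admin);

        admin.shutdown();
    }
}

Whether unloading and re-creating cores on demand is worth the bookkeeping depends on how often the all-cores queries actually arrive.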
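For Jan's "hot" / "archive" suggestion, a minimal query-side sketch under the same assumptions (hypothetical core names, and both cores sharing a uniqueKey so distributed search works): hit only the hot core in the common case, and fan the same query out over both cores with the standard shards parameter for the occasional full search.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HotVsArchiveQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout: "hot" holds the last 24h, "archive" everything older.
        HttpSolrServer hot = new HttpSolrServer("http://localhost:8983/solr/hot");

        // Common case: query only the hot core.
        SolrQuery recent = new SolrQuery("some user query");
        recent.setRows(50);
        QueryResponse recentRsp = hot.query(recent);
        System.out.println("hot hits: " + recentRsp.getResults().getNumFound());

        // Rare full search: fan the same query out over both cores with the
        // standard distributed-search "shards" parameter.
        SolrQuery full = new SolrQuery("some user query");
        full.setRows(50);
        full.set("shards", "localhost:8983/solr/hot,localhost:8983/solr/archive");
        QueryResponse fullRsp = hot.query(full);
        System.out.println("total hits: " + fullRsp.getResults().getNumFound());

        hot.shutdown();
    }
}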