: Actually I am storing twitter streaming data into the core, so the rate of
: indexing is about 12 tweets (docs)/second. The same solr contains 3 other cores
 ...
: . At any given time I don't need data more than the past 15 days, unless
: someone queries for it explicitly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, in case someone asks for a specific doc, but otherwise you typically only need to search for docs from the past 15 days.

if your index is going to grow w/o bounds at this rate forever then it doesn't matter what tricks you try, or how you tune things -- you are always going to run out of resources unless you adopt some sort of distributed approach.

off the cuff, i would suggest indexing all of the docs for a single "day" in one shard, and making most of your searches be a distributed request against the most recent 15 shards.

you didn't say how people "query for it explicitly" when looking for older docs -- if it's by date, then when a user asks for a specific date range you can just query those shards explicitly. if it's by some unique id, then you'll want to cache in your application the min/max id of the docs in each shard (easy enough to determine by looping over the shards and doing a stats query on each one).

-Hoss
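
a rough sketch of the shard selection described above, in Python. it only builds the shard lists an application would put in the `shards` parameter of a distributed request; it does not talk to solr. the host, the `tweets_YYYYMMDD` core naming, and all of the function names are assumptions made up for illustration, and the id ranges are assumed to have been cached already (for example from a per-shard stats request on the id field).

```python
from datetime import date, timedelta

# Hypothetical values: adjust to however the per-day cores are actually named.
SOLR_HOST = "localhost:8983/solr"
CORE_PREFIX = "tweets_"


def shard_for_day(day):
    """Return the shard address holding all docs indexed on `day`."""
    return f"{SOLR_HOST}/{CORE_PREFIX}{day:%Y%m%d}"


def shards_for_range(start, end):
    """Return one shard per day for the inclusive date range start..end."""
    n_days = (end - start).days
    return [shard_for_day(start + timedelta(days=i)) for i in range(n_days + 1)]


def recent_shards(today, n_days=15):
    """Return the shards for the most recent `n_days` days, oldest first."""
    return shards_for_range(today - timedelta(days=n_days - 1), today)


def shards_param(shards):
    """Join shard addresses into the comma-separated `shards` parameter value."""
    return ",".join(shards)


def shard_for_id(doc_id, id_ranges):
    """Find the shard whose cached (min_id, max_id) range contains doc_id.

    `id_ranges` maps shard address -> (min_id, max_id); returns None if no
    cached range covers the id.
    """
    for shard, (min_id, max_id) in id_ranges.items():
        if min_id <= doc_id <= max_id:
            return shard
    return None
```

a default search would then send `shards_param(recent_shards(date.today()))` as its `shards` value, a date-range search would use `shards_for_range(start, end)` instead, and an id lookup would go only to the shard returned by `shard_for_id`.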